
Commit 89303a1

docs: include a link to the 20B model docs
1 parent 5180a70 commit 89303a1

2 files changed

Lines changed: 253 additions & 2 deletions


README.md

Lines changed: 7 additions & 2 deletions
@@ -3,8 +3,9 @@
 OpenChatKit provides a powerful, open-source base to create both specialized and general purpose chatbots for various applications. The kit includes instruction-tuned language models, a moderation model, and an extensible retrieval system for including up-to-date responses from custom repositories. OpenChatKit models were trained on the OIG-43M training dataset, which was a collaboration between [Together](https://www.together.xyz/), [LAION](https://laion.ai), and [Ontocord.ai](https://ontocord.ai).

 In this repo, you'll find code for:
-- Training an OpenChatKit model
-- Testing inference using the model
+- Training GPT-NeoXT-Chat-Base-20B, a 20B parameter chat model (see [docs/GPT-NeoXT-Chat-Base-20B.md](docs/GPT-NeoXT-Chat-Base-20B.md))
+- Training Pythia-Chat-Base-7B, a 7B parameter chat model
+- Testing inference using either of the chat models
 - Augmenting the model with additional context from a retrieval index

 # Contents
@@ -22,6 +23,7 @@ In this repo, you'll find code for:
   * [Loguru](#loguru)
   * [Weights & Biases](#weights--biases)
 - [Experimental: Retrieval-Augmented Models](#experimental-retrieval-augmented-models)
+- [See Also](#see-also)
 - [License](#license)
 - [Citing OpenChatKit](#citing-openchatkit)
 - [Acknowledgements](#acknowledgements)
@@ -251,6 +253,9 @@ Zurich is located in Switzerland.
 >>>
 ```

+# See Also
+* [docs/GPT-NeoXT-Chat-Base-20B.md](docs/GPT-NeoXT-Chat-Base-20B.md). OpenChatKit also provides GPT-NeoXT-Chat-Base-20B, a larger, 20B parameter chat model fine-tuned from Eleuther AI's GPT-NeoX-20B.
+
 # License

 All code in this repository was developed by Together Computer except where otherwise noted. Copyright (c) 2023, Together Computer. All rights reserved. The code is licensed under the Apache 2.0 license.

docs/GPT-NeoXT-Chat-Base-20B.md

Lines changed: 246 additions & 0 deletions
@@ -0,0 +1,246 @@
# GPT-NeoXT-Chat-Base-20B

OpenChatKit includes an instruction-tuned 20 billion parameter language model called GPT-NeoXT-Chat-Base-20B, a 6 billion parameter moderation model, and an extensible retrieval system for including up-to-date responses from custom repositories. It was trained on the OIG-43M training dataset, which was a collaboration between [Together](https://www.together.xyz/), [LAION](https://laion.ai), and [Ontocord.ai](https://ontocord.ai). Much more than a model release, this is the beginning of an open source project. We are releasing a set of tools and processes for ongoing improvement with community contributions.

In this doc, you'll find steps for:

- Training an OpenChatKit model
- Testing inference using the model
- Augmenting the model with additional context from a retrieval index
# Contents

- [Requirements](#requirements)
- [Pre-trained Weights](#pre-trained-weights)
- [Datasets](#datasets)
  * [Data Contributions](#data-contributions)
- [Pretrained Base Model](#pretrained-base-model)
- [Training and Finetuning](#training-and-finetuning)
  * [(Optional) 8bit Adam](#optional-8bit-adam)
  * [Train GPT-NeoX-Chat-Base-20B](#train-gpt-neox-chat-base-20b)
- [Converting Weights to Huggingface Format](#converting-weights-to-huggingface-format)
- [Inference](#inference)
- [Monitoring](#monitoring)
  * [Loguru](#loguru)
  * [Weights & Biases](#weights--biases)
- [Experimental: Retrieval-Augmented Models](#experimental-retrieval-augmented-models)
- [Acknowledgements](#acknowledgements)
# Requirements

Before you begin, you need to install PyTorch and other dependencies.

1. Install [Miniconda](https://docs.conda.io/en/latest/miniconda.html) from their website.

2. Install [Git LFS](https://git-lfs.com/) from their website.

3. Install the `git lfs` hooks.

   ```shell
   git lfs install
   ```

4. Install mamba in the `base` environment so it's available in all environments.

   ```shell
   conda install mamba -n base -c conda-forge
   ```

5. Create an environment called OpenChatKit using the `environment.yml` file at the root of this repo.

   ```shell
   mamba env create -f environment.yml
   ```

6. Activate the new conda environment.

   ```shell
   conda activate OpenChatKit
   ```
# Pre-trained Weights

GPT-NeoXT-Chat-Base-20B is a 20B-parameter variant of GPT-NeoX, fine-tuned on conversational datasets. We are releasing pre-trained weights for this model as [togethercomputer/GPT-NeoXT-Chat-Base-20B](https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-20B) on Huggingface.

More details can be found on the model card for [GPT-NeoXT-Chat-Base-20B](https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-20B) on Huggingface.
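If you'd rather fetch the weights ahead of time instead of letting the tooling download them on demand, one option is the standard Git LFS clone workflow for Huggingface repos. This is a generic sketch, not an OpenChatKit-specific script:

```shell
# Generic Hugging Face download via Git LFS; the weights are tens of gigabytes,
# so make sure you have enough disk space.
git lfs install
git clone https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-20B
```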
# Datasets

The chat model was trained on the [OIG](https://huggingface.co/datasets/laion/OIG) dataset built by [LAION](https://laion.ai/), [Together](https://www.together.xyz/), and [Ontocord.ai](https://www.ontocord.ai/). To download the dataset from Huggingface, run the command below from the root of the repo.

```shell
python data/OIG/prepare.py
```

Once the command completes, the data will be in the `data/OIG/files` directory.
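A quick way to confirm the download completed is to list that directory (plain shell, nothing OpenChatKit-specific):

```shell
# Show the first few downloaded files and their sizes.
ls -lh data/OIG/files | head
```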
## Data Contributions

You can help make this chat model better by contributing data! See the [OpenDataHub](https://github.com/togethercomputer/OpenDataHub) repo for more details.
# Pretrained Base Model

As mentioned above, the chat model is a fine-tuned variant of GPT-NeoX-20B from Eleuther AI. To download GPT-NeoX-20B and prepare it for fine-tuning, run this command from the root of the repo.

```shell
python pretrained/GPT-NeoX-20B/prepare.py
```

The weights for this model will be in the `pretrained/GPT-NeoX-20B/EleutherAI_gpt-neox-20b` directory.

If you want to fine-tune other GPT-NeoX models, e.g. [the Pythia model suite](https://huggingface.co/models?sort=downloads&search=pythia), you can specify the HF model name, for example:

```shell
python pretrained/GPT-NeoX-20B/prepare.py --model-name EleutherAI/pythia-6.9b-deduped
```

The weights for this model will then be in the `pretrained/GPT-NeoX-20B/EleutherAI_pythia-6.9b-deduped` directory.
# Training and Finetuning

## (Optional) 8bit Adam

To use 8bit-adam during training, install the `bitsandbytes` package.

```shell
pip install bitsandbytes # optional, to use 8bit-adam
```

## Train GPT-NeoX-Chat-Base-20B

The `training/finetune_GPT-NeoXT-Chat-Base-20B.sh` script configures and runs the training loop. After downloading the dataset and the base model, run:

```shell
bash training/finetune_GPT-NeoXT-Chat-Base-20B.sh
```

The script launches 8 processes with a pipeline-parallel degree of 8 and a data-parallel degree of 1.

As the training loop runs, checkpoints are saved to the `model_ckpts` directory at the root of the repo.
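Fine-tuning the 20B model takes a long time, so if you are running over SSH you may want to launch the script in a way that survives a dropped connection. A minimal sketch using standard shell tools (`nohup` and the `finetune.log` file are assumptions for illustration, not part of the script):

```shell
# Run the fine-tuning script in the background and keep a log of its output.
nohup bash training/finetune_GPT-NeoXT-Chat-Base-20B.sh > finetune.log 2>&1 &
# Follow the loss printed by the training loop.
tail -f finetune.log
```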
Please see [the training README](training/README.md) for more details about customizing the training run.

The `training/finetune_Pythia-Chat-Base-7B.sh` script is another example that fine-tunes a 7B Pythia (GPT-NeoX family) model. The script launches 8 processes with a pipeline-parallel degree of 4 and a data-parallel degree of 2.
126+
127+
Before you can use this model to perform inference, it must be converted to the Huggingface format. Run this command from the root of the repo to do so.
128+
129+
```shell
130+
mkdir huggingface_models \
131+
&& python tools/convert_to_hf_gptneox.py \
132+
--ckpt-path model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_100 \
133+
--save-path huggingface_models/GPT-NeoXT-Chat-Base-20B \
134+
--n-stages 8 \
135+
--n-layer-per-stage 6 \
136+
--fp16
137+
```
138+
where the `--fp16` flag will load and store models in fp16.
139+
140+
Make sure to replace `model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_100` with the latest checkpoint in the `model_ckpts/GPT-Neo-XT-Chat-Base-20B` directory.
141+
142+
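For example, a generic shell sketch for locating the highest-numbered checkpoint (the `LATEST_CKPT` variable is illustrative, not something the tooling defines):

```shell
# Version-sort the checkpoint directories and keep the last one (highest step).
LATEST_CKPT=$(ls -d model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_* | sort -V | tail -n 1)
echo "Latest checkpoint: ${LATEST_CKPT}"
```

You can then pass `"${LATEST_CKPT}"` to `--ckpt-path` in the conversion command above.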
If you need to convert checkpoints of other GPT-NeoX variants, make sure to specify the correct config name for your variant. For example, if you want to convert a checkpoint fine-tuned from `EleutherAI/pythia-6.9b-deduped`, pass it as the config name:

```shell
python tools/convert_to_hf_gptneox.py \
  --config-name EleutherAI/pythia-6.9b-deduped \
  --ckpt-path model_ckpts/Pythia-Chat-Base-7B/checkpoint_100 \
  --save-path huggingface_models/Pythia-Chat-Base-7B \
  --n-stages 4 \
  --n-layer-per-stage 8 \
  --fp16
```
# Inference

To help you test the model, we provide a simple command-line test harness to interact with the bot.

```shell
python inference/bot.py
```

By default, the script loads the GPT-NeoXT-Chat-Base-20B model from the `huggingface_models` directory, but you can override that behavior by specifying `--model`.

For example, if you want to load the base model from our Huggingface repo, you can run the following command, which downloads the weights from Huggingface.

```shell
python inference/bot.py --model togethercomputer/GPT-NeoXT-Chat-Base-20B
```

Once the model has loaded, enter text at the prompt and the model will reply.

```shell
$ python inference/bot.py
Loading /home/csris/src/github.com/togethercomputer/OpenChatKit/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B to cuda:1...
Welcome to OpenChatKit shell. Type /help or /? to list commands.

>>> Hello.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Hello human.

>>>
```

Commands are prefixed with a `/`, and the `/quit` command exits.

Please see [the inference README](inference/README.md) for more details about arguments, running on multiple/specific GPUs, and running on consumer hardware.
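For example, to pin the bot to a single GPU you can use the standard `CUDA_VISIBLE_DEVICES` environment variable (a generic CUDA mechanism; the inference README documents the script's own GPU options):

```shell
# Make only GPU 0 visible to the process before starting the bot.
CUDA_VISIBLE_DEVICES=0 python inference/bot.py
```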
# Monitoring

By default, the training script simply prints the loss as training proceeds, but it can also output metrics to a file using [loguru](https://github.com/Delgan/loguru) or report them to Weights & Biases.

## Loguru

Add the flag `--train-log-backend loguru` to your training script to log to `./logs/file_{time}.log`.
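You can then follow the metrics as training runs with standard `tail` (assuming the default log location above):

```shell
# Follow the loguru log file(s) written by the training run.
tail -f logs/file_*.log
```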
## Weights & Biases
198+
199+
To use Weights & Biases, first login with your Weights & Biases token.
200+
201+
```shell
202+
wandb login
203+
```
204+
205+
And set `--train-log-backend wandb` in the training script to enable logging to Weights & Biases.
206+
207+
# Experimental: Retrieval-Augmented Models

*Note: Retrieval is still experimental.*

The code in `/retrieval` implements a Python package for querying a Faiss index of Wikipedia. The following steps explain how to use this index to augment queries in the test harness with context from the retriever.

1. Download the Wikipedia index.

   ```shell
   python data/wikipedia-3sentence-level-retrieval-index/prepare.py
   ```

2. Run the bot with the `--retrieval` flag.

   ```shell
   python inference/bot.py --retrieval
   ```

After starting, the bot will load both the chat model and the retrieval index, which takes a long time. Once the model and the index are loaded, all queries will be augmented with extra context.

```shell
$ python inference/bot.py --retrieval
Loading /OpenChatKit/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B to cuda:0...
Loading retrieval index...
Welcome to OpenChatKit shell. Type /help or /? to list commands.

>>> Where is Zurich?
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Where is Zurich?
Zurich is located in Switzerland.

>>>
```
# Acknowledgements

Our model is a fine-tuned version of [gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b), a large language model trained by [Eleuther AI](https://www.eleuther.ai). We evaluated our model on [HELM](https://crfm.stanford.edu/helm/latest/) provided by the [Center for Research on Foundation Models](https://crfm.stanford.edu), and we collaborated with both [CRFM](https://crfm.stanford.edu) and [HazyResearch](http://hazyresearch.stanford.edu) at Stanford to build this model.

We collaborated with [LAION](https://laion.ai/) and [Ontocord.ai](https://www.ontocord.ai/) to build the training data used to fine tune this model.
