# OpenChatKit
OpenChatKit provides a powerful, open-source base to create both specialized and general-purpose chatbots for various applications. The kit includes instruction-tuned language models, a moderation model, and an extensible retrieval system for including up-to-date responses from custom repositories. OpenChatKit models were trained on the OIG-43M training dataset, which was a collaboration between [Together](https://www.together.xyz/), [LAION](https://laion.ai), and [Ontocord.ai](https://ontocord.ai).

In this repo, you'll find code for:
- Training an OpenChatKit model
- Testing inference using the model
- Augmenting the model with additional context from a retrieval index
# Contents
- [Getting Started](#getting-started)
  * [Requirements](#requirements)
  * [Chatting with Pythia-Chat-Base-7B](#chatting-with-pythia-chat-base-7b)
- [Reproducing Pythia-Chat-Base-7B](#reproducing-pythia-chat-base-7b)
  * [Downloading training data and the base model](#downloading-training-data-and-the-base-model)
  * [Training the model](#training-the-model)
  * [Converting weights to Huggingface format](#converting-weights-to-huggingface-format)
  * [Testing the new model](#testing-the-new-model)
- [Monitoring](#monitoring)
  * [Loguru](#loguru)
  * [Weights & Biases](#weights--biases)
- [Experimental: Retrieval-Augmented Models](#experimental-retrieval-augmented-models)
- [Citing OpenChatKit](#citing-openchatkit)
- [Acknowledgements](#acknowledgements)
# Getting Started
In this tutorial, you will download Pythia-Chat-Base-7B, an instruction-tuned language model, and run some inference requests against it using a command-line tool.

Pythia-Chat-Base-7B is a 7B-parameter fine-tuned variant of Pythia-6.9B-deduped from Eleuther AI. Pre-trained weights for this model are available on Huggingface as [togethercomputer/Pythia-Chat-Base-7B](https://huggingface.co/togethercomputer/Pythia-Chat-Base-7B) under an Apache 2.0 license.

More details can be found on the model card for [Pythia-Chat-Base-7B](https://huggingface.co/togethercomputer/Pythia-Chat-Base-7B) on Huggingface.
## Requirements
Before you begin, you need to install PyTorch and other dependencies.
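A minimal sketch of one way to do that, assuming you manage dependencies with conda and that the repo provides an `environment.yml` (the file and environment name below are assumptions; check the repo's setup instructions for the authoritative steps):

```shell
# Create an environment with the project's dependencies and activate it.
# environment.yml and the environment name are assumptions; adjust to the
# repo's actual setup instructions.
conda env create -f environment.yml
conda activate OpenChatKit
```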
## Chatting with Pythia-Chat-Base-7B
To help you try the model, [`inference/bot.py`](inference/bot.py) is a simple command-line test harness that provides a shell interface for chatting with the model. Simply enter text at the prompt and the model replies. The test harness also maintains conversation history to provide the model with context.

Start the bot by calling `bot.py` from the root of the repo.

Loading the model can take some time, but once it's loaded, you are greeted with a prompt. Say hello.
```shell
$ python inference/bot.py
Loading /home/csris/src/github.com/togethercomputer/OpenChatKit/inference/../huggingface_models/Pythia-Chat-Base-7B to cuda:1...
Welcome to OpenChatKit shell. Type /help or /? to list commands.
>>> Hello.
Hello human.
>>>
```
Enter additional queries at the prompt, and the model replies. Under the covers, the shell forms a prompt containing all previous queries and passes it to the model to generate more text.

The shell also supports additional commands to inspect hyperparameters, the full prompt, and more. Commands are prefixed with a `/`.
> **Note**
> The `/quit` command exits the shell.

Please see [the inference README](inference/README.md) for more details about arguments, running on multiple/specific GPUs, and running on consumer hardware.
# Reproducing Pythia-Chat-Base-7B
This tutorial walks through reproducing the Pythia-Chat-Base-7B model by fine-tuning Eleuther AI's Pythia-6.9B-deduped model using the OIG dataset.
## Downloading training data and the base model
The chat model was trained on the [OIG](https://huggingface.co/datasets/laion/OIG) dataset built by [LAION](https://laion.ai/), [Together](https://www.together.xyz/), and [Ontocord.ai](https://www.ontocord.ai/). To download the dataset from Huggingface, run the command below from the root of the repo.
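```shell
python data/OIG/prepare.py
```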
> **Note**
> You can help make this chat model better by contributing data! See the [OpenDataHub](https://github.com/togethercomputer/OpenDataHub) repo for more details.

Once the command completes, the data will be in the `data/OIG/files` directory.

Pythia-Chat-Base-7B is a fine-tuned variant of Pythia-6.9B-deduped from Eleuther AI. To download the model and prepare it for fine-tuning, run this command from the root of the repo.
```shell
python pretrained/Pythia-6.9B-deduped/prepare.py
```
The weights for this model will be in the `pretrained/Pythia-6.9B-deduped/EleutherAI_pythia-6.9b-deduped` directory.
## (Optional) 8bit Adam
To use 8bit-adam during training, install the `bitsandbytes` package.

```shell
pip install bitsandbytes # optional, to use 8bit-adam
```
## Training the model
The `training/finetune_Pythia-Chat-Base-7B.sh` script configures and runs the training loop. After downloading the dataset and the base model, run:
```shell
bash training/finetune_Pythia-Chat-Base-7B.sh
```
The script launches 8 processes with a pipeline-parallel degree of 4 and a data-parallel degree of 2.

As the training loop runs, checkpoints are saved to the `model_ckpts` directory at the root of the repo.

Please see [the training README](training/README.md) for more details about customizing the training run.
## Converting weights to Huggingface format
Before you can use this model to perform inference, it must be converted to the Huggingface format. Run this command from the root of the repo to do so.
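The exact invocation is not reproduced here; as a rough, hypothetical sketch (the script name and output path below are assumptions; check the repo's `tools/` directory and the training README for the real conversion command), it looks something like this:

```shell
# Hypothetical sketch of the conversion step -- verify the script name and
# its arguments against the repo before running.
mkdir -p huggingface_models/Pythia-Chat-Base-7B
python tools/convert_to_hf_gptneox.py \
    --ckpt-path model_ckpts/Pythia-Chat-Base-7B/checkpoint_100 \
    --save-path huggingface_models/Pythia-Chat-Base-7B \
    --fp16
```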
The `--fp16` flag loads and stores models in fp16.
Make sure to replace `model_ckpts/Pythia-Chat-Base-7B/checkpoint_100` with the latest checkpoint in the `model_ckpts/Pythia-Chat-Base-7B` directory.
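For example, one quick way to see the most recent checkpoint (a convenience sketch, assuming checkpoint directories are written directly under `model_ckpts/Pythia-Chat-Base-7B`):

```shell
# List checkpoint directories newest-first and print the most recent one.
ls -t model_ckpts/Pythia-Chat-Base-7B | head -n 1
```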
## Testing the new model
You can use the OpenChatKit Shell test harness to chat with the new model. From the root of the repo, run
```shell
python inference/bot.py
```
By default the script will load the model named Pythia-Chat-Base-7B under the `huggingface_models` directory, but you can override that behavior by specifying `--model`.
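For example, to chat with the published weights straight from Huggingface (a sketch using the model ID mentioned above; the weights are downloaded on first use):

```shell
# Download (on first run) and load the published model from Huggingface.
python inference/bot.py --model togethercomputer/Pythia-Chat-Base-7B
```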
Once the model loads, you land in the same OpenChatKit shell:

```shell
Welcome to OpenChatKit shell. Type /help or /? to list commands.
>>> Hello.
Hello human.
>>>
```
The shell also supports additional commands to inspect hyperparameters, the full prompt, and more. Commands are prefixed with a `/`.
> **Note**
> The `/quit` command exits the shell.

Please see [the inference README](inference/README.md) for more details about arguments, running on multiple/specific GPUs, and running on consumer hardware.
# Experimental: Retrieval-Augmented Models
> **Warning**
> Retrieval support is experimental.

The code in `/retrieval` implements a Python package for querying a Faiss index of Wikipedia. The following steps explain how to use this index to augment queries in the test harness with context from the retriever.
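The individual commands are not reproduced here; as a rough, hypothetical sketch (the index-preparation script path and the `--retrieval` flag are assumptions; see the `/retrieval` package and the inference README for the actual entry points), the flow looks roughly like:

```shell
# Hypothetical sketch: fetch/build the Wikipedia Faiss index, then start the
# shell with retrieval enabled. Verify paths and flags against the repo.
python data/wikipedia-3sentence-level-retrieval-index/prepare.py
python inference/bot.py --retrieval
```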
With retrieval enabled, the shell loads the index before starting:

```shell
Loading retrieval index...
Welcome to OpenChatKit shell. Type /help or /? to list commands.
>>> Where is Zurich?
Where is Zurich?
Zurich is located in Switzerland.
```
# Acknowledgements
Our models are fine-tuned versions of large language models trained by [Eleuther AI](https://www.eleuther.ai). We evaluated our models on [HELM](https://crfm.stanford.edu/helm/latest/), provided by the [Center for Research on Foundation Models](https://crfm.stanford.edu), and collaborated with both [CRFM](https://crfm.stanford.edu) and [HazyResearch](http://hazyresearch.stanford.edu) at Stanford to build these models.

We collaborated with [LAION](https://laion.ai/) and [Ontocord.ai](https://www.ontocord.ai/) to build the training data used to fine-tune these models.