Commit 5834060

Merge pull request #61 from LorrinWWW/conversion

`convert_to_hf_gptneox` adds support for other NeoX models

2 parents: 148b574 + 3b171aa

4 files changed: 143 additions & 10 deletions

README.md

Lines changed: 29 additions & 3 deletions
@@ -89,6 +89,15 @@ python pretrained/GPT-NeoX-20B/prepare.py

The weights for this model will be in the `pretrained/GPT-NeoX-20B/EleutherAI_gpt-neox-20b` directory.

In case you want to fine-tune other gpt-neox models, e.g. [the Pythia model suite](https://huggingface.co/models?sort=downloads&search=pythia), you can specify the HF model name, for example:

```shell
python pretrained/GPT-NeoX-20B/prepare.py --model-name EleutherAI/pythia-6.9b-deduped
```

The weights for this model will be in the `pretrained/GPT-NeoX-20B/EleutherAI_pythia-6.9b-deduped` directory.

# Training and Finetuning

## (Optional) 8bit Adam
@@ -113,21 +122,38 @@ As the training loop runs, checkpoints are saved to the `model_ckpts` directory

Please see [the training README](training/README.md) for more details about customizing the training run.

The `training/finetune_Pythia-Chat-Base-7B.sh` script is another example that fine-tunes a 7B Pythia (gpt-neox) model. The script launches 8 processes with a pipeline-parallel degree of 4 and a data-parallel degree of 2.

# Converting Weights to Huggingface Format

Before you can use this model to perform inference, it must be converted to the Huggingface format. Run this command from the root of the repo to do so.

```shell
mkdir huggingface_models \
  && python tools/convert_to_hf_gptneox.py \
       --ckpt-path model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_100 \
       --save-path huggingface_models/GPT-NeoXT-Chat-Base-20B \
       --n-stages 8 \
       --n-layer-per-stage 6 \
       --fp16
```

where the `--fp16` flag loads and stores the model weights in fp16.

Make sure to replace `model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_100` with the latest checkpoint in the `model_ckpts/GPT-Neo-XT-Chat-Base-20B` directory.

If you need to convert checkpoints of other gpt-neox variants, make sure to specify the correct config name for your variant. For example, if you want to convert a checkpoint fine-tuned from `EleutherAI/pythia-6.9b-deduped`, you should indicate this as the config name:

```shell
python tools/convert_to_hf_gptneox.py \
  --config-name EleutherAI/pythia-6.9b-deduped \
  --ckpt-path model_ckpts/Pythia-Chat-Base-7B/checkpoint_100 \
  --save-path huggingface_models/Pythia-Chat-Base-7B \
  --n-stages 4 \
  --n-layer-per-stage 8 \
  --fp16
```

# Inference

To help you test the model, we provide a simple command-line test harness to interact with the bot.
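The relationship between `--n-stages`, `--n-layer-per-stage`, and a given model can be sketched as a quick sanity check (illustration only, not code from this commit; `covers_all_layers` is a hypothetical helper, and the layer counts come from the public HF configs):

```python
def covers_all_layers(n_stages: int, n_layer_per_stage: int, num_hidden_layers: int) -> bool:
    """True if the pipeline checkpoint shards span every transformer layer."""
    return n_stages * n_layer_per_stage >= num_hidden_layers

# GPT-NeoX-20B: 44 layers; 8 stages x 6 layers/stage = 48 slots
assert covers_all_layers(8, 6, 44)
# Pythia-6.9B: 32 layers; 4 stages x 8 layers/stage = 32 slots
assert covers_all_layers(4, 8, 32)
```

This is why the two conversion commands above use `8 x 6` and `4 x 8` respectively.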

pretrained/GPT-NeoX-20B/prepare.py

Lines changed: 8 additions & 1 deletion
```diff
@@ -22,28 +22,35 @@
 if not os.path.exists(save_path):
     os.mkdir(save_path)
 
+print('loading model from HF...')
 config = AutoConfig.from_pretrained(args.model_name)
 config.save_pretrained(save_path)
 tokenizer = AutoTokenizer.from_pretrained(args.model_name)
 tokenizer.save_pretrained(save_path)
-
 # offload model from memory to disk if offload-dir is specified
 if args.offload_dir is not None:
     if not os.path.exists(args.offload_dir):
         os.mkdir(args.offload_dir)
     model = AutoModelForCausalLM.from_pretrained(args.model_name, torch_dtype=torch.float16, device_map="auto", offload_folder=args.offload_dir)
 else:
     model = AutoModelForCausalLM.from_pretrained(args.model_name, torch_dtype=torch.float16)
+print('loaded model from HF...')
 
+print('converting the embedding layer...')
 item = {}
 item['embed_in.weight'] = model.gpt_neox.embed_in.weight
 torch.save(item, os.path.join(save_path, 'pytorch_embs.pt'))
+print('converted the embedding layer.')
 
 for i in range(len(model.gpt_neox.layers)):
+    print(f'converting the {i}-th transformer layer...')
     torch.save(model.gpt_neox.layers[i].state_dict(), os.path.join(save_path, f'pytorch_{i}.pt'))
+    print(f'converted the {i}-th transformer layer.')
 
+print('converting the lm_head layer...')
 item = {}
 item['embed_out.weight'] = model.embed_out.weight
 item['final_layer_norm.weight'] = model.gpt_neox.final_layer_norm.weight
 item['final_layer_norm.bias'] = model.gpt_neox.final_layer_norm.bias
 torch.save(item, os.path.join(save_path, 'pytorch_lm_head.pt'))
+print('converted the lm_head layer.')
```
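The on-disk layout that `prepare.py` produces can be summarized with a small helper (a sketch for illustration only; `shard_files` is not a function in the repo):

```python
def shard_files(n_layers: int) -> list[str]:
    """Names of the checkpoint shards prepare.py writes, in write order."""
    return (['pytorch_embs.pt']
            + [f'pytorch_{i}.pt' for i in range(n_layers)]
            + ['pytorch_lm_head.pt'])

assert shard_files(2) == ['pytorch_embs.pt', 'pytorch_0.pt', 'pytorch_1.pt', 'pytorch_lm_head.pt']
```

`convert_to_hf_gptneox.py` later reads these per-layer shards back when reassembling an HF checkpoint.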

tools/convert_to_hf_gptneox.py

Lines changed: 23 additions & 6 deletions
```diff
@@ -56,13 +56,14 @@ def load_decentralized_checkpoint(model, checkpoint_path, n_stages=2, n_layer_pe
 
         elif i == n_stages - 1:
             for j in range(n_layer_per_stage):
-                if i*n_layer_per_stage + j == 44:
-                    break
                 _tmp = {k[len(f"{j}."):]:v for k,v in checkpoint.items() if k.startswith(f"{j}.")}
                 if len(_tmp) == 0:
                     break
                 # torch.save(_tmp, os.path.join(output_path, f'pytorch_{i*n_layer_per_stage + j}.pt'))
                 model.gpt_neox.layers[i*n_layer_per_stage + j].load_state_dict(_tmp)
+                if i*n_layer_per_stage + j == len(model.gpt_neox.layers) - 1:
+                    j += 1
+                    break
 
             _tmp = {k[len(f"{j}."):]:v for k,v in checkpoint.items() if k.startswith(f"{j}.")}
             if len(_tmp) == 0:
```
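The indexing this fix relies on can be sketched as follows (illustrative only; `global_layer_index` is a hypothetical helper): the layer loaded on stage `i`, local slot `j`, has global index `i*n_layer_per_stage + j`, and the old code stopped the last stage at a hard-coded 44 (GPT-NeoX-20B's layer count), which broke variants with other depths.

```python
def global_layer_index(stage: int, local: int, n_layer_per_stage: int) -> int:
    """Global index of local layer `local` on pipeline stage `stage`."""
    return stage * n_layer_per_stage + local

# Pythia-6.9B converted with --n-stages 4 --n-layer-per-stage 8:
# the last layer (index 31) lives on the last stage, local slot 7.
assert global_layer_index(3, 7, 8) == 31
# GPT-NeoX-20B with 8 stages x 6 slots = 48, but only 44 layers exist,
# so the last stage must stop after global index 43; the fix derives
# that bound from len(model.gpt_neox.layers) instead of a literal 44.
assert global_layer_index(7, 1, 6) == 43
```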
```diff
@@ -88,14 +89,17 @@ def load_decentralized_checkpoint(model, checkpoint_path, n_stages=2, n_layer_pe
 
 if __name__ == '__main__':
 
     parser = argparse.ArgumentParser(description='Convert HF checkpoints')
+    parser.add_argument('--config-name', type=str, default='EleutherAI/gpt-neox-20b',
+                        help='config-name')
     parser.add_argument('--ckpt-path', type=str, default=None,
-                        help='model-name')
+                        help='ckpt-path')
     parser.add_argument('--save-path', type=str, default=None,
-                        help='model-name')
+                        help='save-path')
     parser.add_argument('--n-stages', type=int, default=8,
                         help='pipeline group size')
     parser.add_argument('--n-layer-per-stage', type=int, default=6,
                         help='n layers per GPU device')
+    parser.add_argument('--fp16', default=False, action='store_true')
     args = parser.parse_args()
 
     assert args.ckpt_path is not None
```
```diff
@@ -104,13 +108,26 @@ def load_decentralized_checkpoint(model, checkpoint_path, n_stages=2, n_layer_pe
     if not os.path.exists(args.save_path):
         os.mkdir(args.save_path)
 
-    config = AutoConfig.from_pretrained('EleutherAI/gpt-neox-20b')
-    tokenizer = AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')
+    print('loading config...')
+    config = AutoConfig.from_pretrained(args.config_name)
+    print('loaded config.')
+    print('loading tokenizer...')
+    tokenizer = AutoTokenizer.from_pretrained(args.config_name)
+    print('loaded tokenizer.')
+    print('creating empty model...')
     model = create_empty_gptneox(config)
+    if args.fp16:
+        model = model.half()
+    print('created empty model.')
+    print('loading model ckpt...')
     load_decentralized_checkpoint(
         model, args.ckpt_path, n_stages=args.n_stages, n_layer_per_stage=args.n_layer_per_stage,
     )
+    print('loaded model ckpt.')
 
+    print('saving HF model...')
     model.save_pretrained(args.save_path)
+    print(f'saved HF model to `{args.save_path}`')
     config.save_pretrained(args.save_path)
     tokenizer.save_pretrained(args.save_path)
```
training/finetune_Pythia-Chat-Base-7B.sh

Lines changed: 83 additions & 0 deletions

@@ -0,0 +1,83 @@

```shell
DIR=$(cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd)

netif=lo
export GLOO_SOCKET_IFNAME=${netif}
export NCCL_SOCKET_IFNAME=${netif}
export MODEL_NAME=Pythia-Chat-Base-7B

export SHOW_DATA=0

BASE_MODEL="${DIR}/../pretrained/GPT-NeoX-20B/EleutherAI_pythia-6.9b-deduped/"

CHECKPOINT_STEPS=100

DATASETS="\
${DIR}/../data/OIG/files/unified_ni.jsonl:0.2,\
${DIR}/../data/OIG/files/unified_p3.jsonl:0.5,\
${DIR}/../data/OIG/files/unified_flan.jsonl:0.2,\
${DIR}/../data/OIG/files/unified_chip2.jsonl:0.01,\
${DIR}/../data/OIG/files/unified_rallio_safety_and_prosocial.jsonl:0.1,\
${DIR}/../data/OIG/files/unified_soda_dialog.jsonl:0.1,\
${DIR}/../data/OIG/files/unified_unifiedskg_instructions.jsonl:0.1,\
${DIR}/../data/OIG/files/unified_merged_code_xp3.jsonl:0.1,\
${DIR}/../data/OIG/files/unified_oscar_en_sample_dialog.jsonl:0.1,\
${DIR}/../data/OIG/files/unified_ul2_plus_oscar_en_sample_dialog.jsonl:0.1,\
${DIR}/../data/OIG/files/unified_multi_news.jsonl:0.05,\
${DIR}/../data/OIG/files/unified_openai_summarize_tldr.jsonl:0.05,\
${DIR}/../data/OIG/files/unified_squad_v2.jsonl:0.01,\
${DIR}/../data/OIG/files/unified_nq.jsonl:0.01,\
${DIR}/../data/OIG/files/unified_poetry_instructions.jsonl:0.01,\
${DIR}/../data/OIG/files/unified_sqlv2.jsonl:0.01,\
${DIR}/../data/OIG/files/unified_unnatural_instructions.jsonl:0.01,\
${DIR}/../data/OIG/files/unified_conv_finqa.jsonl:0.01,\
${DIR}/../data/OIG/files/unified_essays.jsonl:0.01,\
${DIR}/../data/OIG/files/unified_plot_screenplay_books_dialog.jsonl:0.01,\
${DIR}/../data/OIG/files/unified_grade_school_math_instructions.jsonl:0.01,\
${DIR}/../data/OIG/files/unified_mathqa_flanv2_kojma_cot.jsonl:0.01,\
${DIR}/../data/OIG/files/unified_joke_explanations.jsonl:0.01,\
${DIR}/../data/OIG/files/unified_cuad.jsonl:0.01,\
${DIR}/../data/OIG/files/unified_abstract_infill.jsonl:0.1,\
${DIR}/../data/OIG/files/unified_image_prompts_instructions.jsonl:0.01 \
"

ARGS="--model-name ${BASE_MODEL} \
--tokenizer-name ${BASE_MODEL} \
--project-name together \
--model-type gptneox \
--optimizer adam \
--seed 42 \
--load-pretrained-model true \
--task-name \
"${DATASETS}" \
--checkpoint-path ${DIR}/../model_ckpts/${MODEL_NAME} \
--total-steps 20000 --warmup-steps 10 --train-warmup-steps 0 \
--checkpoint-steps ${CHECKPOINT_STEPS} \
--lr 1e-5 --seq-length 2048 --batch-size 32 --micro-batch-size 1 --gradient-accumulate-step 1 \
--dist-url tcp://127.0.0.1:7033 \
--num-layers 8 --embedding-dim 4096 \
--world-size 8 --pipeline-group-size 4 --data-group-size 2 \
--job-id 0 --net-interface ${netif} \
--fp16 \
--dp-backend nccl \
--dp-mode allreduce \
--pp-mode gpipe --profiling no-profiling"

(trap 'kill 0' SIGINT; \
python ${DIR}/dist_clm_train.py $(echo ${ARGS}) --cuda-id 0 --rank 0 \
    & \
python ${DIR}/dist_clm_train.py $(echo ${ARGS}) --cuda-id 1 --rank 1 \
    & \
python ${DIR}/dist_clm_train.py $(echo ${ARGS}) --cuda-id 2 --rank 2 \
    & \
python ${DIR}/dist_clm_train.py $(echo ${ARGS}) --cuda-id 3 --rank 3 \
    & \
python ${DIR}/dist_clm_train.py $(echo ${ARGS}) --cuda-id 4 --rank 4 \
    & \
python ${DIR}/dist_clm_train.py $(echo ${ARGS}) --cuda-id 5 --rank 5 \
    & \
python ${DIR}/dist_clm_train.py $(echo ${ARGS}) --cuda-id 6 --rank 6 \
    & \
python ${DIR}/dist_clm_train.py $(echo ${ARGS}) --cuda-id 7 --rank 7 \
    & \
wait)
```
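Two conventions in this script can be sketched in Python (both are assumptions inferred from the script text, not code in this commit, and both helper names are hypothetical): `DATASETS` is a comma-separated list of `path:weight` pairs, and the launch topology requires world-size to equal pipeline-group-size times data-group-size (8 = 4 x 2 here).

```python
def parse_datasets(spec: str) -> list[tuple[str, float]]:
    """Split a DATASETS string of comma-separated `path:weight` pairs."""
    pairs = []
    for item in spec.strip().split(','):
        path, weight = item.rsplit(':', 1)  # rsplit: paths may contain no ':'
        pairs.append((path.strip(), float(weight)))
    return pairs

def check_topology(world_size: int, pipeline_group_size: int, data_group_size: int) -> bool:
    """Each rank belongs to exactly one pipeline stage and one data-parallel group."""
    return world_size == pipeline_group_size * data_group_size

assert parse_datasets("a.jsonl:0.2,b.jsonl:0.5") == [("a.jsonl", 0.2), ("b.jsonl", 0.5)]
assert check_topology(8, 4, 2)  # the settings used above
```

If you change `--pipeline-group-size` or `--data-group-size`, adjust `--world-size` and the number of launched ranks to match.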
