Commit 89714ec

add t5 finetuned notebook
1 parent 79f0e9e commit 89714ec

1 file changed

Lines changed: 371 additions & 0 deletions

@@ -0,0 +1,371 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "code",
5+
"execution_count": null,
6+
"id": "3b6fa335-df29-44a0-b340-81c97954b8f0",
7+
"metadata": {},
8+
"outputs": [],
9+
"source": [
10+
"!pip install -U sagemaker"
11+
]
12+
},
13+
{
14+
"cell_type": "markdown",
15+
"id": "eb82128b-ba4d-43b8-93fd-a920ab7bb1a6",
16+
"metadata": {},
17+
"source": [
18+
"## Prepare the dataset"
19+
]
20+
},
21+
{
22+
"cell_type": "markdown",
23+
"id": "466ae9d1-a1d1-405b-a16d-55ab2c3d9c2b",
24+
"metadata": {},
25+
"source": [
26+
"The input text has to be in the first column and output in the second column."
27+
]
28+
},
29+
{
30+
"cell_type": "code",
31+
"execution_count": 40,
32+
"id": "0b6dfdc2-5674-42e9-b2a9-9724ed56e989",
33+
"metadata": {},
34+
"outputs": [
35+
{
36+
"data": {
37+
"text/html": [
38+
"<div>\n",
39+
"<style scoped>\n",
40+
" .dataframe tbody tr th:only-of-type {\n",
41+
" vertical-align: middle;\n",
42+
" }\n",
43+
"\n",
44+
" .dataframe tbody tr th {\n",
45+
" vertical-align: top;\n",
46+
" }\n",
47+
"\n",
48+
" .dataframe thead th {\n",
49+
" text-align: right;\n",
50+
" }\n",
51+
"</style>\n",
52+
"<table border=\"1\" class=\"dataframe\">\n",
53+
" <thead>\n",
54+
" <tr style=\"text-align: right;\">\n",
55+
" <th></th>\n",
56+
" <th>modern</th>\n",
57+
" <th>original</th>\n",
58+
" </tr>\n",
59+
" </thead>\n",
60+
" <tbody>\n",
61+
" <tr>\n",
62+
" <th>0</th>\n",
63+
" <td>Here comes my master, your brother.</td>\n",
64+
" <td>Yonder comes my master, your brother.</td>\n",
65+
" </tr>\n",
66+
" <tr>\n",
67+
" <th>1</th>\n",
68+
" <td>Go hide, Adam, and you’ll hear how he abuses me.</td>\n",
69+
" <td>Go apart, Adam, and thou shalt hear how he wil...</td>\n",
70+
" </tr>\n",
71+
" <tr>\n",
72+
" <th>2</th>\n",
73+
" <td>here?</td>\n",
74+
" <td>Now, sir, what make you here?</td>\n",
75+
" </tr>\n",
76+
" <tr>\n",
77+
" <th>3</th>\n",
78+
" <td>Nothing. I’ve never been taught how to make an...</td>\n",
79+
" <td>Nothing. I am not taught to make anything.</td>\n",
80+
" </tr>\n",
81+
" <tr>\n",
82+
" <th>4</th>\n",
83+
" <td>Well, then, what are you messing up?</td>\n",
84+
" <td>What mar you then, sir?</td>\n",
85+
" </tr>\n",
86+
" <tr>\n",
87+
" <th>...</th>\n",
88+
" <td>...</td>\n",
89+
" <td>...</td>\n",
90+
" </tr>\n",
91+
" <tr>\n",
92+
" <th>11543</th>\n",
93+
" <td>The stuff you had at the Centaur, sir.</td>\n",
94+
" <td>Your goods that lay at host, sir, in the Centaur.</td>\n",
95+
" </tr>\n",
96+
" <tr>\n",
97+
" <th>11544</th>\n",
98+
" <td>You have a fat friend at your master’s house: ...</td>\n",
99+
" <td>There is a fat friend at your master’s house T...</td>\n",
100+
" </tr>\n",
101+
" <tr>\n",
102+
" <th>11545</th>\n",
103+
" <td>After you, sir. You’re older than me.</td>\n",
104+
" <td>Not I, sir. You are my elder.</td>\n",
105+
" </tr>\n",
106+
" <tr>\n",
107+
" <th>11546</th>\n",
108+
" <td>That’s a good point. How can we tell which of ...</td>\n",
109+
" <td>That’s a question. How shall we try it?</td>\n",
110+
" </tr>\n",
111+
" <tr>\n",
112+
" <th>11547</th>\n",
113+
" <td>We’ll draw straws. Meanwhile, after you.</td>\n",
114+
" <td>We’ll draw cuts for the signior. Till then, le...</td>\n",
115+
" </tr>\n",
116+
" </tbody>\n",
117+
"</table>\n",
118+
"<p>11548 rows × 2 columns</p>\n",
119+
"</div>"
120+
],
121+
"text/plain": [
122+
" modern \\\n",
123+
"0 Here comes my master, your brother. \n",
124+
"1 Go hide, Adam, and you’ll hear how he abuses me. \n",
125+
"2 here? \n",
126+
"3 Nothing. I’ve never been taught how to make an... \n",
127+
"4 Well, then, what are you messing up? \n",
128+
"... ... \n",
129+
"11543 The stuff you had at the Centaur, sir. \n",
130+
"11544 You have a fat friend at your master’s house: ... \n",
131+
"11545 After you, sir. You’re older than me. \n",
132+
"11546 That’s a good point. How can we tell which of ... \n",
133+
"11547 We’ll draw straws. Meanwhile, after you. \n",
134+
"\n",
135+
" original \n",
136+
"0 Yonder comes my master, your brother. \n",
137+
"1 Go apart, Adam, and thou shalt hear how he wil... \n",
138+
"2 Now, sir, what make you here? \n",
139+
"3 Nothing. I am not taught to make anything. \n",
140+
"4 What mar you then, sir? \n",
141+
"... ... \n",
142+
"11543 Your goods that lay at host, sir, in the Centaur. \n",
143+
"11544 There is a fat friend at your master’s house T... \n",
144+
"11545 Not I, sir. You are my elder. \n",
145+
"11546 That’s a question. How shall we try it? \n",
146+
"11547 We’ll draw cuts for the signior. Till then, le... \n",
147+
"\n",
148+
"[11548 rows x 2 columns]"
149+
]
150+
},
151+
"execution_count": 40,
152+
"metadata": {},
153+
"output_type": "execute_result"
154+
}
155+
],
156+
"source": [
157+
"import pandas as pd\n",
158+
"\n",
159+
"data = pd.read_csv(\"Shakespear/all_shakespeare.csv\", usecols=['modern', 'original'])[['modern', 'original']]\n",
160+
"data"
161+
]
162+
},
163+
{
164+
"cell_type": "code",
165+
"execution_count": 41,
166+
"id": "7089a045-9aa5-4c7a-9b53-8b0acfd96261",
167+
"metadata": {},
168+
"outputs": [],
169+
"source": [
170+
"data_1.to_csv(\"Shakespeare_Dataset_Full.csv\", index=False)"
171+
]
172+
},
173+
{
174+
"cell_type": "markdown",
175+
"id": "475ad7a8-6143-4d3e-baa8-9e4dad30b5b1",
176+
"metadata": {},
177+
"source": [
178+
"## Create Training Job"
179+
]
180+
},
181+
{
182+
"cell_type": "code",
183+
"execution_count": 42,
184+
"id": "ddd9d90d-d570-44b8-bdb4-66c99939355a",
185+
"metadata": {},
186+
"outputs": [],
187+
"source": [
188+
"import boto3\n",
189+
"s3_client = boto3.client('s3')\n",
190+
"s3_client.upload_file(\"Shakespeare_Dataset_Full.csv\", \"blog-posts-artifacts\", \"paraphrasing/training-data/Shakespeare_Dataset_Full.csv\")"
191+
]
192+
},
193+
{
194+
"cell_type": "code",
195+
"execution_count": 46,
196+
"id": "aeedf1cd-38d3-454d-9f09-8d8006884949",
197+
"metadata": {},
198+
"outputs": [],
199+
"source": [
200+
"import sagemaker\n",
201+
"from sagemaker.huggingface import HuggingFace\n",
202+
"\n",
203+
"# IAM role for executing training job\n",
204+
"role = 'YodaMaker'\n",
205+
"hyperparameters = {\n",
206+
" 'model_name_or_path': 't5-base',\n",
207+
" 'output_dir': '/opt/ml/model',\n",
208+
" 'train_file': '/opt/ml/input/data/train/Shakespeare_Dataset_Full.csv',\n",
209+
" 'source_prefix': 'paraphrase: ',\n",
210+
" 'learning_rate': 0.0001,\n",
211+
" 'do_train': True,\n",
212+
" 'num_train_epochs': 1,\n",
213+
" 'per_device_train_batch_size': 4,\n",
214+
" 'save_strategy': 'no',\n",
215+
"}"
216+
]
217+
},
218+
{
219+
"cell_type": "code",
220+
"execution_count": 47,
221+
"id": "eb99d596-291f-4840-9674-dbb8d5d4526f",
222+
"metadata": {},
223+
"outputs": [],
224+
"source": [
225+
"# Git configuration to download our fine-tuning script\n",
226+
"git_config = {'repo': 'https://github.com/huggingface/transformers.git','branch': 'v4.17.0'}\n",
227+
"\n",
228+
"# Creates Hugging Face estimator\n",
229+
"huggingface_estimator = HuggingFace(\n",
230+
" entry_point='run_summarization.py',\n",
231+
" source_dir='./examples/pytorch/summarization',\n",
232+
" output_path='s3://blog-posts-artifacts/paraphrasing/model-artifacts/',\n",
233+
" code_location='s3://blog-posts-artifacts/paraphrasing/training-checkpoints/',\n",
234+
" instance_type='ml.g4dn.xlarge',\n",
235+
" instance_count=1,\n",
236+
" role=role,\n",
237+
" git_config=git_config,\n",
238+
" transformers_version='4.17.0',\n",
239+
" pytorch_version='1.10.2',\n",
240+
" py_version='py38',\n",
241+
" hyperparameters = hyperparameters,\n",
242+
" tags=[{'Key':'owner','Value':'ali@datachef.co'}]\n",
243+
")"
244+
]
245+
},
246+
{
247+
"cell_type": "code",
248+
"execution_count": 48,
249+
"id": "17eda5a1-57cf-4bf6-8c54-b1d922c059be",
250+
"metadata": {},
251+
"outputs": [],
252+
"source": [
253+
"# Starting the training job\n",
254+
"huggingface_estimator.fit({'train': 's3://blog-posts-artifacts/paraphrasing/training-data/Shakespeare_Dataset_Full.csv'}, wait=False)"
255+
]
256+
},
257+
{
258+
"cell_type": "markdown",
259+
"id": "da378351-e082-452e-bcf1-93cb088b4894",
260+
"metadata": {},
261+
"source": [
262+
"## Deploy the trained model"
263+
]
264+
},
265+
{
266+
"cell_type": "code",
267+
"execution_count": 49,
268+
"id": "c8ddf69f-d4f8-49db-90c5-128b42c8d664",
269+
"metadata": {},
270+
"outputs": [
271+
{
272+
"name": "stdout",
273+
"output_type": "stream",
274+
"text": [
275+
"-----!"
276+
]
277+
}
278+
],
279+
"source": [
280+
"from sagemaker.huggingface import HuggingFaceModel\n",
281+
"import sagemaker\n",
282+
"\n",
283+
"# IAM role with permissions to create endpoint\n",
284+
"role = \"YodaMaker\"\n",
285+
"\n",
286+
"# S3 URI of the trained model\n",
287+
"model_uri = \"s3://blog-posts-artifacts/paraphrasing/model-artifacts/huggingface-pytorch-training-2022-05-11-09-33-42-249/output/model.tar.gz\"\n",
288+
"\n",
289+
"# Create Hugging Face Model Class\n",
290+
"huggingface_model = HuggingFaceModel(\n",
291+
" model_data=model_uri,\n",
292+
"\ttransformers_version='4.17.0',\n",
293+
"\tpytorch_version='1.10.2',\n",
294+
"\tpy_version='py38',\n",
295+
" role=role, \n",
296+
")\n",
297+
"\n",
298+
"# Deploy model to SageMaker Inference\n",
299+
"predictor = huggingface_model.deploy(\n",
300+
" initial_instance_count=1, # number of instances\n",
301+
" instance_type='ml.m5.2xlarge', # instance type\n",
302+
" tags=[{'Key':'owner','Value':'ali@datachef.co'}]\n",
303+
")"
304+
]
305+
},
306+
{
307+
"cell_type": "code",
308+
"execution_count": 58,
309+
"id": "1296247e-e90c-4625-b470-0e5ce4ddb146",
310+
"metadata": {},
311+
"outputs": [
312+
{
313+
"data": {
314+
"text/plain": [
315+
"[{'generated_text': 'The top of your wisdom is thou ability of enactment.'},\n",
316+
" {'generated_text': 'You are then the end in measure of your knowledge, your capacity to have it communicated to'},\n",
317+
" {'generated_text': 'The ultimate point of your knowledge is your capacity to convey it to his.'},\n",
318+
" {'generated_text': \"Final proof of your knowledge is your capacity to do it t' other.\"},\n",
319+
" {'generated_text': \"Your knowledge is the test, the end, that 'falsely test of \"},\n",
320+
" {'generated_text': 'The test of your knowledge is thy ability to be carried to another.'},\n",
321+
" {'generated_text': 'A truly honest test of your knowledge is your capacity to convey it to those who do not have'},\n",
322+
" {'generated_text': 'The chief test of your knowledge is your rigueur to tell it to another.'},\n",
323+
" {'generated_text': 'You must prove in your knowledge, to communicate it.'},\n",
324+
" {'generated_text': 'The absolute test of thy knowledge is to convey to another.'}]"
325+
]
326+
},
327+
"execution_count": 58,
328+
"metadata": {},
329+
"output_type": "execute_result"
330+
}
331+
],
332+
"source": [
333+
"#shakespeare\n",
334+
"predictor.predict({\"inputs\": \"paraphrase: The ultimate test of your knowledge is your capacity to convey it to another.\",\n",
335+
" \"parameters\" : {\"do_sample\":True, \"num_return_sequences\":10}})"
336+
]
337+
},
338+
{
339+
"cell_type": "code",
340+
"execution_count": 64,
341+
"id": "7762ae49-6e76-40a6-a67c-51c8a527e49b",
342+
"metadata": {},
343+
"outputs": [],
344+
"source": [
345+
"# Delete the endpoint\n",
346+
"predictor.delete_endpoint()"
347+
]
348+
}
349+
],
350+
"metadata": {
351+
"kernelspec": {
352+
"display_name": "Python 3",
353+
"language": "python",
354+
"name": "python3"
355+
},
356+
"language_info": {
357+
"codemirror_mode": {
358+
"name": "ipython",
359+
"version": 3
360+
},
361+
"file_extension": ".py",
362+
"mimetype": "text/x-python",
363+
"name": "python",
364+
"nbconvert_exporter": "python",
365+
"pygments_lexer": "ipython3",
366+
"version": "3.7.3"
367+
}
368+
},
369+
"nbformat": 4,
370+
"nbformat_minor": 5
371+
}
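
The notebook depends on two conventions that are easy to break silently: the CSV must list the input column before the target column, and the inference input must carry the same `paraphrase: ` prefix that was passed as `source_prefix` during training. The following is a minimal sketch of both checks using only the standard library. The helper names `check_column_order` and `build_payload` are hypothetical and not part of the notebook, and the sample rows are shortened for illustration.

```python
import csv
import io

# Assumed to match the 'source_prefix' hyperparameter used for training.
SOURCE_PREFIX = "paraphrase: "


def check_column_order(csv_text, input_col="modern", target_col="original"):
    """Return True if the first two header columns are input, then target."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return header[:2] == [input_col, target_col]


def build_payload(text, num_return_sequences=10):
    """Build a request body whose input carries the training-time prefix."""
    return {
        "inputs": SOURCE_PREFIX + text,
        "parameters": {
            "do_sample": True,
            "num_return_sequences": num_return_sequences,
        },
    }


# Hypothetical two-row sample in the layout the notebook writes to disk.
sample = "modern,original\nHere comes my master.,Yonder comes my master.\n"
print(check_column_order(sample))
print(build_payload("Hello there.")["inputs"])
```

In the notebook, the dictionary returned by `build_payload` could be passed to `predictor.predict(...)` in place of the hand-written literal, which keeps the prefix in one place.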
