Commit 7baec71

Changed Dockerfile Environment names, updated Readme and Entrypoint Documentation
1 parent ce9add8 commit 7baec71

6 files changed

Lines changed: 30 additions & 29 deletions

Dockerfile

Lines changed: 5 additions & 5 deletions
@@ -22,9 +22,9 @@ WORKDIR /experiment/code
 
 # Be careful to not add comments after the env variables - they will be added to the string
 
-ENV do_train true
-ENV do_val true
-ENV do_test true
+ENV DO_TRAIN true
+ENV DO_VALID true
+ENV DO_TEST true
 
 ENV lang java
 ENV lr 5e-5
@@ -35,12 +35,12 @@ ENV target_length 128
 ENV data_dir /dataset
 ENV output_dir /experiment/output
 ENV train_file $data_dir/train_minimal.jsonl
-ENV dev_file $data_dir/valid_minimal.jsonl
+ENV valid_file $data_dir/valid_minimal.jsonl
 ENV test_file $data_dir/test_minimal.jsonl
 ENV epochs 10
 ENV pretrained_model microsoft/codebert-base
 
 ENV load_existing_model false
-#ENV load_model_path /models/pytorch_model.bin
+ENV load_model_path /models/pytorch_model.bin
 
 ENTRYPOINT ["bash","./entrypoint.sh"]
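
The Dockerfile comment warns that an inline comment after an `ENV` value becomes part of that value. A minimal bash sketch of why that matters here, assuming only the exact string comparison the entrypoint uses (`[ "$DO_TRAIN" = true ]`); `flag_state` is a hypothetical helper, not part of the repository:

```bash
#!/usr/bin/env bash
# Hypothetical helper: mirrors the entrypoint's exact string comparison on a flag value.
flag_state() {
  if [ "$1" = true ]; then
    echo "recognised"
  else
    echo "not recognised"
  fi
}

# Value produced by: ENV DO_TRAIN true
flag_state "true"                     # prints: recognised
# Value produced by the (hypothetical) line: ENV DO_TRAIN true # enable training
flag_state "true # enable training"   # prints: not recognised
```

The build does not fail in that case; the flag is simply treated as not set.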

README.md

Lines changed: 5 additions & 2 deletions
@@ -2,8 +2,7 @@
 
 This repository holds a docker image which reproduces [Microsoft's CodeBERT Code-To-Text Experiment](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Text/code-to-text).
 
-The subparts have been minimally changed (see [changes](./changes.md)), but mostly it is just wrapping the experiment in a cpu-based docker image.
-There is currently no GPU-Image.
+The subparts have been minimally changed (see [changes](./changes.md)), but mostly it is just wrapping the experiment in a Docker image.
 
 The initial readme can be [found here](./initial_readme.md).
 
@@ -14,6 +13,9 @@ The shell file runs the instructions from the initial readme and adds some more
 It worked flawlessly for me on a Mac, so I did not want to make an extra Docker image for data preprocessing.
 Depending on your distribution, you might need to install things like wget.
 
+**Note:** The previous step is necessary! The `dataset.zip` only contains references to the dataset and is *unfolded* in `prepare.sh`.
+
+
 After that, change the docker-compose file to point to your files (including filenames) and set the environment variables as needed.
 
 You can build the Docker image beforehand using
@@ -92,4 +94,5 @@ CodeBert_CodeToText_Experiment_0_1 | ./entrypoint.sh: line 14: $'\r': command n
 CodeBert_CodeToText_Experiment_0_1 | ./entrypoint.sh: line 200: syntax error: unexpected end of file
 ```
 This is due to Windows converting the line breaks to CRLF. Thanks, Windows.
+**Easy Solution**: run `dos2unix entrypoint.sh` and rebuild the container.
 It might be easier/faster to pull the image from this repository; otherwise you have to [edit the entrypoint to be compatible with Windows](https://askubuntu.com/questions/966488/how-do-i-fix-r-command-not-found-errors-running-bash-scripts-in-wsl).
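
If `dos2unix` is not installed, the carriage returns behind the `$'\r': command not found` error can also be stripped with `sed`. A minimal sketch, assuming GNU sed (BSD/macOS sed needs `sed -i ''` and may not understand `\r` in the pattern); `fix_crlf` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a script from CRLF to LF line endings in place.
# Assumes GNU sed, which understands \r in the pattern and -i without a suffix.
fix_crlf() {
  local file="$1"
  if grep -q $'\r' "$file"; then
    # Drop the carriage return at the end of every line
    sed -i 's/\r$//' "$file"
    echo "converted $file to LF line endings"
  else
    echo "$file has no CRLF line endings"
  fi
}
```

Run `fix_crlf entrypoint.sh` from the repository root, then rebuild as with the `dos2unix` route.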

docker-compose-minimal.yml

Lines changed: 1 addition & 1 deletion
@@ -9,5 +9,5 @@ services:
     environment:
       epochs: 10
       train_file: /dataset/train_minimal.jsonl
-      dev_file: /dataset/valid_minimal.jsonl
+      valid_file: /dataset/valid_minimal.jsonl
       test_file: /dataset/test_minimal.jsonl

docker-compose-pretrained-minimal.yml

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ services:
       do_train: "false"
       do_val: "true"
       do_test: "true"
-      dev_file: /dataset/valid_minimal.jsonl
+      valid_file: /dataset/valid_minimal.jsonl
       test_file: /dataset/test_minimal.jsonl
       no_cuda: "true"
       pretrained_model: microsoft/codebert-base
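
As far as this diff shows, `docker-compose-pretrained-minimal.yml` still sets `do_train`, `do_val` and `do_test` in lowercase, while the entrypoint now reads `DO_TRAIN`, `DO_VALID` and `DO_TEST`. Environment variable names are case-sensitive, so the lowercase keys are ignored and the Dockerfile defaults (`true`) apply. A minimal bash sketch that simulates the container environment with `env`; `would_train` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the entrypoint's DO_TRAIN check in a child shell
# whose environment gets the given NAME=value pairs added.
would_train() {
  env "$@" bash -c 'if [ "$DO_TRAIN" = true ]; then echo yes; else echo no; fi'
}

# Dockerfile default plus the stale lowercase key from the compose file
would_train DO_TRAIN=true do_train=false   # prints: yes
# The renamed key, as compose would pass it to override the default
would_train DO_TRAIN=false                 # prints: no
```

Renaming the three keys in that compose file to `DO_TRAIN`, `DO_VALID` and `DO_TEST` would let its `"false"` override take effect.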

docker-compose.yml

Lines changed: 1 addition & 1 deletion
@@ -13,6 +13,6 @@ services:
     environment:
       epochs: 5
       train_file: /dataset/train.jsonl
-      dev_file: /dataset/valid.jsonl
+      valid_file: /dataset/valid.jsonl
       test_file: /dataset/test.jsonl
       batch_size: 8

entrypoint.sh

Lines changed: 17 additions & 19 deletions
@@ -4,10 +4,6 @@
 
 # This file invokes the original Python code of the CodeBERT code-to-text experiment with the environment variables set in the Docker container.
 # Additionally, it switches on which flags for training, validation and testing have been set.
-# And it uses an anaconda environment to provide the dependencies.
-
-# Without Anacondas --no-capture-output flag the system prints from the run.py would be hidden until the anaconda process exits. This flag is optional but highly helpful.
-# Anacondas "-n" parameter specifies which conda-env is used to run the script. It must match the name provided in 'environment.yml'.
 
 # The use of exit without a number returns the exit code of the foregoing statement - that is in this case the anaconda command.
 # The exits are necessary, as otherwise all cases would run (at least, all cases whose flags are set).
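
The two comments on `exit` rely on a property of bash: `exit` without an argument ends the script with the status of the last command that ran. A minimal sketch, where `true` and `false` stand in for a successful or failing `python ./run.py ...` call and `run_case` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one command, then a bare `exit`, inside a subshell,
# so the subshell's status is whatever that command returned.
run_case() {
  ( "$1"; exit )
}

run_case true;  echo "status after success: $?"   # prints: status after success: 0
run_case false; echo "status after failure: $?"   # prints: status after failure: 1
```
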
@@ -21,12 +17,12 @@
 if [ "$load_existing_model" = true ]; then
 echo "Found flag to load a model under $load_model_path"
 
-if [ "$do_train" = true -a "$do_test" = true -a "$do_val" = true ]; then
+if [ "$DO_TRAIN" = true -a "$DO_TEST" = true -a "$DO_VALID" = true ]; then
 echo "performing full run with training, validation and test"
 python ./run.py \
     --do_train --do_test --do_eval \
     --model_type roberta --model_name_or_path $pretrained_model \
-    --train_filename $train_file --test_filename $test_file --dev_filename $dev_file \
+    --train_filename $train_file --test_filename $test_file --dev_filename $valid_file \
     --output_dir $output_dir \
     --max_source_length $source_length \
     --max_target_length $target_length \
@@ -37,12 +33,12 @@ if [ "$load_existing_model" = true ]; then
     --load_model_path $load_model_path
 exit
 fi
-if [ "$do_train" = true -a "$do_val" = true ]; then
+if [ "$DO_TRAIN" = true -a "$DO_VALID" = true ]; then
 echo "performing run with training and validation"
 python ./run.py \
     --do_train --do_eval \
     --model_type roberta --model_name_or_path $pretrained_model \
-    --train_filename $train_file --dev_filename $dev_file \
+    --train_filename $train_file --dev_filename $valid_file \
     --output_dir $output_dir \
     --max_source_length $source_length \
     --max_target_length $target_length \
@@ -53,7 +49,7 @@ if [ "$load_existing_model" = true ]; then
     --num_train_epochs $epochs
 exit
 fi
-if [ "$do_train" = true -a "$do_test" = true ]; then
+if [ "$DO_TRAIN" = true -a "$DO_TEST" = true ]; then
 echo "performing run with training and test"
 python ./run.py \
     --do_train --do_test \
@@ -69,7 +65,7 @@ if [ "$load_existing_model" = true ]; then
     --load_model_path $load_model_path
 exit
 fi
-if [ "$do_train" = true ]; then
+if [ "$DO_TRAIN" = true ]; then
 echo "performing run with (only) training"
 python ./run.py \
     --do_train \
@@ -86,7 +82,7 @@ if [ "$load_existing_model" = true ]; then
     --load_model_path $load_model_path
 exit 0
 fi
-if [ "$do_test" = true ]; then
+if [ "$DO_TEST" = true ]; then
 echo "performing run with (only) testing"
 python ./run.py \
     --do_test \
@@ -106,12 +102,12 @@ fi
 # Case 2: No Pretrained Model
 # ============================================
 
-if [ "$do_train" = true -a "$do_test" = true -a "$do_val" = true ]; then
+if [ "$DO_TRAIN" = true -a "$DO_TEST" = true -a "$DO_VALID" = true ]; then
 echo "performing full run with training, validation and test"
 python ./run.py \
     --do_train --do_test --do_eval \
     --model_type roberta --model_name_or_path $pretrained_model \
-    --train_filename $train_file --test_filename $test_file --dev_filename $dev_file \
+    --train_filename $train_file --test_filename $test_file --dev_filename $valid_file \
     --output_dir $output_dir \
     --max_source_length $source_length \
     --max_target_length $target_length \
@@ -121,12 +117,12 @@ if [ "$do_train" = true -a "$do_test" = true -a "$do_val" = true ]; then
     --num_train_epochs $epochs
 exit
 fi
-if [ "$do_train" = true -a "$do_val" = true ]; then
+if [ "$DO_TRAIN" = true -a "$DO_VALID" = true ]; then
 echo "performing run with training and validation"
 python ./run.py \
     --do_train --do_eval \
     --model_type roberta --model_name_or_path $pretrained_model \
-    --train_filename $train_file --dev_filename $dev_file \
+    --train_filename $train_file --dev_filename $valid_file \
     --output_dir $output_dir \
     --max_source_length $source_length \
     --max_target_length $target_length \
@@ -137,7 +133,7 @@ if [ "$do_train" = true -a "$do_val" = true ]; then
 exit
 fi
 
-if [ "$do_train" = true -a "$do_test" = true ]; then
+if [ "$DO_TRAIN" = true -a "$DO_TEST" = true ]; then
 echo "performing run with training and test"
 python ./run.py \
     --do_train --do_test \
@@ -152,7 +148,7 @@ if [ "$do_train" = true -a "$do_test" = true ]; then
     --num_train_epochs $epochs
 exit
 fi
-if [ "$do_train" = true ]; then
+if [ "$DO_TRAIN" = true ]; then
 echo "performing run with (only) training"
 python ./run.py \
     --do_train \
@@ -168,7 +164,7 @@ if [ "$do_train" = true ]; then
     --num_train_epochs $epochs
 exit 0
 fi
-if [ "$do_test" = true ]; then
+if [ "$DO_TEST" = true ]; then
 echo "performing run with (only) testing"
 python ./run.py \
     --do_test \
@@ -182,7 +178,9 @@ if [ "$do_test" = true ]; then
 exit
 fi
 
-# Case 3: Error / Unknown
+# ===================================
+# Case 3: Error / Unknown
+# ===================================
 
 echo "no flags set - please inspect your compose"
 exit 1
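
The `if` chain in `entrypoint.sh` behaves like a priority-ordered switch over `DO_TRAIN`, `DO_VALID` and `DO_TEST`: the first matching case runs and exits. A minimal bash sketch of that ordering, condensed into one hypothetical function `select_run_flags` that only prints the `run.py` mode flags each case passes (it is not part of the repository and omits the model and file arguments):

```bash
#!/usr/bin/env bash
# Hypothetical helper: given the values of DO_TRAIN, DO_VALID and DO_TEST,
# print the run.py mode flags the entrypoint's if chain would choose.
select_run_flags() {
  local train="$1" valid="$2" testing="$3"
  if [ "$train" = true ] && [ "$valid" = true ] && [ "$testing" = true ]; then
    echo "--do_train --do_test --do_eval"
  elif [ "$train" = true ] && [ "$valid" = true ]; then
    echo "--do_train --do_eval"
  elif [ "$train" = true ] && [ "$testing" = true ]; then
    echo "--do_train --do_test"
  elif [ "$train" = true ]; then
    echo "--do_train"
  elif [ "$testing" = true ]; then
    echo "--do_test"
  else
    # Mirrors Case 3: no usable combination of flags
    echo "no flags set - please inspect your compose" >&2
    return 1
  fi
}

select_run_flags true true false   # prints: --do_train --do_eval
```

Two consequences of this ordering: `DO_VALID` alone matches no case and ends in the error branch, and `DO_VALID` together with `DO_TEST` but without `DO_TRAIN` falls through to the test-only case. The sketch chains `[ ... ] && [ ... ]` instead of `-a`, since the POSIX `test` specification marks `-a` as obsolescent.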
