XSEDE 2013 BigJob Tutorial
Use SSH to log into your tutorial account (Windows users can use PuTTY). Please use the username / password combination that was given to you at the beginning of the tutorial.
ssh <username>@repex1.tacc.utexas.edu
Once logged-in, make sure that you can log-in to Stampede:
ssh sagatut@stampede.tacc.utexas.edu
exit
Next, you need to install BigJob in your user account. Since BigJob, just like saga-python, is written in Python, you can use virtualenv to create a local installation:
virtualenv $HOME/bigjobenv
. $HOME/bigjobenv/bin/activate
The BigJob package that we are using is called saga-bigjob and can be installed via pip:
pip install saga-bigjob
This example runs NUMBER_JOBS (32) concurrent '/bin/echo' tasks on TACC's Stampede cluster. A 32-core pilot job is initialized and 32 single-core tasks are submitted to it. This example also shows basic error handling via 'try/except' and, once all tasks have finished running, coordinated shutdown (removing the pilot from Stampede's queue) via 'finally' (line 74).
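The submit / wait / shutdown pattern described above can be sketched locally. Note that `DummyPilot` below is a hypothetical stand-in for BigJob's pilot object; the real script submits to Stampede instead:

```python
# Sketch of the submit / wait / shutdown pattern used by the example.
# DummyPilot is a hypothetical stand-in for BigJob's pilot object.
NUMBER_JOBS = 32

class DummyPilot(object):
    def __init__(self):
        self.tasks = []
        self.cancelled = False

    def submit_compute_unit(self, description):
        # the real call returns a compute-unit handle with a unique 'cu-...' id
        task_id = "cu-%04d" % len(self.tasks)
        self.tasks.append((task_id, description))
        return task_id

    def cancel(self):
        # in BigJob this removes the pilot job from Stampede's queue
        self.cancelled = True

pilot = DummyPilot()
try:
    task_ids = []
    for i in range(NUMBER_JOBS):
        task_ids.append(pilot.submit_compute_unit({"executable": "/bin/echo",
                                                   "arguments": [str(i)]}))
    print("Waiting for tasks to finish...")
except Exception as error:
    print("An error occurred: %s" % error)
finally:
    # coordinated shutdown: always remove the pilot from the queue,
    # even if submission or waiting failed
    print("Terminating BigJob...")
    pilot.cancel()
```

The important point is the 'finally' block: the pilot is cancelled whether the tasks succeeded or an exception was raised, so no orphaned pilot job is left sitting in the batch queue.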
Preparation
- Take a look at the full example code on GitHub.
- Create a new file in your home directory, copy & paste the code into it and save it, e.g., as simple-ensemble.py.
Execution
Execute the Python script:
python simple-ensemble.py
The output will look something like this:
* Submitted task '0' with id 'cu-262ee4a2-e992-11e2-9fe1-14109fd519a1' to stampede.tacc.utexas.edu
* Submitted task '1' with id 'cu-26464cbe-e992-11e2-9fe1-14109fd519a1' to stampede.tacc.utexas.edu
[...]
* Submitted task '31' with id 'cu-2905ac74-e992-11e2-9fe1-14109fd519a1' to stampede.tacc.utexas.edu
Waiting for tasks to finish...
Terminating BigJob...
Discussion
This tutorial example extends and improves the first example by adding file transfer: once the 32 tasks have finished executing, we use SAGA-Python to transfer the individual output files back to the local machine.
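The transfer step boils down to: for every finished task, copy its remote stdout file to a local file named after the task id. A minimal local sketch, with `shutil.copy` standing in for SAGA-Python's sftp transfer and with made-up paths and task ids:

```python
import os
import shutil
import tempfile

def fetch_output(remote_path, task_id, dest_dir="."):
    # in the real example this is a SAGA-Python sftp copy from Stampede;
    # here shutil.copy stands in for it
    local_path = os.path.join(dest_dir, "ex-2-stdout-%s.txt" % task_id)
    shutil.copy(remote_path, local_path)
    return local_path

# simulate two finished tasks that each wrote a stdout file
workdir = tempfile.mkdtemp()
copied = []
for task_id in ["cu-0000", "cu-0001"]:
    remote = os.path.join(workdir, "stdout-%s.txt" % task_id)
    with open(remote, "w") as f:
        f.write("output of %s\n" % task_id)
    copied.append(fetch_output(remote, task_id, dest_dir=workdir))

print(copied)
```

Naming the local copy after the task id keeps the 32 output files from overwriting each other, which is why the real output listing shows one ex-2-stdout-&lt;id&gt;.txt file per task.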
Preparation
- Take a look at the full example code on GitHub.
- Create a new file in your home directory, copy & paste the code into it and save it, e.g., as simple-ensemble-datatransfer.py.
Execution
Execute the Python script:
python simple-ensemble-datatransfer.py
The output will look something like this:
* Submitted task '0' with id 'cu-9bfd334c-e996-11e2-8e8b-14109fd519a1' to stampede.tacc.utexas.edu
* Submitted task '1' with id 'cu-9c169a1c-e996-11e2-8e8b-14109fd519a1' to stampede.tacc.utexas.edu
[...]
* Submitted task '31' with id 'cu-2905ac74-e992-11e2-9fe1-14109fd519a1' to stampede.tacc.utexas.edu
Waiting for tasks to finish...
* Output for 'cu-9bfd334c-e996-11e2-8e8b-14109fd519a1' copied to: './ex-2-stdout-cu-9bfd334c-e996-11e2-8e8b-14109fd519a1.txt'
* Output for 'cu-9c169a1c-e996-11e2-8e8b-14109fd519a1' copied to: './ex-2-stdout-cu-9c169a1c-e996-11e2-8e8b-14109fd519a1.txt'
[...]
* Output for 'cu-a0bc8202-e996-11e2-8e8b-14109fd519a1' copied to: './ex-2-stdout-cu-a0bc8202-e996-11e2-8e8b-14109fd519a1.txt'
Terminating BigJob...
If you open and look at the ex-2-stdout-* files, you will see the output of the tasks.
This tutorial example introduces task synchronization. It submits a set of 32 '/bin/echo' tasks (task set A). For every successfully completed task, we submit another '/bin/cat' task from task set B to the same Pilot-Job. Tasks from set A can be seen as producers and tasks from set B as consumers, since the B-tasks read ('consume') the output files of the A-tasks.
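The producer/consumer chaining can be sketched with local subprocesses (a simulation only; in the real example each step is a compute unit submitted to the pilot, and the script waits on the A-task's state before launching its B-task):

```python
import os
import subprocess
import tempfile

workdir = tempfile.mkdtemp()

def run_a_task(i):
    # producer: '/bin/echo' writes a message to its output file
    out = os.path.join(workdir, "a-task-%d.txt" % i)
    with open(out, "w") as f:
        subprocess.call(["echo", "hello from A-task %d" % i], stdout=f)
    return out

def run_b_task(a_output, i):
    # consumer: '/bin/cat' forwards the content of the matching A-task output
    out = os.path.join(workdir, "b-task-%d.txt" % i)
    with open(out, "w") as f:
        subprocess.call(["cat", a_output], stdout=f)
    return out

# chained submission: each finished A-task triggers one B-task
b_outputs = []
for i in range(4):
    a_out = run_a_task(i)              # in BigJob: wait for the A-task to finish
    b_outputs.append(run_b_task(a_out, i))

print(open(b_outputs[0]).read())
```

Because each B-task just 'cat's its A-task's output, the B outputs contain exactly the forwarded A messages, which is what the ex2-stdout-* files show.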
Preparation
- Take a look at the full example code on GitHub.
- Create a new file in your home directory, copy & paste the code into it and save it, e.g., as chained_ensemble.py.
Execution
Execute the Python script:
python chained_ensemble.py
The output will look something like this:
* Submitted 'A' task '0' with id 'cu-27ab3846-e9a9-11e2-88eb-14109fd519a1'
* Submitted 'A' task '1' with id 'cu-27c2cca4-e9a9-11e2-88eb-14109fd519a1'
[...]
One 'A' task cu-27ab3846-e9a9-11e2-88eb-14109fd519a1 finished. Launching a 'B' task.
* Submitted 'B' task '31' with id 'cu-352139c6-e9a9-11e2-88eb-14109fd519a1'
[...]
* Output for 'cu-352139c6-e9a9-11e2-88eb-14109fd519a1' copied to: './ex2-stdout-cu-352139c6-e9a9-11e2-88eb-14109fd519a1.txt'
* Output for 'cu-353e2946-e9a9-11e2-88eb-14109fd519a1' copied to: './ex2-stdout-cu-353e2946-e9a9-11e2-88eb-14109fd519a1.txt'
[...]
* Output for 'cu-399a8ea8-e9a9-11e2-88eb-14109fd519a1' copied to: './ex2-stdout-cu-399a8ea8-e9a9-11e2-88eb-14109fd519a1.txt'
Terminating BigJob...
If you open and look at the ex2-stdout-* files, you will see the output of the B-tasks, which is just the 'forwarded' content they read from the A-task outputs.
Discussion
This tutorial example shows another form of task set synchronization. It exemplifies a simple workflow which submits two sets of tasks (set A and set B) and waits until both have completed before it submits another set of tasks (set C). Both A- and B-tasks are 'producers'; the C-tasks are 'consumers' that concatenate the output of one A- and one B-task.
Preparation
- Take a look at the full example code on GitHub.
- Create a new file in your home directory, copy & paste the code into it and save it, e.g., as coupled_ensembles.py.
Execution
Execute the Python script:
python coupled_ensembles.py
The output will look something like this:
* Submitted 'A' task '0' with id 'cu-833b3762-e9ac-11e2-b250-14109fd519a1'
* Submitted 'A' task '1' with id 'cu-8352c0f8-e9ac-11e2-b250-14109fd519a1'
[...]
* Submitted 'A' task '31' with id 'cu-86137aee-e9ac-11e2-b250-14109fd519a1'
* Submitted 'B' task '0' with id 'cu-862ad342-e9ac-11e2-b250-14109fd519a1'
[...]
* Submitted 'B' task '31' with id 'cu-88fe4c2a-e9ac-11e2-b250-14109fd519a1'
Waiting for 'A' and 'B' tasks to complete...
* Submitted 'C' task '0' with id 'cu-ffb024ce-e9ac-11e2-b250-14109fd519a1'
[...]
* Submitted 'C' task '31' with id 'cu-0281b708-e9ad-11e2-b250-14109fd519a1'
Waiting for 'C' tasks to complete...
* Output for 'cu-ffb024ce-e9ac-11e2-b250-14109fd519a1' copied to: './ex4-stdout-cu-ffb024ce-e9ac-11e2-b250-14109fd519a1.txt'
[...]
* Output for 'cu-0281b708-e9ad-11e2-b250-14109fd519a1' copied to: './ex4-stdout-cu-0281b708-e9ad-11e2-b250-14109fd519a1.txt'
Terminating BigJob...
If you open and look at the ex4-stdout-* files, you will see the output of the C-tasks, which is the concatenated output of the A- and B-tasks.
Discussion
This example computes an 8192x8192 Mandelbrot fractal in parallel on Stampede: the image is decomposed into tiles, each tile is rendered by a separate task, the resulting tile images are transferred back to the local machine via SAGA-Python, and finally stitched together into the complete fractal.
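The tiling decomposition can be sketched as follows. The 4x4 grid and the tile_x&lt;i&gt;_y&lt;j&gt;.gif names match the run output below; the exact parameters live in the example script, so treat this as an illustration:

```python
# Split an 8192x8192 fractal into a 4x4 grid of tiles, one per task.
# Tile names follow the tile_x<i>_y<j>.gif pattern seen in the output.
IMAGE_SIZE = 8192
TILES_PER_SIDE = 4
TILE_SIZE = IMAGE_SIZE // TILES_PER_SIDE

tiles = []
for x in range(TILES_PER_SIDE):
    for y in range(TILES_PER_SIDE):
        tiles.append({
            "name": "tile_x%d_y%d.gif" % (x, y),
            # pixel region this task renders: (left, top, right, bottom)
            "region": (x * TILE_SIZE, y * TILE_SIZE,
                       (x + 1) * TILE_SIZE, (y + 1) * TILE_SIZE),
        })

print(len(tiles))          # 16 independent tasks
print(tiles[0]["name"])    # tile_x0_y0.gif
```

Each tile is an independent task, so all 16 can run concurrently inside the pilot; the regions are later used to paste the tiles back into the full image.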
Preparation
- Install the Python Image Library (PIL):
pip install PIL
- Download the Mandelbrot application kernel and the 'bootstrap' script:
curl --insecure -Os https://raw.github.com/saga-project/BigJob/develop-prod/examples/xsede2013/mandelbrot.sh
curl --insecure -Os https://raw.github.com/saga-project/BigJob/develop-prod/examples/xsede2013/mandelbrot.py
- Take a look at the full example code on GitHub.
- Create a new file in your home directory, copy & paste the code into it and save it, e.g., as bigjob_mandelbrot.py.
Execution
Execute the Python script:
python bigjob_mandelbrot.py
The output will look something like this:
* Submitted task 'cu-6f26b08c-ee05-11e2-9309-005056a13723' to sagatut@stampede.tacc.utexas.edu
* Submitted task 'cu-6f3cff54-ee05-11e2-9309-005056a13723' to sagatut@stampede.tacc.utexas.edu
[...]
* Submitted task 'cu-706628ec-ee05-11e2-9309-005056a13723' to sagatut@stampede.tacc.utexas.edu
Waiting for tasks to finish...
* Copying sftp://sagatut@stampede.tacc.utexas.edu//home1/02554/sagatut/XSEDETutorial/tutorial00/example5/tile_x0_y0.gif back to /home/tutorial-00
* Copying sftp://sagatut@stampede.tacc.utexas.edu//home1/02554/sagatut/XSEDETutorial/tutorial-00/example5/tile_x0_y1.gif back to /home/tutorial-00
[...]
* Copying sftp://sagatut@stampede.tacc.utexas.edu//home1/02554/sagatut/XSEDETutorial/tutorial-00/example5/tile_x3_y3.gif back to /home/tutorial-00
* Stitching together the whole fractal: mandelbrot_full.gif
Terminating BigJob...
You can copy the final output file mandelbrot_full.gif back to your laptop (e.g., via sftp or scp) and open it with an image viewer. You should see the full 8192x8192 Mandelbrot fractal.
Discussion
TODO: Compare and contrast with saga-python mandelbrot