Skip to content

XSEDE 2013 BigJob Tutorial

Melissa edited this page Jul 17, 2013 · 60 revisions

Preparation

Use SSH to log into your tutorial account (Windows users can use PuTTY. Please use the username / password combination that were given to you at the beginning of the tutorial.

ssh <username>@repex1.tacc.utexas.edu

Once logged-in, make sure that you can log-in to Stampede:

ssh sagatut@stampede.tacc.utexas.edu exit

Installation

Next, you need to install BigJob in your user account. Since BigJob, just like saga-python, is written in Python, you can use virtualenv to create a local installation:

virtualenv $HOME/bigjobenv
. $HOME/bigjobenv/bin/activate

The BigJob package that we are using is called saga-bigjob and can be installed via pip:

pip install saga-bigjob

Code Example 1: Simple Ensemble

This example runs NUMBER_JOBS (32) concurrent '/bin/echo' tasks on TACC's Stampede cluster. A 32-core pilot job is initialized and 32 single-core tasks are submitted to it. This example also show basic error handling via 'try/except' and coordinated shutdown (removing pilot from stampede's queue) once all tasks have finished running via finally (line 74).

Preparation

  1. Take a look at the full example code on GitHub.

  2. Create a new file in your home directory, copy & paste the code into it and save it, e.g., as simple-ensemble.py.

Execution

Execute the Python script:

python simple-ensemble.py

The output will look something like this:

* Submitted task '0' with id 'cu-262ee4a2-e992-11e2-9fe1-14109fd519a1' to stampede.tacc.utexas.edu
* Submitted task '1' with id 'cu-26464cbe-e992-11e2-9fe1-14109fd519a1' to stampede.tacc.utexas.edu
[...]
* Submitted task '31' with id 'cu-2905ac74-e992-11e2-9fe1-14109fd519a1' to stampede.tacc.utexas.edu
Waiting for tasks to finish...
Terminating BigJob...

Discussion

Code Example 2: Adding Data Transfer

This tutorial example extends and improves the first example by adding file transfer: once the 32 tasks have finished executing, we use SAGA-Python to transfer the individual output files back to the local machine.

Preparation

  1. Take a look at the full example code on GitHub.

  2. Create a new file in your home directory, copy & paste the code into it and save it, e.g., as simple-ensemble-datatransfer.py.

Execution

Execute the Python script:

python simple-ensemble-datatransfer.py

The output will look something like this:

* Submitted task '0' with id 'cu-9bfd334c-e996-11e2-8e8b-14109fd519a1' to stampede.tacc.utexas.edu
* Submitted task '1' with id 'cu-9c169a1c-e996-11e2-8e8b-14109fd519a1' to stampede.tacc.utexas.edu
[...]
* Submitted task '31' with id 'cu-2905ac74-e992-11e2-9fe1-14109fd519a1' to stampede.tacc.utexas.edu
Waiting for tasks to finish...
* Output for 'cu-9bfd334c-e996-11e2-8e8b-14109fd519a1' copied to: './ex-2-stdout-cu-9bfd334c-e996-11e2-8e8b-14109fd519a1.txt'
* Output for 'cu-9c169a1c-e996-11e2-8e8b-14109fd519a1' copied to: './ex-2-stdout-cu-9c169a1c-e996-11e2-8e8b-14109fd519a1.txt'
[...]
* Output for 'cu-a0bc8202-e996-11e2-8e8b-14109fd519a1' copied to: './ex-2-stdout-cu-a0bc8202-e996-11e2-8e8b-14109fd519a1.txt'
Terminating BigJob...

If you open and look at the ex-2-stdout-* files, you will see the output of the tasks.

Code Example 3: Chained Ensemble

This tutorial example introduces task synchronization. It submits a set of 32 '/bin/echo' tasks (task set A). For every successfully completed task, we submits another '/bin/cat' task from task set B to the same Pilot-Job. Task from set A can be seen as producers and tasks from task set B as consumers, since B-tasks read 'consume' the output file an A-tasks.

Preparation

  1. Take a look at the full example code on GitHub.

  2. Create a new file in your home directory, copy & paste the code into it and save it, e.g., as chained_ensemble.py.

Execution

Execute the Python script:

python chained_ensemble.py

The output will look something like this:

* Submitted 'A' task '0' with id 'cu-27ab3846-e9a9-11e2-88eb-14109fd519a1'
* Submitted 'A' task '1' with id 'cu-27c2cca4-e9a9-11e2-88eb-14109fd519a1'
[...]
One 'A' task cu-27ab3846-e9a9-11e2-88eb-14109fd519a1 finished. Launching a 'B' task.
* Submitted 'B' task '31' with id 'cu-352139c6-e9a9-11e2-88eb-14109fd519a1'
[...]
* Output for 'cu-352139c6-e9a9-11e2-88eb-14109fd519a1' copied to: './ex2-stdout-cu-352139c6-e9a9-11e2-88eb-14109fd519a1.txt'
* Output for 'cu-353e2946-e9a9-11e2-88eb-14109fd519a1' copied to: './ex2-stdout-cu-353e2946-e9a9-11e2-88eb-14109fd519a1.txt'
[...]
* Output for 'cu-399a8ea8-e9a9-11e2-88eb-14109fd519a1' copied to: './ex2-stdout-cu-399a8ea8-e9a9-11e2-88eb-14109fd519a1.txt'
Terminating BigJob...

If you open and look at the ex-3-stdout-* files, you will see the output of the B-tasks, which is just the 'forwarded' content they read from the A-task outputs.

Discussion

Code Example 4: Coupled Ensemble

This tutorial example shows another form of task set synchronization. It exemplifies a simple workflow which submit a set of tasks (set A) and (set B) and wait until they are completed until it submits another set of tasks (set C). Both A- and B-tasks are 'producers'. C-tasks 'consumers' and concatenate the output of an A- and a B-tasks.

Preparation

  1. Take a look at the full example code on GitHub.

  2. Create a new file in your home directory, copy & paste the code into it and save it, e.g., as coupled_ensembles.py.

Execution

Execute the Python script:

python coupled_ensembles.py

The output will look something like this:

* Submitted 'A' task '0' with id 'cu-833b3762-e9ac-11e2-b250-14109fd519a1'
* Submitted 'A' task '1' with id 'cu-8352c0f8-e9ac-11e2-b250-14109fd519a1'
[...]
* Submitted 'A' task '31' with id 'cu-86137aee-e9ac-11e2-b250-14109fd519a1'
* Submitted 'B' task '0' with id 'cu-862ad342-e9ac-11e2-b250-14109fd519a1'
[...]
* Submitted 'B' task '31' with id 'cu-88fe4c2a-e9ac-11e2-b250-14109fd519a1'
Waiting for 'A' and 'B' tasks to complete...
* Submitted 'C' task '0' with id 'cu-ffb024ce-e9ac-11e2-b250-14109fd519a1'
[...]
* Submitted 'C' task '31' with id 'cu-0281b708-e9ad-11e2-b250-14109fd519a1'
Waiting for 'C' tasks to complete...
* Output for 'cu-ffb024ce-e9ac-11e2-b250-14109fd519a1' copied to: './ex4-stdout-cu-ffb024ce-e9ac-11e2-b250-14109fd519a1.txt'
[...]
* Output for 'cu-0281b708-e9ad-11e2-b250-14109fd519a1' copied to: './ex4-stdout-cu-0281b708-e9ad-11e2-b250-14109fd519a1.txt'

Terminating BigJob...

If you open and look at the ex-4-stdout-* files, you will see the output of the C-tasks which is the concatenated output of the A- and B- tasks.

Discussion

Code Example 5: Mandelbrot

In this example, we split up the calculation of a Mandelbrot set into several tiles.

Rather than submit a single job for each tile, we want to submit one BigJob and then execute each individual tile job in a distributed fashion. This is why BigJob is useful. We reserve the resources we need for all of the jobs, but submit just one job that requests all of these resources. Once the job becomes active, the compute units are executed in a distributed fashion. The tiles are then retrieved using features included in BigJob such as the SAGA File API, and the final image is stitched together from the individual tiles.

Mandelbrot Tiles

Preparation

  1. Install the Python Image Library (PIL):

    pip install PIL
  2. Download the Mandelbrot application kernel and the 'bootstrap' script:

    curl --insecure -Os https://raw.github.com/saga-project/BigJob/develop-prod/examples/xsede2013/mandelbrot.sh
    
    curl --insecure -Os https://raw.github.com/saga-project/BigJob/develop-prod/examples/xsede2013/mandelbrot.py
  3. Take a look at the full example code on GitHub.

  4. Create a new file in your home directory, copy & paste the code into it and save it, e.g., as bigjob_mandelbrot.py.

Execution

Execute the Python script:

python bigjob_mandelbrot.py

The output will look something like this:

* Submitted task 'cu-6f26b08c-ee05-11e2-9309-005056a13723' to sagatut@stampede.tacc.utexas.edu
* Submitted task 'cu-6f3cff54-ee05-11e2-9309-005056a13723' to sagatut@stampede.tacc.utexas.edu
[...]
* Submitted task 'cu-706628ec-ee05-11e2-9309-005056a13723' to sagatut@stampede.tacc.utexas.edu
Waiting for tasks to finish...
* Copying sftp://sagatut@stampede.tacc.utexas.edu//home1/02554/sagatut/XSEDETutorial/tutorial00/example5/tile_x0_y0.gif back to /home/tutorial-00
* Copying sftp://sagatut@stampede.tacc.utexas.edu//home1/02554/sagatut/XSEDETutorial/tutorial-00/example5/tile_x0_y1.gif back to /home/tutorial-00
[...]
* Copying sftp://sagatut@stampede.tacc.utexas.edu//home1/02554/sagatut/XSEDETutorial/tutorial-00/example5/tile_x3_y3.gif back to /home/tutorial-00
* Stitching together the whole fractal: mandelbrot_full.gif
Terminating BigJob...

You can copy the final output file mandelbrot_full.gif back to your laptop (e.g., via sftp or scp) and open it with an image viewer. You should see the full 8192x8192 Mandelbrot fractal.

Discussion

TODO: Compare and contrast with saga-python mandelbrot

Clone this wiki locally