Skip to content

XSEDE 2013 BigJob Tutorial

Ole Weidner edited this page Jul 16, 2013 · 60 revisions

Preparation

Use SSH to log into your tutorial account (Windows users can use PuTTY. Please use the username / password combination that were given to you at the beginning of the tutorial.

ssh <username>@repex1.tacc.utexas.edu

Once logged-in, make sure that you can log-in to Stampede:

ssh sagatut@stampede.tacc.utexas.edu exit

Installation

Next, you need to install BigJob in your user account. Since BigJob, just like saga-python, is written in Python, you can use virtualenv to create a local installation:

virtualenv $HOME/bigjobenv
. $HOME/bigjobenv/bin/activate

The BigJob package that we are using is called saga-bigjob and can be installed via pip:

pip install saga-bigjob

Code Example 1: Simple Ensemble

This example runs NUMBER_JOBS (32) concurrent '/bin/echo' tasks on TACC's Stampede cluster. A 32-core pilot job is initialized and 32 single-core tasks are submitted to it. This example also show basic error handling via 'try/except' and coordinated shutdown (removing pilot from stampede's queue) once all tasks have finished running via finally (line 74).

Preparation

  1. Take a look at the full example code on GitHub.

  2. Create a new file in your home directory, copy & paste the code into it and save it, e.g., as simple-ensemble.py.

Execution

Execute the Python script:

python simple-ensemble.py

The output will look something like this:

* Submitted task '0' with id 'cu-262ee4a2-e992-11e2-9fe1-14109fd519a1' to stampede.tacc.utexas.edu
* Submitted task '1' with id 'cu-26464cbe-e992-11e2-9fe1-14109fd519a1' to stampede.tacc.utexas.edu
[...]
* Submitted task '31' with id 'cu-2905ac74-e992-11e2-9fe1-14109fd519a1' to stampede.tacc.utexas.edu
Waiting for tasks to finish...
Terminating BigJob...

Discussion

Code Example 2: Adding Data Transfer

This tutorial example extends and improves the first example by adding file transfer: once the 32 tasks have finished executing, we use SAGA-Python to transfer the individual output files back to the local machine.

Preparation

  1. Take a look at the full example code on GitHub.

  2. Create a new file in your home directory, copy & paste the code into it and save it, e.g., as simple-ensemble-datatransfer.py.

Execution

Execute the Python script:

python simple-ensemble-datatransfer.py

The output will look something like this:

* Submitted task '0' with id 'cu-9bfd334c-e996-11e2-8e8b-14109fd519a1' to stampede.tacc.utexas.edu
* Submitted task '1' with id 'cu-9c169a1c-e996-11e2-8e8b-14109fd519a1' to stampede.tacc.utexas.edu
[...]
* Submitted task '31' with id 'cu-2905ac74-e992-11e2-9fe1-14109fd519a1' to stampede.tacc.utexas.edu
Waiting for tasks to finish...
* Output for 'cu-9bfd334c-e996-11e2-8e8b-14109fd519a1' copied to: './ex-2-stdout-cu-9bfd334c-e996-11e2-8e8b-14109fd519a1.txt'
* Output for 'cu-9c169a1c-e996-11e2-8e8b-14109fd519a1' copied to: './ex-2-stdout-cu-9c169a1c-e996-11e2-8e8b-14109fd519a1.txt'
[...]
* Output for 'cu-a0bc8202-e996-11e2-8e8b-14109fd519a1' copied to: './ex-2-stdout-cu-a0bc8202-e996-11e2-8e8b-14109fd519a1.txt'
Terminating BigJob...

If you open and look at the ex-2-stdout-* files, you will see the output of the tasks.

Code Example 3: Chained Ensemble

This tutorial example introduces task synchronization. It submits a set of 32 '/bin/echo' tasks (task set A). For every successfully completed task, we submits another '/bin/cat' task from task set B to the same Pilot-Job. Task from set A can be seen as producers and tasks from task set B as consumers, since B-tasks read 'consume' the output file an A-tasks.

Preparation

  1. Take a look at the full example code on GitHub.

  2. Create a new file in your home directory, copy & paste the code into it and save it, e.g., as chained_ensemble.py.

Execution

Execute the Python script:

python chained_ensemble.py

The output will look something like this:

* Submitted 'A' task '0' with id 'cu-27ab3846-e9a9-11e2-88eb-14109fd519a1'
* Submitted 'A' task '1' with id 'cu-27c2cca4-e9a9-11e2-88eb-14109fd519a1'
[...]
One 'A' task cu-27ab3846-e9a9-11e2-88eb-14109fd519a1 finished. Launching a 'B' task.
* Submitted 'B' task '31' with id 'cu-352139c6-e9a9-11e2-88eb-14109fd519a1'
[...]
* Output for 'cu-352139c6-e9a9-11e2-88eb-14109fd519a1' copied to: './ex2-stdout-cu-352139c6-e9a9-11e2-88eb-14109fd519a1.txt'
* Output for 'cu-353e2946-e9a9-11e2-88eb-14109fd519a1' copied to: './ex2-stdout-cu-353e2946-e9a9-11e2-88eb-14109fd519a1.txt'
[...]
* Output for 'cu-399a8ea8-e9a9-11e2-88eb-14109fd519a1' copied to: './ex2-stdout-cu-399a8ea8-e9a9-11e2-88eb-14109fd519a1.txt'
Terminating BigJob...

If you open and look at the ex-3-stdout-* files, you will see the output of the B-tasks, which is just the 'forwarded' content they read from the A-task outputs.

Discussion

Code Example 4: Coupled Ensemble

This tutorial example shows another form of task set synchronization. It exemplifies a simple workflow which submit a set of tasks (set A) and (set B) and wait until they are completed until it submits another set of tasks (set C). Both A- and B-tasks are 'producers'. C-tasks 'consumers' and concatenate the output of an A- and a B-tasks.

Preparation

  1. Take a look at the full example code on GitHub.

  2. Create a new file in your home directory, copy & paste the code into it and save it, e.g., as coupled_ensembles.py.

Execution

Execute the Python script:

python coupled_ensembles.py

The output will look something like this:

* Submitted 'A' task '0' with id 'cu-833b3762-e9ac-11e2-b250-14109fd519a1'
* Submitted 'A' task '1' with id 'cu-8352c0f8-e9ac-11e2-b250-14109fd519a1'
[...]
* Submitted 'A' task '31' with id 'cu-86137aee-e9ac-11e2-b250-14109fd519a1'
* Submitted 'B' task '0' with id 'cu-862ad342-e9ac-11e2-b250-14109fd519a1'
[...]
* Submitted 'B' task '31' with id 'cu-88fe4c2a-e9ac-11e2-b250-14109fd519a1'
Waiting for 'A' and 'B' tasks to complete...
* Submitted 'C' task '0' with id 'cu-ffb024ce-e9ac-11e2-b250-14109fd519a1'
[...]
* Submitted 'C' task '31' with id 'cu-0281b708-e9ad-11e2-b250-14109fd519a1'
Waiting for 'C' tasks to complete...
* Output for 'cu-ffb024ce-e9ac-11e2-b250-14109fd519a1' copied to: './ex4-stdout-cu-ffb024ce-e9ac-11e2-b250-14109fd519a1.txt'
[...]
* Output for 'cu-0281b708-e9ad-11e2-b250-14109fd519a1' copied to: './ex4-stdout-cu-0281b708-e9ad-11e2-b250-14109fd519a1.txt'

Terminating BigJob...

If you open and look at the ex-4-stdout-* files, you will see the output of the C-tasks which is the concatenated output of the A- and B- tasks.

Discussion

Code Example 5: Mandelbrot

DESCRIBE EXAMPLE

Preparation

  1. Install the Python Image Library (PIL):

    pip install PIL
  2. Download the Mandelbrot application kernel and the 'bootstrap' script:

    curl --insecure -Os https://raw.github.com/saga-project/BigJob/develop-prod/examples/xsede2013/mandelbrot.sh
    
    curl --insecure -Os https://raw.github.com/saga-project/BigJob/develop-prod/examples/xsede2013/mandelbrot.py
  3. Take a look at the full example code on GitHub.

  4. Create a new file in your home directory, copy & paste the code into it and save it, e.g., as bigjob_mandebrot.py.

Execution

Execute the Python script:

python bigjob_mandelbrot.py

The output will look something like this:

* Submitted task 'cu-6f26b08c-ee05-11e2-9309-005056a13723' to sagatut@stampede.tacc.utexas.edu
* Submitted task 'cu-6f3cff54-ee05-11e2-9309-005056a13723' to sagatut@stampede.tacc.utexas.edu
[...]
* Submitted task 'cu-706628ec-ee05-11e2-9309-005056a13723' to sagatut@stampede.tacc.utexas.edu
Waiting for tasks to finish...
* Copying sftp://sagatut@stampede.tacc.utexas.edu//home1/02554/sagatut/XSEDETutorial/tutorial00/example5/tile_x0_y0.gif back to /home/tutorial-00
  • Copying sftp://sagatut@stampede.tacc.utexas.edu//home1/02554/sagatut/XSEDETutorial/tutorial-00/example5/tile_x0_y1.gif back to /home/tutorial-00 [...]
  • Copying sftp://sagatut@stampede.tacc.utexas.edu//home1/02554/sagatut/XSEDETutorial/tutorial-00/example5/tile_x3_y3.gif back to /home/tutorial-00
  • Stitching together the whole fractal: mandelbrot_full.gif Terminating BigJob...

Discussion

TODO: Compare and contrast with saga-python mandelbrot

Clone this wiki locally