|
1 | 1 | .. _instances-label: |
2 | 2 |
|
3 | | -Analyzing Instances |
4 | | -=================== |
| 3 | +WfInstances: Workflow Instances |
| 4 | +=============================== |
5 | 5 |
|
6 | 6 | Workflow execution instances have been widely used to profile and characterize |
7 | 7 | workflow executions, and to build distributions of workflow execution behaviors, |
@@ -50,6 +50,122 @@ methods for interacting with the workflow instance, including: |
50 | 50 | WfChef) for :ref:`generating-workflows-recipe-label`, they can also be used |
51 | 51 | in a standalone manner. |
52 | 52 |
|
| 53 | +Parsing Workflow Execution Logs |
| 54 | +------------------------------- |
| 55 | + |
| 56 | +The most common way to obtain **workflow instances** from actual workflow
| 57 | +executions is to parse execution logs. As part of the WfCommons project, we |
| 58 | +are constantly developing parsers for commonly used workflow management systems. |
| 59 | +The parsers provided in this Python package automatically scan execution logs
| 60 | +to produce instances using :ref:`json-format-label`. |
| 61 | + |
| 62 | +Each parser class is derived from the abstract |
| 63 | +:class:`~wfcommons.wfinstances.logs.abstract_logs_parser.LogsParser` class. Thus, each |
| 64 | +parser provides a |
| 65 | +:meth:`~wfcommons.wfinstances.logs.abstract_logs_parser.LogsParser.build_workflow` |
| 66 | +method. |
| 67 | + |
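All parsers therefore follow the same three-step pattern shown in the examples
below: construct a parser pointed at a log directory, call
:code:`build_workflow`, and serialize the result. The toy sketch below mimics
that pattern with plain Python; :code:`ToyLogsParser` is an illustrative
stand-in, not part of the wfcommons API: ::

```python
import json
import pathlib
import tempfile

# Illustrative stand-in for a LogsParser subclass; the real wfcommons
# parsers scan actual workflow management system logs.
class ToyLogsParser:
    def __init__(self, execution_dir: pathlib.Path):
        self.execution_dir = execution_dir

    def build_workflow(self, name: str) -> dict:
        # A real parser extracts tasks and dependencies from the logs;
        # here we only record the workflow name.
        return {"name": name, "tasks": []}

with tempfile.TemporaryDirectory() as tmp:
    # step 1: create the parser for a (here, empty) execution directory
    parser = ToyLogsParser(pathlib.Path(tmp))
    # step 2: build the workflow instance object
    workflow = parser.build_workflow("toy-workflow")
    # step 3: write the instance to a JSON file
    out = pathlib.Path(tmp) / "toy-workflow.json"
    out.write_text(json.dumps(workflow))
    loaded_name = json.loads(out.read_text())["name"]
```

The three real parsers below differ only in which logs they understand and in
the arguments their constructors take.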
| 68 | +Makeflow |
| 69 | +++++++++ |
| 70 | + |
| 71 | +`Makeflow <http://ccl.cse.nd.edu/software/makeflow/>`_ is a workflow system for |
| 72 | +executing large complex workflows on clusters, clouds, and grids. The Makeflow |
| 73 | +language is similar to traditional "Make", so if you can write a Makefile, then you |
| 74 | +can write a Makeflow. A workflow can be just a few commands chained together, or |
| 75 | +it can be a complex application consisting of thousands of tasks. It can have an |
| 76 | +arbitrary DAG structure and is not limited to specific patterns. Makeflow is used |
| 77 | +on a daily basis to execute complex scientific applications in fields such as data |
| 78 | +mining, high energy physics, image processing, and bioinformatics. It has run on |
| 79 | +campus clusters, the Open Science Grid, NSF XSEDE machines, NCSA Blue Waters, and |
| 80 | +Amazon Web Services. Makeflow logs provide time-stamped event instances from these |
| 81 | +executions. The following example shows the analysis of Makeflow execution logs, |
| 82 | +stored in a local folder (:code:`execution_dir`), for a workflow execution using the |
| 83 | +:class:`~wfcommons.wfinstances.logs.makeflow.MakeflowLogsParser` class: :: |
| 84 | + |
| 85 | + import pathlib |
| 86 | + from wfcommons.wfinstances import MakeflowLogsParser |
| 87 | + |
| 88 | + # creating the parser for the Makeflow workflow |
| 89 | + execution_dir = pathlib.Path('/path/to/makeflow/execution/dir/blast/chameleon-small-001/') |
| 90 | + resource_monitor_logs_dir = pathlib.Path('/path/to/makeflow/resource/monitor/logs/dir') |
| 91 | + parser = MakeflowLogsParser(execution_dir=execution_dir, |
| 92 | + resource_monitor_logs_dir=resource_monitor_logs_dir) |
| 93 | + |
| 94 | + # generating the workflow instance object |
| 95 | + workflow = parser.build_workflow('makeflow-workflow-test') |
| 96 | + |
| 97 | + # writing the workflow instance to a JSON file |
| 98 | + workflow_path = pathlib.Path('./makeflow-workflow.json') |
| 99 | + workflow.write_json(workflow_path) |
| 100 | + |
| 101 | +.. note:: |
| 102 | + The :class:`~wfcommons.wfinstances.logs.makeflow.MakeflowLogsParser` class requires |
| 103 | +    that Makeflow workflows run with the
| 104 | +    `Resource Monitor <https://cctools.readthedocs.io/en/latest/resource_monitor/>`_
| 105 | +    tool (i.e., execute the workflow using the :code:`--monitor=logs` option).
| 106 | + |
| 107 | +Nextflow |
| 108 | +++++++++ |
| 109 | + |
| 110 | +`Nextflow <https://nextflow.io>`_ is a reactive workflow framework and a programming DSL |
| 111 | +that eases the writing of data-intensive computational pipelines. It is designed around |
| 112 | +the idea that the Linux platform is the lingua franca of data science. Linux provides |
| 113 | +many simple but powerful command-line and scripting tools that, when chained together, |
| 114 | +facilitate complex data manipulations. Nextflow extends this approach, adding the ability |
| 115 | +to define complex program interactions and a high-level parallel computational environment |
| 116 | +based on the dataflow programming model. The following example shows the analysis of |
| 117 | +Nextflow execution logs, stored in a local folder (:code:`execution_dir`), for a workflow |
| 118 | +execution using the :class:`~wfcommons.wfinstances.logs.nextflow.NextflowLogsParser` class: :: |
| 119 | + |
| 120 | + import pathlib |
| 121 | + from wfcommons.wfinstances import NextflowLogsParser |
| 122 | + |
| 123 | + # creating the parser for the Nextflow workflow |
| 124 | + execution_dir = pathlib.Path('/path/to/nextflow/execution/dir/') |
| 125 | + parser = NextflowLogsParser(execution_dir=execution_dir) |
| 126 | + |
| 127 | + # generating the workflow instance object |
| 128 | + workflow = parser.build_workflow('nextflow-workflow-test') |
| 129 | + |
| 130 | + # writing the workflow instance to a JSON file |
| 131 | + workflow_path = pathlib.Path('./nextflow-workflow.json') |
| 132 | + workflow.write_json(workflow_path) |
| 133 | + |
| 134 | +.. note:: |
| 135 | +    The :class:`~wfcommons.wfinstances.logs.nextflow.NextflowLogsParser` class assumes
| 136 | +    that workflow executions produce :code:`execution_report_*.html` and
| 137 | +    :code:`execution_timeline_*.html` files.
| 138 | + |
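Since the parser relies on those two report files being present, it can be
useful to check a candidate execution directory before parsing. The sketch
below does this with the standard library only; :code:`has_nextflow_reports`
is a hypothetical helper, not part of the wfcommons package: ::

```python
import pathlib
import tempfile

def has_nextflow_reports(execution_dir: pathlib.Path) -> bool:
    # The Nextflow parser expects both an execution report and an
    # execution timeline HTML file in the execution directory.
    return (any(execution_dir.glob("execution_report_*.html"))
            and any(execution_dir.glob("execution_timeline_*.html")))

with tempfile.TemporaryDirectory() as tmp:
    execution_dir = pathlib.Path(tmp)
    before = has_nextflow_reports(execution_dir)  # no reports yet
    # simulate the files a Nextflow run would leave behind
    (execution_dir / "execution_report_2024-05-01.html").touch()
    (execution_dir / "execution_timeline_2024-05-01.html").touch()
    after = has_nextflow_reports(execution_dir)   # both reports present
```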
| 139 | +Pegasus |
| 140 | ++++++++ |
| 141 | + |
| 142 | +`Pegasus <http://pegasus.isi.edu>`_ is used in production to execute workflows
| 143 | +for dozens of high-profile applications in a wide range of scientific domains. Pegasus |
| 144 | +provides the necessary abstractions for scientists to create workflows and allows for |
| 145 | +transparent execution of these workflows on a range of compute platforms including |
| 146 | +clusters, clouds, and national cyberinfrastructures. Workflow execution with Pegasus |
| 147 | +includes data management, monitoring, and failure handling, and is managed by HTCondor |
| 148 | +DAGMan. Individual workflow tasks are managed by a workload management framework, |
| 149 | +HTCondor, which supervises task executions on local and remote resources. Pegasus |
| 150 | +logs provide time-stamped event instances from these executions. The following example shows |
| 151 | +the analysis of Pegasus execution logs, stored in a local folder (:code:`submit_dir`), for a |
| 152 | +workflow execution using the :class:`~wfcommons.wfinstances.logs.pegasus.PegasusLogsParser` |
| 153 | +class: :: |
| 154 | + |
| 155 | + import pathlib |
| 156 | + from wfcommons.wfinstances import PegasusLogsParser |
| 157 | + |
| 158 | + # creating the parser for the Pegasus workflow |
| 159 | + submit_dir = pathlib.Path('/path/to/pegasus/submit/dir/seismology/chameleon-100p-001/') |
| 160 | + parser = PegasusLogsParser(submit_dir=submit_dir) |
| 161 | + |
| 162 | + # generating the workflow instance object |
| 163 | + workflow = parser.build_workflow('pegasus-workflow-test') |
| 164 | + |
| 165 | + # writing the workflow instance to a JSON file |
| 166 | + workflow_path = pathlib.Path('./pegasus-workflow.json') |
| 167 | + workflow.write_json(workflow_path) |
| 168 | + |
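Whichever parser produced it, the instance written by :code:`write_json` is a
plain JSON file that can be inspected with standard tools. The sketch below
loads a minimal hand-written instance; the fields shown are only a small,
illustrative subset of the WfCommons JSON format (real instances carry many
more fields, such as machines, runtimes, and files): ::

```python
import json
import pathlib
import tempfile

# A minimal hand-written instance; task and field names are illustrative.
instance = {
    "name": "pegasus-workflow-test",
    "workflow": {
        "tasks": [
            {"name": "split", "parents": []},
            {"name": "blast_0", "parents": ["split"]},
            {"name": "merge", "parents": ["blast_0"]},
        ]
    },
}

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "pegasus-workflow.json"
    path.write_text(json.dumps(instance))
    # load the instance back and count its tasks
    loaded = json.loads(path.read_text())
    num_tasks = len(loaded["workflow"]["tasks"])
```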
53 | 169 | The Instance Analyzer |
54 | 170 | --------------------- |
55 | 171 |
|
|