Commit 5ddcc91

Merge branch 'master' of https://github.com/wfcommons/wfcommons
2 parents 58112e0 + 4bfd66c commit 5ddcc91

25 files changed

Lines changed: 181 additions & 222 deletions

docs/source/analyzing_instances.rst

Lines changed: 23 additions & 16 deletions
@@ -1,37 +1,39 @@
-.. _traces-label:
+.. _instances-label:
 
 Analyzing Instances
-================
+===================
 
 Workflow execution instances have been widely used to profile and characterize
 workflow executions, and to build distributions of workflow execution behaviors,
 which are used to evaluate methods and techniques in simulation or in real
 conditions.
 
-The first axis of the WfCommons project targets the analysis of actual workflow
-execution instances (i.e., the workflow execution profile data and characterizations)
-in order to build **recipes** of workflow applications. These recipes contain
-the necessary information for generating synthetic, yet realistic, workflow
-instances that resemble the structure and distribution of the original workflow
-executions.
+The WfCommons project targets the analysis of actual workflow execution instances
+(i.e., the workflow execution profile data and characterizations)
+in order to build :ref:`workflow-recipe-label` of workflow applications.
+These recipes contain the necessary information for generating synthetic, yet
+realistic, workflow instances that resemble the structure and distribution of
+the original workflow executions.
 
 A `list of workflow execution instances <https://wfcommons.org/instances>`_
 that are compatible with :ref:`json-format-label` is kept constantly updated
 in our project website.
 
-Workflow Execution Instances
--------------------------
+.. _wfinstances-label:
+
+WfInstances
+-----------
 
 A workflow execution instance represents an actual execution of a scientific
 workflow on a distributed platform (e.g., clouds, grids, HPC, etc.). In the
 WfCommons project, an instance is represented in a JSON file following the
-schema described in :ref:`json-format-label` section. This Python package
+schema described in :ref:`json-format-label`. This Python package
 provides an *instance loader* tool for importing workflow execution instances
 for analysis. For instance, the code snippet below shows how an instance can
 be loaded using the :class:`~wfcommons.trace.trace.Trace` class: ::
 
     from wfcommons import Trace
-    trace = Trace(input_trace='/path/to/trace/file.json')
+    trace = Trace(input_trace='/path/to/instance/file.json')
 
 The :class:`~wfcommons.trace.trace.Trace` class provides a number of
 methods for interacting with the workflow instance, including:
@@ -41,16 +43,21 @@ methods for interacting with the workflow instance, including:
 - :meth:`~wfcommons.trace.trace.Trace.roots`: gets the roots of the workflow (i.e., the tasks without any predecessors).
 - :meth:`~wfcommons.trace.trace.Trace.write_dot`: writes a dot file of the instance.
 
+.. note::
+    Although the analysis methods are inherently used by WfCommons (specifically
+    WfChef) for :ref:`generating-workflows-recipe-label`, they can also be used
+    in a standalone manner.
+
 The Instance Analyzer
-------------------
+---------------------
 
 The :class:`~wfcommons.trace.trace_analyzer.TraceAnalyzer` class provides
 a number of tools for analyzing collection of workflow execution instances. The
 goal of the :class:`~wfcommons.trace.trace_analyzer.TraceAnalyzer` is to
 perform analyzes of one or multiple workflow execution instances, and build
 summaries of the analyzes per workflow' task type prefix.
 
-.. note::
+.. warning::
 
     Although any workflow execution instance represented as a
     :class:`~wfcommons.trace.trace.Trace` object (i.e., compatible with
@@ -89,7 +96,7 @@ summary showing the best fit probability distribution for runtime of the
     ...
     }
 
-Workflow analysis summaries can then be used to develop :ref:`workflow-recipe-label`,
+Workflow analysis summaries are used by WfChef to develop :ref:`workflow-recipe-label`,
 in which themselves are used to :ref:`generate realistic synthetic workflow instances
 <generating-workflows-label>`.
 
@@ -113,7 +120,7 @@ plots (runtime, and input and output files) into the :code:`fits` folder using
     from os.path import isfile, join
 
     # obtaining list of instance files in the folder
-    INSTANCES_PATH = "/Path/to/some/instance/folder/"
+    INSTANCES_PATH = "/path/to/some/instance/folder/"
     instance_files = [f for f in listdir(INSTANCES_PATH) if isfile(join(INSTANCES_PATH, f))]
 
     # creating the instance analyzer object
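
The documentation touched by this file describes ``roots`` as returning the tasks without any predecessors. As a rough illustration of that definition only, and not of how wfcommons implements it, the sketch below works on a plain dictionary. The helper ``find_roots``, the dictionary layout, and the three task names are all hypothetical.

```python
from typing import Dict, List


def find_roots(parents: Dict[str, List[str]]) -> List[str]:
    """Return the tasks that have no predecessors, in insertion order."""
    return [task for task, preds in parents.items() if not preds]


# Hypothetical three-task workflow: task name -> names of its parent tasks.
workflow = {
    "split": [],
    "align": ["split"],
    "merge": ["align"],
}

roots = find_roots(workflow)
print(roots)
```

With a real instance, the ``Trace`` object shown in the diff would supply the task graph; the dictionary here only stands in for it.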

docs/source/generating_workflows_recipe.rst

Lines changed: 8 additions & 7 deletions
@@ -1,17 +1,18 @@
 .. _generating-workflows-recipe-label:
 
-Generating Workflows Recipe
+Generating Workflows Recipes
 ============================
 
 **WfChef** is the WfCommons component that automates the construction of
 synthetic workflow generators for any given workflow application. The input
 to this component is a set of real workflow instances described in the
-*WfFormat* (e.g., instances available in **WfInstances**).
-WfChef automatically analyzes the real workflow instances for
+:ref:`json-format-label` (e.g., instances available in **WfInstances**).
+WfChef automatically analyzes a set of real workflow instances for
 two purposes. First, it discovers workflow subgraphs that represent
 fundamental task dependency patterns. Second, it derives
-statistical models of the workflow tasks' performance characteristics (more details :ref:`.. _traces-label:`).
-WfChef then outputs a **recipe** that will be used by **WfGen**
+statistical models of the workflow tasks' performance characteristics
+(see :ref:`instances-label`).
+WfChef then outputs a **recipe** that will be used by **WfGen**
 (see :ref:`generating-workflows-label`) to generate realistic synthetic
 workflow instances with any arbitrary number of tasks.
 
@@ -20,12 +21,12 @@ workflow instances with any arbitrary number of tasks.
 Workflow Recipes
 ----------------
 
-A **workfflow recipe** is a data structure that encodes the discovered pattern occurrences
+A **workflow recipe** is a data structure that encodes the discovered pattern occurrences
 as well as the statistical models of workflow task characteristics.
 The WfCommons package provides a number of *workflow recipes* for generating realistic
 synthetic workflow instances.
 
-All workflow recipes provide a common method, :code:`from_num_tasks`, that defines the upper
+All workflow recipes provide a common method, :code:`from_num_tasks`, that defines the lower
 bound for the total number of tasks in the synthetic workflow.
 
 
docs/source/images/wfcommons.png

22 KB

docs/source/index.rst

Lines changed: 3 additions & 6 deletions
@@ -5,12 +5,8 @@
 
 `WfCommons <https://wfcommons.org>`__ is a community framework for
 enabling scientific workflow research and development. This Python package
-provides a collection of tools for:
-
-- Analyzing instances of actual workflow executions;
-- Producing recipes structures for creating workflow recipes for workflow
-  generation; and
-- Generating synthetic realistic workflow instances.
+provides methods for analyzing instances, deriving recipes, and generating
+representative synthetic workflow instances.
 
 .. figure:: images/wfcommons.png
     :scale: 90 %
@@ -46,6 +42,7 @@ support@wfcommons.org.
     introduction.rst
     parsing_logs.rst
     analyzing_instances.rst
+    generating_workflows_recipe.rst
     generating_workflows.rst
 
 .. toctree::

docs/source/introduction.rst

Lines changed: 43 additions & 36 deletions
@@ -1,60 +1,67 @@
 The WfCommons Project
 =======================
 
-The `WfCommons project <https://wfcommons.org>`_ is a community framework
+The `WfCommons project <https://wfcommons.org>`_ is an open source framework
 for enabling scientific workflow research and development by providing foundational
 tools for analyzing workflow execution instances, and generating synthetic, yet
 realistic, workflow instances that can be used to develop new techniques, algorithms
 and systems that can overcome the challenges of efficient and robust execution of
-ever larger workflows on increasingly complex distributed infrastructures. The
-figure below shows an overview of the workflow research life cycle process that
-integrates the three axis of the WfCommons project:
+ever larger workflows on increasingly complex distributed infrastructures.
+
+The figure below shows an overview of the research and development life cycle that
+integrates the four major components WfCommons: (i) workflow execution instances
+(**WfInstances**), (ii) workflow recipes (**WfChef**), (iii) workflow generator
+(**WfGen**), and (iv) workflow simulator (**WfSim**).
 
 .. figure:: images/wfcommons.png
     :align: center
 
     The WfCommons conceptual architecture.
 
-The *first axis* (**Workflow Instances**) of the WfCommons project targets the
-collection and curation of open access production workflow executions from
-various scientific applications shared in a common instance format (i.e.,
-:ref:`json-format-label`). We keep a `list of workflow execution instances
+**WfInstances.**
+The WfInstances component provides a collection and curation of open-access
+production workflow instances from various scientific applications, all made
+available using a common format (i.e., :ref:`json-format-label`).
+A workflow instance is built based on logs of an actual execution of a scientific
+workflow on a distributed platform (e.g., clouds, grids, clusters) using a
+workflow system. We keep a `list of workflow execution instances
 <https://wfcommons.org/instances>`_ in our project website.
 
-The *second axis* (**Workflow Generator**) of the WfCommons project targets
-the generation of realistic synthetic workflow instances based on workflow execution
-profiles extracted from execution instances. We are constantly seeking for additional
-workflow execution instances for refining or developing new workflow recipes for
-the WfCommons's workflow generator.
-
-The *third axis* (**Workflow Simulator**) of the WfCommons project fosters the
-use of simulation for the development, evaluation, and verification of scheduling
-and resource provisioning algorithms (e.g., multi-objective function optimization,
-etc.), evaluation of current and emerging computing platforms (e.g., clouds, IoT,
-extreme scale, etc.), among others. We keep a `list of open source workflow
-management systems simulators and simulation frameworks
-<https://wfcommons.org/simulation>`_ that provide support for the WfCommons
-JSON format in our project website.
+**WfChef.**
+The WfChef component automates the construction of synthetic workflow generators
+(recipes) for any given workflow application. The input to this component is a set
+of real workflow instances described in the :ref:`json-format-label` (e.g.,
+instances available in WfInstances).
 
-This Python package provides a collection of tools for:
+**WfGen.**
+The WfGen component targets the generation of realistic synthetic workflow instances.
+WfGen takes as input a workflow recipe produced by WfChef for a particular application
+and a desired number of tasks. WfGen then automatically generates synthetic, yet
+realistic, randomized workflow instances with (approximately) the desired number of
+tasks.
 
-- Analyzing instances of actual workflow executions;
-- Producing recipes structures for creating workflow recipes for workflow
-  generation; and
-- Generating synthetic realistic workflow instances.
+**WfSim.**
+The WfCommons project fosters the use of simulation for the development, evaluation,
+and verification of scheduling and resource provisioning algorithms (e.g.,
+multi-objective function optimization, etc.), evaluation of current and emerging
+computing platforms (e.g., clouds, IoT, extreme scale, etc.), among others.
+We do not develop simulators as part of the WfCommons project. Instead, the WfSim
+component catalogs open source WMS simulators that provide support for
+:ref:`json-format-label`. We keep a `list of open source workflow
+management systems simulators and simulation frameworks
+<https://wfcommons.org/simulation>`_ on our project website.
 
 .. _json-format-label:
 
-The WfCommons JSON Format
----------------------------
+WfFormat
+--------
 
 The WfCommons project uses a common format for representing workflow execution
-instances and generated synthetic workflows instances, so that workflow simulators and
-simulation frameworks (that provide support for WfCommons format) can use
-such instances interchangeably. This common format uses a JSON specification
-available in the
-`WfCommons JSON schema GitHub <https://github.com/wfcommons/workflow-schema>`_
+instances and generated synthetic workflows instances. Workflow simulators and
+simulation frameworks that support WfFormat can then use both types of instances
+interchangeably. WfFormat uses a JSON specification available in the
+`WfFormat Schema GitHub <https://github.com/wfcommons/workflow-schema>`_
 repository. The current version of the WfCommons Python package uses the schema
 version :code:`1.1`. The schema GitHub repository provides detailed explanation
-of the WfCommons JSON format (including required fields), and also a validator
-script for verifying the compatibility of instances.
+of WfFormat (including required fields), and also a validator script for verifying
+the compatibility of instances.
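
The WfFormat text above mentions required fields and schema version ``1.1`` but does not list the fields. As a loose sketch of what a pre-check before running the official validator might look like, the snippet below tests a parsed JSON document against a caller-supplied list of keys. The key names ``name``, ``schemaVersion`` and ``workflow`` are assumptions made for illustration and are not taken from this commit; the schema repository is the authoritative source. The helper ``missing_fields`` is hypothetical as well.

```python
import json
from typing import Any, Dict, Iterable, List


def missing_fields(instance: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required top-level keys that are absent from the instance."""
    return [field for field in required if field not in instance]


# Assumed field names, for illustration only; consult the schema repository.
ASSUMED_REQUIRED = ("name", "schemaVersion", "workflow")

# Hypothetical minimal document standing in for an instance file.
raw = '{"name": "example", "schemaVersion": "1.1", "workflow": {}}'
instance = json.loads(raw)

problems = missing_fields(instance, ASSUMED_REQUIRED)
version_ok = instance.get("schemaVersion") == "1.1"
print(problems, version_ok)
```

A check like this only catches missing top-level keys; the validator script in the schema repository remains the way to verify compatibility.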

docs/source/parsing_logs.rst

Lines changed: 6 additions & 4 deletions
@@ -3,9 +3,11 @@
 Parsing Workflow Execution Logs
 ===============================
 
-The most common way for obtaining instances from actual workflow executions is to parse
-execution logs. As part of the WfCommons project, we are constantly developing
-parsers for commonly used workflow management systems.
+The most common way for obtaining **workflow instances** from actual workflow
+executions is to parse execution logs. As part of the WfCommons project, we
+are constantly developing parsers for commonly used workflow management systems.
+The parsers provided in this Python package automatically scans execution logs
+to produce instances using :ref:`json-format-label`.
 
 Each parser class is derived from the abstract
 :class:`~wfcommons.trace.logs.abstract_logs_parser.LogsParser` class. Thus, each
@@ -18,7 +20,7 @@ Makeflow
 
 `Makeflow <http://ccl.cse.nd.edu/software/makeflow/>`_ is a workflow system for
 executing large complex workflows on clusters, clouds, and grids. The Makeflow
-language is similar to traditional Make, so if you can write a Makefile, then you
+language is similar to traditional "Make", so if you can write a Makefile, then you
 can write a Makeflow. A workflow can be just a few commands chained together, or
 it can be a complex application consisting of thousands of tasks. It can have an
 arbitrary DAG structure and is not limited to specific patterns. Makeflow is used

docs/source/user_api_generator.rst

Lines changed: 0 additions & 72 deletions
@@ -13,75 +13,3 @@ wfcommons.generator.generator
     :members:
     :undoc-members:
     :show-inheritance:
-
-wfcommons.generator.workflow.blast\_recipe
---------------------------------------------
-
-.. automodule:: wfcommons.generator.workflow.blast_recipe
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-wfcommons.generator.workflow.bwa\_recipe
-------------------------------------------
-
-.. automodule:: wfcommons.generator.workflow.bwa_recipe
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-wfcommons.generator.workflow.cycles\_recipe
----------------------------------------------
-
-.. automodule:: wfcommons.generator.workflow.cycles_recipe
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-wfcommons.generator.workflow.epigenomics\_recipe
---------------------------------------------------
-
-.. automodule:: wfcommons.generator.workflow.epigenomics_recipe
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-wfcommons.generator.workflow.genome\_recipe
----------------------------------------------
-
-.. automodule:: wfcommons.generator.workflow.genome_recipe
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-wfcommons.generator.workflow.montage\_recipe
-----------------------------------------------
-
-.. automodule:: wfcommons.generator.workflow.montage_recipe
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-wfcommons.generator.workflow.seismology\_recipe
--------------------------------------------------
-
-.. automodule:: wfcommons.generator.workflow.seismology_recipe
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-wfcommons.generator.workflow.soykb\_recipe
---------------------------------------------
-
-.. automodule:: wfcommons.generator.workflow.soykb_recipe
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-wfcommons.generator.workflow.srasearch\_recipe
------------------------------------------------
-
-.. automodule:: wfcommons.generator.workflow.srasearch_recipe
-    :members:
-    :undoc-members:
-    :show-inheritance:

requirements.txt

Lines changed: 2 additions & 0 deletions
@@ -8,3 +8,5 @@ requests>=2.24.0
 scipy>=1.5.2
 setuptools>=49.3.1
 pyyaml>=5.3.1
+pandas>=1.2.4
+stringcase>=1.2.0
