Commit 4cddfbd

Merge pull request #30 from WorkflowConversion/cwl_support
Added CWL Support
2 parents 7d1d48c + 0478aa9 commit 4cddfbd

15 files changed

Lines changed: 1153 additions & 909 deletions


README.md

Lines changed: 160 additions & 13 deletions
@@ -1,37 +1,184 @@
11
# CTDConverter
2-
32
Given one or more CTD files, `CTDConverter` generates the wrappers needed to include them in workflow engines, such as Galaxy and CWL.
43

54
## Dependencies
5+
`CTDConverter` has the following python dependencies:
66

7-
`CTDConverter` relies on [CTDopts]. The dependencies of each of the converters are as follows:
7+
- [CTDopts]
8+
- `lxml`
9+
- `ruamel.yaml`
810

9-
### Galaxy Converter
11+
### Installing Dependencies
12+
We recommend using `conda` to manage all dependencies. If you're not sure what `conda` is, make sure to read the [conda documentation][using-conda].
1013

11-
- Generation of Galaxy ToolConfig files relies on `lxml` to generate nice-looking XML files.
14+
The easiest way to get started with CTD conversion is to create a `conda` environment in which you'll install all dependencies. Environments in `conda` let you keep parallel, independent Python environments, thus avoiding conflicts between libraries. If you haven't installed `conda` yet, check [conda's installation guide][conda-install].
1215

13-
## Installing Dependencies
14-
You can install the [CTDopts] and `lxml` modules via `conda`, like so:
16+
Once you've installed `conda`, create an environment named `ctd-converter`, like so:
1517

1618
```sh
17-
$ conda install lxml
18-
$ conda install -c workflowconversion ctdopts
19+
$ conda create --name ctd-converter
1920
```
2021

21-
Note that the [CTDopts] module is available on the `workflowconversion` channel.
22+
You will now need to *activate* the environment by executing the following command:
2223

23-
Of course, you can just download [CTDopts] and make it available through your `PYTHONPATH` environment variable. To get more information about how to install python modules, visit: https://docs.python.org/2/install/.
24+
```sh
25+
$ source activate ctd-converter
26+
```
2427

28+
Install the required dependencies as follows (the order of execution **is** important, due to transitive dependencies):
2529

26-
## How to install CTDConverter
30+
```sh
31+
$ conda install --channel workflowconversion ctdopts
32+
$ conda install lxml
33+
$ conda install --channel conda-forge ruamel.yaml
34+
$ conda install libxml2=2.9.2
35+
```
36+
37+
`lxml` depends on `libxml2`. When you install `lxml`, you'll get the latest version of `libxml2` (2.9.4) by default. You would usually want the latest version; however, this version of `libxml2` has a bug that breaks validation of XML files against a schema.
38+
39+
If you require validation of input CTDs against a schema (which we recommend), you will need to downgrade `libxml2` to the latest version known to work, namely 2.9.2, which is what the last `conda install` command above does.
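To check which `libxml2` your `lxml` installation actually uses before relying on schema validation, a minimal sketch could look like the following. It assumes, as stated above, that 2.9.4 is the only problematic release; `lxml.etree.LIBXML_VERSION` is the runtime version tuple reported by `lxml`, while `libxml2_supports_validation` and `KNOWN_BROKEN_LIBXML2` are hypothetical names.

```python
# Hypothetical helper: flag libxml2 releases known (per this README) to break
# XML schema validation. Only 2.9.4 is listed; extend the set if needed.
KNOWN_BROKEN_LIBXML2 = {(2, 9, 4)}


def libxml2_supports_validation(version):
    """Return True unless `version` (a tuple such as (2, 9, 2)) is known to be broken."""
    return tuple(version[:3]) not in KNOWN_BROKEN_LIBXML2


if __name__ == "__main__":
    try:
        from lxml import etree
    except ImportError:
        print("lxml is not installed; install it with conda first")
    else:
        version = etree.LIBXML_VERSION  # libxml2 version lxml runs against
        status = "OK" if libxml2_supports_validation(version) else "known to break schema validation"
        print("libxml2 %s: %s" % (".".join(map(str, version)), status))
```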
2740

28-
1. Download the source code from https://github.com/genericworkflownodes/CTDConverter.
41+
You could also download the dependencies manually and make them available through your `PYTHONPATH` environment variable, if you're into that. For more information about installing Python modules without `conda`, visit: https://docs.python.org/2/install/.
42+
43+
## How to install `CTDConverter`
44+
`CTDConverter` is not a Python module but a series of scripts, so installing it is as easy as downloading the source code from https://github.com/genericworkflownodes/CTDConverter. Once you've installed all dependencies, downloaded `CTDConverter` and activated your `conda` environment, you're good to go.
2945

3046
## Usage
47+
The first thing you need to tell `CTDConverter` is the output format of the converted wrappers. `CTDConverter` supports conversion of CTDs into Galaxy and CWL. Invoking it is as simple as:
48+
49+
    $ python convert.py [FORMAT] [ADDITIONAL_PARAMETERS ...]
50+
51+
Here `[FORMAT]` can be any of the supported formats (i.e., `cwl`, `galaxy`). `CTDConverter` offers a series of format-specific scripts, which we've designed to behave *somewhat* similarly. All converter scripts share the same core functionality: reading CTD files, parsing them using [CTDopts], validating them against a schema, etc. Each converter script may also add functionality specific to its target engine; for instance, only the Galaxy converter script supports generation of a `tool_conf.xml` file.
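The first positional argument selects the format-specific converter, and everything after it is handed over to that converter. A minimal sketch of such a dispatch step, assuming a hypothetical `split_format` helper and a registry that records the preferred extension this README gives for each format (it is not `convert.py`'s actual code):

```python
# Hypothetical registry of supported formats and their preferred output
# extensions (Galaxy prefers "xml", CWL prefers "cwl").
PREFERRED_EXTENSIONS = {"galaxy": "xml", "cwl": "cwl"}


def split_format(argv):
    """Split `convert.py FORMAT [ADDITIONAL_PARAMETERS ...]` into the format and its parameters."""
    if not argv or argv[0] not in PREFERRED_EXTENSIONS:
        supported = ", ".join(sorted(PREFERRED_EXTENSIONS))
        # SystemExit with a message prints it to stderr and exits with status 1.
        raise SystemExit("usage: convert.py FORMAT [ADDITIONAL_PARAMETERS ...] (FORMAT: %s)" % supported)
    return argv[0], argv[1:]
```

A script would call it as `fmt, parameters = split_format(sys.argv[1:])` and then pass `parameters` to the converter registered for `fmt`.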
3152

32-
Check the detailed documentation for each of the converters:
53+
The following sections in this file describe the parameters that all converter scripts share.
54+
55+
Please refer to the detailed documentation for each of the converters for more information:
3356

3457
- [Generation of Galaxy ToolConfig files](galaxy/README.md)
58+
- [Generation of CWL task files](cwl/README.md)
59+
60+
## Fail Policy When Processing Several Files
61+
`CTDConverter` can parse and convert several CTDs in a single run. However, processing stops at the first error encountered (e.g., a CTD is not valid, there are missing support files, etc.) and an error code is returned.
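This fail-fast policy amounts to a loop that returns a non-zero exit code as soon as one conversion fails. A minimal sketch, in which `convert_all` and the `convert_one` callable are hypothetical stand-ins for the converter's internals:

```python
import sys


def convert_all(ctd_paths, convert_one):
    """Convert each CTD in order and stop at the first failure.

    `convert_one` is a hypothetical callable that raises an exception when a
    CTD cannot be converted. Returns a process exit code: 0 on success, 1 otherwise.
    """
    for path in ctd_paths:
        try:
            convert_one(path)
        except Exception as exc:  # any failure aborts the whole run
            sys.stderr.write("ERROR: could not convert %s: %s\n" % (path, exc))
            return 1
    return 0
```

A script would typically end with `sys.exit(convert_all(paths, convert_one))`, so that the calling shell sees the error code. CTDs listed after the failing one are never attempted.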
62+
63+
## Converting a single CTD
64+
In its simplest form, the converter takes an input CTD file and generates an output file. The following usage of `CTDConverter`:
65+
66+
    $ python convert.py [FORMAT] -i /data/sample_input.ctd -o /data/sample_output.xml
67+
68+
will parse `/data/sample_input.ctd` and write the converted wrapper to `/data/sample_output.xml`. The generated file can then be added to your workflow engine as usual.
69+
70+
## Converting several CTDs
71+
When converting several CTDs, the expected value for the `-o`/`--output-destination` parameter is a folder. For example:
72+
73+
    $ python convert.py [FORMAT] -i /data/ctds/one.ctd /data/ctds/two.ctd -o /data/converted-files
74+
75+
will convert `/data/ctds/one.ctd` into `/data/converted-files/one.[EXT]` and `/data/ctds/two.ctd` into `/data/converted-files/two.[EXT]`. Each converter has a preferred extension, shown here as a placeholder (`[EXT]`). Galaxy prefers `xml`, while CWL prefers `cwl`.
76+
77+
You can also use wildcards, which most shells expand automatically:
78+
79+
    $ python convert.py [FORMAT] -i /data/ctds/*.ctd -o /data/converted-files
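The `*.ctd` pattern is expanded by the shell before `convert.py` starts, so the script simply receives a list of paths. Shells that do not expand wildcards (for example, Windows `cmd.exe`) pass the pattern through unchanged. A minimal sketch of performing the expansion in Python with the standard `glob` module, where `expand_inputs` is a hypothetical helper rather than part of `CTDConverter`:

```python
import glob


def expand_inputs(args):
    """Expand wildcard patterns that the shell left untouched; keep plain paths as-is."""
    paths = []
    for arg in args:
        if any(char in arg for char in "*?["):
            matches = sorted(glob.glob(arg))
            # A pattern with no matches is kept, so a later step can report the missing file.
            paths.extend(matches or [arg])
        else:
            paths.append(arg)
    return paths
```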
80+
81+
## Common Parameters
82+
### Input File(s)
83+
* Purpose: Provide input CTD file(s) to convert.
84+
* Short/long version: `-i` / `--input`
85+
* Required: yes.
86+
* Taken values: a list of input CTD files.
87+
88+
Examples:
89+
90+
Any of the following invocations will convert `/data/input_one.ctd` and `/data/input_two.ctd`:
91+
92+
    $ python convert.py [FORMAT] -i /data/input_one.ctd -i /data/input_two.ctd -o /data/generated
93+
    $ python convert.py [FORMAT] -i /data/input_one.ctd /data/input_two.ctd -o /data/generated
94+
    $ python convert.py [FORMAT] --input /data/input_one.ctd /data/input_two.ctd -o /data/generated
95+
    $ python convert.py [FORMAT] --input /data/input_one.ctd --input /data/input_two.ctd -o /data/generated
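Accepting both repeated flags and several values per flag is what Python's `argparse` provides when `action="append"` is combined with `nargs="+"`. The sketch below is hypothetical (it is not `CTDConverter`'s own parser and leaves out the `[FORMAT]` argument), but it shows how all of the invocations above can end up as the same flat list of inputs:

```python
import argparse


def parse_inputs(argv):
    """Return (input_paths, output_destination); hypothetical parser, not convert.py's."""
    parser = argparse.ArgumentParser(prog="convert.py")
    # action="append" collects one list per -i/--input occurrence;
    # nargs="+" lets each occurrence take one or more paths.
    parser.add_argument("-i", "--input", action="append", nargs="+", required=True)
    parser.add_argument("-o", "--output-destination", required=True)
    args = parser.parse_args(argv)
    # Flatten [[a], [b]] or [[a, b]] into [a, b].
    inputs = [path for group in args.input for path in group]
    return inputs, args.output_destination
```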
96+
97+
The following invocation will convert `/data/input.ctd` into `/data/output.xml`:
98+
99+
    $ python convert.py [FORMAT] -i /data/input.ctd -o /data/output.xml
100+
101+
Of course, you can also use wildcards, which your shell expands automatically. This is useful if you want to convert several files at once. Let's assume that the folder `/data/ctds` contains three files: `input_one.ctd`, `input_two.ctd` and `input_three.ctd`. The following two invocations will produce the same output in the `/data/wrappers` folder:
102+
103+
    $ python convert.py [FORMAT] -i /data/ctds/input_one.ctd /data/ctds/input_two.ctd /data/ctds/input_three.ctd -o /data/wrappers
104+
    $ python convert.py [FORMAT] -i /data/ctds/*.ctd -o /data/wrappers
105+
106+
### Output Destination
107+
* Purpose: Provide output destination for the converted wrapper files.
108+
* Short/long version: `-o` / `--output-destination`
109+
* Required: yes.
110+
* Taken values: if a single input file is given, then a single output file is expected. If multiple input files are given, then an existing folder, in which all converted CTDs will be written, is expected.
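These rules can be summarized in a few lines of Python. This is a sketch only: `resolve_output_paths` is a hypothetical helper, and it assumes that the tool names and the preferred extension are already known.

```python
import os


def resolve_output_paths(tool_names, output_destination, extension):
    """Map each tool name to an output path following the -o rules above.

    One input: `output_destination` is the output file itself.
    Several inputs: it must be an existing folder.
    """
    if len(tool_names) == 1:
        return [output_destination]
    if not os.path.isdir(output_destination):
        raise ValueError("%s must be an existing folder when converting several CTDs" % output_destination)
    return [os.path.join(output_destination, "%s.%s" % (name, extension)) for name in tool_names]
```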
111+
112+
Examples:
113+
114+
A single input is given, and the output will be generated into `/data/output.xml`:
115+
116+
    $ python convert.py [FORMAT] -i /data/input.ctd -o /data/output.xml
117+
118+
Several inputs are given. The output is the already existing folder `/data/wrappers`, and at the end of the operation, the files `/data/wrappers/input_one.[EXT]` and `/data/wrappers/input_two.[EXT]` will be generated:
119+
120+
    $ python convert.py [FORMAT] -i /data/ctds/input_one.ctd /data/ctds/input_two.ctd -o /data/wrappers
121+
122+
Please note that the output file name is **not** taken from the name of the input file, but from the name of the tool, that is, from the `name` attribute of the `<tool>` element in the corresponding CTD. By convention, the name of the CTD file and the name of the tool match.
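Because the output name comes from the CTD contents rather than from the input path, a converter has to read the `<tool>` element first. A minimal sketch using Python's standard `xml.etree.ElementTree` (the helper names `tool_name_from_ctd` and `output_file_name` are hypothetical; the actual converters parse CTDs through [CTDopts]):

```python
import xml.etree.ElementTree as ElementTree


def tool_name_from_ctd(ctd_text):
    """Return the `name` attribute of the root <tool> element of a CTD document."""
    root = ElementTree.fromstring(ctd_text)
    # Ignore a namespace prefix such as "{http://...}tool", if one is present.
    local_tag = root.tag.rsplit("}", 1)[-1]
    name = root.get("name")
    if local_tag != "tool" or not name:
        raise ValueError('expected a CTD whose root element is <tool name="...">')
    return name


def output_file_name(ctd_text, extension):
    """Build the output file name from the tool name, not from the CTD file name."""
    return "%s.%s" % (tool_name_from_ctd(ctd_text), extension)
```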
123+
124+
### Blacklisting Parameters
125+
* Purpose: Some parameters present in the CTD should not be exposed in the output files. Think of parameters such as `--help` or `--debug`, which might not make much sense to expose to end users of a workflow management system.
126+
* Short/long version: `-b` / `--blacklist-parameters`
127+
* Required: no.
128+
* Taken values: A list of parameters to be blacklisted.
129+
130+
Example:
131+
132+
    $ python convert.py [FORMAT] ... -b h help quiet
133+
134+
In this case, `CTDConverter` will not process any of the parameters named `h`, `help`, or `quiet`, that is, they will not appear in the generated output files.
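Conceptually, blacklisting is a name-based filter applied before any output is written. A minimal sketch, assuming the parsed parameters are available as hypothetical `(name, value)` pairs (the real converters work on [CTDopts] models, whose structure is not shown here):

```python
def remove_blacklisted(parameters, blacklist):
    """Drop parameters whose name appears in `blacklist`, preserving their order.

    `parameters` is a hypothetical list of (name, value) pairs standing in for
    the parameters parsed from a CTD.
    """
    blocked = set(blacklist)  # set lookup keeps the filter linear in len(parameters)
    return [(name, value) for name, value in parameters if name not in blocked]
```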
135+
136+
### Schema Validation
137+
* Purpose: Provide validation of input CTDs against a schema file (i.e., an XSD file).
138+
* Short/long version: `-V` / `--validation-schema`
139+
* Required: no.
140+
* Taken values: location of the schema file (e.g., CTD.xsd).
141+
142+
CTDs can be validated against a schema. The master version of the schema can be found in the [CTDSchema] repository.
143+
144+
If a schema is provided, all input CTDs will be validated against it.
145+
146+
**NOTE:** Please make sure to read the [notes on `libxml2` in the section on installing dependencies](#installing-dependencies) if you require validation of CTDs against a schema.
147+
148+
### Hardcoding Parameters
149+
* Purpose: Fix the value of a parameter and hide it from the end user.
150+
* Short/long version: `-p` / `--hardcoded-parameters`
151+
* Required: no.
152+
* Taken values: The path of a file containing the mapping between parameter names and hardcoded values to use.
153+
154+
It is sometimes required that parameters are hidden from the end user in workflow systems and that they take a predetermined, fixed value. Allowing end users to control parameters such as `--verbosity`, `--threads`, etc., might create more problems than it solves. For this purpose, the parameter `-p`/`--hardcoded-parameters` takes the path of a file containing up to three whitespace-separated columns that map parameter names to hardcoded values. The first column contains the name of the parameter and the second one the hardcoded value. Only the first two columns are mandatory.
155+
156+
If the parameter is to be hardcoded only for certain tools, a third column can be added, containing a comma-separated list of the tool names to which the hardcoding applies.
157+
158+
Lines starting with `#` will be ignored. The following is an example of a valid file:
159+
160+
    # Parameter name # Value # Tool(s)
161+
    threads 8
162+
    mode quiet
163+
    xtandem_executable xtandem XTandemAdapter
164+
    verbosity high Foo, Bar
165+
166+
The parameters `threads` and `mode` will be set to `8` and `quiet`, respectively, for all parsed CTDs. However, the `xtandem_executable` parameter will be set to `xtandem` only for the `XTandemAdapter` tool. Similarly, the parameter `verbosity` will be set to `high` for the `Foo` and `Bar` tools only.
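A minimal parser for this file format, written against the rules above (whitespace-separated columns, an optional comma-separated tool list, `#` comments). It is a sketch, not `CTDConverter`'s parser: the helper names are hypothetical, values are kept as strings, and the precedence of a tool-specific entry over a global one for the same parameter is an assumption. Splitting each line into at most three fields keeps `Foo, Bar` together in the third column.

```python
def parse_hardcoded_parameters(lines):
    """Parse hardcoded-parameter lines into {name: [(value, tools), ...]}.

    `tools` is None when the value applies to every tool, otherwise the list of
    tool names taken from the optional comma-separated third column.
    """
    hardcoded = {}
    for raw_line in lines:
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # maxsplit=2 keeps "Foo, Bar" together as the third column.
        fields = line.split(None, 2)
        if len(fields) < 2:
            raise ValueError("expected at least a name and a value in %r" % raw_line)
        name, value = fields[0], fields[1]
        tools = None
        if len(fields) == 3:
            tools = [tool.strip() for tool in fields[2].split(",") if tool.strip()]
        hardcoded.setdefault(name, []).append((value, tools))
    return hardcoded


def hardcoded_value(hardcoded, parameter, tool):
    """Return the hardcoded value of `parameter` for `tool`, or None if it is not hardcoded."""
    fallback = None
    for value, tools in hardcoded.get(parameter, []):
        if tools is None:
            if fallback is None:
                fallback = value  # global entry, used unless a tool-specific one matches
        elif tool in tools:
            return value  # assumption: tool-specific entries take precedence
    return fallback
```

Since the parser accepts any iterable of lines, an open file can be passed directly: `with open(path) as handle: hardcoded = parse_hardcoded_parameters(handle)`.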
167+
168+
### Providing a Default Executable Path
169+
* Purpose: Help workflow engines locate tools by providing a path.
170+
* Short/long version: `-x` / `--default-executable-path`
171+
* Required: no.
172+
* Taken values: The default executable path of the tools in the target workflow engine.
173+
174+
CTDs can contain an `<executablePath>` element that will be used when executing the tool binary. If this element is missing, the value provided by this parameter will be used as a prefix when building the appropriate sections in the output files.
175+
176+
The following invocation of the converter will use `/opt/suite/bin` as a prefix when providing the executable path in the output files for any input CTD that lacks the `<executablePath>` element:
35177

178+
    $ python convert.py [FORMAT] -x /opt/suite/bin ...
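The executable lookup can be sketched with Python's standard `xml.etree.ElementTree`. The sketch follows the prefix rule described above, plus two assumptions that this README does not spell out: the executable file name comes from an `<executableName>` element when present (otherwise from the tool's `name` attribute), and the CTD uses no XML namespace. `executable_for` is a hypothetical helper.

```python
import posixpath
import xml.etree.ElementTree as ElementTree


def executable_for(ctd_text, default_executable_path=None):
    """Return the command a workflow engine would run for this CTD (sketch)."""
    root = ElementTree.fromstring(ctd_text)
    # Assumption: <executableName> overrides the tool name when present.
    name = (root.findtext("executableName") or root.get("name") or "").strip()
    # Prefer the CTD's own <executablePath>; otherwise fall back to the -x value.
    prefix = (root.findtext("executablePath") or "").strip() or default_executable_path
    # Workflow engines usually run on POSIX systems, hence posixpath rather than os.path.
    return posixpath.join(prefix, name) if prefix else name
```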
179+
36180

37181
[CTDopts]: https://github.com/genericworkflownodes/CTDopts
182+
[CTDSchema]: https://github.com/WorkflowConversion/CTDSchema
183+
[conda-install]: https://conda.io/docs/install/quick.html
184+
[using-conda]: https://conda.io/docs/using/envs.html

common/__init__.py

Whitespace-only changes.

common/exceptions.py

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
#!/usr/bin/env python
# encoding: utf-8

"""
@author: delagarza
"""

from CTDopts.CTDopts import ModelError


class CLIError(Exception):
    # Generic exception to raise and log different fatal errors.
    def __init__(self, msg):
        super(CLIError, self).__init__(msg)
        self.msg = "E: %s" % msg

    def __str__(self):
        return self.msg

    def __unicode__(self):
        return self.msg


class InvalidModelException(ModelError):
    def __init__(self, message):
        super(InvalidModelException, self).__init__()
        self.message = message

    def __str__(self):
        return self.message

    def __repr__(self):
        return self.message


class ApplicationException(Exception):
    def __init__(self, msg):
        super(ApplicationException, self).__init__(msg)
        self.msg = msg

    def __str__(self):
        return self.msg

    def __unicode__(self):
        return self.msg

common/logger.py

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
#!/usr/bin/env python
# encoding: utf-8
import sys

MESSAGE_INDENTATION_INCREMENT = 2


def _get_indented_text(text, indentation_level):
    return ("%(indentation)s%(text)s" %
            {"indentation": " " * (MESSAGE_INDENTATION_INCREMENT * indentation_level),
             "text": text})


def warning(warning_text, indentation_level=0):
    sys.stdout.write(_get_indented_text("WARNING: %s\n" % warning_text, indentation_level))


def error(error_text, indentation_level=0):
    sys.stderr.write(_get_indented_text("ERROR: %s\n" % error_text, indentation_level))


def info(info_text, indentation_level=0):
    sys.stdout.write(_get_indented_text("INFO: %s\n" % info_text, indentation_level))
