Commit d0fc313

[docs] Document AliECS workflow/task template language

1 parent ddfd32d commit d0fc313

1 file changed: docs/handbook/configuration.md
Lines changed: 356 additions & 5 deletions
@@ -1,10 +1,159 @@
# Workflow Configuration

## The AliECS workflow template language

The AliECS workflow template language is a YAML-based language that allows
the user to define the structure of a data taking and processing activity.
The language is used to define the tasks that are part of the activity,
the data flow relationships between them, the behaviour of integrated services
with respect to the data taking state machine, and the conditions that
trigger these integrated service actions.
The language is designed to be human-readable and easy to understand,
while still being powerful enough to express complex workflows.

To instantiate a data taking activity, or environment, two kinds of files
are needed:

* workflow templates
* task templates

Both kinds of files can be placed in a git repository (by convention,
they must be placed in their own directories, named `workflows` and
`tasks` respectively), and AliECS can pull them directly from there.
This allows for version control of and collaboration on the workflow
and task definitions.

See [the ControlWorkflows repository](https://github.com/AliceO2Group/ControlWorkflows/)
for examples of workflow and task templates and their structure.
Also see [its README](https://github.com/AliceO2Group/ControlWorkflows/blob/master/README.md)
for information on specific variables and their meaning, as well as for
the DPL subworkflow loading system.

## Workflow template structure

A workflow template is a YAML file that contains a tree structure whose nodes
are called roles. This structure can be deeply nested, and each role can have
a set of variables that define its behaviour and that of its child roles.

The root of the tree is the root role, which is the top-level role in the
workflow template.

All roles have a mandatory `name` attribute. The root role also has a
`description`.

There are five kinds of roles in a workflow template:

- Task roles: These roles represent tasks that are part of the workflow.
  They must contain a `task` attribute.
- Call roles: These roles represent calls to integrated services.
  They must contain a `call` attribute.
- Aggregator roles: These roles represent aggregations of other roles.
  They must contain a `roles` attribute.
- Iterator roles: These roles expand into multiple instances based on
  an iterator expression.
  They must contain a `for` attribute, as well as a `roles` attribute.
  Additionally, their `name` must be parametrized with the iterator variable
  specified in the `for` block.
- Include roles: These roles include another workflow template as a subtree.
  They must contain an `include` attribute.

Task, call and include roles may only appear as leaves in the tree,
while aggregator and iterator roles may not be leaves, and instead act as
containers of child roles.
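
Putting these together, a minimal workflow template skeleton could look like the following sketch (the role, template and variable names are illustrative, not taken from the ControlWorkflows repository):

```yaml
name: example-workflow
description: "Illustrative minimal workflow"
defaults:
  example_var: 'default-value'
roles:
  - name: flp-tasks          # aggregator role: contains child roles
    roles:
      - name: "mytask"       # task role: a leaf referencing a task template
        task:
          load: mytask-template
```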

All roles may have an `enabled` attribute, which is a boolean that
determines whether the role is enabled or not. If a role is not enabled,
it is excluded from the workflow along with its children.

All roles may also have `defaults` and `vars` attributes. Both `defaults`
and `vars` are key-value maps. The `defaults` map is used to set default
values, and values in `vars` override any `defaults` with the same key.
Values set in `defaults` also act as defaults for child roles, and
values set in `vars` also act as vars for child roles.
User-provided parameters further override anything set in `defaults` or
`vars`.
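
The override chain can be sketched as follows (role, template and variable names are hypothetical):

```yaml
- name: "parent"
  defaults:
    log_level: info     # weakest: default value
  vars:
    log_level: debug    # overrides the default with the same key
  roles:
    - name: "child"
      # the child inherits log_level == "debug"; a user-provided
      # parameter named log_level would override both maps
      task:
        load: some-task
```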

All roles may have one or more `constraints` expressions, which restrict
the deployment of the role (or its child roles) to nodes that satisfy the
constraints. The constraints are specified as a list of expressions that
evaluate to true or false. The expressions are evaluated against the Mesos
attributes set on the nodes in the cluster.

### Task roles

Task roles represent tasks that are part of the workflow. They must contain
a `task` attribute, which contains a key that maps to a task template (i.e.
a distinct YAML file that defines how to run that specific task).

There are two kinds of task roles: data flow task roles and hook task roles.

Data flow task roles represent tasks that are part of the data flow of the
workflow. They usually contain attributes such as `bind` and `connect` that
define the data flow relationships between tasks. Besides `load`, which
references a task template, and `critical`, which determines whether the
task is critical to the data taking activity, they do not contain other
attributes under the `task` key.

```yaml
- name: "stfb"
  enabled: "{{stfb_standalone == 'false'}}"
  vars:
    dd_discovery_stfb_id: stfb-{{ it }}-{{ uid.New() }}
  connect:
    - name: readout
      type: pull
      target: "{{ Up(2).Path }}.readout:readout"
      rateLogging: "{{ fmq_rate_logging }}"
  bind:
    - name: dpl-chan
      type: push
      rateLogging: "{{ fmq_rate_logging }}"
      transport: shmem
      addressing: ipc
      sndBufSize: "4"
      global: "readout-proxy-{{ it }}"
  task:
    load: stfbuilder
```

Hook task roles represent tasks that are not part of the data flow, but
instead are called at specific points in the environment state machine.
They have a well-defined moment when they must start and finish (with
respect to the environment state machine), and they are generally not
long-running tasks. Like data flow task roles, they may be `critical`.
They do not contain `bind` or `connect` attributes, and their `task`
attribute, besides `load`, contains additional attributes that define
the timing of the task. `trigger` is the moment when the task must start,
and `timeout` is the maximum time the task is allowed to run. Optionally,
`await` may be specified in addition to `trigger`, in which case the task
must finish by `await`. If `await` is not specified, it defaults to the
value of `trigger`, i.e. the task must start and finish within the same
state machine moment.

For more information on the values of `trigger` and `await`, see below.

```yaml
- name: fairmq-shmcleanup
  enabled: "{{fmq_initial_shm_cleanup_enabled == 'true'}}"
  vars:
    shell_command: "source /etc/profile.d/o2.sh && O2_PARTITION={{environment_id}} O2_ROLE={{it}} o2-aliecs-shmcleaner"
    user: root
  task:
    load: "shell-command"
    trigger: before_DEPLOY
    timeout: "{{ fmq_initial_shm_cleanup_timeout }}"
    critical: false
```

#### Non-critical tasks

Any task in a workflow can be declared as non-critical. A non-critical
task is a task that doesn't trigger a global environment ERROR in case
of failure. The state of a non-critical task doesn't affect the
environment state in any way.

To declare a task as non-critical, a line has to be added to the task
role block within a workflow template file. Specifically, in the `task`
section of such a task role (usually after the `load` statement), the
line to add is `critical: false`, as in the following example:

```yaml
roles:
@@ -18,7 +167,7 @@ roles:

In the absence of an explicit `critical` trait for a given task role, the assumed default value is `critical: true`.

#### State machine callback moments

The underlying state machine library allows us to add callbacks upon entering and leaving states, as well as before and after events (transitions).
This is the order of callback execution upon a state transition:
@@ -33,7 +182,33 @@ This is the order of callback execution upon a state transition:

Callback execution is further refined with integer indexes, with the syntax `±index`, e.g. `before_CONFIGURE+2`, `enter_CONFIGURED-666`. An expression with no index is assumed to be indexed `+0`. These indexes do not correspond to timestamps; they are discrete labels that allow more granularity in callbacks, ensuring a strict ordering of callback opportunities within a given callback moment. Thus, `before_CONFIGURE+2` will complete execution strictly after `before_CONFIGURE` runs, but strictly before `enter_CONFIGURED-666` is executed.

### Call roles

Call roles represent calls to integrated services. They must contain a `call`
attribute, which contains a key that maps to an integration plugin function,
i.e. an API call that is made to an integrated service.

The `call` map must contain a `func` key, which references the function to be
called. The functions available depend on which integration plugins are
loaded into the AliECS instance.
Like hook task roles, call roles have a well-defined moment when they must start
and finish (with respect to the environment state machine), and they are generally
not long-running operations.

```yaml
- name: "reset"
  call:
    func: odc.Reset()
    trigger: before_RESET
    await: after_RESET
    timeout: "{{ odc_reset_timeout }}"
    critical: true
```

See [readout-dataflow](https://github.com/AliceO2Group/ControlWorkflows/blob/master/workflows/readout-dataflow.yaml)
for examples of call roles that reference a variety of integration plugins.

#### Workflow hook call structure

The state machine callback moments are exposed to the AliECS workflow template interface and can be used as triggers or synchronization points for integration plugin function calls. The `call` block can be used for this purpose, with syntax similar to the `task` block used for controllable tasks. Its fields are as follows.
* `func` - mandatory, it parses as an [`antonmedv/expr`](https://github.com/antonmedv/expr) expression that corresponds to a call to a function that belongs to an integration plugin object (e.g. `bookkeeping.StartOfRun()`, `dcs.EndOfRun()`, etc.).
@@ -62,8 +237,184 @@ Consider the following example:
    critical: true
```

### Aggregator roles

Aggregator roles represent aggregations of other roles. They must contain a
`roles` attribute, which is a list of child roles.

For the purposes of the state machine, they represent their children, and
any `defaults` or `vars` set on an aggregator role are passed down to its
children (which may in turn override them).

```yaml
- name: "readout"
  vars:
    readout_var: 'this value will be overridden by the 1st child role'
  roles:
    - name: "readout"
      vars:
        readout_var: 'var-value'
      task:
        load: readout
    - name: "stfb"
      vars:
        stfb_var: 'var-value'
      task:
        load: stfbuilder
```

### Iterator roles

Iterator roles expand into multiple instances based on an iterator expression.
They must contain a `for` attribute, which is an expression that evaluates to
a list of values. The `name` attribute must be parametrized with the iterator
variable specified in the `for` block.

```yaml
- name: host-{{ it }}
  for:
    range: "{{ hosts }}"
    var: it
  constraints:
    - attribute: machine_id
      value: "{{ it }}"
  roles:
    - name: "readout"
      task:
        load: readout
```

### Include roles

Include roles include another workflow template as a subtree. They must contain
an `include` attribute, which is the path to the workflow template file to
include.

```yaml
- name: dpl
  enabled: "{{ qcdd_enabled == 'true' }}"
  include: qc-daq
```

### Template expressions

The AliECS workflow template language supports expressions of the form
`{{ expression }}`. These expressions are evaluated by the AliECS core
when the workflow is instantiated, and the result is used in place of the
expression.

See [`antonmedv/expr`](https://github.com/antonmedv/expr) for the full
documentation on the expression syntax.

AliECS extends the syntax with
additional functions and variables that are available in the context of
the workflow template evaluation.
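
For instance (the variable names below are illustrative), an expression can combine variables, function calls and the ternary operator supported by `expr`:

```yaml
vars:
  # generate a unique per-instance identifier
  my_task_id: "task-{{ uid.New() }}"
  # choose a value based on a condition
  buffer_size: "{{ high_rate == 'true' ? '1024' : '256' }}"
```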

#### Configuration access functions

* `config.Get(path string) string` - Returns the template-processed configuration payload at the given Apricot path.
* `config.Resolve(component string, runType string, roleName string, entryKey string) string` - Returns the resolved path to a configuration entry for the given component, run type, role name, and entry key.
* `config.ResolvePath(path string) string` - Returns the resolved path to a configuration entry for the given path.
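
As an illustrative sketch of `config.Get` (the Apricot path below is hypothetical):

```yaml
vars:
  # fetch a template-processed configuration payload into a variable
  readout_payload: "{{ config.Get('o2/components/readout/ANY/any/readout') }}"
```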

#### Inventory access functions

* `inventory.DetectorForHost(hostname string) string` - Returns the detector name for the specified host.
* `inventory.DetectorsForHosts(hosts string) string` - Returns a JSON-format list of detector names for the specified list of hosts (also expected to be in JSON format).
* `inventory.CRUCardsForHost(hostname string) string` - Returns a JSON-format list of CRUs for the specified host.
* `inventory.EndpointsForCRUCard(hostname string, cardSerial string) string` - Returns a JSON-format list of endpoints for the specified CRU card.

#### Runtime KV map access functions

* `runtime.Get(component string, key string) string` - Returns from Apricot the value of the given key in the runtime KV map of the specified component.
* `runtime.Set(component string, key string, value string) string` - Sets in Apricot the value of the given key in the runtime KV map of the specified component.
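
A hedged usage sketch (the component and key names are hypothetical):

```yaml
vars:
  # read a value previously stored in the runtime KV map
  last_known_state: "{{ runtime.Get('mycomponent', 'last_state') }}"
```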

#### DPL subworkflow just-in-time generator functions

* `dpl.Generate`
* `dpl.GenerateFromUri`
* `dpl.GenerateFromUriOrFallbackToTemplate`

#### String functions

* `strings.Atoi`, `strings.Itoa`, `strings.TrimQuotes`, `strings.TrimSpace`, `strings.ToUpper`, `strings.ToLower` - See the [Go strings package](https://golang.org/pkg/strings/) for more information.
* `strings.IsTruthy(in string) bool` - Used in condition evaluation. Returns `true` if the string is one of `"true"`, `"yes"`, `"y"`, `"1"`, `"on"`, `"ok"`, otherwise `false`.
* `strings.IsFalsy(in string) bool` - Used in condition evaluation. Returns `true` if the string is empty, or one of `"false"`, `"no"`, `"n"`, `"0"`, `"off"`, `"none"`, otherwise `false`.
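
For example (the variable and template names are hypothetical), `strings.IsTruthy` lets an `enabled` condition accept any truthy spelling rather than comparing against a single literal:

```yaml
- name: "monitoring"
  enabled: "{{ strings.IsTruthy(monitoring_enabled) }}"
  task:
    load: monitoring-task
```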

#### JSON manipulation functions

* `json.Unmarshal(in string) object` (with alias `json.Deserialize`) - Unmarshals a JSON string into an object.
* `json.Marshal(in object) string` (with alias `json.Serialize`) - Marshals an object into a JSON string.

#### UID generation function

* `uid.New() string` - Returns a new unique identifier string, in the same format as AliECS environment IDs.

#### General utility functions

* `util.PrefixedOverride(varname string, prefix string) string` - Looks in the current variable stack for a variable with key `varname`, as well as for a variable with key `prefix_varname`. If the latter exists, its value is returned, otherwise the value of `varname` is returned as a fallback. If neither exists, `""` is returned. Note that this function may return either the empty string or other falsy values such as `"none"`, so `strings.IsFalsy` should be used to check the output if it is used in a condition.
* `util.Dump(in string, filepath string) string` - Dumps the input string to a file at the specified path. Returns the string itself.
* `util.SuffixInRange(input string, prefix string, idMinStr string, idMaxStr string) string`
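
An illustrative (hypothetical) use of `util.PrefixedOverride`, allowing a detector-prefixed variable to override a generic one:

```yaml
vars:
  # returns the value of its_readout_rate if set, otherwise readout_rate, otherwise ""
  effective_readout_rate: "{{ util.PrefixedOverride('readout_rate', 'its') }}"
```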

# Task Configuration

## Task template structure

A task template is a YAML file that describes the configuration of a task,
down to the command line arguments and environment variables that are passed
to the task on startup.

These parameters and variables can be static, or they can be dynamic, pulled
from the GUI, from the AliECS `vars` and `defaults` defined in the workflow template,
or from the O² Configuration defaults (in order of importance, from less to more
"defaulty").

A task template must contain a `name` attribute, which is the name of the task
that is then referenced by a task role in a workflow template.

Task templates can define non-data flow tasks, in which case they only specify
the command to run (for the most part), or they can define data flow tasks, in which
case they also specify the available inbound connections with a `bind` statement.

Data flow tasks can also specify additional parameters in a `properties` map,
which are set during the `CONFIGURE` transition (via the FairMQ plugin interface
or via the OCC library, depending on the task control mechanism).

```yaml
name: readout
defaults:
  readout_cfg_uri: "consul-ini://{{ consul_endpoint }}/o2/components/readout/ANY/any/readout-standalone-{{ task_hostname }}"
  user: flp
  log_task_stdout: none
  log_task_stderr: none
  _module_cmdline: >-
    source /etc/profile.d/modules.sh && MODULEPATH={{ modulepath }} module load Readout Control-OCCPlugin &&
    o2-readout-exe
  _plain_cmdline: "{{ o2_install_path }}/bin/o2-readout-exe"
control:
  mode: direct
wants:
  cpu: 0.15
  memory: 128
bind:
  - name: readout
    type: push
    rateLogging: "{{ fmq_rate_logging }}"
    addressing: ipc
    transport: shmem
properties: {}
command:
  stdout: "{{ log_task_stdout }}"
  stderr: "{{ log_task_stderr }}"
  shell: true
  env:
    - O2_DETECTOR={{ detector }}
    - O2_PARTITION={{ environment_id }}
  user: "{{ user }}"
  arguments:
    - "{{ readout_cfg_uri }}"
  value: "{{ len(modulepath)>0 ? _module_cmdline : _plain_cmdline }}"
```

## Variables pushed to controlled tasks

FairMQ and non-FairMQ tasks may receive configuration values from a variety of sources, both from their own user code (for example by querying Apricot with or without the O² Configuration library) as well as via AliECS.
