
Commit 25434cb

Updated level of blocks. Added info on recommended workflow to intro episode. Added XrootD extras link
1 parent d4d13ae commit 25434cb

4 files changed

Lines changed: 20 additions & 17 deletions


_episodes/01-introduction.md

Lines changed: 9 additions & 6 deletions
@@ -14,7 +14,7 @@ keypoints:
 - "Rucio is the primary way to browse and access simulated EIC/ePIC data"
 ---

-## Simulation Campaigns
+# Simulation Campaigns

 Simulations of a range of physics processes in the ePIC detector are typically run on a monthly basis by the Production Working Group. Information on simulation campaigns can be found on the [Production Working Group pages](https://eic.github.io/epic-prod/). This includes details of files produced in previous campaigns.

@@ -32,7 +32,7 @@ These are linked to specific software releases following the same format.

 Various types of files are produced as part of the simulation campaign as we will discuss in the next section. The files you may wish to access will differ depending upon your use case. In this tutorial, we will explore a few different common use cases and the types of files you may want in each.

-### Submitting a New Simulation Request
+## Submitting a New Simulation Request

 If you would like to submit a new request to a future campaign for a dataset that is not in production, please follow the following process:

@@ -42,7 +42,7 @@ If you would like to submit a new request to a future campaign for a dataset tha
 3. Once your input files are ready, submit a [simulation request form](https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.google.com_forms_d_e_1FAIpQLScDqiEaHayAcwBDGAWa4W6k-2D6yUFzS-2DXiWuhLolpy64mLk5FA_viewform&d=DwMFAg&c=CJqEzB1piLOyyvZjb8YUQw&r=1bclzxVlhTV419LkWWxwLTl3ztSqyuA_Q_Vnypx1RD4&m=q9b8IbAHm_MLsvy4XkI2Px2QKzFNqjpf0qc4nctB9ZHyf-uL5bZuiegs5-hwb-Ec&s=TFdsmJL2wPtUD-CCXVdkWIF5lxB1QYbz5MKGhB6nroA&e=).
 - If your input is not pre-processed following the [pre-processing guidelines](https://github.com/eic/epic-prod/blob/main/docs/_documentation/input_preprocessing.md), it will not be simulated. Please review these carefully.

-## Simulation Files Organisation
+# Simulation Files Organisation

 Within a simulation campaign, there are three broad classes of files that are produce:
 - EVGEN: The input hepmc3 datasets
@@ -54,13 +54,16 @@ Within a simulation campaign, there are three broad classes of files that are pr

 Most users and use cases will interact with RECO files, the output of the full simulation and reconstruction chain. We will explore some use cases and how to find the relevant files in each case.

-## How can I Browse the Simulation Campaign Output and Access Files?
+# How can I Browse the Simulation Campaign Output and Access Files?

 To browse the campaign output and find the files we want, we can use [Rucio](https://rucio.cern.ch/). *Rucio* is an open source scientific data management system. It is utilised in other large physics experiments such as ATLAS.

-### Wait, I read I should use XrootD to find and access files?
+## Wait, I read I should use XrootD to find and access files?

-You may find reference to or instructions on using Xrootd to browse and access files.These may still work and indeed, we will use some of these commands later in this tutorial. However, Rucio is now the preferred method for the cases we will examine.
+You may find reference to or instructions on using [XrootD]({{ page.root }}{% link _extras/xrootd.md %}) to browse and access files. These may still work and indeed, we will use some of these commands later in this tutorial. However, Rucio is now the preferred method for the cases we will examine. **The recommended workflow is now:**
+
+1. Find file location with Rucio
+2. Stream or download with XrootD

 Why? This change isn't just to make everybody learn something new, it is also a consequence of the expansion of the volume of ePIC data now available. Previously (before 2026), all simulated data was stored on Jefferson Lab servers. However, data is now spread between multiple sites. This makes finding an accessing it using XrootD more complicated. Rucio can deal with this "issue" in a straightforward way.

_episodes/02-rucio_usage.md

Lines changed: 6 additions & 6 deletions
@@ -15,7 +15,7 @@ keypoints:
 - "Once you find the file location with Rucio, you can use xrootd to download or stream it too"
 ---

-## Getting Started
+# Getting Started

 We can access and run the Rucio client from within eic-shell. From wherever you have eic-shell:

@@ -40,7 +40,7 @@ rucio -h

 To use Rucio further, we will need to briefly look at how Rucio organises data.

-## Datasets and DIDs
+# Datasets and DIDs

 Typically, we want to analyse data contained within specific files. Files can be grouped together into datasets which can themselves, be grouped into containers. All three refer to "data". As such, the term "data identifier` or **DID** is used in Rucio. A DID is just the name of a single file, dataset or container.

@@ -86,7 +86,7 @@ The `name` here - `/RECO/26.02.0/epic_craterlake/EXCLUSIVE/DEMP/DEMPgen-1.2.4/10

 Other names may not necessarily contain all of the same information, but as a bare minimum, are likely to tell us something about the physics process simulated and beam conditions, as well as which software release was used. This is reflected in the metadata tags assigned as we will see later.

-## Finding DIDs
+# Finding DIDs

 Now that we know what a DID looks like, how can we find the DID corresponding to the file or dataset that we're interested in?

@@ -182,7 +182,7 @@ The `root://dtn-eic.jlab.org` at the start of the output tells us that this part

 So, we can find DIDs, check what they are and what they contain. To get to this point though, we needed some pre-knowledge of what the DID looked like which isn't necessarily that helpful for finding something. However, a much easier approach to finding what we need is to use the metadata tags that are assigned all DIDs from March 2026 onwards.

-## Metadata Tags
+# Metadata Tags

 The following tags are available as of March 2026:

@@ -278,7 +278,7 @@ which will return only datasets with 10x250 collisions (10 GeV electrons on 250
 > **Hint** - Check the example name we looked at when introducing DIDs in a previous section.
 {: .challenge}

-## Using DIDs - Downloading or Processing Files
+# Using DIDs - Downloading or Processing Files

 So far we've seen how we can find DIDs and check some basic info such as what type of data they point to and where that data is stored. We generally want to do a bit more than that though. Typically we want to find data to *use* it in some way. For our simulation data, this is usually to analyse it!

@@ -336,7 +336,7 @@ file_path = "root://dtn-eic.jlab.org:1094//volatile/eic/EPIC//RECO/26.02.0/epic_
 file = ROOT.TFile.Open(file_path, "READ")
 ```

-### Testing File Streaming
+## Testing File Streaming

 We can quickly check the three methods above work.

_episodes/03-use_cases.md

Lines changed: 4 additions & 4 deletions
@@ -15,7 +15,7 @@ keypoints:

 In this episode, we will explore a few common use cases and how users may want to interact with simulation campaign output in each case. Examples of carrying out some common tasks associated with each use case will be included.

-## Physics Analyser - Novice
+# Physics Analyser - Novice

 This use case explores a user new to analysing ePIC data to try and look at a specific physics process. They will likely want to find and identify a specific physics process to pass through their analysis code. Their requirements are likely to include:

@@ -76,7 +76,7 @@ where `FILEPATH` is the path to one specific file from the output of one of the
 > - Download **one** file from this dataset of your choice
 {: .challenge}

-## Physics Analyser - Experienced
+# Physics Analyser - Experienced

 In this use case, we consider an experienced physics analyser that has a well developed analysis script that they want to run on a large number of files, possibly even a full dataset, for a specific physics process they're interested in. Their requirements are likely to include:

@@ -180,7 +180,7 @@ Note that we have restricted these examples to only print out the first five fil
 > 3. Stream **five** of the files in this dataset in a script, check the total number of events contained in all five files.
 {: .challenge}

-## Detector Designer/Optimiser, Algorithm/Reconstruction Development
+# Detector Designer/Optimiser, Algorithm/Reconstruction Development

 In this use case, someone updating the design of a detector in DD4HEP, or adjusting a reconstruction algorithm for a detector, may not want full reconstructed data. Instead, they may want more raw, hit level information. They may also want a specific detector configuration for comparison. In terms of physics process, they may not be looking at an actual reaction at all, but a particle gun simulation. To summarise, they may want:

@@ -212,7 +212,7 @@ Some tags they might use to find their data include:
 > - Do non-reconstructed files exist for this/these dataset(s)?
 {: .challenge}

-## Conclusion and Comments
+# Conclusion and Comments

 That wraps up our introduction to using Rucio and some example use cases and scenarios.

_extras/xrootd.md

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ In our earlier episode, we used this command to copy a file we found using Rucio

 It is also possible to open a file directly in ROOT if you have XrootD installed too. Note that the following command should be executed after opening root and `TFile::Open()` should be used:

-```bash
+```c++
 auto f = TFile::Open("root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/path-to-file")
 ```
