Commit d292f85

Merge pull request #62 from plessbd/xdmod-tutorial-updates
updates to XDMoD Tutorial
2 parents a35ff8f + f502735 commit d292f85

6 files changed

Lines changed: 226 additions & 23 deletions

xdmod/README.md

Lines changed: 226 additions & 23 deletions
@@ -5,7 +5,13 @@ The base component of Open XDMoD uses the job accounting logs from the HPC
55
resource manager as the data source. We are also going to install the optional Job Performance Module. This
66
allows Open XDMoD to also display performance data for HPC jobs.
77

8+
The asciinema recordings are not meant to be used on their own; they are intended for use in a "live" demonstration.
9+
Command-line demos with a light theme are meant to be watched; those with a dark theme are interactive.
10+
11+
`VIM` is used to edit files in this tutorial. If you prefer a different editor, please install it on the xdmod container.
12+
813
## Submit some jobs to the cluster
14+
915
Before we install and configure XDMoD we are going to submit
1016
some HPC jobs to the cluster. These jobs will run while we go through
1117
the install and then we will be able to view the job information
@@ -20,7 +26,7 @@ Run the provided script that submits several jobs to the cluster. These jobs
2026
run as multiple different users with different job sizes and durations. The
2127
purpose of this is to generate data to display in Open XDMoD. This, of course,
2228
would not be required on a production deployment. This script should be run
23-
as the hpcadmin user as it uses `sudo` to submit jobs as different cluster:
29+
as the hpcadmin user as it uses `sudo` to submit jobs as different cluster
2430
users.
2531
```bash
2632
submit_jobs.sh
@@ -38,13 +44,14 @@ The Open XDMoD software is installed via RPMs. The majority of the software depe
3844
are automatically installed via RPM. However, the `phantomjs` software
3945
that Open XDMoD uses for its image export must be installed separately.
4046

41-
Open XDMoD provides an interactive configuration script that performs the
42-
database initialization and generates configuration files. This script
43-
handles the basic setup.
44-
45-
The `hpc-toolset-tutorial/xdmod/install.sh` script contains the step-by-step
47+
The [`hpc-toolset-tutorial/xdmod/install.sh`](https://github.com/ubccr/hpc-toolset-tutorial/blob/master/xdmod/install.sh) script contains the step-by-step
4648
instructions to install the packages.
4749

50+
Reference: [RPM Installation Guide](https://open.xdmod.org/install-rpm.html)
51+
52+
Package Installation:
53+
[![asciicast](https://asciinema.org/a/349235.svg)](https://asciinema.org/a/349235)
54+
4855
## Open XDMoD Configuration
4956

5057
### Prerequisites
@@ -62,39 +69,235 @@ The following information is needed by Open XDMoD:
6269
Optionally:
6370

6471
- An image file containing the HPC center logo
72+
- The width of the HPC center logo
6573

6674
Also the following technical information:
6775

6876
- The public url of Open XDMoD
6977
- Paths to installed dependencies (phantomjs)
7078
- MySQL connection information
79+
- Host
80+
- Port
81+
- Admin Username
82+
- Admin Password
83+
- DB Username
84+
- DB Password
85+
86+
If you are installing the Job Performance module (as we are in this tutorial):
87+
- MongoDB connection information
88+
89+
### Prerequisites used in this Tutorial
90+
91+
- Name of the organization: `Tutorial`, abbreviation: `hpcts`
92+
- Information for each HPC resource
93+
- Name: `hpc`
94+
- Number of compute nodes: `2`
95+
- Number of cores: `2`
96+
- Timezone: `UTC`
97+
- Whether it runs shared jobs: `no`
98+
- An image file containing the HPC center logo: `/srv/xdmod/small-logo.png`
99+
- The width of the HPC center logo: `354`
100+
- The public url of Open XDMoD: `https://localhost:4443`
101+
- Paths to installed dependencies (phantomjs): `detected defaults`
102+
- MySQL connection information
103+
- Host: `mysql`
104+
- Port: `3306`
105+
- Admin Username: `root`
106+
- Admin Password: (leave blank)
107+
- DB Username: `xdmodapp`
108+
- DB Password: `ofbatgorWep0`
109+
- MongoDB connection information: `mongodb://xdmod:xsZ0LpZstneBpijLy7@mongodb:27017/supremm?authSource=admin`
110+
111+
### Basic Configuration
112+
Open XDMoD provides an interactive configuration script that performs the
113+
database initialization and generates configuration files. This script
114+
handles the basic setup.
115+
116+
The [`hpc-toolset-tutorial/xdmod/entrypoint.sh`](https://github.com/ubccr/hpc-toolset-tutorial/blob/master/xdmod/entrypoint.sh) script automates this process.
117+
118+
Reference: [Configuration Guide](https://open.xdmod.org/configuration.html)
119+
120+
The following asciinema recordings show how an administrator would perform these actions:
121+
122+
General Setup:
123+
[![asciicast](https://asciinema.org/a/349236.svg)](https://asciinema.org/a/349236)
124+
125+
Database Setup:
126+
[![asciicast](https://asciinema.org/a/349237.svg)](https://asciinema.org/a/349237)
127+
128+
Organization Setup:
129+
[![asciicast](https://asciinema.org/a/349238.svg)](https://asciinema.org/a/349238)
130+
131+
Resource Setup:
132+
[![asciicast](https://asciinema.org/a/349240.svg)](https://asciinema.org/a/349240)
71133

134+
#### Advanced configuration
135+
136+
The `xdmod-setup` script is used for the basic setup of Open XDMoD. The script includes options to configure the Open XDMoD database, set up the admin user account, and configure resources.
137+
Open XDMoD's [Configuration](https://open.xdmod.org/configuration.html#location-of-configuration-files) files can be modified directly when more advanced customization is needed.
138+
139+
*Have a heterogeneous cluster?* You could modify `/etc/xdmod/resource_specs.json` and set the PPN to the average number of processors per node.
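
As a rough illustration of the averaging suggested above, the sketch below computes a node-weighted average of processors per node. The inventory values are made up for the example and do not describe this tutorial's cluster, and rounding to the nearest whole number is an assumption rather than an Open XDMoD requirement.

```bash
#!/usr/bin/env bash
# Hypothetical inventory: "<node count> <cores per node>" per line.
# These numbers are illustrative only.
inventory='8 16
4 32
2 64'

# Weighted average: total cores divided by total nodes,
# rounded to the nearest whole number (an assumption).
average_ppn() {
  awk '{ nodes += $1; cores += $1 * $2 }
       END { if (nodes > 0) printf "%d\n", cores / nodes + 0.5 }'
}

avg=$(printf '%s\n' "$inventory" | average_ppn)
echo "average PPN: $avg"
```

The resulting number is only a candidate value; it would still need to be entered by hand in `/etc/xdmod/resource_specs.json`.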
140+
141+
#### Hierarchy
142+
143+
Open XDMoD supports a three-level hierarchy.
144+
In this tutorial we use a hierarchy configuration that is typical of the organizational structure in a University.
145+
146+
Decanal Unit -> Department -> PI Group
147+
148+
Reference: [Hierarchy Guide](https://open.xdmod.org/hierarchy.html)
72149

73150
## Open XDMoD Job Performance
151+
74152
The Job Performance module is optional, but highly recommended.
75-
The Job Performance
76-
TODO: PCP Configuration (mention alternatives, TACC,...)
77-
https://github.com/ubccr/hpc-toolset-tutorial/blob/master/slurm/pmlogger-supremm.config#L56-L59
78153

79-
Done: Names.csv
154+
![Job Performance Dataflow](./tutorial-screenshots/admin-job-performance-dataflow.png)
155+
156+
### Job Performance Configuration
157+
158+
[Job Performance](https://supremm.xdmod.org) data: for the open source release we aim to provide support for [Performance Co-Pilot (PCP)](https://pcp.io).
159+
We chose PCP because it is included by default in CentOS / Red Hat.
160+
In XSEDE we use tacc_stats and PCP (depending on the resource provider). We have also used LDMS and Cray RUR, and are aware of groups using Ganglia.
161+
162+
PCP has been [installed](https://github.com/ubccr/hpc-toolset-tutorial/blob/master/slurm/install.sh#L80-L87) and configured on the compute nodes.
163+
This tutorial uses a cut-down list of PCP metrics from the recommended metrics for a production HPC system.
164+
This shorter list is suitable for running inside the docker demo. On a
165+
real HPC system the data collection should be set up following the
166+
[PCP Data collection](https://supremm.xdmod.org/supremm-compute-pcp.html#configuration-templates) guide.
167+
168+
The file used in this demo can be viewed here: https://github.com/ubccr/hpc-toolset-tutorial/blob/master/slurm/pmlogger-supremm.config#L56-L59
169+
170+
VERY IMPORTANT - Don't start the configuration of the Job Performance module until there is job data ingested into Open XDMoD.
171+
The Job performance setup relies on the accounting data from the Jobs realm in Open XDMoD.
172+
The initial ingestion was done during the setup of this tutorial and will be done again later in the tutorial.
80173

81-
TODO: FOS
174+
Job Performance XDMoD Module Setup:
175+
[![asciicast](https://asciinema.org/a/349241.svg)](https://asciinema.org/a/349241)
176+
177+
Job summarization (SUPReMM) configuration:
178+
[![asciicast](https://asciinema.org/a/349243.svg)](https://asciinema.org/a/349243)
82179

83180
## Open XDMoD Operation
84-
TODO: SHRED INGEST AGGREGATE
85-
TODO: SUPREMM SHRED INGEST
86-
## It is Known
181+
182+
### Shredding, Ingestion & Aggregation
183+
184+
Shredding
185+
> Load logs from a scheduler (SLURM in this tutorial) and put them into the Open XDMoD databases.
186+
> see [Open XDMoD](https://open.xdmod.org/) for notes on SGE/Grid Engine, Univa Grid Engine, PBS/TORQUE, LSF
187+
> Reference: [Shredder Guide](https://open.xdmod.org/shredder.html)
188+
189+
Ingestion
190+
> Prepare data that has already been loaded by the shredder into the Open XDMoD databases so that it can be queried by the Open XDMoD portal.
191+
> Reference: [Ingestor Guide](https://open.xdmod.org/ingestor.html)
192+
193+
Aggregation
194+
> The step that actually gets data into the Open XDMoD portal. For core Open XDMoD this is part of ingestion; the Job Performance module has a separate script.
195+
196+
This tutorial provides a script [`shred-ingest-aggregate-all.sh`](https://github.com/ubccr/hpc-toolset-tutorial/blob/master/xdmod/scripts/shred-ingest-aggregate-all.sh)
197+
that runs all of these steps. In a typical setup this would run as a cron job at a time best suited to the HPC system.
198+
199+
Run this now on the `xdmod` container:
200+
201+
Log in to the frontend via SSH as user `hpcadmin` with password `ilovelinux`:
202+
203+
```bash
204+
ssh -p6222 hpcadmin@localhost
205+
```
206+
SSH to the xdmod container:
207+
208+
```bash
209+
ssh xdmod
210+
```
211+
Run the script as the xdmod user:
212+
213+
```bash
214+
sudo -u xdmod /srv/xdmod/scripts/shred-ingest-aggregate-all.sh
215+
```
216+
This is going to produce A LOT of output. Each of these commands has flags that will turn this off. For the purpose of this tutorial they have not been silenced.
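
If the volume of output is distracting, one option is to capture it in a log file instead of silencing it. The helper below is a minimal sketch: `run_logged` is a hypothetical function name, not part of Open XDMoD, and the demonstration uses a harmless `echo` in place of the real script.

```bash
#!/usr/bin/env bash
# run_logged is a hypothetical helper: it runs a command, appends its
# stdout and stderr to a log file, and returns the command's exit status.
run_logged() {
  local log_file=$1
  shift
  "$@" >> "$log_file" 2>&1
}

# Demonstration with a harmless command. On the xdmod container the
# command could instead be:
#   sudo -u xdmod /srv/xdmod/scripts/shred-ingest-aggregate-all.sh
log_file=$(mktemp)
if run_logged "$log_file" echo "shred, ingest, aggregate"; then
  echo "finished; output saved to $log_file"
fi
```

Because the helper returns the wrapped command's exit status, a failure in the script can still be detected even though nothing is printed to the terminal.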
217+
218+
[![asciicast](https://asciinema.org/a/349242.svg)](https://asciinema.org/a/349242)
219+
220+
#### It is Known
87221
- `[WARNING] ... RuntimeWarning: invalid value encountered in double_scalars`
88-
- https://stackoverflow.com/questions/27784528/numpy-division-with-runtimewarning-invalid-value-encountered-in-double-scalars/27784588#27784588
222+
- https://stackoverflow.com/questions/27784528/numpy-division-with-runtimewarning-invalid-value-encountered-in-double-scalars/27784588#27784588
89223
- `[WARNING] Autoperiod library not found, TimeseriesPatterns plugins will not do period analysis`
90-
- The autoperiod code is used for detecting period I/O patterns in the parallel filesystem traffic. (not needed in the tutorial configuration)
91-
TODO: User Dashboard?
92-
93-
## Open XDMoD Functionality
94-
TODO: User
95-
TODO: PI
96-
TODO: Center
97-
TODO: Basic Admin
224+
- The autoperiod code is used for detecting periodic I/O patterns in the parallel filesystem traffic. (not needed in the tutorial configuration)
225+
226+
227+
## User / PI Names
228+
229+
The resource manager logs contain the system usernames of the users that submitted jobs.
230+
To display the full names in Open XDMoD you must provide a data file that contains the
231+
full name of each user for each system username. This file is in a `csv` format.
232+
233+
![Group By User (names not imported)](./tutorial-screenshots/usernames.png)
234+
235+
This has not been automated for this tutorial. We don't want you to fall asleep!
236+
237+
Log in to the frontend via SSH as user `hpcadmin` with password `ilovelinux`:
238+
```bash
239+
ssh -p6222 hpcadmin@localhost
240+
```
241+
242+
Create a file with the contents below:
243+
The file must be readable by the `xdmod` user; for this demo it will be
244+
created in `/var/tmp`.
245+
246+
```bash
247+
vim /var/tmp/names.csv
248+
```
249+
250+
The first column should include the user name or group name used by your resource manager; the second column is the user’s first name, and the third column is the user’s last name.
251+
(Feel free to change the First and Last names)
252+
253+
```csv
254+
cgray,Carl,Gray
255+
sfoster,Stephanie,Foster
256+
csimmons,Charles,Simmons
257+
astewart,Andrea,Stewart
258+
hpcadmin,HPC,Administrators
259+
```
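
Before importing, it can help to confirm that every row has exactly three fields and that no field is empty or padded with whitespace. The check below is a minimal sketch using `awk`; `check_names_csv` is a hypothetical helper name and is not an Open XDMoD command. The demonstration writes a throwaway file rather than touching `/var/tmp/names.csv`.

```bash
#!/usr/bin/env bash
# check_names_csv is a hypothetical helper, not an Open XDMoD command.
# It reports rows that do not have exactly three comma-separated fields,
# or whose fields are empty or have leading/trailing whitespace.
check_names_csv() {
  awk -F',' '
    NF != 3 { printf "line %d: expected 3 fields, found %d\n", NR, NF; bad = 1; next }
    {
      for (i = 1; i <= NF; i++)
        if ($i == "" || $i ~ /^[ \t]|[ \t]$/) {
          printf "line %d: field %d is empty or padded\n", NR, i
          bad = 1
        }
    }
    END { exit bad }
  ' "$1"
}

# Example run against a throwaway file; for the tutorial the path
# would be /var/tmp/names.csv.
sample=$(mktemp)
printf 'cgray,Carl,Gray\nsfoster,Stephanie,Foster\n' > "$sample"
if check_names_csv "$sample"; then
  echo "names file looks well formed"
fi
rm -f "$sample"
```

The function exits non-zero when any row is flagged, so it can be used as a guard in front of the import step.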
260+
261+
Now import this file into Open XDMoD with the [`xdmod-import-csv`](https://open.xdmod.org/commands.html#xdmod-import-csv) command:
262+
263+
```bash
264+
sudo -u xdmod xdmod-import-csv -t names -i /var/tmp/names.csv
265+
```
266+
267+
Then we need to re-ingest and aggregate the data:
268+
269+
```bash
270+
sudo -u xdmod /srv/xdmod/scripts/shred-ingest-aggregate-all.sh
271+
```
272+
![Group By User](./tutorial-screenshots/fullnames.png)
273+
274+
Reference: [User/PI Names Guide](https://open.xdmod.org/user-names.html)
275+
276+
xdmod-import-csv -t names:
277+
[![asciicast](https://asciinema.org/a/349325.svg)](https://asciinema.org/a/349325)
278+
279+
## Open XDMoD Functionality (Interactive Demo)
280+
281+
282+
### Administration
283+
284+
You can tell that the user is an admin by the presence of the "Admin Dashboard":
285+
286+
![Admin User](./tutorial-screenshots/admin-user.png)
287+
288+
Admin Dashboard:
289+
290+
![Admin Dashboard](./tutorial-screenshots/admin-dashboard.png)
291+
292+
### End User
293+
294+
Let's actually use Open XDMoD now.
295+
296+
User:
297+
298+
PI:
299+
300+
Center: Staff
98301

99302
## Tutorial Navigation
100303
[Next - OnDemand](../ondemand/README.md)