Commit 30e2e7e

2 parents 3d8294c + 6f668e2 commit 30e2e7e

15 files changed: 943 additions & 104 deletions

README.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -32,6 +32,13 @@ This tutorial will be presented at the following conferences:
 [PEARC20](https://pearc.acm.org/pearc20/)
 [Gateways 2020](https://sciencegateways.org/web/gateways2020)
 
+## Disclaimer
+
+DO NOT run this project on production systems. This project is for educational
+purposes only. The container images we publish for the tutorial are configured
+with hard coded insecure passwords and should be run locally in development for
+testing and learning only.
+
 ## License
 
 This tutorial is released under the GPLv3 license. See the LICENSE file.
```

coldfront/README.md

Lines changed: 75 additions & 57 deletions
```diff
@@ -6,37 +6,64 @@
 
 
 ## Login to ColdFront website
-- URL https://localhost:2443/
-- You'll need to login as some of the users for this tutorial to get things started:
+URL https://localhost:2443/
+You'll need to login as some of the users for this tutorial to get things started. Do NOT use the OpenID Connect login option at this point.
 - Login locally as username `hpcadmin` password: `ilovelinux`
 - Logout
 - Login locally as username `cgray` password: `test123`
 - Logout
 - Login locally as username `csimmons` password: `ilovelinux`
 - Login locally as username `sfoster` password: `ilovelinux`
 - Login locally as username `admin` password: `admin`
-- Go to Admin interface, Users
-- Click on the hpcadmin user
-- Make this user a `superuser` by checking the boxes next to `Staff Status` and `Superuser Status` - SAVE
-- Click on the sfoster account and check the box next to `Staff Status` Also under the `User Permissions` section add permissions for `allocation|allocation|Can view all allocations` and `project|project|Can view all projects` Make sure to SAVE the changes.
-- Click on the Home link to go to back to the Admin interface, then click "User Profiles"
-- Click on `cgray` check ``"Is pi"`` SAVE
-- Click on the Home link to go to back to the Admin interface, Click on Resources
-- Add a resource: `cluster, cluster name=hpc, description: anything you want, resource attribute: slurm_cluster=hpc` - click SAVE
-- Logout
+- Go to the Admin menu and click on `ColdFront Administration`. Once there, scroll halfway down to the `Authentication and Authorization` section, then click on the `Users` link.
+- Click on the hpcadmin user and scroll down to the `Permissions` section
+- Make this user a `superuser` by checking the boxes next to `Staff Status` and `Superuser Status` - scroll to the bottom and click SAVE
+- Click on the sfoster account and check the box next to `Staff Status`. Also, under the `User Permissions` section, add the following permissions to make this user the Center Director:
+`allocation|allocation|Can manage invoice`
+`allocation|allocation|Can view all allocations`
+`grant|grant|Can view all grants`
+`project|project|Can view all projects`
+`project|project|Can review pending project reviews`
+`publication|publication|Can view publication`
+Make sure to SAVE the changes.
+- Click on the Home link to go back to the Admin interface, scroll to the bottom of the page to the `User` section, and click `User Profiles`
+- Click on `cgray`, check `"Is pi"` - click SAVE
+
+Create a new resource:
+- Click on the Home link to go back to the Admin interface, scroll down near the bottom to the `Resource` section, click on `Resources`, then click the `Add Resource` button
+- Add a resource with the following settings:
+Resource type: select `cluster`
+Name: type `hpc`
+Description: enter anything you want
+Ensure that the following are checked: `Is available`, `Is public`, `Is allocatable`
+Under the resource attributes section, click `Add another Resource attribute` and select `slurm_cluster` from the drop-down menu. In the `value` field, enter `hpc` - then click SAVE
+- Logout
+
+Request an allocation for the new resource as the PI user:
 - Login as the PI using local account username: `cgray` password: `test123`
 - Create a new project, filling in the name, description, and selecting any field of science
 - Request an allocation for resource: hpc
 - Add a user to the project - search for `csimmons` and add to the HPC cluster allocation
-- Logout
+- Logout
+
+Activate and set up the new allocation:
 - Login using local account username: `hpcadmin` password: `ilovelinux`
-- Activate the allocation and set the appropriate allocation attributes:
-`slurm_account:cgray, slurm_specs:Fairshare=100, slurm_user_specs:Fairshare=parent`
+- Navigate to the `Admin` menu and click on `Allocation Requests`
+- Click on the `Detail` button to configure and activate the allocation:
+click the `Add Allocation Attribute` button and select these allocation attributes from the drop-down menu:
+`slurm_account_name` Enter: `cgray`
+`slurm_specs` Enter: `Fairshare=100:DefaultQOS=normal`
+`slurm_user_specs` Enter: `Fairshare=parent:DefaultQOS=normal`
+Set the status to `Active`, set the start date to today, and set the expiration date to the end of this month (you'll see why later)
+Click the `Update` button
 
 ## Login to OnDemand website
 - Login to Open OnDemand https://localhost:3443/ as username: `cgray` password: `test123`
-- Try to launch an interactive Job - you will get an error message that you do not have permission to run on the cluster
+- Click on the `Interactive Apps` menu and click on `HPC Desktop`
+- Try to launch an interactive job by clicking on the `Launch` button
+You will get an error message that you do not have permission to run on the cluster:
 `sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified`
+This is because we have not synced the allocation information in ColdFront with Slurm yet.
 
 
 ## Run Slurm plugin to sync active allocations from ColdFront to Slurm
@@ -74,8 +101,8 @@ NOTE: you should already be on the frontend but just in case you're not:
 `ssh -p 6222 hpcadmin@localhost`
 password: `ilovelinux`
 
-Check slurm associations for cgray again: they should now show access to the hpc cluster
-`sacctmgr show user cgray -s list`
+Check the slurm associations for the cgray account again: they should now show access to the hpc cluster
+`sacctmgr show account cgray -s list`
 `su - cgray`
 password: `test123`
 `sbatch --wrap "sleep 600"`
@@ -86,42 +113,49 @@ password: `test123`
 ## Login to OnDemand website
 - Login back into or refresh your login to Open OnDemand https://localhost:3443/ as username: `cgray` password: `test123`
 - Try to launch an interactive job again. Does it work this time?
-- Go to Active Jobs and click on your running jobs to see more details
+- Go to the `Jobs` menu, click `Active Jobs`, and click on your running jobs to see more details
 - Delete (cancel) the jobs so they show the `completed` status
 
 
+## Annual Project Review
+When the project review functionality is enabled (it is by default), a PI will be forced to review their project once every 365 days. To change this time frame, edit the default in `coldfront.env`. We can force a project to be under review in less than a year, which is what we'll do for the cgray project.
 
-## Login to Open XDMoD website
-- Login to Open XDMoD https://localhost:4443/
--- Click on `Sign In` at the top left
--- Under the section "Sign in with local XDMoD account:" Click on "Login Here" and enter username: `admin` password: `admin`
-- Notice there is currently no data in XDMoD
+Login as `hpcadmin` password `ilovelinux`
+Navigate to the `Admin` menu and click on the `ColdFront Administration` link. Scroll to the `Project` section and click on `Projects`. Then click on the project that we created earlier and check the box next to `Force Review`.
+NOTE: If there is a project you never want project reviews on, uncheck 'Requires review'
 
-![XDMoD login](../docs/xdmod_login.PNG)
+Logout as `hpcadmin`, then login as `cgray` password `test123` and notice the warning banner. Click on the allocation and try to renew it. You should see a warning banner telling you it can't be done because the project review is due. When a project review is required, a PI can't request new allocations or renew expiring allocations. They can, however, add/remove users, publications, grants, and research output. Click on the `renew now` link for the allocation to test this out.
 
-![XDMoD no data](../docs/xdmod_empty.PNG)
+Click the `Review Project` link. Provide a reason for not providing grant or publication information, check the box to acknowledge the update, and click the Submit button. Now try to renew the expiring allocation.
 
+Log out as `cgray`
 
-## Login to Open XDMoD container
-- `ssh hpcadmin@xdmod`
-password: `ilovelinux`
-- In order to see the job data just generated in slurm, we need to ingest the data into Open XDMoD and aggregate it. This is normally done once a day on a typical system but for the purposes of this demo, we have created a script that you can run now:
-`sudo -u xdmod /srv/xdmod/scripts/shred-ingest-aggregate-all.sh`
-`exit`
 
+## Allocation Change Requests
+Allocation change requests are turned on by default. This allows PIs to request date extensions for their allocations. The date ranges default to 30, 60, & 90 days but can be changed or disabled completely in `hpc-toolset-tutorial/coldfront/coldfront.env`.
+See https://coldfront.readthedocs.io/en/latest/config/#coldfront-core-settings for more information.
 
-**Note: More information about this script in the Open XDMoD portion of this tutorial**
+If you want PIs to be able to request changes to allocation attributes (i.e. storage quotas, unix group), this needs to be set on the allocation attribute. For this demo, we will allow the PI to request changes on the `slurm_account_name` attribute.
+- Login as `hpcadmin` password `ilovelinux`
+- Navigate to the `Admin` menu and click on the `ColdFront Administration` link. Under the `Allocation` section, click on `Allocation Attribute Types`. Click on `slurm_account_name`, check the box next to `Is changeable`, and then click the SAVE button. Logout.
+- Login as `cgray` password `test123`
+- Click on the allocation `RENEWAL REQUESTED` button or navigate to the Allocation Detail page through the project. Click on the `Request Change` button, select a date extension, enter a new slurm account, and provide a justification. Then click the `SUBMIT` button. Logout.
+- Login as `hpcadmin` password `ilovelinux`
+- Go to the `Admin` menu and click on `Allocation Change Requests`
+- As the admin you have the ability to approve the date extension, change it to another setting, or select `no extension`. You can remove the `slurm_account_name` request or change it. You can add notes for the PI and users on the allocation to see. Then you can take action such as `Approve` or `Deny` the request. For this demo, let's click the `Approve` button.
+- Next review the pending allocation requests. Navigate to the `Admin` menu and click on `Allocation Requests`. Note that the project review status is pending.
+- Click on the `Admin` menu and click on `Pending project reviews`.
 
-## Login to Open XDMoD website
-- Login to Open XDMoD https://localhost:4443/
--- Click on 'Sign In' at the top left
--- Under the section "Sign in with tutorial:" Click on "Login Here" and enter username: `cgray` password: `test123`
-- You should see the data from the job you just ran
-NOTE: There won't be much info except that we ran a few jobs. More will be presented in the XDMoD portion of the tutorial
+## Center Director Role and Permissions
+At the start of the tutorial we configured the user `sfoster` with the 'Staff Status' role and gave permissions to act as the Center Director. This allows `sfoster` to view all projects, allocations, publications, and grants. We've also given permission to view the pending project review list. Login as `sfoster` password `ilovelinux` to see what additional menus and functionality this account has access to.
 
-![XDMoD job data](../docs/xdmod_jobs.PNG)
+Navigate to the `Staff` menu and click on `Project Reviews`
+Click the `Email` button to see this functionality. Go back to `Project Reviews` and click `Mark Complete`.
 
-## Integrating OnDemand with ColdFront
+For more options on granting permissions for various types of staff access, see the ColdFront manual: https://coldfront.readthedocs.io/en/latest/manual/users/
+
+
+## Integrating OnDemand with ColdFront (Time Permitting)
 This is a very simple example of modifying the ColdFront configuration to use a plugin. This plugin allows us to provide a link to our OnDemand instance for any allocations for resources that have "OnDemand enabled"
 
 We have already added the OnDemand instance info to the ColdFront config. You can see this outside the containers in your git directory: See `hpc-toolset-tutorial/coldfront/coldfront.env`
@@ -130,24 +164,8 @@ Now let's enable OnDemand for our cluster resource:
 - Log back in to the ColdFront Administration site https://localhost:2443/admin/ as the `hpcadmin` account - password `ilovelinux`:
 - Navigate to the Resources section and click on the 'HPC' cluster resource. Add a new resource attribute: `OnDemand = "Yes"`
 - Log out and log in as the PI user `cgray` password `test123`
-- Notice on the ColdFront home page next to the allocation for the HPC cluster resource you see the OnDemand logo. Click on the Project name and see this logo also shows up next to the allocation. When we click on that logo, it directs us to the OnDemand instance.
-
-## Staff Role
-At the start of the tutorial we configured the user `sfoster` with the 'Staff Status' role and gave permissions to view all projects and all allocations. Login as `sfoster` password `ilovelinux` to see what additional menus and functionality this account has access to.
-
-
-## Annual Project Review (time permitting)
-When the project review functionality is enabled (it is by default) a PI will be forced to review their project once every 365 days. To change this time frame, edit the default in `coldfront.env` We can force a project to be under review in less than a year which is what we'll do for the cgray project.
+- Notice on the ColdFront home page, next to the allocation for the HPC cluster resource, you see the OnDemand logo. Click on the Project name and see that this logo also shows up next to the allocation. When we click on that logo, it directs us to the OnDemand instance.
 
-Login as `hpcadmin` password `ilovelinux` and go to the ColdFront Administration interface. Click on Projects and click on the cgray project that we created earlier. Check the box next to 'Force Review'
-NOTE: If there is a project you never want project reviews on, uncheck 'Requires review'
-
-Now login as `cgray` password `test123` and notice the warning banner. Click on the allocation and try to renew it. You should see a warning banner telling you it can't be done because the project review is due. When a project review is required, a PI can't request new allocations or renew expiring allocations. They can, however, add/remove users, publications, grants, and research output.
-
-Click the "Review Project" link. Provide a reason for not providing grant or publication information, check the box to acknowledge the update and click the Submit button. Now try to renew the expiring allocation. Log out as `cgray`
-
-Login as `hpcadmin` password `ilovelinux`
-View the pending allocation requests. Note that the project review status is pending. View the pending project reviews. Mark this one complete and go back to the pending allocation requests. Click the "Activate" button and ColdFront activates the allocation for another year.
 
 ## Tutorial Navigation
 [Next - Open XDMoD](../xdmod/README.md)
```
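The `slurm_specs` and `slurm_user_specs` values added in this diff use a colon-separated `key=value` syntax. As a quick illustration (this snippet is not part of the tutorial; it only shows how one of the spec strings above decomposes):

```shell
# The colon-separated spec string from the allocation attributes above.
spec='Fairshare=100:DefaultQOS=normal'

# Split on ':' to see each individual key=value option on its own line.
echo "$spec" | tr ':' '\n'
```

Each resulting `key=value` pair corresponds to one association setting applied on the Slurm side when the allocation is synced.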

coldfront/coldfront.env

Lines changed: 1 addition & 1 deletion
```diff
@@ -4,7 +4,7 @@ ENABLE_PROJECT_REVIEW=True
 DB_URL=mysql://coldfrontapp:9obCuAphabeg@mysql:3306/coldfront
 STATIC_ROOT=/srv/www/static
 
-CENTER_NAME="HPC Resources"
+CENTER_NAME="HPC Tutorial"
 LANGUAGE_CODE=en-us
 TIME_ZONE=America/New_York
```

docker-compose.yml

Lines changed: 2 additions & 0 deletions
```diff
@@ -215,6 +215,7 @@ services:
     networks:
       - compute
     volumes:
+      - etc_xdmod:/etc/xdmod
       - etc_munge:/etc/munge
       - etc_slurm:/etc/slurm
      - home:/home
@@ -229,6 +230,7 @@ services:
       - frontend
 
 volumes:
+  etc_xdmod:
   etc_munge:
   etc_slurm:
   home:
```

docs/docker_tips.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -115,7 +115,7 @@ docker volume list
 To tear down all containers and remove the volumes:
 
 ```
-./hpcts stop
+./hpcts destroy
 ```
 
 To tear down all containers, remove volumes, and remove the container images (next time you run start they will be re-downloaded):
````

docs/requirements.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 ## Requirements
 
-For this tutorial you will need to have **13GB of free disk space** and git, docker, docker-compose and a web browser installed on your local machine. This tutorial has been tested on various versions of Linux, MacOS, and Windows 10 with the following package versions:
+For this tutorial you will need to have **20GB of free disk space** and git, docker, docker-compose and a web browser installed on your local machine. This tutorial has been tested on various versions of Linux, MacOS, and Windows 10 with the following package versions:
 
 - git 2.17+ (Windows users we recommend: https://gitforwindows.org/)
 - docker engine version 20.10.12+
```

hpcts

Lines changed: 11 additions & 3 deletions
```diff
@@ -18,15 +18,20 @@ start() {
 }
 
 stop() {
+    log_info "Stopping HPC Toolset Cluster containers.."
+    docker-compose stop
+}
+
+destroy() {
     log_info "Stopping and removing HPC Toolset Cluster containers and volumes.."
     docker-compose stop && \
     docker-compose rm -f -v && \
     docker-compose down -v
 }
 
 cleanup() {
-    log_info "Cleaning up HPC Toolset containers and images.."
-    stop
+    log_info "Removing HPC Toolset containers and images.."
+    destroy
     docker rmi $(docker images -f "reference=ubccr/hpcts*" -q)
 }
 
@@ -40,8 +45,11 @@ case "$1" in
     'cleanup')
         cleanup
         ;;
+    'destroy')
+        destroy
+        ;;
     *)
-        log_info "Usage: $0 { start | stop | cleanup}"
+        log_info "Usage: $0 { start | stop | destroy | cleanup}"
         exit 1
         ;;
 esac
```
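The new `destroy` subcommand slots into the script's existing case-statement dispatch. A standalone sketch of that pattern (the real actions invoke docker-compose; here they only echo, and the names are illustrative):

```shell
#!/bin/bash
# Sketch of the hpcts subcommand dispatch pattern; actions are stand-ins
# that echo instead of calling docker-compose.
log_info() { echo "$1"; }

stop()    { log_info "Stopping containers.."; }
destroy() { log_info "Stopping containers and removing volumes.."; }

dispatch() {
    case "$1" in
        'stop')    stop ;;
        'destroy') destroy ;;
        *)         log_info "Usage: hpcts { stop | destroy }"; return 1 ;;
    esac
}

dispatch destroy
```

Splitting `stop` (containers only) from `destroy` (containers plus volumes) matters because removing the volumes wipes all tutorial state, which the old single `stop` command did silently.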

ondemand/Dockerfile

Lines changed: 1 addition & 1 deletion
```diff
@@ -3,7 +3,7 @@ ARG HPCTS_VERSION=latest
 FROM --platform=linux/amd64 ubccr/hpcts:slurm-${HPCTS_VERSION} as stage-amd64
 RUN dnf install -y https://yum.osc.edu/ondemand/2.0/ondemand-release-web-2.0-1.noarch.rpm
 RUN dnf install -y netcat ondemand ondemand-dex
-RUN sed -i 's/\-nohttpd//' /opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.26/gems/ood_core-0.19.0/lib/ood_core/batch_connect/templates/vnc.rb
+RUN sed -i 's/\-nohttpd//' /opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.27/gems/ood_core-0.19.0/lib/ood_core/batch_connect/templates/vnc.rb
 
 FROM --platform=linux/arm64 ubccr/hpcts:slurm-${HPCTS_VERSION} as stage-arm64
 RUN dnf install -y file lsof sudo gcc gcc-c++ git \
```
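The only change here is the gem path bump from OnDemand 2.0.26 to 2.0.27; the sed expression itself is unchanged and strips the `-nohttpd` flag from the VNC launch template. An illustration of that substitution on a sample string (not the real `vnc.rb` file):

```shell
# Demonstrates the Dockerfile's sed on a sample line: the literal
# "-nohttpd" flag is deleted, leaving the rest of the command intact.
line='vncserver -nohttpd -geometry 800x600'
echo "$line" | sed 's/\-nohttpd//'
```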

ondemand/entrypoint.sh

Lines changed: 3 additions & 0 deletions
```diff
@@ -9,6 +9,9 @@ then
     sleep 2
 done
 
+echo "---> Cleaning NGINX ..."
+/opt/ood/nginx_stage/sbin/nginx_stage nginx_clean
+
 echo "---> Populating /etc/ssh/ssh_known_hosts from frontend for ondemand..."
 /usr/bin/ssh-keyscan frontend >> /etc/ssh/ssh_known_hosts
```

xdmod/Dockerfile

Lines changed: 1 addition & 1 deletion
```diff
@@ -9,7 +9,6 @@ ARG TARGETARCH
 FROM stage-${TARGETARCH} as final
 
 COPY . /build
-RUN /build/install.sh && rm -rf /build
 COPY conf/httpd.conf /etc/httpd/conf.d/xdmod.conf
 COPY conf/simplesamlphp /etc/xdmod/simplesamlphp
 COPY hierarchy.csv /srv/xdmod/hierarchy.csv
@@ -18,4 +17,5 @@ COPY scripts/ /srv/xdmod/scripts
 COPY bin/sendmail /usr/sbin/sendmail
 COPY entrypoint.sh /usr/local/bin/entrypoint.sh
 COPY small-logo.png /srv/xdmod/small-logo.png
+RUN /build/install.sh && rm -rf /build
 ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
```
