Commit 56269f3

Merge pull request #151 from dsajdak/isc23
ISC23
2 parents aa252d0 + 32d96e7 commit 56269f3

8 files changed

Lines changed: 211 additions & 203 deletions

README.md

Lines changed: 3 additions & 2 deletions

````diff
@@ -1,12 +1,11 @@
 # HPC Toolset Tutorial
 
-Tutorial for installing and configuring [OnDemand](https://openondemand.org/), [Open XDMoD](https://open.xdmod.org), and [ColdFront](http://coldfront.io): an HPC center management toolset.
+Tutorial for installing and configuring [ColdFront](http://coldfront.io), [OnDemand](https://openondemand.org/), and [Open XDMoD](https://open.xdmod.org): an HPC center management toolset.
 
 ### Presented by:
 
 [![OSC Logo https://osc.edu](docs/osc_logo.png)](https://osc.edu)
 [![CCR logo](docs/ccr_logo.jpg)](https://buffalo.edu/ccr)
-[![VT logo](docs/vt_logo.jpg)](https://arc.vt.edu/)
 
 
 This tutorial aims to demonstrate how three open source applications work in concert to provide a toolset for high performance computing (HPC) centers. ColdFront is an allocations management portal that provides users an easy way to request access to allocations for a Center's resources. HPC systems staff configure the data center’s resources with attributes that tie ColdFront’s plug-ins to systems such as job schedulers, authentication/account management systems, system monitoring, and Open XDMoD. Once the user's allocation is activated in ColdFront, they are able to access the resource using OnDemand, a web-based portal for accessing HPC services that removes the complexities of HPC system environments from the end-user. Through OnDemand, users can upload and download files, create, edit, submit and monitor jobs, create and share apps, run GUI applications and connect to a terminal, all via a web browser, with no client software to install and configure. The Open XDMoD portal provides a rich set of features, which are tailored to the role of the user. Sample metrics provided by Open XDMoD include: number of jobs, CPUs consumed, wait time, and wall time, with minimum, maximum and the average of these metrics. Performance and quality of service metrics of the HPC infrastructure are also provided, along with application specific performance metrics (flop/s, IO rates, network metrics, etc) for all user applications running on a given resource.
@@ -27,6 +26,8 @@ This tutorial aims to demonstrate how three open source applications work in con
 ## Workshops
 This tutorial will be presented at the following conferences:
 
+[PEARC23](https://pearc.acm.org/pearc23/)
+[ISC23](https://www.isc-hpc.com/)
 [PEARC22](https://pearc.acm.org/pearc22)
 [PEARC21](https://pearc.acm.org/pearc21)
 [PEARC20](https://pearc.acm.org/pearc20/)
````

coldfront/README.md

Lines changed: 183 additions & 99 deletions
Large diffs are not rendered by default.

docs/acknowledgments.md

Lines changed: 5 additions & 2 deletions

````diff
@@ -6,7 +6,7 @@ Thank you to the staff of the [Ohio Supercomputer Center](https://osc.edu), [Uni
 Thank you to the [National Science Foundation](https://nsf.gov) for the supporting the development of [Open OnDemand](https://openondemand.org) and [Open XDMoD](https://open.xdmod.org/)
 - Open OnDemand NSF award numbers: [NSF#1534949](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1534949) and [NSF#1935725](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1835725)
 
-- XDMoD NSF award numbers: [ACI 1025159](https://nsf.gov/awardsearch/showAward?AWD_ID=1025159) and [ACI 1445806](https://nsf.gov/awardsearch/showAward?AWD_ID=1445806)
+- XDMoD NSF award numbers: [ACI 1025159](https://nsf.gov/awardsearch/showAward?AWD_ID=1025159), [ACI 1445806](https://nsf.gov/awardsearch/showAward?AWD_ID=1445806), and [OAC 2137603](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2137603)
 
 #### Publications
 - Andrew E. Bruno, Doris J. Sajdak. 2021. ColdFront: Resource Allocation Management System (PEARC ’21). Association for Computing Machinery, New York, NY, USA. DOI:[10.1145/3437359.3465585](https://doi.org/10.1145/3437359.3465585)
@@ -18,12 +18,15 @@ Thank you to the [National Science Foundation](https://nsf.gov) for the supporti
 #### Workshops
 This tutorial was first presented at the PRACTICE AND EXPERIENCE IN ADVANCED RESEARCH COMPUTING 2020 Virtual Conference (PEARC20) - https://pearc.acm.org/pearc20/
 
-A condensed version of this tutorial will be presented at [Gateways 2020](https://sciencegateways.org/web/gateways2020)
+A condensed version of this tutorial was presented at [Gateways 2020](https://sciencegateways.org/web/gateways2020)
 
 An updated version of this tutorial was presented at the PRACTICE AND EXPERIENCE IN ADVANCED RESEARCH COMPUTING 2021 Virtual Conference (PEARC21) - https://pearc.acm.org/pearc21/
 
 An updated version of this tutorial was presented at the PRACTICE AND EXPERIENCE IN ADVANCED RESEARCH COMPUTING 2022 Conference (PEARC22), Boston, MA - https://pearc.acm.org/pearc22/
 
+A condensed, half day version of this tutorial will be presented at ISC High Performance Conference 2023, Hamburg, Germany - https://www.isc-hpc.com/
+
+An updated version of this tutorial will be presented at the PRACTICE AND EXPERIENCE IN ADVANCED RESEARCH COMPUTING 2023 Conference (PEARC23), Portland, OR - https://pearc.acm.org/pearc23/
 
 #### Container Development
 
````

docs/docker_tips.md

Lines changed: 8 additions & 32 deletions

````diff
@@ -46,30 +46,28 @@ $ ./hpcts start
 
 - [Docker](https://docs.docker.com)
 - [Install & Start Docker](https://docs.docker.com/engine/install/)
-- [Install Docker Compose](https://docs.docker.com/compose/install/)
-- [Linux](https://docs.docker.com/engine/install/linux-postinstall/)
+- [Linux & WSL](https://docs.docker.com/engine/install/linux-postinstall/)
 - [MacOS Docker Desktop](https://docs.docker.com/docker-for-mac/troubleshoot/)
-- [Windows Docker Desktop](https://docs.docker.com/docker-for-windows/troubleshoot/)
 
 ### Helpful Docker commands
 
 ```
 # Start all HPC Toolset Containers manually
-$ docker-compose up -d
+$ docker compose up -d
 
 # Display Tutorial Container Logs
-$ docker-compose logs -f
-$ docker-compose logs -f coldfront
-$ docker-compose logs -f xdmod
+$ docker compose logs -f
+$ docker compose logs -f coldfront
+$ docker compose logs -f xdmod
 
 # Stop containers
-$ docker-compose stop
+$ docker compose stop
 
 # Stop containers and remove them
-$ docker-compose down
+$ docker compose down
 
 # Stop containers,remove them and all volumes
-$ docker-compose down -v
+$ docker compose down -v
 
 # Display Docker processes
 $ docker ps -a
@@ -113,28 +111,6 @@ NOTE: this is only necessary on some systems so don't use it if the previous com
 
 **Sometimes restarting your operating system is the only solution.**
 
-#### Windows Errors
-
-NOTE: Windows users should get several pop-up messages from Docker Desktop during this process asking to allow local system access to the Docker containers. Please click the "Share it" button:
-![](windows_sharing.PNG)
-
-If you have notifications blocked, you may not see these pop-ups and the authorization will eventually time out. If this happens, you will get this type of error message:
-
-```
-Error response from daemon: user declined directory sharing C:\Users\path_to_my_folder
-```
-Open Docker Desktop, navigate to Settings - Resources, and click on File Sharing. Then add the directory where you've cloned the HPC Toolset Tutorial and click "Apply & Restart"
-
-Re-run:
-```
-./hpcts start
-```
-
-If this doesn't work, please run:
-```
-./hpcts cleanup
-./hpcts start
-```
 
 #### Deleting Docker containers/images/volumes manually
 
````
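This commit switches the documented commands from the standalone `docker-compose` binary to the Compose V2 plugin syntax, `docker compose`. On a machine that might still have only the legacy binary, a small Bash sketch like the one below could pick whichever is available. The function name `compose_cmd` is hypothetical and is not part of the tutorial's `hpcts` script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of hpcts): print the Compose invocation
# available on this machine.
#   - Prefer the Compose V2 plugin, invoked as "docker compose".
#   - Fall back to the legacy standalone "docker-compose" binary.
#   - Print nothing and return 1 if neither is usable.
compose_cmd() {
  if docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"
  else
    return 1
  fi
}

# Example: follow the ColdFront container logs with whichever Compose exists.
# $(compose_cmd) is left unquoted on purpose so "docker compose" splits
# into two words:
#   $(compose_cmd) logs -f coldfront
```

Preferring the plugin first matches the direction of this commit; the fallback only matters on older installations.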

docs/getting_started.md

Lines changed: 1 addition & 27 deletions

````diff
@@ -12,13 +12,8 @@ If you haven't already installed and tested the required packages, please refer
 
 ## Getting started
 
-You will need to clone the tutorial repo and then run the helper script. The initial clone of the repo may take 5-10 minutes. The first time running the helper script, you'll be downloading all the containers from Docker Hub. This can take quite a long time depending on your network speed. The images total approximately 13GB in size. Once the containers are downloaded, they are started and the services launched. For point of reference: on a recent test from a home fiber optic network with client connected over wifi this download and container startup process took 12 minutes.
+You will need to clone the tutorial repo and then run the helper script. The initial clone of the repo may take 5-10 minutes. The first time running the helper script, you'll be downloading all the containers from Docker Hub. This can take quite a long time depending on your network speed. The images total approximately 20GB in size. Once the containers are downloaded, they are started and the services launched. For point of reference: on a recent test from a home fiber optic network with client connected over wifi this download and container startup process took 12 minutes.
 
-NOTE: For Windows, if you haven't already done so, you will need to configure git not to convert line endings into Windows format. Run this command using the git-bash shell application before cloning the tutorial repo:
-
-```
-git config --global core.autocrlf input
-```
 
 ### Clone Repo and Start Containers
 
@@ -84,28 +79,7 @@ Starting HPC Toolset Cluster..
 
 **NOTE: Despite seeing this output with URLs, the processes on these containers may not be fully running yet. Depending on the speed of your computer, starting up the processes may take a few minutes (or even up to 10 minutes). Use the command below to check the docker logs if the websites are not yet displaying.**
 
-### Windows Errors
-
-NOTE: Windows users should get several pop-up messages from Docker Desktop during this process asking to allow local system access to the Docker containers. Please click the "Share it" button:
-![](windows_sharing.PNG)
 
-If you have notifications blocked, you may not see these pop-ups and the authorization will eventually time out. If this happens, you will get this type of error message:
-
-```
-Error response from daemon: user declined directory sharing C:\Users\path_to_my_folder
-```
-Open Docker Desktop, navigate to Settings - Resources, and click on File Sharing. Then add the directory where you've cloned the HPC Toolset Tutorial and click "Apply & Restart"
-
-Re-run:
-```
-./hpcts start
-```
-
-If this doesn't work, please run:
-```
-./hpcts destroy
-./hpcts start
-```
 
 ### Docker Logs
 
````
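With the image estimate raised to roughly 20GB, it can help to confirm free space before the first `./hpcts start`. The sketch below is a hypothetical Bash helper, `has_free_gb`, that reads the "Available" column from `df -Pk`. It measures the filesystem holding the directory you pass, which may not be where Docker stores its images, so checking Docker's own data directory as well is worth considering.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the filesystem containing DIR (default: the
# current directory) has at least MIN_GB gibibytes (GiB) available.
# Usage: has_free_gb MIN_GB [DIR]
has_free_gb() {
  local min_gb=$1 dir=${2:-.} avail_kb
  # -P: POSIX output format (one line per filesystem, no wrapping)
  # -k: sizes in 1024-byte blocks; field 4 of the data line is "Available"
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] || return 1
  # 1 GiB = 1024 * 1024 KiB
  [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Examples:
#   has_free_gb 20 . || echo "Less than 20GB free in the clone directory"
#   has_free_gb 20 "$(docker info --format '{{.DockerRootDir}}')" \
#     || echo "Less than 20GB free where Docker stores images"
```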

docs/requirements.md

Lines changed: 5 additions & 35 deletions

````diff
@@ -1,16 +1,10 @@
 ## Requirements
 
-For this tutorial you will need to have **20GB of free disk space** and git, docker, docker-compose and a web browser installed on your local machine. This tutorial has been tested on various versions of Linux, MacOS, and Windows 10 with the following package versions:
+For this tutorial you will need to have **20GB of free disk space** and git, docker, docker-compose and a web browser installed on your local machine. This tutorial has been tested on various versions of Linux, MacOS, and Windows 10/11 (using Windows subsystem for Linux) with the following package versions:
 
-- git 2.17+ (Windows users we recommend: https://gitforwindows.org/)
+- git 2.17+
 - docker engine version 20.10.12+
-- docker-compose 2.6.0+
-
-NOTE: For Windows, if you haven't already done so, you will need to configure git not to convert line endings into Windows format. Run this command using the git-bash shell application before cloning the tutorial repo:
-
-```
-git config --global core.autocrlf input
-```
+- docker compose 2.6.0+ (this is distributed with newer versions of docker and not necessary to install separately)
 
 The following ports must be open and available:
 
@@ -27,11 +21,9 @@ If they are not you might see an error like:
 
 https://docs.docker.com/engine/install/
 
-**NOTE: You'll need to make sure the account you're running docker with is in the 'docker' group**
-
-### Install Docker Compose
+**NOTE: Make sure the account you're running docker with is in the 'docker' group**
 
-https://docs.docker.com/compose/install/
+**Windows users:** We do NOT recommend using Docker Desktop with this container environment. Instead, use Windows subsystem for Linux (WSL) and install Docker within an Ubuntu virtual machine. [This site](https://nickjanetakis.com/blog/install-docker-in-wsl-2-without-docker-desktop) provides useful information on this method, except docker-compose no longer needs to be installed separately. Using WSL allows you to follow the docker installation instructions for Linux and this is what is tested by the HPC Toolset Tutorial team.
 
 ### Verify working Docker
 
@@ -42,28 +34,6 @@ docker info
 **This should display your system info along with Docker-specific info. If there are any errors, stop/start Docker. Do NOT proceed with the tutorial until you are sure you have a working Docker setup**
 
 
-### Error when running 'docker info' or when starting up tutorial containers
-
-If you get this error when starting the tutorial
-
-```
-ERROR: Couldn't connect to Docker daemon at http+docker://localunixsocket - is it running?
-
-or
-
-ERROR: Couldn't connect to Docker daemon at http+docker://localhost - is it running?
-```
-
-Try stopping and starting Docker (restart doesn't usually fix the problem). Commands for this differ depending on operating system.
-
-If the error persists, try:
-
-```
-export DOCKER_HOST=127.0.0.1
-```
-
-NOTE: this is only necessary on some systems so don't use it if the previous command works
-
 ## Docker Tips
 
 Some useful info on installing Docker, navigating this tutorial and learning a bit about docker-compose
````
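The updated requirements list minimum versions (git 2.17+, docker engine 20.10.12+, docker compose 2.6.0+) and ask that the account running Docker be in the `docker` group. A rough Bash pre-flight sketch for those checks follows. The helpers `version_ge`, `in_docker_group`, `check`, and `preflight` are hypothetical, and `version_ge` assumes a `sort` that supports `-V` (GNU coreutils does). Docker's Linux post-install guide adds a user to the group with `sudo usermod -aG docker $USER`, which takes effect after logging out and back in.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight checks for the tutorial's stated requirements.

# version_ge A B: succeed if version A >= version B. Relies on `sort -V`
# (version ordering), so 2.17 ranks above 2.9, unlike a plain string sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# in_docker_group [NAMES]: succeed if "docker" appears in the space-separated
# group NAMES (defaults to the current user's groups from `id -nG`).
in_docker_group() {
  local names=${1:-$(id -nG)}
  case " $names " in
    *" docker "*) return 0 ;;
    *) return 1 ;;
  esac
}

# check NAME FOUND MIN: print one verdict line. A leading "v" on FOUND
# (as in "v2.6.0") is stripped before comparing.
check() {
  local name=$1 found=${2#v} min=$3
  if [ -n "$found" ] && version_ge "$found" "$min"; then
    echo "OK: $name $found (need $min+)"
  else
    echo "PROBLEM: $name ${found:-unavailable} (need $min+)"
  fi
}

preflight() {
  # "git version 2.39.2" -> "2.39.2"
  check git "$(git --version 2>/dev/null | awk '{print $3}')" 2.17
  # Reading the server version requires a reachable Docker daemon.
  check "docker engine" "$(docker version --format '{{.Server.Version}}' 2>/dev/null)" 20.10.12
  check "docker compose" "$(docker compose version --short 2>/dev/null)" 2.6.0
  in_docker_group || echo "PROBLEM: $(id -un) is not in the 'docker' group"
}

# Usage: source this file, then run `preflight`.
```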

ondemand/README.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -1647,6 +1647,6 @@ Review Job Composer links - access Job Composer
 </details>
 
 ## Tutorial Navigation
-[Next - Acknowledgments](../docs/acknowledgments.md)
-[Previous Step - Open XDMoD](../xdmod/README.md)
-[Back to Start](../README.md)
+[Next Step - Open XDMoD](../xdmod/README.md)
+[Previous Step - ColdFront](../coldfront/README.md)
+[Back to Start](../README.md)
````

xdmod/README.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -1486,6 +1486,6 @@ Admin Dashboard:
 ![Admin Dashboard](./tutorial-screenshots/admin-dashboard.png)
 
 ## Tutorial Navigation
-[Next - OnDemand](../ondemand/README.md)
-[Previous Step - ColdFront](../coldfront/README.md)
-[Back to Start](../README.md)
+[Next - Acknowledgements](../docs/acknowledgments.md)
+[Previous Step - Open OnDemand](../ondemand/README.md)
+[Back to Start](../README.md)
````
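This commit reorders the "Tutorial Navigation" footers so the steps run ColdFront, then Open OnDemand, then Open XDMoD, then the acknowledgments page. After reshuffling links like this, a small Bash sketch such as the following could confirm that relative links in a Markdown file point at files that exist. `check_relative_links` is a hypothetical helper, and it only handles inline links whose target starts with `./` or `../`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report relative Markdown links in FILE whose targets
# do not exist. Only inline links whose target starts with ./ or ../ are
# checked; any "#fragment" suffix is ignored. Returns 1 if a link is broken.
check_relative_links() {
  local md=$1 dir target status=0
  dir=$(dirname -- "$md")
  while IFS= read -r target; do
    target=${target%%#*}          # drop an optional #fragment
    [ -n "$target" ] || continue
    if [ ! -e "$dir/$target" ]; then
      echo "missing: $md -> $target"
      status=1
    fi
  done < <(grep -o -e '](\.\{1,2\}/[^)]*)' -- "$md" | sed -e 's/^](//' -e 's/)$//')
  return "$status"
}

# Example, run from the repository root:
#   for f in coldfront/README.md ondemand/README.md xdmod/README.md; do
#     check_relative_links "$f"
#   done
```

Targets are resolved relative to the Markdown file's own directory, which is how the `../` links in these footers are meant to be read.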
