Commit 1fbcf48

additional updates to install & troubleshooting docs
1 parent 9db172e commit 1fbcf48

4 files changed

Lines changed: 25 additions & 99 deletions

coldfront/README.md

Lines changed: 11 additions & 9 deletions
@@ -36,21 +36,23 @@ For more options on allowing permissions for various types of staff access, see


 ### Activate the allocation request
-At part of the database seeding we did at the start of the tutorial, we activated and set attributes on the allocations requested on the `cgray` project. Let's look at that allocation and how it was setup.
-- Login using local account username: `hpcadmin` password: `ilovelinux`
-- Navigate to the `Admin` menu and click on the `All Allocations` option
-- Click on the allocation number next to the allocation for the `HPC cluster` resource.
-- Scroll down to look at the allocation attributes set. Notice that allocation status is `Renewal Requested` and there is a start and end date associated with it.
+
+- Navigate to the `Admin` menu and click on `Allocation Requests`
+Note: the project review status is a green check mark, indicating our Center Director has already approved the submitted project review.
+
+At part of the database seeding we did at the start of the tutorial, we activated and set attributes on the allocations requested on the `cgray` project. Let's look at that allocation and how it was setup.
+
+- Click the `Details` button to review the Allocation Detail page.
+- Notice that allocation status is `Renewal Requested` and there is a start and end date associated with it.
+- Scroll down to look at the allocation attributes set. There is a slurm_account attribute as well as slurm_specs and slurm_user_specs attributes. This is what is used by the Slurm plugin to sync with the Slurm database.
+- Click the `Approve` button to re-activate the allocation. This updates the status to `Active` and changes the expiration date to one year from today.

 Now let's go look at and activate the allocation change request submitted by `cgray` for the storage resource. As the HPC admin user, activate and setup the new allocation:
 - Navigate to the `Admin` menu and click on `Allocation Change Requests`
 - Click on the `Details` button to review and approve the allocation changes requested. As the admin you have the ability to approve the date extension, change it to another setting or select `no extension` You can remove the `storage_quota` request or change it. You can add notes for the PI and users on the allocation to see. Then you can take action such as `Approve` or `Deny` the request. For this demo, let's click the `Approve` button.

 For more information about configuring Allocation Change Requests [see here](#more-info-on-allocation-change-requests)

-Next review the pending allocation requests:
-- Navigate to the `Admin` menu and click on `Allocation Requests` Note that the project review status is green check mark, indicating our Center Director has already approved the submitted project review.
-- Click the `Details` button if you'd like to review the Allocation Detail page. Otherwise, click the `Approve` button to renew the allocation for another year.
 - Logout as the `hpcadmin` user

@@ -72,7 +74,7 @@ _**This is because we have not synced the allocation information in ColdFront wi
 ### Run Slurm plugin to sync active allocations from ColdFront to Slurm
 - Login to the coldfront container & setup ColdFront environment
 `ssh coldfront`
-`cd /srv/www`
+`cd /srv/www`
 `source venv/bin/activate`
 - Let's see what slurm access cgray currently has:
 `sacctmgr show user cgray -s list`

docs/docker_tips.md

Lines changed: 8 additions & 32 deletions
@@ -46,30 +46,28 @@ $ ./hpcts start

 - [Docker](https://docs.docker.com)
 - [Install & Start Docker](https://docs.docker.com/engine/install/)
-- [Install Docker Compose](https://docs.docker.com/compose/install/)
-- [Linux](https://docs.docker.com/engine/install/linux-postinstall/)
+- [Linux & WSL](https://docs.docker.com/engine/install/linux-postinstall/)
 - [MacOS Docker Desktop](https://docs.docker.com/docker-for-mac/troubleshoot/)
-- [Windows Docker Desktop](https://docs.docker.com/docker-for-windows/troubleshoot/)

 ### Helpful Docker commands

 ```
 # Start all HPC Toolset Containers manually
-$ docker-compose up -d
+$ docker compose up -d

 # Display Tutorial Container Logs
-$ docker-compose logs -f
-$ docker-compose logs -f coldfront
-$ docker-compose logs -f xdmod
+$ docker compose logs -f
+$ docker compose logs -f coldfront
+$ docker compose logs -f xdmod

 # Stop containers
-$ docker-compose stop
+$ docker compose stop

 # Stop containers and remove them
-$ docker-compose down
+$ docker compose down

 # Stop containers,remove them and all volumes
-$ docker-compose down -v
+$ docker compose down -v

 # Display Docker processes
 $ docker ps -a
@@ -113,28 +111,6 @@ NOTE: this is only necessary on some systems so don't use it if the previous com

 **Sometimes restarting your operating system is the only solution.**

-#### Windows Errors
-
-NOTE: Windows users should get several pop-up messages from Docker Desktop during this process asking to allow local system access to the Docker containers. Please click the "Share it" button:
-![](windows_sharing.PNG)
-
-If you have notifications blocked, you may not see these pop-ups and the authorization will eventually time out. If this happens, you will get this type of error message:
-
-```
-Error response from daemon: user declined directory sharing C:\Users\path_to_my_folder
-```
-Open Docker Desktop, navigate to Settings - Resources, and click on File Sharing. Then add the directory where you've cloned the HPC Toolset Tutorial and click "Apply & Restart"
-
-Re-run:
-```
-./hpcts start
-```
-
-If this doesn't work, please run:
-```
-./hpcts cleanup
-./hpcts start
-```

 #### Deleting Docker containers/images/volumes manually

docs/getting_started.md

Lines changed: 1 addition & 27 deletions
@@ -12,13 +12,8 @@ If you haven't already installed and tested the required packages, please refer

 ## Getting started

-You will need to clone the tutorial repo and then run the helper script. The initial clone of the repo may take 5-10 minutes. The first time running the helper script, you'll be downloading all the containers from Docker Hub. This can take quite a long time depending on your network speed. The images total approximately 13GB in size. Once the containers are downloaded, they are started and the services launched. For point of reference: on a recent test from a home fiber optic network with client connected over wifi this download and container startup process took 12 minutes.
+You will need to clone the tutorial repo and then run the helper script. The initial clone of the repo may take 5-10 minutes. The first time running the helper script, you'll be downloading all the containers from Docker Hub. This can take quite a long time depending on your network speed. The images total approximately 20GB in size. Once the containers are downloaded, they are started and the services launched. For point of reference: on a recent test from a home fiber optic network with client connected over wifi this download and container startup process took 12 minutes.

-NOTE: For Windows, if you haven't already done so, you will need to configure git not to convert line endings into Windows format. Run this command using the git-bash shell application before cloning the tutorial repo:
-
-```
-git config --global core.autocrlf input
-```

 ### Clone Repo and Start Containers
@@ -84,28 +79,7 @@ Starting HPC Toolset Cluster..

 **NOTE: Despite seeing this output with URLs, the processes on these containers may not be fully running yet. Depending on the speed of your computer, starting up the processes may take a few minutes (or even up to 10 minutes). Use the command below to check the docker logs if the websites are not yet displaying.**

-### Windows Errors
-
-NOTE: Windows users should get several pop-up messages from Docker Desktop during this process asking to allow local system access to the Docker containers. Please click the "Share it" button:
-![](windows_sharing.PNG)

-If you have notifications blocked, you may not see these pop-ups and the authorization will eventually time out. If this happens, you will get this type of error message:
-
-```
-Error response from daemon: user declined directory sharing C:\Users\path_to_my_folder
-```
-Open Docker Desktop, navigate to Settings - Resources, and click on File Sharing. Then add the directory where you've cloned the HPC Toolset Tutorial and click "Apply & Restart"
-
-Re-run:
-```
-./hpcts start
-```
-
-If this doesn't work, please run:
-```
-./hpcts destroy
-./hpcts start
-```

 ### Docker Logs

docs/requirements.md

Lines changed: 5 additions & 31 deletions
@@ -1,17 +1,11 @@
 ## Requirements

-For this tutorial you will need to have **20GB of free disk space** and git, docker, docker-compose and a web browser installed on your local machine. This tutorial has been tested on various versions of Linux, MacOS, and Windows 10/11 with the following package versions:
+For this tutorial you will need to have **20GB of free disk space** and git, docker, docker-compose and a web browser installed on your local machine. This tutorial has been tested on various versions of Linux, MacOS, and Windows 10/11 (using Windows subsystem for Linux) with the following package versions:

-- git 2.17+ (Windows users we recommend: https://gitforwindows.org/)
+- git 2.17+
 - docker engine version 20.10.12+
 - docker compose 2.6.0+ (this is distributed with newer versions of docker and not necessary to install separately)

-NOTE: For Windows, if you haven't already done so, you will need to configure git not to convert line endings into Windows format. Run this command using the git-bash shell application before cloning the tutorial repo:
-
-```
-git config --global core.autocrlf input
-```
-
 The following ports must be open and available:

 - 2443
@@ -27,7 +21,9 @@ If they are not you might see an error like:

 https://docs.docker.com/engine/install/

-**NOTE: You'll need to make sure the account you're running docker with is in the 'docker' group**
+**NOTE: Make sure the account you're running docker with is in the 'docker' group**
+
+**Windows users:** We do NOT recommend using Docker Desktop with this container environment. Instead, use Windows subsystem for Linux (WSL) and install Docker within an Ubuntu virtual machine. [This site](https://nickjanetakis.com/blog/install-docker-in-wsl-2-without-docker-desktop) provides useful information on this method, except docker-compose no longer needs to be installed separately. Using WSL allows you to follow the docker installation instructions for Linux and this is what is tested by the HPC Toolset Tutorial team.

 ### Verify working Docker

@@ -38,28 +34,6 @@ docker info
 **This should display your system info along with Docker-specific info. If there are any errors, stop/start Docker. Do NOT proceed with the tutorial until you are sure you have a working Docker setup**


-### Error when running 'docker info' or when starting up tutorial containers
-
-If you get this error when starting the tutorial
-
-```
-ERROR: Couldn't connect to Docker daemon at http+docker://localunixsocket - is it running?
-
-or
-
-ERROR: Couldn't connect to Docker daemon at http+docker://localhost - is it running?
-```
-
-Try stopping and starting Docker (restart doesn't usually fix the problem). Commands for this differ depending on operating system.
-
-If the error persists, try:
-
-```
-export DOCKER_HOST=127.0.0.1
-```
-
-NOTE: this is only necessary on some systems so don't use it if the previous command works
-
 ## Docker Tips

 Some useful info on installing Docker, navigating this tutorial and learning a bit about docker-compose
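
The minimum versions named in the docs/requirements.md changes (git 2.17+, docker engine 20.10.12+, docker compose 2.6.0+) could be checked with a short shell sketch like the one below. It is not part of this commit: the helper names `version_ge`, `extract_version`, and `check_tool` are hypothetical, and the script assumes bash plus a `sort` that supports `-V` (GNU coreutils, as found on Linux and WSL).

```shell
#!/usr/bin/env bash
# Sketch only: compare installed tool versions against the minimums
# listed in docs/requirements.md. Helper names are illustrative.

# version_ge A B: succeeds when version A >= version B.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# extract_version: print the first dotted version number read from stdin.
extract_version() {
  grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# check_tool NAME MINIMUM COMMAND...: run COMMAND, parse its version, report.
check_tool() {
  local name="$1" min="$2" found
  shift 2
  found="$("$@" 2>/dev/null | extract_version)"
  if [ -z "$found" ]; then
    echo "$name: not found"
  elif version_ge "$found" "$min"; then
    echo "$name: $found (ok, need $min+)"
  else
    echo "$name: $found (too old, need $min+)"
  fi
}

check_tool "git" "2.17" git --version
check_tool "docker engine" "20.10.12" docker --version
check_tool "docker compose" "2.6.0" docker compose version --short
```

A tool that is missing is reported as "not found" rather than aborting the script, so all three checks always run. Passing these checks does not replace the `docker info` verification step, since `docker --version` reports the client version without contacting the daemon.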
