This repository was archived by the owner on Feb 9, 2025. It is now read-only.

Commit 3a13048

Merge branch 'master' of https://github.com/haipinglu/ScalableML
2 parents 4caaabc + 5857fc5 commit 3a13048

3 files changed

Lines changed: 9 additions & 1 deletion

Lab 2 - RDD, DataFrame, ML pipeline, and parallelization.md

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ Firstly, we follow the standard steps as in Task 2 of Lab 1 but with some variat
 
 ```sh
 qrshx -P rse-com6012 -pe smp 4 # request 4 CPU cores using our reserved queue
-source myspark.sh # assuming HPC/myspark.sh is under the root directory, otherwise, see Lab 1 Task 2
+source myspark.sh # assuming HPC/myspark.sh is under your root directory, otherwise, see Lab 1 Task 2
 conda install -y numpy # install numpy, to be used in Task 3. This ONLY needs to be done ONCE. NOT every time.
 cd com6012/ScalableML # our main working directory
 pyspark --master local[4] # start pyspark with 4 cores requested above.

Lab 3 - Scalable Logistic Regression.md

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,9 @@
 # Lab 3: Scalable logistic regression
 
+### Modified by Shuo Zhou, 19th February 2023
+
+#### 25th February 2022 Mauricio A Álvarez
+
 ## Study schedule
 
 - [Section 1](#1-data-storage-in-sharc-and-spark-configuration): To finish by 24th February. **Essential**

Lab 4 - Scalable Generalized Linear Models.md

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,9 @@
 # Lab 4: Scalable Generalized Linear Models
 
+### Modified by Shuo Zhou, 26th February 2023
+
+#### 28th February 2022 Mauricio A Álvarez
+
 ## Study schedule
 
 - [Section 1](#1-glms-in-pyspark): To finish by 3rd March. **Essential**
