Add ml xgboost workload #638
base: master
Changes from 9 commits
45b71a9
b81f847
6210f92
85851a6
93b663b
0e923f0
31c3b59
5791a0f
205a4e2
38fc921
a213d75
b92c8cc
7902f59
b928bb9
@@ -88,3 +88,59 @@ hibench.yarn.executor.num | Spark executor number in Yarn mode |
| hibench.yarn.executor.cores | Spark executor cores in Yarn mode |
| spark.executor.memory | Spark executor memory |
| spark.driver.memory | Spark driver memory |
### 8. Run XGBoost workload ###
Review comment: Could you change xgboost to XGBoost and following the same?
The HiBench XGBoost benchmark depends on the XGBoost libraries to build and run. The libraries are ```xgboost4j_<scala version>-<xgboost version>.jar``` and ```xgboost4j-spark_<scala version>-<xgboost version>.jar```.<br>
The relevant Maven configuration is in ```./sparkbench/ml/pom.xml```:
```
<dependency>
  <groupId>ml.dmlc</groupId>
  <artifactId>xgboost4j_${scala.binary.version}</artifactId>
  <version>1.1.0</version>
</dependency>
<dependency>
  <groupId>ml.dmlc</groupId>
  <artifactId>xgboost4j-spark_${scala.binary.version}</artifactId>
  <version>1.1.0</version>
</dependency>
```
and in ```./pom.xml```:
```
<repository>
  <id>xgboostrepo</id>
  <name>XGBoost Maven Repo</name>
  <url>https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/release</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>
```
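Before building, it can help to confirm which XGBoost artifacts and versions the build will resolve. The snippet below is a minimal sketch, not part of HiBench: `show_xgboost_versions` is a hypothetical helper, and it assumes each xgboost4j `<artifactId>` line is immediately followed by its `<version>` line, as in the dependency snippet above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of HiBench): print "<artifactId> <version>" for
# every xgboost4j* dependency in a pom.xml. Assumes the <version> line comes
# right after the <artifactId> line, as in ./sparkbench/ml/pom.xml above.
show_xgboost_versions() {
  local pom="${1:-./sparkbench/ml/pom.xml}"
  awk '
    /<artifactId>xgboost4j/ {
      # Strip the surrounding tags to keep just the artifact id.
      artifact = $0
      gsub(/.*<artifactId>|<\/artifactId>.*/, "", artifact)
      # Read the following line and, if it is a <version>, print the pair.
      if ((getline line) > 0 && line ~ /<version>/) {
        gsub(/.*<version>|<\/version>.*/, "", line)
        print artifact, line
      }
    }
  ' "$pom"
}
```

Run from the HiBench root, `show_xgboost_versions ./sparkbench/ml/pom.xml` should list both artifacts with version 1.1.0 for the configuration above; the `${scala.binary.version}` placeholder is printed literally because it is only resolved by Maven.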
#### 8.a Default XGBoost release ####
Review comment: don't need to use 8.a, 8.b., need to use correct capital cases for titles.

Review comment: I think you don't need to write this since it's already written in the above section 4. Run a workload
By default, the HiBench XGBoost benchmark uses the XGBoost release pinned in ```./sparkbench/ml/pom.xml``` (1.1.0 in the configuration above), resolved from https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/release.<br>
To use it, build HiBench, prepare the data, and run the XGBoost benchmark. For example:
```
$ mvn -Psparkbench -Dmodules -Pml -Dspark=2.4 -Dscala=2.12 clean package   # build only the Spark ML benchmarks
$ bin/workloads/ml/xgboost/prepare/prepare.sh && hdfs dfs -du -s -h /HiBench/XGBoost/Input   # generate input data and check its size on HDFS
$ bin/workloads/ml/xgboost/spark/run.sh   # run the XGBoost benchmark on Spark
```
#### 8.b Other XGBoost releases ####
To use a different XGBoost release, change the ```<version>``` of both ```xgboost4j``` and ```xgboost4j-spark``` in ```./sparkbench/ml/pom.xml``` to the target version. The ```scala.binary.version``` is set from the ```-Dscala``` command-line parameter (e.g. ```-Dscala=2.12```).<br>
For example, to use XGBoost 1.0.0, change ```<version>1.1.0</version>``` to ```<version>1.0.0</version>``` for both ```xgboost4j``` and ```xgboost4j-spark```.<br>
If the release is hosted in a different Maven repository, also update the ```xgboostrepo``` URL in ```./pom.xml```.<br>
Then build HiBench, prepare the data, and run the XGBoost benchmark.
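The version change can also be scripted. This is a minimal sketch under the same assumption (each xgboost4j `<artifactId>` line is directly followed by its `<version>` line); `set_xgboost_version` is a hypothetical helper, not part of HiBench. It only rewrites the `<version>` line right after an `xgboost4j*` artifact, so other dependencies that happen to be pinned to the same version number are left alone, which a plain search-and-replace of `1.1.0` would not guarantee.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of HiBench): set the <version> of every
# xgboost4j* dependency in a pom.xml to $1. Only the <version> line that
# directly follows an xgboost4j* <artifactId> line is rewritten.
set_xgboost_version() {
  local new_version="$1"
  local pom="${2:-./sparkbench/ml/pom.xml}"
  local tmp
  tmp="$(mktemp)"
  awk -v v="$new_version" '
    # Mark that the next line may hold an xgboost4j* version.
    /<artifactId>xgboost4j/ { print; pending = 1; next }
    # Replace the version only on the line right after such an artifactId.
    pending && /<version>/ { sub(/<version>[^<]*<\/version>/, "<version>" v "</version>") }
    # Print the (possibly rewritten) line and clear the marker.
    { pending = 0; print }
  ' "$pom" > "$tmp"
  mv "$tmp" "$pom"
}
```

For example, `set_xgboost_version 1.0.0 ./sparkbench/ml/pom.xml` switches both artifacts to 1.0.0; the build is then run with `-Dscala` set as before.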
#### 8.c XGBoost jar files ####
If you only have the XGBoost jar files, copy them to ```$SPARK_HOME/jars/``` and set the ```xgboost4j``` and ```xgboost4j-spark``` versions in ```./sparkbench/ml/pom.xml``` to match the jars.<br>
For example, if XGBoost is built from source on Linux, the jars are built and installed to ```~/.m2/repository/ml/dmlc/xgboost4j_<scala version>/<xgboost version>-SNAPSHOT/``` and ```~/.m2/repository/ml/dmlc/xgboost4j-spark_<scala version>/<xgboost version>-SNAPSHOT/``` respectively. To use them, copy both jars to ```$SPARK_HOME/jars/``` and update the ```xgboost4j``` and ```xgboost4j-spark``` versions in ```./sparkbench/ml/pom.xml``` accordingly.<br>
Then build HiBench, prepare the data, and run the XGBoost benchmark.
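For a source build, the copy step might look like the sketch below. It assumes the standard local Maven repository layout described above, where each jar is named `<artifactId>-<version>.jar` inside `<artifactId>/<version>/`; `copy_local_xgboost_jars` and the `M2_REPO` override are hypothetical, and the version argument must include the `-SNAPSHOT` suffix.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of HiBench): copy locally installed XGBoost jars
# from the Maven repository into $SPARK_HOME/jars/.
#   $1: Scala binary version, e.g. 2.12
#   $2: XGBoost version as installed, including any -SNAPSHOT suffix
# M2_REPO is an assumed override; it defaults to ~/.m2/repository.
copy_local_xgboost_jars() {
  local scala_version="$1"
  local xgb_version="$2"
  local dmlc_dir="${M2_REPO:-$HOME/.m2/repository}/ml/dmlc"
  local artifact jar
  for artifact in xgboost4j xgboost4j-spark; do
    # Maven names the jar <artifactId>-<version>.jar inside <artifactId>/<version>/.
    jar="$dmlc_dir/${artifact}_${scala_version}/${xgb_version}/${artifact}_${scala_version}-${xgb_version}.jar"
    if [[ ! -f "$jar" ]]; then
      echo "jar not found: $jar" >&2
      return 1
    fi
    cp "$jar" "${SPARK_HOME:?SPARK_HOME must be set}/jars/"
  done
}
```

With `SPARK_HOME` exported, `copy_local_xgboost_jars 2.12 <xgboost version>-SNAPSHOT` copies both jars; the pom versions still have to be aligned by hand afterwards.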
Review comment: Generally, the doc style is not consistent with the original doc, and it is too complicated to follow.