This repository was archived by the owner on Dec 15, 2025. It is now read-only.

Commit 833fbdc

heyu1 authored and carsonwang committed
1, Update 7.0 to 7.1-SNAPSHOT 2, Modify READM (#527)
* 1, Update 7.0 to 7.1-SNAPSHOT 2, Modify READM
  Signed-off-by: He, Yu <yu.he@intel.com>
* Update 7.0 to 7.1-SNAPSHOT
  Signed-off-by: He, Yu <yu.he@intel.com>
* Update 7.0 to 7.1-SNAPSHOT in travis/hibench.conf
  Signed-off-by: He, Yu <yu.he@intel.com>
1 parent e6ad2d7 commit 833fbdc

27 files changed

Lines changed: 57 additions & 57 deletions

README.md

Lines changed: 14 additions & 14 deletions
@@ -52,47 +52,47 @@ There are totally 19 workloads in HiBench. The workloads are divided into 6 cate

 1. Bayesian Classification (Bayes)

-This workload benchmarks NaiveBayesian Classification implemented in Spark-MLLib. The workload uses the automatically generated documents whose words follow the zipfian distribution. The dict used for text generation is also from the default linux file /usr/share/dict/linux.words.
+Naive Bayes is a simple multiclass classification algorithm with the assumption of independence between every pair of features. This workload is implemented in spark.mllib and uses the automatically generated documents whose words follow the zipfian distribution. The dict used for text generation is also from the default linux file /usr/share/dict/linux.words.ords.

 2. K-means clustering (Kmeans)

-This workload tests the K-means (a well-known clustering algorithm for knowledge discovery and data mining) clustering in Spark-MLlib. The input data set is generated by GenKMeansDataset based on Uniform Distribution and Guassian Distribution.
+This workload tests the K-means (a well-known clustering algorithm for knowledge discovery and data mining) clustering in spark.mllib. The input data set is generated by GenKMeansDataset based on Uniform Distribution and Guassian Distribution.

 3. Logistic Regression (LR)

-This workload benchmarks Logistic Regression (LR) implemented in Spark-MLLib with LBFGS optimizer. The input data set is generated by LogisticRegressionDataGenerator based on random balance decision tree. It contains three different kinds of data types, including categorical data, continuous data, and binary data.
+Logistic Regression (LR) is a popular method to predict a categorical response. This workload is implemented in spark.mllib with LBFGS optimizer and the input data set is generated by LogisticRegressionDataGenerator based on random balance decision tree. It contains three different kinds of data types, including categorical data, continuous data, and binary data.

 4. Alternating Least Squares (ALS)

-This workload benchmarks Alternating Least Squares (ALS) implememnted in Spark-MLLib. The input data set is generated by RatingDataGenerator for a product recommendation system.
+The alternating least squares (ALS) algorithm is a well-known algorithm for collaborative filtering. This workload is implemented in spark.mllib and the input data set is generated by RatingDataGenerator for a product recommendation system.

-5. Gradient Boosting Tree (GBT)
+5. Gradient Boosting Trees (GBT)

-This workload benchmarks Gradient Boosting Tree (GBT) implememnted in Spark-MLLib. The input data set is generated by GradientBoostingTreeDataGenerator.
+Gradient-boosted trees (GBT) is a popular regression method using ensembles of decision trees. This workload is implemented in spark.mllib and the input data set is generated by GradientBoostingTreeDataGenerator.

-6. Linear Regression (LiR)
+6. Linear Regression (Linear)

-This workload benchmarks Linear Regression (LiR) implemented in Spark-MLLib with SGD optimizer. The input data set is generated by LinearRegressionDataGenerator.
+Linear Regression (Linear) is a workload that implemented in spark.mllib with SGD optimizer. The input data set is generated by LinearRegressionDataGenerator.

-7. Latent Dirichlet Allocation (lda)
+7. Latent Dirichlet Allocation (LDA)

-This workload benchmarks Latent Dirichlet Allocation (LDA) implemented in Spark-MLLib. The input data set is generated by LDADataGenerator.
+Latent Dirichlet allocation (LDA) is a topic model which infers topics from a collection of text documents. This workload is implemented in spark.mllib and the input data set is generated by LDADataGenerator.

 8. Principal Components Analysis (PCA)

-This workload benchmarks Principal Components Analysis (PCA) implemented in Spark-MLLib. The input data set is generated by PCADataGenerator.
+Principal component analysis (PCA) is a statistical method to find a rotation such that the first coordinate has the largest variance possible, and each succeeding coordinate in turn has the largest variance possible. PCA is used widely in dimensionality reduction. This workload is implemented in spark.mllib. The input data set is generated by PCADataGenerator.

 9. Random Forest (RF)

-This workload benchmarks Random Forest (RF) implemented in Spark-MLLib. The input data set is generated by RandomForestDataGenerator.
+Random forests (RF) are ensembles of decision trees. Random forests are one of the most successful machine learning models for classification and regression. They combine many decision trees in order to reduce the risk of overfitting. This workload is implemented in spark.mllib and the input data set is generated by RandomForestDataGenerator.

 10. Support Vector Machine (SVM)

-This workload benchmarks Support Vector Machine (SVM) implemented in Spark-MLLib. The input data set is generated by SVMDataGenerator.
+Support Vector Machine (SVM) is a standard method for large-scale classification tasks. This workload is implemented in spark.mllib and the input data set is generated by SVMDataGenerator.

 11. Singular Value Decomposition (SVD)

-This workload benchmarks Singular Value Decomposition (SVD) implemented in Spark-MLLib. The input data set is generated by SVDDataGenerator.
+Singular value decomposition (SVD) factorizes a matrix into three matrices. This workload is implemented in spark.mllib and its input data set is generated by SVDDataGenerator.


 **SQL:**

autogen/pom.xml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 <parent>
 <groupId>com.intel.hibench</groupId>
 <artifactId>hibench</artifactId>
-<version>7.0</version>
+<version>7.1-SNAPSHOT</version>
 </parent>

 <artifactId>autogen</artifactId>

common/pom.xml

Lines changed: 2 additions & 2 deletions
@@ -23,13 +23,13 @@
 <parent>
 <groupId>com.intel.hibench</groupId>
 <artifactId>hibench</artifactId>
-<version>7.0</version>
+<version>7.1-SNAPSHOT</version>
 </parent>

 <groupId>com.intel.hibench</groupId>
 <artifactId>hibench-common</artifactId>
 <packaging>jar</packaging>
-<version>7.0</version>
+<version>7.1-SNAPSHOT</version>
 <name>hibench-common</name>

 <dependencies>

conf/hibench.conf

Lines changed: 6 additions & 6 deletions
@@ -31,12 +31,12 @@ hibench.configure.dir ${hibench.home}/conf
 hibench.hdfs.data.dir ${hibench.hdfs.master}/HiBench

 # path of hibench jars
-hibench.hibench.datatool.dir ${hibench.home}/autogen/target/autogen-7.0-jar-with-dependencies.jar
-hibench.common.jar ${hibench.home}/common/target/hibench-common-7.0-jar-with-dependencies.jar
-hibench.sparkbench.jar ${hibench.home}/sparkbench/assembly/target/sparkbench-assembly-7.0-dist.jar
-hibench.streambench.stormbench.jar ${hibench.home}/stormbench/streaming/target/stormbench-streaming-7.0.jar
-hibench.streambench.gearpump.jar ${hibench.home}/gearpumpbench/streaming/target/gearpumpbench-streaming-7.0-jar-with-dependencies.jar
-hibench.streambench.flinkbench.jar ${hibench.home}/flinkbench/streaming/target/flinkbench-streaming-7.0-jar-with-dependencies.jar
+hibench.hibench.datatool.dir ${hibench.home}/autogen/target/autogen-7.1-SNAPSHOT-jar-with-dependencies.jar
+hibench.common.jar ${hibench.home}/common/target/hibench-common-7.1-SNAPSHOT-jar-with-dependencies.jar
+hibench.sparkbench.jar ${hibench.home}/sparkbench/assembly/target/sparkbench-assembly-7.1-SNAPSHOT-dist.jar
+hibench.streambench.stormbench.jar ${hibench.home}/stormbench/streaming/target/stormbench-streaming-7.1-SNAPSHOT.jar
+hibench.streambench.gearpump.jar ${hibench.home}/gearpumpbench/streaming/target/gearpumpbench-streaming-7.1-SNAPSHOT-jar-with-dependencies.jar
+hibench.streambench.flinkbench.jar ${hibench.home}/flinkbench/streaming/target/flinkbench-streaming-7.1-SNAPSHOT-jar-with-dependencies.jar

 #======================================================
 # workload home/input/ouput path
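
The jar-path edits in this hunk are mechanical: only the version token embedded in each jar file name changes. A minimal sketch of how such a bump could be scripted follows. It is not part of this commit; the function name `bump_jar_version` and its defaults are hypothetical, and it assumes the version always appears as `-<version>` immediately before an optional classifier and the `.jar` suffix, as in the lines above.

```python
import re


def bump_jar_version(line: str, old: str = "7.0", new: str = "7.1-SNAPSHOT") -> str:
    """Replace the version token in a jar file name, leaving other text untouched.

    Hypothetical helper for illustration; not taken from the HiBench repository.
    """
    # Match "-<old>" only when it is followed by an optional classifier
    # (e.g. "-jar-with-dependencies", "-dist") and then ".jar".
    pattern = re.compile(r"-" + re.escape(old) + r"(?=(?:-[\w-]+)?\.jar\b)")
    return pattern.sub("-" + new, line)


if __name__ == "__main__":
    sample = (
        "hibench.sparkbench.jar "
        "${hibench.home}/sparkbench/assembly/target/sparkbench-assembly-7.0-dist.jar"
    )
    print(bump_jar_version(sample))
```

Lines without a matching jar name, such as comments, pass through unchanged, and running the function on an already bumped line is a no-op because `-7.0` no longer occurs in it.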

flinkbench/pom.xml

Lines changed: 2 additions & 2 deletions
@@ -6,13 +6,13 @@
 <parent>
 <groupId>com.intel.hibench</groupId>
 <artifactId>hibench</artifactId>
-<version>7.0</version>
+<version>7.1-SNAPSHOT</version>
 </parent>

 <groupId>com.intel.hibench</groupId>
 <artifactId>flinkbench</artifactId>
 <packaging>pom</packaging>
-<version>7.0</version>
+<version>7.1-SNAPSHOT</version>
 <name>flinkbench</name>

 <properties>

flinkbench/streaming/pom.xml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@
 <parent>
 <groupId>com.intel.hibench</groupId>
 <artifactId>flinkbench</artifactId>
-<version>7.0</version>
+<version>7.1-SNAPSHOT</version>
 </parent>

 <groupId>com.intel.hibench.flinkbench</groupId>

gearpumpbench/pom.xml

Lines changed: 2 additions & 2 deletions
@@ -6,13 +6,13 @@
 <parent>
 <groupId>com.intel.hibench</groupId>
 <artifactId>hibench</artifactId>
-<version>7.0</version>
+<version>7.1-SNAPSHOT</version>
 </parent>

 <groupId>com.intel.hibench</groupId>
 <artifactId>gearpumpbench</artifactId>
 <packaging>pom</packaging>
-<version>7.0</version>
+<version>7.1-SNAPSHOT</version>
 <name>gearpumpbench</name>
 <properties>
 <gearpumpVersion>0.8.1</gearpumpVersion>

gearpumpbench/streaming/pom.xml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@
 <parent>
 <groupId>com.intel.hibench</groupId>
 <artifactId>gearpumpbench</artifactId>
-<version>7.0</version>
+<version>7.1-SNAPSHOT</version>
 </parent>

 <groupId>com.intel.hibench.gearpumpbench</groupId>

hadoopbench/mahout/pom.xml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 <parent>
 <groupId>com.intel.hibench</groupId>
 <artifactId>hadoopbench</artifactId>
-<version>7.0</version>
+<version>7.1-SNAPSHOT</version>
 </parent>

 <groupId>com.intel.hibench.hadoopbench</groupId>

hadoopbench/nutchindexing/pom.xml

Lines changed: 2 additions & 2 deletions
@@ -5,12 +5,12 @@
 <parent>
 <groupId>com.intel.hibench</groupId>
 <artifactId>hadoopbench</artifactId>
-<version>7.0</version>
+<version>7.1-SNAPSHOT</version>
 </parent>

 <groupId>com.intel.hibench.hadoopbench</groupId>
 <artifactId>nutchindexing</artifactId>
-<version>7.0</version>
+<version>7.1-SNAPSHOT</version>
 <packaging>jar</packaging>

 <dependencies>
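
Several of the POM hunks above change both the `<parent>` version and the module's own `<version>`, so the two have to stay in step. One way to spot a missed bump is sketched below. This is not part of the commit: the helper names `pom_versions` and `versions_match` are hypothetical, the sample POM is abbreviated, and the sketch assumes each `pom.xml` declares the standard Maven namespace `http://maven.apache.org/POM/4.0.0` on its root element.

```python
import xml.etree.ElementTree as ET

# Assumption: the POM root declares the standard Maven 4.0.0 namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def pom_versions(pom_text: str):
    """Return (parent_version, own_version); either may be None if absent."""
    root = ET.fromstring(pom_text)
    parent = root.findtext("m:parent/m:version", namespaces=POM_NS)
    own = root.findtext("m:version", namespaces=POM_NS)
    return parent, own


def versions_match(pom_text: str) -> bool:
    """True when the module inherits its version or repeats the parent's."""
    parent, own = pom_versions(pom_text)
    return own is None or own == parent


# Abbreviated, hypothetical POM modeled on the nutchindexing hunk.
SAMPLE = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent>
    <groupId>com.intel.hibench</groupId>
    <artifactId>hadoopbench</artifactId>
    <version>7.1-SNAPSHOT</version>
  </parent>
  <artifactId>nutchindexing</artifactId>
  <version>7.1-SNAPSHOT</version>
</project>"""

if __name__ == "__main__":
    print(pom_versions(SAMPLE), versions_match(SAMPLE))
```

A module that omits its own `<version>` inherits the parent's, which is why `versions_match` treats a missing value as consistent.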
