This repository was archived by the owner on Dec 15, 2025. It is now read-only.
There are 19 workloads in HiBench in total. The workloads are divided into 6 categories.

1. Bayesian Classification (Bayes)

    Naive Bayes is a simple multiclass classification algorithm that assumes independence between every pair of features. This workload is implemented in spark.mllib and uses automatically generated documents whose words follow a Zipfian distribution. The dictionary used for text generation is the default Linux word file /usr/share/dict/linux.words.

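To illustrate the algorithm being benchmarked (a minimal pure-Python sketch on invented toy documents, not the HiBench or spark.mllib code), a multinomial Naive Bayes classifier with Laplace smoothing can be written as:

```python
import math
from collections import Counter, defaultdict

# Toy training corpus: (document tokens, class label) -- invented for illustration
docs = [
    (["spark", "hadoop", "cluster"], "bigdata"),
    (["spark", "benchmark", "workload"], "bigdata"),
    (["goal", "match", "team"], "sports"),
    (["team", "score", "match"], "sports"),
]

# Per-class word frequencies and document counts
word_counts = defaultdict(Counter)
class_counts = Counter()
for tokens, label in docs:
    class_counts[label] += 1
    word_counts[label].update(tokens)

vocab = {w for tokens, _ in docs for w in tokens}

def predict(tokens):
    """Pick the class maximizing log P(c) + sum log P(w|c), Laplace-smoothed."""
    best, best_score = None, float("-inf")
    for c in class_counts:
        total = sum(word_counts[c].values())
        score = math.log(class_counts[c] / len(docs))
        for w in tokens:
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

print(predict(["spark", "workload"]))   # bigdata
```

The independence assumption shows up directly: the score is a plain sum of per-word log-probabilities.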
2. K-means clustering (Kmeans)

    This workload tests K-means clustering, a well-known algorithm for knowledge discovery and data mining, in spark.mllib. The input data set is generated by GenKMeansDataset based on a uniform distribution and a Gaussian distribution.

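The clustering loop itself can be sketched in plain Python (a toy Lloyd's-algorithm implementation on made-up 2-D points, not the spark.mllib version):

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns the final centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = tuple(sum(x) / len(cl) for x in zip(*cl))
    return centroids

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(kmeans(pts, 2)))
```

On this toy data the two centroids settle on the means of the two obvious point groups.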
3. Logistic Regression (LR)

    Logistic Regression (LR) is a popular method for predicting a categorical response. This workload is implemented in spark.mllib with the L-BFGS optimizer, and the input data set is generated by LogisticRegressionDataGenerator based on a random balanced decision tree. It contains three kinds of data: categorical, continuous, and binary.

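As a sketch of the underlying model (spark.mllib trains with L-BFGS; this toy uses plain per-sample gradient descent, and the data is invented):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=200):
    """Per-sample gradient descent on the logistic loss; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi                     # gradient of the log-loss
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Toy 1-D data: label 1 when the feature is positive
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
print(sigmoid(w[0] * 1.5 + b) > 0.5)   # True
```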
4. Alternating Least Squares (ALS)

    The alternating least squares (ALS) algorithm is a well-known algorithm for collaborative filtering. This workload is implemented in spark.mllib, and the input data set is generated by RatingDataGenerator for a product recommendation system.

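The alternating idea can be shown on a toy rank-1 problem (a hand-rolled sketch, not the spark.mllib ALS, which fits higher-rank factors with regularization):

```python
def als_rank1(R, iters=20):
    """Rank-1 ALS: find u, v minimizing sum over i,j of (R[i][j] - u[i]*v[j])^2."""
    m, n = len(R), len(R[0])
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(iters):
        # Fix v, solve each u[i] in closed form (1-D least squares)
        u = [sum(R[i][j] * v[j] for j in range(n)) / sum(vj * vj for vj in v)
             for i in range(m)]
        # Fix u, solve each v[j] the same way
        v = [sum(R[i][j] * u[i] for i in range(m)) / sum(ui * ui for ui in u)
             for j in range(n)]
    return u, v

# A rating matrix that is exactly rank 1: R[i][j] = a[i] * b[j]
R = [[1, 2, 3],
     [2, 4, 6],
     [3, 6, 9]]
u, v = als_rank1(R)
print(round(u[1] * v[2], 3))
```

Because each subproblem is linear when the other factor is fixed, every half-step has a closed-form solution; on this exactly rank-1 matrix the factors reproduce the ratings.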
5. Gradient Boosting Trees (GBT)

    Gradient-boosted trees (GBT) is a popular regression method using ensembles of decision trees. This workload is implemented in spark.mllib, and the input data set is generated by GradientBoostingTreeDataGenerator.

6. Linear Regression (Linear)

    Linear Regression (Linear) is a workload implemented in spark.mllib with the SGD optimizer. The input data set is generated by LinearRegressionDataGenerator.

7. Latent Dirichlet Allocation (LDA)

    Latent Dirichlet allocation (LDA) is a topic model that infers topics from a collection of text documents. This workload is implemented in spark.mllib, and the input data set is generated by LDADataGenerator.

8. Principal Components Analysis (PCA)

    Principal component analysis (PCA) is a statistical method that finds a rotation such that the first coordinate has the largest possible variance, and each succeeding coordinate in turn has the largest possible variance. PCA is widely used for dimensionality reduction. This workload is implemented in spark.mllib. The input data set is generated by PCADataGenerator.

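The first principal component can be illustrated with power iteration on a toy 2-D data set (a pure-Python sketch; spark.mllib computes PCA with distributed linear algebra):

```python
import math

def first_component(data, iters=100):
    """Power iteration on the 2x2 covariance matrix of 2-D points."""
    n = len(data)
    # Center the data
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    pts = [(x - mx, y - my) for x, y in data]
    # Covariance matrix entries
    cxx = sum(x * x for x, _ in pts) / n
    cxy = sum(x * y for x, y in pts) / n
    cyy = sum(y * y for _, y in pts) / n
    v = (1.0, 0.0)
    for _ in range(iters):
        # Multiply by the covariance matrix, then renormalize
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(*w)
        v = (w[0] / norm, w[1] / norm)
    return v

# Points spread along the y = x direction
data = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.05)]
v = first_component(data)
print(round(v[0], 2), round(v[1], 2))   # approximately 0.71 0.71
```

Repeated multiplication by the covariance matrix amplifies the direction of largest variance, so the iterate converges to the first principal axis.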
9. Random Forest (RF)

    Random forests (RF) are ensembles of decision trees and one of the most successful machine learning models for classification and regression. They combine many decision trees in order to reduce the risk of overfitting. This workload is implemented in spark.mllib, and the input data set is generated by RandomForestDataGenerator.

10. Support Vector Machine (SVM)

    Support Vector Machine (SVM) is a standard method for large-scale classification tasks. This workload is implemented in spark.mllib, and the input data set is generated by SVMDataGenerator.

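A sketch of a linear SVM trained by subgradient descent on the hinge loss (invented toy data; the spark.mllib trainer optimizes the same L2-regularized hinge loss, though its code differs):

```python
def fit_svm(X, y, lam=0.01, lr=0.01, epochs=200):
    """Subgradient descent on the L2-regularized hinge loss.
    Labels y must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Point inside the margin: hinge subgradient is active
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Outside the margin: only the regularizer acts
                w = [wj - lr * lam * wj for wj in w]
    return w, b

# Linearly separable toy data in 1-D
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [-1, -1, 1, 1]
w, b = fit_svm(X, y)
print(all((w[0] * xi[0] + b > 0) == (yi > 0) for xi, yi in zip(X, y)))  # True
```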
11. Singular Value Decomposition (SVD)

    Singular value decomposition (SVD) factorizes a matrix into three matrices. This workload is implemented in spark.mllib, and its input data set is generated by SVDDataGenerator.