This repository was archived by the owner on Jan 3, 2023. It is now read-only.

Commit faef27f

rui-mophilo-he
authored and committed
Refine SSM Deployment Guide (#2073)
1 parent 65d67ac commit faef27f

2 files changed

Lines changed: 17 additions & 6 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ HDFS Smart Storage Management [![Build Status](https://travis-ci.org/Intel-bigda
 
 **HDFS-SSM** is the major portion of the overall [Smart Data Management Initiative](https://github.com/Intel-bigdata/SSM/blob/trunk/docs/overall-initiative.md).
 
-In big data field, HDFS storage has been facing increasing pressure due to various workloads and demanding performance in recent years. The latest storage devices (Optane Memory, Optane SSD, NVMe SSD, etc.) can be used to improve the storage performance. Meanwhile HDFS provides all kinds of nice methodologies like HDFS Cache, Heterogeneous Storage Management (HSM) and Erasure Coding (EC), but it is a big challenge for users to make full utilization of these high-performance storage devices and HDFS storage options in a dynamic environment.
+In big data field, HDFS storage has been facing increasing pressure due to various workloads and demanding performance requirements in recent years. The latest storage devices (Optane Memory, Optane SSD, NVMe SSD, etc.) can be used to improve the storage performance. Meanwhile HDFS provides all kinds of nice methodologies like HDFS Cache, Heterogeneous Storage Management (HSM) and Erasure Coding (EC), but it is a big challenge for users to make full utilization of these high-performance storage devices and HDFS storage options in a dynamic environment.
 
 To overcome the challenge, we have introduced a comprehensive end-to-end solution, aka Smart Storage Management (SSM) in Apache Hadoop. HDFS operation data and system state information are collected, and based on the collected metrics, SSM can automatically make sophisticated usage of these methodologies to optimize HDFS storage efficiency.
 

docs/ssm-deployment-guide.md

Lines changed: 16 additions & 5 deletions
@@ -36,7 +36,7 @@ Download SSM branch from Github https://github.com/Intel-bigdata/SSM/
 
 mvn clean package -Pdist,web,hadoop-3.1 -DskipTests
 
-A tar distribution package will be generated under 'smart-dist/target'. unzip the tar distribution package to ${SMART_HOME} directory, the configuration files of SSM is under '${SMART_HOME}/conf'.
+A tar distribution package will be generated under 'smart-dist/target'. Unzip the tar distribution package to ${SMART_HOME} directory, and the configuration files of SSM is under '${SMART_HOME}/conf'.
 For more detailed information, please refer to BUILDING.txt file.
 
 # Configure SSM
# Configure SSM
@@ -80,19 +80,30 @@ Please note that SSM action will not be scheduled for files under ignored direct
 </property>
 ```
 
-### Fetch Dirs
+### Cover Dirs
 SSM will fetch the whole HDFS namespace by default when it starts. If you only care about the files under some directory, you can make a modification in smart-default.xml as the following shows.
 SSM will only fetch files under the given directories. For more than one directories, they should be separated by ",".
 
 The access info and other info related to fetched files will be considered. For other files not under the fetched directories, their info will be ignored.
 
 ```xml
 <property>
-<name>smart.fetch.dirs</name>
+<name>smart.cover.dirs</name>
 <value>/foodirA,/foodirB</value>
 </property>
 ```
 
+### Work Dir
+This HDFS directory is used as a tmp directory for SSM to store tmp files and data. The default path is "/system/ssm", and SSM will ignore files under the tmp directory.
+Only one directory can be set for this property.
+
+```xml
+<property>
+<name>smart.work.dir</name>
+<value>/system/ssm</value>
+</property>
+```
+
 ## **Configure Smart Server**
 
 SSM supports running multiple Smart Servers for high-availability. Only one of these Smart Servers can be in active state and provides services. One of the standby Smart Servers will take its place if the active Smart Server fails.
@@ -408,8 +419,8 @@ After we switch to the SmartFileSystem from the default HDFS implementation, we
 
 
 ## Validate the Hadoop Configuration
-After all these steps, a cluster restart is required. After the restart, try to run some simple test to see if
-the configuration takes effect. For example, you can try to run TestDFSIO workload.
+
+After all these steps, a cluster restart is required. After the restart, try to run some simple test to see if the configuration takes effect. For example, you can try to run TestDFSIO workload.
 
 * write data
 

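Reading aid for the Cover Dirs and Work Dir hunks in docs/ssm-deployment-guide.md: the Python sketch below models how the two property values described there might combine when deciding whether a file is considered. It is not SSM's implementation (SSM is written in Java). The function names `parse_dirs`, `is_under`, and `is_tracked` are hypothetical, and matching on whole path components is an assumption; the guide only states that multiple directories are separated by ",", that the whole namespace is fetched by default, and that files under the work directory are ignored.

```python
from posixpath import normpath
from typing import List


def parse_dirs(value: str) -> List[str]:
    """Split a comma-separated property value such as "/foodirA,/foodirB"."""
    return [normpath(d.strip()) for d in value.split(",") if d.strip()]


def is_under(path: str, directory: str) -> bool:
    """True if path equals directory or lies below it (assumed component-wise match)."""
    path = normpath(path)
    directory = normpath(directory)
    if directory == "/":
        return True
    return path == directory or path.startswith(directory + "/")


def is_tracked(path: str, cover_dirs: str, work_dir: str = "/system/ssm") -> bool:
    """Model of the documented behavior, under the assumptions stated above."""
    # Files under the work (tmp) directory are ignored.
    if is_under(path, work_dir):
        return False
    dirs = parse_dirs(cover_dirs)
    # No cover dirs configured: the whole namespace is fetched by default.
    if not dirs:
        return True
    return any(is_under(path, d) for d in dirs)
```

For instance, with the value `/foodirA,/foodirB` from the hunk, this sketch treats `/foodirA/x.txt` as covered and `/foodirAB/x` as not covered, because the comparison is done on directory boundaries rather than raw string prefixes.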