**HDFS-SSM** is the major portion of the overall [Smart Data Management Initiative](https://github.com/Intel-bigdata/SSM/blob/trunk/docs/overall-initiative.md).

In the big data field, HDFS storage has faced increasing pressure from diverse workloads and demanding performance requirements in recent years. The latest storage devices (Optane Memory, Optane SSD, NVMe SSD, etc.) can be used to improve storage performance. Meanwhile, HDFS provides a variety of useful mechanisms, such as HDFS Cache, Heterogeneous Storage Management (HSM), and Erasure Coding (EC), but it remains a big challenge for users to make full use of these high-performance storage devices and HDFS storage options in a dynamic environment.

To overcome this challenge, we have introduced a comprehensive end-to-end solution, aka Smart Storage Management (SSM), in Apache Hadoop. HDFS operation data and system state information are collected, and based on the collected metrics, SSM can automatically make sophisticated use of these mechanisms to optimize HDFS storage efficiency.

A tar distribution package will be generated under 'smart-dist/target'. Unzip it to the ${SMART_HOME} directory; the SSM configuration files are under '${SMART_HOME}/conf'.
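
As a quick sketch (the exact tarball name depends on the SSM version you built, so the wildcard below is illustrative):

```sh
# Unpack the generated distribution into ${SMART_HOME}; the tarball name
# varies with the SSM version, so the wildcard is illustrative only.
mkdir -p ${SMART_HOME}
tar -zxvf smart-dist/target/smart-data-*.tar.gz -C ${SMART_HOME}

# The SSM configuration files live under ${SMART_HOME}/conf.
ls ${SMART_HOME}/conf
```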

For more detailed information, please refer to the BUILDING.txt file.
# Configure SSM

### Ignore Dirs

Please note that SSM actions will not be scheduled for files under ignored directories.

### Cover Dirs

SSM fetches the whole HDFS namespace by default when it starts. If you only care about files under certain directories, you can modify smart-default.xml as shown below.

SSM will then only fetch files under the given directories. Multiple directories should be separated by ",".

Access information and other metadata related to files under the covered directories will be considered; for files outside the covered directories, such information will be ignored.

```xml
<property>
  <name>smart.cover.dirs</name>
  <value>/foodirA,/foodirB</value>
</property>
```
### Work Dir

This HDFS directory is used as a temporary directory where SSM stores temporary files and data. The default path is "/system/ssm", and SSM ignores files under this directory.

Only one directory can be set for this property.

```xml
<property>
  <name>smart.work.dir</name>
  <value>/system/ssm</value>
</property>
```
## **Configure Smart Server**

SSM supports running multiple Smart Servers for high availability. Only one of these Smart Servers is in the active state and provides services; one of the standby Smart Servers will take its place if the active Smart Server fails.
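
As a purely hypothetical illustration (the file name and layout below are assumptions, not something this section confirms; consult the SSM deployment documentation for the authoritative steps), an HA setup of this kind typically lists the participating Smart Server hostnames in a configuration file so that each server knows its peers:

```sh
# Hypothetical sketch: list all Smart Server hostnames, one per line.
# The path ${SMART_HOME}/conf/servers is assumed for illustration only.
cat ${SMART_HOME}/conf/servers
# ssm-server-1.example.com
# ssm-server-2.example.com
```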
## Validate the Hadoop Configuration

After all these steps, a cluster restart is required. After the restart, try running some simple tests to see whether the configuration takes effect. For example, you can run the TestDFSIO workload.
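
As a minimal smoke test (the jar path and option names vary across Hadoop versions, so adjust them to your installation):

```sh
# Write 10 files of 128 MB each through the configured file system.
# Note: older Hadoop releases use -fileSize instead of -size.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
  TestDFSIO -write -nrFiles 10 -size 128MB

# Read the same files back and report the measured throughput.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
  TestDFSIO -read -nrFiles 10 -size 128MB

# Clean up the generated test data.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
  TestDFSIO -clean
```

If SmartFileSystem is configured correctly, these jobs should run through it transparently, with no change to the TestDFSIO command itself.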