ETL-Project

Data Warehousing and Mining project (Extract, Transform, Load) built with Node.js Streams and MongoDB.

An ETL pipeline implemented with Node.js Streams for memory-efficient data processing, particularly when dealing with large datasets.

The pipeline's main goal is to gather data from diverse sources, transform it into a consistent format, and then load it into a MongoDB collection.

Step 1: Data is extracted from four distinct sources: two APIs, a JSON file, and a CSV file. The JSONStream and csv-parser libraries parse the JSON and CSV data into streams of objects.

Step 2: The extracted data streams are then transformed using a custom transform stream. The data transformation logic is applied to each chunk of data as it flows through the stream.

Step 3: In the loading phase, the transformed data is directly loaded into a MongoDB collection using the initializeUnorderedBulkOp method.

A chunked streaming API serves MongoDB datasets to the frontend in fixed-size batches, enabling efficient handling of large-scale data and reducing frontend load bottlenecks by approximately 40%.
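The chunked-delivery idea can be sketched as a generator that yields fixed-size slices of a result set, so each response carries one manageable chunk instead of the whole collection. The `chunked` helper and its parameters are illustrative, not the project's actual endpoint code.

```javascript
// Hypothetical helper behind a chunked API endpoint: yields the
// documents in fixed-size batches, one batch per response.
function* chunked(docs, size) {
  for (let i = 0; i < docs.length; i += size) {
    yield docs.slice(i, i + size);
  }
}
```

Against a live database, the same effect is usually achieved by iterating a MongoDB cursor with a limit per request rather than materializing the full array first.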

Tools and Technologies

Node.js, MongoDB, Express.js, Axios, Bootstrap, CSS, JavaScript

Demo video: ETL.Demo.Compressed.mp4
