This project shows how various Azure services and tools fit together to implement an end-to-end data project.
- Move on-premises data to the cloud using Azure Data Factory with a self-hosted integration runtime (IR).
- Integrate Data Factory with GitHub for version control.
- Use Azure Data Lake Storage Gen2 (ADLS) as a central data repository.
- Improve reliability with Azure Logic Apps for email alerts and Azure Monitor for detailed tracking.
- Perform data cleaning and transformations using Azure Databricks.
- Implement security with Azure Key Vault + Databricks Secret Scope to protect sensitive secrets.
- Analyze data with Azure Synapse Analytics using SQL.
- Connect ADLS to Power BI for visualizations.
The sales deals dataset (from Maven Analytics) is located on-premises. It contains CSV files, including accounts, products, sales teams, and sales opportunities, describing the B2B sales pipeline of a fictitious company that sells computer hardware.
- Create an Azure resource group.
- Create an Azure storage account (ADLS Gen2).
- Create an Azure Data Factory (ADF) instance.
- Create two containers: raw-data and transformed-data.
- Link a GitHub repository to Azure Data Factory for version control.
- Create a self-hosted integration runtime (IR).
- Download and run the IR installer on the on-premises machine.
- Create a linked service (File System type) to connect the on-premises source to the cloud, provide the path to the on-premises data, and test the connection.
- Create a pipeline in ADF and configure the source (the on-premises path, using a wildcard file path *.csv) and the sink (the raw-data container in the storage account).
- Create a Logic App. Add the trigger "When a HTTP request is received" and paste the following JSON schema into the Request Body section:
{
  "type": "object",
  "properties": {
    "pipelinename": { "type": "string" },
    "status": { "type": "string" },
    "pipelineid": { "type": "string" },
    "time": { "type": "string" }
  }
}
- Click "Add an action", add "Send an email", select the Body and Subject parameters, and use the trigger's dynamic content so the email contains the following details:
Pipeline Details:
Name: @pipelinename
Status: @status
ID: @pipelineid
Time: @time UTC
- In Azure Monitor, under Metrics, add the Failed pipeline runs and Succeeded pipeline runs (Count) metrics to track the factory.
- In the pipeline, add a Web activity for success and one for failure, provide the HTTP POST URL from the Logic App, and configure the body as follows (set "status" to "Succeeded" or "Failed" to match the activity):
{ "pipelinename": "@{pipeline().Pipeline}", "status": "Failed", "pipelineid": "@{pipeline().RunId}", "time": "@{utcNow()}" }
- Validate the pipeline and run it by triggering manually or on a schedule.
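Before wiring the Web activities into the pipeline, it can help to test the Logic App trigger by hand with a payload that matches the schema above. A minimal Python sketch (the pipeline name and run ID values here are placeholders, and the trigger URL in the comment is illustrative, not a real endpoint):

```python
import json

def build_alert_payload(pipeline_name, status, run_id, time_utc):
    """Build the JSON body expected by the Logic App's HTTP trigger schema."""
    return json.dumps({
        "pipelinename": pipeline_name,
        "status": status,
        "pipelineid": run_id,
        "time": time_utc,
    })

# Placeholder values standing in for ADF's @{pipeline().Pipeline},
# @{pipeline().RunId} and @{utcNow()} system variables.
payload = build_alert_payload("CopyOnPremToRaw", "Failed",
                              "0001-run-id", "2024-01-01T00:00:00Z")

# To actually fire the alert email, POST this payload to the trigger URL
# copied from the Logic App designer, e.g. with urllib.request:
# req = urllib.request.Request(TRIGGER_URL, data=payload.encode(),
#                              headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)
```

Posting a hand-built payload like this confirms the email formatting before any pipeline run depends on it.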
- The on-premises data should now be moved to the raw-data container in storage.
- Create an Azure Key Vault -> under Objects -> Secrets, create a secret whose value is the storage account access key (storage account -> Security + networking -> Access keys -> key1), then click Create.
- Create Databricks workspace.
- Create a new notebook in the workspace, then create a secret scope to connect the storage account and Databricks: copy the workspace URL and append #secrets/createScope, e.g. https://adb-1025563089158539.19.azuredatabricks.net/?o=1025563089158539#secrets/createScope
- In Key Vault, open Properties, copy the Vault URI and Resource ID, and paste them into the DNS Name and Resource ID fields of the Create Secret Scope page.
- Create the compute cluster first, since it takes time to start.
- Mount the raw-data container to Databricks. Because the secret stores the storage account access key (not a SAS token), use the fs.azure.account.key configuration:
dbutils.fs.mount(
    source = "wasbs://raw-data@azureprojectspractice.blob.core.windows.net",
    mount_point = "/mnt/raw-data",
    extra_configs = {"fs.azure.account.key.azureprojectspractice.blob.core.windows.net":
        dbutils.secrets.get(scope = "databricks scope1", key = "azuresecret1")})
- Verify the mount using: dbutils.fs.ls("/mnt/raw-data")
- Repeat the same to mount the transformed-data container, replacing raw-data with transformed-data:
dbutils.fs.mount(
    source = "wasbs://transformed-data@azureprojectspractice.blob.core.windows.net",
    mount_point = "/mnt/transformed-data",
    extra_configs = {"fs.azure.account.key.azureprojectspractice.blob.core.windows.net":
        dbutils.secrets.get(scope = "databricks scope1", key = "azuresecret1")})
- Check the notebook for more transformation details.
- Write the cleaned data back to the transformed-data container in ADLS.
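The kind of cleaning done in the notebook can be sketched locally without a cluster. A minimal Python example using the standard library (the column names and sample rows are illustrative, not the exact Maven Analytics schema):

```python
import csv
import io

# Sample rows standing in for one of the raw CSVs (illustrative columns).
raw_csv = io.StringIO(
    "account,sales_agent,close_value\n"
    "Acme Corp , alice,1200\n"
    "Hooli,bob,\n"
)

cleaned = []
for row in csv.DictReader(raw_csv):
    cleaned.append({
        "account": row["account"].strip(),            # trim stray whitespace
        "sales_agent": row["sales_agent"].strip(),
        "close_value": int(row["close_value"] or 0),  # fill missing values with 0
    })

# On Databricks the cleaned rows would be written back to the mount instead, e.g.
# spark.createDataFrame(cleaned).write.csv("/mnt/transformed-data/accounts")
```

The same pattern (normalize text fields, fill missing numerics) is what the Databricks notebook applies at scale with Spark.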
- Terminate the cluster to avoid charges.
- Create a Synapse Analytics workspace.
- Create a SQL database and then create views (virtual tables; the underlying data is not stored in Synapse).
- Create the queries needed for reporting; check the SQL query set for details.
- Open Power BI and connect it to ADLS Gen2.
- Provide the URL of the transformed-data container.
- Change blob to dfs in the URL to use the Data Lake endpoint.
- Authenticate, transform the data, select the needed files, and start building visualizations.
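The blob-to-dfs endpoint swap is a simple string substitution; a small sketch using the storage account name from this project:

```python
def to_dfs_url(blob_url: str) -> str:
    """Rewrite an Azure Blob endpoint URL to the ADLS Gen2 (dfs) endpoint."""
    return blob_url.replace(".blob.core.windows.net", ".dfs.core.windows.net")

url = to_dfs_url("https://azureprojectspractice.blob.core.windows.net/transformed-data")
# url -> "https://azureprojectspractice.dfs.core.windows.net/transformed-data"
```

Power BI's ADLS Gen2 connector expects the dfs endpoint, which is why the step above rewrites the URL copied from the container's properties.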
