At the intersection of ML, Data Engineering & DevOps.
-
(Initial) Time to PoC: Standardize env & processes; access; track experiments
-
(Repeatable) Model deployment time: Standardize SCM, automate model build & deployment, automate governed access, centralize management of models
-
(Reliable) ML Defect Rate: Introduce auto testing, monitoring & lineage tracking. Standardize CI/CD & multi-account deployment.
-
(Scalable) ML Lifecycle Time: ML Lifecycle reproducible via templates across teams. Standardize infra & team onboarding.
-
Self-service secure infra deployment for ML use cases: Amazon SageMaker Projects, AWS Service Catalog, IaC
-
Auditability: SageMaker Experiments, Model Registry, Model Monitor, Model Dashboard & Model Cards
-
Increased Collaboration among Data Scientists: SageMaker Studio
-
Enable sustainability: SageMaker Processing & Pipelines
-
Embedded QA & Automated Testing: SageMaker Model Registry, CodePipeline & CodeBuild
-
Data Preparation: SageMaker (Spark) Processors
-
Explainability & Bias Reporting: SageMaker Clarify & Model Monitor
-
Production-ready ML Workflows: SageMaker Pipelines
-
Time to Value (Inception to Production)
-
Time to productionize existing ML use cases
-
Percent of Template Driven Development
-
Time to init new MLOps infra & ML Projects
-
Execute ML solutions w/o internet access in Private Cloud
-
Reduce Infra Costs
[Data     ]     [Data Curation,       ]     [Data prep,         ]
[Ingestion]---->[Quality & Cataloging ]---->[pipeline & sharing ]--.
                                                                   |
   .----------------------------------------------------------------'
   |
   |  ETL   [Data sampling]     [New feature]     [Model Build]     [Model Eval]
   |------->[& exploration]---->[engineering]---->[/Fine Tune ]---->[ & PoV    ]
   |
   |  ML    [Auto re-train]     [Model version]     [Model deployment]     [Model  ]
   '------->[at Scale     ]---->[& auditing   ]---->[& serve at scale ]---->[Monitor]
- Separation of Concerns: Platform Administration, Data, {Experimentation, Model Build, Model Test, Model Deployment}, ML Governance
- Experimentation: Notebooks
- Model Pipeline: pre-migration notebooks moved into a standardized project structure; CI/CD to run, test & debug each execution. Standardize repo structure for the ML build & deploy phases.
- Standardize data storage & versioning based on ML Pipelines
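One hypothetical project layout matching the build & deploy split above (every folder and file name here is an illustrative assumption, not a prescribed SageMaker convention):

```
ml-project/
├── notebooks/            # exploratory notebooks (pre-migration)
├── src/
│   ├── preprocessing/    # data prep code run by processing jobs
│   ├── training/         # training entry points
│   └── evaluation/       # model evaluation scripts
├── pipelines/            # SageMaker Pipeline definitions
├── tests/                # unit & integration tests run in CI/CD
└── buildspec.yml         # CI (CodeBuild) build & deploy steps
```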
- GitHub: Amazon SageMaker Examples
- GitHub: Amazon SageMaker 101 Workshop
- GitHub: Amazon SageMaker MLOps Workshop
- GitHub: Amazon SageMaker Secure MLOps
- GitHub: Amazon SageMaker Build-Train-Deploy Sample
- GitHub: Amazon SageMaker Notebook Instance Lifecycle Config Sample
- GitHub: Amazon SageMaker Safe Deployment Pipeline Sample
- GitHub: Amazon SageMaker MLflow Fargate Sample
- GitHub: Amazon SageMaker Generative AI Sample
- GitHub: Amazon SageMaker Distributed Training Workshop
-
Amazon SageMaker Experiments: track parameters & metrics across ML experiments
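A minimal sketch of how tracked parameters are shaped for the SageMaker Experiments API: the boto3 `create_trial_component` call takes a `Parameters` map of `{"StringValue": ...}` / `{"NumberValue": ...}` entries. The trial-component name and the hyperparameters below are hypothetical examples.

```python
# Sketch: build the Parameters payload for a SageMaker trial component.
# The value shape follows the boto3 create_trial_component API
# (TrialComponentParameterValue = {"StringValue": ...} or {"NumberValue": ...}).

def to_trial_component_parameters(params):
    """Map plain Python values to the TrialComponentParameterValue shape."""
    out = {}
    for name, value in params.items():
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            out[name] = {"NumberValue": float(value)}
        else:
            out[name] = {"StringValue": str(value)}
    return out

request = {
    "TrialComponentName": "xgb-train-run-001",   # hypothetical name
    "Parameters": to_trial_component_parameters(
        {"max_depth": 6, "eta": 0.2, "objective": "binary:logistic"}
    ),
}
# In a real account this dict would be passed to
# boto3.client("sagemaker").create_trial_component(**request)
```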
-
SageMaker Pipelines: Pre-process, Train, Evaluate & Register Models
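The four stages above map onto the JSON that SageMaker Pipelines compiles to: a versioned document with an ordered list of typed steps. A sketch of that shape, with hypothetical step names and empty `Arguments` (a real definition carries the full processing/training job specs in each step):

```python
# Sketch: the pipeline-definition JSON shape for the four stages in the note.
import json

def make_step(name, step_type):
    return {"Name": name, "Type": step_type, "Arguments": {}}

definition = {
    "Version": "2020-12-01",          # pipeline definition schema version
    "Parameters": [],
    "Steps": [
        make_step("Preprocess", "Processing"),
        make_step("Train", "Training"),
        make_step("Evaluate", "Processing"),   # evaluation runs as a processing job
        make_step("RegisterModel", "RegisterModel"),
    ],
}
pipeline_definition = json.dumps(definition)   # what gets uploaded as the pipeline
```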
-
SageMaker Model Registry: Store, Version & Trigger model promotion
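Store/version/promote can be sketched as two boto3 request payloads: `create_model_package` registers a version gated as `PendingManualApproval`, and `update_model_package` flips it to `Approved` (which is what typically triggers the deployment pipeline). Group name, image URI and model URL below are hypothetical.

```python
# Sketch: register a model version, then promote it, via the Model Registry APIs.

register_request = {
    "ModelPackageGroupName": "churn-models",          # hypothetical group
    "ModelApprovalStatus": "PendingManualApproval",   # gate promotion on review
    "InferenceSpecification": {
        "Containers": [{
            "Image": "111122223333.dkr.ecr.eu-west-1.amazonaws.com/xgb:latest",  # hypothetical
            "ModelDataUrl": "s3://my-bucket/model.tar.gz",                       # hypothetical
        }],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    },
}

def promote(model_package_arn):
    """Build the update_model_package request that approves a model version."""
    return {"ModelPackageArn": model_package_arn, "ModelApprovalStatus": "Approved"}
```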
-
SageMaker Projects: manage repos & CI/CD per project; organize the ML lifecycle under one namespace
-
SageMaker Model Monitor: Auto detection of data & model quality drift
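Drift detection runs on a schedule; a sketch of the scheduling part of a boto3 `create_monitoring_schedule` request that runs a data-quality check hourly. The schedule and job-definition names are hypothetical, and the full job definition (baseline constraints, inputs, outputs) is elided.

```python
# Sketch: hourly data-quality monitoring schedule for Model Monitor.

schedule_request = {
    "MonitoringScheduleName": "churn-data-quality-hourly",      # hypothetical
    "MonitoringScheduleConfig": {
        "ScheduleConfig": {"ScheduleExpression": "cron(0 * ? * * *)"},  # every hour
        "MonitoringJobDefinitionName": "churn-data-quality-job",        # hypothetical, defined separately
        "MonitoringType": "DataQuality",
    },
}
```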
-
SageMaker Lineage Tracking: track workflow steps; model & data lineage; establish model governance & audit
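A lineage graph is built from typed edges between entities; one such edge can be sketched as a boto3 `add_association` request linking a dataset artifact to the training run that consumed it. Both ARNs below are hypothetical placeholders.

```python
# Sketch: one lineage edge - a dataset artifact contributed to a training run.

association = {
    "SourceArn": "arn:aws:sagemaker:eu-west-1:111122223333:artifact/dataset-abc",            # hypothetical
    "DestinationArn": "arn:aws:sagemaker:eu-west-1:111122223333:experiment-trial-component/train-001",  # hypothetical
    "AssociationType": "ContributedTo",   # the dataset contributed to the training run
}
```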
-
ML Governance - Model Cards: create a single source of truth for model information; visibility with the SageMaker Model Dashboard
-
SageMaker Real-Time Endpoints, Batch Transform & Shadow Testing for model testing & deployment
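Shadow testing is configured on the endpoint: a `create_endpoint_config` request can carry both `ProductionVariants` (serving real traffic) and `ShadowProductionVariants` (receiving mirrored traffic whose responses are logged, not returned). Config, model names and instance type below are hypothetical.

```python
# Sketch: endpoint config serving v3 live while shadow-testing v4.

endpoint_config = {
    "EndpointConfigName": "churn-endpoint-config",   # hypothetical
    "ProductionVariants": [{
        "VariantName": "live",
        "ModelName": "churn-model-v3",               # hypothetical, serves real traffic
        "InitialInstanceCount": 1,
        "InstanceType": "ml.m5.large",
        "InitialVariantWeight": 1.0,
    }],
    "ShadowProductionVariants": [{
        "VariantName": "shadow",
        "ModelName": "churn-model-v4",               # hypothetical, gets mirrored traffic
        "InitialInstanceCount": 1,
        "InstanceType": "ml.m5.large",
        "InitialVariantWeight": 1.0,                 # fraction of traffic mirrored
    }],
}
```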
-
SageMaker Custom Project Templates & AWS Service Catalog for standardized environment initialization
-
SageMaker Data Wrangler, SageMaker Feature Store, AWS Lake Formation, AWS Glue & Amazon EMR for automated data flows