An app that creates a DAG inside MWAA that takes a dataset and builds a classifier model from its feature columns and target column. Three candidate algorithms are trained: SVM, Logistic Regression, and Decision Tree, and the one with the best accuracy is selected. Finally, the model is deployed as a Lambda function.
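The selection step can be sketched as follows. This is a minimal stand-alone example using the scikit-learn implementations of the three algorithms; the function and variable names are illustrative, not the DAG's actual code:

```python
# Sketch of the "pick the best of three classifiers" step (names are assumptions).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def select_best_model(X, y, seed=42):
    """Train all three candidates on the same split; return the most accurate one."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    candidates = {
        "svm": SVC(),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
    }
    # held-out accuracy for each candidate
    scores = {name: model.fit(X_tr, y_tr).score(X_te, y_te) for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best], scores


if __name__ == "__main__":
    data = load_iris()
    name, model, scores = select_best_model(data.data, data.target)
    print(name, scores)
```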
To keep things simple, no external dependencies (custom Docker images) are added, and training happens locally in Airflow; the trained model is then deployed as a Lambda function. This is not ideal, since such workloads are usually off-loaded (e.g. to SageMaker, or to EC2 / AWS Batch jobs), but quickly trained models can still technically run on the local executor.
The only input the DAG takes is an `airflow/variables/dataset_spec` secret in the Secrets Manager service, like the following one:
```json
{
  "url": "https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv",
  "name": "iris.data",
  "feature_columns": ["sepal.length", "sepal.width", "petal.length", "petal.width"],
  "target_column": "variety"
}
```

Prerequisites:

- LocalStack
- Docker
- Python 3.8+ / Python Pip
- `make`
- `jq`
- `curl`
- `awslocal`
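One way to create the input secret is with `awslocal` against the running LocalStack instance. Here `dataset_spec.json` is an assumed local file holding the JSON spec shown above:

```shell
# Create the dataset spec secret that the DAG reads as an Airflow variable.
# dataset_spec.json is a hypothetical file containing the JSON spec above.
awslocal secretsmanager create-secret \
  --name airflow/variables/dataset_spec \
  --secret-string file://dataset_spec.json
```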
To install the dependencies:
```shell
make install
```

Make sure that LocalStack is started:
```shell
LOCALSTACK_AUTH_TOKEN=... make start
```

Run the sample demo script:
```shell
make run
```

To proxy Airflow variables to upstream AWS, you can use the `proxy.conf` config file so that only upstream AWS secrets are used as the Airflow variables. This works because the Airflow variables are sourced from the AWS Secrets Manager backend. It assumes the localstack-extension-aws-replicator extension is installed on the LocalStack instance: https://pypi.org/project/localstack-extension-aws-replicator/.
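The exact schema of `proxy.conf` is defined by the aws-replicator extension, so consult its documentation for the authoritative format. As a rough, unverified sketch, a config that forwards only matching Secrets Manager resources to upstream AWS might look like this (the service key and resource regex are assumptions):

```yaml
# Hypothetical proxy.conf sketch -- check the aws-replicator docs for the real schema.
services:
  secretsmanager:
    # forward only secrets whose ARN matches this pattern to upstream AWS
    resources:
      - '.*dataset_spec.*'
```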
```shell
localstack aws proxy -c proxy.conf --container
```

This code is available under the Apache 2.0 license.