|
| 1 | +# Kinesis Data Analytics Writing Stream Data to Amazon S3 Bucket as a Sink |
| 2 | + |
| 3 | +This pattern deploys the infrastructure that Kinesis Data Analytics needs to write streaming data to an Amazon S3 bucket. |
| 4 | + |
| 5 | +With Amazon Kinesis Data Analytics for Apache Flink, you can use Java, Scala, or Python to process and analyze streaming data. The service enables you to author and run code against streaming sources to perform time-series analytics, feed real-time dashboards, and create real-time metrics. |
| 6 | + |
| 7 | +Kinesis Data Analytics provides the underlying infrastructure for your Apache Flink applications. It handles core capabilities like provisioning compute resources, parallel computation, automatic scaling, and application backups (implemented as checkpoints and snapshots). You can use the high-level Flink programming features (such as operators, functions, sources, and sinks) in the same way that you use them when hosting the Flink infrastructure yourself. |
| 8 | + |
| 9 | +In this project, you create an Amazon Kinesis Data Analytics for Apache Flink application that has a Kinesis data stream as a source and an Amazon S3 bucket as a sink. |
| 10 | + |
| 11 | +Learn more about this pattern at [Serverless Land Patterns](https://serverlessland.com/patterns/firehose-dataanalytics-flink-s3-sink). |
| 12 | + |
| 13 | +Important: This application uses various AWS services, and there are costs associated with these services after Free Tier usage. See the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example. |
| 14 | + |
| 15 | +## Requirements |
| 16 | + |
| 17 | +* [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources. |
| 18 | +* [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured |
| 19 | +* [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) installed |
| 20 | +* [Terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli?in=terraform/aws-get-started) installed |
| 21 | + |
| 22 | +## Deployment Instructions |
| 23 | + |
| 24 | +1. Clone the project to your local working directory |
| 25 | + |
| 26 | + ```sh |
| 27 | + git clone https://github.com/aws-samples/serverless-patterns/ |
| 28 | + ``` |
| 29 | + |
| 30 | +2. Change the working directory to this pattern's directory |
| 31 | + |
| 32 | + ```sh |
| 33 | + cd serverless-patterns/firehose-dataanalytics-flink-s3-sink |
| 34 | + ``` |
| 35 | + |
| 36 | +3. From the command line, initialize Terraform to download and install the providers defined in the configuration: |
| 37 | + ```sh |
| 38 | + terraform init |
| 39 | + ``` |
| 40 | +
|
| 41 | +4. From the command line, apply the configuration in the main.tf file: |
| 42 | + ```sh |
| 43 | + terraform apply |
| 44 | + ``` |
| 45 | +
|
| 46 | +5. During the prompts: |
| 47 | + - Enter yes |
| 48 | +
|
| 49 | +## How it works |
| 50 | +
|
| 51 | + |
| 52 | +
|
| 53 | +This pattern deploys a Kinesis Analytics streaming application, Kinesis Stream, a destination S3 bucket, and all of the additional required infrastructure services. |
| 54 | +
|
| 57 | +**Note:** The default region is `us-east-1`. To deploy to a different region, set the `region` variable. |
| 58 | +
|
| 59 | +**Note:** Terraform variables can be supplied in several ways; see the [Terraform documentation](https://developer.hashicorp.com/terraform/language/values/variables) for details. |
| 60 | +
|
| 61 | +## Testing |
| 62 | +
|
| 63 | +To test this project, follow these steps: |
| 64 | +
|
| 65 | +1. Sign in to the AWS console at https://console.aws.amazon.com |
| 66 | +
|
| 67 | +2. Navigate to Amazon Kinesis and open the Analytics applications page, which lists all of the streaming applications. |
| 68 | +
|
| 69 | +3. Select the streaming application that you created as part of the deployment stack. |
| 70 | +
|
| 71 | +4. Press the **Run** button on the upper panel. On the next screen, choose 'Run with latest snapshot' and press the **Run** button at the bottom of the screen. Wait until the streaming application has started successfully. |
| 72 | +
|
| 73 | +5. Generate data: |
| 74 | + ```sh |
| 75 | + cd serverless-patterns/firehose-dataanalytics-flink-s3-sink/test |
| 76 | + python stock.py |
| 77 | + ``` |
| 78 | + The `stock.py` script generates stream data and puts it into the Kinesis stream that you created as part of the deployment stack. |
| 79 | +
|
| 80 | + Note: Before you run `stock.py`, change `STREAM_NAME` and `region_name` in it to match your deployment. |
| 81 | +
|
| 82 | +6. Wait a few minutes, then stop the data generation process. |
| 83 | + |
| 84 | +7. Go to S3 in the AWS console and select the bucket that you created as part of the deployment stack. You should now see the stock data partitioned by stock symbol, for example: ticker:AAPL/, ticker:AMZN/, etc. |
| 85 | +
|
| 86 | +8. For further analysis, go back to the Amazon Kinesis page in the AWS console and open the Analytics applications page. Select the streaming application that you created and press the **Open Apache Flink dashboard** button on the upper panel. |
| 87 | +
|
| 88 | +9. The dashboard lists the running jobs. Select the job that is currently in the 'RUNNING' status to drill down for further analysis. |
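
Steps 5 and 7 above can also be sketched in Python. The following is a minimal sketch, not the repository's `test/stock.py`, which is not reproduced here and may differ. It assumes records shaped as `event_time`, `ticker`, and `price`; the stream name `ExampleInputStream`, the ticker list, and all function names are hypothetical. It also assumes the partition folders sit at the bucket root (pass `prefix` otherwise). The clients are passed in as arguments so the logic can be exercised without AWS access.

```python
import datetime
import json
import random

# Assumed values: replace with the stream name and region of your deployment.
STREAM_NAME = "ExampleInputStream"  # hypothetical stream name
REGION_NAME = "us-east-1"           # the pattern's default region

TICKERS = ["AAPL", "AMZN", "MSFT", "INTC", "TBV"]  # illustrative symbols


def build_record(rng=random):
    """Return one synthetic stock tick as a dict (assumed record shape)."""
    return {
        "event_time": datetime.datetime.now().isoformat(),
        "ticker": rng.choice(TICKERS),
        "price": round(rng.random() * 100, 2),
    }


def send_records(kinesis_client, stream_name, count):
    """Put `count` synthetic records on the stream and return them."""
    sent = []
    for _ in range(count):
        record = build_record()
        kinesis_client.put_record(
            StreamName=stream_name,
            Data=json.dumps(record).encode("utf-8"),
            PartitionKey=record["ticker"],
        )
        sent.append(record)
    return sent


def list_prefixes(s3_client, bucket, prefix=""):
    """Return the "folder" prefixes directly under `prefix` in the bucket."""
    response = s3_client.list_objects_v2(
        Bucket=bucket, Prefix=prefix, Delimiter="/"
    )
    return [p["Prefix"] for p in response.get("CommonPrefixes", [])]


def main(bucket_name):
    """Send a batch of records, then print the bucket's top-level prefixes."""
    import boto3  # third-party; needs configured AWS credentials

    kinesis = boto3.client("kinesis", region_name=REGION_NAME)
    send_records(kinesis, STREAM_NAME, count=100)

    s3 = boto3.client("s3", region_name=REGION_NAME)
    print(list_prefixes(s3, bucket_name))
```

To try it against a real deployment, call `main("<your-bucket-name>")` after the streaming application is running. Output may take a few minutes to reach the bucket, so the prefix list can be empty on the first call.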
| 89 | +
|
| 90 | +## Cleanup |
| 91 | +
|
| 92 | +1. Change directory to the pattern directory: |
| 93 | + ```sh |
| 94 | + cd serverless-patterns/firehose-dataanalytics-flink-s3-sink |
| 95 | + ``` |
| 96 | +
|
| 97 | +2. Delete all created resources |
| 98 | + ```sh |
| 99 | + terraform destroy |
| 100 | + ``` |
| 101 | +
|
| 102 | +3. During the prompts: |
| 103 | + * Enter yes |
| 104 | +
|
| 105 | +4. Confirm that all created resources have been deleted |
| 106 | + ```sh |
| 107 | + terraform show |
| 108 | + ``` |
| 109 | +
|
| 110 | +## Reference |
| 111 | +- [Amazon Kinesis Data Analytics for Apache Flink](https://docs.aws.amazon.com/kinesisanalytics/latest/java/what-is.html) |
| 112 | +- [Send Streaming Data to Amazon S3 in Python](https://docs.aws.amazon.com/kinesisanalytics/latest/java/examples-python-s3.html) |
| 113 | +
|
| 114 | +---- |
| 115 | +Copyright 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved. |
| 116 | +
|
| 117 | +SPDX-License-Identifier: MIT-0 |