Commit e4609ce

Merge pull request aws-samples#1625 from nareshrajaram2017/nareshrd-feature-lambda-esm-kinesis-filters-terraform
New serverless pattern - lambda-esm-kinesis-filters-terraform
2 parents 42ecd89 + 3ee1f2f commit e4609ce

36 files changed

Lines changed: 1570 additions & 0 deletions
Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
# Kinesis Data Analytics Writing Stream Data to Amazon S3 Bucket as a Sink

The purpose of this pattern is to deploy the infrastructure necessary to enable Kinesis Data Analytics to write streaming data to an Amazon S3 bucket.

With Amazon Kinesis Data Analytics for Apache Flink, you can use Java, Scala, or Python to process and analyze streaming data. The service enables you to author and run code against streaming sources to perform time-series analytics, feed real-time dashboards, and create real-time metrics.

Kinesis Data Analytics provides the underlying infrastructure for your Apache Flink applications. It handles core capabilities like provisioning compute resources, parallel computation, automatic scaling, and application backups (implemented as checkpoints and snapshots). You can use the high-level Flink programming features (such as operators, functions, sources, and sinks) in the same way that you use them when hosting the Flink infrastructure yourself.

In this project, you create an Amazon Kinesis Data Analytics for Apache Flink application that has a Kinesis data stream as a source and an Amazon S3 bucket as a sink.

Learn more about this pattern at [Serverless Land Patterns](https://serverlessland.com/patterns/firehose-dataanalytics-flink-s3-sink).

Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example.
## Requirements

* [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources.
* [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured
* [Git Installed](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
* [Terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli?in=terraform/aws-get-started) installed
## Deployment Instructions

1. Clone the project to your local working directory

```sh
git clone https://github.com/aws-samples/serverless-patterns/
```

2. Change the working directory to this pattern's directory

```sh
cd serverless-patterns/firehose-dataanalytics-flink-s3-sink
```

3. From the command line, initialize Terraform to download and install the providers defined in the configuration:

```sh
terraform init
```

4. From the command line, apply the configuration in the main.tf file:

```sh
terraform apply
```

5. During the prompts:
- Enter yes
49+
## How it works
50+
51+
![Reference Architecture](./images/kinesis-dataanalytics-flink-s3-sink.png)
52+
53+
This pattern deploys a Kinesis Analytics streaming application, Kinesis Stream, a destination S3 bucket, and all of the additional required infrastructure services.
54+
55+
In this project, you create an Amazon Kinesis Data Analytics for Apache Flink application that has a Kinesis data stream as a source and an Amazon S3 bucket as a sink.
56+
57+
Note: The default region is `us-east-1`, it can also be changed using the variable `region`.
58+
59+
**Note:** Variables can be supplied in different options, check the [Terraform documentation](https://developer.hashicorp.com/terraform/language/values/variables) for more details.
60+
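For example, the default region could be overridden at apply time with the `-var` flag (one of the standard Terraform variable mechanisms; the region value shown is purely illustrative):

```sh
terraform apply -var="region=us-west-2"
```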
## Testing

To test this project, follow the steps below:

1. Sign in to the AWS console at https://console.aws.amazon.com

2. Navigate to Amazon Kinesis and go to Analytics applications. This should display the list of all the streaming applications.

3. Select the streaming application that you created as part of the deployment stack.

4. Press the **Run** button on the upper panel and, in the next screen, choose 'Run with latest snapshot' and press the **Run** button at the bottom of the screen. Wait until the streaming application has started successfully.

5. Generate data:

```sh
cd serverless-patterns/firehose-dataanalytics-flink-s3-sink/test
python stock.py
```

The stock.py script generates stream data and puts it into the Kinesis data stream that you created as part of the deployment stack.

Note: Change the STREAM_NAME and region_name in stock.py per your testing needs before you run it.
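The contents of stock.py are not shown in this diff; as a rough sketch, a producer of this kind typically looks like the following, modeled on the AWS "Send Streaming Data to Amazon S3 in Python" example referenced below. The stream name, region, and record fields here are illustrative assumptions, not the actual contents of stock.py:

```python
import datetime
import json
import random


def get_record():
    # Build one synthetic stock record; the field names here are
    # illustrative and not necessarily those used by stock.py.
    return {
        "event_time": datetime.datetime.now().isoformat(),
        "ticker": random.choice(["AAPL", "AMZN", "MSFT", "INTC", "TBV"]),
        "price": round(random.random() * 100, 2),
    }


def send_records(stream_name, region_name, count=100):
    """Put `count` synthetic records on the stream. Requires boto3 and
    configured AWS credentials; as the note above says, the stream name
    and region must match your deployment."""
    import boto3  # imported here so the generator above stays dependency-free

    kinesis = boto3.client("kinesis", region_name=region_name)
    for _ in range(count):
        record = get_record()
        kinesis.put_record(
            StreamName=stream_name,
            Data=json.dumps(record),
            PartitionKey=record["ticker"],
        )
```

Using the ticker symbol as the partition key (as above) is what makes the per-symbol partitioning in step 7 possible downstream.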
6. Wait a few minutes and then stop the data generation process.

7. Go to S3 in the AWS console and select the bucket that you created as part of the deployment stack. You should now see the stock data partitioned based on the stock symbol. For example: ticker:AAPL/, ticker:AMZN/ etc.

8. For further analysis, go back to the Amazon Kinesis page in the AWS console and go to Analytics applications. Select the streaming application that you created and press the **Open Apache Flink dashboard** button on the upper panel.

9. This should list the running jobs; select the job that is currently in the 'RUNNING' status and drill down for further analysis.
## Cleanup

1. Change directory to the pattern directory:

```sh
cd serverless-patterns/firehose-dataanalytics-flink-s3-sink
```

2. Delete all created resources

```sh
terraform destroy
```

3. During the prompts:
* Enter yes

4. Confirm all created resources have been deleted

```sh
terraform show
```
## Reference
- [Amazon Kinesis Data Analytics for Apache Flink](https://docs.aws.amazon.com/kinesisanalytics/latest/java/what-is.html)
- [Send Streaming Data to Amazon S3 in Python](https://docs.aws.amazon.com/kinesisanalytics/latest/java/examples-python-s3.html)

----
Copyright 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.

SPDX-License-Identifier: MIT-0
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
{
  "title": "Kinesis Data Analytics Writing Stream Data to Amazon S3 Bucket as a Sink",
  "description": "Transform incoming stream data and deliver the transformed data to S3 destination as a Sink",
  "language": "",
  "level": "200",
  "framework": "Terraform",
  "introBox": {
    "headline": "How it works",
    "text": [
      "In this project, you create an Amazon Kinesis Data Analytics for Apache Flink application that has a Kinesis data stream as a source and an Amazon S3 bucket as a sink.",
      "Kinesis Data Analytics provides the underlying infrastructure for your Apache Flink applications. It handles core capabilities like provisioning compute resources, parallel computation, automatic scaling, and application backups (implemented as checkpoints and snapshots)."
    ]
  },
  "gitHub": {
    "template": {
      "repoURL": "https://github.com/aws-samples/serverless-patterns/tree/main/firehose-dataanalytics-flink-s3-sink",
      "templateURL": "serverless-patterns/firehose-dataanalytics-flink-s3-sink",
      "projectFolder": "firehose-dataanalytics-flink-s3-sink",
      "templateFile": "firehose-dataanalytics-flink-s3-sink/main.tf"
    }
  },
  "resources": {
    "bullets": [{
      "text": "Amazon Kinesis Data Analytics for Apache Flink",
      "link": "https://docs.aws.amazon.com/kinesisanalytics/latest/java/what-is.html"
    }]
  },
  "deploy": {
    "text": [
      "terraform init",
      "terraform plan",
      "terraform apply"
    ]
  },
  "testing": {
    "text": [
      "See the README in the GitHub repo for detailed testing instructions."
    ]
  },
  "cleanup": {
    "text": [
      "terraform destroy",
      "terraform show"
    ]
  },
  "authors": [{
    "name": "Naresh Rajaram",
    "image": "",
    "bio": "Cloud Infrastructure Architect, AWS",
    "linkedin": "https://www.linkedin.com/in/naresh-rajaram-25bb106/"
  }]
}
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
/* Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. */

# --- main.tf ---

# Create Amazon S3 - flink application landing zone
module "s3_application_landing_zone" {
  source = "./modules/s3"
}

# Create Kinesis Stream
module "kinesis_stream" {
  source = "./modules/kinesisstream"
}

# Create Kinesis Analytics
module "kinesisanalytics" {
  source      = "./modules/kinesisanalytics"
  bucket_arn  = module.s3_application_landing_zone.bucket_arn
  bucket_name = module.s3_application_landing_zone.bucket_id
  file_key    = module.s3_application_landing_zone.file_key
  stream_name = module.kinesis_stream.stream_name
  region      = var.region
}
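The three child modules live in other hunks of this commit. For orientation, the `kinesisanalytics` module presumably declares matching input variables along these lines (a sketch under that assumption, not the actual module code):

```hcl
# modules/kinesisanalytics/variables.tf (illustrative sketch)
variable "bucket_arn" {
  type        = string
  description = "ARN of the S3 bucket holding the packaged Flink application"
}

variable "bucket_name" {
  type        = string
  description = "Name of the S3 landing-zone bucket used as the sink"
}

variable "file_key" {
  type        = string
  description = "S3 object key of the packaged Flink application code"
}

variable "stream_name" {
  type        = string
  description = "Name of the source Kinesis data stream"
}

variable "region" {
  type        = string
  description = "AWS region to deploy into"
}
```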
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadCode",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion"
      ],
      "Resource": [
        "${s3_bucket_arn}/${file_key}"
      ]
    },
    {
      "Sid": "ListCloudwatchLogGroups",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups"
      ],
      "Resource": [
        "arn:aws:logs:${aws_region}:${account_id}:log-group:*"
      ]
    },
    {
      "Sid": "ListCloudwatchLogStreams",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogStreams"
      ],
      "Resource": [
        "arn:aws:logs:${aws_region}:${account_id}:log-group:/aws/kinesis-analytics/${application_name}:log-stream:*"
      ]
    },
    {
      "Sid": "PutCloudwatchLogs",
      "Effect": "Allow",
      "Action": [
        "logs:PutLogEvents"
      ],
      "Resource": [
        "arn:aws:logs:${aws_region}:${account_id}:log-group:/aws/kinesis-analytics/${application_name}:log-stream:${log_stream_name}"
      ]
    },
    {
      "Sid": "ReadInputStream",
      "Effect": "Allow",
      "Action": "kinesis:*",
      "Resource": "arn:aws:kinesis:${aws_region}:${account_id}:stream/${stream_name}"
    },
    {
      "Sid": "WriteObjects",
      "Effect": "Allow",
      "Action": [
        "s3:Abort*",
        "s3:DeleteObject*",
        "s3:GetObject*",
        "s3:GetBucket*",
        "s3:List*",
        "s3:ListBucket",
        "s3:PutObject"
      ],
      "Resource": [
        "${s3_bucket_arn}",
        "${s3_bucket_arn}/*"
      ]
    }
  ]
}
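The `${...}` tokens in this policy are Terraform template placeholders, filled in when the policy file is rendered (for example via `templatefile()`). For illustration only, the substitution behaves like Python's `string.Template`; the values below are made-up examples, not anything this pattern actually creates:

```python
from string import Template

# Hypothetical values standing in for the Terraform-supplied variables.
resource = Template("${s3_bucket_arn}/${file_key}").substitute(
    s3_bucket_arn="arn:aws:s3:::example-landing-zone",
    file_key="flink-app.jar",
)
print(resource)  # arn:aws:s3:::example-landing-zone/flink-app.jar
```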
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "kinesisanalytics.amazonaws.com"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "${account_id}"
        },
        "ArnEquals": {
          "aws:SourceArn": "arn:aws:kinesisanalytics:${aws_region}:${account_id}:application/${application_name}"
        }
      }
    }
  ]
}
