Kinesis Data Streams

Kinesis Data Streams can support very high ingestion rates.

[Figure: Kinesis Data Streams architecture]

Data is ingested into Kinesis Data Streams by Producers.

Each stream is made up of one or more shards (1 to n), which can be either pre-provisioned or adjusted dynamically. Each shard accepts writes at up to 1 MB/s (or 1,000 records/s) and serves reads at up to 2 MB/s, so you need to plan how many shards you’ll need if you don’t want to pay the additional cost of automatic scaling. Each shard owns a range of hash keys: producers specify a partition key with every record, and Kinesis hashes that key to select the destination shard.
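The routing from partition key to shard can be sketched in a few lines. This is an illustration only, not the service's code: it assumes the MD5 digest of the key is read as a big-endian 128-bit integer and that the shards split the hash key space evenly, which is not guaranteed after shards have been split or merged. The helper names even_shard_ranges and shard_for_key are made up for this example.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def even_shard_ranges(shard_count):
    """Split the 128-bit hash key space into contiguous, near-equal ranges."""
    if shard_count < 1:
        raise ValueError("a stream needs at least one shard")
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs the remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose hash key range contains the key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")  # assumed byte order
    for index, (start, end) in enumerate(ranges):
        if start <= hashed <= end:
            return index
    raise ValueError("hash key ranges do not cover the full key space")
```

Because the mapping depends only on the key, every record with the same partition key lands on the same shard. That preserves per-key ordering, but a single very busy key can overload one shard.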

Producers send records to the stream; each record contains a partition key and a data blob of up to 1 MB. Producers can be:

  • Applications/clients using the AWS SDK

  • Applications/clients using the KPL (Kinesis Producer Library)

  • A Kinesis Agent

  • Other AWS services

  • Third-party integrations such as Debezium, Apache Flink, Fluentd, Kafka Connect and others
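For the SDK case, a producer boils down to building a record (partition key plus blob) and handing it to the client. The sketch below assumes the Python SDK (boto3), whose Kinesis client exposes put_record with StreamName, PartitionKey and Data arguments. The client is passed in rather than created here, and the names build_record and send, the JSON encoding and the exact byte limit are assumptions made for this example.

```python
import json

MAX_BLOB_BYTES = 1024 * 1024  # assumed reading of the "1 MB" blob limit


def build_record(stream_name, partition_key, payload):
    """Build keyword arguments shaped like those of boto3's Kinesis put_record."""
    data = json.dumps(payload).encode("utf-8")  # JSON is an assumption; any bytes work
    if len(data) > MAX_BLOB_BYTES:
        raise ValueError(
            f"blob is {len(data)} bytes, over the {MAX_BLOB_BYTES}-byte limit"
        )
    return {"StreamName": stream_name, "PartitionKey": partition_key, "Data": data}


def send(client, stream_name, partition_key, payload):
    """Send one record; `client` is expected to be boto3.client("kinesis")."""
    return client.put_record(**build_record(stream_name, partition_key, payload))
```

Checking the blob size before the call is a design choice of the sketch: it fails fast on the client instead of waiting for the service to reject the record.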

Consumers then read the data from the stream. Consumers can be:

  • Applications/clients using the AWS SDK

  • Applications/clients using the KCL (Kinesis Client Library)

  • Amazon Data Firehose

  • Kinesis Data Analytics (now Amazon Managed Service for Apache Flink)

  • Other AWS services

  • Third-party integrations such as Apache Druid, Talend, Apache Flink, Kafka Confluent Platform, Kinesumer and others

Records coming FROM Data Streams contain:

  • A Partition Key

  • A Sequence Number

  • The data blob
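On the consuming side, the three fields above can be pulled out of each record as follows. This is a minimal sketch assuming boto3, whose get_records call returns a "Records" list with PartitionKey, SequenceNumber and Data entries plus a "NextShardIterator". The function names, and the assumption that the blob holds UTF-8 JSON, are for illustration only.

```python
import json


def decode_record(record):
    """Extract the three fields of a consumed record.

    The blob is assumed to be UTF-8 encoded JSON; real streams may carry
    any byte format.
    """
    return {
        "partition_key": record["PartitionKey"],
        "sequence_number": record["SequenceNumber"],
        "payload": json.loads(record["Data"].decode("utf-8")),
    }


def read_batch(client, shard_iterator, limit=100):
    """Fetch one batch from a shard; `client` is expected to be a boto3 Kinesis client."""
    response = client.get_records(ShardIterator=shard_iterator, Limit=limit)
    decoded = [decode_record(r) for r in response["Records"]]
    # The next iterator is what the caller passes in to continue reading.
    return decoded, response.get("NextShardIterator")
```

The sequence number is assigned by the stream, not by the producer, and is unique per partition key within its shard, so it is the natural value to checkpoint when a consumer needs to resume.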

Data retention

Retention can be set to anything from 1 to 365 days (the default is 24 hours). Within the retention period you can replay data. Data is immutable: once written, a record cannot be modified or deleted, and it is only removed when it ages out of the retention period. This preserves the integrity of the stream.

Read Capacity

By default, the read throughput is 2 MB/s per shard, shared across all consumers of that shard.

You can enable Enhanced Fan-Out (EFO) so that each registered consumer gets its own dedicated 2 MB/s per shard.
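The difference between the two read modes is easiest to see as arithmetic. The sketch below assumes that shared-mode consumers split each shard's 2 MB/s evenly, and that each Enhanced Fan-Out consumer gets a dedicated 2 MB/s per shard. The function name is hypothetical.

```python
SHARD_READ_MB_S = 2.0  # read throughput of a single shard


def per_consumer_read_mb_s(shards, consumers, enhanced_fan_out=False):
    """Estimate the read throughput available to each consumer, in MB/s."""
    if shards < 1 or consumers < 1:
        raise ValueError("shards and consumers must both be at least 1")
    total = shards * SHARD_READ_MB_S
    if enhanced_fan_out:
        # Each registered consumer gets the full per-shard rate for itself.
        return total
    # In shared mode the same capacity is split between all consumers
    # (an even split is assumed here).
    return total / consumers
```

For a 4-shard stream with 4 consumers, this gives 2 MB/s per consumer in shared mode and 8 MB/s per consumer with Enhanced Fan-Out.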

Capacity Modes

  • On-demand: data streams in on-demand mode require no capacity planning and automatically scale to handle gigabytes of write and read throughput per minute. Kinesis Data Streams manages the shards automatically to provide the necessary throughput.

    • Write: up to 200 MB/s or 200,000 records/s

    • Read: 400 MB/s per consumer for up to 2 default consumers;
      Enhanced Fan-Out (EFO) supports up to 20 consumers with dedicated throughput.

  • Provisioned: data streams in provisioned mode require capacity planning: you specify the number of shards, and with it the throughput you want to provision. The total capacity of a data stream is the sum of the capacities of its shards. You can increase or decrease the number of shards in a data stream as needed.
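For provisioned mode, the capacity planning mentioned above can be sketched from the per-shard limits given earlier (1 MB/s or 1,000 records/s in, 2 MB/s out). The function name and the example workloads are illustrative, and the estimate assumes traffic is spread evenly across partition keys, with no headroom for spikes.

```python
import math

WRITE_MB_S_PER_SHARD = 1.0
WRITE_RECORDS_S_PER_SHARD = 1000
READ_MB_S_PER_SHARD = 2.0


def required_shards(write_mb_s, write_records_s, read_mb_s):
    """Return the smallest shard count that satisfies all three per-shard limits."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(write_mb_s / WRITE_MB_S_PER_SHARD),
        math.ceil(write_records_s / WRITE_RECORDS_S_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_S_PER_SHARD),
    )
```

A workload writing 5 MB/s as 12,000 small records per second and reading 8 MB/s needs 12 shards: the record rate, not the byte rate, is the binding limit.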