CloudFront

Architecture

Terms:

Origin: the content’s source location.
- S3 Origin
- Custom Origin: anything running on a web server with a publicly routable IP address.
Distribution: a configuration that defines origins and 1 or more cache behaviors.
Edge Locations: the locations where CloudFront caches content.
Regional Edge Cache: a bigger cache that provides another layer of caching.

By default, a URL is generated (xxxxxxxxxxxxx.cloudfront.net), but you can add your own custom domain. CloudFront integrates with AWS Certificate Manager (ACM) for HTTPS.

When content is requested from an edge location, a cache hit is returned immediately (low latency). On a cache miss, the regional edge cache is checked. If it has the content, the content is copied to the requesting edge location. If it also misses, an origin fetch is started, and the content is then stored in both the regional edge cache and the edge location that requested it.
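The lookup order described above can be sketched as a small simulation. This is only an illustration: the two caches are plain dictionaries and `origin_fetch` is a hypothetical callable standing in for the origin, none of which is a CloudFront API.

```python
from typing import Callable, Dict, Tuple


def fetch(path: str,
          edge: Dict[str, bytes],
          regional: Dict[str, bytes],
          origin_fetch: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Return (content, source), trying edge, then regional cache, then origin."""
    if path in edge:
        # Cache hit at the edge location: served immediately.
        return edge[path], "edge"
    if path in regional:
        # Hit at the regional edge cache: copy to the requesting edge location.
        edge[path] = regional[path]
        return edge[path], "regional"
    # Miss at both levels: fetch from the origin and populate both caches.
    content = origin_fetch(path)
    regional[path] = content
    edge[path] = content
    return content, "origin"
```

Calling `fetch` twice for the same path returns `"origin"` as the source the first time and `"edge"` the second time.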

Caching WRITE operations is NOT supported.

CloudFront provides DDoS protection and integrates with AWS Shield and AWS WAF.

It improves performance for both static content (images, videos, …) and dynamic content (API acceleration, dynamic site delivery).

Request/Response Flow

Viewer Request: User ⇒ CloudFront
Origin Request: CloudFront ⇒ Origin
Origin Response: Origin ⇒ CloudFront
Viewer Response: CloudFront ⇒ User

Behaviors

A Distribution has one default Behavior, but you can add custom ones.

Behavior selection happens via pattern matching on the request path; the default behavior has a pattern of *.
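As a rough sketch of this selection step, the function below walks an ordered list of (pattern, behavior name) pairs and falls back to the default behavior. It assumes behaviors are checked in the order listed and uses Python's `fnmatchcase` as a stand-in for CloudFront's matcher (note that `fnmatchcase` also understands `[...]` character sets, which is an artifact of the stand-in). The behavior names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple


def select_behavior(path: str, behaviors: List[Tuple[str, str]]) -> str:
    """Return the name of the first behavior whose pattern matches the path."""
    for pattern, name in behaviors:
        # Case-sensitive wildcard match: * is any run of characters, ? is one.
        if fnmatchcase(path, pattern):
            return name
    # The default behavior has the pattern * and therefore always matches.
    return "default"


# Hypothetical custom behaviors, in order of precedence.
BEHAVIORS = [("/img/*", "images"), ("/api/*", "api")]
```

With these behaviors, `/img/cat.jpg` is handled by `images`, while `/index.html` falls through to `default`.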

In a Behavior you can select:

Object compression
Viewer Protocol: HTTP and HTTPS, redirect to HTTPS, HTTPS only.
Allowed HTTP methods
Cache Policies that allow you to define Compression, a Minimum TTL, a Maximum TTL and a Default TTL. Also you can define which Headers, Query Strings and Cookies are part of the Cache Key.
Origin Request Policies, which let you choose which Headers, Query Strings and Cookies are included in the request to the origin.
Restricted access using Signed URLs or Signed Cookies (⇒ Key Groups)
Field-level Encryption: it allows you to generate a key pair and distribute the public key to the edge locations. You then specify the set of fields in POST requests that you want to be encrypted, and the public key to use to encrypt them. You can encrypt up to 10 data fields in a request.
Cache directives like policies.
Lambda Functions association
Viewer Access Restrictions

The above settings only affect paths that fall in the PATTERN of the Behavior

TTL and Invalidation

Edge locations and Regional Edge Caches serve cached content without contacting the origin while its age is below the TTL. Once the TTL has expired, they check with the origin whether the content is still valid:

If the content has not changed, the origin returns a 304 Not Modified status code and the caches keep serving the copy they already have.
If the content has changed, the origin returns a 200 OK status code along with the new version of the content.

Behaviors control TTL through Cache Policies, where you define a Minimum TTL, a Maximum TTL and a Default TTL. The Default TTL applies only when the origin sends no caching headers. If the origin does send them, the origin's value is used, but if it falls outside the range between the Minimum TTL and the Maximum TTL it is clamped to the nearest limit.

Origin headers:
- Cache-Control: max-age (seconds)
- Cache-Control: s-maxage (seconds)
- Expires (date)

For example, if the Cache Policy in the Behavior sets the Maximum TTL to 500 seconds and the origin returns a Cache-Control: max-age=10000 header, the cache will serve the content for 500 seconds.

It follows that these settings can be set per object in the origin, but this is a way to limit those per object settings.
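The clamping rule can be written down in a few lines. This is a simplified sketch of the rule as described in these notes, not CloudFront's actual implementation; the function and parameter names are made up for illustration, and `origin_ttl` is `None` when the origin sends no caching header.

```python
from typing import Optional


def effective_ttl(origin_ttl: Optional[int],
                  minimum: int,
                  default: int,
                  maximum: int) -> int:
    """Return the number of seconds the cache keeps the object."""
    if origin_ttl is None:
        # No caching header from the origin: the Default TTL applies.
        return default
    # Otherwise the origin's value wins, limited to [minimum, maximum].
    return max(minimum, min(origin_ttl, maximum))
```

For the example above, `effective_ttl(10000, minimum=0, default=86400, maximum=500)` yields 500.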

Origin headers can be served:

For Custom Origins by the web server
For S3 using Object Metadata

Invalidation is performed per Distribution, not per Behavior. It expires every cached object that matches the invalidation pattern you provide, regardless of its TTL.

/img/*
/img/test*
/img/myimgs/*.png
/*  # Invalidates all!

It takes some time for the invalidation to propagate to the edge locations. Also, invalidation has a cost, and a wildcard pattern costs the same regardless of the number of objects it matches.

If you want to save money you can use versioned file names instead, like mycat_v1.jpg and mycat_v2.jpg. The application points to the new names, so the old names are never requested again by the users' browsers and simply expire from the cache. Logging is also more effective, because the version of the file shows up in the logs.
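One way to automate versioned names is to derive the version from the file content, so the name changes exactly when the content does. This is a variation on the _v1/_v2 scheme above, not something CloudFront requires; the helper name and the 8-character digest length are arbitrary choices for the sketch.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes) -> str:
    """Insert a short content hash before the file extension."""
    p = PurePosixPath(path)
    # The first 8 hex characters of the SHA-256 digest act as the version.
    digest = hashlib.sha256(content).hexdigest()[:8]
    return str(p.with_name(f"{p.stem}_{digest}{p.suffix}"))
```

A call such as `versioned_name("img/mycat.jpg", data)` returns a path of the form `img/mycat_<digest>.jpg`, which the application can reference in place of the original name.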

SSL/TLS and SNI

You need to create/import a certificate in the us-east-1 region.

You can configure how HTTPS is handled in the Viewer Protocol setting of Behaviors:

HTTP and HTTPS: both are accepted.
Redirect to HTTPS: HTTP requests are redirected to the HTTPS URL.
HTTPS only: HTTP requests are rejected.

You actually have two connections and they BOTH NEED VALID PUBLIC CERTIFICATES (Self-signed certificates will not work):

Viewer/Client ⇒ CloudFront (Viewer Protocol)
CloudFront ⇒ Origin (Origin Protocol)

The certificate presented to viewers must match the domain name they use to reach the CloudFront distribution, and the origin's certificate must match the origin's domain name, so that CloudFront gets no certificate error when it fetches from the origin.

The Origin Protocol is configured separately: CloudFront can always use HTTP, always use HTTPS, or match whichever protocol (HTTP, HTTPS) the viewer used.

SNI

SNI (Server Name Indication) is an extension of the TLS protocol. It lets a single IP address serve multiple domains: the client states which domain name it is trying to securely connect to during the TLS handshake, without the server having to unpack any application data first.

The Host header of an HTTP request can only be read after the lower layers have been decapsulated and the TLS session is established. By then the server must already have presented a certificate, so if it finds out that the request was meant for another site, it is too late. There needs to be a way to check that the connection is being routed correctly before layer 7 is decapsulated, so that it can be handed to the right layer 7 endpoint (web server, ingress, load balancer).

From another perspective, without SNI there is no way for a server to choose between multiple certificates, since the Host header is read too late.

SNI makes it possible to have multiple HTTPS sites and certificates behind the same IP address.
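On the server side, the idea can be illustrated with Python's standard `ssl` module, which exposes the name sent by the client through `SSLContext.sni_callback`. The sketch below only shows the selection step; the domain names and the mapping of names to contexts are assumptions, and each context would still need its own certificate loaded with `load_cert_chain`.

```python
import ssl
from typing import Callable, Dict, Optional


def make_sni_callback(contexts: Dict[str, ssl.SSLContext]) -> Callable:
    """Build a callback that picks the TLS context matching the SNI name."""
    def on_sni(ssl_socket, server_name: Optional[str], initial_context) -> None:
        # server_name is the domain sent in the TLS ClientHello (None if absent).
        ctx = contexts.get(server_name or "")
        if ctx is not None:
            # Switch to the context that holds this domain's certificate.
            ssl_socket.context = ctx
        # Returning None lets the handshake continue.
        return None
    return on_sni


# Usage (hypothetical names):
# default_ctx.sni_callback = make_sni_callback({"a.example.com": ctx_a})
```

Because the callback runs during the handshake, the certificate is chosen before any HTTP header is available, which is exactly the gap that SNI fills.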

In CloudFront you can choose to:

Use SNI only (default): no additional charge, not compatible with older browsers that lack SNI support.
Use a dedicated IP at the edge locations: $600 monthly charge. Compatible with all browsers, even those not supporting SNI.

Origins

Amazon S3
Elastic Load Balancer
API Gateway
MediaStore Container
MediaPackage Container
MediaPackage V2 Endpoint
Custom origin’s domain name (HTTP/HTTPS), including EC2 instances with a public IP and endpoint.

An Amazon S3 Bucket configured for Static Website Hosting is considered a CUSTOM origin and several features will be different.

While fetching from origins CloudFront can add custom headers, for example so that the origin can verify that a request comes from CloudFront.

Origin Groups

They provide resilience: a Behavior can be configured to use a group of origins, and if one origin fails another is selected.
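The failover idea can be sketched as follows. Each origin is modeled as a hypothetical callable returning a (status, body) pair, and the set of status codes that trigger failover is an assumption chosen for the example, not a CloudFront default.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

Origin = Callable[[str], Tuple[int, bytes]]

# Assumed set of status codes that count as an origin failure.
FAILOVER_STATUSES: FrozenSet[int] = frozenset({500, 502, 503, 504})


def fetch_with_failover(origins: Sequence[Origin],
                        path: str,
                        failover_statuses: FrozenSet[int] = FAILOVER_STATUSES
                        ) -> Tuple[int, bytes]:
    """Try each origin in order and return the first non-failing response."""
    last: Tuple[int, bytes] = (502, b"")  # returned if the list is empty
    for origin in origins:
        last = origin(path)
        if last[0] not in failover_statuses:
            return last
    # Every origin failed: return the last failure seen.
    return last
```

A response such as 404 from the first origin is returned as is, because it is not in the assumed failover set.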

Logging and Monitoring

You can enable real-time logs, which are delivered to a Kinesis stream for you to consume.

Security

Securing the Origins

Origin Access Identity (OAI) for S3 (Legacy)

It’s an Identity that CloudFront impersonates to access S3. For S3 that identity can be used in a bucket policy.

It’s wise to only access the bucket via the OAI.

Origin Access Control (OAC) for S3

Same as above, this is the recommended way.

Securing Custom Origins

Using Custom Headers: they’re injected at the edge location without the user knowing, before content is requested from the origin. The Origin refuses to serve content without those headers.
Using a whitelist of CloudFront IP addresses, which are publicly known.
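On the origin side, the custom header check could look like the sketch below. The header name `x-origin-verify` and the shared secret are hypothetical values you would configure yourself on both the distribution and the origin; the headers mapping is assumed to use lowercase keys.

```python
import hmac
from typing import Mapping


def is_from_cloudfront(headers: Mapping[str, str],
                       expected_secret: str,
                       header_name: str = "x-origin-verify") -> bool:
    """Return True only if the request carries the expected secret header."""
    supplied = headers.get(header_name, "")
    # compare_digest avoids leaking the secret through timing differences.
    return hmac.compare_digest(supplied.encode(), expected_secret.encode())
```

The origin would answer with an error (for example 403) whenever this check returns False.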

Securing the user-facing side

Private Behaviors

A Distribution can be set to either Public or Private, but you can actually define access control per behavior.

Access control is performed using signed URLs and signed cookies. To create them you need a Trusted Signer:

A legacy method is to have a CloudFront Key created by the root user.
A more modern method is to use Key Groups and use them to sign URLs and cookies.

Signed URLs:

They’re only valid for one object.
Used if the client doesn’t support cookies
You’re provided with a custom URL.

Signed Cookies:

They’re valid for groups of files or all files of a type.
You can preserve the URL.
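Both mechanisms are built around a small JSON policy that names the resource and an expiry time. The sketch below builds such a policy and applies the URL-safe base64 variant used for the resulting values. The resource URL is a placeholder, the helper names are made up, and the RSA signing step with the private key of the Key Group is deliberately left out because it needs a cryptography library outside the standard library.

```python
import base64
import json


def build_policy(resource_url: str, expires_epoch: int) -> str:
    """Return a compact JSON policy that expires at the given Unix time."""
    policy = {
        "Statement": [{
            "Resource": resource_url,
            "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}},
        }]
    }
    # Compact separators: the policy must not contain extra whitespace.
    return json.dumps(policy, separators=(",", ":"))


def cloudfront_b64(data: bytes) -> str:
    """Base64-encode and replace the characters that are unsafe in URLs."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")
```

The policy string is what gets signed; the signature (and, for custom policies, the policy itself) is then passed through `cloudfront_b64` before being placed in the URL or the cookies.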

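CloudFront Functions and Lambda@Edge

Feature comparison:

Event sources
- CloudFront Functions: Viewer Request, Viewer Response
- Lambda@Edge: Viewer Request, Viewer Response, Origin Request, Origin Response

Support for CloudFront KeyValueStore
- CloudFront Functions: yes
- Lambda@Edge: no

Scale
- CloudFront Functions: > 10,000,000 rps
- Lambda@Edge: up to 10,000 rps/region

Duration
- CloudFront Functions: submillisecond
- Lambda@Edge: up to 5s (Viewer Request/Response), up to 30s (Origin Request/Response)

Max Memory
- CloudFront Functions: 2 MB
- Lambda@Edge: 128-10,240 MB

Network Access
- CloudFront Functions: no
- Lambda@Edge: yes

Filesystem Access
- CloudFront Functions: no
- Lambda@Edge: yes

Request Body Access
- CloudFront Functions: no
- Lambda@Edge: yes

Geolocation Data Access
- CloudFront Functions: yes
- Lambda@Edge: only in Origin Request/Response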

Lambda@Edge functions can be executed at four different points (CloudFront Functions only at the two viewer ones):

After a Viewer Request: they change what CloudFront receives from the user. They can be used to:
- Perform A/B testing: modify a URL (an image path for example) before it’s processed by CloudFront.
- Generate a static response, in case of maintenance or to reduce the load on the origin.
- Perform an HTTP Redirect, for example unauthenticated users to the login page.
- Redirect to country-specific URLs.
- Read a form and edit information as needed
Before an Origin Request:
- Migrate between origins without updating the distribution
- Generate a header based on a query string parameter, which keeps the same outer interface while modifying the backend.
- Normalize query string parameters to improve cache hits
- Content-based dynamic Origin selection
- Serve different versions of an object based on the device/user agent.
- Country-based content serving
After an Origin Response:
- The Origin returns an error code, but you want to return a 200 to the client
- Translate an origin error to a redirect
Before a Viewer Response:
- Override a response header like X-Amz-Meta-Last-Modified → Last-Modified
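As an illustration of the Viewer Request case, here is a minimal sketch of a Lambda@Edge handler in Python that redirects viewers without a session cookie to a login page. The cookie name `session` and the path `/login` are assumptions made for the example; the event layout (`Records[0].cf.request`, lowercase header names mapping to lists of key/value pairs) follows the Lambda@Edge event structure.

```python
SESSION_COOKIE = "session"  # hypothetical cookie name
LOGIN_PATH = "/login"       # hypothetical login page


def has_session_cookie(headers: dict) -> bool:
    """Check every Cookie header for a non-empty session cookie."""
    for header in headers.get("cookie", []):
        for part in header["value"].split(";"):
            name, _, value = part.strip().partition("=")
            if name == SESSION_COOKIE and value:
                return True
    return False


def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    if has_session_cookie(request["headers"]) or request["uri"].startswith(LOGIN_PATH):
        # Returning the request lets CloudFront continue processing it.
        return request
    # Returning a response short-circuits the request at the edge.
    return {
        "status": "302",
        "statusDescription": "Found",
        "headers": {"location": [{"key": "Location", "value": LOGIN_PATH}]},
    }
```

The same pattern (return the request to continue, return a response to answer directly) applies to the static response and country redirect cases listed above.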

CloudFront Functions

Ideal for lightweight, short-running functions:

Cache key normalization – Transform HTTP request attributes (headers, query strings, cookies, and even the URL path) to create an optimal cache key, which can improve your cache hit ratio.
Header manipulation – Insert, modify, or delete HTTP headers in the request or response. For example, you can add a True-Client-IP header to every request.
URL redirects or rewrites – Redirect viewers to other pages based on information in the request, or rewrite all requests from one path to another.
Request authorization – Validate hashed authorization tokens, such as JSON web tokens (JWT), by inspecting authorization headers or other request metadata.
Tasks that require access to the CloudFront KeyValueStore

Lambda@Edge

Functions that take several milliseconds or more to complete
Functions that require adjustable CPU or memory
Functions that depend on third-party libraries (including the AWS SDK, for integration with other AWS services)
Functions that require network access to use external services for processing
Functions that require file system access or access to the body of HTTP requests

Lambda Functions that execute at the Edge Location. They only support Node.js and Python.

They can be used for Origin Failover.

Geo Blocking

It’s possible to set up geographic restriction on accessing the distribution. This is done via either:

A Block list
An Allow list
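The two list types differ only in how a match is interpreted, as the following sketch shows. The mode names `allow` and `block` and the function itself are illustrative, not CloudFront identifiers; countries are given as two-letter codes.

```python
from typing import Iterable


def is_allowed(country: str, mode: str, countries: Iterable[str]) -> bool:
    """Decide whether a viewer from `country` may access the distribution."""
    listed = country.upper() in {c.upper() for c in countries}
    if mode == "allow":
        # Allow list: only the listed countries get access.
        return listed
    if mode == "block":
        # Block list: every country except the listed ones gets access.
        return not listed
    raise ValueError(f"unknown mode: {mode!r}")
```

For instance, with an allow list of `["IT", "FR"]` a viewer from `DE` is rejected, while with a block list of the same countries that viewer is accepted.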

Price Classes

The cost of data out from edge locations varies a lot depending on the countries involved.

You can lower the cost by restricting the set of edge locations you use.

Price classes:

All: all countries.
200: all but the most expensive. No South America, Australia and New Zealand.
100: only the cheapest. Only USA, Mexico, Canada, Europe, Israel and Türkiye.