Buckets
Naming
There are two valid URL formats that can be used:
- Path-style URL: https://s3.${aws_region}.amazonaws.com/${bucket_name}/${object_key}
- Virtual-hosted-style URL: https://${bucket_name}.s3.${aws_region}.amazonaws.com/${object_key}
Example values:
bucket_name = "my-test-bucket"
aws_region = "eu-north-1"
object_prefix = "images/"
object_key = "images/home.jpg"
In terms of implementation, buckets and objects are AWS resources, and Amazon S3 provides APIs for you to manage them.
Amazon S3 supports global buckets, which means that each bucket name must be unique across all AWS accounts in all the AWS Regions within a partition. You should not depend on specific bucket naming conventions for availability or security verification purposes.
There is a soft limit of 100 buckets per account and a hard limit of 1000 buckets per account.
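As a sketch of those management APIs, a bucket with the example name and Region above could be created via the AWS CLI (name and Region are placeholders):
$ aws s3api create-bucket \
    --bucket my-test-bucket \
    --region eu-north-1 \
    --create-bucket-configuration LocationConstraint=eu-north-1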
Naming:
- 3-63 characters
- lowercase letters, numbers, dots (discouraged), hyphens
- must begin and end with a letter or number
- no consecutive dots (..)
- must not be formatted as an IP address
- forbidden prefixes:
  - xn--
  - sthree-
  - sthree-configurator
- forbidden suffixes:
  - -s3alias
  - --ol-s3
- unique within a partition (partitions: aws, aws-cn, aws-us-gov)
- no dots if using S3 Transfer Acceleration
Configuration:
- website
- versioning
- transfer acceleration
- tagging
- requestPayment (Requester Pays)
- replication
- policy and ACL (access control list)
- object locking
- logging
- location
- lifecycle
- event notification
- CORS (cross-origin resource sharing)
Access options:
- Console, AWS CLI, AWS SDKs or the S3 API (directly)
- Static websites
- S3 Access Points / Multi-Region Access Points (for shared datasets)
- Mountpoint for Amazon S3 (mount locally, high throughput)
- SFTP via AWS Transfer Family
Security
The resource owner is the AWS account that creates the resource. By default, objects created by the account's root user, IAM roles, or IAM users are owned by that AWS account.
A bucket owner can grant cross-account permissions to another AWS account (or users in another account) to upload objects. In this case, the AWS account that uploads objects owns those objects. The bucket owner does not have permissions on the objects that other accounts own, with some exceptions:
- The bucket owner pays the bills. The bucket owner can deny access to any objects, or delete any objects in the bucket, regardless of who owns them.
- The bucket owner can archive any objects or restore archived objects regardless of who owns them.
Amazon S3 considers a bucket or object ACL public if it grants any permissions to members of the predefined AllUsers or AuthenticatedUsers groups.
The anonymous user is identified by a specific canonical user ID: 65a011a29cdf8ec533ec3d1ccaae921c.
All Users group: predefined group.
If an object is uploaded to a bucket through an unauthenticated request, the anonymous user owns the object. The default object ACL grants FULL_CONTROL to the anonymous user as the object’s owner. Therefore, Amazon S3 allows unauthenticated requests to retrieve the object or modify its ACL.
S3 Block Public Access
A helper setting for managing public access. The default for new buckets is to block public access.
It overrides policies and permissions: it must be disabled for other access mechanisms to work, and it is evaluated before policies.
Settings (see the CLI sketch after this list):
- BlockPublicAcls: rejects PUT calls that attach a public ACL to the bucket or its objects. ACLs cannot be made public.
- IgnorePublicAcls: causes S3 to ignore all public ACLs on the bucket and its objects.
- BlockPublicPolicy: rejects PUT calls that attach a public bucket policy to the bucket or its access points.
- RestrictPublicBuckets: restricts access to a bucket with a public policy to AWS service principals and authorized users within the bucket owner's account.
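A minimal sketch of enabling all four settings on a bucket via the CLI (bucket name is a placeholder):
$ aws s3api put-public-access-block \
    --bucket my-test-bucket \
    --public-access-block-configuration \
      BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true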
S3 Object Ownership
S3 Object Ownership is an Amazon S3 bucket-level setting that you can use to control ownership of objects uploaded to your bucket and to disable or enable access control lists (ACLs).
Owner: the AWS account in which the bucket was created.
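As a sketch, the setting can be changed with the CLI; the bucket name is a placeholder and BucketOwnerEnforced is the value that disables ACLs:
$ aws s3api put-bucket-ownership-controls \
    --bucket my-test-bucket \
    --ownership-controls 'Rules=[{ObjectOwnership=BucketOwnerEnforced}]'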
Encryption
S3 uses AES-256 for encryption.
Available options (a CLI sketch follows this list):
- Server-Side Encryption (SSE)
  - SSE-S3: uses S3-managed keys, no calls to KMS. Keys are managed by S3.
    For API calls set the "x-amz-server-side-encryption" header to "AES256".
  - (D)SSE-KMS: uses KMS keys to encrypt/decrypt. KMS APIs are called for each encryption/decryption operation (KMS limits and costs will have an impact).
    With DSSE you get dual-layer SSE.
    For API calls set the "x-amz-server-side-encryption" header to "aws:kms" (for SSE-KMS, not DSSE-KMS).
  - Bucket Keys: KMS generates a bucket-level key that S3 keeps with the bucket, so an external KMS call is not made on every request. Works with replication even though KMS is regional, but the ETag of the destination object will not be the same as the source.
  - SSE-C: server-side encryption with customer-provided keys.
    The key is provided in a header of the request and is not stored by AWS after use.
    HTTPS is mandatory in this case.
- Client-Side Encryption: AWS has no role in encrypting. The client encrypts the payload before storing it and decrypts it upon retrieval.
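A minimal sketch of setting SSE-KMS with a Bucket Key as the bucket default and uploading one object with SSE-S3 (bucket name, KMS key ARN, and file names are placeholders):
$ aws s3api put-bucket-encryption \
    --bucket my-test-bucket \
    --server-side-encryption-configuration '{
      "Rules": [{
        "ApplyServerSideEncryptionByDefault": {
          "SSEAlgorithm": "aws:kms",
          "KMSMasterKeyID": "arn:aws:kms:eu-north-1:123456789012:key/example-key-id"
        },
        "BucketKeyEnabled": true
      }]
    }'
$ aws s3api put-object \
    --bucket my-test-bucket \
    --key images/home.jpg \
    --body home.jpg \
    --server-side-encryption AES256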
Note: with Bucket Keys enabled, only the bucket (not the individual objects) will show up in CloudTrail KMS events.
To encrypt existing objects, S3 Batch Operations or the copy-object CLI command can be used.
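A sketch of re-encrypting an existing object in place by copying it onto itself (names are placeholders):
$ aws s3api copy-object \
    --bucket my-test-bucket \
    --key images/home.jpg \
    --copy-source my-test-bucket/images/home.jpg \
    --server-side-encryption aws:kms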
Encryption in transit:
- S3 exposes both HTTP and HTTPS endpoints. The latter is recommended, and it is mandatory when using SSE-C.
- To force encryption in transit, a bucket policy with a Deny effect and a condition of "aws:SecureTransport": "false" can be used (see the example below). No plain HTTP requests will be accepted.
- Bucket policies are evaluated before default encryption.
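A minimal sketch of such a deny policy (the bucket name is a placeholder):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::Bucket-Name",
                "arn:aws:s3:::Bucket-Name/*"
            ],
            "Condition": {
                "Bool": { "aws:SecureTransport": "false" }
            }
        }
    ]
}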
Storage Management
Object Versioning
States of a bucket:
- Unversioned: versioning has never been active.
- Versioning-enabled: versioning is enabled.
- Versioning-suspended: versioning has been suspended. Once versioning has been activated it cannot be deactivated, only suspended.
Enabling versioning does not assign a version ID to existing objects: their version ID is null, and only new writes (including new versions of those objects) get a version ID.
If you delete an object, instead of removing the object permanently, Amazon S3 inserts a delete marker, which becomes the current object version (performing a GET Object request when the current version is a delete marker returns a 404 Not Found error). If your DELETE operation specifies the versionId, that object version is permanently deleted, and Amazon S3 doesn't insert a delete marker.
You can permanently delete an object by specifying the version that you want to delete. Only the owner of an Amazon S3 bucket or an authorized IAM user can permanently delete a version.
Only PUT calls create a new version. Some actions that modify the current object don't create a new version because they don't PUT a new object; this includes actions such as changing the tags on an object. A short CLI walkthrough follows.
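A sketch of these operations via the CLI (bucket, key, and the version ID are placeholders):
# Enable versioning on the bucket
$ aws s3api put-bucket-versioning \
    --bucket my-test-bucket \
    --versioning-configuration Status=Enabled
# List all versions and delete markers under a prefix
$ aws s3api list-object-versions \
    --bucket my-test-bucket \
    --prefix images/
# Permanently delete one specific version (no delete marker is created)
$ aws s3api delete-object \
    --bucket my-test-bucket \
    --key images/home.jpg \
    --version-id EXAMPLE-VERSION-ID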
Replication
Replication enables automatic, asynchronous copying of objects across Amazon S3 buckets within the same account or across accounts. Replication also works with multiple destinations and different regions (Cross-Region Replication, CRR).
You can use a prefix or tags to limit its scope.
Replication can be:
- Live replication: replicates objects as soon as they arrive in the bucket.
- Batch replication: on-demand.
When replication is enabled, existing objects are not replicated; you must use batch replication to replicate them.
Replication options:
- Which objects to replicate (all, prefix, tags)
- Destination storage class (default = same)
- Destination object ownership: by default, in the case of cross-account replication, the source account is the owner of the replicated objects
Requirements (a configuration sketch follows this list):
- Destination bucket(s)
- An IAM role that can read from the source and write to the destination: in the case of cross-account replication, the destination bucket must allow writes by the source account's role with a bucket policy, because the role is outside the destination account's IAM scope.
- Both source and destination buckets must have versioning enabled.
- If the owner of the source bucket doesn't own the object in the bucket, the object owner must grant the bucket owner READ and READ_ACP permissions with the object access control list (ACL).
- If the source bucket has S3 Object Lock enabled, the destination buckets must also have S3 Object Lock enabled.
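A sketch of a replication configuration; the role ARN, rule ID, prefix, and destination bucket are placeholders:
$ aws s3api put-bucket-replication \
    --bucket my-test-bucket \
    --replication-configuration '{
      "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
      "Rules": [{
        "ID": "replicate-images",
        "Status": "Enabled",
        "Priority": 1,
        "Filter": { "Prefix": "images/" },
        "DeleteMarkerReplication": { "Status": "Disabled" },
        "Destination": {
          "Bucket": "arn:aws:s3:::my-test-bucket-replica",
          "StorageClass": "STANDARD_IA"
        }
      }]
    }'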
What is NOT replicated
- Replicas. They need a batch replication to be themselves replicated; there's no replication chaining A ⇒ B ⇒ C.
- Objects in the source bucket that have already been replicated to a different destination. For example, if you change the destination bucket in an existing replication configuration, Amazon S3 won't replicate the objects again. To replicate previously replicated objects, use Batch Replication.
- Delete markers are not replicated by default, but you can opt in.
- Deletes that specify a version ID: the object version is removed from the source bucket, but the deletion is not replicated to the destination bucket.
- By default, when replicating from a different AWS account, delete markers added to the source bucket are not replicated.
- Objects that are stored in the S3 Glacier Flexible Retrieval, S3 Glacier Deep Archive, S3 Intelligent-Tiering Archive Access, or S3 Intelligent-Tiering Deep Archive Access storage classes or tiers.
- Objects in the source bucket that the bucket owner doesn't have sufficient permissions to replicate.
- Updates to bucket-level subresources. For example, if you change the lifecycle configuration or add a notification configuration to your source bucket, these changes are not applied to the destination bucket. This feature makes it possible to have different configurations on source and destination buckets.
- System actions performed by lifecycle configuration.
Default bucket encryption and replication
- If objects in the source bucket are not encrypted, the replica objects in the destination bucket are encrypted by using the default encryption settings of the destination bucket (as a result, the entity tags (ETags) of the source objects differ from the ETags of the replica objects).
- If objects in the source bucket are encrypted by using SSE-S3, SSE-KMS or DSSE-KMS, the replica objects in the destination bucket use the same type of encryption as the source objects. The default encryption settings of the destination bucket are not used.
Replication use cases
- Replicate objects while retaining metadata – important if you must ensure that your replica is identical to the source object.
- Replicate objects into different storage classes.
- Maintain object copies under different ownership – regardless of who owns the source object, you can tell Amazon S3 to change replica ownership to the AWS account that owns the destination bucket. This is referred to as the owner override option.
- Keep objects stored over multiple AWS Regions.
- Replicate objects within 15 minutes – to replicate your data in the same AWS Region or across different Regions within a predictable time frame, you can use S3 Replication Time Control (S3 RTC). S3 RTC replicates 99.99 percent of new objects stored in Amazon S3 within 15 minutes (backed by a service-level agreement).
- Replicate previously failed or replicated objects.
- Replicate objects and fail over to a bucket in another AWS Region – to keep all metadata and objects in sync across buckets during data replication, use two-way replication (also known as bi-directional replication) rules before configuring Amazon S3 Multi-Region Access Point failover controls. Two-way replication rules help ensure that when data is written to the S3 bucket that traffic fails over to, that data is then replicated back to the source bucket.
CRR (Cross-Region Replication):
- Compliance
- Lower-latency access
- Replication across accounts
SRR (Same-Region Replication):
- Log aggregation
- Live replication between production and test accounts/environments
When to use Cross-Region Replication (CRR)
- Meet compliance requirements
- Minimize latency
- Increase operational efficiency
When to use Same-Region Replication (SRR)
- Aggregate logs into a single bucket
- Configure live replication between production and test accounts
- Abide by data sovereignty laws
When to use two-way replication (bi-directional replication)
- Build shared datasets across multiple AWS Regions
- Keep data synchronized across Regions during failover
- Make your application highly available
When to use S3 Batch Replication
- Replicate existing objects: replication is not retroactive
- Replicate objects that previously failed to replicate
- Replicate objects that were already replicated – you might be required to store multiple copies of your data in separate AWS accounts or AWS Regions. Batch Replication can replicate existing objects to newly added destinations.
- Replicate replicas of objects that were created from a replication rule. Replicas of objects can be replicated only with Batch Replication.
Transfer Acceleration
Amazon S3 Transfer Acceleration is a bucket-level feature that enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. It can speed up content transfers to and from Amazon S3 by as much as 50–500 percent for long-distance transfer of larger objects.
Transfer Acceleration takes advantage of the globally distributed edge locations in Amazon CloudFront. As the data arrives at an edge location, the data is routed to Amazon S3 over an optimized network path.
It supports multi-part uploads.
The bucket name MUST be DNS-compatible and it MUST NOT contain dots.
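A sketch of enabling acceleration and uploading through the accelerate endpoint (bucket and file names are placeholders):
$ aws s3api put-bucket-accelerate-configuration \
    --bucket my-test-bucket \
    --accelerate-configuration Status=Enabled
# Uploads then go through the accelerate endpoint
$ aws s3 cp home.jpg s3://my-test-bucket/images/home.jpg \
    --endpoint-url https://s3-accelerate.amazonaws.com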
Requester Pays
With Requester Pays buckets, the requester instead of the bucket owner pays the cost of the request and the data download from the bucket. The bucket owner always pays the cost of storing data.
The requester must be authenticated to AWS.
Useful when sharing data among accounts.
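A sketch of enabling Requester Pays and downloading as an authenticated requester (names are placeholders):
$ aws s3api put-bucket-request-payment \
    --bucket my-test-bucket \
    --request-payment-configuration Payer=Requester
# The requester must acknowledge the charges on each request
$ aws s3api get-object \
    --bucket my-test-bucket \
    --key images/home.jpg \
    --request-payer requester \
    home.jpg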
S3 Inventory
Amazon S3 Inventory provides comma-separated values (CSV), Apache optimized row columnar (ORC) or Apache Parquet output files that list your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or objects with a shared prefix.
You can use it to audit and report on the replication and encryption status of your objects for business, compliance, and regulatory needs.
Configuration (a CLI sketch follows this list):
- Which metadata to include per object
- Which versions (all, current)
- Destination
- Schedule: daily or weekly
- Inventory list file encryption
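A sketch of a weekly CSV inventory configuration; bucket names, the configuration ID, and the optional fields chosen here are placeholders:
$ aws s3api put-bucket-inventory-configuration \
    --bucket my-test-bucket \
    --id weekly-inventory \
    --inventory-configuration '{
      "Id": "weekly-inventory",
      "IsEnabled": true,
      "IncludedObjectVersions": "All",
      "Destination": {
        "S3BucketDestination": {
          "Bucket": "arn:aws:s3:::my-inventory-destination",
          "Format": "CSV",
          "Prefix": "inventory"
        }
      },
      "Schedule": { "Frequency": "Weekly" },
      "OptionalFields": ["Size", "LastModifiedDate", "ReplicationStatus", "EncryptionStatus"]
    }'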
Note: all of your objects might not appear in each inventory list. The inventory list provides eventual consistency for PUT requests (of both new objects and overwrites) and for DELETE requests. Each inventory list for a bucket is a snapshot of bucket items. These lists are eventually consistent (that is, a list might not include recently added or deleted objects).
Amazon S3 Event Notifications can be used to get notified when an inventory list is delivered.
The inventory can be queried through Athena.
The source bucket:
- contains the objects
- has the inventory configuration
The destination bucket:
- contains the inventory list files
- contains the inventory manifest files that list all the inventory list files that are stored in the destination bucket
- must have a bucket policy to give Amazon S3 permission to verify ownership of the bucket and permission to write files to the bucket
- must be in the same AWS Region as the source bucket
- can be the same as the source bucket
- can be owned by a different AWS account than the account that owns the source bucket
Inventory formats:
- Gzip-compressed CSV
- Zlib-compressed ORC (Apache Optimized Row Columnar)
- Snappy-compressed Apache Parquet
Columns:
- Bucket name
- Object owner
- Key name – when you're using the CSV file format, the key name is URL-encoded and must be decoded before you can use it.
- Version ID – this field is not included if the list is configured only for the current version of the objects.
- IsLatest – set to True if the object is the current version of the object. (This field is not included if the list is configured only for the current version of the objects.)
- Delete marker – set to True if the object is a delete marker.
- Size – not including the size of incomplete multipart uploads, object metadata, and delete markers.
- Last modified date
- ETag – the entity tag (ETag) is a hash of the object. The ETag reflects changes only to the contents of an object, not to its metadata. The ETag can be an MD5 digest of the object data; whether it is depends on how the object was created and how it is encrypted.
- Storage class – set to STANDARD, REDUCED_REDUNDANCY, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, GLACIER, DEEP_ARCHIVE, OUTPOSTS, GLACIER_IR, or SNOW.
- Multipart upload flag
- Replication status – set to PENDING, COMPLETED, FAILED, or REPLICA.
- Encryption status – set to SSE-S3, SSE-C, SSE-KMS, or NOT-SSE.
- S3 Object Lock retain until date
- S3 Object Lock retention mode – Governance or Compliance.
- S3 Object Lock legal hold status – On or Off.
- S3 Intelligent-Tiering access tier – set to FREQUENT, INFREQUENT, ARCHIVE_INSTANT_ACCESS, ARCHIVE, or DEEP_ARCHIVE.
- S3 Bucket Key status – set to ENABLED or DISABLED. Indicates whether the object uses an S3 Bucket Key for SSE-KMS.
- Checksum algorithm
- Object access control list – an access control list (ACL) for each object that defines which AWS accounts or groups are granted access to this object and the type of access that is granted. The Object ACL field is defined in JSON format.
Website hosting
Website endpoints:
- s3-website dash (-) Region: http://${bucket_name}.s3-website-${aws_region}.amazonaws.com
- s3-website dot (.) Region: http://${bucket_name}.s3-website.${aws_region}.amazonaws.com
If you want to configure an existing bucket as a static website that has public access, you must edit Block Public Access settings for that bucket. You might also have to edit your account-level Block Public Access settings. Additionally, you must add a bucket policy!
In order to use a custom hostname for a Route 53 alias record the bucket must be named the same: for example, to access the static site at http://mysite.com the bucket name must be mysite.com. Note that the website endpoint does not support HTTPS.
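A minimal sketch of enabling website hosting on a bucket (bucket and document names are placeholders):
$ aws s3 website s3://my-test-bucket \
    --index-document index.html \
    --error-document error.html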
Use cases
- Static web content serving.
- Offloading big files from, for example, EC2 instances (see CORS).
- Out-of-band pages: storing "under maintenance" or "out of service" pages so that while the service hosted elsewhere undergoes maintenance the page still shows, because it's in a different place (an S3 bucket).
Public website policy
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "PublicReadGetObject",
"Effect": "Allow",
"Principal": "*",
"Action": [
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::Bucket-Name/*"
]
}
]
}
The bucket policy applies only to objects that are owned by the bucket owner. If your bucket contains objects that aren’t owned by the bucket owner, the bucket owner should use the object access control list (ACL) to grant public READ permission on those objects:
Public website ACL
<Grant>
<Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:type="Group">
<URI>http://acs.amazonaws.com/groups/global/AllUsers</URI>
</Grantee>
<Permission>READ</Permission>
</Grant>
You can optionally enable Amazon S3 server access logging for a bucket that is configured as a static website.
CORS
Least configuration
[
{
"AllowedMethods": [
"METHOD1"
],
"AllowedOrigins": [
"http://www.example1.com"
]
}
]
Complete configuration
[
{
"AllowedHeaders": [
"*"
],
"AllowedMethods": [
"METHOD1",
"METHOD2"
],
"AllowedOrigins": [
"http://www.example1.com"
],
"ExposeHeaders": []
},
{
"AllowedHeaders": [
"*"
],
"AllowedMethods": [
"METHOD1",
"METHOD2",
"METHOD3"
],
"AllowedOrigins": [
"http://www.example2.com"
],
"ExposeHeaders": []
},
{
"AllowedHeaders": [],
"AllowedMethods": [
"GET"
],
"AllowedOrigins": [
"*"
],
"ExposeHeaders": []
}
]
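A sketch of applying a CORS configuration like the ones above from a local file (bucket and file names are placeholders):
# cors.json must wrap the rules shown above in a "CORSRules" key, e.g.
# { "CORSRules": [ { "AllowedMethods": ["GET"], "AllowedOrigins": ["*"] } ] }
$ aws s3api put-bucket-cors \
    --bucket my-test-bucket \
    --cors-configuration file://cors.json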
Data Access
S3 Access Points
Access points are named network endpoints that are attached to buckets that you can use to perform S3 object operations.
Each access point has distinct permissions and network controls that S3 applies for any request that is made through that access point. Each access point enforces a customized access point policy that works in conjunction with the bucket policy: each allow permission on an access point must have a counterpart in the bucket policy. However, you can delegate: the bucket policy grants broad access to the account's access points, and the granular permission definitions move into the access point policies.
Access point bucket-style alias: When you create an access point, Amazon S3 automatically generates an alias that you can use instead of an Amazon S3 bucket name for data access. You can use this access point alias instead of an Amazon Resource Name (ARN) for access point data plane operations.
You can configure any access point to accept requests only from a virtual private cloud (VPC) to restrict Amazon S3 data access to a private network. You can also configure custom block public access settings for each access point.
All block public access settings are enabled by default for access points. You must explicitly disable any settings that you don’t want to apply to an access point.
Creation command:
$ aws s3control create-access-point \
--name partialbucket \
--bucket mybucket230ifde \
--account-id 123456789012
Access point endpoint format: https://${access_point_name}-${account_id}.s3-accesspoint.${aws_region}.amazonaws.com
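A sketch of reading an object through the access point created above by passing its ARN in place of a bucket name (the Region and key shown are assumptions):
$ aws s3api get-object \
    --bucket arn:aws:s3:us-west-2:123456789012:accesspoint/partialbucket \
    --key images/home.jpg \
    home.jpg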
Access point policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::123456789012:user/Jane"
},
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:us-west-2:123456789012:accesspoint/my-access-point/object/Jane/*"
}
]
}
In conjunction with a bucket policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::123456789012:user/Jane"
},
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::DOC-EXAMPLE-BUCKET1/Jane/*"
}
]
}
Or the bucket can delegate (in this case, to any access point owned by the bucket owner's account):
{
"Version": "2012-10-17",
"Statement" : [
{
"Effect": "Allow",
"Principal" : { "AWS": "*" },
"Action" : "*",
"Resource" : [ "Bucket ARN", "Bucket ARN/*"],
"Condition": {
"StringEquals" : { "s3:DataAccessPointAccount" : "Bucket owner's account ID" }
}
}]
}
Multi-Region Access Points (MRAP)
An MRAP is a single access point that is backed by multiple S3 buckets. You can enable two-way replication between them.
Creation takes a while; it's not immediate.
An interesting use case: you have a bucket replicated in n Regions and allow people to use a single endpoint so that their traffic goes to the lowest-latency bucket. It's an active-active failover configuration. If you try to get, from the MRAP, a file that was uploaded in a far Region and that hasn't yet been replicated to a closer Region, you'll get a 404. S3 won't serve you the file from the origin bucket :(
S3 Select and Glacier Select
You can use structured query language (SQL) statements to filter the contents of an Amazon S3 object and retrieve only the subset of data that you need. You can reduce the amount of data that Amazon S3 transfers, which reduces the cost and latency to retrieve this data.
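A sketch of an S3 Select call against a CSV object with a header row (bucket, key, and query are placeholders):
$ aws s3api select-object-content \
    --bucket my-test-bucket \
    --key data/users.csv \
    --expression "SELECT s.name FROM S3Object s WHERE CAST(s.age AS INT) > 30" \
    --expression-type SQL \
    --input-serialization '{"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"}' \
    --output-serialization '{"CSV": {}}' \
    result.csv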
File formats:
- CSV
- JSON
- Apache Parquet
- Gzip, Bzip2 ⇒ CSV and JSON only + Server-Side Encryption
Limitations
- One object at a time
- The maximum length of a SQL expression is 256 KB.
- The maximum length of a record in the input or result is 1 MB.
- Can only emit nested data by using the JSON output format.
- You cannot query an object stored in the S3 Glacier Flexible Retrieval, S3 Glacier Deep Archive, or Reduced Redundancy Storage (RRS) storage classes, or the S3 Intelligent-Tiering Archive Access and Deep Archive Access tiers.
- Apache Parquet objects:
  - Amazon S3 Select supports only columnar compression using GZIP or Snappy
  - Amazon S3 Select doesn't support Parquet output
  - The maximum uncompressed row group size is 512 MB.
  - You must use the data types that are specified in the object's schema.
  - Selecting on a repeated field returns only the last value.