Amazon File Cache

Amazon File Cache is a fully managed, high-speed (SSD-backed) cache on AWS that’s used to process file data, regardless of where the data is stored.

Speed: sub-millisecond latencies, millions of operations per second, and high throughput (hundreds of GB/s).

Amazon File Cache is POSIX-compliant, so you can use your current Linux-based applications without having to make any changes. The cache is accessed from Linux clients with the open-source Lustre client (NFSv3 is required only for linked NFS data repositories, not for accessing the cache itself); it provides a native file system interface and works as any file system does with your Linux operating system.
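
For example, a minimal mount from a Linux client (a sketch only: the cache DNS name and mount name below are placeholders you’d take from the console, and the Lustre client must already be installed):

    # Mount the cache at /mnt/cache using the Lustre client
    sudo mkdir -p /mnt/cache
    sudo mount -t lustre -o relatime,flock cache_dns_name@tcp:/mountname /mnt/cache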

It’s not available in all regions.

Do NOT edit the same file both from the cache and in the backing repository; the behavior is undefined.

Amazon File Cache is accessible from:

  • EC2 instances

  • Containers:

    • ECS workloads

    • EKS workloads (using the Amazon File Cache CSI Driver).

You deploy File Cache in one subnet (Availability Zone), and instances in other subnets of the same VPC can connect to it as long as Security Groups/NACLs allow it (additional cross-AZ data transfer costs may apply).
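
A hedged example of opening the Lustre ports in a self-referencing Security Group attached to both the cache and its clients (the group ID is a placeholder, and the ports, TCP 988 and 1018–1023 as used by the Lustre-based FSx/File Cache services, should be verified against the current File Cache documentation):

    # Allow Lustre traffic between clients and the cache within the same Security Group
    aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
        --protocol tcp --port 988 --source-group sg-0123456789abcdef0
    aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
        --protocol tcp --port 1018-1023 --source-group sg-0123456789abcdef0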

Repositories

Amazon File Cache automatically loads data into the cache when it’s accessed for the first time (lazy loading) and releases data when it’s not used. You can optionally pre-load data into the cache before starting your workload (sudo lfs hsm_restore path/to/file). Because of lazy loading, ls, stat, and equivalent commands only download metadata; the file’s data is actually fetched only when the file is accessed.
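
A sketch of pre-loading a whole directory tree (paths are placeholders; hsm_restore runs asynchronously, and lfs hsm_state shows whether a file’s contents are already cached):

    # Queue HSM restores for every file under the directory, in the background
    nohup find /mnt/cache/ns1/dataset -type f -print0 | xargs -0 -n 1 sudo lfs hsm_restore &

    # Check the HSM state of a single file (released vs. cached)
    sudo lfs hsm_state /mnt/cache/ns1/dataset/file.bin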

Amazon File Cache automatically transfers Portable Operating System Interface (POSIX) metadata for files, directories, and symbolic links (symlinks) when importing and exporting data to and from a linked Amazon S3 or NFS data repository. When you export changes in your cache to a linked data repository, Amazon File Cache also exports POSIX metadata changes along with data changes, so you can implement and maintain access controls between your cache and its linked data repositories.

A link between a directory on your cache and an Amazon S3 or NFS data repository is called a data repository association (DRA). Each DRA must have a unique Amazon File Cache directory and an S3 bucket or NFS file system associated with it.

Each cache can have at most 8 DRAs, and all of the linked repositories must be of the same repository type (all S3 or all NFS; you can’t mix the two on one cache).

You create the links when you create your cache; you can link to a data repository only at cache creation time. You can’t update or delete a DRA afterwards: to change one, you need to delete the cache and recreate it.
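
To double-check what was created, you can list caches and their DRAs with the AWS CLI (these are the operations I’d expect from the FSx API family that File Cache uses; verify the exact names against the current CLI reference):

    # List caches and the data repository associations linked to them
    aws fsx describe-file-caches
    aws fsx describe-data-repository-associations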

You can export a file back to the repository with sudo lfs hsm_archive path/to/export/file.
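
For example (paths are placeholders):

    # Export a changed file back to its linked repository
    sudo lfs hsm_archive /mnt/cache/ns1/results/output.dat

    # Follow up: show any in-flight HSM operation and the file's HSM state
    sudo lfs hsm_action /mnt/cache/ns1/results/output.dat
    sudo lfs hsm_state  /mnt/cache/ns1/results/output.dat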

S3 Repositories

For symlinks, Amazon File Cache uses the following Amazon S3 schema:

  • S3 object key – The path to the link, relative to the Amazon File Cache mount directory.

  • S3 object data – The target path of the symlink.

  • S3 object metadata – The metadata for the symlink.

POSIX metadata is stored as follows in the S3 object metadata (see the example after this list):

  • x-amz-meta-file-permissions – The file type and permissions in the format <octal file type><octal permission mask>, consistent with st_mode in the Linux stat(2) man page.

  • x-amz-meta-file-owner – The owner user ID (UID) expressed as an integer.

  • x-amz-meta-file-group – The group ID (GID) expressed as an integer.

  • x-amz-meta-file-atime – The last-accessed time in nanoseconds. Terminate the time value with ns; otherwise, Amazon File Cache interprets the value as milliseconds.

  • x-amz-meta-file-mtime – The last-modified time in nanoseconds. Terminate the time value with ns; otherwise, Amazon File Cache interprets the value as milliseconds.

  • x-amz-meta-user-agent – Always set to aws-fsx-lustre.
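
You can see this metadata on an exported object with the AWS CLI (bucket and key are placeholders; s3api returns user metadata with the x-amz-meta- prefix stripped, e.g. file-permissions, file-owner, file-group):

    # Inspect the POSIX metadata Amazon File Cache stored on an exported object
    aws s3api head-object --bucket my-example-bucket --key results/output.dat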

Only SSE-S3 and SSE-KMS server-side encryption are supported. To access an encrypted bucket in another account, you’ll need the AWSServiceRoleForFSxS3Access_fc-… role or a role that has access to the KMS key.

On-Premises NFS

  • Your on-premises NFS file system must support NFSv3.

  • If you’re using a domain name to link your NFS file system to Amazon File Cache, you must provide the IP address of a DNS server that Amazon File Cache can use to resolve the domain name of the on-premises NFSv3 file system.

  • The DNS server and the on-premises NFSv3 file system must use private IP addresses, as specified in RFC 1918.

  • You must establish an AWS Direct Connect or VPN connection between your on-premises network and the Amazon VPC where your Amazon File Cache is located.

  • The on-premises firewall must allow both TCP and UDP traffic on ports 111, 2049, 635, 4045, and 4046 (see the check below).
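
A quick sanity check from a host that can reach the NFS server (the hostname is a placeholder):

    # Confirm the NFSv3-related RPC services (portmapper, mountd, nfs, lockd, statd) are reachable
    rpcinfo -p nfs.onprem.example.internal

    # Confirm the export you want to link is visible
    showmount -e nfs.onprem.example.internal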

Deployment Types

  • CACHE_1 (default): data is automatically replicated within the same Availability Zone in which the cache is located, and file servers are replaced if they fail.

Integrations

  • AWS Batch with EC2 Launch Templates (see the user-data sketch after this list)

  • AWS Thinkbox Deadline
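
For the AWS Batch integration, a hedged sketch of the launch template user data that mounts the cache on the compute instances (Amazon Linux 2 is assumed and the Lustre client package name may differ per distribution; AWS Batch also requires launch template user data to be wrapped in MIME multi-part format, which is omitted here):

    #!/bin/bash
    # Install the Lustre client (package name assumed for Amazon Linux 2) and mount the cache
    amazon-linux-extras install -y lustre
    mkdir -p /mnt/cache
    mount -t lustre -o relatime,flock cache_dns_name@tcp:/mountname /mnt/cache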

Use Cases

  • Make dispersed datasets available to file-based applications on AWS with a unified view, and at high speeds.

  • Speed up workload completion times and optimize compute resources.