A Short Primer on AWS Storage Types

There are about a bazillion web sites that cover this basic topic. This page exists so that I have a predictably-stable page to point to from any other blog entries that need a suitable reference.

AWS offers a number of storage options for customers to use. Each option has pricing tiers that let a customer trade capabilities against cost.
  • EBS provides block-mode storage, the kind of device that EC2-hosted operating systems are typically designed to work with. The storage types in this offering are based on either magnetic hard disk drives (HDD) or solid-state drives (SSD); the previous EBS generation also included a generic, magnetic-HDD tier. Four discrete performance profiles are offered:
    • Provisioned IOPS SSD: The fastest and priciest tier. It allows a customer to guarantee minimum I/O characteristics for latency-sensitive, EC2-hosted applications.
    • General Purpose SSD: The second-priciest tier. It offers I/O bursts based on the amount of storage allocated and generally provides storage characteristics equivalent to a mid-grade solid-state drive. Good for hosting a typical operating-system disk and applications with moderate to high performance demands.
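    • Throughput-optimized HDD: A lower-cost, magnetic tier designed for frequently-accessed, throughput-intensive, sequential workloads (e.g., big-data or log processing). It cannot be used as a boot volume.
    • Cold HDD: The cheapest EBS tier, designed for large volumes of infrequently-accessed data where low storage cost matters more than performance. It also cannot be used as a boot volume.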
  • S3 provides network-accessed, highly-durable, highly-scalable, geographically-distributed, object-based storage. Access is provided through an HTTP-based interface, and access speeds are governed primarily by the network performance of the EC2 node accessing the storage. Pricing is tiered primarily on the amount of data stored, the frequency and mix of read and write activity against that data, and how long the data is stored.
    • Standard: The most-performant S3 tier, designed for workloads that both read from and write to S3. Billing imposes no minimum storage duration (if data is placed into S3 and immediately deleted, or moved to another tier, no time-based fees accrue).
    • Infrequent Access (IA): Designed primarily for write-mostly workloads that still need fast access when a retrieval is actually needed. A typical use-case is data backups. Billing is based on a thirty-day minimum (i.e., even if one uploads to IA and immediately deletes, thirty days' worth of storage charges accrue).
    • One Zone/Reduced Redundancy: Similar performance to the Standard and IA tiers; however, data is stored in only one availability zone. If that zone suffers a temporary failure, the ability to read data from it will be impacted. If the zone is lost outright (e.g., due to fire or natural disaster), any data stored to it using this tier is lost as well.
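    • Glacier: The lowest-cost S3 tier, designed for long-term archival of rarely-read data. Retrievals take minutes to hours rather than milliseconds, and billing is based on a ninety-day minimum.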
    Unfortunately, the object-storage system does not directly support OS-level metadata such as ownership attributes, MAC labels or permissions. To store such metadata, one needs to encapsulate the OS data inside a container format that preserves it (a TAR or CPIO archive, a streamed filesystem image such as an ISO, etc.).
  • EFS provides file-based, network-accessed storage. EFS is an access-overlay for S3 and, as such, inherits S3's durability, scalability and distributedness. File-based access is provided through an NFSv4.1-compliant interface, which makes EFS well-suited as shared storage for clustered applications and for applications that need OS-level metadata to be available. Similar to S3, EFS is charged on a pro-rated, bytes-stored basis.

    Security-note: while the NFSv4.1 specification includes in-flight encryption capabilities, EFS does not currently allow their use. To limit exposure of data-in-flight, use security groups so that only the EC2 instances meant to access a given share have network access to it. If in-flight encryption is a hard requirement, use the EFS mount-helper: this utility uses stunnel to encapsulate the NFS data-stream within a TLS-encrypted tunnel.

    Performance-note: EFS's I/O performance scales with the amount of data stored; the more data a file system holds, the more throughput is available for accessing that data.
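The General Purpose SSD entry says its I/O bursts depend on how much storage is allocated. A minimal sketch of that relationship follows, assuming the gp2 figures AWS published at the time of writing (a baseline of 3 IOPS per GiB, floored at 100 and capped at 16,000, with lower-baseline volumes able to burst to 3,000 IOPS). The function names are hypothetical and the constants should be checked against current EBS documentation.

```python
"""Sketch of gp2 (General Purpose SSD) IOPS scaling with volume size.

ASSUMPTION: the constants reflect AWS's published gp2 figures at the time
of writing; verify them against current EBS documentation.
"""

IOPS_PER_GIB = 3        # baseline IOPS earned per GiB allocated
MIN_BASELINE = 100      # floor applied to very small volumes
MAX_BASELINE = 16_000   # per-volume ceiling
BURST_CEILING = 3_000   # burst target for volumes whose baseline is lower


def gp2_baseline_iops(size_gib: int) -> int:
    """Sustained IOPS a volume of size_gib can deliver indefinitely."""
    if size_gib < 1:
        raise ValueError("volume size must be at least 1 GiB")
    return min(max(size_gib * IOPS_PER_GIB, MIN_BASELINE), MAX_BASELINE)


def gp2_peak_iops(size_gib: int) -> int:
    """Highest IOPS the volume can reach, bursting if its baseline is lower."""
    return max(gp2_baseline_iops(size_gib), BURST_CEILING)


if __name__ == "__main__":
    for size in (8, 100, 1_000, 2_000, 10_000):
        print(f"{size:>6} GiB: baseline {gp2_baseline_iops(size):>6}, "
              f"peak {gp2_peak_iops(size):>6}")
```

Under these assumptions, a typical 8 GiB root disk sits at the 100 IOPS floor but can burst to 3,000 IOPS; the sketch ignores the credit-bucket accounting that governs how long such a burst can last.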
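Because S3 access is HTTP-based, every object is addressable by a URL. The sketch below builds a virtual-hosted-style object URL; the bucket, region and key are hypothetical, and a real request against a private bucket would also need to be signed (for example, by an AWS SDK), which this sketch does not do.

```python
from urllib.parse import quote


def s3_object_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted-style HTTPS URL for an S3 object.

    The key is percent-encoded, but "/" is left as-is: S3 treats it as an
    ordinary key character that conventionally mimics directories.
    """
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


if __name__ == "__main__":
    # Hypothetical bucket and key, for illustration only.
    print(s3_object_url("example-bucket", "us-east-1", "backups/2018/home dirs.tar.gz"))
```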
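The S3 tiers' minimum-duration billing rules reduce to simple arithmetic: an object is billed for the larger of the time it was actually stored and its tier's minimum. A minimal sketch follows. The storage-class names are S3's API identifiers; the Standard and IA minimums come from the text, while the 90-day Glacier minimum is an assumption based on AWS's published pricing at the time of writing.

```python
"""Sketch of minimum-storage-duration billing for S3 storage classes.

ASSUMPTION: the GLACIER minimum comes from AWS's published pricing at the
time of writing; STANDARD and STANDARD_IA follow the text above.
"""

MINIMUM_BILLABLE_DAYS = {
    "STANDARD": 0,
    "STANDARD_IA": 30,
    "GLACIER": 90,
}


def billable_days(storage_class: str, days_stored: float) -> float:
    """Days of storage charges incurred by one object.

    Removing an object before its class's minimum does not avoid the
    minimum: the charge covers whichever period is longer.
    """
    if days_stored < 0:
        raise ValueError("days_stored must be non-negative")
    return max(days_stored, MINIMUM_BILLABLE_DAYS[storage_class])


if __name__ == "__main__":
    # Upload an object and delete it one day later, in each class.
    for cls in MINIMUM_BILLABLE_DAYS:
        print(f"{cls:<12} billed for {billable_days(cls, 1)} day(s)")
```

This is why IA suits backups that are kept for at least a month, but is a poor fit for short-lived scratch data.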
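To illustrate the encapsulation approach described under S3, the sketch below uses Python's standard `tarfile` module to pack a file, together with its permission bits and numeric ownership, into a single in-memory archive that could then be uploaded as one S3 object. The helper names are hypothetical. Note that `tarfile` does not capture extended attributes, so SELinux (MAC) labels and ACLs are not preserved by this sketch; GNU tar's `--xattrs` and `--selinux` options are one way to cover those.

```python
import io
import os
import tarfile
import tempfile


def pack_with_metadata(path: str) -> bytes:
    """Return a gzipped tar of path, carrying its mode, uid, gid and mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # add() builds the member header from the file's stat() data.
        tar.add(path, arcname=os.path.basename(path))
    return buf.getvalue()


def list_metadata(blob: bytes) -> dict:
    """Map each archive member's name to (permission bits, uid, gid)."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return {m.name: (m.mode, m.uid, m.gid) for m in tar.getmembers()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.conf")
        with open(path, "w") as fh:
            fh.write("setting = value\n")
        os.chmod(path, 0o640)
        blob = pack_with_metadata(path)  # this byte string is what you'd upload to S3
        for name, (mode, uid, gid) in list_metadata(blob).items():
            print(name, oct(mode), uid, gid)
```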
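The EFS performance note can be made concrete with a small calculation. The sketch below assumes the figures AWS documented for EFS's bursting throughput mode at the time of writing (a baseline of 50 MiB/s per TiB stored, bursts of 100 MiB/s per TiB, and a 100 MiB/s burst floor for smaller file systems). The function names are hypothetical and the constants should be checked against current EFS documentation.

```python
"""Sketch of EFS bursting-mode throughput as a function of data stored.

ASSUMPTION: the constants reflect AWS's published bursting-mode figures at
the time of writing; check current EFS documentation before relying on them.
"""

GIB_PER_TIB = 1024
BASELINE_MIBPS_PER_TIB = 50  # sustained throughput earned per TiB stored
BURST_MIBPS_PER_TIB = 100    # burst throughput per TiB stored
MIN_BURST_MIBPS = 100        # every file system can burst at least this fast


def efs_baseline_mibps(stored_gib: float) -> float:
    """Throughput (MiB/s) the file system can sustain indefinitely."""
    return stored_gib / GIB_PER_TIB * BASELINE_MIBPS_PER_TIB


def efs_burst_mibps(stored_gib: float) -> float:
    """Peak throughput (MiB/s) available while burst credits remain."""
    return max(MIN_BURST_MIBPS, stored_gib / GIB_PER_TIB * BURST_MIBPS_PER_TIB)


if __name__ == "__main__":
    for gib in (100, 1024, 10240):
        print(f"{gib:>6} GiB: baseline {efs_baseline_mibps(gib):8.2f} MiB/s, "
              f"burst {efs_burst_mibps(gib):8.2f} MiB/s")
```

Under these assumptions a 100 GiB file system sustains only about 5 MiB/s while a 10 TiB one sustains about 500 MiB/s, which illustrates why small EFS shares can feel sluggish once their burst credits run out.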
