S3 101

It's called S3 because of the three S's: Simple Storage Service. It is easy-to-use object storage, fronted of course by a web interface. You pay only for the storage you actually use, so capacity planning is no longer a constraint.

Common uses are

  • Backup and archive for on-premises or cloud data
  • Content storage and distribution
  • Big data analytics
  • Static website hosting
  • Disaster recovery

 

Storage classes are

 

  • General purpose
  • Infrequent access
  • Archive

 

Glacier is another storage service, but it is optimised for data archiving and long-term backup. It is good for "cold data", where a retrieval time of hours is acceptable.

 

Object storage vs block/file storage

In traditional IT environments, two kinds of storage dominate:

 

  • Block storage operates at a low level and manages data as numbered, fixed-size blocks
  • File storage operates at a higher level (the operating system) and manages data as a hierarchy of files

 

These two systems are typically accessed over a network, for example as a SAN using protocols such as Fibre Channel, but they remain fundamentally server and OS dependent.

 

S3 is cloud-based object storage. It is server independent, it is accessed over the internet, and data is managed via standard HTTP verbs.

 

Each S3 object contains

  • Data
  • Metadata

 

Objects reside in containers called buckets. A bucket is a simple flat namespace with no hierarchy in the sense of a file system, and it can hold a virtually unlimited number of objects. You can only GET or PUT an object; you cannot mount or open a bucket. S3 objects are automatically replicated within a region.

 

Buckets

A bucket is a container for objects, and forms the top-level namespace in S3

 

AWS Regions

 

Even though bucket names are global, each bucket is created in the region that you choose, so you can control where your data is stored.
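
A minimal boto3 (Python SDK) sketch of choosing the region at creation time; the bucket name and region are placeholders:

```python
import boto3

# The region is chosen at creation time via LocationConstraint
# (us-east-1 is the exception: omit CreateBucketConfiguration there).
s3 = boto3.client('s3', region_name='eu-west-1')
s3.create_bucket(
    Bucket='mybucket',
    CreateBucketConfiguration={'LocationConstraint': 'eu-west-1'},
)
```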

 

Objects

 

Objects are the entries that are actually stored in an S3 bucket. Data is the actual file itself, and metadata is data about the file. The data portion is opaque to S3; it doesn't care about the content of the data itself. The metadata of an object is a set of name/value pairs that describes the object.

 

Metadata breaks down into

  • System metadata, which is used by S3 itself, e.g. date last modified, size, MD5 digest and HTTP Content-Type
  • User metadata, assigned when the object is created, which lets you tag data with something meaningful
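
A hedged boto3 sketch of both kinds of metadata (bucket, key and the tag values are placeholders):

```python
import boto3

s3 = boto3.client('s3')

# User metadata is supplied as name/value pairs when the object is written.
s3.put_object(
    Bucket='mybucket',
    Key='jack.doc',
    Body=b'report contents',
    Metadata={'department': 'finance', 'reviewed': 'yes'},
)

# System metadata (size, last modified, Content-Type) comes back
# alongside the user metadata on a HEAD request.
resp = s3.head_object(Bucket='mybucket', Key='jack.doc')
print(resp['ContentLength'], resp['LastModified'], resp['ContentType'])
print(resp['Metadata'])   # {'department': 'finance', 'reviewed': 'yes'}
```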

 

Keys

Every object in a bucket is identified by a unique identifier called a key. A key can be up to 1024 bytes of UTF-8; the same key can exist in two different buckets, but you cannot have identical keys within the same bucket. Together, bucket and key form a unique identifier for the object.

 

Object URL

S3 is internet-based storage, and hence every object has an associated URL

 

http://mybucket.s3.amazonaws.com/jack.doc

 

S3 bucket name = mybucket

Key = jack.doc

 

S3 operations

  • Create / delete bucket
  • Write an object
  • Read an object
  • Delete an object
  • List keys in the bucket
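
As a sketch, all of these operations map onto one-line boto3 calls (bucket and key names are placeholders):

```python
import boto3

s3 = boto3.client('s3')

s3.create_bucket(Bucket='mybucket')                           # create bucket
s3.put_object(Bucket='mybucket', Key='jack.doc', Body=b'hi')  # write object
obj = s3.get_object(Bucket='mybucket', Key='jack.doc')        # read object
print(obj['Body'].read())
for item in s3.list_objects_v2(Bucket='mybucket').get('Contents', []):
    print(item['Key'])                                        # list keys
s3.delete_object(Bucket='mybucket', Key='jack.doc')           # delete object
s3.delete_bucket(Bucket='mybucket')                           # delete bucket
```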

 

REST interface

 

Basically, the HTTP verbs form the API:

 

  • Create = HTTP PUT (sometimes POST)
  • Read = HTTP GET
  • Delete = HTTP DELETE
  • Update = HTTP POST

 

In practice you usually interact with S3 via higher-level interfaces rather than via REST directly. These are:

 

  • AWS SDKs (JavaScript, Java, .NET, Node.js, PHP, Python, Ruby, Go, C++)
  • AWS CLI
  • AWS Management Console

 

Durability and Availability

 

Durability = 99.999999999% (eleven nines)

Availability = 99.99%

 

Availability is achieved through device redundancy, i.e. multiple devices within a region. This can lead to data-consistency issues, since it takes time for updates to propagate to all devices.

 

Access control

 

To give others access to a bucket:

 

  • Coarse-grained access control: S3 ACLs (READ, WRITE, FULL-CONTROL) at the object or bucket level (legacy)

 

  • Fine-grained access control: S3 bucket policies, AWS IAM policies and query-string authentication; this is the recommended access control mechanism

 

Bucket policies include an explicit reference to the IAM principal in the policy, and that principal can be associated with a different AWS account. Using a bucket policy you can also specify from where S3 may be accessed, e.g. by IP address, and even at a particular time of day.
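
A sketch of such a policy, assuming a placeholder bucket, account ID and IP range:

```python
import json
import boto3

# Allow a principal from another account to GET objects, but only from
# one source IP range (the ARNs and the CIDR below are placeholders).
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:user/partner"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::mybucket/*",
        "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
    }],
}

boto3.client('s3').put_bucket_policy(Bucket='mybucket', Policy=json.dumps(policy))
```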

 

Static Website Hosting

 

This is a very common use for S3 when no server-side scripting (PHP, ASP.NET or JSP) is required. Because an S3 bucket has a URL, it is easy to turn it into a website.

 

  1. Create a bucket with the same name as the desired website hostname
  2. Upload your static files to the bucket
  3. Make all the files public (world readable)
  4. Enable static website hosting for the bucket, specifying an index document and an error document (see the sketch after this list)
  5. The website will now be available at the S3 website URL <bucket-name>.s3-website-<AWS-region>.amazonaws.com
  6. Create a friendly DNS name in your own domain, using a DNS CNAME or an Amazon Route 53 alias, that resolves to the Amazon website URL
  7. The website will now be available at your own domain name
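
Step 4 can be done in one boto3 call; a sketch, assuming the bucket already exists and its files are public:

```python
import boto3

s3 = boto3.client('s3')
s3.put_bucket_website(
    Bucket='www.example.com',
    WebsiteConfiguration={
        'IndexDocument': {'Suffix': 'index.html'},
        'ErrorDocument': {'Key': 'error.html'},
    },
)
```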

 

Prefixes and Delimiters

 

Prefixes and delimiters provide a way to address objects in a bucket as though there were a hierarchy. For example, you may want to save server logs as

 

log/2016/january/server42.log

log/2016/february/server42.log

 

All of the access methods (including the AWS console) support the use of prefixes and delimiters as above. This technique, used in conjunction with bucket policies, allows you to control access at the user level.
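
For example, a boto3 sketch that lists the "subfolders" directly under log/2016/ by treating "/" as the delimiter (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client('s3')
resp = s3.list_objects_v2(Bucket='mybucket', Prefix='log/2016/', Delimiter='/')
for cp in resp.get('CommonPrefixes', []):
    print(cp['Prefix'])   # e.g. log/2016/january/, log/2016/february/
```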

 

Storage Classes

 

The range of storage classes is:

  • Standard: high durability and availability, low latency with high throughput
  • Standard-IA (infrequent access): same durability as Standard, but for colder data, e.g. longer-lived, less frequently accessed objects. Lower per-GB-per-month cost than Standard; the minimum billable object size is 128KB and the minimum storage duration is 30 days, so use it for infrequently accessed data that is older than 30 days
  • Reduced Redundancy Storage (RRS): lower durability (four nines) at a reduced cost compared with Standard
  • Glacier: low cost, no real-time access, retrieval time of several hours; controlled via the S3 API, with restored copies placed in RRS

 

Object lifecycle management

 

Data can traditionally be thought of as moving from hot to cold, left to right:

 

  • Hot: frequent access, low latency. Use S3 Standard
  • Warm: less frequently accessed, 30 days +. Use Standard-IA
  • Cold: archive. After 90 days, move to Glacier
  • Deletion: after, say, 3 years, delete

 

You can use S3 lifecycle configuration rules to move data through these stages automatically, as in the sketch below
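
A boto3 sketch of such a rule, implementing the hot/warm/cold/delete flow above for a placeholder log/ prefix:

```python
import boto3

s3 = boto3.client('s3')
s3.put_bucket_lifecycle_configuration(
    Bucket='mybucket',
    LifecycleConfiguration={'Rules': [{
        'ID': 'hot-warm-cold',
        'Status': 'Enabled',
        'Filter': {'Prefix': 'log/'},
        'Transitions': [
            {'Days': 30, 'StorageClass': 'STANDARD_IA'},  # warm after 30 days
            {'Days': 90, 'StorageClass': 'GLACIER'},      # cold after 90 days
        ],
        'Expiration': {'Days': 1095},                     # delete after ~3 years
    }]},
)
```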

 

Encryption

 

In flight, data is encrypted via HTTPS (to and from S3).

 

At rest, you can use several variations of SSE (server-side encryption) as you write the data to S3. All use the 256-bit Advanced Encryption Standard (AES), and keys can be managed through the AWS Key Management Service (KMS).

 

You can also use CSE (client side encryption) in the enterprise

 

SSE-S3 (AWS managed keys)

 

Check-box encryption, where AWS handles the following for S3:

 

  • Key management
  • Key protection

 

Every object is encrypted with a unique key, which is itself encrypted by a separate master key; a new master key is issued monthly, with AWS rotating the keys.

 

SSE-KMS (AWS KMS Keys)

 

A fully integrated service where AWS handles key management and protection, but the enterprise manages the keys. It has the following benefits:

 

  • Separate permissions for using the master key
  • Auditing, so you can see who used your key
  • Visibility of failed attempts by users who did not have permission to decrypt

 

SSE-C (customer provided keys)

 

The enterprise maintains its own encryption keys, but doesn't have to manage a client-side encryption library.
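
A boto3 sketch of all three SSE variants (the bucket, keys and KMS alias are placeholders):

```python
import os
import boto3

s3 = boto3.client('s3')

# SSE-S3: check-box encryption with AWS-managed keys (256-bit AES).
s3.put_object(Bucket='mybucket', Key='a.doc', Body=b'data',
              ServerSideEncryption='AES256')

# SSE-KMS: encrypt under a KMS master key; the alias is a placeholder.
s3.put_object(Bucket='mybucket', Key='b.doc', Body=b'data',
              ServerSideEncryption='aws:kms',
              SSEKMSKeyId='alias/my-app-key')

# SSE-C: supply your own 256-bit key with each request; S3 encrypts with
# it and then discards it, so the same key must accompany every GET.
key = os.urandom(32)
s3.put_object(Bucket='mybucket', Key='c.doc', Body=b'data',
              SSECustomerAlgorithm='AES256', SSECustomerKey=key)
obj = s3.get_object(Bucket='mybucket', Key='c.doc',
                    SSECustomerAlgorithm='AES256', SSECustomerKey=key)
```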

 

Client side encryption

 

Encrypt data on the client side before transmitting it to S3. You have two options:

 

  • Use an AWS KMS-managed customer master key
  • Use a client-side master key

 

When encrypting client-side, the enterprise retains end-to-end (E2E) control of the encryption, including management of the keys.

 

Versioning

 

Versioning helps protect against accidental deletion of data by keeping multiple versions of each object in a bucket. Versioning is activated at the bucket level; once on, it cannot be removed, only suspended.

 

You can restore an object by referencing the version ID in addition to the bucket name and object key
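
A boto3 sketch (bucket, key and the version ID are placeholders):

```python
import boto3

s3 = boto3.client('s3')

# Turn versioning on at the bucket level.
s3.put_bucket_versioning(
    Bucket='mybucket',
    VersioningConfiguration={'Status': 'Enabled'},
)

# Fetch a specific version of an object by its version ID
# ('EXAMPLE-VERSION-ID' stands in for a real, system-generated ID).
obj = s3.get_object(Bucket='mybucket', Key='jack.doc',
                    VersionId='EXAMPLE-VERSION-ID')
```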

 

MFA delete

 

In addition to normal security credentials, MFA Delete requires an authentication code: a temporary one-time password generated by a hardware or virtual MFA device. It can only be enabled by the root account.

 

Pre signed URLs

 

By default, objects are private, meaning that only the owner has access, but the owner can create a pre-signed URL which grants time-limited permission to download objects. The URL is created using:

 

  • The owner's security credentials
  • Bucket name
  • Object key
  • HTTP method (GET for download)
  • Expiration date and time
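
A boto3 sketch that signs a one-hour download link for jack.doc (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client('s3')
url = s3.generate_presigned_url(
    'get_object',                                     # HTTP GET for download
    Params={'Bucket': 'mybucket', 'Key': 'jack.doc'},
    ExpiresIn=3600,                                   # expiry, in seconds
)
print(url)   # anyone holding this URL can GET the object until it expires
```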

 

Pre-signed URLs give good protection against web scrapers.

 

Multipart upload

 

AWS provides a multipart upload API for larger files. This gives better network utilisation by virtue of parallel transfers, supports pause and resume, and allows uploads where the original size is unknown.
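
A sketch using boto3's transfer layer, which switches to the multipart API automatically above a size threshold and uploads the parts in parallel (file and bucket names are placeholders):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # use multipart above 8 MB
    max_concurrency=10,                   # parallel part uploads
)
s3.upload_file('backup.tar', 'mybucket', 'backup.tar', Config=config)
```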

 

Range GETs

 

The range of bytes to be downloaded is specified in the HTTP header of the GET request. This is useful if you have poor connectivity and a large object to download.
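
A boto3 sketch fetching only the first megabyte of a large object (names are placeholders):

```python
import boto3

s3 = boto3.client('s3')
resp = s3.get_object(Bucket='mybucket', Key='backup.tar',
                     Range='bytes=0-1048575')   # standard HTTP Range header
first_mb = resp['Body'].read()
```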

 

Cross-region replication

 

Cross-region replication copies new objects in a bucket in one AWS region to a bucket in another AWS region. The metadata and ACLs associated with each object are also part of the replication. Versioning must be turned on in both the source and destination buckets, and you must use an IAM policy to give S3 permission to replicate.

 

It is commonly used to reduce the latency required to access objects. Existing objects in a bucket are not replicated when replication is turned on; that is achieved with a separate command.
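
A boto3 sketch of the replication configuration (the role ARN and bucket names are placeholders; both buckets must already have versioning enabled):

```python
import boto3

s3 = boto3.client('s3')
s3.put_bucket_replication(
    Bucket='mybucket',
    ReplicationConfiguration={
        # Role that grants S3 permission to replicate on your behalf.
        'Role': 'arn:aws:iam::111122223333:role/s3-replication',
        'Rules': [{
            'Status': 'Enabled',
            'Prefix': '',   # empty prefix = replicate all new objects
            'Destination': {'Bucket': 'arn:aws:s3:::mybucket-replica'},
        }],
    },
)
```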

 

Logging

You can enable S3 access logs to record the requests made to a bucket. When you enable logging you must choose where the logs will be stored; this can be the bucket itself or another bucket, and it is good practice to define a prefix such as <bucket-name>/logs/. Logs include the following information:

 

  • Requester account and IP address
  • Bucket name
  • Request time
  • Action (GET, PUT, LIST)
  • Response status or error code
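
A boto3 sketch enabling access logging to a separate bucket (names are placeholders; the target bucket must grant S3's log delivery group write access):

```python
import boto3

s3 = boto3.client('s3')
s3.put_bucket_logging(
    Bucket='mybucket',
    BucketLoggingStatus={'LoggingEnabled': {
        'TargetBucket': 'mybucket-logs',
        'TargetPrefix': 'mybucket/logs/',   # prefix convention from above
    }},
)
```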

 

Event notifications

 

When actions are taken on an S3 bucket, event notifications provide a mechanism for performing other actions in response to the change, for example transcoding media files once they are uploaded.

 

Notifications are set up at the bucket level and can be configured via the S3 console, the REST API or the SDKs.

 

Notifications can be sent through SNS (Simple Notification Service) or SQS (Simple Queue Service), or delivered to AWS Lambda to invoke Lambda functions.
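
A boto3 sketch wiring new .mp4 uploads to a Lambda function (the function ARN is a placeholder, and Lambda must already permit S3 to invoke it):

```python
import boto3

s3 = boto3.client('s3')
s3.put_bucket_notification_configuration(
    Bucket='mybucket',
    NotificationConfiguration={'LambdaFunctionConfigurations': [{
        'LambdaFunctionArn':
            'arn:aws:lambda:eu-west-1:111122223333:function:transcode',
        'Events': ['s3:ObjectCreated:*'],
        'Filter': {'Key': {'FilterRules': [
            {'Name': 'suffix', 'Value': '.mp4'},
        ]}},
    }]},
)
```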

 

Best practice , patterns , performance

 

A common pattern is to back up enterprise file storage to an S3 bucket in a hybrid deployment. If you are using S3 in a GET-intensive mode, you should use CloudFront as a caching mechanism in front of the site/bucket.

 

Amazon Glacier

 

A low-cost archive storage service with a 3-5 hour retrieval time for the data

 

Archives / vaults

 

Data is stored in archives; a single archive can contain up to 40 TB of data, and you can have an unlimited number of archives. Vaults are containers for archives; each AWS account can have up to 1,000 vaults, and they can be controlled via IAM policies or vault access policies.
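
A boto3 sketch of the native Glacier API (vault and file names are placeholders); note that retrieval is asynchronous, matching the hours-long retrieval time above:

```python
import boto3

glacier = boto3.client('glacier')
glacier.create_vault(vaultName='my-vault')

# Upload an archive; Glacier returns a system-generated archive ID.
with open('backup.tar', 'rb') as f:
    resp = glacier.upload_archive(vaultName='my-vault', body=f)
archive_id = resp['archiveId']

# Retrieval is a job: initiate it now, collect the output hours later.
glacier.initiate_job(
    vaultName='my-vault',
    jobParameters={'Type': 'archive-retrieval', 'ArchiveId': archive_id},
)
```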

 

Data retrieval

 

You can retrieve up to 5% of your stored data for free each month

 

Glacier vs S3

 

Glacier                              S3
40 TB archives                       5 TB objects
System-generated archive ID          User-chosen bucket name
Automatic encryption                 Encryption at rest optional