It is called S3 because of the three S's: Simple Storage Service. It is easy-to-use object storage, with a web front end. You pay only for the storage you actually use, so capacity is dynamic and capacity planning is no longer a constraint.
Common uses are
- Backup and archive for on-premises or cloud infrastructure
- Content storage and distribution
- Big data analytics
- Static website hosting
- Disaster recovery
Storage classes are
- General purpose
- Infrequent access
- Archive
Glacier is another storage service, optimised for data archiving and long-term backup. It is good for "cold data", where a retrieval time of several hours is acceptable.
Object storage vs block / file storage
In traditional IT environments, two kinds of storage dominate
- Block storage operates at a low level and manages data as numbered, fixed-size blocks
- File storage operates at a higher level (the OS) and manages data as a hierarchy of files
These two systems are accessed over a network, typically in the form of a SAN using protocols such as Fibre Channel, and are fundamentally server and OS dependent.
S3 is cloud-based object storage. It is server independent and accessed over the internet, and data is managed via standard HTTP verbs.
Each S3 object contains
- Data
- Metadata
Objects reside in containers called buckets. A bucket is a simple flat namespace with no hierarchy in the sense of a file system, and it can hold a virtually unlimited number of objects. You can only GET or PUT an object; you cannot mount or open a bucket. S3 objects are automatically replicated within a region.
Buckets
A bucket is a container for objects and forms the top-level namespace in S3
AWS Regions
Even though bucket names are global (they must be unique across all of AWS), each bucket is created in the region that you choose, so you can control where your data is stored.
Objects
Objects are the entities actually stored in an S3 bucket. Data is the file itself; metadata is data about the file. The data portion is opaque to S3: it does not care about the content itself. The metadata of an object is a set of name/value pairs that describe the object.
Metadata breaks down into
- System metadata: used by S3 itself, e.g. date last modified, size, MD5 digest and HTTP Content-Type
- User metadata: assigned when the object is created; you can use it to tag data with something meaningful to you
Keys
Every object in a bucket is identified by a unique identifier called a key. A key can be up to 1024 bytes of UTF-8. You can have the same key in two different buckets, but you cannot have identical keys within the same bucket, so the combination of bucket and key uniquely identifies an object.
Object URL
S3 is internet-based storage, and hence each object has an associated URL
http://mybucket.s3.amazonaws.com/jack.doc
S3 bucket name = mybucket
Key = jack.doc
S3 operations
- Create / delete bucket
- Write an object
- Read an object
- Delete an object
- List keys in the bucket
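These operations map directly onto SDK calls. A minimal sketch of the basic operations using the Python boto3 SDK (bucket and key names here are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Create a bucket (outside us-east-1 a LocationConstraint matching the
# client's region is required)
s3.create_bucket(
    Bucket="mybucket",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)

# Write, read, list, delete
s3.put_object(Bucket="mybucket", Key="jack.doc", Body=b"hello")
obj = s3.get_object(Bucket="mybucket", Key="jack.doc")
print(obj["Body"].read())

for item in s3.list_objects_v2(Bucket="mybucket").get("Contents", []):
    print(item["Key"])

s3.delete_object(Bucket="mybucket", Key="jack.doc")
s3.delete_bucket(Bucket="mybucket")   # bucket must be empty first
```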
REST interface
The API maps onto the standard HTTP verbs
- Create = HTTP PUT (sometimes POST)
- Read = HTTP GET
- Delete = HTTP DELETE
- Update = HTTP POST
In practice you interact with S3 via higher-level interfaces rather than the REST API directly. These are
- AWS SDKs
  - JavaScript
  - Java
  - .NET
  - Node.js
  - PHP
  - Python
  - Ruby
  - Go
  - C++
- AWS CLI
- AWS Management Console
Durability and Availability
Durability = 99.999999999% (eleven nines)
Availability = 99.99%
Availability and durability are achieved through device redundancy: data is stored on multiple devices within a region. This can lead to data consistency issues, since it takes time for updates to propagate to all devices.
Access control
To give others access to a bucket
- Coarse-grained access control: S3 ACLs (READ, WRITE, FULL-CONTROL) at object or bucket level (legacy)
- Fine-grained access control: S3 bucket policies, AWS IAM policies and query-string authentication; this is the recommended access control mechanism
Bucket policies include an explicit reference to the IAM principal in the policy; the principal can be associated with a different AWS account. Using a bucket policy you can also restrict where S3 is accessed from, e.g. by IP address, and at particular times of day.
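As a sketch, a bucket policy restricting object GETs to a single IP range could be attached with boto3 like this (the bucket name and CIDR range are hypothetical):

```python
import json

import boto3

s3 = boto3.client("s3")

# Hypothetical policy: allow object GETs on "mybucket" only from one IP range
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowGetFromOfficeRange",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::mybucket/*",
        "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
    }],
}

s3.put_bucket_policy(Bucket="mybucket", Policy=json.dumps(policy))
```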
Static Website Hosting
This is a very common use for S3 when no server-side scripting (PHP, ASP.NET or JSP) is required. Because an S3 bucket has a URL, it is easy to turn it into a website:
- Create a bucket with the same name as the desired website hostname
- Upload static files to the bucket
- Make all files public (world readable)
- Enable static website hosting for the bucket. This includes specifying an index document and an error document (see the sketch after this list)
- The website will now be available at the S3 website URL <bucket-name>.s3-website-<AWS-region>.amazonaws.com
- Create a friendly DNS name in your own domain, using a DNS CNAME or an Amazon Route 53 alias, that resolves to the Amazon website URL
- The website will now be available at your website domain name
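The hosting step can also be done programmatically. A minimal sketch with boto3, assuming a bucket named after the hostname with index.html and error.html already uploaded:

```python
import boto3

s3 = boto3.client("s3")

# Enable static website hosting, specifying index and error documents
s3.put_bucket_website(
    Bucket="www.example.com",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)
```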
Prefixes and Delimiters
Prefixes and delimiters provide a way to address objects in a bucket as if there were a hierarchy. For example, you may want to store server logs under keys such as
log/2016/january/server42.log
log/2016/february/server42.log
All of the access methods (including the AWS console) support the use of prefixes and delimiters as above. This technique, used in conjunction with bucket policies, allows you to control access at the user level.
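For example, listing the "subfolders" under log/2016/ with boto3 (bucket name hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Treat "/" as the delimiter so common prefixes come back like directories
resp = s3.list_objects_v2(Bucket="mybucket", Prefix="log/2016/", Delimiter="/")
for cp in resp.get("CommonPrefixes", []):
    print(cp["Prefix"])   # e.g. log/2016/january/
```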
Storage Classes
The range of storage classes is
| Storage class | Characteristics | Cost / usage notes |
| Standard | High durability and availability, low latency with high throughput | |
| Standard infrequent access (Standard-IA) | Same durability as Standard, but for colder data, e.g. longer-lived, less frequently accessed objects | Lower per-GB-month cost than Standard; 128KB minimum object size and 30-day minimum duration, so use it for infrequently accessed data older than 30 days |
| Reduced redundancy storage (RRS) | Lower durability (four nines) | Reduced cost compared with Standard |
| Glacier | Low cost, no real-time access, retrieval time of several hours | Controlled via the S3 API; restored data is placed in a temporary RRS copy |
Object lifecycle management
Data can traditionally be thought of as moving from left to right through its lifecycle
| Hot | Warm | Cold | Deletion |
| Frequent access, low latency: Standard | Less frequent access, 30 days+: use Standard-IA | Archive: at 90 days move to Glacier | After e.g. 3 years, delete |
You can use S3 lifecycle configuration rules to move data through these stages automatically, as in the sketch below
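A sketch of such a rule in boto3, moving objects under a hypothetical log/ prefix to Standard-IA at 30 days and Glacier at 90 days, then deleting them after roughly 3 years:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="mybucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "log-tiering",
            "Filter": {"Prefix": "log/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # warm
                {"Days": 90, "StorageClass": "GLACIER"},      # cold
            ],
            "Expiration": {"Days": 1095},   # delete after ~3 years
        }],
    },
)
```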
Encryption
In flight, data is encrypted via HTTPS (to and from S3).
At rest, you can use several variations of SSE (server-side encryption) as you write the data to S3; these use the 256-bit Advanced Encryption Standard (AES-256), and keys can be managed through the AWS Key Management Service (KMS).
You can also use CSE (client-side encryption) in the enterprise.
SSE-S3 (AWS managed keys)
"Checkbox" encryption, where AWS handles the following for S3
- Key management
- Key protection
Every object is encrypted with a unique key, which is itself encrypted by a separate master key; a new master key is issued at least monthly, with AWS rotating the keys
SSE-KMS (AWS KMS Keys)
A fully integrated service where AWS handles key management and protection, but the enterprise manages the master keys. It has the following benefits
- Separate permissions for using the master key
- Auditing: see who used your key and when
- Visibility of failed attempts by users who did not have permission to decrypt
SSE-C (customer provided keys)
The enterprise maintains its own encryption keys, but does not have to manage a client-side encryption library; S3 performs the encryption and decryption using the supplied key
Client side encryption
Encrypt data on the client side before transmitting it to S3. You have two options
- Use an AWS KMS-managed customer master key
- Use a client-side master key
When encrypting client side, the enterprise retains end-to-end control of the encryption, including management of the keys.
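For the server-side options, the choice of SSE variant is just a parameter on the write. A sketch with boto3 (bucket, key and KMS alias are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# SSE-S3: AWS-managed keys, 256-bit AES
s3.put_object(
    Bucket="mybucket", Key="secret.doc", Body=b"payload",
    ServerSideEncryption="AES256",
)

# SSE-KMS: encrypt under a KMS master key you control
s3.put_object(
    Bucket="mybucket", Key="secret.doc", Body=b"payload",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-app-key",   # hypothetical key alias
)
```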
Versioning
Versioning helps protect against accidental deletion of data by keeping multiple versions of each object in a bucket. Versioning is activated at the bucket level; once turned on it cannot be removed (only suspended).
You can restore an object by referencing its version ID in addition to the bucket name and object key.
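A sketch of enabling versioning and then fetching a specific version with boto3 (names and version ID are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Turn versioning on at the bucket level (it can later only be suspended)
s3.put_bucket_versioning(
    Bucket="mybucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# Retrieve a particular version by its version ID
obj = s3.get_object(
    Bucket="mybucket", Key="jack.doc", VersionId="EXAMPLE-VERSION-ID",
)
```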
MFA delete
In addition to normal security credentials, MFA Delete requires an authentication code (a temporary one-time password) generated by a hardware or virtual MFA device. It can only be enabled by the root account.
Pre-signed URLs
By default objects are private, meaning that only the owner has access, but the owner can create a pre-signed URL that grants time-limited permission to download objects. The URL is created using
- Owners security credentials
- Bucket name
- Object key
- HTTP method ( GET for download)
- Expiration date and time
Pre-signed URLs give good protection against web scrapers
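Generating one is a single SDK call. A sketch with boto3:

```python
import boto3

s3 = boto3.client("s3")

# Pre-signed GET URL valid for one hour; anyone holding it can download
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "mybucket", "Key": "jack.doc"},
    ExpiresIn=3600,   # seconds
)
print(url)
```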
Multipart upload
AWS provides a multipart upload API for larger files. This gives better network utilisation through parallel transfers, supports pause and resume, and allows uploads where the original object size is unknown
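With boto3 the simplest route is to let the transfer manager handle the parts. A sketch, with hypothetical file and bucket names and an assumed 100 MB threshold:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Switch to multipart above 100 MB and upload up to 10 parts in parallel
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    max_concurrency=10,
)
s3.upload_file("backup.tar", "mybucket", "backups/backup.tar", Config=config)
```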
Range GETs
The range of bytes to be downloaded is specified in the HTTP Range header of the GET request; this is useful if you have poor connectivity and a large object to download
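For example, fetching only the first megabyte of a large object with boto3 (names hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Standard HTTP Range header semantics: bytes 0 .. 1048575 inclusive
resp = s3.get_object(Bucket="mybucket", Key="big.iso", Range="bytes=0-1048575")
chunk = resp["Body"].read()   # 1 MB of data
```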
Cross-region replication
Cross-region replication allows automatic replication of new objects in a bucket in one AWS region to a bucket in another AWS region. Metadata and ACLs associated with the object are also replicated. Versioning must be turned on in both source and destination buckets, and you must use an IAM role/policy to give S3 permission to replicate objects on your behalf.
It is commonly used to reduce the latency required to access objects. Existing objects in a bucket are not replicated when replication is turned on; copying them across requires a separate command.
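A sketch of enabling replication with boto3, assuming versioned source and destination buckets and a pre-created IAM role (all names hypothetical):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="mybucket",
    ReplicationConfiguration={
        # Role granting S3 permission to replicate on your behalf
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [{
            "Status": "Enabled",
            "Prefix": "",   # empty prefix = replicate all new objects
            "Destination": {"Bucket": "arn:aws:s3:::mybucket-replica"},
        }],
    },
)
```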
Logging
You can enable S3 access logs to record requests made to a bucket. When you enable logging you must choose where the logs will be stored: the bucket itself or another bucket. It is good practice to define a prefix such as <bucket-name>/logs/. Logs include the following information
- Requester account and IP address
- Bucket name
- Request time
- Action (GET, PUT, LIST)
- Response status or error code
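A sketch of turning logging on with boto3 (log bucket and prefix hypothetical; the target bucket must grant the log delivery group write access):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_logging(
    Bucket="mybucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-log-bucket",
            "TargetPrefix": "mybucket/logs/",
        },
    },
)
```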
Event notifications
When actions are taken on an S3 bucket, event notifications provide a mechanism for performing other actions in response to the change, for example transcoding media files once they are uploaded.
Notifications are set up at the bucket level and can be configured via the S3 console, the REST API or the SDKs.
Notifications can be sent through SNS (Simple Notification Service) or SQS (Simple Queue Service), or delivered to AWS Lambda to invoke Lambda functions.
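For the transcoding example above, a sketch of wiring uploads to a hypothetical Lambda function with boto3 (the Lambda must already permit invocation by S3):

```python
import boto3

s3 = boto3.client("s3")

# Invoke a (hypothetical) transcoding Lambda whenever a new .mp4 is uploaded
s3.put_bucket_notification_configuration(
    Bucket="mybucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": "arn:aws:lambda:eu-west-1:123456789012:function:transcode",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "suffix", "Value": ".mp4"},
            ]}},
        }],
    },
)
```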
Best practice, patterns and performance
A common pattern is to back up enterprise file storage to an S3 bucket in a hybrid deployment. If you are using S3 in a GET-intensive mode, you should use CloudFront as a caching mechanism in front of the site / bucket.
Amazon Glacier
A low-cost archive storage service with a 3-5 hour retrieval time for the data
Archives / vaults
Data is stored in archives; a single archive can contain up to 40 TB of data, and you can have an unlimited number of archives. Vaults are containers for archives; each AWS account can have up to 1,000 vaults, and access is controlled via IAM policies or vault access policies.
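A sketch of the native Glacier API with boto3 (vault and file names hypothetical); note the system-generated archive ID, which you must keep to retrieve the archive later:

```python
import boto3

glacier = boto3.client("glacier")

# Create a vault, then upload a file as an archive
glacier.create_vault(vaultName="my-vault")
with open("backup.tar", "rb") as f:
    resp = glacier.upload_archive(vaultName="my-vault", body=f)

print(resp["archiveId"])   # system-generated ID needed for later retrieval
```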
Data retrieval
You can retrieve up to 5% of your stored data for free each month
Glacier vs S3
| Glacier | S3 |
| 40 TB archive | 5 TB object |
| System generated archive ID | Choose bucket name |
| Auto encryption | Encryption at rest optional |