S3 101

It's called S3 because of the three S's: Simple Storage Service. It is easy-to-use object storage, fronted of course by a web interface. You pay only for the storage you actually use, so capacity planning is no longer a constraint.

Common uses are

  • Backup and archive for on-premises or cloud data
  • Content storage and distribution
  • Big data analytics
  • Static website hosting
  • Disaster recovery

 

Storage classes are

 

  • General purpose
  • Infrequent access
  • Archive

 

Glacier is another storage service, but it is optimised for data archiving and long-term backup. It is good for "cold data", where a retrieval time of hours is acceptable.

 

Object storage vs block/file storage

In traditional IT environments, two kinds of storage dominate:

 

  • Block storage operates at a low level and manages data as numbered, fixed-size blocks
  • File storage operates at a higher level (the operating system) and manages data as a hierarchy of files

 

These two systems are typically accessed over a network, for example as a SAN using protocols such as Fibre Channel, but they remain fundamentally server and OS dependent.

 

S3 is cloud-based object storage. It is server independent, it is accessed over the internet, and data is managed via standard HTTP verbs.

 

Each S3 object contains

  • Data
  • Metadata

 

Objects reside in containers called buckets. A bucket is a simple flat namespace with no hierarchy in the sense of a file system, and it can hold a virtually unlimited number of objects. You can only GET or PUT an object; you cannot mount or open a bucket. S3 objects are automatically replicated within a region.

 

Buckets

A bucket is a container for objects, and forms the top-level namespace in S3

 

AWS Regions

 

Even though bucket names are global, each bucket is created in the region that you choose, so you can control where your data is stored.
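
A minimal boto3 (Python SDK) sketch of choosing the region at creation time; the bucket name and region are placeholders:

```python
import boto3

# The region is chosen at creation time via LocationConstraint
# (us-east-1 is the exception: omit CreateBucketConfiguration there).
s3 = boto3.client('s3', region_name='eu-west-1')
s3.create_bucket(
    Bucket='mybucket',
    CreateBucketConfiguration={'LocationConstraint': 'eu-west-1'},
)
```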

 

Objects

 

Objects are the entries that are actually stored in an S3 bucket. Data is the actual file itself, and metadata is data about the file. The data portion is opaque to S3; it doesn't care about the content of the data itself. The metadata of an object is a set of name/value pairs that describes the object.

 

Metadata breaks down into

  • System metadata, which is used by S3 itself, e.g. date last modified, size, MD5 digest and HTTP Content-Type
  • User metadata, assigned when the object is created, which lets you tag data with something meaningful
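
A hedged boto3 sketch of both kinds of metadata (bucket, key and the tag values are placeholders):

```python
import boto3

s3 = boto3.client('s3')

# User metadata is supplied as name/value pairs when the object is written.
s3.put_object(
    Bucket='mybucket',
    Key='jack.doc',
    Body=b'report contents',
    Metadata={'department': 'finance', 'reviewed': 'yes'},
)

# System metadata (size, last modified, Content-Type) comes back
# alongside the user metadata on a HEAD request.
resp = s3.head_object(Bucket='mybucket', Key='jack.doc')
print(resp['ContentLength'], resp['LastModified'], resp['ContentType'])
print(resp['Metadata'])   # {'department': 'finance', 'reviewed': 'yes'}
```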

 

Keys

Every object in a bucket is identified by a unique identifier called a key. A key can be up to 1024 bytes of UTF-8; the same key can exist in two different buckets, but you cannot have identical keys within the same bucket. Together, bucket and key form a unique identifier for the object.

 

Object URL

S3 is internet-based storage, and hence every object has an associated URL

 

http://mybucket.s3.amazonaws.com/jack.doc

 

S3 bucket name = mybucket

Key = jack.doc

 

S3 operations

  • Create / delete bucket
  • Write an object
  • Read an object
  • Delete an object
  • List keys in the bucket
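
As a sketch, all of these operations map onto one-line boto3 calls (bucket and key names are placeholders):

```python
import boto3

s3 = boto3.client('s3')

s3.create_bucket(Bucket='mybucket')                           # create bucket
s3.put_object(Bucket='mybucket', Key='jack.doc', Body=b'hi')  # write object
obj = s3.get_object(Bucket='mybucket', Key='jack.doc')        # read object
print(obj['Body'].read())
for item in s3.list_objects_v2(Bucket='mybucket').get('Contents', []):
    print(item['Key'])                                        # list keys
s3.delete_object(Bucket='mybucket', Key='jack.doc')           # delete object
s3.delete_bucket(Bucket='mybucket')                           # delete bucket
```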

 

REST interface

 

Basically, the HTTP verbs form the API:

 

  • Create = HTTP PUT (sometimes POST)
  • Read = HTTP GET
  • Delete = HTTP DELETE
  • Update = HTTP POST

 

In practice you usually interact with S3 via higher-level interfaces rather than via REST directly. These are:

 

  • AWS SDKs (JavaScript, Java, .NET, Node.js, PHP, Python, Ruby, Go, C++)
  • AWS CLI
  • AWS Management Console

 

Durability and Availability

 

Durability = 99.999999999% (eleven nines)

Availability = 99.99%

 

Availability is achieved through device redundancy, i.e. multiple devices within a region. This can lead to data-consistency issues, since it takes time for updates to propagate to all devices.

 

Access control

 

To give others access to a bucket:

 

  • Coarse-grained access control: S3 ACLs (READ, WRITE, FULL-CONTROL) at the object or bucket level (legacy)

 

  • Fine-grained access control: S3 bucket policies, AWS IAM policies and query-string authentication; this is the recommended access control mechanism

 

Bucket policies include an explicit reference to the IAM principal in the policy, and that principal can be associated with a different AWS account. Using a bucket policy you can also specify from where S3 may be accessed, e.g. by IP address, and even at a particular time of day.
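
A sketch of such a policy, assuming a placeholder bucket, account ID and IP range:

```python
import json
import boto3

# Allow a principal from another account to GET objects, but only from
# one source IP range (the ARNs and the CIDR below are placeholders).
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:user/partner"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::mybucket/*",
        "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
    }],
}

boto3.client('s3').put_bucket_policy(Bucket='mybucket', Policy=json.dumps(policy))
```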

 

Static Website Hosting

 

This is a very common use for S3 when no server-side scripting (PHP, ASP.NET or JSP) is required. Because an S3 bucket has a URL, it is easy to turn it into a website.

 

  1. Create a bucket with the same name as the desired website hostname
  2. Upload your static files to the bucket
  3. Make all the files public (world readable)
  4. Enable static website hosting for the bucket, specifying an index document and an error document (see the sketch after this list)
  5. The website will now be available at the S3 website URL <bucket-name>.s3-website-<AWS-region>.amazonaws.com
  6. Create a friendly DNS name in your own domain, using a DNS CNAME or an Amazon Route 53 alias, that resolves to the Amazon website URL
  7. The website will now be available at your own domain name
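
Step 4 can be done in one boto3 call; a sketch, assuming the bucket already exists and its files are public:

```python
import boto3

s3 = boto3.client('s3')
s3.put_bucket_website(
    Bucket='www.example.com',
    WebsiteConfiguration={
        'IndexDocument': {'Suffix': 'index.html'},
        'ErrorDocument': {'Key': 'error.html'},
    },
)
```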

 

Prefixes and Delimiters

 

Prefixes and delimiters provide a way to address objects in a bucket as though there were a hierarchy. For example, you may want to save server logs as

 

log/2016/january/server42.log

log/2016/february/server42.log

 

All of the access methods (including the AWS console) support the use of prefixes and delimiters as above. This technique, used in conjunction with bucket policies, allows you to control access at the user level.
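
For example, a boto3 sketch that lists the "subfolders" directly under log/2016/ by treating "/" as the delimiter (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client('s3')
resp = s3.list_objects_v2(Bucket='mybucket', Prefix='log/2016/', Delimiter='/')
for cp in resp.get('CommonPrefixes', []):
    print(cp['Prefix'])   # e.g. log/2016/january/, log/2016/february/
```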

 

Storage Classes

 

The range of storage classes is:

  • Standard: high durability and availability, low latency with high throughput
  • Standard-IA (infrequent access): same durability as Standard, but for colder data, e.g. longer-lived, less frequently accessed objects. Lower per-GB-per-month cost than Standard; the minimum billable object size is 128KB and the minimum storage duration is 30 days, so use it for infrequently accessed data that is older than 30 days
  • Reduced Redundancy Storage (RRS): lower durability (four nines) at a reduced cost compared with Standard
  • Glacier: low cost, no real-time access, retrieval time of several hours; controlled via the S3 API, with restored copies placed in RRS

 

Object lifecycle management

 

Data can traditionally be thought of as moving from hot to cold, left to right:

 

  • Hot: frequent access, low latency. Use S3 Standard
  • Warm: less frequently accessed, 30 days +. Use Standard-IA
  • Cold: archive. After 90 days, move to Glacier
  • Deletion: after, say, 3 years, delete

 

You can use S3 lifecycle configuration rules to move data through these stages automatically, as in the sketch below
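
A boto3 sketch of such a rule, implementing the hot/warm/cold/delete flow above for a placeholder log/ prefix:

```python
import boto3

s3 = boto3.client('s3')
s3.put_bucket_lifecycle_configuration(
    Bucket='mybucket',
    LifecycleConfiguration={'Rules': [{
        'ID': 'hot-warm-cold',
        'Status': 'Enabled',
        'Filter': {'Prefix': 'log/'},
        'Transitions': [
            {'Days': 30, 'StorageClass': 'STANDARD_IA'},  # warm after 30 days
            {'Days': 90, 'StorageClass': 'GLACIER'},      # cold after 90 days
        ],
        'Expiration': {'Days': 1095},                     # delete after ~3 years
    }]},
)
```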

 

Encryption

 

In flight, data is encrypted via HTTPS (to and from S3).

 

At rest, you can use several variations of SSE (server-side encryption) as you write the data to S3. All use the 256-bit Advanced Encryption Standard (AES), and keys can be managed through the AWS Key Management Service (KMS).

 

You can also use CSE (client side encryption) in the enterprise

 

SSE-S3 (AWS managed keys)

 

Check-box encryption, where AWS handles the following for S3:

 

  • Key management
  • Key protection

 

Every object is encrypted with a unique key, which is itself encrypted by a separate master key; a new master key is issued monthly, with AWS rotating the keys.

 

SSE-KMS (AWS KMS Keys)

 

A fully integrated service where AWS handles key management and protection, but the enterprise manages the keys. It has the following benefits:

 

  • Separate permissions for using the master key
  • Auditing, so you can see who used your key
  • Visibility of failed attempts by users who did not have permission to decrypt

 

SSE-C (customer provided keys)

 

The enterprise maintains its own encryption keys, but doesn't have to manage a client-side encryption library.
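
A boto3 sketch of all three SSE variants (the bucket, keys and KMS alias are placeholders):

```python
import os
import boto3

s3 = boto3.client('s3')

# SSE-S3: check-box encryption with AWS-managed keys (256-bit AES).
s3.put_object(Bucket='mybucket', Key='a.doc', Body=b'data',
              ServerSideEncryption='AES256')

# SSE-KMS: encrypt under a KMS master key; the alias is a placeholder.
s3.put_object(Bucket='mybucket', Key='b.doc', Body=b'data',
              ServerSideEncryption='aws:kms',
              SSEKMSKeyId='alias/my-app-key')

# SSE-C: supply your own 256-bit key with each request; S3 encrypts with
# it and then discards it, so the same key must accompany every GET.
key = os.urandom(32)
s3.put_object(Bucket='mybucket', Key='c.doc', Body=b'data',
              SSECustomerAlgorithm='AES256', SSECustomerKey=key)
obj = s3.get_object(Bucket='mybucket', Key='c.doc',
                    SSECustomerAlgorithm='AES256', SSECustomerKey=key)
```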

 

Client side encryption

 

Encrypt data on the client side before transmitting it to S3. You have two options:

 

  • Use an AWS KMS-managed customer master key
  • Use a client-side master key

 

When encrypting client-side, the enterprise retains end-to-end (E2E) control of the encryption, including management of the keys.

 

Versioning

 

Versioning helps protect against accidental deletion of data by keeping multiple versions of each object in a bucket. Versioning is activated at the bucket level; once on, it cannot be removed, only suspended.

 

You can restore an object by referencing the version ID in addition to the bucket name and object key
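
A boto3 sketch (bucket, key and the version ID are placeholders):

```python
import boto3

s3 = boto3.client('s3')

# Turn versioning on at the bucket level.
s3.put_bucket_versioning(
    Bucket='mybucket',
    VersioningConfiguration={'Status': 'Enabled'},
)

# Fetch a specific version of an object by its version ID
# ('EXAMPLE-VERSION-ID' stands in for a real, system-generated ID).
obj = s3.get_object(Bucket='mybucket', Key='jack.doc',
                    VersionId='EXAMPLE-VERSION-ID')
```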

 

MFA delete

 

In addition to normal security credentials, MFA Delete requires an authentication code: a temporary one-time password generated by a hardware or virtual MFA device. It can only be enabled by the root account.

 

Pre signed URLs

 

By default, objects are private, meaning that only the owner has access, but the owner can create a pre-signed URL which grants time-limited permission to download objects. The URL is created using:

 

  • The owner's security credentials
  • Bucket name
  • Object key
  • HTTP method (GET for download)
  • Expiration date and time
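
A boto3 sketch that signs a one-hour download link for jack.doc (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client('s3')
url = s3.generate_presigned_url(
    'get_object',                                     # HTTP GET for download
    Params={'Bucket': 'mybucket', 'Key': 'jack.doc'},
    ExpiresIn=3600,                                   # expiry, in seconds
)
print(url)   # anyone holding this URL can GET the object until it expires
```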

 

Pre-signed URLs give good protection against web scrapers.

 

Multipart upload

 

AWS provides a multipart upload API for larger files. This gives better network utilisation by virtue of parallel transfers, supports pause and resume, and allows uploads where the original size is unknown.
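
A sketch using boto3's transfer layer, which switches to the multipart API automatically above a size threshold and uploads the parts in parallel (file and bucket names are placeholders):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # use multipart above 8 MB
    max_concurrency=10,                   # parallel part uploads
)
s3.upload_file('backup.tar', 'mybucket', 'backup.tar', Config=config)
```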

 

Range GETs

 

The range of bytes to be downloaded is specified in the HTTP header of the GET request. This is useful if you have poor connectivity and a large object to download.
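
A boto3 sketch fetching only the first megabyte of a large object (names are placeholders):

```python
import boto3

s3 = boto3.client('s3')
resp = s3.get_object(Bucket='mybucket', Key='backup.tar',
                     Range='bytes=0-1048575')   # standard HTTP Range header
first_mb = resp['Body'].read()
```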

 

Cross-region replication

 

Cross-region replication copies new objects in a bucket in one AWS region to a bucket in another AWS region. The metadata and ACLs associated with each object are also part of the replication. Versioning must be turned on in both the source and destination buckets, and you must use an IAM policy to give S3 permission to replicate.

 

It is commonly used to reduce the latency required to access objects. Existing objects in a bucket are not replicated when replication is turned on; that is achieved with a separate command.
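
A boto3 sketch of the replication configuration (the role ARN and bucket names are placeholders; both buckets must already have versioning enabled):

```python
import boto3

s3 = boto3.client('s3')
s3.put_bucket_replication(
    Bucket='mybucket',
    ReplicationConfiguration={
        # Role that grants S3 permission to replicate on your behalf.
        'Role': 'arn:aws:iam::111122223333:role/s3-replication',
        'Rules': [{
            'Status': 'Enabled',
            'Prefix': '',   # empty prefix = replicate all new objects
            'Destination': {'Bucket': 'arn:aws:s3:::mybucket-replica'},
        }],
    },
)
```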

 

Logging

You can enable S3 access logs to record the requests made to a bucket. When you enable logging you must choose where the logs will be stored; this can be the bucket itself or another bucket, and it is good practice to define a prefix such as <bucket-name>/logs/. Logs include the following information:

 

  • Requester account and IP address
  • Bucket name
  • Request time
  • Action (GET, PUT, LIST)
  • Response status or error code
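
A boto3 sketch enabling access logging to a separate bucket (names are placeholders; the target bucket must grant S3's log delivery group write access):

```python
import boto3

s3 = boto3.client('s3')
s3.put_bucket_logging(
    Bucket='mybucket',
    BucketLoggingStatus={'LoggingEnabled': {
        'TargetBucket': 'mybucket-logs',
        'TargetPrefix': 'mybucket/logs/',   # prefix convention from above
    }},
)
```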

 

Event notifications

 

When actions are taken on an S3 bucket, event notifications provide a mechanism for performing other actions in response to the change, for example transcoding media files once they are uploaded.

 

Notifications are set up at the bucket level and can be configured via the S3 console, the REST API or the SDKs.

 

Notifications can be sent through SNS (Simple Notification Service) or SQS (Simple Queue Service), or delivered to AWS Lambda to invoke Lambda functions.
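
A boto3 sketch wiring new .mp4 uploads to a Lambda function (the function ARN is a placeholder, and Lambda must already permit S3 to invoke it):

```python
import boto3

s3 = boto3.client('s3')
s3.put_bucket_notification_configuration(
    Bucket='mybucket',
    NotificationConfiguration={'LambdaFunctionConfigurations': [{
        'LambdaFunctionArn':
            'arn:aws:lambda:eu-west-1:111122223333:function:transcode',
        'Events': ['s3:ObjectCreated:*'],
        'Filter': {'Key': {'FilterRules': [
            {'Name': 'suffix', 'Value': '.mp4'},
        ]}},
    }]},
)
```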

 

Best practice , patterns , performance

 

A common pattern is to back up enterprise file storage to an S3 bucket in a hybrid deployment. If you are using S3 in a GET-intensive mode, you should use CloudFront as a caching mechanism in front of the site/bucket.

 

Amazon Glacier

 

A low-cost archive storage service with a 3-5 hour retrieval time for the data

 

Archives / vaults

 

Data is stored in archives; a single archive can contain up to 40 TB of data, and you can have an unlimited number of archives. Vaults are containers for archives; each AWS account can have up to 1,000 vaults, and they can be controlled via IAM policies or vault access policies.
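
A boto3 sketch of the native Glacier API (vault and file names are placeholders); note that retrieval is asynchronous, matching the hours-long retrieval time above:

```python
import boto3

glacier = boto3.client('glacier')
glacier.create_vault(vaultName='my-vault')

# Upload an archive; Glacier returns a system-generated archive ID.
with open('backup.tar', 'rb') as f:
    resp = glacier.upload_archive(vaultName='my-vault', body=f)
archive_id = resp['archiveId']

# Retrieval is a job: initiate it now, collect the output hours later.
glacier.initiate_job(
    vaultName='my-vault',
    jobParameters={'Type': 'archive-retrieval', 'ArchiveId': archive_id},
)
```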

 

Data retrieval

 

You can retrieve up to 5% of your stored data for free each month

 

Glacier vs S3

 

Glacier                              S3
40 TB archives                       5 TB objects
System-generated archive ID          User-chosen bucket name
Automatic encryption                 Encryption at rest optional