Comprehensive Guide to AWS S3 CLI Commands with Examples

Introduction to AWS S3 Storage Service

Amazon S3 (Simple Storage Service) is a highly durable, highly available, and virtually unlimited cloud object storage service. It allows you to store and retrieve any amount of data from anywhere on the web.

Some key capabilities include:

Storage

  • Practically unlimited capacity and scale
  • Storage across multiple isolated data centers
  • 11 x 9s durability (99.999999999%)

Data Transfer

  • High speed upload and download
  • Transfer Acceleration and multipart uploads
  • Intelligent-Tiering to move data to lower-cost tiers automatically

Security

  • Encryption (both in-transit and at-rest)
  • IAM policies, bucket policies, and ACLs for access control
  • Block public access options
  • PCI DSS compliance

Management

  • AWS console and CLI access
  • Integrates with hundreds of AWS services
  • Programmable via SDK in any language

This combination of reliability, security and ease of use has made S3 the most widely used cloud storage solution worldwide.

Internal Architecture

Behind the scenes, S3 stores data across multiple geographically isolated data centers. This ensures high availability and protects against disasters or failures.

The data is redundantly stored on hard disk drives and servers. Databases track metadata and the mappings of object IDs to physical locations. Automatic integrity checks detect and heal bit rot, and replication keeps copies consistent.

S3 achieves high scalability via horizontal partitioning and load balancing. Related objects may reside on different servers while lookup tables map objects to locations.

Consistency Model

S3 originally offered read-after-write consistency only for PUTs of new objects, with eventual consistency for overwrites and deletes, trading consistency for availability and partition tolerance per the CAP theorem. Since December 2020, S3 provides strong read-after-write consistency for all PUT, GET, LIST, and DELETE operations at no additional cost.

PRO TIP: Object versioning is still valuable when multiple writers update the same key concurrently, since last-writer-wins semantics can silently discard changes.
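
Versioning is a per-bucket setting; as a sketch, it can be enabled with the lower-level s3api commands (the bucket name is a placeholder):

aws s3api put-bucket-versioning \
    --bucket mybucket \
    --versioning-configuration Status=Enabled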

Behind the Scenes Data Flow

When you execute a PUT request to save an object in S3, the data is streamed over HTTPS to S3 front-end servers in the target region. Request authentication, storage allocation, replication, and error handling all happen automatically.

When you later fetch the object with a GET request, S3 internally locates the data and streams it back to you from the storage drives. If latency over long distances is an issue, enable S3 Transfer Acceleration, which routes transfers through CloudFront edge locations.

Within a region, data is automatically replicated across multiple Availability Zones for redundancy; replication across regions requires configuring Cross-Region Replication explicitly.

Now that we understand the basics of S3, let's see how to manage all these objects. This is where the AWS CLI comes in.

Managing AWS S3 with CLI Commands

The AWS command line interface (CLI) offers simple yet powerful commands to interact with S3 storage from your terminal or scripts, without writing any application code.

You just need to have the AWS CLI installed and configured beforehand:

# Configure credentials (assumes the AWS CLI is already installed)

$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]: json

This stores your IAM access keys, which authenticate subsequent CLI requests. Now let's explore the various data management capabilities.
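Before doing so, you can verify that the credentials work with a quick identity check:

$ aws sts get-caller-identity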

cp – Efficiently Copy Files

The cp command copies files and objects into and out of S3 buckets, much like the traditional Unix cp command.

Syntax:

aws s3 cp <source> <destination> [options]

Let's look at some examples:

# Copy file from local FS to S3   
aws s3 cp test.txt s3://mybucket  

# Download file from S3 to local
aws s3 cp s3://mybucket/test.txt ~/downloads    

# Copy between S3 buckets  
aws s3 cp s3://bucket1/test.txt s3://bucket2

You can also recursively copy entire directories with the --recursive option.

Some other useful options are:

--dryrun – Test/preview the copy operation without executing it

--storage-class STANDARD_IA – Save costs by using the Infrequent Access tier

--only-show-errors – Reduces output to errors only

--acl public-read – Grant public read access
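
Combining these, a dry run that would recursively upload a local logs/ directory (a hypothetical path) to the Infrequent Access tier looks like this:

aws s3 cp ./logs s3://mybucket/logs --recursive --storage-class STANDARD_IA --dryrun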

mv – Move/Rename S3 Objects

The mv command moves files within or between S3 buckets similar to the traditional Unix mv command.

Syntax:

aws s3 mv <source> <destination> [options]

Example usage:

# Upload a local file to S3 (the local copy is removed)
aws s3 mv test.txt s3://mybucket 

# Rename stored object
aws s3 mv s3://mybucket/test.txt s3://mybucket/doc.txt

# Move file between S3 buckets
aws s3 mv s3://bucket1/test.txt s3://bucket2

Useful scenarios include archiving old data into a cheaper storage class (via the --storage-class option), reorganizing object prefixes, or consolidating multiple buckets.
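
For instance, a sketch of consolidating one bucket into another (both names are placeholders):

aws s3 mv s3://old-bucket s3://consolidated-bucket --recursive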

ls – List S3 Buckets & Objects

The ls command lists all buckets, or the objects under a specified prefix inside a bucket. This is useful to explore data in S3 programmatically.

Syntax:

aws s3 ls [s3://bucketname] [options]

Examples:

View bucket level listings:

# See all buckets 
aws s3 ls 

# List specific bucket contents
aws s3 ls s3://mybucket

# Recursively list bucket objects   
aws s3 ls s3://mybucket --recursive

And directory style listings:

# Directory style listing
aws s3 ls s3://mybucket --summarize --human-readable --recursive 

2021-01-01 22:10:56   10.7 MiB myfolder/build.iso
2021-01-01 22:11:19   64.0 KiB myfolder/config.yaml

Total Objects: 2
   Total Size: 10.9 MiB

These flags give you fine-grained control over the output format.

mb – Make a New S3 Bucket

The mb command makes a new S3 bucket. Bucket names have to be globally unique across AWS.

Syntax:

aws s3 mb s3://bucketname [options]

Example:

aws s3 mb s3://mynewbucket --region us-east-1

Some tips when creating buckets:

  • Choose a unique name relevant to usage
  • Set location appropriate to users
  • Enable default encryption for security
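
As a sketch of the last tip, default encryption can be enabled through the lower-level s3api commands (the bucket name is a placeholder):

aws s3api put-bucket-encryption \
    --bucket mynewbucket \
    --server-side-encryption-configuration '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}'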

rb – Delete an Empty S3 Bucket

The rb command deletes an existing empty S3 bucket that you own.

Syntax:

aws s3 rb s3://bucketname

Example:

aws s3 rb s3://mynewbucket

To delete a non-empty bucket, add the --force flag. This deletes all objects inside as well.
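
For example (destructive, so treat this as a sketch and double-check the bucket name first):

aws s3 rb s3://mynewbucket --force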

Advanced AWS S3 Commands

Let's look at some advanced, yet useful S3 CLI commands:

sync – Sync a directory, intelligently copying new/updated files:

aws s3 sync dir s3://bucket/prefix

presign – Generate temporary expiring URL for object:

aws s3 presign s3://bucket/object --expires-in 300

rm – Delete objects from a bucket:

aws s3 rm s3://bucketname/prefix --recursive 

Explore more such commands with:

aws s3 help

Using AWS S3 Programmatically

The CLI provides easy imperative control to manage S3 storage manually.

However, typically you will want to integrate S3 uploads, downloads and data processing within your own applications.

The AWS SDKs provide direct access to S3 APIs from your code, handling low level details for you automatically.

Let's see an example of uploading a file to S3 in Python:

import boto3

# Create an S3 client
s3 = boto3.client('s3')

# Upload file to bucket
with open('test.png', 'rb') as f:
    s3.upload_fileobj(f, 'my-bucket', 'images/test.png')

And to download data from S3:

# Download remote object locally
with open('downloaded.png', 'wb') as f:
    s3.download_fileobj('my-bucket', 'images/test.png', f)

You can make similar SDK calls from Java, JavaScript, C#, or Go in your applications.

This enables directly connecting to S3 for your storage needs instead of having to run CLI commands separately.

Security Best Practices

While S3 provides state of the art security, additional practices you should follow include:

Use IAM roles for authorization instead of long-lived access keys, which can be leaked. Roles provide temporary security tokens and minimize long-term credentials.

Enable default encryption on buckets so all objects are encrypted at rest by default. New data will automatically remain secure without any application changes.

Restrict bucket policies and ACLs to deny any public access unless explicitly needed. Use presigned URLs or identity policies for temporary third-party access.
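
For instance, Block Public Access can be enforced on a bucket with an s3api call like this (the bucket name is a placeholder):

aws s3api put-public-access-block \
    --bucket mybucket \
    --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true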

Analyze access logs and metrics to detect anomalies indicating any unintended access attempts. Get alerted for unexpected activities.

Enforce Multi-Factor Authentication (MFA) for admin users and disable unused credentials. Rotate access keys periodically to limit the damage from any compromised keys.

Utilize network security mechanisms like VPC endpoints and VPNs to securely access S3 without exposing buckets on the public internet.

Cost Optimization

While S3 offers great value initially, costs can spiral out of control at scale if you're not careful.

Some best practices to optimize costs include:

Use Infrequent Access for cold data: Rarely accessed data can be moved to IA for cost savings of 20-50%

Enable object lifecycle policies: Automatically transition objects to lower-cost tiers or expire non-current versions, as sketched below.
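
A minimal lifecycle sketch (the prefix, day counts, and bucket name are illustrative assumptions):

aws s3api put-bucket-lifecycle-configuration \
    --bucket mybucket \
    --lifecycle-configuration '{
      "Rules": [{
        "ID": "archive-old-logs",
        "Status": "Enabled",
        "Filter": {"Prefix": "logs/"},
        "Transitions": [
          {"Days": 30, "StorageClass": "STANDARD_IA"},
          {"Days": 90, "StorageClass": "GLACIER"}
        ],
        "Expiration": {"Days": 365}
      }]
    }'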

Use S3 Glacier for archival data: For long-term retention of backups, logs, etc., saving roughly 75% on storage costs

Analyze storage metrics and usage trends: Identify waste and unnecessary expenditures

Front S3 with CloudFront caching: So frequently requested data is served from edge locations instead of paying for S3 egress on every request

Compress Large Data: Use gzip or other formats to reduce storage footprint

Followed diligently, these practices can substantially reduce S3 spend, especially for workloads dominated by rarely accessed data.

Reliability Best Practices

S3 architecturally provides very high reliability. We can enhance it further with:

Cross-Region Replication: Critical data can be asynchronously replicated to a bucket in a geographically separate region to protect against regional failures.

Versioning: Versioning preserves previous copies of objects even if they are deleted or overwritten. This safeguards against accidental deletions, overwrites, and application bugs.

Encryption: Enable encryption both in transit and at rest so data remains protected even if storage media or network traffic is compromised.

Multipart uploads: Break large uploads into parts for faster, parallel, and more reliable transfers

Retry failed requests: Temporary failures or throttling errors can be retried via automated exponential backoff.
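
Recent versions of the AWS CLI expose retry tuning through environment variables; the values below are only illustrative:

# Use adaptive retries with up to 10 attempts for this shell session
export AWS_RETRY_MODE=adaptive
export AWS_MAX_ATTEMPTS=10

aws s3 cp large-file.bin s3://mybucket/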

Adhering to these disciplines helps achieve true enterprise-grade reliability.

Troubleshooting Issues

End users can occasionally encounter frustrating issues like throttling errors, access denials or performance problems. Let's explore remedies for common scenarios:

Throttling errors or slow uploads: S3 supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix. Spread heavy workloads across multiple prefixes, parallelize large transfers with multipart uploads, and contact AWS Support if sustained 503 Slow Down errors persist.

Access denied errors: Check the IAM permissions assigned to the user. Explicitly allow the s3:GetObject, s3:PutObject and other actions as needed for your use case. Resource policies may also block unintended access.

High latency performance: Enable S3 transfer acceleration to use the AWS global network and CloudFront edge locations for faster data transfer over long distances.
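
Acceleration is a per-bucket setting; a sketch of turning it on and pointing the CLI at the accelerated endpoint (the bucket name is a placeholder):

aws s3api put-bucket-accelerate-configuration \
    --bucket mybucket \
    --accelerate-configuration Status=Enabled

# Route subsequent s3 commands through the accelerate endpoint
aws configure set default.s3.use_accelerate_endpoint true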

Multipart upload failures: These uploads involve a large number of parts, so timeouts can cause failures. Increase socket timeouts, and retry only the failed parts rather than restarting the entire transfer.

Query performance issues: For large datasets use S3 Select to filter rows or query data directly instead of pulling entire contents. Also avoid listing millions of small objects unnecessarily.
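
As a sketch, a server-side filter over a CSV object with S3 Select (the bucket, key, and query are assumptions):

aws s3api select-object-content \
    --bucket mybucket \
    --key data/records.csv \
    --expression "SELECT * FROM s3object s WHERE s._1 = 'ERROR'" \
    --expression-type SQL \
    --input-serialization '{"CSV": {}}' \
    --output-serialization '{"CSV": {}}' \
    results.csv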

Getting familiar with these common scenarios will help you diagnose and troubleshoot future cases quickly.

Conclusion

In this comprehensive guide, we explored AWS S3 internals, CLI usage and programming integration in depth, along with best practices.

The simple yet powerful AWS S3 CLI commands let you interact with object storage efficiently for infrastructure automation purposes.

For custom application development, use AWS S3 SDKs instead to tightly integrate storage capabilities directly into your own code.

Following the security, cost and reliability guidelines is vital for smooth long term operations at scale.

S3 forms the foundation for many big data lakes and modern cloud-native workloads. I hope this article provided a 360-degree perspective so you can utilize its capabilities for your projects confidently!

Let me know if you have any other questions in the comments!