How to deal with AWS (Amazon Web Services)

Swatimeena
5 min read · Aug 26, 2020
Photo by Hello I’m Nik 🎞 on Unsplash

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. This means customers of all sizes and industries can use it to store and protect any amount of data for a range of use cases, such as websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics. Amazon S3 provides easy-to-use management features so you can organize your data and configure finely-tuned access controls to meet your specific business, organizational, and compliance requirements.

Benefits:

  • Industry-leading performance, scalability, availability, and durability
  • Wide range of cost-effective storage classes
  • Unmatched security, compliance and audit capabilities
  • Easily manage data and access controls
  • Most supported cloud storage service

Prerequisites: An understanding of buckets, containers and objects.

import boto3
import os

s3 = boto3.client('s3')

# Collect the key (full name) of every object whose key starts with the prefix
files = list(map(lambda x: x['Key'], s3.list_objects_v2(
    Bucket='bucket_name',
    Prefix='the_starting_string_to_your_folders/objects')['Contents']))

s3.list_objects_v2: Returns some or all (up to 1000) of the objects in a bucket. You can use the request parameters as selection criteria to return a subset of the objects in the bucket, for example by passing a prefix.

Prefix: Limits the response to keys that begin with the specified prefix (string)

Bucket: Name of the bucket (string)
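Since a single call returns at most 1000 objects, a larger folder needs several calls. Here is a minimal sketch of one way to do that with the boto3 paginator for list_objects_v2. The function name list_all_keys is my own, and the client is passed in as an argument so the snippet does not depend on any particular credentials setup.

```python
def list_all_keys(s3_client, bucket, prefix=""):
    """Return every key under `prefix`, following pagination past 1000 objects."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent from a page when nothing matches the prefix
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


# Usage (needs boto3 installed and valid credentials):
#   import boto3
#   files = list_all_keys(boto3.client("s3"), "bucket_name", "txt_data/")
```

For small folders the single call shown earlier is enough; the paginator only matters once a prefix holds more than 1000 objects.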

s3.list_objects_v2 returns a dictionary whose 'Contents' entry looks like this:

'Contents': [
    {
        'Key': 'string',
        'LastModified': datetime(2015, 1, 1),
        'ETag': 'string',
        'Size': 123,
        'StorageClass': 'STANDARD'|'REDUCED_REDUNDANCY'|'GLACIER'|'STANDARD_IA'|'ONEZONE_IA',
        'Owner': {
            'DisplayName': 'string',
            'ID': 'string'
        }
    },
],

We need the 'Key' of each entry in 'Contents', which is the full name (path) of the file/object we want.
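To make the structure concrete, here is a small sketch that pulls the keys out of a response dictionary. The sample response below is made up for illustration (it is not real output), and keys_from_response is a hypothetical helper name. One thing worth handling: when no object matches the prefix, the response has no 'Contents' entry at all, so indexing it directly raises a KeyError.

```python
from datetime import datetime


def keys_from_response(response):
    """Extract object keys from a list_objects_v2-style response dict."""
    # .get() avoids a KeyError when the prefix matched nothing
    return [obj["Key"] for obj in response.get("Contents", [])]


# Made-up sample in the same shape as the structure shown above
sample_response = {
    "KeyCount": 2,
    "Contents": [
        {"Key": "txt_data/move_file.txt", "LastModified": datetime(2015, 1, 1), "Size": 123},
        {"Key": "txt_data/other.txt", "LastModified": datetime(2015, 1, 1), "Size": 456},
    ],
}

print(keys_from_response(sample_response))   # ['txt_data/move_file.txt', 'txt_data/other.txt']
print(keys_from_response({"KeyCount": 0}))   # []
```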

How to download the file using s3

# file is one of the keys from the files list above
file_name = 'name of the local file in which we want to save our data, basically the path of the file'
s3.download_file('bucket_name', file, file_name)
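A small wrapper can make the download step a bit more convenient. This is only a sketch: download_key and dest_dir are names I made up, and the client is passed in as an argument. It creates the local folder first, because in my experience download_file does not create missing local directories for you.

```python
import os


def download_key(s3_client, bucket, key, dest_dir):
    """Download one object into dest_dir, keeping the file's base name."""
    os.makedirs(dest_dir, exist_ok=True)          # make sure the local folder exists
    local_path = os.path.join(dest_dir, os.path.basename(key))
    s3_client.download_file(bucket, key, local_path)  # (Bucket, Key, Filename)
    return local_path


# Usage (needs boto3 installed and valid credentials):
#   import boto3
#   download_key(boto3.client("s3"), "bucket_name", "txt_data/move_file.txt", "downloads")
```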

How to upload the file

bucket_name = 'name of your bucket'
uploading_path = 'The local file name from which you want to upload the data'
uploaded_path = 'The name (key) with which you want to save the data on aws'
s3.upload_file(uploading_path, bucket_name, uploaded_path)
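If you want the uploaded object to land inside a particular folder/object, the key has to be built by hand. Below is a minimal sketch of that idea; upload_under_prefix and its prefix argument are hypothetical names of mine, and the client is passed in rather than created inside the function.

```python
import os


def upload_under_prefix(s3_client, local_path, bucket, prefix=""):
    """Upload a local file to `prefix/<file name>` and return the key used."""
    name = os.path.basename(local_path)
    # Avoid a leading or doubled slash in the key
    key = f"{prefix.rstrip('/')}/{name}" if prefix else name
    s3_client.upload_file(local_path, bucket, key)  # (Filename, Bucket, Key)
    return key


# Usage (needs boto3 installed and valid credentials):
#   import boto3
#   upload_under_prefix(boto3.client("s3"), "data/random.csv", "bucket_name", "dataset/final_data/")
```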

How to list all the items in the bucket on aws:

Run the below command in your terminal:

aws s3 ls s3://bucket_name/

For further listing give the folder/object path:

aws s3 ls s3://bucket_name/object_path/

Like this, you can navigate through all the objects inside the bucket.

Move the objects inside the bucket:

aws s3 mv s3://bucket_name/move_from_file s3://bucket_name/move_to_path

move_from_file = 'path from where you want to move the file, along with the file name'

If we have an object named text_data inside the bucket and move_file.txt is the file we want to move, i.e. it is inside the object/folder text_data, then move_from_file will be 'text_data/move_file.txt'.

move_to_path = 'path where you want to move the file move_file.txt'

Let's say we want to move this .txt file to the object final_data inside another object dataset. Then move_to_path will be 'dataset/final_data/' if we want to keep the same file name (move_file.txt), otherwise 'dataset/final_data/new_name.txt'.
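The naming rule described above (a trailing slash keeps the file name, a full path renames the file) can be written down as a tiny function. This is just an illustration of the rule as stated in this post, not the CLI's actual implementation, and resolve_destination is a name I made up.

```python
import posixpath


def resolve_destination(move_from_file, move_to_path):
    """Work out the final key for a move, following the rule described above."""
    if move_to_path.endswith("/"):
        # Destination is a folder/object: keep the original file name
        return move_to_path + posixpath.basename(move_from_file)
    # Destination already names the file: use it as-is
    return move_to_path


print(resolve_destination("text_data/move_file.txt", "dataset/final_data/"))
# dataset/final_data/move_file.txt
print(resolve_destination("text_data/move_file.txt", "dataset/final_data/new_name.txt"))
# dataset/final_data/new_name.txt
```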

What if we want to move multiple files to a new destination?

Python script to the rescue✌🏻😎:

files is a list containing the keys of all the files you want to move to the new destination, e.g. 'text_data/move_file.txt'.

for file in files:
    # copy_object needs the full destination key, so build one per file
    # (reusing one fixed key would overwrite the same object on every iteration)
    copy_to_path = 'dataset/final_data/' + os.path.basename(file)
    s3.copy_object(Bucket=bucket_name, Key=copy_to_path, CopySource=bucket_name+'/'+file)
    s3.delete_object(Bucket=bucket_name, Key=file)

Basically, first copy the object to the new destination (this keeps the original file if things go wrong), then delete the previous one if you don't need the file anymore.
Run this Python script and copy/delete/move anything.
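The same copy-then-delete idea can be wrapped in a reusable function. Treat this as a sketch under a few assumptions: move_objects and dest_prefix are names I picked, every file keeps its own base name, and the client is passed in as an argument. The delete only runs after the copy call has returned without raising, so a failed copy leaves the original in place.

```python
import posixpath


def move_objects(s3_client, bucket, keys, dest_prefix):
    """Copy each key under dest_prefix, then delete the original. Returns {old: new}."""
    moved = {}
    for key in keys:
        new_key = dest_prefix.rstrip("/") + "/" + posixpath.basename(key)
        s3_client.copy_object(
            Bucket=bucket,
            Key=new_key,
            CopySource={"Bucket": bucket, "Key": key},
        )
        # Reached only if the copy did not raise
        s3_client.delete_object(Bucket=bucket, Key=key)
        moved[key] = new_key
    return moved


# Usage (needs boto3 installed and valid credentials):
#   import boto3
#   move_objects(boto3.client("s3"), "bucket_name", files, "dataset/final_data/")
```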

How to access the files/folder/objects on aws

  1. Run
aws s3 ls s3://bucket_name/

The above command will show the content of the bucket.

For example, the content of bucket_name is shown below:

                           PRE s3:/
                           PRE data_nlp/
                           PRE dataset/
                           PRE txt_data/
YYYY-MM-DD HH:MM:SS       KKKK random.json
YYYY-MM-DD HH:MM:SS       JJJJ random.xlsx
YYYY-MM-DD HH:MM:SS       PPPP random.npy
YYYY-MM-DD HH:MM:SS       OOOO random.csv
YYYY-MM-DD HH:MM:SS       TTTT random_0.txt
YYYY-MM-DD HH:MM:SS       TTTT random_1.txt
YYYY-MM-DD HH:MM:SS       TTTT random_2.txt
YYYY-MM-DD HH:MM:SS       AAAA random.pkl

1. To access the files inside the object txt_data, e.g. move_file.txt:

files = list(map(lambda x: x['Key'], s3.list_objects_v2(
    Bucket=bucket_name,
    Prefix='txt_data/')['Contents']))

files will be the list of the keys of all the files inside txt_data.

2. To access all txt files (random_0.txt, random_1.txt, random_2.txt):

txt_files = list(map(lambda x: x['Key'], s3.list_objects_v2(
    Bucket=bucket_name,
    Prefix='random_')['Contents']))

Output: txt_files = ['random_0.txt', 'random_1.txt', 'random_2.txt']
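Note that Prefix only matches the start of a key. The call above works because the three .txt files happen to share the 'random_' prefix. As far as I know there is no request parameter for filtering by file extension, so one option is to filter the returned keys in Python. A small sketch, with filter_by_suffix as a made-up helper and the key list copied from the bucket listing above:

```python
def filter_by_suffix(keys, suffix):
    """Keep only the keys that end with the given suffix, e.g. '.txt'."""
    return [key for key in keys if key.endswith(suffix)]


# Top-level file names from the example bucket listing
keys = [
    "random.json", "random.xlsx", "random.npy", "random.csv",
    "random_0.txt", "random_1.txt", "random_2.txt", "random.pkl",
]

print(filter_by_suffix(keys, ".txt"))
# ['random_0.txt', 'random_1.txt', 'random_2.txt']
```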

Possible Errors (Life is not that easy🤓):

  • While running the command
aws s3 ls s3://bucket_name/ 

Error 1: Unable to locate credentials. You can configure credentials by running "aws configure".

Solution: We need two things to run this command

1. AWS_ACCESS_KEY_ID
2. AWS_SECRET_ACCESS_KEY

If you have both of them then just run

export AWS_ACCESS_KEY_ID=value
export AWS_SECRET_ACCESS_KEY=value

in the terminal.
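Before running a Python script, it can help to check whether those two variables are actually visible to it. The sketch below only looks at environment variables; missing_credential_vars is a hypothetical helper of mine. Keep in mind that credentials can also come from other places (for example the file written by "aws configure"), so an empty result here is a hint, not proof.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credential_vars(env=None):
    """Return the names of the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit dict instead of the real environment
print(missing_credential_vars({"AWS_ACCESS_KEY_ID": "value"}))
# ['AWS_SECRET_ACCESS_KEY']
```

Calling missing_credential_vars() with no argument checks the real environment of the current process.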

Error 2:

zsh: command not found: aws

Solution 1: If you already have aws in your system run this in terminal

export PATH=~/bin:$PATH
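If you are not sure whether the aws command is reachable from your PATH, Python's standard library can tell you. A minimal sketch using shutil.which (cli_location is just a name I chose):

```python
import shutil


def cli_location(name="aws"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


location = cli_location()
if location is None:
    print("aws was not found on PATH")
else:
    print("aws found at", location)
```

If this reports that aws was not found even though it is installed, the folder containing it (~/bin in the command above) is probably missing from PATH.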

Solution 2: Install aws in the system (What were you expecting without installing aws 😒😏)

Follow https://docs.aws.amazon.com/cli/latest/userguide/install-macos.html#install-macosos-prereq

1. curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
2. unzip awscli-bundle.zip
3. ./awscli-bundle/install -b ~/bin/aws
4. aws --version


Swatimeena

Data Scientist @Sprinklr | IIT Bombay | IIT (ISM) Dhanbad