As a software engineer, you sometimes have to write a script to add or update records that you don't want to handle in your main codebase. For a few records you can run the script locally, and even for hundreds you can get by with multithreading, but when you have to deal with thousands of records these approaches are not enough. This is where AWS Lambda (or a GCP Cloud Function) comes in: it streamlines the whole script so you don't have to worry about rerunning it after a network or machine interruption.
Summary: Learn how to efficiently handle large record updates using AWS Lambdas, with a driver function and a target function.
Set up your Lambdas:
Set up two AWS Lambda functions: a driver function and a target function. The driver function handles authentication with the source (the location of the records) and prepares the data before invoking the target function asynchronously.
For example, if you need to modify data stored in an S3 bucket, here is a simple driver/iterator function:
import json
import boto3

client = boto3.client('lambda')

def lambda_handler(event, context):
    index = event['iterator']['index'] + 1
    # Update the payload according to your requirements
    response = client.invoke(
        FunctionName='LAMBDA_TO_INVOKE',
        InvocationType='Event',  # asynchronous invocation
        Payload=json.dumps({
            'bucket_name': 'YOUR_BUCKET_NAME',
            'file_key': 'YOUR_FILE_KEY'
        })
    )
    return {
        'index': index,
        'continue': index < event['iterator']['count'],
        'count': event['iterator']['count']
    }
The target function does the actual work: it performs the read-write operations or other modifications at the source.
Here is a simple example of a target/invoked function:
import boto3

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    # Extract the necessary information from the event
    bucket_name = event['bucket_name']
    file_key = event['file_key']

    # Get the file from S3
    s3_response = s3_client.get_object(Bucket=bucket_name, Key=file_key)
    file_content = s3_response['Body'].read()

    # Perform a modification on the file content (example: convert to uppercase)
    modified_content = file_content.upper()

    # Upload the modified file back to S3
    s3_client.put_object(Bucket=bucket_name, Key=file_key, Body=modified_content)

    return {
        'statusCode': 200,
        'body': 'File modified successfully'
    }
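In practice the file keys usually come from listing the source rather than being hard-coded in the payload. Here is a hedged sketch of how the driver (or a one-off setup step) could enumerate keys under a prefix and fan them out to the target function asynchronously; the bucket name, prefix, and target function name are placeholders, not values from this article:

import json
import boto3

s3_client = boto3.client('s3')
lambda_client = boto3.client('lambda')

def fan_out(bucket_name, prefix, target_function):
    """List objects under a prefix and invoke the target Lambda once per key."""
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        for obj in page.get('Contents', []):
            lambda_client.invoke(
                FunctionName=target_function,
                InvocationType='Event',  # asynchronous, fire-and-forget
                Payload=json.dumps({
                    'bucket_name': bucket_name,
                    'file_key': obj['Key']
                })
            )

Because the invocations are asynchronous, keep the per-file payload small and let the target function do the heavy lifting.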
Below are some valuable tips for automation. Before you run your script, make sure of the following for better results:
Add proper logging so you can track the modified records, the remaining records, and the overall progress.
Keep your OAuth credentials in a credentials manager or another safe place.
Structure your program so that it is idempotent: if you ever have to rerun it, it should not re-modify data that was already processed (a minimal sketch follows this list).
If you use a package outside the standard Python environment, such as pandas, you will probably hit a "package not resolved" or similar import error. In that case, create a Python environment, install all the required dependencies, and package them together with your code before uploading to Lambda.
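On the logging and idempotency points above, here is a minimal sketch of a target function that records its progress and skips files it has already modified. The 'processed' metadata flag is an assumption of this sketch, not something this article prescribes; any durable marker (a DynamoDB item, an object tag, a separate manifest) works just as well:

import logging
import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

s3_client = boto3.client('s3')

def already_processed(bucket_name, file_key):
    """Check a custom metadata flag so reruns skip files that were already modified."""
    head = s3_client.head_object(Bucket=bucket_name, Key=file_key)
    return head['Metadata'].get('processed') == 'true'  # assumed flag name

def lambda_handler(event, context):
    bucket_name = event['bucket_name']
    file_key = event['file_key']

    if already_processed(bucket_name, file_key):
        logger.info("Skipping %s, already processed", file_key)
        return {'statusCode': 200, 'body': 'Skipped'}

    s3_response = s3_client.get_object(Bucket=bucket_name, Key=file_key)
    modified_content = s3_response['Body'].read().upper()

    # Write the result back with the flag set, so a rerun will not re-modify it
    s3_client.put_object(
        Bucket=bucket_name,
        Key=file_key,
        Body=modified_content,
        Metadata={'processed': 'true'}
    )
    logger.info("Modified %s", file_key)
    return {'statusCode': 200, 'body': 'File modified successfully'}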
If you are facing a packaging issue, I have discussed that in the article below:
Will write it soon :)