Jeff’s Note #
“Unlike generic exam dumps, ADH analyzes this scenario through the lens of a Real-World Lead Developer.”
“For DVA-C02 candidates, the confusion often lies in choosing between event-driven triggers vs. scheduled polling when dealing with asynchronous external dependencies. In production, this is about knowing exactly when to decouple with queues, when to use native scheduling, and how to avoid over-engineering with orchestration. Let’s drill down.”
The Certification Drill (Simulated Question) #
Scenario #
A social media analytics startup, PicMetrics, operates a platform where users upload photos for sentiment analysis. All uploaded images are stored in Amazon S3 and subsequently reviewed by an external AI moderation service operated by a third-party vendor. The moderation service processes images asynchronously and writes the results (e.g., “approved”, “flagged”, “rejected”) to an Amazon DynamoDB table 1-24 hours after upload. The DynamoDB table uses the S3 object key as the primary key. The vendor also provides a REST API endpoint to query moderation results by object key.
Your team needs to implement an automated system that tags each S3 object with its moderation result as soon as the result becomes available.
The Requirement: #
Design the MOST operationally efficient solution that automatically applies tags to S3 objects based on third-party moderation results stored in DynamoDB.
The Options #
- A) Create an AWS Lambda function triggered by `s3:ObjectCreated` events. Write the S3 key to an Amazon SQS queue with a 24-hour visibility timeout. Create a second Lambda function that reads from the queue, retrieves results from DynamoDB, and tags the S3 object.
- B) Create an AWS Lambda function triggered by `s3:ObjectCreated` events. Integrate it into an AWS Step Functions standard workflow with a 24-hour Wait state. After the wait, invoke a second Lambda function to retrieve moderation results from DynamoDB and tag the S3 object.
- C) Create an AWS Lambda function that queries S3 for untagged objects, retrieves moderation results from the DynamoDB table, and applies tags. Configure an Amazon EventBridge scheduled rule to invoke this Lambda function at regular intervals (e.g., hourly).
- D) Launch an Amazon EC2 instance with a cron job that runs a Python script to query the DynamoDB table, retrieve moderation results, and apply tags to untagged S3 objects.
Correct Answer #
C.
Quick Insight: The Operational Efficiency Imperative #
For DVA-C02, “operationally efficient” means minimal custom orchestration, native AWS scheduling, and idempotent polling. When dealing with unpredictable third-party latency (1-24 hours), scheduled polling with EventBridge is more maintainable than per-object wait states or visibility timeout hacks.
The Expert’s Analysis #
Correct Answer #
Option C
The Winning Logic #
Option C leverages Amazon EventBridge scheduled rules to invoke a Lambda function at regular intervals. This function:
- Queries S3 for objects without tags (using `s3:GetObjectTagging`, or by listing objects and filtering).
- Queries DynamoDB by object key to retrieve the moderation result.
- Applies tags using `s3:PutObjectTagging`.
Why this is most operationally efficient:
- Decoupled from upload timing: The solution doesn’t create per-object state (no SQS messages, no Step Functions executions).
- Native scheduling: EventBridge handles retries, error handling, and invocation tracking without custom code.
- Idempotent by design: Re-processing already-tagged objects is a no-op (check tag existence first).
- Cost-effective: You pay only for Lambda invocations (e.g., hourly) rather than per-object orchestration.
- No wait state waste: Avoids holding Step Functions executions open for 24 hours (which incurs state transition costs).
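The idempotency point above can be made concrete with a small pure-Python helper (hypothetical name, not part of any AWS SDK) that decides, from the `TagSet` returned by `GetObjectTagging`, whether an object still needs processing:

```python
# Hypothetical idempotency check: given the TagSet returned by
# GetObjectTagging, decide whether the object still needs a moderation
# tag. Re-running the poller on an already-tagged object is a no-op.
def needs_moderation_tag(tag_set):
    return not any(tag['Key'] == 'moderation-status' for tag in tag_set)

untagged = []                                                  # fresh upload
tagged = [{'Key': 'moderation-status', 'Value': 'approved'}]   # already done

print(needs_moderation_tag(untagged))  # True  -> process this object
print(needs_moderation_tag(tagged))    # False -> skip on every later run
```

Because the check runs before any write, the scheduled Lambda can be invoked as often as you like without double-tagging.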
Key DVA-C02 API calls:
```python
# Pseudo-code for the Lambda function
import boto3

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ModerationResults')

def lambda_handler(event, context):
    bucket = 'picmetrics-uploads'
    # List objects and filter out those already tagged
    response = s3.list_objects_v2(Bucket=bucket)
    for obj in response.get('Contents', []):
        key = obj['Key']
        tags = s3.get_object_tagging(Bucket=bucket, Key=key)
        if not any(t['Key'] == 'moderation-status' for t in tags['TagSet']):
            # Check DynamoDB for a moderation result
            db_item = table.get_item(Key={'s3_key': key})
            if 'Item' in db_item:
                status = db_item['Item']['status']
                s3.put_object_tagging(
                    Bucket=bucket,
                    Key=key,
                    Tagging={'TagSet': [{'Key': 'moderation-status', 'Value': status}]}
                )
```
The Trap (Distractor Analysis) #
Why not A (SQS with 24-hour visibility timeout)? #
- Visibility timeout is NOT a delay timer: It only hides a message from other consumers after it has been received; the message is available for delivery as soon as it is sent. Worse, the maximum visibility timeout is 12 hours, so a “24-hour visibility timeout” cannot even be configured (and SQS delay queues cap out at 15 minutes).
- No built-in “wait 24 hours” feature: You’d need to implement polling logic manually in the consumer Lambda, defeating the purpose.
- Operational overhead: Managing SQS dead-letter queues, handling timeout edge cases (what if moderation completes in 2 hours?), and tuning the visibility timeout is error-prone.
- Cost: You pay for SQS requests and per-object Lambda polling invocations.
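The mismatch is easy to sanity-check against SQS’s documented service quotas (12 hours maximum visibility timeout, 15 minutes maximum `DelaySeconds`). Neither mechanism can cover the vendor’s 24-hour worst case:

```python
# SQS hard limits, in seconds (documented service quotas)
MAX_VISIBILITY_TIMEOUT = 12 * 3600   # 12 hours  = 43,200 s
MAX_DELAY_SECONDS = 15 * 60          # 15 minutes =    900 s

worst_case_wait = 24 * 3600          # vendor may take up to 24 hours

print(worst_case_wait <= MAX_VISIBILITY_TIMEOUT)  # False -> can't hide a message that long
print(worst_case_wait <= MAX_DELAY_SECONDS)       # False -> can't delay delivery that long
```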
Why not B (Step Functions Wait state)? #
- Wait state duration is fixed: Moderation results arrive anywhere from 1 to 24 hours after upload, so a fixed 24-hour wait wastes up to 23 hours whenever the result lands early.
- Cost explosion: Standard Step Functions charges per state transition. For 1 million uploads/month, that’s 2 million state transitions (start + wait), costing ~$50/month just for orchestration.
- Operational complexity: Managing millions of long-running executions increases CloudWatch Logs volume and debugging difficulty.
- Not idempotent: If the second Lambda fails, you need custom retry logic.
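The ~$50 figure above can be reproduced from Step Functions Standard pricing (roughly $0.025 per 1,000 state transitions in us-east-1 at the time of writing; the two-transition count is a simplification):

```python
# Back-of-envelope Step Functions Standard cost for option B
PRICE_PER_1000_TRANSITIONS = 0.025   # USD, us-east-1 (check current pricing)
uploads_per_month = 1_000_000
transitions_per_execution = 2        # simplified: start + wait (real flows have more)

total_transitions = uploads_per_month * transitions_per_execution
monthly_cost = total_transitions / 1000 * PRICE_PER_1000_TRANSITIONS
print(monthly_cost)  # 50.0 -> orchestration cost alone, before Lambda charges
```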
Why not D (EC2 with cron)? #
- Undifferentiated heavy lifting: You manage the OS, patching, scaling, and script dependencies.
- Not serverless: EC2 runs 24/7, even if there are no objects to process.
- Cost: A t3.small (~$15/month) is more expensive than hourly Lambda invocations for this workload.
- Exam trap: DVA-C02 heavily penalizes EC2-based solutions when serverless alternatives exist.
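The EC2-versus-Lambda cost claim also survives a back-of-envelope check. The figures below are assumptions for illustration (512 MB memory, 30-second average run, one invocation per hour; $0.0000166667/GB-second is the published x86 Lambda rate at the time of writing):

```python
# Rough monthly cost of the hourly-polling Lambda vs. an always-on EC2 instance
GB_SECOND_PRICE = 0.0000166667    # USD, Lambda x86 duration pricing
invocations = 24 * 30             # one run per hour for a 30-day month = 720
memory_gb = 0.5                   # assumed 512 MB function
avg_duration_s = 30               # assumed average run time

lambda_cost = invocations * avg_duration_s * memory_gb * GB_SECOND_PRICE
ec2_cost = 15.0                   # approx. t3.small on-demand

print(round(lambda_cost, 2))      # ~0.18 USD/month
print(lambda_cost < ec2_cost)     # True -> roughly two orders of magnitude cheaper
```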
The Technical Blueprint #
```python
# EventBridge Scheduled Rule (every hour)
# Schedule expression: rate(1 hour) or cron(0 * * * ? *)

# Lambda Function: S3ObjectTagger
import boto3
import os

s3_client = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['DYNAMODB_TABLE'])

def lambda_handler(event, context):
    bucket = os.environ['S3_BUCKET']
    # Paginate through all objects (production code should handle large buckets)
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get('Contents', []):
            key = obj['Key']
            try:
                # Check whether the object already has a moderation tag
                tags_response = s3_client.get_object_tagging(Bucket=bucket, Key=key)
                existing_tags = {tag['Key']: tag['Value'] for tag in tags_response['TagSet']}
                if 'moderation-status' in existing_tags:
                    continue  # Already processed
                # Query DynamoDB for the moderation result
                db_response = table.get_item(Key={'object_key': key})
                if 'Item' in db_response:
                    moderation_status = db_response['Item']['status']
                    # Apply the tag, preserving any existing tags
                    existing_tags['moderation-status'] = moderation_status
                    tag_set = [{'Key': k, 'Value': v} for k, v in existing_tags.items()]
                    s3_client.put_object_tagging(
                        Bucket=bucket,
                        Key=key,
                        Tagging={'TagSet': tag_set}
                    )
                    print(f"Tagged {key} with status: {moderation_status}")
            except Exception as e:
                print(f"Error processing {key}: {e}")
                # In production, send to a DLQ or raise a CloudWatch alarm
    return {'statusCode': 200, 'body': 'Tagging complete'}
```
IAM Policy for Lambda:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObjectTagging",
        "s3:PutObjectTagging"
      ],
      "Resource": [
        "arn:aws:s3:::picmetrics-uploads",
        "arn:aws:s3:::picmetrics-uploads/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/ModerationResults"
    }
  ]
}
```
The Comparative Analysis #
| Option | API Complexity | Operational Overhead | Cost (1M objects/month) | Use Case |
|---|---|---|---|---|
| A (SQS + Visibility Timeout) | Medium (SQS SendMessage, ReceiveMessage, DeleteMessage) | High (message retention tuning, DLQ management) | ~$1 SQS + ~$5 Lambda | Valid for immediate retry scenarios, not delayed processing |
| B (Step Functions Wait) | High (StartExecution, Wait state, Lambda integration) | Very High (millions of long-running executions) | ~$50 Step Functions + ~$2 Lambda | Valid for orchestrating multi-step workflows, not simple delays |
| C (EventBridge Scheduled) | Low (EventBridge rule, Lambda invoke) | Low (native scheduling, idempotent design) | ~$0 EventBridge + ~$0.20 Lambda (hourly) | Best for polling external state changes with unpredictable timing |
| D (EC2 Cron) | Low (Boto3 SDK) | Very High (OS patching, scaling, monitoring) | ~$15 EC2 + ~$1 data transfer | Legacy approach; avoid in DVA-C02 |
Real-World Application (Practitioner Insight) #
Exam Rule #
“For the exam, when you see unpredictable third-party latency + asynchronous result availability + no real-time requirement, always pick EventBridge scheduled rules over Step Functions Wait states or SQS visibility timeouts.”
Real World #
“In production, we’d add DynamoDB Streams to trigger tagging as soon as the third party writes to DynamoDB, rather than polling S3. However, the exam often omits this option to test your understanding of scheduled vs. event-driven patterns. Also, for large S3 buckets (millions of objects), we’d use S3 Inventory to generate daily manifests instead of ListObjectsV2, which is slow at that scale and subject to S3’s per-prefix request-rate limits.”
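As a sketch of the S3 Inventory idea mentioned above: inventory delivers CSV files (no header row) in which, with the default field list, the first two columns are the bucket name and the URL-encoded object key. A minimal parser, assuming that column order, might look like:

```python
import csv
import io
from urllib.parse import unquote_plus

# Sample rows in S3 Inventory's CSV format (assumed default field order:
# bucket name, then URL-encoded object key)
sample_inventory = (
    '"picmetrics-uploads","photos/cat+1.jpg"\n'
    '"picmetrics-uploads","photos/dog.jpg"\n'
)

def keys_from_inventory(csv_text):
    # Column 1 is the object key; keys are URL-encoded in the manifest
    reader = csv.reader(io.StringIO(csv_text))
    return [unquote_plus(row[1]) for row in reader]

print(keys_from_inventory(sample_inventory))
# ['photos/cat 1.jpg', 'photos/dog.jpg']
```

The tagging Lambda would then iterate over this key list instead of calling ListObjectsV2 on the live bucket.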
Production enhancement:
```python
# Alternative: DynamoDB Streams trigger
# When the third party writes to DynamoDB, the stream triggers a Lambda
# function that tags the S3 object immediately
import boto3

s3_client = boto3.client('s3')

def dynamodb_stream_handler(event, context):
    for record in event['Records']:
        if record['eventName'] == 'INSERT':
            object_key = record['dynamodb']['Keys']['object_key']['S']
            status = record['dynamodb']['NewImage']['status']['S']
            s3_client.put_object_tagging(
                Bucket='picmetrics-uploads',
                Key=object_key,
                Tagging={'TagSet': [{'Key': 'moderation-status', 'Value': status}]}
            )
```
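Stream records arrive in DynamoDB’s typed attribute-value format (`{'S': ...}`, `{'N': ...}`), which is why the handler above indexes into `['S']`. A small helper (hypothetical, covering only string and number attributes; real code could use boto3’s `TypeDeserializer`) can flatten a `NewImage` into plain Python values:

```python
# Flatten a DynamoDB Streams image ({'status': {'S': 'approved'}, ...})
# into a plain dict. Handles only S (string) and N (number) attributes.
def flatten_image(image):
    out = {}
    for name, typed_value in image.items():
        # Each attribute value is a single-entry dict: {type_code: raw_value}
        (dtype, value), = typed_value.items()
        out[name] = float(value) if dtype == 'N' else value
    return out

new_image = {'object_key': {'S': 'photos/cat.jpg'},
             'status': {'S': 'approved'},
             'score': {'N': '0.97'}}
print(flatten_image(new_image))
# {'object_key': 'photos/cat.jpg', 'status': 'approved', 'score': 0.97}
```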
Disclaimer
This is a study note based on simulated scenarios for the AWS DVA-C02 exam. Always refer to official AWS documentation and hands-on labs for production implementations.