
AWS DVA-C02 Drill: Event-Driven Tagging - Asynchronous Polling vs. Wait States

Jeff Taakey
Author
21+ Year Enterprise Architect | AWS SAA/SAP & Multi-Cloud Expert.

Jeff’s Note
#

“Unlike generic exam dumps, ADH analyzes this scenario through the lens of a Real-World Lead Developer.”

“For DVA-C02 candidates, the confusion often lies in choosing between event-driven triggers vs. scheduled polling when dealing with asynchronous external dependencies. In production, this is about knowing exactly when to decouple with queues, when to use native scheduling, and how to avoid over-engineering with orchestration. Let’s drill down.”


The Certification Drill (Simulated Question)
#

Scenario
#

A social media analytics startup, PicMetrics, operates a platform where users upload photos for sentiment analysis. All uploaded images are stored in Amazon S3 and subsequently reviewed by an external AI moderation service operated by a third-party vendor. The moderation service processes images asynchronously and writes the results (e.g., “approved”, “flagged”, “rejected”) to an Amazon DynamoDB table 1-24 hours after upload. The DynamoDB table uses the S3 object key as the primary key. The vendor also provides a REST API endpoint to query moderation results by object key.

Your team needs to implement an automated system that tags each S3 object with its moderation result as soon as the result becomes available.

The Requirement:
#

Design the MOST operationally efficient solution that automatically applies tags to S3 objects based on third-party moderation results stored in DynamoDB.

The Options
#

  • A) Create an AWS Lambda function triggered by s3:ObjectCreated events. Write the S3 key to an Amazon SQS queue with a 24-hour visibility timeout. Create a second Lambda function that reads from the queue, retrieves results from DynamoDB, and tags the S3 object.
  • B) Create an AWS Lambda function triggered by s3:ObjectCreated events. Integrate it into an AWS Step Functions standard workflow with a 24-hour Wait state. After the wait, invoke a second Lambda function to retrieve moderation results from DynamoDB and tag the S3 object.
  • C) Create an AWS Lambda function that queries S3 for untagged objects, retrieves moderation results from the DynamoDB table, and applies tags. Configure an Amazon EventBridge scheduled rule to invoke this Lambda function at regular intervals (e.g., hourly).
  • D) Launch an Amazon EC2 instance with a cron job that runs a Python script to query the DynamoDB table, retrieve moderation results, and apply tags to untagged S3 objects.


Correct Answer
#

C.

Quick Insight: The Operational Efficiency Imperative
#

For DVA-C02, “operationally efficient” means minimal custom orchestration, native AWS scheduling, and idempotent polling. When dealing with unpredictable third-party latency (1-24 hours), scheduled polling with EventBridge is more maintainable than per-object wait states or visibility timeout hacks.



The Expert’s Analysis
#

Correct Answer
#

Option C

The Winning Logic
#

Option C leverages Amazon EventBridge scheduled rules to invoke a Lambda function at regular intervals. This function:

  1. Lists the bucket’s objects (ListObjectsV2) and calls s3:GetObjectTagging on each one to find those without a moderation tag. S3 has no API that filters objects by tag, so the check is per object.
  2. Queries DynamoDB by object key to retrieve moderation results.
  3. Applies tags using s3:PutObjectTagging.

Why this is most operationally efficient:

  • Decoupled from upload timing: The solution doesn’t create per-object state (no SQS messages, no Step Functions executions).
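  • Native scheduling: EventBridge invokes the function on schedule, and Lambda’s asynchronous invocation retries failed runs, so no custom scheduler code is needed.
  • Idempotent by design: The function skips objects that already carry the moderation tag, so re-running it is a no-op for them.
  • Cost-effective: Lambda is invoked on a schedule (e.g., hourly) rather than once per object, although each run still pays for S3 list and tagging requests and DynamoDB reads.
  • No wait state waste: Avoids keeping one Step Functions execution open per object for 24 hours. Standard Workflows bill per state transition rather than per hour of waiting, but every upload would still add an execution to start, track, and debug.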

Key DVA-C02 API calls:

# Sketch of the Lambda function (single bucket, no error handling)
import boto3

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ModerationResults')

def lambda_handler(event, context):
    bucket = 'picmetrics-uploads'
    # list_objects_v2 returns at most 1,000 keys per call, so paginate
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get('Contents', []):
            key = obj['Key']
            tags = s3.get_object_tagging(Bucket=bucket, Key=key)
            if any(t['Key'] == 'moderation-status' for t in tags['TagSet']):
                continue  # already tagged; skipping keeps reruns idempotent
            # Look up the result; 'object_key' and 'status' are assumed attribute names
            db_item = table.get_item(Key={'object_key': key})
            if 'Item' in db_item:
                status = db_item['Item']['status']
                # put_object_tagging replaces the whole tag set, so keep existing tags
                s3.put_object_tagging(
                    Bucket=bucket,
                    Key=key,
                    Tagging={'TagSet': tags['TagSet'] + [{'Key': 'moderation-status', 'Value': status}]}
                )
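
One detail worth isolating: PutObjectTagging replaces an object’s entire tag set, so an idempotent tagger has to merge the new tag into whatever is already there. The helper below is a minimal sketch of that merge; the name merge_tag and the sample "uploader" tag are hypothetical and not part of the original scenario.

```python
from typing import Dict, List

Tag = Dict[str, str]


def merge_tag(tag_set: List[Tag], key: str, value: str) -> List[Tag]:
    """Return a new TagSet with `key` set to `value`, keeping every other tag."""
    merged = {t["Key"]: t["Value"] for t in tag_set}
    merged[key] = value  # overwrite if present, append otherwise
    return [{"Key": k, "Value": v} for k, v in merged.items()]


# Hypothetical existing tag on an uploaded object
existing = [{"Key": "uploader", "Value": "user-42"}]

once = merge_tag(existing, "moderation-status", "approved")
twice = merge_tag(once, "moderation-status", "approved")  # second run changes nothing
```

Passing the merged list as Tagging={'TagSet': ...} keeps unrelated tags intact, and running the merge twice yields the same tag set, which is what makes a scheduled re-run safe.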

The Trap (Distractor Analysis)
#

Why not A (SQS with 24-hour visibility timeout)?
#

  • Visibility timeout is NOT a delay timer: It only hides a message from other consumers after it’s been received; the message is available as soon as it is sent. It is also capped at 12 hours, so a 24-hour value cannot even be configured.
  • No built-in “wait 24 hours” feature: SQS delay queues and message timers top out at 15 minutes, so you’d need to implement re-queueing or polling logic in the consumer Lambda, defeating the purpose.
  • Operational overhead: Managing SQS dead-letter queues, handling timeout edge cases (what if moderation completes in 2 hours?), and tuning visibility timeout is error-prone.
  • Cost: You pay for SQS requests and Lambda invocations for every object.
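
The SQS limits behind this distractor can be checked with simple arithmetic. The sketch below assumes the documented SQS maximums (12 hours for visibility timeout, 15 minutes for DelaySeconds); the function name sqs_can_defer is made up for illustration.

```python
MAX_VISIBILITY_TIMEOUT_S = 12 * 60 * 60  # 43,200 s: SQS visibility timeout ceiling
MAX_DELAY_S = 15 * 60                    # 900 s: SQS DelaySeconds ceiling


def sqs_can_defer(hours: float) -> dict:
    """Report whether either SQS timing knob can cover a deferral of `hours`."""
    seconds = int(hours * 3600)
    return {
        "visibility_timeout_ok": 0 <= seconds <= MAX_VISIBILITY_TIMEOUT_S,
        "delay_seconds_ok": 0 <= seconds <= MAX_DELAY_S,
    }


# Option A asks for 24 hours, which neither setting accepts
plan = sqs_can_defer(24)
```

Under these limits, both flags come back False for a 24-hour deferral, so Option A would fail at configuration time before any of its design problems surfaced.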

Why not B (Step Functions Wait state)?
#

  • Wait state duration is fixed: The requirement says moderation completes in 1-24 hours. With a 24-hour wait, an object whose result arrives after 1 hour stays untagged for another 23 hours, which violates “as soon as the result becomes available”.
  • Cost explosion: Standard Step Functions charges per state transition. For 1 million uploads/month, even 2 transitions per execution means 2 million state transitions, costing ~$50/month just for orchestration.
  • Operational complexity: Managing millions of long-running executions increases CloudWatch Logs volume and debugging difficulty.
  • Extra failure handling: Step Functions can retry a failed Task state, but only if you configure Retry and Catch on it, and the tagging Lambda still has to be idempotent for those retries to be safe.
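
The latency and cost points above come down to arithmetic, sketched here. The assumptions are loud ones: the price of $0.025 per 1,000 state transitions is taken to match the ~$50 figure quoted above and should be checked against current pricing, and the polling model assumes scheduled runs land on exact multiples of the interval counted from upload time.

```python
import math


def fixed_wait_delay(ready_after_h: float, wait_h: float = 24.0) -> float:
    """Hours a finished result sits untagged behind a fixed Wait state."""
    return max(wait_h - ready_after_h, 0.0)


def polling_delay(ready_after_h: float, interval_h: float = 1.0) -> float:
    """Hours until the next scheduled run picks up a finished result."""
    next_run = math.ceil(ready_after_h / interval_h) * interval_h
    return next_run - ready_after_h


def standard_workflow_cost(executions: int, transitions_per_execution: int,
                           usd_per_1000: float = 0.025) -> float:
    """Monthly Standard Workflow cost under the assumed per-transition price."""
    return executions * transitions_per_execution * usd_per_1000 / 1000


wasted = fixed_wait_delay(1.0)       # result ready after 1 hour, Wait state still holds it
polled = polling_delay(1.5)          # result ready after 1.5 hours, hourly schedule
monthly = standard_workflow_cost(1_000_000, 2)
```

With a result that is ready after one hour, the fixed wait leaves it untagged for 23 more hours, while an hourly poll bounds the extra delay at under one interval.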

Why not D (EC2 with cron)?
#

  • Undifferentiated heavy lifting: You manage the OS, patching, scaling, and script dependencies.
  • Not serverless: EC2 runs 24/7, even if there are no objects to process.
  • Cost: A t3.small (~$15/month) is more expensive than hourly Lambda invocations for this workload.
  • Exam trap: DVA-C02 heavily penalizes EC2-based solutions when serverless alternatives exist.

The Technical Blueprint
#

# EventBridge Scheduled Rule (Cron Expression: Every hour)
# Rule: rate(1 hour) or cron(0 * * * ? *)

# Lambda Function: S3ObjectTagger
import boto3
import os

s3_client = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['DYNAMODB_TABLE'])

def lambda_handler(event, context):
    bucket = os.environ['S3_BUCKET']
    
    # Paginate through all objects (production code should handle large buckets)
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get('Contents', []):
            key = obj['Key']
            
            # Check if object already has moderation tag
            try:
                tags_response = s3_client.get_object_tagging(Bucket=bucket, Key=key)
                existing_tags = {tag['Key']: tag['Value'] for tag in tags_response['TagSet']}
                
                if 'moderation-status' in existing_tags:
                    continue  # Already processed
                
                # Query DynamoDB for moderation result
                db_response = table.get_item(Key={'object_key': key})
                if 'Item' in db_response:
                    moderation_status = db_response['Item']['status']
                    
                    # Apply tag
                    existing_tags['moderation-status'] = moderation_status
                    tag_set = [{'Key': k, 'Value': v} for k, v in existing_tags.items()]
                    s3_client.put_object_tagging(
                        Bucket=bucket,
                        Key=key,
                        Tagging={'TagSet': tag_set}
                    )
                    print(f"Tagged {key} with status: {moderation_status}")
            
            except Exception as e:
                print(f"Error processing {key}: {str(e)}")
                # In production, send to DLQ or CloudWatch Alarms
    
    return {'statusCode': 200, 'body': 'Tagging complete'}

IAM Policy for Lambda:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObjectTagging",
        "s3:PutObjectTagging"
      ],
      "Resource": [
        "arn:aws:s3:::picmetrics-uploads",
        "arn:aws:s3:::picmetrics-uploads/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/ModerationResults"
    }
  ]
}
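
The blueprint names the schedule expression but does not show how the rule gets attached to the function. One possible wiring with boto3 is sketched below; the rule name, target Id, and statement ID are placeholders, and the clients are passed in so the function can be exercised without AWS credentials. It uses put_rule and put_targets on the EventBridge client and add_permission on the Lambda client.

```python
def schedule_tagger(events_client, lambda_client, function_arn: str,
                    rule_name: str = "s3-object-tagger-hourly",
                    schedule: str = "rate(1 hour)") -> str:
    """Create a scheduled rule, point it at the Lambda, and allow the invoke."""
    # 1. Create (or update) the scheduled rule
    rule_arn = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
        State="ENABLED",
    )["RuleArn"]

    # 2. Register the tagging function as the rule's target
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "tagger-lambda", "Arn": function_arn}],
    )

    # 3. Let EventBridge invoke the function, scoped to this rule only.
    #    Note: add_permission fails if this StatementId already exists.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn


# Usage (assumes boto3 and credentials are available):
#   import boto3
#   schedule_tagger(boto3.client("events"), boto3.client("lambda"),
#                   "arn:aws:lambda:us-east-1:123456789012:function:S3ObjectTagger")
```

The resource-based permission in step 3 is separate from the execution-role policy shown above: the IAM policy governs what the function may do, while add_permission governs who may invoke it.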

The Comparative Analysis
#

| Option | API Complexity | Operational Overhead | Cost (1M objects/month) | Use Case |
|---|---|---|---|---|
| A (SQS + Visibility Timeout) | Medium (SQS SendMessage, ReceiveMessage, DeleteMessage) | High (message retention tuning, DLQ management) | ~$1 SQS + ~$5 Lambda | Valid for immediate retry scenarios, not delayed processing |
| B (Step Functions Wait) | High (StartExecution, Wait state, Lambda integration) | Very High (millions of long-running executions) | ~$50 Step Functions + ~$2 Lambda | Valid for orchestrating multi-step workflows, not simple delays |
| C (EventBridge Scheduled) | Low (EventBridge rule, Lambda invoke) | Low (native scheduling, idempotent design) | ~$0 EventBridge + ~$0.20 Lambda (hourly) | Best for polling external state changes with unpredictable timing |
| D (EC2 Cron) | Low (Boto3 SDK) | Very High (OS patching, scaling, monitoring) | ~$15 EC2 + ~$1 data transfer | Legacy approach; avoid in DVA-C02 |

Real-World Application (Practitioner Insight)
#

Exam Rule
#

“For the exam, when you see unpredictable third-party latency + asynchronous result availability + no real-time requirement, always pick EventBridge scheduled rules over Step Functions Wait states or SQS visibility timeouts.”

Real World
#

“In production, we’d add DynamoDB Streams to trigger tagging as soon as the third-party writes to DynamoDB, rather than polling S3. However, the exam often omits this option to test your understanding of scheduled vs. event-driven patterns. Also, for large S3 buckets (millions of objects), we’d use S3 Inventory to generate daily manifests instead of ListObjectsV2, which returns at most 1,000 keys per call; the per-object GetObjectTagging calls also count against S3’s limit of 5,500 GET/HEAD requests per second per prefix.”

Production enhancement:

# Alternative: DynamoDB Streams trigger
# When the third party writes to DynamoDB, the stream triggers Lambda to tag the S3 object immediately.
# Requires a stream view type that includes NewImage (NEW_IMAGE or NEW_AND_OLD_IMAGES).
import boto3

s3_client = boto3.client('s3')

def dynamodb_stream_handler(event, context):
    for record in event['Records']:
        if record['eventName'] == 'INSERT':
            object_key = record['dynamodb']['Keys']['object_key']['S']
            status = record['dynamodb']['NewImage']['status']['S']
            # Note: this overwrites any tags already on the object
            s3_client.put_object_tagging(
                Bucket='picmetrics-uploads',
                Key=object_key,
                Tagging={'TagSet': [{'Key': 'moderation-status', 'Value': status}]}
            )
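
The stream handler above reacts only to INSERT events. If the vendor creates a row first and fills in the status later, the result would arrive as a MODIFY event instead. The sketch below separates record parsing from tagging so both cases are covered; the attribute names object_key and status follow the handler above and remain assumptions about the table schema, and moderation_updates is a hypothetical helper name.

```python
from typing import Iterator, Tuple


def moderation_updates(event: dict) -> Iterator[Tuple[str, str]]:
    """Yield (object_key, status) pairs from a DynamoDB Streams event."""
    for record in event.get("Records", []):
        # REMOVE events carry no new result to apply
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue
        image = record.get("dynamodb", {}).get("NewImage")
        # Skip records without a usable NewImage (wrong view type or missing fields)
        if not image or "object_key" not in image or "status" not in image:
            continue
        yield image["object_key"]["S"], image["status"]["S"]


# Hand-written sample event for illustration only
sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"NewImage": {"object_key": {"S": "uploads/a.jpg"},
                                   "status": {"S": "approved"}}}},
        {"eventName": "MODIFY",
         "dynamodb": {"NewImage": {"object_key": {"S": "uploads/b.jpg"},
                                   "status": {"S": "flagged"}}}},
        {"eventName": "REMOVE",
         "dynamodb": {"Keys": {"object_key": {"S": "uploads/c.jpg"}}}},
    ]
}

updates = list(moderation_updates(sample_event))
```

Each yielded pair can then be passed to put_object_tagging, and because the parser has no AWS dependency it can be unit-tested with hand-written events like the one here.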


Disclaimer

This is a study note based on simulated scenarios for the AWS DVA-C02 exam. Always refer to official AWS documentation and hands-on labs for production implementations.

The DevPro Network: Mission and Founder

A 21-Year Tech Leadership Journey

Jeff Taakey has driven complex systems for over two decades, serving in pivotal roles as an Architect, Technical Director, and startup Co-founder/CTO.

He holds both an MBA degree and a Computer Science Master's degree from an English-speaking university in Hong Kong. His expertise is further backed by multiple international certifications including TOGAF, PMP, ITIL, and AWS SAA.

His experience spans diverse sectors and includes leading large, multidisciplinary teams (up to 86 people). He has also served as a Development Team Lead while cooperating with global teams spanning North America, Europe, and Asia-Pacific. He has spearheaded the design of an industry cloud platform. This work was often conducted within global Fortune 500 environments like IBM, Citi and Panasonic.

Following a recent Master’s degree from an English-speaking university in Hong Kong, he launched this platform to share advanced, practical technical knowledge with the global developer community.


About This Site: AWS.CertDevPro.com


AWS.CertDevPro.com focuses exclusively on mastering the Amazon Web Services ecosystem. We transform raw practice questions into strategic Decision Matrices. Led by Jeff Taakey (MBA & 21-year veteran of IBM/Citi), we provide the exclusive SAA and SAP Master Packs designed to move your cloud expertise from certification-ready to project-ready.