AWS DVA-C02 Drill: DynamoDB BatchGetItem - Multi-Table Retrieval Optimization

Author: Jeff Taakey
21+ Year Enterprise Architect | AWS SAA/SAP & Multi-Cloud Expert.

Jeff’s Note

“Unlike generic exam dumps, ADH analyzes this scenario through the lens of a Real-World Lead Developer.”

“For DVA-C02 candidates, the confusion often lies in distinguishing between Query, Scan, and BatchGetItem operations when dealing with multiple tables. In production, this is about knowing exactly which DynamoDB API call minimizes round trips while respecting the table’s key schema. The difference between one API call and ten can mean 300ms versus 3 seconds in user-facing latency. Let’s drill down.”


The Certification Drill (Simulated Question)

Scenario

You’re a backend developer at HarmonyStream, a music discovery platform. Your application serves a dashboard page that displays multiple songs alongside their corresponding artist profiles. The data layer uses two Amazon DynamoDB tables:

  • tracks table: Uses trackTitle as the partition key and artistHandle as the sort key
  • profiles table: Uses artistHandle as the partition key

When a user opens the dashboard, the frontend sends a request containing a list of 15 trackTitle/artistHandle pairs and 8 unique artistHandle values. Your backend must retrieve all this data efficiently—the product team has flagged page load time as a critical UX metric, and your DevOps lead has noted that database round trips are contributing to P99 latency.

The Requirement

Retrieve multiple items from both DynamoDB tables with minimal network traffic and optimal application performance.

The Options

  • A) Perform a BatchGetItem operation that returns items from both tables in a single request, using the list of trackTitle/artistHandle keys for the tracks table and the list of artistHandle keys for the profiles table.
  • B) Create a local secondary index (LSI) on the tracks table that uses artistHandle as the partition key. Perform a Query operation for each artistHandle on the tracks table filtering by trackTitle. Perform a separate Query operation for each artistHandle on the profiles table.
  • C) Perform a BatchGetItem operation on the tracks table using the trackTitle/artistHandle composite keys. Perform a separate BatchGetItem operation on the profiles table using artistHandle as the key.
  • D) Perform a Scan operation on each table, filtering by the list of trackTitle/artistHandle pairs for the tracks table and the list of artistHandle values for the profiles table.


Correct Answer

Option C.

Quick Insight: The API Efficiency Imperative

For DVA-C02, understanding BatchGetItem semantics is critical: a single API call can return up to 100 items (and at most 16 MB of data) from one or more tables, with each table given its own entry in the RequestItems map. The exam tests whether you know that BatchGetItem retrieves multiple items by their primary keys across tables, without the overhead of multiple Query operations or the inefficiency of full table Scans. The retrieval is not atomic; TransactGetItems is the operation that provides that guarantee.
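
As a rough illustration of that request shape, the sketch below builds a RequestItems map with one entry per table and enforces the 100-key ceiling. The helper name build_request_items and the sample values are hypothetical; the table and attribute names come from the scenario. Keys are written as plain Python values, the form boto3's resource interface accepts; the low-level client would need typed values such as {"S": "..."}.

```python
from typing import Dict, List, Tuple

MAX_KEYS_PER_CALL = 100  # BatchGetItem limit on keys across all tables in one call


def build_request_items(
    track_pairs: List[Tuple[str, str]],
    artist_handles: List[str],
) -> Dict[str, dict]:
    """Build a RequestItems map with one entry per table (hypothetical helper)."""
    total = len(track_pairs) + len(artist_handles)
    if total > MAX_KEYS_PER_CALL:
        raise ValueError(f"{total} keys exceeds the {MAX_KEYS_PER_CALL}-key limit")
    return {
        # tracks has a composite primary key: partition key + sort key
        "tracks": {
            "Keys": [
                {"trackTitle": title, "artistHandle": handle}
                for title, handle in track_pairs
            ]
        },
        # profiles has a partition key only
        "profiles": {
            "Keys": [{"artistHandle": handle} for handle in artist_handles]
        },
    }


request_items = build_request_items(
    [("Neon Lights", "synth_wave_99"), ("Digital Rain", "cyber_beats")],
    ["synth_wave_99", "cyber_beats"],
)
print(sorted(request_items))  # ['profiles', 'tracks']
```

Every key must name the full primary key of its table, which is why the tracks entries carry two attributes and the profiles entries carry one.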


The Expert’s Analysis

Correct Answer

Option C: Perform a BatchGetItem operation on the tracks table using the trackTitle/artistHandle composite keys. Perform a separate BatchGetItem operation on the profiles table using artistHandle as the key.

The Winning Logic

This solution is optimal for the following technical reasons:

  1. API Design Match: BatchGetItem is explicitly designed for retrieving multiple items by their primary keys. Since you already know the exact keys (trackTitle/artistHandle for tracks, artistHandle for profiles), this is a direct key-based lookup—the most efficient DynamoDB access pattern.

  2. Minimal Network Overhead: Two BatchGetItem operations (one per table) result in 2 round trips, provided no UnprocessedKeys need to be retried. Each operation can return up to 100 items and up to 16 MB of data, which comfortably covers this scenario's requirements (15 tracks + 8 profiles).

  3. Consistent Performance: Every key in a BatchGetItem is resolved as a direct primary-key lookup, the access pattern for which DynamoDB advertises single-digit-millisecond latency. Latency therefore stays predictable as the tables grow, with none of the overhead of a table scan.

  4. Developer Implementation Pattern:

    import boto3
    
    dynamodb = boto3.resource('dynamodb')
    
    # BatchGetItem for tracks table
    response_tracks = dynamodb.batch_get_item(
        RequestItems={
            'tracks': {
                'Keys': [
                    {'trackTitle': 'Song1', 'artistHandle': 'artist123'},
                    {'trackTitle': 'Song2', 'artistHandle': 'artist456'},
                    # ... up to 100 items
                ]
            }
        }
    )
    
    # BatchGetItem for profiles table
    response_profiles = dynamodb.batch_get_item(
        RequestItems={
            'profiles': {
                'Keys': [
                    {'artistHandle': 'artist123'},
                    {'artistHandle': 'artist456'},
                    # ... up to 100 items
                ]
            }
        }
    )
    
  5. Capacity Unit Consumption: Each item of up to 4 KB consumes 0.5 RCU (eventually consistent, the default) or 1 RCU (strongly consistent). Larger items are rounded up to the next 4 KB boundary and charged per 4 KB block. This is predictable and cost-effective compared to Scan operations, which consume RCUs for every item examined, regardless of filtering.
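
The capacity arithmetic in point 5 can be sketched as follows. The function name batch_get_rcus and the item sizes are assumptions made for illustration; the 4 KB block size and the 0.5 / 1 RCU rates are the ones stated above.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # one read capacity unit covers up to 4 KB


def batch_get_rcus(item_sizes_bytes, consistent_read=False):
    """Estimate RCUs for a BatchGetItem: each item is rounded up to 4 KB on its own."""
    per_block = 1.0 if consistent_read else 0.5
    return sum(
        max(1, math.ceil(size / RCU_BLOCK_BYTES)) * per_block
        for size in item_sizes_bytes
    )


# Assumption: 15 track items and 8 profile items of 1 KB each (sizes are made up)
sizes = [1024] * 23
print(batch_get_rcus(sizes))                        # 11.5
print(batch_get_rcus(sizes, consistent_read=True))  # 23.0
print(batch_get_rcus([9000]))                       # 1.5 (9000 bytes spans 3 blocks)
```

Because rounding happens per item, 23 small items cost the same whether they are 100 bytes or 4 KB each.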

The Trap (Distractor Analysis)

Why not Option A?

“Single BatchGetItem across both tables”

  • The API Constraint: BatchGetItem can address multiple tables in a single SDK call, with each table given its own entry (table name mapped to a Keys list) in the RequestItems parameter. The wording in Option A is ambiguous: it can be read as one undifferentiated key list spanning both tables, which is not how BatchGetItem batching works. However the request is sent, the keys still have to be structured per table.

  • DVA-C02 Precision: The exam tests whether you understand that “single operation” doesn’t mean “single table retrieval.” Option C’s explicit two-operation approach is clearer and aligns with documented SDK usage patterns.

Why not Option B?

“LSI with multiple Query operations”

  • Schema Violation: LSIs must be created at table creation time and share the same partition key as the base table. You cannot create an LSI on the tracks table with artistHandle as the partition key because the base table uses trackTitle as the partition key. This option describes a Global Secondary Index (GSI), not an LSI.

  • Performance Penalty: Even if you used a GSI, this approach requires:

    • One Query per artistHandle on the tracks table (8 queries)
    • One Query per artistHandle on the profiles table (8 queries)
    • Total: 16 round trips vs. 2 with BatchGetItem
  • Operational Overhead: A filter expression on a Query (here, filtering by trackTitle) is applied only after DynamoDB has read every item under that partition key. Non-matching items are discarded server-side before the response is returned, but the RCUs spent reading them are still consumed.
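
The schema rule behind the first bullet can be checked mechanically. This is a minimal sketch: the helper names partition_key and can_be_lsi are hypothetical, the sort key chosen for Option B's index is an assumption (the option names only its partition key), and releaseDate is an invented attribute used to show a valid LSI.

```python
def partition_key(key_schema):
    """Return the HASH (partition) attribute name from a KeySchema list."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")


def can_be_lsi(base_key_schema, index_key_schema):
    """An LSI must reuse the base table's partition key; only the sort key may differ."""
    return partition_key(base_key_schema) == partition_key(index_key_schema)


# Key schema of the tracks table, as given in the scenario
tracks_keys = [
    {"AttributeName": "trackTitle", "KeyType": "HASH"},
    {"AttributeName": "artistHandle", "KeyType": "RANGE"},
]
# The index Option B describes (sort key assumed for illustration)
option_b_index = [
    {"AttributeName": "artistHandle", "KeyType": "HASH"},
    {"AttributeName": "trackTitle", "KeyType": "RANGE"},
]
# A shape that would qualify as an LSI (releaseDate is a made-up attribute)
valid_lsi = [
    {"AttributeName": "trackTitle", "KeyType": "HASH"},
    {"AttributeName": "releaseDate", "KeyType": "RANGE"},
]

print(can_be_lsi(tracks_keys, option_b_index))  # False: this would have to be a GSI
print(can_be_lsi(tracks_keys, valid_lsi))       # True
```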

Why not Option D?

“Scan operations with filters”

  • Catastrophic Inefficiency: Scan reads every single item in the table, then applies the filter. For a table with 100,000 tracks, you’d consume RCUs for all 100,000 items even if you only need 15.

  • Performance Degradation: A Scan walks the table sequentially and returns at most 1 MB of data per call, so anything beyond a small table needs many paginated requests. Total latency grows with table size, directly violating the “optimal performance” requirement.

  • Anti-Pattern: AWS best practices explicitly discourage Scan for known-key retrieval. This is a distractor designed to catch candidates who don’t understand DynamoDB access patterns.
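
To put rough numbers on the Scan penalty, the sketch below compares a full-table Scan with a key-based batch read. It is an estimate under stated assumptions: the 1 KB average item size is invented, both reads are eventually consistent, and the function names scan_cost and batch_get_cost are hypothetical.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024     # one RCU covers up to 4 KB
SCAN_PAGE_BYTES = 1024 * 1024  # a single Scan call reads at most 1 MB


def scan_cost(item_count, avg_item_bytes):
    """Estimate RCUs and minimum calls for an eventually consistent full-table Scan."""
    total_bytes = item_count * avg_item_bytes
    rcus = math.ceil(total_bytes / RCU_BLOCK_BYTES) * 0.5
    calls = max(1, math.ceil(total_bytes / SCAN_PAGE_BYTES))
    return rcus, calls


def batch_get_cost(item_count, avg_item_bytes):
    """Estimate RCUs for an eventually consistent BatchGetItem of item_count keys."""
    blocks_per_item = max(1, math.ceil(avg_item_bytes / RCU_BLOCK_BYTES))
    return item_count * blocks_per_item * 0.5


# Assumption: 100,000 tracks averaging 1 KB each (the item size is made up)
print(scan_cost(100_000, 1024))  # (12500.0, 98)
print(batch_get_cost(15, 1024))  # 7.5
```

Under these assumptions the Scan spends over a thousand times more read capacity and needs dozens of paginated calls to return the same 15 items.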


The Technical Blueprint

BatchGetItem Request Structure (SDK Perspective)

// Node.js SDK v3 example
import { DynamoDBClient, BatchGetItemCommand } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient({ region: "us-east-1" });

// Request for tracks table
const tracksCommand = new BatchGetItemCommand({
  RequestItems: {
    "tracks": {
      Keys: [
        { trackTitle: { S: "Neon Lights" }, artistHandle: { S: "synth_wave_99" } },
        { trackTitle: { S: "Digital Rain" }, artistHandle: { S: "cyber_beats" } }
      ],
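      // Alias duration via ExpressionAttributeNames to avoid a reserved-word clash
      ProjectionExpression: "trackTitle, artistHandle, #dur, genre",
      ExpressionAttributeNames: { "#dur": "duration" }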
    }
  }
});

// Request for profiles table
const profilesCommand = new BatchGetItemCommand({
  RequestItems: {
    "profiles": {
      Keys: [
        { artistHandle: { S: "synth_wave_99" } },
        { artistHandle: { S: "cyber_beats" } }
      ],
      ProjectionExpression: "artistHandle, bio, followerCount"
    }
  }
});

// Execute both operations
const [tracksResponse, profilesResponse] = await Promise.all([
  client.send(tracksCommand),
  client.send(profilesCommand)
]);

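// Handle UnprocessedKeys for retry logic (check both responses, not just tracks)
for (const response of [tracksResponse, profilesResponse]) {
  if (response.UnprocessedKeys && Object.keys(response.UnprocessedKeys).length > 0) {
    // Implement exponential backoff retry, resending response.UnprocessedKeys as RequestItems
  }
}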

Key SDK Details for DVA-C02:

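  • UnprocessedKeys: BatchGetItem may return a partial result and list the remaining keys in UnprocessedKeys, either because provisioned throughput was exceeded or because the 16 MB response limit was reached. Your code must retry those keys, with exponential backoff.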
  • RequestItems Limit: Maximum 100 items per call, 16 MB total response size.
  • Atomicity: BatchGetItem is NOT an atomic transaction. Use TransactGetItems if you need ACID guarantees across items.
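
Because of the 100-item ceiling, a larger key list has to be split before it is sent. A minimal sketch of that splitting step, with a hypothetical helper name chunk_keys and made-up artist handles:

```python
from typing import Iterator, List

MAX_KEYS_PER_CALL = 100  # BatchGetItem accepts at most 100 keys per call


def chunk_keys(keys: List[dict], size: int = MAX_KEYS_PER_CALL) -> Iterator[List[dict]]:
    """Yield successive slices of at most `size` keys, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


# 250 hypothetical artist keys become batches of 100, 100, and 50
keys = [{"artistHandle": f"artist{i}"} for i in range(250)]
print([len(batch) for batch in chunk_keys(keys)])  # [100, 100, 50]
```

Each yielded batch can then be passed as the Keys list of one BatchGetItem call, with UnprocessedKeys handled per call.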

The Comparative Analysis

| Option | API Complexity | Network Round Trips | Performance | RCU Efficiency | Use Case Fit |
| --- | --- | --- | --- | --- | --- |
| A - Single BatchGetItem (Ambiguous) | Medium | 2 (despite wording) | High | High | ❌ Misleading: functionally identical to C but unclear |
| B - LSI + Query | High | 16 | Low | Medium | ❌ Wrong index type; excessive round trips |
| C - Two BatchGetItem | Low | 2 | Highest | Highest | ✅ Optimal for known keys |
| D - Scan + Filter | Low | 2 or more (paginated at 1 MB per call) | Lowest | Lowest | ❌ Anti-pattern for targeted retrieval |

Developer Decision Matrix:

  • Known Primary Keys? → Use BatchGetItem (Option C)
  • Need to Query by Non-Key Attribute? → Use Query with GSI (not applicable here)
  • Unknown Keys / Complex Filters? → Use Scan (avoid if possible)
  • Transactional Reads Required? → Use TransactGetItems (not offered in options)
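
The matrix can also be read as a small decision function. This is only a sketch of the rules listed above; the function name choose_read_api and its parameter names are hypothetical.

```python
def choose_read_api(keys_known: bool, needs_transaction: bool = False,
                    has_index_on_attribute: bool = False) -> str:
    """Map the decision matrix onto a DynamoDB read operation name."""
    if keys_known and needs_transaction:
        return "TransactGetItems"  # all-or-nothing read across items
    if keys_known:
        return "BatchGetItem"      # direct lookups by primary key
    if has_index_on_attribute:
        return "Query"             # against a GSI keyed on the attribute
    return "Scan"                  # last resort: reads the whole table


print(choose_read_api(keys_known=True))                                 # BatchGetItem
print(choose_read_api(keys_known=True, needs_transaction=True))         # TransactGetItems
print(choose_read_api(keys_known=False, has_index_on_attribute=True))   # Query
print(choose_read_api(keys_known=False))                                # Scan
```

The HarmonyStream dashboard falls into the first branch that matches: the keys are known and no transactional guarantee is requested.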

Real-World Application (Practitioner Insight)

Exam Rule

“For DVA-C02, when the scenario provides specific primary keys and requires retrieving items from multiple tables with minimal network traffic, always choose BatchGetItem over Query or Scan. If the question mentions ‘multiple tables’ without implying a true single-operation benefit, expect that you’ll need one BatchGetItem per table.”

Real World

In production at HarmonyStream, we’d enhance this pattern with:

  1. Caching: Use ElastiCache (Redis) to cache frequently accessed artist profiles. Every cache hit avoids a DynamoDB read, so read costs fall in proportion to the cache hit rate (around 60% at a 60% hit rate).

  2. SDK Best Practices:

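    import time

    import boto3

    dynamodb = boto3.resource('dynamodb')

    # Always handle UnprocessedKeys
    def batch_get_with_retry(table_name, keys, max_retries=3):
        """Fetch up to 100 keys from one table, retrying any UnprocessedKeys."""
        unprocessed = keys
        retry_count = 0
        results = []
    
        while unprocessed and retry_count < max_retries:
            response = dynamodb.batch_get_item(
                RequestItems={table_name: {'Keys': unprocessed}}
            )
            results.extend(response['Responses'].get(table_name, []))
            # Keys DynamoDB could not serve this round (throttling or size limit)
            unprocessed = response.get('UnprocessedKeys', {}).get(table_name, {}).get('Keys', [])
            if unprocessed:
                time.sleep(2 ** retry_count)  # Exponential backoff: 1s, 2s, 4s
                retry_count += 1
    
        # Keys still unprocessed after max_retries are missing from results
        return results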
    
  3. Monitoring: Set CloudWatch alarms for SystemErrors and UserErrors on BatchGetItem operations. High ThrottledRequests metrics indicate you need to increase provisioned RCUs or switch to on-demand billing.

  4. Cost Optimization: For this access pattern (23 items per request), on-demand pricing is likely more cost-effective than provisioned capacity unless request volume exceeds 100 requests/second.


Disclaimer

This is a study note based on simulated scenarios for the DVA-C02 exam. AWS service behaviors and best practices evolve—always refer to the latest AWS documentation and SDK references for production implementations.
