
AWS SAA-C03 Drill: Event-Driven Decoupling - The Pub/Sub vs. Stream Processing Trade-off

Jeff Taakey
Author
21+ Year Enterprise Architect | AWS SAA/SAP & Multi-Cloud Expert.

Jeff’s Insights

“Unlike generic exam dumps, Jeff’s Insights is designed to make you think like a Real-World Production Architect. We dissect this scenario by analyzing the strategic trade-offs required to balance operational reliability, security, and long-term cost across multi-service deployments.”

While preparing for the AWS SAA-C03, many candidates struggle with when to use SNS+SQS versus Kinesis for decoupling. In the real world, this is fundamentally a decision about fan-out scalability versus stream-processing capability. Let’s drill into a simulated scenario.

The Architecture Drill (Simulated Question)

Scenario

GlobalEvents Inc. operates a real-time notification platform that receives incoming event messages from IoT sensors, mobile applications, and third-party webhooks. These messages must be distributed to dozens of downstream microservices and analytics applications, each processing messages independently at their own pace.

The platform experiences extreme volatility in message volume—baseline traffic sits at 5,000 messages per second, but flash events (product launches, breaking news) can spike to 100,000 messages per second within minutes. The engineering team has identified bottlenecks in the current monolithic ingestion layer, where a single application receives and distributes messages, causing cascading failures during spikes.

The CTO has mandated a redesign to decouple the ingestion layer from consumers and ensure the system can scale elastically without message loss.

The Requirement:

Design a solution that:

  1. Decouples message producers from consumers
  2. Scales automatically to handle 100,000 msg/sec spikes
  3. Supports dozens of independent consumers processing at different rates
  4. Minimizes operational complexity

The Options

  • A) Persist messages to Amazon Kinesis Data Analytics, configure consumer applications to read and process messages directly.
  • B) Deploy the ingestion application on an Amazon EC2 Auto Scaling group, scaling EC2 instance count based on CPU utilization metrics.
  • C) Write messages to a single-shard Amazon Kinesis Data Stream, use AWS Lambda functions to preprocess messages and store them in Amazon DynamoDB, configure consumer applications to read from DynamoDB for processing.
  • D) Publish messages to an Amazon SNS topic with multiple Amazon SQS queue subscriptions (one per consumer), configure consumer applications to process messages from their dedicated queues.

Correct Answer

D) Publish messages to an Amazon SNS topic with multiple Amazon SQS queue subscriptions.


The Architect’s Analysis


The Winning Logic

This solution leverages AWS’s managed pub/sub pattern to achieve:

  1. True Decoupling: Producers publish once to SNS; SNS fans out to all subscriber queues asynchronously. Consumers never directly impact producers.

  2. Elastic Scalability:

    • SNS standard topics absorb 100,000+ messages/second without pre-provisioned capacity (default publish quotas vary by Region and can be raised)
    • Each SQS queue scales independently
    • No “shard math” or capacity planning required
  3. Independent Consumer Velocity: Each microservice gets its own SQS queue, allowing slow consumers (batch analytics) and fast consumers (real-time alerts) to coexist without blocking.

  4. Cost Efficiency: At 100K msg/sec sustained for 1 hour:

    • SNS: 360M requests ≈ $180 (at $0.50/million)
    • SQS (10 consumer queues): 3.6B requests ≈ $1,440 (at $0.40/million)
    • Total: ~$1,620 for the peak hour, entirely pay-per-request with no idle provisioned capacity
  5. Built-in Reliability:

    • SNS retries failed deliveries
    • SQS provides message durability and visibility timeouts
    • Dead Letter Queues (DLQs) for poison message handling
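The peak-hour figures in point 4 can be sanity-checked with a few lines of arithmetic. The prices used below ($0.50 per million SNS publishes, $0.40 per million SQS requests) are illustrative US East list prices and vary by Region:

```python
# Sanity-check the peak-hour cost estimate for SNS fan-out to 10 SQS queues.
# Prices are illustrative list prices; check current pricing for your Region.
SNS_PRICE_PER_MILLION = 0.50   # USD per 1M SNS publish requests (assumed)
SQS_PRICE_PER_MILLION = 0.40   # USD per 1M SQS requests (assumed)

msgs_per_sec = 100_000
seconds = 3_600
consumers = 10

publishes = msgs_per_sec * seconds        # 360M publishes to the SNS topic
deliveries = publishes * consumers        # 3.6B SQS deliveries (one copy per queue)

sns_cost = publishes / 1e6 * SNS_PRICE_PER_MILLION
sqs_cost = deliveries / 1e6 * SQS_PRICE_PER_MILLION

print(round(sns_cost), round(sqs_cost), round(sns_cost + sqs_cost))  # 180 1440 1620
```

Note this ignores ReceiveMessage/DeleteMessage calls on the consumer side, so real SQS request counts run somewhat higher than the delivery count alone.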

The Trap (Distractor Analysis)

Why not Option A?

Amazon Kinesis Data Analytics is not a message broker—it’s a SQL/Apache Flink engine for stream analytics. You cannot “persist messages” to it for direct consumption by applications. This option reflects a fundamental misunderstanding of service purposes.

Failure Mode: Doesn’t solve the decoupling requirement; KDA is for processing, not distribution.


Why not Option B?

While EC2 Auto Scaling sounds scalable, this approach does not decouple—it merely adds horizontal capacity to the same monolithic pattern.

Critical Flaws:

  • Scaling Lag: ASG takes 3-5 minutes to launch instances; during 100K msg/sec spikes, the backlog grows by 18-30 million messages before new capacity arrives
  • Message Loss Risk: If instances terminate mid-processing, in-flight messages are lost unless you build custom persistent queues (reinventing SQS)
  • Cost Inefficiency: Running EC2 instances 24/7 for bursty workloads wastes compute during idle periods

FinOps Impact: Running 20 c5.2xlarge instances ($0.34/hr each) continuously costs roughly $4,960/month baseline (20 × $0.34 × 730 hours), even when processing only 5K msg/sec.
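The backlog estimate in the scaling-lag bullet is simple rate-times-delay arithmetic:

```python
# Backlog that piles up while an EC2 Auto Scaling group reacts to a spike.
spike_rate = 100_000                  # messages/sec during a flash event

backlog_3min = spike_rate * 3 * 60    # 18,000,000 messages after a 3-minute lag
backlog_5min = spike_rate * 5 * 60    # 30,000,000 messages after a 5-minute lag
print(backlog_3min, backlog_5min)
```

With SNS+SQS, that same burst simply accumulates durably in the queues and drains once consumers catch up, rather than overwhelming a fixed-size fleet.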


Why not Option C?

This option fails on multiple architectural anti-patterns:

  1. Single-Shard Bottleneck: Kinesis shards support max 1,000 records/sec or 1MB/sec writes. A single shard would fail catastrophically at 100K msg/sec (you’d need 100+ shards).

  2. Unnecessary Complexity: Adding Lambda + DynamoDB preprocessing creates:

    • Additional cost (Lambda invocations, DynamoDB WCUs/RCUs)
    • Additional latency (multi-hop processing)
    • Additional failure points (Lambda throttling, DynamoDB hot partitions)
  3. DynamoDB as Queue Anti-Pattern: Using DynamoDB as a message broker requires:

    • Custom polling logic in each consumer
    • Manual deletion of processed messages
    • Capacity planning for read/write units
    • Custom retry logic (DynamoDB has no native visibility-timeout or redelivery semantics)

Cost Comparison: 100 provisioned Kinesis shards alone run ~$1,095/month ($0.015/shard-hour × 730 hours), before adding PUT payload units, Lambda invocations, and DynamoDB capacity, whereas SNS+SQS is purely pay-per-request with no idle-capacity cost.
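The "shard math" that Option C ignores can be checked directly against the per-shard write limits (1,000 records/sec or 1 MB/sec; the 500-byte average record size below is an assumption for illustration):

```python
import math

# Kinesis Data Streams write limits per shard (provisioned mode).
RECORDS_PER_SEC_PER_SHARD = 1_000
BYTES_PER_SEC_PER_SHARD = 1_000_000

peak_rate = 100_000        # records/sec at spike
avg_record_bytes = 500     # assumed average message size

# A shard is saturated by whichever limit is hit first.
shards_for_records = math.ceil(peak_rate / RECORDS_PER_SEC_PER_SHARD)
shards_for_bytes = math.ceil(peak_rate * avg_record_bytes / BYTES_PER_SEC_PER_SHARD)
shards_needed = max(shards_for_records, shards_for_bytes)
print(shards_needed)       # 100
```

A single shard, as Option C proposes, provides 1% of the required write capacity, so throttling at spike time is guaranteed.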


The Architect Blueprint

```mermaid
graph LR
    A[IoT Sensors/Apps] -->|Publish Messages| B[Amazon SNS Topic]
    B -->|Fan-out| C[SQS Queue 1<br/>Analytics Service]
    B -->|Fan-out| D[SQS Queue 2<br/>Notification Service]
    B -->|Fan-out| E[SQS Queue 3<br/>Fraud Detection]
    B -->|Fan-out| F[SQS Queue N<br/>Archival Service]
    C --> G[Consumer 1<br/>Polls at own rate]
    D --> H[Consumer 2<br/>Polls at own rate]
    E --> I[Consumer 3<br/>Polls at own rate]
    F --> J[Consumer N<br/>Polls at own rate]
    style B fill:#FF9900,stroke:#232F3E,stroke-width:3px,color:#fff
    style C fill:#527FFF,stroke:#232F3E,stroke-width:2px
    style D fill:#527FFF,stroke:#232F3E,stroke-width:2px
    style E fill:#527FFF,stroke:#232F3E,stroke-width:2px
    style F fill:#527FFF,stroke:#232F3E,stroke-width:2px
```

Diagram Note: Messages flow once from producers to SNS, which asynchronously delivers copies to all subscribed SQS queues; each consumer polls its queue independently, achieving full decoupling and elastic scalability.
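One wiring detail the diagram hides: each SQS queue must grant the SNS topic permission to deliver to it via a queue access policy. The sketch below just builds and prints that policy document; the ARNs are placeholders, not real resources:

```python
import json

# Placeholder ARNs -- substitute your actual topic and queue ARNs.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:events-topic"
QUEUE_ARN = "arn:aws:sqs:us-east-1:123456789012:analytics-queue"

# Queue access policy allowing this SNS topic (and only it) to send messages.
queue_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowSNSDelivery",
        "Effect": "Allow",
        "Principal": {"Service": "sns.amazonaws.com"},
        "Action": "sqs:SendMessage",
        "Resource": QUEUE_ARN,
        "Condition": {"ArnEquals": {"aws:SourceArn": TOPIC_ARN}},
    }],
}

print(json.dumps(queue_policy, indent=2))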


Real-World Application (Practitioner Insight)

Exam Rule

“For the SAA-C03 exam, when you see ‘dozens of independent consumers’ + ‘decouple’ + ‘variable throughput’, immediately think SNS+SQS fan-out. If the question mentions ordered processing or replay capability, then consider Kinesis.”

Real World

In production, we often enhance this pattern with:

  • SNS Message Filtering: Reduce SQS costs by filtering messages at the SNS layer so queues only receive relevant events (e.g., eventType = "payment" for the billing service)
  • SQS FIFO Queues: For consumers requiring strict ordering (e.g., inventory updates), use FIFO queues with message group IDs
  • CloudWatch Alarms: Monitor ApproximateAgeOfOldestMessage metric to detect consumer lag
  • Hybrid Approach: Use SNS to fan out to both SQS (for immediate processing) and Amazon Kinesis Data Firehose (for S3 data lake archival)
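SNS evaluates the filter policy against each message's attributes before delivering to a subscription. As a mental model, an exact-match string policy behaves like the simplified sketch below (the real SNS matcher also supports prefix, anything-but, and numeric-range operators):

```python
# Simplified model of SNS exact-match filter-policy evaluation.
# Real SNS policies also support numeric ranges, prefixes, and anything-but.
def matches(filter_policy: dict, message_attributes: dict) -> bool:
    """Deliver only if every policy key is present with an allowed value."""
    return all(
        message_attributes.get(key) in allowed
        for key, allowed in filter_policy.items()
    )

# A billing-service subscription that only wants payment-related events.
billing_filter = {"eventType": ["payment", "refund"]}

print(matches(billing_filter, {"eventType": "payment"}))  # True
print(matches(billing_filter, {"eventType": "signup"}))   # False
```

Because filtering happens inside SNS, the billing queue never receives (or bills for) signup events at all.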

Cost Optimization Trick: Enable SQS long polling (set ReceiveMessageWaitTimeSeconds to 20) so ReceiveMessage calls wait for messages instead of returning empty immediately. For low-traffic queues this cuts empty-receive request volume by roughly 90%, and since SQS bills per request (~$0.40/million), the polling cost drops by about the same proportion.
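The long-polling saving comes from making fewer empty ReceiveMessage calls, not from a lower per-request price. A rough model for one worker on a quiet queue (the 2-second short-poll interval is an assumption):

```python
SQS_PRICE_PER_MILLION = 0.40   # USD per 1M requests (assumed list price)

short_poll_interval = 2        # seconds between empty short polls (assumed)
long_poll_wait = 20            # ReceiveMessageWaitTimeSeconds

seconds_per_month = 30 * 24 * 3600
short_poll_requests = seconds_per_month // short_poll_interval
long_poll_requests = seconds_per_month // long_poll_wait

reduction_pct = round((1 - long_poll_requests / short_poll_requests) * 100)
print(short_poll_requests, long_poll_requests, reduction_pct)  # ... 90
```

The same ~90% reduction applies to the request bill, since each avoided empty receive is an avoided billed request.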


Disclaimer

This is a study note based on simulated scenarios for the AWS SAA-C03 exam. It is not an official question from AWS or any certification body.

The DevPro Network: Mission and Founder

A 21-Year Tech Leadership Journey

Jeff Taakey has driven complex systems for over two decades, serving in pivotal roles as an Architect, Technical Director, and startup Co-founder/CTO.

He holds both an MBA degree and a Computer Science Master's degree from an English-speaking university in Hong Kong. His expertise is further backed by multiple international certifications including TOGAF, PMP, ITIL, and AWS SAA.

His experience spans diverse sectors and includes leading large, multidisciplinary teams (up to 86 people). He has also served as a Development Team Lead while cooperating with global teams spanning North America, Europe, and Asia-Pacific. He has spearheaded the design of an industry cloud platform. This work was often conducted within global Fortune 500 environments like IBM, Citi and Panasonic.

He launched this platform to share advanced, practical technical knowledge with the global developer community.


About This Site: AWS.CertDevPro.com


AWS.CertDevPro.com focuses exclusively on mastering the Amazon Web Services ecosystem. We transform raw practice questions into strategic Decision Matrices. Led by Jeff Taakey (MBA & 21-year veteran of IBM/Citi), we provide the exclusive SAA and SAP Master Packs designed to move your cloud expertise from certification-ready to project-ready.