Jeff’s Note #
“Unlike generic exam dumps, ADH analyzes this scenario through the lens of a Real-World Site Reliability Engineer (SRE).”
“For SOA-C02 candidates, the confusion often lies in overthinking the solution by introducing unnecessary services (SQS, custom Lambda polling) when native CloudWatch alarm features can handle missing data detection elegantly. In production, this is about knowing exactly how to configure CloudWatch alarms to treat missing data as a breach condition using the `TreatMissingData` parameter. Let’s drill down.”
The Certification Drill #
Scenario #
A financial analytics company operates a real-time trading insights platform. The platform expects hourly market data files to be delivered to an Amazon S3 bucket by an external vendor at exactly 5 minutes past each hour. An S3 event notification triggers an AWS Lambda function that parses and processes the data into Amazon DynamoDB tables for downstream dashboards.
The operations team has reported intermittent data gaps—specifically, there are occasions when the vendor fails to deliver files on schedule, but no alerts are generated. Business analysts only discover these gaps when they query empty dashboards hours later, creating delays in critical decision-making.
The Requirement #
The SRE team must implement an operationally efficient monitoring solution that:
- Alerts the operations team via Amazon SNS when an expected hourly file is not delivered.
- Minimizes additional code, infrastructure, and manual checks.
- Uses native AWS monitoring capabilities wherever possible.
The Options #
- A) Add an S3 Lifecycle rule scoped to objects created in the past hour. Configure an S3 event notification triggered by lifecycle transitions. When zero objects transition, publish a message to an Amazon SNS topic to notify the operations team.
- B) Configure a second S3 event notification that invokes a Lambda function to publish messages to an Amazon SQS queue. Create a CloudWatch alarm on the `ApproximateAgeOfOldestMessage` metric. When the metric exceeds 1 hour, publish to an SNS topic to notify the operations team.
- C) Create a CloudWatch alarm on the Lambda function’s `Invocations` metric. Set the alarm to trigger when invocations equal zero within a 1-hour period. Configure the alarm to treat missing data as breaching. Publish to an SNS topic to notify the operations team.
- D) Create a new Lambda function that retrieves the timestamp of the most recent file in the S3 bucket. If the timestamp is older than 1 hour, publish a message to an SNS topic. Create an Amazon EventBridge scheduled rule to invoke this function every hour.
Correct Answer #
C
Quick Insight: The SysOps/SRE Imperative #
This scenario is a classic SOA-C02 test of CloudWatch alarm configuration nuances, specifically the `TreatMissingData` parameter. AWS expects SysOps professionals to leverage native metric-based alerting rather than building custom polling logic or introducing unnecessary middleware (SQS, secondary Lambda functions). The key is recognizing that the absence of an expected event (a Lambda invocation) is itself a signal, and CloudWatch can detect it natively.
The Expert’s Analysis #
Correct Answer #
Option C
The Winning Logic #
Option C is the most operationally efficient solution because it leverages native CloudWatch alarm functionality with zero additional infrastructure:
- Direct Metric Monitoring: The `Invocations` metric is already published by Lambda at no additional cost. Every time the S3 event triggers the Lambda function, this metric increments. If the file doesn’t arrive, Lambda publishes no `Invocations` data point for that period, so the metric is missing rather than zero.
- TreatMissingData Configuration: The critical SOA-C02 exam concept here is the alarm’s `TreatMissingData` parameter. Setting it to `breaching` makes CloudWatch treat periods with no data points as violating the threshold, so the alarm moves to the ALARM state. With the default value, `missing`, the alarm would go to INSUFFICIENT_DATA instead, and with `notBreaching` it would stay OK. This is exactly what is needed to detect the absence of an expected event.
- No Custom Code: Unlike Options B and D, there’s no need to write, test, deploy, or maintain additional Lambda functions or message queue logic.
- Native Integration: CloudWatch alarms publish directly to SNS topics; no intermediary services are required.
SOA-C02-Specific CLI Implementation:
aws cloudwatch put-metric-alarm \
--alarm-name "Missing-Hourly-File-Alarm" \
--alarm-description "Alert when no Lambda invocations in 1 hour" \
--metric-name Invocations \
--namespace AWS/Lambda \
--statistic Sum \
--period 3600 \
--evaluation-periods 1 \
--threshold 1 \
--comparison-operator LessThanThreshold \
--treat-missing-data breaching \
--alarm-actions arn:aws:sns:us-east-1:123456789012:ops-team-alerts \
--dimensions Name=FunctionName,Value=process-market-data
Key Parameters Breakdown:
- `--period 3600`: 1-hour evaluation window
- `--threshold 1`: Alarm triggers if invocations < 1
- `--treat-missing-data breaching`: This is the exam-critical setting; missing data points are treated as alarm conditions
- `--comparison-operator LessThanThreshold`: Detects zero invocations
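To make the effect of `--treat-missing-data` concrete, here is a simplified Python model of a single evaluation period. It is an illustration only, not CloudWatch's actual evaluation algorithm: `evaluate_period` and the `"UNCHANGED"` marker are hypothetical names, and the sketch assumes one evaluation period with a `LessThanThreshold` comparison.

```python
from typing import Optional


def evaluate_period(invocation_sum: Optional[float],
                    threshold: float = 1,
                    treat_missing_data: str = "missing") -> str:
    """Simplified, hypothetical model of one alarm evaluation period.

    invocation_sum is None when no data point was published for the period,
    which is what happens when the Lambda function is never invoked.
    """
    if invocation_sum is None:
        # No data point: the outcome depends entirely on TreatMissingData.
        outcomes = {
            "breaching": "ALARM",
            "notBreaching": "OK",
            "ignore": "UNCHANGED",  # the alarm keeps its current state
            "missing": "INSUFFICIENT_DATA",
        }
        return outcomes[treat_missing_data]
    # Data point present: apply the LessThanThreshold comparison.
    return "ALARM" if invocation_sum < threshold else "OK"


if __name__ == "__main__":
    for mode in ("breaching", "notBreaching", "ignore", "missing"):
        print(mode, "->", evaluate_period(None, treat_missing_data=mode))
    print("file arrived ->", evaluate_period(1, treat_missing_data="breaching"))
```

Only the `breaching` mode turns an hour with no data point into an alarm, which is why it is the setting that matters for this scenario.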
The Trap (Distractor Analysis) #
Why Not Option A? #
S3 Lifecycle rules are not designed for real-time event detection.
- Lifecycle policies evaluate objects based on age/versioning criteria for cost optimization (transitioning to Glacier, expiration), not for triggering alerts on missing uploads.
- There is no native S3 lifecycle event notification for “zero objects transitioned.” Lifecycle actions operate on existing objects, not absence of objects.
- This is a conceptual misunderstanding of S3 Lifecycle vs. S3 Event Notifications—a common trap for candidates who confuse data management policies with monitoring mechanisms.
SRE Reality Check: You cannot create a lifecycle rule that detects “nothing happened.” This would require external logic to query the bucket, negating the “operationally efficient” requirement.
Why Not Option B? #
Introduces unnecessary complexity with SQS and a secondary Lambda function.
- You’re essentially building a custom message queue polling mechanism to achieve what CloudWatch alarms already do natively.
- The `ApproximateAgeOfOldestMessage` metric is useful for detecting backlog issues in queue-based processing, but here you’re creating a queue just to monitor it, which is over-engineering.
- Operational overhead: Now you have two Lambda functions (original processor + SQS publisher), an SQS queue, and additional IAM permissions to manage.
- Cost inefficiency: Additional Lambda invocations and SQS message charges.
SRE Reality Check: This pattern is appropriate when you already have an SQS-based architecture and want dead-letter queue monitoring. Introducing SQS solely for alerting is an anti-pattern.
Why Not Option D? #
Custom polling Lambda with EventBridge scheduled rule—a manual implementation of what CloudWatch does natively.
- You’re writing custom code (`boto3` S3 API calls to list objects, parse timestamps, compare times) that requires:
  - Unit testing
  - Error handling (what if S3 API throttles?)
  - Maintenance (timezone handling, daylight saving time edge cases)
- EventBridge scheduled rules have a minimum granularity of 1 minute, but this still requires you to manage the Lambda logic.
- If the scheduled Lambda itself fails (cold start timeout, memory issues, IAM permission errors), you now have a single point of failure in your monitoring system.
SRE Reality Check: This is a “junior SysOps” approach—building custom solutions when managed services already provide the feature. In production, this adds toil without benefit.
Example of the unnecessary code you’d write:
import boto3
from datetime import datetime, timedelta, timezone

s3 = boto3.client('s3')
sns = boto3.client('sns')

def lambda_handler(event, context):
    bucket = 'market-data-bucket'
    # list_objects_v2 returns keys in lexicographic order, not by age,
    # so collect every object under the prefix and pick the newest explicitly.
    paginator = s3.get_paginator('list_objects_v2')
    objects = [
        obj
        for page in paginator.paginate(Bucket=bucket, Prefix='hourly/')
        for obj in page.get('Contents', [])
    ]
    if not objects:
        # No objects at all: send alert
        sns.publish(TopicArn='arn:aws:sns:...', Message='No files found')
        return
    last_modified = max(obj['LastModified'] for obj in objects)
    # LastModified is timezone-aware (UTC), so compare against an aware "now"
    if datetime.now(timezone.utc) - last_modified > timedelta(hours=1):
        sns.publish(TopicArn='arn:aws:sns:...', Message=f'Last file is {last_modified}')
Compare this to Option C’s zero custom code.
The Technical Blueprint #
CloudWatch Alarm Architecture for Missing Data Detection #
# Step 1: Create SNS Topic for Alerts
aws sns create-topic --name operations-team-alerts
aws sns subscribe \
--topic-arn arn:aws:sns:us-east-1:123456789012:operations-team-alerts \
--protocol email \
--notification-endpoint [email protected]
# Step 2: Create CloudWatch Alarm on Lambda Invocations
aws cloudwatch put-metric-alarm \
--alarm-name "Missing-Hourly-File-Alert" \
--alarm-description "Triggers when expected Lambda invocation is missing" \
--metric-name Invocations \
--namespace AWS/Lambda \
--statistic Sum \
--period 3600 \
--evaluation-periods 1 \
--threshold 1 \
--comparison-operator LessThanThreshold \
--treat-missing-data breaching \
--alarm-actions arn:aws:sns:us-east-1:123456789012:operations-team-alerts \
--dimensions Name=FunctionName,Value=market-data-processor
# Step 3: Test the Alarm (Simulate Missing Data)
# Simply don't upload a file for 1 hour—the alarm will trigger
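For teams that script their setup in Python rather than the CLI, the alarm from Step 2 could be expressed roughly as follows. This is a sketch, not a definitive implementation: `build_alarm_params` and `create_alarm` are hypothetical helper names, the function name and topic ARN are passed in by the caller, and `boto3` is imported lazily so the parameter dictionary can be inspected without AWS credentials.

```python
def build_alarm_params(function_name: str, topic_arn: str) -> dict:
    """Keyword arguments for CloudWatch put_metric_alarm, mirroring the CLI call."""
    return {
        "AlarmName": "Missing-Hourly-File-Alert",
        "AlarmDescription": "Triggers when expected Lambda invocation is missing",
        "Namespace": "AWS/Lambda",
        "MetricName": "Invocations",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 3600,            # one-hour window
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",  # an hour with no data point counts as a breach
        "AlarmActions": [topic_arn],
    }


def create_alarm(function_name: str, topic_arn: str) -> None:
    """Create or update the alarm; requires boto3 and AWS credentials."""
    import boto3  # imported here so build_alarm_params works without boto3 installed

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**build_alarm_params(function_name, topic_arn))
```

Keeping the parameters in a plain dictionary makes it easy to review or unit test the alarm definition before anything is sent to AWS.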
JSON CloudFormation Snippet (For Infrastructure-as-Code) #
{
"MissingFileAlarm": {
"Type": "AWS::CloudWatch::Alarm",
"Properties": {
"AlarmName": "Missing-Hourly-File-Alert",
"MetricName": "Invocations",
"Namespace": "AWS/Lambda",
"Statistic": "Sum",
"Period": 3600,
"EvaluationPeriods": 1,
"Threshold": 1,
"ComparisonOperator": "LessThanThreshold",
"TreatMissingData": "breaching",
"AlarmActions": [
{ "Ref": "OpsTeamSNSTopic" }
],
"Dimensions": [
{
"Name": "FunctionName",
"Value": "market-data-processor"
}
]
}
}
}
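Because the whole pattern depends on one property, a small pre-deployment check can catch a template where `TreatMissingData` was left at its default. The sketch below is an assumption-laden illustration: `alarms_without_breaching` is a hypothetical helper, it expects the `Resources` section of a JSON template as a dictionary, and `HypotheticalOtherAlarm` exists only to show a flagged resource. Not every alarm should use `breaching`, so a check like this only makes sense for alarms meant to detect missing events.

```python
import json


def alarms_without_breaching(resources: dict) -> list:
    """Return logical IDs of CloudWatch alarms whose TreatMissingData is not 'breaching'."""
    flagged = []
    for logical_id, resource in resources.items():
        if resource.get("Type") != "AWS::CloudWatch::Alarm":
            continue  # ignore non-alarm resources
        properties = resource.get("Properties", {})
        if properties.get("TreatMissingData") != "breaching":
            flagged.append(logical_id)
    return flagged


if __name__ == "__main__":
    fragment = json.loads("""
    {
      "MissingFileAlarm": {
        "Type": "AWS::CloudWatch::Alarm",
        "Properties": {"MetricName": "Invocations", "TreatMissingData": "breaching"}
      },
      "HypotheticalOtherAlarm": {
        "Type": "AWS::CloudWatch::Alarm",
        "Properties": {"MetricName": "Errors"}
      }
    }
    """)
    print(alarms_without_breaching(fragment))  # prints ['HypotheticalOtherAlarm']
```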
The Comparative Analysis #
| Option | Operational Overhead | Automation Level | Custom Code Required | Cost Impact | SOA-C02 Best Practice Alignment |
|---|---|---|---|---|---|
| A (S3 Lifecycle + Event) | High (misconfigured design) | Low | None, but doesn’t work as intended | Low (but ineffective) | ❌ Conceptual misunderstanding of lifecycle rules |
| B (SQS + Lambda + CW Alarm) | High (3 services to manage) | Medium | Yes (Lambda to SQS publisher) | Medium (Lambda + SQS charges) | ⚠️ Over-engineered for the requirement |
| C (CW Alarm on Invocations) ✅ | Minimal (1 alarm) | High (fully managed) | None | Lowest (alarm only) | ✅ Native AWS monitoring pattern |
| D (Scheduled Lambda Poller) | High (custom monitoring code) | Medium | Yes (timestamp comparison logic) | Medium (scheduled Lambda invocations) | ❌ Reinventing CloudWatch functionality |
Real-World Application (SRE Practitioner Insight) #
Exam Rule #
“For the SOA-C02 exam, when detecting missing expected events in serverless architectures, always prefer CloudWatch alarms on existing service metrics (Lambda Invocations, SQS NumberOfMessagesSent, etc.) configured with TreatMissingData: breaching. Avoid custom polling Lambda functions unless explicitly required by constraints.”
Real World #
“In production SRE environments, we take Option C even further:
- Composite Alarms: Combine the `Invocations < 1` alarm with a `Duration > X` alarm to detect both missing files AND slow processing.
- Anomaly Detection: For less predictable schedules, use CloudWatch anomaly detection on the `Invocations` metric to alert when invocation patterns deviate from historical baselines.
- EventBridge Integration: While Option D’s scheduled Lambda is overkill for monitoring, EventBridge itself can be monitored. If the S3 PutObject event consistently fails to trigger, we use CloudWatch Contributor Insights to analyze event patterns.
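As a rough illustration of the composite alarm idea, the sketch below builds the rule expression that a composite alarm evaluates. The helper names are hypothetical, and `Slow-Processing-Alert` is an assumed name for a separate `Duration` alarm that would have to exist already; the resulting dictionary is shaped for the CloudWatch `put_composite_alarm` call.

```python
def build_composite_rule(alarm_names: list) -> str:
    """Join child alarms into a rule that fires when any one of them is in ALARM."""
    if not alarm_names:
        raise ValueError("at least one child alarm name is required")
    return " OR ".join(f'ALARM("{name}")' for name in alarm_names)


def build_composite_params(name: str, child_alarm_names: list, topic_arn: str) -> dict:
    """Keyword arguments for CloudWatch put_composite_alarm."""
    return {
        "AlarmName": name,
        "AlarmRule": build_composite_rule(child_alarm_names),
        "AlarmActions": [topic_arn],
    }


if __name__ == "__main__":
    params = build_composite_params(
        "Market-Data-Pipeline-Health",  # hypothetical composite alarm name
        ["Missing-Hourly-File-Alert", "Slow-Processing-Alert"],
        "arn:aws:sns:us-east-1:123456789012:operations-team-alerts",
    )
    print(params["AlarmRule"])
```

Using `OR` notifies on either condition; switching to `AND` would notify only when a file is both missing and processing is slow, which is rarely what this scenario wants.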
High-Frequency Note: For high-frequency deliveries (e.g., every 5 minutes instead of hourly), we’d keep the 1-hour period, raise the threshold to the expected hourly count (12), and use CloudWatch Metric Math to fill empty periods with zero before the comparison. Only the expression returns data, so the alarm evaluates `e1`:
# Metric Math example: 5-minute deliveries evaluated as an hourly invocation count
aws cloudwatch put-metric-alarm \
--alarm-name "Missing-File-Aggregated" \
--metrics '[
{"Id":"m1","ReturnData":false,"MetricStat":{"Metric":{"Namespace":"AWS/Lambda","MetricName":"Invocations","Dimensions":[{"Name":"FunctionName","Value":"processor"}]},"Period":3600,"Stat":"Sum"}},
{"Id":"e1","Expression":"FILL(m1, 0)","Label":"Hourly Invocation Count","ReturnData":true}
]' \
--evaluation-periods 1 \
--threshold 12 \
--comparison-operator LessThanThreshold \
--treat-missing-data breaching
Debugging Tip: If your alarm isn’t triggering, verify the TreatMissingData setting using:
aws cloudwatch describe-alarms --alarm-names "Missing-Hourly-File-Alert" \
--query 'MetricAlarms[0].TreatMissingData'
This should return "breaching" for Option C’s configuration.”
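The same check can be scripted. The sketch below is a minimal example under stated assumptions: `treat_missing_data_setting` and `check_alarm` are hypothetical helpers, and the response dictionary is assumed to have the shape returned by the CloudWatch `describe_alarms` call, with alarms listed under `MetricAlarms`.

```python
from typing import Optional


def treat_missing_data_setting(response: dict, alarm_name: str) -> Optional[str]:
    """Return the TreatMissingData value for alarm_name, or None if it is not found."""
    for alarm in response.get("MetricAlarms", []):
        if alarm.get("AlarmName") == alarm_name:
            return alarm.get("TreatMissingData")
    return None


def check_alarm(alarm_name: str) -> bool:
    """Query CloudWatch and report whether the alarm treats missing data as breaching."""
    import boto3  # imported lazily; requires AWS credentials at call time

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.describe_alarms(AlarmNames=[alarm_name])
    return treat_missing_data_setting(response, alarm_name) == "breaching"
```

Calling `check_alarm("Missing-Hourly-File-Alert")` from a deployment script would return `False` if the alarm is missing or configured with a different setting.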
Disclaimer
This is a study note based on simulated scenarios for the AWS SOA-C02 exam. Always refer to the latest AWS documentation and best practices for production implementations.