Jeff’s Note #
Unlike generic exam dumps, ADH analyzes this scenario from the perspective of a real-world Site Reliability Engineer.
For SOA-C02 candidates, the confusion often lies in how to efficiently correlate high-level CPU anomalies with the exact process responsible. In production, this comes down to knowing how to enable detailed per-process metrics with minimal operational overhead. Let’s drill down.
The Certification Drill (Simulated Question) #
Scenario #
BrightWave Technologies operates a Linux-based web application hosted on several Amazon EC2 instances. They have been experiencing intermittent CPU spikes during the evening that persist for about 5 minutes each time. These CPU surges degrade application performance noticeably. The Site Reliability Engineering (SRE) team needs to identify the specific processes or services on the EC2 instances responsible for the increased CPU consumption by capturing the process IDs (PIDs) and their CPU usage.
The Requirement #
The SRE team wants to collect process-level CPU utilization data with minimal manual effort, and wants automated visibility into which processes cause the CPU spikes during these incidents.
The Options #
- A) Configure the Amazon CloudWatch agent with the procstat plugin to collect CPU metrics for individual processes.
- B) Set up an AWS Lambda function that runs every minute to SSH into EC2 instances, captures running process IDs, and sends notifications.
- C) Perform a manual SSH login each evening using the private.pem key and run the `top` command to identify high-CPU processes.
- D) Use the default EC2 CPU utilization metric in CloudWatch to identify the PID causing the spike.
Correct Answer #
A.
Quick Insight: The SysOps Imperative #
The key here is automated, fine-grained monitoring. Default EC2 metrics in CloudWatch lack process-level granularity, and manual investigation is costly and error-prone.
The Expert’s Analysis #
The Winning Logic #
Option A uses the Amazon CloudWatch agent’s `procstat` plugin, which can be configured to collect CPU and memory metrics for individual processes on Linux EC2 instances. This is the most automated and scalable way to gather detailed diagnostics without manual intervention. The plugin selects processes by PID file (`pid_file`), executable name (`exe`), or command-line pattern (`pattern`), and publishes per-process metrics such as `procstat_cpu_usage` to CloudWatch as custom metrics (in the `CWAgent` namespace by default), where they can drive alarms and dashboards. This enables rapid identification of resource-hungry processes during spikes.
Key technical details:
- The CloudWatch agent is deployed on each EC2 instance.
- The `procstat` plugin is enabled in the agent’s configuration JSON, with one selector (`pid_file`, `exe`, or `pattern`) per group of processes of interest.
- Metrics are published at a configurable interval (e.g., `metrics_collection_interval` of 60 seconds).
- CloudWatch Alarms or dashboards can be used to trigger operational responses.
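Because the team does not yet know which service is spiking, it can help to watch several candidate processes at once. The sketch below assumes a hypothetical helper, `write_procstat_config`, that is not part of the agent. It emits one `procstat` entry per process name, each using the single `exe` selector. The names `nginx`, `php-fpm`, and `java` are illustrative guesses, not taken from the scenario.

```bash
#!/usr/bin/env bash
# Sketch: generate a CloudWatch agent config with one procstat entry per
# executable name. write_procstat_config is a HYPOTHETICAL helper, not part
# of the agent; each entry uses exactly one selector ("exe").
write_procstat_config() {
  local out="$1"; shift   # first argument: output path; the rest: process names
  local entries="" name
  for name in "$@"; do
    # Separate entries with commas (none before the first one)
    if [ -n "$entries" ]; then
      entries+=","
    fi
    entries+=$(printf '\n        {"exe": "%s", "measurement": ["cpu_usage", "memory_rss"], "metrics_collection_interval": 60}' "$name")
  done
  # Unquoted heredoc so ${entries} expands; \$ keeps the agent's ${aws:...} placeholder literal
  cat > "$out" <<EOF
{
  "metrics": {
    "append_dimensions": { "InstanceId": "\${aws:InstanceId}" },
    "metrics_collected": {
      "procstat": [${entries}
      ]
    }
  }
}
EOF
}

# Usage: watch three hypothetical candidate services at once
write_procstat_config /tmp/procstat-demo.json nginx php-fpm java
cat /tmp/procstat-demo.json
```

Separate entries keep each process’s metrics distinguishable in CloudWatch. The trade-off is that every additional process adds custom metrics, which are billed.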
The Trap (Distractor Analysis): #
- Option B: Running a Lambda function that SSHes into instances every minute is complex, error-prone, and does not scale. It requires storing SSH private keys where the function can read them and network reachability from Lambda to every instance, which adds security risk and maintenance overhead. Lambda cold starts and network hiccups can also cause missed samples.
- Option C: Manual SSH and `top` monitoring is time-consuming, non-automated, and impractical for persistent problems. It also requires privileged access and someone present in real time, which is operationally costly.
- Option D: The default EC2 CPU utilization metric aggregates CPU usage across all processes and cannot reveal PIDs or per-process resource consumption. It lacks the granularity needed to isolate which process caused the spike.
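To make Option D’s limitation concrete: the EC2 `CPUUtilization` metric is measured per instance by the hypervisor, so it has no notion of processes. Inside a Linux guest, the same split exists between machine-wide and per-process counters, and per-process tools such as `top` read the latter. The sketch below is an illustration only, not AWS tooling. It assumes a Linux host with `/proc` mounted and prints both kinds of counter for the current shell.

```bash
#!/usr/bin/env bash
# Sketch: contrast host-wide CPU counters with per-process counters on Linux.

# 1) Host-wide: the first line of /proc/stat sums CPU time for the whole machine.
#    No PID appears anywhere in it, much like an instance-level CPU metric.
head -n 1 /proc/stat

# 2) Per-process: /proc/<pid>/stat holds counters for one process.
#    Field 1 = PID, field 2 = (comm), fields 14/15 = utime/stime in clock ticks.
#    Note: this naive whitespace split breaks if the command name contains spaces.
pid=$$   # this shell's own PID, used as a safe example
awk '{ print "pid=" $1, "comm=" $2, "utime=" $14, "stime=" $15 }' "/proc/${pid}/stat"
```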
The Technical Blueprint #
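```bash
# Example: configure the CloudWatch agent procstat plugin on an EC2 Linux instance.
# Each procstat entry must use exactly ONE selector: "pid_file", "exe", or "pattern"
# (e.g., "pid_file": "/var/run/nginx.pid" instead of "exe": "nginx").
# JSON does not allow comments, so keep notes out of the config file itself.
# The quoted 'EOF' keeps the agent's ${aws:...} placeholders literal.
cat <<'EOF' > amazon-cloudwatch-agent.json
{
  "metrics": {
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "procstat": [
        {
          "exe": "nginx",
          "measurement": [
            "cpu_usage"
          ],
          "metrics_collection_interval": 60
        }
      ]
    }
  }
}
EOF

# Load the config and start the agent (pass an absolute path after file:)
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 -s \
  -c file:"$PWD/amazon-cloudwatch-agent.json"
```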
The Comparative Analysis #
| Option | Operational Overhead | Automation Level | Impact |
|---|---|---|---|
| A | Low | High (agent-based) | Detailed per-process CPU metrics |
| B | Very High | Medium (Lambda-driven) | Complex, insecure, high maintenance |
| C | Very High | None | Manual, not scalable |
| D | None | None | No PID-level detail available |
Real-World Application (Practitioner Insight) #
Exam Rule #
For the exam, pick the CloudWatch agent’s procstat plugin whenever you need per-process metrics on EC2.
Real World #
In production, some teams still rely on manual SSH or custom scripts, but automating collection with the CloudWatch agent delivers better operational efficiency and consistency. It also integrates directly with AWS monitoring features such as alarms and dashboards.
Disclaimer
This is a study note based on simulated scenarios for the SOA-C02 exam.