Hybrid Observability: Streaming On-Premises Logs and Metrics to Amazon OpenSearch Service
Published on December 28, 2025
Introduction
Centralizing observability data from on-premises servers in a hybrid environment is challenging but essential for unified monitoring, alerting, and analysis. After some trial and error, I settled on an architecture that separates logs and metrics ingestion into two dedicated OpenSearch Ingestion pipelines. This approach provides better control, scalability, and troubleshooting—lessons learned the hard way!
This post details my production-ready setup using the Unified CloudWatch Agent on on-premises servers, Kinesis Data Streams for real-time streaming, and Amazon OpenSearch Ingestion pipelines (with optional Lambda transformation) to deliver data into Amazon OpenSearch Service.
Why Two Separate Ingestion Pipelines?
Logs and metrics have different characteristics:
- Logs: High-volume, text-based, often compressed, require parsing and enrichment.
- Metrics: Structured time-series data (from ProcStat, StatsD, etc.), lower volume, different indexing needs.
Mixing them in one pipeline complicated transformations and scaling. Separating them allows tailored processing, independent scaling, and easier debugging.
Architecture Overview
Pipeline 1: Logs (Real-Time Streaming)
On-Premises Servers
└── Unified CloudWatch Agent → CloudWatch Logs Group (application/system logs)
CloudWatch Logs Group
└── Subscription Filter → Amazon Kinesis Data Streams
Kinesis Data Streams
└── OpenSearch Ingestion Pipeline (optional Lambda transformation) → Amazon OpenSearch Service (logs index)
Pipeline 2: Metrics (Near Real-Time)
On-Premises Servers
└── Unified CloudWatch Agent → CloudWatch Metrics (custom namespace, e.g., CWAgent/onprem)
CloudWatch Metrics
└── Metric Streams → Kinesis Data Firehose → S3 Bucket (buffered JSON)
S3 Bucket
└── Event Notification → OpenSearch Ingestion Pipeline (optional Lambda transformation) → Amazon OpenSearch Service (metrics index)
Note: CloudWatch Metrics don't stream directly like logs, so we use Metric Streams + Firehose for continuous export to S3, then OSI pulls/processes them.
Core Components
- Unified CloudWatch Agent: Collects both logs and metrics on-premises, ships securely to CloudWatch.
- CloudWatch Logs Subscription: Streams logs in near real-time to Kinesis Data Streams.
- Kinesis Data Streams: Buffers and delivers high-throughput log data reliably.
- CloudWatch Metric Streams: Continuously streams metrics to Firehose.
- Kinesis Data Firehose: Delivers metrics to S3 for durable buffering.
- OpenSearch Ingestion (OSI) Pipelines: Serverless, scalable ingestion with processors (grok, date, etc.) and optional Lambda for complex transformations.
- Amazon OpenSearch Service: Centralized search, analytics, and Dashboards.
Step-by-Step Setup
1. Install and Configure Unified CloudWatch Agent On-Premises
# Example for Linux
wget https://amazoncloudwatch-agent.s3.amazonaws.com/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
sudo rpm -U ./amazon-cloudwatch-agent.rpm
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
Select "On-Premises", configure IAM credentials (or IAM Roles Anywhere for certificate-based authentication), and enable logs (e.g., /var/log/*.log) and metrics (CPU, memory, disk, procstat).
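As a starting point, here is a minimal agent configuration sketch covering both the logs and metrics halves of the setup. The file path, log group name, namespace, and procstat pattern are illustrative placeholders, not values from my environment; the wizard generates a similar file.

```json
{
  "agent": {
    "metrics_collection_interval": 60
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/app/*.log",
            "log_group_name": "onprem-application-logs",
            "log_stream_name": "{hostname}"
          }
        ]
      }
    }
  },
  "metrics": {
    "namespace": "CWAgent/onprem",
    "metrics_collected": {
      "cpu": { "measurement": ["cpu_usage_user", "cpu_usage_system"] },
      "mem": { "measurement": ["mem_used_percent"] },
      "disk": { "measurement": ["used_percent"], "resources": ["/"] },
      "procstat": [
        { "pattern": "myservice", "measurement": ["cpu_usage", "memory_rss"] }
      ]
    }
  }
}
```

Load it in on-premises mode with `sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m onPremise -s -c file:/path/to/config.json`.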
2. Logs Pipeline: CloudWatch Logs → Kinesis Data Streams → OSI
- Create Kinesis Data Stream.
- Add Subscription Filter on Log Group to the stream.
- Create OSI Pipeline with Kinesis as source (use blueprint "AWS-KinesisDataStreamsPipeline").
- Add processors: decompress (for gzipped CloudWatch batches), grok/parse, optional Lambda for enrichment.
- Sink to OpenSearch logs index.
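The steps above translate into a pipeline definition in the Data Prepper YAML format that OSI uses. This is a sketch rather than a drop-in config: the source plugin options may differ from what the current Kinesis blueprint generates, and the stream name, role ARN, domain endpoint, and grok pattern are all placeholders you would replace with your own.

```yaml
version: "2"
logs-pipeline:
  source:
    kinesis_data_streams:            # start from the Kinesis blueprint; options shown are assumptions
      streams:
        - stream_name: "onprem-logs-stream"
      aws:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/osi-pipeline-role"
  processor:
    - parse_json:                    # CloudWatch subscription batches arrive as JSON envelopes
        source: "message"
    - grok:                          # example pattern; adjust to your log format
        match:
          message: ["%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}"]
  sink:
    - opensearch:
        hosts: ["https://my-domain.us-east-1.es.amazonaws.com"]
        index: "onprem-logs-%{yyyy.MM.dd}"
        aws:
          region: "us-east-1"
          sts_role_arn: "arn:aws:iam::123456789012:role/osi-pipeline-role"
```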
3. Metrics Pipeline: CloudWatch Metrics → Firehose → S3 → OSI
- Create Metric Stream including CWAgent namespace, output to Firehose.
- Firehose delivers JSON-formatted metrics to S3 (enable buffering).
- Configure S3 event notifications (or SQS) to trigger OSI.
- Create second OSI Pipeline with S3 as source.
- Process/flatten metric data, sink to metrics index (ISM for rollover).
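The second pipeline follows the same shape but reads from S3 via SQS notifications. Again a sketch with placeholder names: the queue URL, role ARN, and endpoint are illustrative, and Metric Streams' JSON output format is newline-delimited, hence the newline codec before JSON parsing.

```yaml
version: "2"
metrics-pipeline:
  source:
    s3:
      notification_type: "sqs"       # S3 event notifications delivered through an SQS queue
      sqs:
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/metrics-notify"
      codec:
        newline:                     # Metric Streams JSON output is one record per line
      aws:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/osi-pipeline-role"
  processor:
    - parse_json:                    # turn each line into structured fields
  sink:
    - opensearch:
        hosts: ["https://my-domain.us-east-1.es.amazonaws.com"]
        index: "onprem-metrics-%{yyyy.MM.dd}"
        aws:
          region: "us-east-1"
          sts_role_arn: "arn:aws:iam::123456789012:role/osi-pipeline-role"
```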
4. Monitoring with CloudWatch
OSI pipelines publish detailed metrics/logs to CloudWatch. Set alarms on IngestionOCU, FailedEvents, etc.
Common Gotchas (Learned the Hard Way)
- CloudWatch batches are gzipped—always decompress first in OSI.
- Metric Streams format is specific; test parsing in OSI.
- IAM: the agent needs permission to publish logs and metrics (the CloudWatchAgentServerPolicy managed policy covers this); each pipeline role needs read access to its source (Kinesis, or S3 plus SQS) and write access to the OpenSearch domain.
- Scaling: raise the pipeline's maximum OpenSearch Compute Units (OCUs) for high-volume workloads.
- Index mapping: Define templates for consistent logs/metrics schemas.
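For the index-mapping gotcha, an index template applied before the first document lands keeps field types consistent across daily indices. The field names below are hypothetical examples matching a grok-parsed log; adapt them to whatever your processors emit.

```json
PUT _index_template/onprem-logs-template
{
  "index_patterns": ["onprem-logs-*"],
  "template": {
    "settings": { "number_of_shards": 1 },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "level":      { "type": "keyword" },
        "message":    { "type": "text" }
      }
    }
  }
}
```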
Handy Tips
- Use OSI blueprints to bootstrap pipelines quickly.
- Test with low-volume data first.
- Enable detailed CloudWatch logging on pipelines.
- Use Index State Management (ISM) for daily rollovers.
- Visualize in OpenSearch Dashboards—separate indices for logs vs. metrics queries.
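For the ISM tip, a rollover policy along these lines handles daily rotation and eventual cleanup. The ages, index pattern, and policy name are example values; rollover also requires writing through an alias, which the ism_template block alone does not set up.

```json
{
  "policy": {
    "description": "Roll over daily, delete after 30 days",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [{ "rollover": { "min_index_age": "1d" } }],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "30d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [{ "delete": {} }],
        "transitions": []
      }
    ],
    "ism_template": [
      { "index_patterns": ["onprem-logs-*"], "priority": 100 }
    ]
  }
}
```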
What I Like About This Setup
- Real-time (logs) and near-real-time (metrics) with durability via S3.
- Serverless scaling, no infrastructure to manage.
- Powerful processing in OSI without custom code (mostly).
- Unified view in OpenSearch for correlation (logs + metrics).
Challenges Encountered
- Initial confusion around metric export paths.
- Parsing compressed/multi-event batches.
- Cost optimization for high-volume on-prem data.
Conclusion
This dual-pipeline architecture bridges on-premises observability to AWS OpenSearch effectively. Separating logs and metrics streamlined operations and improved reliability. As my hybrid environment grows, this foundation enables advanced analytics and proactive monitoring.
Further Reading