Event-driven analytics pipeline
ETL with Kafka, Debezium, Apache Flink, and Python. Real-time streaming into ClickHouse and MongoDB.
- Kafka
- Flink
- ClickHouse
- Microservices
Problem
Conicle needed real-time analytics from operational databases without blocking production or building one-off ETL jobs. Data had to flow into analytics storage (ClickHouse, MongoDB) with low latency and clear ownership.
System design
- CDC: Debezium captured row-level changes from the source databases and published them to Kafka topics.
- Stream processing: Apache Flink jobs consumed the CDC topics, transformed and enriched events, and wrote them to ClickHouse and MongoDB (see the sketch after this list).
- Python: Supporting services and glue code, chosen for flexibility and collaboration with the data team.
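
A minimal PyFlink Table API sketch of the Flink step, assuming a Debezium-formatted `conicle.public.orders` topic, illustrative column names, and a ClickHouse sink reached through Flink's JDBC connector. The topic, schema, and connection settings are placeholders rather than the production configuration, and the sink assumes a ClickHouse-capable JDBC driver and dialect (or a dedicated ClickHouse connector) is available on the job classpath.

```python
# Sketch: consume a Debezium CDC topic, project/enrich, and upsert into ClickHouse.
# Topic, table, and connection names are illustrative, not the production values.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: Debezium change events from Kafka, decoded with the debezium-json format.
t_env.execute_sql("""
    CREATE TABLE orders_cdc (
        order_id BIGINT,
        tenant_id STRING,
        amount DECIMAL(18, 2),
        updated_at TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'conicle.public.orders',
        'properties.bootstrap.servers' = 'kafka:9092',
        'properties.group.id' = 'analytics-pipeline',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'debezium-json'
    )
""")

# Sink: ClickHouse via JDBC; the primary key lets the changelog stream be applied as upserts.
t_env.execute_sql("""
    CREATE TABLE orders_analytics (
        order_id BIGINT,
        tenant_id STRING,
        amount DECIMAL(18, 2),
        updated_at TIMESTAMP(3),
        PRIMARY KEY (order_id) NOT ENFORCED
    ) WITH (
        'connector' = 'jdbc',
        'url' = 'jdbc:clickhouse://clickhouse:8123/analytics',
        'table-name' = 'orders_analytics'
    )
""")

# Route the transformed stream into the analytics sink.
t_env.execute_sql("""
    INSERT INTO orders_analytics
    SELECT order_id, tenant_id, amount, updated_at
    FROM orders_cdc
""").wait()
```

A second `INSERT` against a MongoDB sink table would follow the same pattern for the document-oriented consumers.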
Architecture
- Microservices owned their events; a central pipeline consumed them and routed them by use case.
- Multi-tenant isolation via tenant IDs in every payload and tenant-partitioned sinks (see the producer sketch after this list).
- Monitoring and alerting on lag, throughput, and error rates.
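
To illustrate the tenant-isolation point, a hedged sketch of how a producing service can key events by tenant ID so each tenant's events land in a consistent Kafka partition and downstream sinks can partition on the same field. The `analytics.events` topic, the event shape, and the use of confluent-kafka are assumptions for the example.

```python
# Sketch: key enriched events by tenant_id so Kafka partitioning (and
# downstream sink partitioning) stays consistent per tenant.
# Topic name and event schema are illustrative.
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "kafka:9092"})

def publish_event(event: dict) -> None:
    """Publish an analytics event keyed by its tenant ID."""
    tenant_id = event["tenant_id"]  # every payload carries the tenant ID
    producer.produce(
        topic="analytics.events",
        key=tenant_id,                       # same tenant -> same partition
        value=json.dumps(event).encode(),
    )

publish_event({"tenant_id": "acme", "type": "course_completed", "user_id": 42})
producer.flush()
```

Keying by tenant also preserves per-tenant ordering within a partition, which simplifies deduplication and late-event handling downstream.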
Impact
- Single pipeline for multiple analytics consumers; reduced duplicate ETL and ad-hoc scripts.
- Near real-time dashboards and reporting.
- Clear ownership and scalability per topic and sink.