Transforming Log Analysis with ClickHouse: Cloud CIRCUS's Journey

Introduction

Cloud CIRCUS is on a mission to make marketers' lives easier. The Tokyo-based SaaS company offers a suite of digital tools designed to help businesses across Japan streamline their marketing. One of those tools is BlueMonkey, a CMS platform that facilitates the creation and management of professional websites.

Today, BlueMonkey supports more than 2,000 customer websites on AWS, each with its own CloudFront distribution. While this setup ensures rapid content delivery, it also results in tens of millions of access logs generated daily. These logs are crucial for tracking errors and deciphering user behavior; however, querying them with Amazon Athena can be slow, expensive, and hard to scale.

At a January 2025 ClickHouse meetup in Tokyo, Cloud CIRCUS infrastructure engineer Kyurin Shu described how the team addressed these challenges by moving log analysis to ClickHouse. What began as an experiment became a self-managed analytics pipeline that is faster, more automated, and cheaper to run than its predecessor.

FROM ATHENA TO CLICKHOUSE

Initially, Cloud CIRCUS relied on Athena to analyze CloudFront logs stored in S3. As BlueMonkey's customer base grew, however, Athena's performance deteriorated significantly. Each customer site, with its own CloudFront distribution, accumulated millions of log records each day. Kyurin noted, "Analyzing all logs with Athena was challenging; there were too many CloudFront instances and records. It took too much time and cost too much money."

A major hurdle was Athena's data structure, which necessitated a separate table for each CloudFront distribution, complicating cross-site queries. The performance lagged; loading recent logs could take hours, and even straightforward queries were slow. Adding to their frustration, Athena charges based on the data scanned per query, leading to unpredictable and soaring costs.

In pursuit of a more viable solution, Kyurin and the team explored alternatives. "We thought if we could analyze CloudFront logs on our self-built ClickHouse environment, that would be great," he remarked. Drawn to ClickHouse's open-source nature and rapid aggregation capabilities, they appreciated its potential to consolidate all log data into a single table and escape Athena's per-query billing. However, they approached the opportunity cautiously, conducting a technical evaluation to assess ClickHouse's performance through a proof of concept on EC2.


BUILDING THE NEW SYSTEM

The team took a pragmatic, low-cost approach to their first ClickHouse deployment. They set up a single Amazon EC2 instance running Amazon Linux 2023, with 4 vCPUs and 32 GB of RAM, and installed ClickHouse from the official RPM packages. As Kyurin explained, "We started with a small server for technical verification and to keep costs down." Running the ClickHouse server and client on the same machine also simplified management during early testing.

The next step was schema design. The team mirrored Athena's CloudFront log table definition to keep the two systems consistent. "This ensured we got results similar to those from the queries executed by Athena," Kyurin explained, which made it easier to compare them. One small but important adjustment was required: Athena stored HTTP status codes as integers, but some CloudFront logs contained values like "000," so the team defined that column as a string in ClickHouse.

Using the MergeTree engine, they partitioned the data by day and sorted it by host_header and date, which suited the aggregate queries they intended to run. They then moved on to importing logs. Because CloudFront logs were already delivered to S3 and organized by domain, ClickHouse's s3 table function let them pull logs for a date range without listing files manually. Their import settings skipped the two header lines at the top of each log file and treated placeholder dashes as null values, which streamlined the initial data load.
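The talk summary does not include the team's actual DDL or import statement, so the following is only a minimal sketch of what the table layout and S3 import described above might look like. The table name, bucket URL, key layout, and the abbreviated column list are assumptions made for illustration. The two SETTINGS used (input_format_tsv_skip_first_lines and format_tsv_null_representation) are standard ClickHouse format settings, but whether the team used exactly these is also an assumption.

```python
# Hypothetical names throughout: the table, bucket URL, key layout, and the
# abbreviated column list are illustrative assumptions, not the team's schema.
TABLE = "cloudfront_logs"

CREATE_TABLE_SQL = f"""
CREATE TABLE IF NOT EXISTS {TABLE}
(
    `date`        Date,
    `time`        String,
    `request_ip`  Nullable(String),
    `method`      Nullable(String),
    `uri`         Nullable(String),
    `status`      Nullable(String),  -- a string type, since some logs contain "000"
    `host_header` String
    -- remaining CloudFront fields omitted for brevity; a real table would
    -- need every field of the log format, in order, for SELECT * to line up
)
ENGINE = MergeTree
PARTITION BY `date`                  -- one partition per day
ORDER BY (host_header, `date`)       -- suits per-site, per-day aggregates
"""


def build_import_sql(bucket_url: str, domain: str, day_glob: str) -> str:
    """Build an INSERT ... SELECT that loads one domain's logs for a day range.

    day_glob is a ClickHouse path glob such as '2025-01-{01..07}', so no
    manual file listing is needed. Credentials are omitted in this sketch.
    """
    # Assumed key layout: <domain>/<distribution>.<YYYY-MM-DD-HH>.<id>.gz
    url = f"{bucket_url}/{domain}/*.{day_glob}-*.gz"
    return (
        f"INSERT INTO {TABLE} "
        f"SELECT * FROM s3('{url}', 'TabSeparated') "
        "SETTINGS input_format_tsv_skip_first_lines = 2, "  # skip the 2 header lines
        "format_tsv_null_representation = '-'"              # treat '-' as NULL
    )


if __name__ == "__main__":
    print(CREATE_TABLE_SQL)
    print(build_import_sql(
        "https://example-bucket.s3.ap-northeast-1.amazonaws.com",
        "example.com",
        "2025-01-{01..07}",
    ))
```

Sorting by host_header first keeps each site's rows together on disk, which is why a per-site aggregate only needs to read a small part of the table.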

FASTER QUERIES, LOWER COSTS

The real challenge emerged at scale, since the cumulative volume of logs across all sites was substantial. To automate and speed up imports, they built a pipeline with Amazon EventBridge and ECS. A scheduled batch task scans S3, compiles a list of domains, and imports each domain's logs into ClickHouse. "This lets us automate everything," Kyurin said.
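The domain-listing step of such a batch task could be sketched as follows. This is not the team's code: it assumes that each object key starts with the domain as its first path segment, and it works on a plain list of keys so that it stays independent of any AWS SDK. In a real task the keys would come from an S3 listing call.

```python
from typing import Iterable


def domains_from_keys(keys: Iterable[str]) -> list[str]:
    """Return the sorted, de-duplicated top-level prefixes of the given keys.

    Assumption: logs are stored as '<domain>/<log file>', so the first path
    segment of each key is the domain name.
    """
    domains = set()
    for key in keys:
        prefix, sep, _rest = key.partition("/")
        if sep and prefix:  # ignore keys that are not under a domain prefix
            domains.add(prefix)
    return sorted(domains)


if __name__ == "__main__":
    sample_keys = [
        "a.example/E1.2025-01-01-00.abcd.gz",
        "a.example/E1.2025-01-01-01.efgh.gz",
        "b.example/E2.2025-01-01-00.ijkl.gz",
    ]
    for domain in domains_from_keys(sample_keys):
        print(domain)
```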

For even greater efficiency, they parallelized the import with AWS Fargate, distributing the work across 20 tasks, each handling a different group of domains. "What used to take several hours to insert all at once now finishes in under 30 minutes," Kyurin said. To prevent overload, they capped the number of concurrent tasks so that memory usage stays below 80%, balancing performance against stability.
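How the domains were assigned to the 20 tasks is not described, so the sketch below simply assumes a round-robin split, which gives groups of nearly equal size. The function name and the default of 20 are illustrative.

```python
def split_into_groups(domains: list[str], task_count: int = 20) -> list[list[str]]:
    """Distribute domains round-robin into at most task_count non-empty groups."""
    if task_count < 1:
        raise ValueError("task_count must be at least 1")
    # Group i takes every task_count-th domain, starting at index i.
    groups = [domains[i::task_count] for i in range(task_count)]
    # Drop empty groups when there are fewer domains than tasks.
    return [group for group in groups if group]


if __name__ == "__main__":
    all_domains = [f"site{i}.example" for i in range(2000)]
    groups = split_into_groups(all_domains)
    print(len(groups), "groups of", len(groups[0]), "domains each")
```

Each group would then be handed to one import task. A split by domain count ignores how much traffic each site gets, so a weighting by log volume may balance the tasks better.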

Since implementing the new pipeline, Cloud CIRCUS has seen significant improvements in query performance. At the Tokyo meetup, Kyurin demonstrated a daily access count query for a single site that took approximately 16 seconds in Athena and completes in 0.043 seconds in ClickHouse. When aggregating across all domains, ClickHouse ran the query in under nine seconds, processing over 470 million rows, whereas Athena would need 2,000+ separate queries and potentially hours to achieve the same result.
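The exact query shown at the meetup is not reproduced here. As a rough sketch, a daily access count over a single consolidated table might be built like this, assuming a table named cloudfront_logs with date and host_header columns (the table name is hypothetical).

```python
from typing import Optional


def daily_access_count_sql(host_header: Optional[str] = None,
                           table: str = "cloudfront_logs") -> str:
    """Build a daily access count query for one site, or for all sites at once."""
    if host_header is None:
        # One query covers every domain because all logs share a single table.
        return (
            f"SELECT host_header, `date`, count() AS hits FROM {table} "
            "GROUP BY host_header, `date` ORDER BY host_header, `date`"
        )
    # Minimal escaping for the sketch; a real client should bind parameters.
    escaped = host_header.replace("\\", "\\\\").replace("'", "\\'")
    return (
        f"SELECT `date`, count() AS hits FROM {table} "
        f"WHERE host_header = '{escaped}' "
        "GROUP BY `date` ORDER BY `date`"
    )


if __name__ == "__main__":
    print(daily_access_count_sql("www.example.com"))
    print(daily_access_count_sql())
```

Taking the figures quoted above at face value, 16 seconds against 0.043 seconds is a speedup of roughly 370 times for the single-site query.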


Conclusion

Cloud CIRCUS's transition from Athena to ClickHouse shows how a small, self-managed deployment can take over log analysis from a service billed per query. By consolidating logs from more than 2,000 CloudFront distributions into a single table and automating imports, the team cut a single-site query from about 16 seconds to 0.043 seconds and made cross-site aggregation practical.

Their experience also shows the value of starting small: a proof of concept on a single EC2 instance grew into an automated pipeline that handles tens of millions of log records a day, in support of the marketers and businesses that rely on BlueMonkey.

Questions and Answers

1. Why did Cloud CIRCUS switch from Athena to ClickHouse?
Cloud CIRCUS switched to ClickHouse to improve query speed, reduce costs, and streamline log data management across its many CloudFront distributions.

2. What challenges did the team face with Athena?
The main challenges included slow performance due to the high number of CloudFront distributions, high query costs, and difficulties in querying across multiple tables.

3. How did the team ensure a smooth migration to ClickHouse?
They conducted a technical evaluation using a proof of concept to benchmark ClickHouse's performance against Athena before full-scale migration.

4. What improvements has Cloud CIRCUS observed after switching to ClickHouse?
The team has experienced significantly faster query performance and more predictable, lower costs with a self-managed analytics pipeline.

5. How is Cloud CIRCUS's automated log import pipeline structured?
The automated pipeline uses Amazon EventBridge and ECS to run a scheduled task that scans S3 and imports the relevant logs into ClickHouse, with AWS Fargate tasks running the import in parallel.

tags: ClickHouse, log analysis, Cloud CIRCUS, data management, SaaS
