Navigating Data Challenges: The Journey of LINE Manga with ClickHouse

Introduction

In the world of digital comics, Japan stands out as a vibrant hub whose readers engage deeply with manga. "Japanese people read a lot of manga, you know," says Kazuki Matsuda from LINE Digital Frontier. He is not exaggerating: LINE Manga regularly competes with leading mobile games for the top spot on the App Store and Google Play. LINE Manga is operated by LINE Digital Frontier, a subsidiary of WEBTOON Entertainment, home to the world's largest digital comics platform with around 150 million monthly active users as of the end of March 2025. Data analytics plays a crucial role in understanding reader behavior, optimizing recommendations, and calculating revenue in real time.

The Path of Transformation

Matsuda acknowledges that the road has not always been smooth. Several years ago, LINE Manga migrated its core platform systems to a large fleet of MySQL databases. Speed and ease of use improved for the application, but analysis became harder as the volume of data grew. At the January 2025 ClickHouse meetup in Tokyo, Matsuda explained how he addressed these issues by using ClickHouse to run real-time analysis directly against MySQL, without ingesting any data into ClickHouse.

As LINE Manga expanded its user base over more than a decade, the architecture of its platform evolved to meet growing demands. MySQL has been central to that evolution, with both vertical and horizontal sharding used to keep operations smooth. Product data is held in its own set of databases, while user data is split by user ID and distributed across numerous databases. This design works well for the application but poses significant obstacles for data analysis.

Matsuda recalls, "With MySQL, we struggled with what should have been straightforward tasks." Even basic operations, such as joining book metadata with user purchase data, required custom scripts to locate the correct shards, with manual checks for each query. Summing total sales across the platform meant writing code from scratch and waiting through long processing times. Although the company had an internal analytics platform, its reliance on ETL pipelines made it hard to keep the data up to date. In addition, internal policies restricted uploading entire datasets, which complicated matters further.
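To make the difficulty concrete, here is a minimal sketch of the kind of fan-out script the paragraph above describes. It is not LINE Manga's code: in-memory SQLite databases stand in for the MySQL servers, and the table names, columns, shard count, and the modulo routing rule in shard_for are all assumptions chosen for illustration.

```python
import sqlite3

NUM_SHARDS = 2  # hypothetical shard count


def shard_for(user_id: int) -> int:
    """Hypothetical routing rule: user ID modulo the shard count."""
    return user_id % NUM_SHARDS


def make_master(books):
    """In-memory stand-in for the MySQL server that holds product data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO books VALUES (?, ?)", books)
    return conn


def make_shard(purchases):
    """In-memory stand-in for one MySQL user shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (user_id INTEGER, book_id INTEGER, price INTEGER)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", purchases)
    return conn


def sales_by_title(master, shards):
    """Query every shard separately, then join and sum in application code."""
    titles = dict(master.execute("SELECT book_id, title FROM books"))
    totals = {}
    for shard in shards:
        rows = shard.execute(
            "SELECT book_id, SUM(price) FROM purchases GROUP BY book_id"
        )
        for book_id, amount in rows:
            title = titles[book_id]
            totals[title] = totals.get(title, 0) + amount
    return totals


# Made-up sample data: (user_id, book_id, price)
rows = [(1, 10, 500), (2, 10, 500), (3, 11, 700), (4, 11, 700)]
master = make_master([(10, "Book A"), (11, "Book B")])
shards = [
    make_shard([r for r in rows if shard_for(r[0]) == i]) for i in range(NUM_SHARDS)
]
totals = sales_by_title(master, shards)
print(totals)
```

Even in this toy form, the join and the aggregation live in application code rather than in SQL, so every new question needs a new script.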


The ClickHouse Solution

In Matsuda's assessment, rebuilding or migrating the MySQL stack carried considerable risk and cost. The team therefore needed a way to analyze the data without moving it from its existing location. This is where ClickHouse entered the picture. Instead of overhauling the architecture or maintaining new pipelines, LINE Manga chose to query MySQL directly from ClickHouse, without ingesting or duplicating any data.

This strategy uses ClickHouse's MySQL table engine to create virtual tables tied to existing MySQL instances. These tables can be queried like any other ClickHouse table: ClickHouse sends the query to MySQL and processes the rows that come back. As a result, product data and user transactions can be combined in a single SQL query even though they are stored on separate servers, and the answers reflect the current state of the data. Matsuda noted that ClickHouse also optimizes how filters are applied to these queries, which he described as a transformational shift in how the team works with data.
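As a rough illustration of what such a virtual table looks like, the sketch below renders a ClickHouse CREATE TABLE statement that uses the MySQL table engine, whose arguments are the server address, database, table, user, and password. The helper names (MySQLSource, mysql_engine_ddl), host name, credentials, and column list are all hypothetical, and the script only builds the statement as a string; it does not connect to anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MySQLSource:
    """Connection details for one MySQL server (all values hypothetical)."""
    host: str
    port: int
    database: str
    user: str
    password: str


def quote(value: str) -> str:
    """Render a SQL string literal, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def mysql_engine_ddl(
    name: str, columns: dict[str, str], source: MySQLSource, remote_table: str
) -> str:
    """Build a CREATE TABLE statement backed by ClickHouse's MySQL table engine."""
    cols = ",\n".join(f"    {col} {typ}" for col, typ in columns.items())
    args = ", ".join(
        quote(arg)
        for arg in (
            f"{source.host}:{source.port}",
            source.database,
            remote_table,
            source.user,
            source.password,
        )
    )
    return f"CREATE TABLE {name}\n(\n{cols}\n)\nENGINE = MySQL({args})"


# Hypothetical shard and schema, for illustration only.
shard1 = MySQLSource("mysql-shard1", 3306, "manga", "reader", "secret")
ddl = mysql_engine_ddl(
    "purchases_shard1",
    {"user_id": "UInt64", "book_id": "UInt64", "price": "UInt32"},
    shard1,
    "purchases",
)
print(ddl)
```

One such table per MySQL server is enough for ClickHouse to treat the shards as ordinary tables in joins and unions, while the rows themselves stay in MySQL.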

At the ClickHouse Tokyo meetup, Matsuda gave a demo that ran three MySQL servers (one hosting master data and two storing user shards) on a Mac using Docker Compose. The shards shared the same schema, mirroring production, and were divided by user ID. Using clickhouse-local, he executed a single SQL query that merged purchase history across all user shards, showing that accurate results could be obtained without replicating any data. With fast processing and simpler debugging, developers get immediate insight across all shards without worrying about where the data lives.
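The demo query itself is not reproduced here, so the following is only a sketch of what a single cross-shard query of that kind might look like. It uses ClickHouse's mysql() table function, which takes the same address, database, table, user, and password arguments as the table engine. The host names, credentials, table names, and columns are assumptions, and the script builds the SQL and the command line without executing them.

```python
# Hypothetical connection details; none of these names come from the talk.
DATABASE, USER, PASSWORD = "manga", "reader", "secret"
MASTER = "mysql-master:3306"
USER_SHARDS = ["mysql-shard1:3306", "mysql-shard2:3306"]


def mysql_fn(address: str, table: str) -> str:
    """Render a call to ClickHouse's mysql() table function."""
    return f"mysql('{address}', '{DATABASE}', '{table}', '{USER}', '{PASSWORD}')"


def cross_shard_sales_sql(master: str, shards: list[str]) -> str:
    """Union purchases from every user shard, then join them with book metadata."""
    union = "\n    UNION ALL\n".join(
        f"    SELECT book_id, price FROM {mysql_fn(shard, 'purchases')}"
        for shard in shards
    )
    return (
        "SELECT b.title, sum(p.price) AS total\n"
        f"FROM (\n{union}\n) AS p\n"
        f"INNER JOIN {mysql_fn(master, 'books')} AS b ON p.book_id = b.book_id\n"
        "GROUP BY b.title\n"
        "ORDER BY total DESC"
    )


sql = cross_shard_sales_sql(MASTER, USER_SHARDS)

# Depending on the installation, the tool is invoked as `clickhouse local`
# or through the `clickhouse-local` binary.
command = ["clickhouse", "local", "--query", sql]
print(sql)
```

Passing command to subprocess.run would execute the query on a machine that has ClickHouse installed and can reach the MySQL servers. Adding a shard means adding one address to USER_SHARDS rather than writing another script.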

Conclusion

For LINE Manga, keeping MySQL as the system of record while using ClickHouse for real-time data exploration and debugging has proven invaluable. By focusing on advantages such as sub-millisecond query execution, LINE Manga uses ClickHouse to boost efficiency. Matsuda encouraged the developers in the audience to adopt ClickHouse, demonstrating how it can improve local development, run queries against existing databases, and handle query output efficiently. The approach offers a model for other teams facing similar constraints: analyze the data where it already lives instead of rebuilding the stack around it.


Questions and Answers

1. What role does ClickHouse play for LINE Manga?
ClickHouse enables real-time data analysis by querying the MySQL databases directly, without duplicating data.

2. How does LINE Manga ensure efficient data processing?
By keeping MySQL as the system of record and using ClickHouse as the query layer on top of it, which allows quick access to and analysis of the data.

3. Why was moving from MySQL considered risky?
Rebuilding or migrating the MySQL stack involves significant costs and risks that could affect operations.

4. What challenges did LINE Manga face with its previous data systems?
Issues arose from the complex nature of data sharding and difficulties in accessing user data for analysis.

5. How can other developers benefit from LINE Manga's approach?
Their methodology for using ClickHouse to handle complex data challenges serves as a model for improving analytics efficiency and operational strategies.

tags:ClickHouse, data analysis, LINE Manga, MySQL
