Skip to main content

Command Palette

Search for a command to run...

ClickHouse Advanced Tutorial: Performance Comparison with MySQL

Updated
7 min read
ClickHouse Advanced Tutorial: Performance Comparison with MySQL

Introduction

Nothing is perfect. In terms of databases, you can't expect the best performance for every task and query from your deployed database. However, the vital step as a software developer is to know their strengths and weaknesses and how to deal with them.

In this post, I will compare Clickhouse as a representative of the OLAP database and MySQL of the OLTP. This will help us to choose better solutions for our challenges according to our conditions and desires. Before jumping into the main context, let's discuss OLTP, OLAP, MySQL, and ClickHouse.

OLTP

OLTP stands for Online Transaction Processing and is used for day-to-day operations, such as processing orders and updating customer information. OLTP is best for short, fast transactions and is optimized for quick response times. It is essential to ensure data accuracy and consistency and provide an efficient way to access data.

OLAP

OLAP stands for Online Analytical Processing and is used for data mining and analysis. It enables organizations to analyze large amounts of data from multiple perspectives and identify trends and patterns. OLAP is best for complex queries and data mining and can provide impossible insights with traditional reporting tools.

MySQL

MySQL is a popular open-source database management system. It is used to store and manage data and is utilized by websites and applications to store and manage information. MySQL is a relational database management system that holds data in tables and allows users to query the data. It also provides features such as triggers, stored procedures, and views. MySQL is easy to use and has a wide range of features that can be used to create powerful and efficient applications.

ClickHouse

ClickHouse is an open-source column-oriented database management system developed by Yandex. It is designed to provide high performance for analytical queries. ClickHouse uses a SQL-like query language for querying data and supports different data types, including integers, strings, dates, and floats. It offers various features such as clustering, distributed query processing, and fault tolerance. It also supports replication and data sharding. You can know more about this database by visiting the first part of this series:

Now we can talk about the performance comparison.

Comparison Case Study

I've followed the Clickbench repository methodology for the case study. It uses the hits dataset obtained from the actual traffic recording of one of the world's largest web analytics platforms. hits contain about 100M rows as a single flat table. This repository studies more than 20 databases regarding dataset load time, elapsed time for 43 OLAP queries, and occupied storage. You can access their visualized results here.

To investigate ClickHouse and MySQL performance specifically, I separated 10M rows of the table and chose some of the predefined queries that can make our point more clear. Those queries are mainly in OLAP manner, so they only show ClickHouse strengths compared to MySQL (i.e., MySQL loses in all those queries). Hence, I added other queries showing the opposite (OLTP queries). Although I've limited the benchmark to these two databases, you can generalize the concept to other row-oriented and column-oriented DBMSs.

Disclaimer: This benchmark only clarifies the main difference between column-oriented and row-oriented databases regarding their performance and use cases. It should not be considered a reference for your use cases. Hence, you should perform your benchmarks with your queries to achieve the best decision.

System Specification

Databases are installed on Ubuntu 22.04 LTS on a system with the below specifications:

  • CPU: Intel® Core™ i7-10510U CPU @ 1.80GHz × 8

  • RAM: 16 GiB

  • Storage: 256 GiB SSD

Benchmark Flow

  1. The database is created.

  2. The table is created with the defined DDL.

  3. Data (hits.tsv) is loaded into the table, and its time is measured.

  4. Queries are run, and each query's elapsed time is measured.

Queries

Query NumberStatementType
1SELECT COUNT(*) FROM hits;OLAP
2SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits;OLAP
3SELECT URL, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND DontCountHits = 0 AND IsRefresh = 0 AND URL <> '' GROUP BY URL ORDER BY PageViews DESC LIMIT 10;OLAP
4SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM hits GROUP BY WatchID, ClientIP ORDER BY c DESC LIMIT 10;OLAP
5SELECT EventTime, WatchID FROM hits WHERE CounterID = 38 AND EventDate = '2013-07-15' AND UserID = '1387668437822950552' AND WatchID = '8899477221003616239';OLTP
6SELECT Title, URL, Referer FROM hits WHERE CounterID = 38 AND EventDate = '2013-07-15' AND UserID = '1387668437822950552' AND WatchID = '8899477221003616239';OLTP
7UPDATE hits SET Title='my title', URL='my url', Referer='my referer' WHERE CounterID = 38 AND EventDate = '2013-07-15' AND UserID = '1387668437822950552' AND WatchID = '8899477221003616239';OLTP

Results

I'll study the results under four categories:

  • Dataset Load

  • Table Size

  • Read Queries Execution

  • Update Query Execution: I've discussed the update query (query number 7) separately since it needs more discussion and attention.

Dataset Load

ClickHouseMySQLRatio
65s11m35sx10.7

Thanks to the LSM and sparse indexes, ClickHouse load time is much faster than MySQL, which uses BTree. However, ClickHouse inserts efficiency is observable in bulk inserts instead of many individual inserts. This behavior comes from the fact that it creates immutable parts for each insert and is unwilling to change, remove or create its data for a few rows.

Table Size

ClickHouse (GiB)MySQL (GiB)Ratio
1.36.32x4.86

The column-oriented structure gives the ability of Data Compression, something that is not available in row-oriented databases. That is why ClickHouse can do a practical favor to the teams storing a high amount of data, reducing the storage cost.

Read Queries Execution

Query NumberClickHouse (s)MySQL (s)Ratio
10.0057.79x1558
20.03016.0x533.3
30.1934.35x22.5
42.600180.93x69.58
50.010.00x0
60.0110.00x0

ClickHouse's sparse index and column-oriented structure have outperformed MySQL in all OLAP queries (numbers 1 to 4). That's why BI and Data Analysts would be more than happy with ClickHouse for their daily reports.

However, MySQL wins the battle when it comes to OLTP queries (numbers 5 and 6). Btree (equipped by MySQL) indeed performs better for pointy queries in which you demand short transactions requiring few rows.

Update Query Execution

For the update query (number 7), we should execute a different query in ClickHouse as it doesn't support updates in a naive way, and the Alter command has to be used:

ALTER TABLE hits UPDATE JavaEnable=0 WHERE CounterID = 38 AND EventDate = '2013-07-15' AND UserID = '1387668437822950552' AND WatchID = '8899477221003616239';

Additionally, ClickHouse applies the update asynchronously. To have the result immediately, you've to perform an optimize command:

OPTIMIZE TABLE hits FINAL;

By performing query number 7 statement shown in the queries table for MySQL and the two above SQL statements for ClickHouse, we achieve the below results:

Query NumberClickHouse (s)MySQL (s)Ratio
7260.000

Again, ClickHouse mutation hatred makes it a loser for real-time updates (and similarly deletes) compared to MySQL. Consequently, other methods like deduplication using ReplacingMergeTree can be utilized to handle updates. You can find valuable resources in the below links:

Conclusion

In this post, I benchmarked MySQL and ClickHouse databases to study some of their strengths and weaknesses that may help us choose a suitable solution. To summarize:

  • MySQL performs better on pointy and OLTP queries.

  • ClickHoues performs better on OLAP queries.

  • ClickHouse is not designed for frequent updates and deletes. You have to handle them with deduplication methods.

  • ClickHouse reduces the storage cost thanks to its column-oriented structure.

  • ClickHouse bulk inserts load time operates far better than MySQL.


Originally published at https://dev.to on June 8, 2023.