<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[The Blog of Hamed Karbasi]]></title><description><![CDATA[The Blog of Hamed Karbasi]]></description><link>https://hamedkarbasi.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1755435950446/783f2f2f-268b-4a6d-be46-8452d3965f5b.png</url><title>The Blog of Hamed Karbasi</title><link>https://hamedkarbasi.com</link></image><generator>RSS for Node</generator><lastBuildDate>Wed, 15 Apr 2026 06:16:14 GMT</lastBuildDate><atom:link href="https://hamedkarbasi.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Visualizing Avro Kafka Data in Grafana: Streaming Real-Time Schemas]]></title><description><![CDATA[Hey everyone, if you've been following along from my last post on streaming JSON into Grafana with the Kafka Datasource plugin, you know how awesome it is to watch real-time data light up your dashboards. 
Today, we're leveling up to Avro; that compac...]]></description><link>https://hamedkarbasi.com/visualizing-avro-kafka-data-in-grafana-streaming-real-time-schemas</link><guid isPermaLink="true">https://hamedkarbasi.com/visualizing-avro-kafka-data-in-grafana-streaming-real-time-schemas</guid><category><![CDATA[Grafana]]></category><category><![CDATA[streaming]]></category><category><![CDATA[avro]]></category><category><![CDATA[kafka]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[realtime]]></category><dc:creator><![CDATA[Hamed Karbasi]]></dc:creator><pubDate>Sun, 15 Feb 2026 13:21:38 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/-QyXCuVHwfE/upload/9597ddca2f82bfa080b5312c3316daa1.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey everyone, if you've been following along from <a target="_blank" href="https://hamedkarbasi.com/visualizing-kafka-data-in-grafana-consuming-real-time-messages-for-dashboards">my last post</a> on streaming JSON into Grafana with the <a target="_blank" href="https://grafana.com/grafana/plugins/hamedkarbasi93-kafka-datasource/">Kafka Datasource plugin</a>, you know how awesome it is to watch real-time data light up your dashboards. Today, we're leveling up to <a target="_blank" href="https://avro.apache.org/docs/">Avro</a>; that compact, schema-evolved format so many Kafka setups swear by. I'll walk you through using the plugin's Avro support, whether you're pulling from a <a target="_blank" href="https://github.com/confluentinc/schema-registry">Schema Registry</a> or embedding schemas inline. It's straightforward, and you'll be visualizing those binary messages in no time.</p>
<h2 id="heading-before-we-jump-in">Before we jump in</h2>
<p>If you haven't checked out <a target="_blank" href="https://hamedkarbasi.com/visualizing-kafka-data-in-grafana-consuming-real-time-messages-for-dashboards">my previous article</a> on the JSON feature, I'd highly recommend giving it a quick read first. It covers the foundational concepts of how this plugin works: connecting to Kafka, picking topics, authentication, configuring partitions and offsets, and mapping message fields to Grafana panels. The Avro support builds on the exact same mental model: Kafka topic → live stream → fields in Grafana. So you'll feel right at home.</p>
<p>The only real difference? Avro messages are binary and need a schema to be decoded into something Grafana can work with. Once decoded, everything else (queries, panels, alerts) works identically to JSON.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766099914314/48a6091d-2757-432f-b4a6-8fae7df90c27.png" alt /></p>
<h2 id="heading-what-is-avro-and-why-teams-use-it">What is Avro (and why teams use it)?</h2>
<p>Apache Avro is a schema-based serialization format that encodes data compactly, often much smaller than JSON, and relies on a schema to describe what each message contains. Think of it as a contract between your producer and consumer: the schema defines field names, types, and structure, and the binary payload is just the raw values packed efficiently.</p>
<p>Here's a quick example to see the difference. Below is an Avro schema for a sensor's measurements:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"type"</span>: <span class="hljs-string">"record"</span>,
  <span class="hljs-attr">"name"</span>: <span class="hljs-string">"SensorReading"</span>,
  <span class="hljs-attr">"fields"</span>: [
    {<span class="hljs-attr">"name"</span>: <span class="hljs-string">"sensor_id"</span>, <span class="hljs-attr">"type"</span>: <span class="hljs-string">"string"</span>},
    {<span class="hljs-attr">"name"</span>: <span class="hljs-string">"temperature"</span>, <span class="hljs-attr">"type"</span>: <span class="hljs-string">"double"</span>},
    {<span class="hljs-attr">"name"</span>: <span class="hljs-string">"humidity"</span>, <span class="hljs-attr">"type"</span>: [<span class="hljs-string">"null"</span>, <span class="hljs-string">"int"</span>], <span class="hljs-attr">"default"</span>: <span class="hljs-literal">null</span>},
    {<span class="hljs-attr">"name"</span>: <span class="hljs-string">"timestamp"</span>, <span class="hljs-attr">"type"</span>: <span class="hljs-string">"long"</span>}
  ]
}
</code></pre>
<p><strong>The data in Avro binary</strong> (~29 bytes, strongly typed):</p>
<pre><code class="lang-plaintext">\x10sensor-42C@33333\x02\xc8\x01\xd0\xf4\x8b\x9a\xb2\x30
</code></pre>
<p><strong>Same data in JSON</strong> (~91 bytes, no type safety):</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"sensor_id"</span>: <span class="hljs-string">"sensor-42"</span>,
  <span class="hljs-attr">"temperature"</span>: <span class="hljs-number">23.4</span>,
  <span class="hljs-attr">"humidity"</span>: <span class="hljs-number">68</span>,
  <span class="hljs-attr">"timestamp"</span>: <span class="hljs-number">1708027200000</span>
}
</code></pre>
<p><em>↑ Field names repeated in every message plus UTF-8 text encoding make JSON ~3x larger than Avro's compact binary format. JSON types are also implicit: you won't know whether <code>temperature</code> is a float, a double, or a string until runtime.</em></p>
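<p>If you're curious where those byte savings come from: Avro writes ints and longs as zig-zag-encoded base-128 varints, so small values cost a single byte and field names never appear on the wire. Here's a tiny Python sketch of that encoding (illustrative only; this is not the plugin's actual code):</p>
<pre><code class="lang-python">def zigzag64(n):
    # Avro's zig-zag mapping for 64-bit longs: 0, -1, 1, -2, 2, ...
    # become 0, 1, 2, 3, 4, ... so small magnitudes stay small.
    # n // 2 ** 63 is 0 for non-negative n and -1 for negative n,
    # mirroring an arithmetic right shift in two's complement.
    return ((n * 2) ^ (n // 2 ** 63)) % 2 ** 64

def varint(u):
    # Base-128 varint: 7 data bits per byte, high bit set on every
    # byte except the last.
    out = bytearray()
    while True:
        byte = u % 128
        u = u // 128
        if u:
            out.append(byte + 128)
        else:
            out.append(byte)
            return bytes(out)

# The humidity value 68 costs only two bytes on the wire:
print(varint(zigzag64(68)))  # b'\x88\x01'
</code></pre>
<p>Compare those two bytes with the dozen-plus characters <code>"humidity": 68</code> takes in JSON, and the size gap in the example above starts to make sense.</p>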
<p><strong>Decoded by Grafana plugin</strong> (what you see in panels):</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"sensor_id"</span>: <span class="hljs-string">"sensor-42"</span>,
  <span class="hljs-attr">"temperature"</span>: <span class="hljs-number">23.4</span>,
  <span class="hljs-attr">"humidity"</span>: <span class="hljs-number">68</span>,
  <span class="hljs-attr">"timestamp"</span>: <span class="hljs-number">1708027200000</span>
}
</code></pre>
<p><em>↑ Ready to map to Grafana fields:</em> <code>temperature</code> <em>becomes a time series,</em> <code>sensor_id</code> <em>a label</em></p>
<p>The big wins are <strong>efficiency and type safety</strong>: Avro doesn't repeat field names in every message (huge savings at scale), and the schema enforces types at write time, no surprise strings where you expected numbers. Plus, <strong>schema evolution</strong> lets you add fields and maintain backward/forward compatibility when you follow the rules, which is why Avro is commonly paired with a Schema Registry in Kafka ecosystems. Teams love it because streaming millions of messages per second becomes cheaper, safer, and more manageable than with JSON.</p>
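<p>To make schema evolution concrete, here's a backward-compatible next version of the earlier <code>SensorReading</code> schema. The new <code>battery_pct</code> field is a hypothetical addition for illustration; because it's nullable with a default, consumers on the new schema can still decode old messages:</p>
<pre><code class="lang-json">{
  "type": "record",
  "name": "SensorReading",
  "fields": [
    {"name": "sensor_id", "type": "string"},
    {"name": "temperature", "type": "double"},
    {"name": "humidity", "type": ["null", "int"], "default": null},
    {"name": "timestamp", "type": "long"},
    {"name": "battery_pct", "type": ["null", "double"], "default": null}
  ]
}
</code></pre>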
<h2 id="heading-big-picture-what-the-plugin-does-for-avro">Big picture: what the plugin does for Avro</h2>
<p>This Grafana Kafka datasource plugin can consume Kafka messages and turn them into Grafana-friendly fields in real time, supporting both JSON and Avro payloads. For Avro specifically, the plugin handles the heavy lifting:</p>
<ul>
<li><p><strong>Fetches schemas</strong> from a Schema Registry (like Confluent's) automatically based on topic/subject naming conventions</p>
</li>
<li><p><strong>Accepts inline schemas</strong> you paste directly into the query or upload the schema file (great for quick tests or environments without a registry)</p>
</li>
<li><p><strong>Deserializes binary messages</strong> on the fly and flattens nested records into dot-notation fields that Grafana understands</p>
</li>
</ul>
<p>So whether your Avro messages come from IoT sensors, clickstream events, or service logs, the plugin decodes them, and you build dashboards just like you would with JSON data.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771104774238/8a222ceb-2a06-4bb8-a9d4-450f548d064c.png" alt class="image--center mx-auto" /></p>
<p><em>↑ A sample of the decoded Avro data in Grafana using the plugin</em></p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before we dive into configs, make sure you have:</p>
<ul>
<li><p><strong>Grafana 10.2+</strong> installed and running</p>
</li>
<li><p><strong>Plugin version 1.2.0+</strong> (Avro support landed here): install via <code>grafana-cli plugins install hamedkarbasi93-kafka-datasource</code> or grab the latest zip from the <a target="_blank" href="https://github.com/hoptical/grafana-kafka-datasource/releases">GitHub releases</a></p>
</li>
<li><p><strong>Kafka broker</strong> (v0.9+ works great)</p>
</li>
<li><p><strong>(Optional) Schema Registry</strong> running at something like <code>http://localhost:8081</code> if you want the registry approach</p>
</li>
</ul>
<p>No Schema Registry? Totally fine! Inline schemas work perfectly for demos, local dev, and quick debugging.</p>
<h2 id="heading-data-source-setup-for-avro">Data source setup for Avro</h2>
<p>Head to <strong>Connections &gt; Data sources &gt; Add new data source</strong> and search for "Kafka". Fill in the basics:</p>
<ul>
<li><p><strong>Bootstrap Servers</strong>: e.g., <code>localhost:9092</code></p>
</li>
<li><p><strong>SASL/SSL toggles</strong>: if your cluster requires auth, flip these on and add credentials</p>
</li>
<li><p><strong>Schema Registry URL &amp; its username/password</strong>: e.g., <code>http://schema-registry:8081</code></p>
</li>
</ul>
<p>For inline schema mode, you don't need to set the registry URL at the datasource level; you'll paste the schema directly in the data query panel. But if you're using a registry in production, configuring it once here means all your Avro queries auto-fetch schemas without extra config.</p>
<p>Hit <strong>Save &amp; Test</strong>. A green check means you're connected and the plugin can reach Kafka.</p>
<h2 id="heading-two-ways-to-decode-avro">Two ways to decode Avro</h2>
<h3 id="heading-schema-registry-recommended-for-production">Schema Registry (recommended for production)</h3>
<p>If you already have a Schema Registry (common with Confluent-style setups), point the datasource at the registry URL and let it resolve schemas automatically. After hitting <strong>Test Connection</strong> and receiving the "<em>The schema registry is accessible</em>" confirmation, you know the plugin can reach your registry. The plugin follows Kafka's subject naming convention (e.g., <code>&lt;topic-name&gt;-value</code> for message values), fetches the latest schema version, and deserializes each message.</p>
<p>This keeps dashboards clean because you're not pasting schemas into Grafana queries, and it naturally supports teams that evolve schemas over time: just register a new version and the plugin picks it up.</p>
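<p>Under the hood, the registry lookup is just an HTTP call. Here's a rough Python sketch of that flow (not the plugin's actual implementation; the URL and topic below are simply this post's examples):</p>
<pre><code class="lang-python">import json
from urllib.request import urlopen

def value_subject(topic):
    # Kafka's default subject naming strategy for message values:
    # the topic name plus a "-value" suffix.
    return topic + "-value"

def fetch_latest_schema(registry_url, topic):
    # GET /subjects/{subject}/versions/latest is the standard
    # Confluent Schema Registry endpoint; the response carries the
    # Avro schema (as a JSON string), its version, and its global id.
    url = registry_url + "/subjects/" + value_subject(topic) + "/versions/latest"
    with urlopen(url) as resp:
        body = json.load(resp)
    return json.loads(body["schema"]), body["version"]

print(value_subject("server-metrics"))  # server-metrics-value
</code></pre>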
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771105462707/4207d8ac-cdda-4a30-9b1d-ba92f2b3bd30.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-inlineonline-schema-great-for-demos-and-quick-debugging">Inline schema (great for demos and quick debugging)</h3>
<p>If you don't have a registry (or you're just prototyping), you can paste the full Avro schema JSON directly into the query editor, and the plugin will use it to decode messages. Alternatively, you can upload an Avro schema file (<code>.avsc</code>) for convenience. The plugin validates the inline schema automatically to ensure it's properly formatted before attempting to deserialize messages.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771105654939/05792b50-31ef-43ae-a5e7-91ba271233cf.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-building-your-first-avro-query">Building your first Avro query</h2>
<p>Create a new panel, pick your Kafka datasource, and configure the query:</p>
<ol>
<li><p><strong>Format</strong>: Switch from JSON to <strong>Avro</strong></p>
</li>
<li><p><strong>Topic</strong>: Type or autocomplete your Avro topic name (e.g., <code>server-metrics</code>)</p>
</li>
<li><p><strong>Partitions</strong>: Click "Fetch" to list partitions, then pick specific ones or select all</p>
</li>
<li><p><strong>Offset</strong>: Choose "Latest" for live streams, or "Last N" to replay recent messages</p>
</li>
<li><p><strong>Schema Mode</strong>:</p>
<ul>
<li><p><strong>Schema Registry</strong>: The plugin auto-derives the subject (e.g., <code>server-metrics-value</code>) and fetches the schema</p>
</li>
<li><p><strong>Inline Schema</strong>: Paste the full Avro schema JSON in the text box</p>
</li>
</ul>
</li>
</ol>
<p>Once you save, the plugin starts consuming, deserializing on the fly, and populating fields in the panel's data frame.</p>
<h2 id="heading-understanding-field-flattening-with-nested-schemas">Understanding field flattening with nested schemas</h2>
<p>One of the plugin's handiest features is how it handles nested Avro records: it flattens them into dot-notation field names, allowing Grafana to work with them naturally. Let's use a real example to see this in action.</p>
<p>Here's a nested schema representing server metrics with host info, multi-level metrics, nullable fields, and arrays:</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"type"</span>: <span class="hljs-string">"record"</span>,
    <span class="hljs-attr">"name"</span>: <span class="hljs-string">"NestedMessage"</span>,
    <span class="hljs-attr">"fields"</span>: [
        {
            <span class="hljs-attr">"name"</span>: <span class="hljs-string">"host"</span>,
            <span class="hljs-attr">"type"</span>: {
                <span class="hljs-attr">"type"</span>: <span class="hljs-string">"record"</span>,
                <span class="hljs-attr">"name"</span>: <span class="hljs-string">"Host"</span>,
                <span class="hljs-attr">"fields"</span>: [
                    {<span class="hljs-attr">"name"</span>: <span class="hljs-string">"name"</span>, <span class="hljs-attr">"type"</span>: <span class="hljs-string">"string"</span>},
                    {<span class="hljs-attr">"name"</span>: <span class="hljs-string">"ip"</span>, <span class="hljs-attr">"type"</span>: <span class="hljs-string">"string"</span>}
                ]
            }
        },
        {
            <span class="hljs-attr">"name"</span>: <span class="hljs-string">"metrics"</span>,
            <span class="hljs-attr">"type"</span>: {
                <span class="hljs-attr">"type"</span>: <span class="hljs-string">"record"</span>,
                <span class="hljs-attr">"name"</span>: <span class="hljs-string">"Metrics"</span>,
                <span class="hljs-attr">"fields"</span>: [
                    {
                        <span class="hljs-attr">"name"</span>: <span class="hljs-string">"cpu"</span>,
                        <span class="hljs-attr">"type"</span>: {
                            <span class="hljs-attr">"type"</span>: <span class="hljs-string">"record"</span>,
                            <span class="hljs-attr">"name"</span>: <span class="hljs-string">"CPU"</span>,
                            <span class="hljs-attr">"fields"</span>: [
                                {<span class="hljs-attr">"name"</span>: <span class="hljs-string">"load"</span>, <span class="hljs-attr">"type"</span>: [<span class="hljs-string">"null"</span>, <span class="hljs-string">"double"</span>], <span class="hljs-attr">"default"</span>: <span class="hljs-literal">null</span>},
                                {<span class="hljs-attr">"name"</span>: <span class="hljs-string">"temp"</span>, <span class="hljs-attr">"type"</span>: <span class="hljs-string">"double"</span>}
                            ]
                        }
                    },
                    {
                        <span class="hljs-attr">"name"</span>: <span class="hljs-string">"mem"</span>,
                        <span class="hljs-attr">"type"</span>: {
                            <span class="hljs-attr">"type"</span>: <span class="hljs-string">"record"</span>,
                            <span class="hljs-attr">"name"</span>: <span class="hljs-string">"Memory"</span>,
                            <span class="hljs-attr">"fields"</span>: [
                                {<span class="hljs-attr">"name"</span>: <span class="hljs-string">"used"</span>, <span class="hljs-attr">"type"</span>: <span class="hljs-string">"int"</span>},
                                {<span class="hljs-attr">"name"</span>: <span class="hljs-string">"free"</span>, <span class="hljs-attr">"type"</span>: <span class="hljs-string">"int"</span>}
                            ]
                        }
                    }
                ]
            }
        },
        {<span class="hljs-attr">"name"</span>: <span class="hljs-string">"value1"</span>, <span class="hljs-attr">"type"</span>: [<span class="hljs-string">"null"</span>, <span class="hljs-string">"double"</span>], <span class="hljs-attr">"default"</span>: <span class="hljs-literal">null</span>},
        {<span class="hljs-attr">"name"</span>: <span class="hljs-string">"value2"</span>, <span class="hljs-attr">"type"</span>: [<span class="hljs-string">"null"</span>, <span class="hljs-string">"double"</span>], <span class="hljs-attr">"default"</span>: <span class="hljs-literal">null</span>},
        {
            <span class="hljs-attr">"name"</span>: <span class="hljs-string">"tags"</span>,
            <span class="hljs-attr">"type"</span>: {<span class="hljs-attr">"type"</span>: <span class="hljs-string">"array"</span>, <span class="hljs-attr">"items"</span>: <span class="hljs-string">"string"</span>}
        },
        {
            <span class="hljs-attr">"name"</span>: <span class="hljs-string">"alerts"</span>,
            <span class="hljs-attr">"type"</span>: {
                <span class="hljs-attr">"type"</span>: <span class="hljs-string">"array"</span>,
                <span class="hljs-attr">"items"</span>: {
                    <span class="hljs-attr">"type"</span>: <span class="hljs-string">"record"</span>,
                    <span class="hljs-attr">"name"</span>: <span class="hljs-string">"Alert"</span>,
                    <span class="hljs-attr">"fields"</span>: [
                        {<span class="hljs-attr">"name"</span>: <span class="hljs-string">"type"</span>, <span class="hljs-attr">"type"</span>: <span class="hljs-string">"string"</span>},
                        {<span class="hljs-attr">"name"</span>: <span class="hljs-string">"severity"</span>, <span class="hljs-attr">"type"</span>: <span class="hljs-string">"string"</span>},
                        {<span class="hljs-attr">"name"</span>: <span class="hljs-string">"value"</span>, <span class="hljs-attr">"type"</span>: <span class="hljs-string">"double"</span>}
                    ]
                }
            }
        },
        {
            <span class="hljs-attr">"name"</span>: <span class="hljs-string">"processes"</span>,
            <span class="hljs-attr">"type"</span>: {<span class="hljs-attr">"type"</span>: <span class="hljs-string">"array"</span>, <span class="hljs-attr">"items"</span>: <span class="hljs-string">"string"</span>}
        }
    ]
}
</code></pre>
<p>When the plugin decodes a message with this schema, you'll see flattened fields like:</p>
<ul>
<li><p><code>host.name</code> (string) → label/dimension</p>
</li>
<li><p><code>host.ip</code> (string) → label/dimension</p>
</li>
<li><p><code>metrics.cpu.load</code> (double, nullable) → time series value</p>
</li>
<li><p><code>metrics.cpu.temp</code> (double) → time series value</p>
</li>
<li><p><code>metrics.mem.used</code> (int) → time series value</p>
</li>
<li><p><code>metrics.mem.free</code> (int) → time series value</p>
</li>
<li><p><code>value1</code>, <code>value2</code> (nullable doubles) → time series values</p>
</li>
<li><p><code>tags</code> → JSON string <code>["prod","edge"]</code></p>
</li>
<li><p><code>alerts</code> → JSON string <code>[{"severity":"warning","type":"cpu_high","value":98.99}]</code></p>
</li>
<li><p><code>processes</code> → JSON string <code>["nginx","mysql","redis"]</code></p>
</li>
</ul>
<p>This means you can directly reference <code>metrics.cpu.temp</code> in a Graph panel or use <code>host.name</code> as a grouping dimension in a Table; no manual parsing is needed.</p>
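<p>If you want a feel for the flattening logic, here's a minimal Python sketch (the real work happens inside the plugin; this only mirrors the behavior described above):</p>
<pre><code class="lang-python">import json

def flatten(record, prefix="", depth=5):
    # Recursively flatten nested dicts into dot-notation keys;
    # lists (and records nested deeper than the depth limit)
    # become compact JSON strings, as in the field list above.
    flat = {}
    for key, value in record.items():
        name = prefix + "." + key if prefix else key
        if isinstance(value, dict) and depth:
            flat.update(flatten(value, name, depth - 1))
        elif isinstance(value, (dict, list)):
            flat[name] = json.dumps(value, separators=(",", ":"))
        else:
            flat[name] = value
    return flat

reading = {
    "host": {"name": "web-1", "ip": "10.0.0.5"},
    "metrics": {"cpu": {"load": None, "temp": 51.2}},
    "tags": ["prod", "edge"],
}
print(flatten(reading)["metrics.cpu.temp"])  # 51.2
</code></pre>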
<h3 id="heading-practical-tips-for-nested-schemas">Practical tips for nested schemas</h3>
<ul>
<li><p><strong>Nullable unions</strong> (like <code>["null", "double"]</code>) are handled gracefully; null values just show up as gaps in the time series.</p>
</li>
<li><p><strong>Arrays</strong> are serialized as JSON strings in the data frame (e.g., <code>["nginx","mysql"]</code> or <code>[{"severity":"warning","type":"cpu_high","value":98.99}]</code>). You can parse them further with Grafana's JSON transform or display them as-is in Table panels.</p>
</li>
<li><p><strong>Deep nesting</strong> is flattened up to depth 5 by default; beyond that, the plugin won't flatten further to avoid performance issues.</p>
</li>
<li><p>You can adjust both the <strong>flatten depth</strong> and the <strong>maximum field limit</strong> (defaults: depth <code>5</code>, fields <code>1000</code>) in the <strong>Advanced Settings</strong> section of the <strong>Config Editor</strong> if your schema needs more room.</p>
</li>
<li><p>Use field aliases in Grafana's Transform tab to rename <code>metrics.cpu.temp</code> to something friendlier, like <code>"CPU Temperature"</code> in legends.</p>
</li>
</ul>
<h2 id="heading-testing-it-out-with-live-data">Testing it out with live data</h2>
<p>The repo includes a <a target="_blank" href="https://github.com/hoptical/grafana-kafka-datasource/tree/main/example/go">Go-based producer</a> that can publish Avro messages to your local Kafka. Fire it up:</p>
<pre><code class="lang-bash">go run ./example/go \
  -broker localhost:9094 \
  -topic server-metrics \
  -interval 500 \
  -format avro \
  -schema-registry http://localhost:8081
</code></pre>
<p>This pushes a message every 500ms with randomized metrics. Now create a panel in Grafana, point it at the <code>server-metrics</code> topic with Avro format, and watch the fields populate in real time. Build a Stat panel for <code>metrics.cpu.temp</code>, a Graph for <code>metrics.mem.used</code> over time, or a Table grouped by <code>host.name</code>; it all just works.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771108505476/72783508-1009-443e-9899-c9e6a79fdf77.gif" alt class="image--center mx-auto" /></p>
<h2 id="heading-wrapping-up">Wrapping up</h2>
<p>There you have it! Avro streaming unlocked in Grafana. Whether you're running a full Confluent Platform with Schema Registry or just need to decode some binary messages locally, this plugin makes it dead simple to visualize Avro data alongside your other telemetry.</p>
<p>Questions? Found a bug? Feature idea? Hit up the <a target="_blank" href="https://github.com/hoptical/grafana-kafka-datasource/issues">GitHub issues</a> or drop a comment below. And if this saved you a few hours of head-scratching, toss the <a target="_blank" href="https://github.com/hoptical/grafana-kafka-datasource">repo</a> a star; it helps more folks find it!</p>
<p>Happy streaming! 🚀</p>
]]></content:encoded></item><item><title><![CDATA[Visualizing Kafka Data in Grafana: Consuming Real-Time Messages for Dashboards]]></title><description><![CDATA[Ever tried visualizing data with Grafana? Whether you’re building a simple dashboard with Prometheus to track resource usage or running complex queries on Loki or Elasticsearch, Grafana is one of the popular go-to tools.
Grafana offers open-source an...]]></description><link>https://hamedkarbasi.com/visualizing-kafka-data-in-grafana-consuming-real-time-messages-for-dashboards</link><guid isPermaLink="true">https://hamedkarbasi.com/visualizing-kafka-data-in-grafana-consuming-real-time-messages-for-dashboards</guid><category><![CDATA[kafka]]></category><category><![CDATA[Grafana]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[open source]]></category><category><![CDATA[streaming]]></category><dc:creator><![CDATA[Hamed Karbasi]]></dc:creator><pubDate>Fri, 19 Dec 2025 12:25:47 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/7cLqEYJws8E/upload/79d29542e354db06633c8d72838c1a64.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Ever tried visualizing data with <a target="_blank" href="https://grafana.com/">Grafana</a>? Whether you’re building a simple dashboard with <a target="_blank" href="https://prometheus.io/">Prometheus</a> to track resource usage or running complex queries on <a target="_blank" href="https://grafana.com/oss/loki/">Loki</a> or <a target="_blank" href="https://www.elastic.co/elasticsearch">Elasticsearch</a>, Grafana is one of the most popular go-to tools.</p>
<p>Grafana offers <a target="_blank" href="https://grafana.com/oss/grafana/">open-source</a> and self-hosted options, as well as enterprise and <a target="_blank" href="https://grafana.com/products/cloud/">cloud solutions</a>, all designed to help you gain insights from almost anything. And by <em>anything</em>, I really mean it: from time-series databases like Prometheus and InfluxDB, to relational databases such as PostgreSQL, and even cloud-native metrics from services like AWS CloudWatch and Google Cloud Monitoring.</p>
<p>However, in modern data platforms, a huge amount of data doesn’t necessarily sit in databases; it flows through streaming systems. And one name shows up almost everywhere: <a target="_blank" href="https://kafka.apache.org/">Kafka</a>!</p>
<h2 id="heading-got-kafka-lets-talk-about-it">Got Kafka? Let's Talk About It!</h2>
<p>Kafka is a powerful broker, and if you're using it, you probably have a bunch of questions about your Kafka cluster. These questions usually fall into two categories:</p>
<h3 id="heading-curious-about-kafka-metrics">Curious About Kafka Metrics?</h3>
<ul>
<li><p>How many clusters are we running?</p>
</li>
<li><p>How much disk space is each broker using?</p>
</li>
<li><p>Is the cluster healthy?</p>
</li>
<li><p>What’s the throughput in messages per second or bytes per second?</p>
</li>
<li><p>How many topics and partitions do we have?</p>
</li>
</ul>
<p>These are all <strong>Kafka cluster metrics</strong>. The good news is that this part is already well covered: you can <a target="_blank" href="https://github.com/danielqsj/kafka_exporter">export</a> Kafka metrics to Prometheus and visualize them in Grafana using <a target="_blank" href="https://grafana.com/grafana/dashboards/?search=kafka+exporter">ready-made dashboards</a>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766142019266/28881803-3b6c-401a-afc7-8c17c472f597.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-wondering-about-kafka-data">Wondering About Kafka Data?</h3>
<ul>
<li><p>What messages are being produced on a specific topic?</p>
</li>
<li><p>Can I stream messages in real time to see what’s actually happening?</p>
</li>
<li><p>Can I build dashboards based on the data flowing through Kafka?</p>
</li>
<li><p>My messages are in JSON, Avro, or Protobuf and are fairly complex. How can I visualize them effectively?</p>
</li>
</ul>
<p>If these questions sound familiar, then you’re no longer just interested in <em>metrics</em>. You want visibility into the <strong>data itself</strong>.</p>
<p>This is where a <a target="_blank" href="https://grafana.com/grafana/plugins/hamedkarbasi93-kafka-datasource/"><strong>Grafana data source plugin</strong></a> becomes essential, allowing Grafana to connect directly to Kafka, consume messages, and turn streaming data into dashboards.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766146382952/315f3edb-a4bc-480e-a425-4d22002baea4.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-existing-tools-for-inspecting-kafka-data">Existing Tools for Inspecting Kafka Data</h2>
<p>Kafka ecosystems already offer excellent tools for working directly with Kafka data, and many teams rely on them daily.</p>
<ul>
<li><p><a target="_blank" href="https://github.com/edenhill/kcat"><strong>kcat</strong></a> <strong>(formerly kafkacat)</strong> is a lightweight CLI tool that’s extremely useful for quick inspections, debugging producers and consumers, and validating message payloads.</p>
</li>
<li><p><a target="_blank" href="https://akhq.io/"><strong>AKHQ</strong></a><strong>,</strong> <a target="_blank" href="https://conduktor.io/"><strong>Conduktor</strong></a><strong>,</strong> <a target="_blank" href="https://docs.confluent.io/control-center/current/overview.html"><strong>Confluent Control Center</strong></a><strong>,</strong> <a target="_blank" href="https://www.redpanda.com/redpanda-console-kafka-ui"><strong>Redpanda Console</strong></a><strong>, and similar UIs</strong> provide rich, Kafka-focused interfaces for browsing topics, inspecting messages, and working with schemas.</p>
</li>
</ul>
<p>These tools are purpose-built for Kafka-specific workflows. Beyond message inspection, they offer deep visibility into Kafka itself, including cluster health, metadata, Schema Registry, Kafka Connect, and overall operational state. When teams need to understand or operate Kafka as a system, these tools are often a natural fit.</p>
<h3 id="heading-then-why-visualize-kafka-data-in-grafana">Then, Why Visualize Kafka Data in Grafana?</h3>
<p>The Kafka Data Source plugin does not aim to replace these tools. Instead, it focuses on a different and complementary use case.</p>
<p>In many organizations, <strong>Grafana is already the central place for dashboards and observability</strong>. Metrics, logs, traces, and business KPIs often live side by side in Grafana, giving teams a shared view of how systems behave.</p>
<p>By bringing Kafka data into Grafana, teams can:</p>
<ul>
<li><p>Visualize Kafka data <strong>alongside existing dashboards</strong>, rather than in a separate, Kafka-only tool</p>
</li>
<li><p>Correlate Kafka messages with <strong>application metrics, infrastructure metrics, and alerts</strong></p>
</li>
<li><p>Use Grafana’s alerting and visualization capabilities to gain deeper insights</p>
</li>
<li><p>Follow the principle of <strong>having all operational insights in one place</strong></p>
</li>
</ul>
<p>While the aforementioned tools focus exclusively on Kafka, Grafana provides a broader observability context. The Kafka Data Source plugin helps bridge that gap, making Kafka data part of the same story as the rest of your systems.</p>
<h2 id="heading-kafka-data-source-plugin">Kafka Data Source Plugin</h2>
<p>The Kafka Data Source plugin works as a <a target="_blank" href="https://www.instaclustr.com/blog/a-beginners-guide-to-kafka-consumers/">Kafka <strong>consumer</strong></a>. It reads messages from the topics and partitions you specify and makes that data available directly inside Grafana, so you can visualize streaming data instead of just monitoring cluster metrics.</p>
<h3 id="heading-get-started">Get Started</h3>
<p>To use the plugin, you first need to install it in Grafana. You can do this either via the Grafana <a target="_blank" href="https://grafana.com/docs/grafana/latest/administration/plugin-management/plugin-install/#install-a-plugin-through-the-grafana-ui">UI</a> or the CLI. Alternatively, you can install it using the <a target="_blank" href="https://github.com/hoptical/grafana-kafka-datasource/releases/latest">plugin ZIP file</a> or by following the provisioning instructions available on the plugin’s <a target="_blank" href="https://github.com/hoptical/grafana-kafka-datasource?tab=readme-ov-file#provisioning">GitHub page</a>.</p>
<pre><code class="lang-bash">grafana-cli plugins install hamedkarbasi93-kafka-datasource
</code></pre>
<p>In this article, we’ll focus on using the plugin to visualize <strong>nested JSON messages</strong> in Grafana. I’ll cover other features such as Avro in future articles.</p>
<h3 id="heading-what-you-need-before-you-start">What You Need Before You Start</h3>
<p>Before diving in, make sure you have:</p>
<ul>
<li><p>Grafana with the Kafka Data Source plugin installed</p>
</li>
<li><p>Access to a Kafka cluster, including its connection details (bootstrap servers and TLS/SASL credentials, if required)</p>
</li>
</ul>
<h3 id="heading-setting-up-your-data-source-in-grafana">Setting Up Your Data Source in Grafana</h3>
<p>Once the plugin is installed, you can add a new Kafka data source in Grafana and configure the following settings.</p>
<h4 id="heading-connection-settings">Connection Settings</h4>
<ul>
<li><p><strong>Bootstrap servers</strong><br />  Enter a comma-separated list of Kafka brokers.</p>
</li>
<li><p><strong>Security protocol</strong><br />  Choose one of the supported protocols:</p>
<ul>
<li><p><code>PLAINTEXT</code></p>
</li>
<li><p><code>SSL</code></p>
</li>
<li><p><code>SASL_PLAINTEXT</code></p>
</li>
<li><p><code>SASL_SSL</code></p>
</li>
</ul>
</li>
</ul>
<h4 id="heading-authentication-and-tls">Authentication and TLS</h4>
<p>Depending on the selected security protocol, you may need to configure authentication and TLS:</p>
<ul>
<li><p><strong>SASL settings</strong></p>
<ul>
<li><p>SASL mechanism</p>
</li>
<li><p>SASL username</p>
</li>
<li><p>SASL password</p>
</li>
</ul>
</li>
<li><p><strong>TLS settings</strong></p>
<ul>
<li><p>Server certificate</p>
</li>
<li><p>Client certificate</p>
</li>
<li><p>Client key</p>
</li>
</ul>
</li>
</ul>
<p>These fields are required only when SASL or TLS is enabled.</p>
<h4 id="heading-schema-registry-avro-only">Schema Registry (Avro Only)</h4>
<p>Schema Registry settings are <strong>only required when the message format is set to Avro</strong>. If you’re working with JSON messages, you can safely skip this section.</p>
<h4 id="heading-advanced-json-settings">Advanced JSON Settings</h4>
<p>For JSON messages, the plugin provides advanced controls to help manage complex and deeply nested structures:</p>
<ul>
<li><p><strong>Maximum JSON flatten depth</strong><br />  Controls how deeply nested objects are flattened.</p>
</li>
<li><p><strong>JSON field limit</strong><br />  Sets the maximum number of fields that will be expanded.</p>
</li>
</ul>
<p>These settings keep dashboards readable, prevent overly wide tables, and help protect Grafana from heavy data loads.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766099204254/1f317451-1336-4512-af8a-f92d05498d5a.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766099277422/a7a2f302-3a5c-4ce8-9c36-5fac4009251f.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-key-query-editor-fields">Key Query Editor Fields</h3>
<p>Once the data source is configured, you’ll use the query editor to define what data you want to read from Kafka and how it should appear in Grafana.</p>
<ul>
<li><p><strong>Topic</strong> — Enter the Kafka topic name. You can click <code>Fetch</code> to retrieve the available partitions for that topic.</p>
</li>
<li><p><strong>Partition</strong> — Choose <code>All partitions</code> or a specific partition.</p>
</li>
<li><p><strong>Offset</strong> — Select the starting position for the query:</p>
<ul>
<li><p><code>Latest</code> — read the newest messages.</p>
</li>
<li><p><code>Last N messages</code> — set N to read the most recent N messages.</p>
</li>
<li><p><code>Earliest</code> — start from the earliest available offset.</p>
</li>
</ul>
</li>
<li><p><strong>Timestamp Mode</strong> — Choose the timestamp for Grafana points:</p>
<ul>
<li><p><code>Kafka Event Time</code> — uses Kafka message metadata timestamp.</p>
</li>
<li><p><code>Dashboard received time</code> — uses the time Grafana receives the message.</p>
</li>
</ul>
</li>
<li><p><strong>Message Format</strong> — Set to <code>JSON</code> for JSON message parsing.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766099914314/48a6091d-2757-432f-b4a6-8fae7df90c27.png" alt class="image--center mx-auto" /></p>
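<p>Under the hood, the offset modes boil down to simple arithmetic against the partition's earliest and latest offsets. Here is a rough conceptual sketch in Python (the function and parameter names are illustrative, not the plugin's actual code):</p>
<pre><code class="lang-python">def resolve_start_offset(mode, beginning, end, n=0):
    """Return the offset a consumer would start reading from.

    beginning and end are the partition's earliest retained offset and
    the next offset to be written, as reported by the broker.
    """
    if mode == "earliest":
        return beginning
    if mode == "latest":
        return end  # only messages produced after the query starts
    if mode == "last-n":
        # at most the n most recent messages, never before retention start
        return max(end - n, beginning)
    raise ValueError("unknown mode: " + mode)
</code></pre>
<p>For a partition with earliest offset 100 and latest offset 250, <code>Last N messages</code> with N = 50 starts at offset 200, and asking for more messages than are retained simply falls back to the earliest available offset.</p>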
<h3 id="heading-mapping-json-to-grafana">Mapping JSON to Grafana</h3>
<p>When the <strong>Message Format</strong> is set to JSON, the plugin parses the Kafka record value as a JSON object and prepares it for visualization in Grafana. To make nested structures usable in dashboards, the plugin automatically <strong>flattens nested JSON into dot-delimited fields</strong>.</p>
<h4 id="heading-example-json-message">Example JSON Message</h4>
<pre><code class="lang-json">{
  <span class="hljs-attr">"service"</span>: {
    <span class="hljs-attr">"name"</span>: <span class="hljs-string">"api"</span>,
    <span class="hljs-attr">"latency"</span>: {
      <span class="hljs-attr">"p50"</span>: <span class="hljs-number">23</span>,
      <span class="hljs-attr">"p95"</span>: <span class="hljs-number">75</span>
    }
  },
  <span class="hljs-attr">"host"</span>: <span class="hljs-string">"node-2"</span>,
  <span class="hljs-attr">"requests"</span>: <span class="hljs-number">128</span>
}
</code></pre>
<h4 id="heading-after-flattening">After Flattening</h4>
<p>The JSON above is transformed into the following fields:</p>
<ul>
<li><p><code>service.name</code> → <code>"api"</code></p>
</li>
<li><p><code>service.latency.p50</code> → <code>23</code> <em>(numeric)</em></p>
</li>
<li><p><code>service.latency.p95</code> → <code>75</code> <em>(numeric)</em></p>
</li>
<li><p><code>host</code> → <code>"node-2"</code></p>
</li>
<li><p><code>requests</code> → <code>128</code> <em>(numeric)</em></p>
</li>
</ul>
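<p>The flattening itself is easy to picture in code. Here is a minimal Python sketch of dot-delimited flattening with a depth cap analogous to the <strong>Maximum JSON flatten depth</strong> setting (the plugin is written in Go; this is only an illustration, and the names are mine):</p>
<pre><code class="lang-python">def flatten(obj, max_depth=5, prefix="", depth=0):
    """Flatten nested dicts into dot-delimited field names."""
    fields = {}
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict) and max_depth > depth:
            # descend one level and prefix children with "parent."
            fields.update(flatten(value, max_depth, name + ".", depth + 1))
        else:
            # leaf value, or depth cap reached: emit as-is
            fields[name] = value
    return fields
</code></pre>
<p>Applied to the example message, this produces exactly the fields listed above. The real depth and field-limit settings bound the same kind of recursion so that a pathological message can't explode into thousands of columns.</p>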
<h4 id="heading-how-grafana-uses-these-fields">How Grafana Uses These Fields</h4>
<ul>
<li><p><strong>Numeric fields</strong> (such as latency percentiles or request counts) are ideal for <strong>time series visualizations</strong> like line charts.</p>
</li>
<li><p><strong>String or non-numeric fields</strong> work best in <strong>table panels</strong> or as labels and metadata.</p>
</li>
</ul>
<p>This mapping lets you turn raw Kafka messages into meaningful dashboards, and you can apply Grafana transformations such as group-by or filtering to shape the data into exactly what you need.</p>
<h2 id="heading-example-workflow">Example Workflow</h2>
<p>Let’s walk through a simple case study to see how everything comes together.</p>
<h3 id="heading-step-1-produce-sample-data">Step 1: Produce Sample Data</h3>
<p>Start by producing some sample JSON messages to a Kafka topic (for example, <code>test</code>). You can use the <a target="_blank" href="https://github.com/hoptical/grafana-kafka-datasource/tree/main/example#go">example producer</a> provided in the repository to generate messages like the one below:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"alerts"</span>: [
    {
      <span class="hljs-attr">"severity"</span>: <span class="hljs-string">"warning"</span>,
      <span class="hljs-attr">"type"</span>: <span class="hljs-string">"cpu_high"</span>,
      <span class="hljs-attr">"value"</span>: <span class="hljs-number">25.867063332016414</span>
    },
    {
      <span class="hljs-attr">"severity"</span>: <span class="hljs-string">"info"</span>,
      <span class="hljs-attr">"type"</span>: <span class="hljs-string">"mem_low"</span>,
      <span class="hljs-attr">"value"</span>: <span class="hljs-number">55.72157456801047</span>
    }
  ],
  <span class="hljs-attr">"host"</span>: {
    <span class="hljs-attr">"ip"</span>: <span class="hljs-string">"127.0.0.1"</span>,
    <span class="hljs-attr">"name"</span>: <span class="hljs-string">"srv-01"</span>
  },
  <span class="hljs-attr">"metrics"</span>: {
    <span class="hljs-attr">"cpu"</span>: {
      <span class="hljs-attr">"load"</span>: <span class="hljs-number">0.25867063332016416</span>,
      <span class="hljs-attr">"temp"</span>: <span class="hljs-number">65.21794911901341</span>
    },
    <span class="hljs-attr">"mem"</span>: {
      <span class="hljs-attr">"free"</span>: <span class="hljs-number">9065</span>,
      <span class="hljs-attr">"used"</span>: <span class="hljs-number">2767</span>
    }
  },
  <span class="hljs-attr">"processes"</span>: [
    <span class="hljs-string">"nginx"</span>,
    <span class="hljs-string">"mysql"</span>,
    <span class="hljs-string">"redis"</span>
  ],
  <span class="hljs-attr">"tags"</span>: [
    <span class="hljs-string">"prod"</span>,
    <span class="hljs-string">"edge"</span>
  ],
  <span class="hljs-attr">"value1"</span>: <span class="hljs-number">0.25867063332016416</span>,
  <span class="hljs-attr">"value2"</span>: <span class="hljs-number">1.1144314913602094</span>
}
</code></pre>
<p>This message includes a mix of nested objects, arrays, numeric values, and strings, making it a fairly realistic example of what Kafka messages often look like in practice.</p>
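<p>If you'd rather not run the Go producer, a stdlib-only Python sketch that emits equivalent payloads looks like this (the <code>kcat</code> pipeline in the comment is just one way to get the lines into Kafka; any console producer works):</p>
<pre><code class="lang-python">import json
import random

def build_sample_message():
    """Build a payload shaped like the example above, with random values."""
    cpu = random.random()
    return {
        "alerts": [
            {"severity": "warning", "type": "cpu_high", "value": cpu * 100},
            {"severity": "info", "type": "mem_low", "value": random.uniform(40, 60)},
        ],
        "host": {"ip": "127.0.0.1", "name": "srv-01"},
        "metrics": {
            "cpu": {"load": cpu, "temp": random.uniform(50, 80)},
            "mem": {"free": random.randint(1000, 16000),
                    "used": random.randint(1000, 16000)},
        },
        "processes": ["nginx", "mysql", "redis"],
        "tags": ["prod", "edge"],
        "value1": cpu,
        "value2": random.uniform(0, 2),
    }

if __name__ == "__main__":
    # One JSON message per line; pipe into a console producer, e.g.:
    #   python producer.py | kcat -P -b localhost:9092 -t test
    print(json.dumps(build_sample_message()))
</code></pre>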
<h3 id="heading-step-2-explore-the-data-in-grafana">Step 2: Explore the Data in Grafana</h3>
<p>Next, open the <strong>Explore</strong> page in your Grafana instance. Select the <strong>Kafka data source</strong> and configure the query editor with the following values:</p>
<ul>
<li><p><strong>Topic</strong>: <code>test</code></p>
</li>
<li><p><strong>Partition</strong>: All partitions</p>
</li>
<li><p><strong>Offset</strong>: Latest</p>
</li>
<li><p><strong>Timestamp Mode</strong>: Kafka Event Time</p>
</li>
<li><p><strong>Message Format</strong>: JSON</p>
</li>
</ul>
<p>Once the query is running, you should see <strong>live data streaming into Grafana as a table</strong>. Nested JSON fields will be flattened automatically, making it easy to inspect both numeric and non-numeric values.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766102645426/c8d5e8bb-eedd-4685-9f4f-3791f32edfd6.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-step-3-visualize-the-data-with-dashboards">Step 3: Visualize the Data with Dashboards</h3>
<p>To take this a step further, you can use the <a target="_blank" href="https://github.com/hoptical/grafana-kafka-datasource/tree/main/provisioning/dashboards"><strong>provisioned dashboard</strong></a> included in the repository to visualize the same Kafka data using multiple panels.</p>
<p>This dashboard demonstrates how different parts of the message can be visualized:</p>
<ul>
<li><p>numeric fields as time series</p>
</li>
<li><p>structured data in tables</p>
</li>
<li><p>multiple signals derived from the same Kafka topic</p>
</li>
</ul>
<p>It’s a simple example, but it highlights the core idea: Kafka data can be treated just like any other data source in Grafana and combined with existing dashboards and observability signals.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766103200480/42957da2-43b9-494b-93ae-a0a7591e4b5e.gif" alt class="image--center mx-auto" /></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Kafka plays a central role in many modern data platforms, but working with its data often requires jumping between specialized tools. At the same time, Grafana has become the place where teams come together to observe, correlate, and understand how their systems behave. The Kafka Data Source plugin is designed to bridge that gap.</p>
<p>By making Kafka data available directly in Grafana, the plugin allows you to visualize streaming data alongside existing dashboards, correlate it with metrics and alerts, and gain insights without leaving the tools your organization already relies on. It doesn’t replace Kafka-native tools; instead, it complements them by bringing Kafka data into the broader observability picture.</p>
<p>In this article, we focused on visualizing JSON messages and building simple, real-time workflows. In upcoming articles, we’ll explore more advanced features such as Avro support, schema registry integration, and additional use cases that build on the same core ideas.</p>
<h2 id="heading-your-turn">Your Turn</h2>
<ul>
<li><p>How would visualizing Kafka data in Grafana change the way your team monitors and understands your systems?</p>
</li>
<li><p>What challenges have you faced when working with Kafka data, and how do you currently deal with them?</p>
</li>
<li><p>What’s the first Kafka use case you’d try visualizing in Grafana?</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Securing Kafka: Demystifying SASL, SSL, and Authentication Essentials]]></title><description><![CDATA[The first time I had to secure a Kafka cluster, I felt like I was drowning in acronyms: SASL, SSL, SCRAM, Kerberos… The more I delved into the documentation, the more confused I became.
If you’ve felt the same, don’t worry! You’re not alone. Kafka se...]]></description><link>https://hamedkarbasi.com/securing-kafka-demystifying-sasl-ssl-and-authentication-essentials</link><guid isPermaLink="true">https://hamedkarbasi.com/securing-kafka-demystifying-sasl-ssl-and-authentication-essentials</guid><category><![CDATA[kafka]]></category><category><![CDATA[Security]]></category><category><![CDATA[SSL]]></category><category><![CDATA[SSL/TLS]]></category><category><![CDATA[Apache Kafka]]></category><dc:creator><![CDATA[Hamed Karbasi]]></dc:creator><pubDate>Sun, 31 Aug 2025 14:36:38 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/7k-o-54prMI/upload/f2a552de95d842db9f47cbce55809395.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The first time I had to secure a Kafka cluster, I felt like I was drowning in acronyms: SASL, SSL, SCRAM, Kerberos… The more I delved into the <a target="_blank" href="https://kafka.apache.org/documentation/#security">documentation</a>, the more confused I became.</p>
<p>If you’ve felt the same, don’t worry! You’re not alone. Kafka security <em>sounds</em> more complicated than it is. Once you separate <strong>protocols</strong> (the road your data travels on) from <strong>mechanisms</strong> (how you prove who you are), it starts making sense.</p>
<p>At each step, I'll break down the jargon and provide practical configuration examples.</p>
<h2 id="heading-why-kafka-security-even-matters">Why Kafka Security Even Matters</h2>
<p>Kafka often sits at the heart of an organization’s data pipeline. It’s where clickstream data, financial transactions, and system logs all flow.</p>
<p>If it’s left unsecured:</p>
<ul>
<li><p>Anyone could publish garbage messages.</p>
</li>
<li><p>Attackers could silently read sensitive data.</p>
</li>
<li><p>A bad actor could impersonate a legitimate client.</p>
</li>
</ul>
<p>That’s why Kafka bakes in a flexible security model. You just have to piece together the building blocks.</p>
<hr />
<h2 id="heading-protocols-vs-mechanisms-a-simple-analogy">Protocols vs. Mechanisms: A Simple Analogy</h2>
<p>Here’s the mental model that helped me:</p>
<ul>
<li><p><strong>Protocol = the road.</strong> Is it safe to drive on? Is it fenced off? Or is it a wide-open dirt path anyone can walk onto?</p>
</li>
<li><p><strong>Mechanism = the toll booth.</strong> How do you prove you belong there? Show an ID, enter a password, flash a badge?</p>
</li>
</ul>
<p>Kafka gives you both:</p>
<ul>
<li><p><strong>SSL/TLS</strong> = the secure road (encrypted tunnel).</p>
</li>
<li><p><strong>SASL</strong> = the toll booth framework (authentication).</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756649641688/ea2915d8-dcec-451f-a4af-5415582b6ea2.png" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-ssltls-the-secure-tunnel">SSL/TLS: The Secure Tunnel</h2>
<p>SSL/TLS makes sure no one can peek into or tamper with the messages flowing between your clients and brokers.</p>
<p>How it works in practice:</p>
<ol>
<li><p>The Kafka broker shows a certificate, proving it’s legit.</p>
</li>
<li><p>The client checks if the broker’s certificate can be trusted.</p>
</li>
<li><p>Optionally, the client shows its own certificate back (mutual TLS).</p>
</li>
</ol>
<p>Here’s what that looks like in Kafka broker config:</p>
<pre><code class="lang-ini"><span class="hljs-attr">listeners</span>=SSL://:<span class="hljs-number">9092</span>
<span class="hljs-attr">ssl.keystore.location</span>=/etc/kafka/secrets/kafka.server.keystore.jks
<span class="hljs-attr">ssl.keystore.password</span>=YOUR_PASSWORD
<span class="hljs-attr">ssl.key.password</span>=YOUR_PASSWORD
<span class="hljs-attr">ssl.truststore.location</span>=/etc/kafka/secrets/kafka.server.truststore.jks
<span class="hljs-attr">ssl.truststore.password</span>=YOUR_PASSWORD
</code></pre>
<p>From then on, all traffic over port <code>9092</code> is encrypted.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756650416142/ec083226-f8d8-4807-a9e2-18ed1a897ad9.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-what-are-the-keystore-and-truststore">What are the Keystore and Truststore?</h3>
<ul>
<li><p><strong>Keystore</strong>: Think of it as your ID wallet. It holds your private key and certificate that prove who you are (the broker in this case).</p>
</li>
<li><p><strong>Truststore</strong>: This is your list of trusted IDs. It contains certificates from other parties you trust (like clients or certificate authorities).</p>
</li>
</ul>
<p>In the example above, the broker uses its keystore to prove its identity to clients, and its truststore to verify client certificates when mutual TLS is enabled. In short, the keystore is how you prove who you are; the truststore is how you decide whom to trust.</p>
<h3 id="heading-what-are-those-jks-files">What are those JKS files?</h3>
<p><code>.jks</code> files are Java KeyStores, a file format for storing cryptographic keys and certificates. In Kafka, they are used for SSL/TLS encryption to secure communication between clients and brokers. Other ecosystems commonly use formats like PEM or PFX; Kafka has traditionally relied on JKS due to its Java foundation, although recent Kafka releases can also be configured with PEM files directly.</p>
<p>You can generate JKS files using Java's <code>keytool</code> utility. For a step-by-step guide, check out <a target="_blank" href="https://dzone.com/articles/keytool-commandutility-to-generate-a-keystorecerti">this resource</a>.</p>
<h3 id="heading-understanding-ssl-vs-tls-whats-the-difference"><strong>Understanding SSL vs. TLS: What's the Difference?</strong></h3>
<p>SSL (Secure Sockets Layer) and TLS (Transport Layer Security) are like the older and younger siblings in the world of secure communication protocols. While they often get mentioned together, they have some important differences:</p>
<ol>
<li><p><strong>Evolution</strong>: Think of SSL as the trailblazer. It paved the way, but TLS took over with more advanced versions. SSL 3.0 was the last of its kind, and then TLS 1.0 came along, evolving through versions 1.1, 1.2, and now the latest, 1.3, each bringing stronger security features.</p>
</li>
<li><p><strong>Security Enhancements</strong>: TLS is like a fortified castle compared to SSL's wooden fort. It uses stronger encryption algorithms and more secure hash functions, making it a tough nut to crack against modern cyber threats.</p>
</li>
<li><p><strong>Deprecation of SSL</strong>: SSL has had its day in the sun, but due to vulnerabilities, it's now considered outdated and insecure. Most systems have moved on to TLS to keep data safe and sound.</p>
</li>
<li><p><strong>Handshake Process</strong>: Both protocols use a handshake to establish a secure connection, but TLS does it with more finesse. Its handshakes are more efficient and secure, especially in the latest versions.</p>
</li>
<li><p><strong>Backward Compatibility</strong>: TLS is designed to be backward compatible with SSL, allowing systems to support both during transitions. However, it's wise to disable SSL support to avoid potential security risks.</p>
</li>
</ol>
<p>In a nutshell, while SSL laid the groundwork, TLS is the modern standard that offers enhanced security and performance. When securing Kafka or any other system, using the latest version of TLS is crucial to fend off vulnerabilities.</p>
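<p>This deprecation story is visible directly in everyday TLS APIs. As an illustration (using Python's standard <code>ssl</code> module, not a Kafka client), pinning a minimum protocol version is a one-liner, and certificate verification is on by default, much like a Kafka client with a truststore:</p>
<pre><code class="lang-python">import ssl

# Client-side context that trusts the system CA store by default
ctx = ssl.create_default_context()

# Refuse anything older than TLS 1.2; SSL 3.0 and TLS 1.0/1.1 are
# considered broken and should stay disabled
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

# Certificate and hostname verification are enabled out of the box
assert ctx.verify_mode == ssl.CERT_REQUIRED
assert ctx.check_hostname
</code></pre>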
<hr />
<h2 id="heading-sasl-the-authentication-framework">SASL: The Authentication Framework</h2>
<p>SASL (Simple Authentication and Security Layer) isn’t encryption; it’s a framework for checking identities. You can run SASL on:</p>
<ul>
<li><p><strong>PLAINTEXT</strong> (not secure, don’t use in production).</p>
</li>
<li><p><strong>SSL</strong> (secure, because SSL wraps the authentication exchange in encryption).</p>
</li>
</ul>
<p>So in real deployments, you’ll almost always see <strong>SASL_SSL</strong>.</p>
<h3 id="heading-sasl-mechanisms-in-kafka">SASL Mechanisms in Kafka</h3>
<p>Here are the main authentication options you can plug into SASL:</p>
<ul>
<li><p><strong>PLAIN</strong></p>
<ul>
<li><p>Username + password in clear text.</p>
</li>
<li><p>Fine for quick tests, unsafe in production.</p>
</li>
</ul>
</li>
<li><p><strong>SCRAM (Salted Challenge Response Authentication Mechanism)</strong></p>
<ul>
<li><p>Like PLAIN but more secure.</p>
</li>
<li><p>Passwords are stored salted + hashed.</p>
</li>
<li><p>Production-friendly.</p>
</li>
</ul>
</li>
<li><p><strong>GSSAPI (Kerberos)</strong></p>
<ul>
<li><p>Great if your company already uses Kerberos.</p>
</li>
<li><p>Heavyweight, but enterprise-grade.</p>
</li>
</ul>
</li>
<li><p><strong>OAUTHBEARER</strong></p>
<ul>
<li><p>Uses OAuth 2.0 tokens.</p>
</li>
<li><p>Perfect for integrating with modern identity providers (Okta, Keycloak, etc.).</p>
</li>
</ul>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756650550968/8ffc6da4-39d8-48f8-8bbf-f9cab6c875ac.png" alt class="image--center mx-auto" /></p>
<h4 id="heading-example-saslplaintext-with-plain">Example: SASL_PLAINTEXT with PLAIN</h4>
<p>Let's say you want to set up a quick test environment using SASL_PLAINTEXT with the PLAIN mechanism. Here's how you can configure both the Kafka broker and client.</p>
<p>Broker config:</p>
<pre><code class="lang-ini"><span class="hljs-attr">listeners</span>=SASL_PLAINTEXT://:<span class="hljs-number">9092</span>
<span class="hljs-attr">sasl.enabled.mechanisms</span>=PLAIN
<span class="hljs-attr">sasl.mechanism.inter.broker.protocol</span>=PLAIN
<span class="hljs-attr">security.inter.broker.protocol</span>=SASL_PLAINTEXT
</code></pre>
<p><strong>Wait! inter.broker.protocol? What is that?</strong></p>
<p>This setting defines how Kafka brokers authenticate with each other. In a multi-broker setup, brokers need to communicate securely. By setting <code>sasl.mechanism.inter.broker.protocol=PLAIN</code>, you're specifying that brokers will use the PLAIN mechanism for their internal communication. Of course, you can choose the same mechanism as clients or a different one, depending on your security requirements.</p>
<h4 id="heading-example-saslplaintext-with-scram">Example: SASL_PLAINTEXT with SCRAM</h4>
<p>This is a step up from PLAIN, adding better password security:</p>
<p>Broker config:</p>
<pre><code class="lang-ini"><span class="hljs-attr">listeners</span>=SASL_PLAINTEXT://:<span class="hljs-number">9092</span>
<span class="hljs-attr">sasl.enabled.mechanisms</span>=SCRAM-SHA-<span class="hljs-number">512</span>
<span class="hljs-attr">sasl.mechanism.inter.broker.protocol</span>=SCRAM-SHA-<span class="hljs-number">512</span>
<span class="hljs-attr">security.inter.broker.protocol</span>=SASL_PLAINTEXT
</code></pre>
<p><strong>How is SCRAM more secure than PLAIN?</strong></p>
<p>SCRAM enhances security over PLAIN in several key ways:</p>
<ol>
<li><p><strong>Password Storage</strong>: In PLAIN, passwords are stored in clear text, making them vulnerable if the storage is compromised. SCRAM stores passwords in a salted and hashed format, which means even if someone gains access to the stored credentials, they can't easily retrieve the original passwords.</p>
</li>
<li><p><strong>Challenge-Response Mechanism</strong>: SCRAM uses a challenge-response mechanism during authentication. This means that the password is never sent over the network in clear text. Instead, a challenge is issued, and the client responds with a hashed version of the password combined with the challenge, making it much harder for attackers to intercept and misuse the password.</p>
</li>
<li><p><strong>Salting</strong>: SCRAM adds a unique salt to each password before hashing it. This means that even if two users have the same password, their stored hashes will be different, protecting against rainbow table attacks.</p>
</li>
<li><p><strong>Iterative Hashing</strong>: SCRAM allows for multiple iterations of hashing, which increases the time it takes to compute the hash. This makes brute-force attacks significantly more difficult, as attackers would need to spend more time and resources to guess passwords.</p>
</li>
</ol>
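<p>The salting and iterative-hashing ideas are easy to demonstrate with Python's standard library. This is a conceptual sketch of the storage scheme, not Kafka's actual SCRAM implementation (which follows RFC 5802 and uses a PBKDF2-style construction for its <code>Hi</code> function):</p>
<pre><code class="lang-python">import hashlib
import os

def store_password(password, iterations=4096):
    """Return (salt, hash): what a SCRAM-style server stores, never the password."""
    salt = os.urandom(16)  # unique per user
    digest = hashlib.pbkdf2_hmac("sha512", password.encode(), salt, iterations)
    return salt, digest

def verify(password, salt, digest, iterations=4096):
    return hashlib.pbkdf2_hmac("sha512", password.encode(), salt, iterations) == digest

# Two users with the same password end up with different stored hashes
# (salting), which defeats precomputed rainbow tables
salt_a, hash_a = store_password("hunter2")
salt_b, hash_b = store_password("hunter2")
assert hash_a != hash_b
assert verify("hunter2", salt_a, hash_a)
assert not verify("wrong", salt_a, hash_a)
</code></pre>
<p>Raising the iteration count makes each guess proportionally more expensive for an attacker, which is exactly the brute-force defense described above.</p>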
<hr />
<h2 id="heading-put-them-all-together-saslssl">Putting It All Together: SASL_SSL</h2>
<p>This combo is one of the most common in production because it’s secure without being overcomplicated.</p>
<p>Broker config:</p>
<pre><code class="lang-ini"><span class="hljs-attr">listeners</span>=SASL_SSL://:<span class="hljs-number">9094</span>
<span class="hljs-attr">sasl.enabled.mechanisms</span>=SCRAM-SHA-<span class="hljs-number">512</span>
<span class="hljs-attr">sasl.mechanism.inter.broker.protocol</span>=SCRAM-SHA-<span class="hljs-number">512</span>
<span class="hljs-attr">security.inter.broker.protocol</span>=SASL_SSL
<span class="hljs-attr">ssl.keystore.location</span>=/etc/kafka/secrets/kafka.server.keystore.jks
<span class="hljs-attr">ssl.keystore.password</span>=YOUR_PASSWORD
<span class="hljs-attr">ssl.key.password</span>=YOUR_PASSWORD
<span class="hljs-attr">ssl.truststore.location</span>=/etc/kafka/secrets/kafka.server.truststore.jks
<span class="hljs-attr">ssl.truststore.password</span>=YOUR_PASSWORD
</code></pre>
<p>Client config:</p>
<pre><code class="lang-ini"><span class="hljs-attr">security.protocol</span>=SASL_SSL
<span class="hljs-attr">sasl.mechanism</span>=SCRAM-SHA-<span class="hljs-number">512</span>
<span class="hljs-attr">sasl.jaas.config</span>=org.apache.kafka.common.security.scram.ScramLoginModule required \
    <span class="hljs-attr">username</span>=<span class="hljs-string">"alice"</span> \
    <span class="hljs-attr">password</span>=<span class="hljs-string">"alice-secret"</span><span class="hljs-comment">;</span>
</code></pre>
<p>Result:</p>
<ul>
<li><p>All traffic is encrypted.</p>
</li>
<li><p>Only clients with valid SCRAM credentials get in.</p>
</li>
</ul>
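<p>The same client settings carry over almost one-to-one into application code. For example, with the <code>kafka-python</code> library (the dictionary keys below are that library's parameter names; the paths and credentials are placeholders):</p>
<pre><code class="lang-python"># Mirrors the ini-style client config above; pass to kafka-python as
# KafkaProducer(**conf) or KafkaConsumer("my-topic", **conf)
conf = {
    "bootstrap_servers": "kafka:9094",
    "security_protocol": "SASL_SSL",
    "sasl_mechanism": "SCRAM-SHA-512",
    "sasl_plain_username": "alice",
    "sasl_plain_password": "alice-secret",
    # CA certificate used to verify the broker (PEM on the Python side, not JKS)
    "ssl_cafile": "/etc/kafka/secrets/ca.pem",
}
</code></pre>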
<h3 id="heading-docker-compose-example">Docker Compose Example</h3>
<p>Here is a full example using Docker Compose with Bitnami's Kafka image, which exposes three listeners out of the box: PLAINTEXT, SASL_PLAINTEXT with SCRAM, and SASL_SSL with SCRAM. This setup includes SSL certificates and user credentials. For more details, check out <a target="_blank" href="https://github.com/hoptical/kafka-docker-example/tree/main">this GitHub repo</a>, where I explain how to generate the certificates and run test clients against the different mechanisms and protocols.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-string">'3.8'</span>
<span class="hljs-attr">services:</span>
  <span class="hljs-attr">kafka:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">bitnami/kafka:3.7</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">kafka</span>
    <span class="hljs-attr">environment:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_KRAFT_MODE=true</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_PROCESS_ROLES=broker,controller</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_NODE_ID=1</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=1@kafka:9093</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,SASL_PLAINTEXT://:29092,SASL_SSL://:39092,CONTROLLER://:9093</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092,SASL_PLAINTEXT://kafka:29092,SASL_SSL://kafka:39092</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=SASL_PLAINTEXT:SASL_PLAINTEXT,PLAINTEXT:PLAINTEXT,SASL_SSL:SASL_SSL,CONTROLLER:PLAINTEXT</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_SSL_CLIENT_AUTH=required</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_SSL_KEYSTORE_LOCATION=/bitnami/kafka/config/certs/kafka.keystore.jks</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_SSL_KEYSTORE_PASSWORD=bitnami123</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_SSL_KEY_PASSWORD=bitnami123</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_SSL_TRUSTSTORE_LOCATION=/bitnami/kafka/config/certs/kafka.truststore.jks</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_SSL_TRUSTSTORE_PASSWORD=bitnami123</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_SASL_ENABLED_MECHANISMS=SCRAM-SHA-512</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_SASL_MECHANISM_INTER_BROKER_PROTOCOL=SCRAM-SHA-512</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_INTER_BROKER_LISTENER_NAME=SASL_PLAINTEXT</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CLIENT_USERS=testuser</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CLIENT_PASSWORDS=testpass</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_INTER_BROKER_USER=admin</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_INTER_BROKER_PASSWORD=adminpass</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_SUPER_USERS=User:admin</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE=true</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">ALLOW_PLAINTEXT_LISTENER=yes</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">KAFKA_CFG_CLUSTER_ID=VkGbvjMzQNKtC-P_RMzqgg</span>
    <span class="hljs-attr">ports:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"9092:9092"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"29092:29092"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"39092:39092"</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">./certs:/bitnami/kafka/config/certs:ro</span>
</code></pre>
<h4 id="heading-what-are-those-certs">What are those certs?</h4>
<p>The <code>certs</code> directory contains the necessary SSL certificates and keystores for secure communication. You can generate these using tools like OpenSSL or Java's keytool.</p>
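<p>A minimal sketch of that generation step, assuming OpenSSL (and optionally a JDK for <code>keytool</code>) is installed. The truststore filename and password match the compose file above; a complete setup would also create a broker keystore signed by this CA:</p>

```shell
# Sketch: create a self-signed CA with OpenSSL, then (if a JDK is available)
# import it into a JKS truststore with keytool. Subject name is a placeholder.
mkdir -p certs
openssl req -x509 -newkey rsa:2048 -days 365 -nodes \
  -keyout certs/ca.key -out certs/ca.crt -subj "/CN=kafka-ca"
# keytool ships with the JDK; skip this step gracefully if it is absent
if command -v keytool >/dev/null 2>&1; then
  keytool -importcert -noprompt -alias ca -file certs/ca.crt \
    -keystore certs/kafka.truststore.jks -storepass bitnami123
fi
```

<p>Mount the resulting <code>certs</code> directory into the container as shown in the <code>volumes</code> section above.</p>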
<hr />
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<h3 id="heading-common-real-world-patterns">Common Real-World Patterns</h3>
<ul>
<li><p><strong>SSL only</strong> → Secure channel, optional client certs.</p>
</li>
<li><p><strong>SASL_SSL + SCRAM</strong> → The sweet spot for many teams.</p>
</li>
<li><p><strong>SASL_SSL + Kerberos/OAUTHBEARER</strong> → Enterprises with Kerberos in place.</p>
</li>
</ul>
<h3 id="heading-best-practices-learned-the-hard-way">Best Practices (Learned the Hard Way)</h3>
<ul>
<li><p>Avoid using SASL_PLAINTEXT outside of development environments unless you are certain the network channel is secure.</p>
</li>
<li><p>Rotate credentials and certificates regularly.</p>
</li>
<li><p>Prefer <strong>SCRAM</strong> or <strong>OAUTHBEARER</strong> over PLAIN for authentication.</p>
</li>
<li><p>Use Kafka ACLs to enforce the least privilege.</p>
</li>
</ul>
<h3 id="heading-conclusion">Conclusion</h3>
<p>Securing Kafka might seem daunting at first because of the myriad acronyms and configurations, but breaking it down into its core components, protocols and mechanisms, simplifies the process. Understand SSL/TLS as the secure channel and SASL as the authentication framework, and protecting your cluster becomes a matter of picking the right combination and following the basics: prefer SCRAM or OAUTHBEARER for authentication, rotate credentials and certificates regularly, and enforce least privilege with Kafka ACLs.</p>
<p>The trick to understanding Kafka security is simple:</p>
<ul>
<li><p><strong>SSL/TLS</strong> = the secure pipe (encryption).</p>
</li>
<li><p><strong>SASL</strong> = the framework for authentication.</p>
</li>
<li><p><strong>Mechanisms</strong> = the actual way you prove identity.</p>
</li>
</ul>
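<p>From the client's point of view, those three pieces map to a handful of settings. A hedged example of a client configuration for SASL_SSL with SCRAM-SHA-512 (host, paths, and credentials are placeholders):</p>

```properties
# client.properties: SASL_SSL with SCRAM-SHA-512 (placeholder values)
bootstrap.servers=broker.example.com:9093
# the secure pipe: TLS encryption plus SASL authentication
security.protocol=SASL_SSL
# the mechanism: how the client proves its identity
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="testuser" password="testpass";
ssl.truststore.location=/etc/kafka/certs/kafka.truststore.jks
ssl.truststore.password=bitnami123
```

<p>Swap <code>sasl.mechanism</code> and the login module to change how identity is proven, without touching the encrypted channel.</p>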
<p>Once you see it this way, the acronym soup clears up, and securing Kafka stops feeling like dark magic.</p>
]]></content:encoded></item><item><title><![CDATA[Harnessing Ceph S3 with Kubernetes: An Operator Solution]]></title><description><![CDATA[Photo by Sixteen Miles Out on Unsplash
Introduction
Ceph & Rados Gateway
Ceph is an open-source, distributed storage system designed for scalability and reliability, providing unified solutions for block, file, and object storage under a single platf...]]></description><link>https://hamedkarbasi.com/harnessing-ceph-s3-with-kubernetes-an-operator-solution-005b1103c753</link><guid isPermaLink="true">https://hamedkarbasi.com/harnessing-ceph-s3-with-kubernetes-an-operator-solution-005b1103c753</guid><category><![CDATA[ceph]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[S3]]></category><category><![CDATA[Amazon S3]]></category><category><![CDATA[kubernetes-operators]]></category><category><![CDATA[openshift]]></category><dc:creator><![CDATA[Hamed Karbasi]]></dc:creator><pubDate>Fri, 05 Apr 2024 10:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383453458/345a99de-e8fc-45cb-b5ff-74f5db833ae9.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Photo by <a target="_blank" href="https://unsplash.com/@sixteenmilesout?utm_source=medium&amp;utm_medium=referral">Sixteen Miles Out</a> on <a target="_blank" href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></p>
<h2 id="heading-introduction">Introduction</h2>
<h3 id="heading-ceph-amp-rados-gateway">Ceph &amp; Rados Gateway</h3>
<p><a target="_blank" href="https://ceph.io/en/">Ceph</a> is an open-source, distributed storage system designed for scalability and reliability, providing unified solutions for block, file, and object storage under a single platform. At its core lies the Reliable Autonomic Distributed Object Store (RADOS), which ensures data is stored fault-tolerant across multiple storage devices.</p>
<p><a target="_blank" href="https://docs.ceph.com/en/quincy/radosgw/">Ceph RGW</a> (Rados Gateway) is an integral part of Ceph that offers object storage accessible via RESTful APIs, compatible with <strong>Amazon S3</strong> and OpenStack Swift protocols. By leveraging Ceph's distributed architecture, RGW provides a scalable, high-performance platform for storing and retrieving unstructured data such as images, videos, and backups. It is a crucial component for applications demanding highly available and scalable object storage solutions.</p>
<h3 id="heading-ok-what-is-the-problem">OK! What is the problem?</h3>
<p>Nowadays, many private cloud providers utilize Ceph and <a target="_blank" href="https://docs.ceph.com/en/latest/cephadm/services/rgw/">RGW</a> to deliver S3-compatible storage to their users. However, in production, under the demands of many users, you (as the cloud administrator) end up with an overwhelming stream of requests about S3 user, bucket, and quota management:</p>
<ul>
<li><p>Create or delete my user on S3</p>
</li>
<li><p>Increase or decrease my user quota</p>
</li>
<li><p>Create or delete a bucket</p>
</li>
<li><p>Create sub-users and give fine-grained access to my bucket</p>
</li>
</ul>
<p>To ease this burden, we propose a Kubernetes-native solution that enables users to handle such requests independently, putting less load on administrators and more capability in users' hands.</p>
<h3 id="heading-kubernetes-operators">Kubernetes Operators</h3>
<p>Kubernetes Operators automate application management in Kubernetes environments. They are commonly developed with the <a target="_blank" href="https://sdk.operatorframework.io/">Operator SDK</a>, which simplifies creating and managing operators by encapsulating operational knowledge into code. This approach efficiently manages complex applications, significantly enhancing automation and operational efficiency. By extending Kubernetes' capabilities with custom resources, Operators offer a powerful method for automating deployment, scaling, and management tasks, streamlining the lifecycle of distributed systems.</p>
<h2 id="heading-enough-what-is-your-solution">Enough! What is your solution?</h2>
<p>We have developed the <a target="_blank" href="https://github.com/snapp-incubator/ceph-s3-operator"><strong><em>Ceph S3 Operator</em></strong></a><em>,</em> an open-source Kubernetes operator crafted to streamline the management of S3 users and buckets within a Ceph cluster environment. After installation, users can provision their S3User or S3Bucket with the Kubernetes objects, enabling them to utilize GitOps solutions like ArgoCD to manage S3 users or buckets.</p>
<h3 id="heading-features">Features</h3>
<ul>
<li><p>S3 User Management</p>
</li>
<li><p>Bucket Management</p>
</li>
<li><p>Subuser Support</p>
</li>
<li><p>Bucket policy Support</p>
</li>
<li><p>Quota Management</p>
</li>
<li><p>Webhook Integration</p>
</li>
<li><p>E2E Testing</p>
</li>
<li><p>Helm Chart and OLM Support</p>
</li>
</ul>
<h3 id="heading-ceph-s3-operator-vs-cosi">Ceph S3 Operator vs. COSI</h3>
<p>When managing users and buckets via Kubernetes Custom Resource Definitions (CRDs) on Ceph storage with an S3-compatible API, you may come across the <a target="_blank" href="https://container-object-storage-interface.github.io/">Container Object Storage Interface (COSI)</a> as an alternative solution. This raises the question: Why did we choose to develop a new operator from scratch instead of utilizing the COSI standard or its <a target="_blank" href="https://github.com/ceph/ceph-cosii">implementations</a>? Although COSI offers similar features, it has some significant limitations. COSI currently lacks support for essential features such as quota validation and bucket access policies, which are crucial for many teams to manage and control their storage resources effectively. By developing a dedicated operator, we aim to provide a more flexible and feature-rich solution that caters to a wider range of organizational needs.</p>
<h3 id="heading-prerequisites">Prerequisites</h3>
<ul>
<li><p>Kubernetes v1.23.0+</p>
</li>
<li><p>Ceph v14.2.10+: prior Ceph versions <a target="_blank" href="https://github.com/ceph/ceph/pull/33714">don't support the sub-user bucket policy</a>. Nevertheless, other features are expected to work correctly within those earlier releases.</p>
</li>
<li><p><a target="_blank" href="https://github.com/snapp-incubator/ceph-s3-operator/blob/main/config/external-crd/openshift-clusterresourcequota.yaml">ClusterResourceQuota</a> CRD (already installed in OpenShift clusters): <code>kubectl apply -f config/external-crd</code></p>
</li>
</ul>
<h2 id="heading-installation">Installation</h2>
<h3 id="heading-using-operator-life-cycle-mhttpsolmoperatorframeworkioanager-olm">Using <a target="_blank" href="https://olm.operatorframework.io/">Operator Life Cycle M</a>anager (OLM):</h3>
<p>If you have already <a target="_blank" href="https://olm.operatorframework.io/docs/getting-started/">installed OLM</a> on your Kubernetes cluster, you can install the operator from <a target="_blank" href="https://operatorhub.io/operator/ceph-s3-operator">OperatorHub</a>:</p>
<ol>
<li>Create a <em>secret</em> containing your Ceph Cluster credentials:</li>
</ol>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Secret</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">s3-operator-controller-manager-config-override</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">operators</span>

<span class="hljs-attr">stringData:</span>
  <span class="hljs-attr">config.yaml:</span> <span class="hljs-string">|
    s3UserClass: ceph-default
    clusterName: production
    validationWebhookTimeoutSeconds: 10
    rgw:
      endpoint: &lt;CEPH_HTTP_URL&gt;
      accessKey: &lt;S3_ADMIN_ACCESS_KEY&gt;
      secretKey: &lt;S3_ADMIN_SECRET_KEY&gt;
</span>
<span class="hljs-attr">type:</span> <span class="hljs-string">Opaque</span>
</code></pre>
<p>Remember to substitute the Ceph endpoint and the admin access and secret keys accordingly in the secret object above.</p>
<p>2. Create the <em>Subscription</em>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">operators.coreos.com/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Subscription</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">ceph-s3-operator-subscription</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">operators</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">channel:</span> <span class="hljs-string">stable</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">ceph-s3-operator</span>
  <span class="hljs-attr">source:</span> <span class="hljs-string">operatorhubio-catalog</span>
  <span class="hljs-attr">sourceNamespace:</span> <span class="hljs-string">olm</span>
  <span class="hljs-attr">config:</span>
    <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">config</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">items:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">config.yaml</span>
          <span class="hljs-attr">path:</span> <span class="hljs-string">config.yaml</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">s3-operator-controller-manager-config-override</span>
    <span class="hljs-attr">volumeMounts:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/s3-operator/config/</span>
      <span class="hljs-attr">name:</span> <span class="hljs-string">config</span>
</code></pre>
<p>3. Create the <em>Operator Group</em></p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">operators.coreos.com/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">OperatorGroup</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">global-operators</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">operators</span>
</code></pre>
<h3 id="heading-using-helm">Using Helm:</h3>
<ol>
<li>Create a <code>custom-values.yaml</code> file and specify your Ceph cluster configuration. You can also set other operator configurations, such as its resources, according to the chart's <a target="_blank" href="https://github.com/snapp-incubator/ceph-s3-operator/blob/main/charts/ceph-s3-operator/values.yaml">values.yaml</a>.</li>
</ol>
<pre><code class="lang-yaml"><span class="hljs-comment"># custom-values.yaml</span>
<span class="hljs-attr">controllerManagerConfig:</span>
  <span class="hljs-attr">configYaml:</span> <span class="hljs-string">|
    s3UserClass: ceph-default
    clusterName: production
    validationWebhookTimeoutSeconds: 10
    rgw:
      endpoint: &lt;CEPH_HTTP_URL&gt;
      accessKey: &lt;S3_ADMIN_ACCESS_KEY&gt;
      secretKey: &lt;S3_ADMIN_SECRET_KEY&gt;</span>
</code></pre>
<p>2. Install the helm chart:</p>
<pre><code class="lang-bash">helm upgrade --install ceph-s3-operator \
oci://ghcr.io/snapp-incubator/ceph-s3-operator/helm-charts/ceph-s3-operator \
--version v0.3.7 \
--values custom-values.yaml
</code></pre>
<p>Whether installed via OLM or Helm, if everything goes well, you should see the controller pod running:</p>
<pre><code class="lang-bash">kubectl get pods -n operators
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383447920/30bb8612-c2ad-476f-8229-b93de48d9327.png" alt /></p>
<h2 id="heading-usage">Usage</h2>
<h3 id="heading-s3userclaim">S3UserClaim</h3>
<ol>
<li>Because the operator validates S3 quotas against both the cluster resource quota (CRQ) and the namespace resource quota (RQ), we first need to create those objects with hard quotas defined by the parameters below:</li>
</ol>
<ul>
<li><p>s3/objects: Maximum number of objects the user can store</p>
</li>
<li><p>s3/size: Maximum total size of the data the user can store, in bytes</p>
</li>
<li><p>s3/buckets: Maximum number of buckets the user can create</p>
</li>
</ul>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">quota.openshift.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ClusterResourceQuota</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">myteam</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">quota:</span>
    <span class="hljs-attr">hard:</span>
      <span class="hljs-attr">s3/objects:</span> <span class="hljs-string">5k</span>
      <span class="hljs-attr">s3/size:</span> <span class="hljs-string">22k</span>
      <span class="hljs-attr">s3/buckets:</span> <span class="hljs-string">5k</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">labels:</span>
      <span class="hljs-attr">matchLabels:</span>
        <span class="hljs-attr">snappcloud.io/team:</span> <span class="hljs-string">myteam</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ResourceQuota</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">example-quota</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">s3-test</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">hard:</span>
    <span class="hljs-attr">s3/objects:</span> <span class="hljs-number">1000</span>
    <span class="hljs-attr">s3/size:</span> <span class="hljs-number">20000</span>
    <span class="hljs-attr">s3/buckets:</span> <span class="hljs-number">15</span>
</code></pre>
<p>2. Now, we need to add the <code>snappcloud.io/team</code> label specified in the previous step to the namespace where we plan to deploy the operator objects (<code>s3-test</code> here):</p>
<pre><code class="lang-bash">kubectl label namespace s3-test snappcloud.io/team=myteam
</code></pre>
<p>3. Create an S3UserClaim object. With the specified quota, this creates a user, a read-only sub-user, and two other sub-users on Ceph RGW.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">s3.snappcloud.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">S3UserClaim</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">s3userclaim-sample</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">s3-test</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">adminSecret:</span> <span class="hljs-string">s3-sample-admin-secret</span>
  <span class="hljs-attr">quota:</span>
    <span class="hljs-attr">maxBuckets:</span> <span class="hljs-number">5</span>
    <span class="hljs-attr">maxObjects:</span> <span class="hljs-number">1000</span>
    <span class="hljs-attr">maxSize:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">readonlySecret:</span> <span class="hljs-string">s3-sample-readonly-secret</span>
  <span class="hljs-attr">s3UserClass:</span> <span class="hljs-string">ceph-default</span>
  <span class="hljs-attr">subusers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">subuser1</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">subuser2</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383449078/fdf3d17f-a5fe-4d65-876e-c78b54211335.png" alt /></p>
<p><em>s3UserClaim</em></p>
<p>The user and sub-user credentials are kept in the secrets:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383450176/b3abcbf8-9994-42be-8db9-3542b9e98d07.png" alt /></p>
<p><em>User and sub-user credentials</em></p>
<p>With the access and secret keys in the above secrets, we would have the following:</p>
<ul>
<li><p><code>s3-sample-admin-secret</code>: Credentials for the main user capable of creating a bucket under its tenant with full access.</p>
</li>
<li><p><code>s3-sample-readonly-secret</code>: Credentials for a read-only sub-user which can only read the buckets and objects in the tenant</p>
</li>
<li><p><code>s3userclaim-sample-subuserx</code>: Credentials for each extra sub-user (<code>subuser1</code> and <code>subuser2</code>). These sub-users can only list the buckets in the tenant and have no further access unless we grant it via the S3Bucket object (shown in the next section)</p>
</li>
</ul>
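<p>As with any Kubernetes Secret, the stored values are base64-encoded. With a cluster at hand you would read them with <code>kubectl</code> (the <code>accessKey</code> field name below is an assumption for illustration); the decoding step itself is plain <code>base64</code>:</p>

```shell
# Hypothetical read of the admin credentials (requires a running cluster):
#   kubectl get secret s3-sample-admin-secret -n s3-test \
#     -o jsonpath='{.data.accessKey}' | base64 -d
# The encode/decode round-trip itself, shown with a made-up key:
encoded=$(printf 'AKIAEXAMPLE' | base64)
echo "$encoded"                     # the form stored in the Secret
printf '%s' "$encoded" | base64 -d  # the plain access key
echo
```
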
<blockquote>
<p>Note: All user provisioning is handled through the S3UserClaim. An S3User object is created automatically by the S3UserClaim instance and is not meant to be created directly by users of the operator. With this claim/resource split, we've followed the same concept as the PersistentVolumeClaim (PVC) and PersistentVolume (PV).</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383451250/a20c3e44-55db-4cc8-8391-7f5845bbcbef.png" alt /></p>
<p><em>S3User</em></p>
<p>If you try to create or edit the S3UserClaim with a quota exceeding either the ClusterResourceQuota or the ResourceQuota, the change will be rejected by the operator webhook. For example, if I try to increase the s3userclaim-sample's <code>maxObjects</code> to 10K, which is greater than the corresponding values in the CRQ and RQ (5K and 1K), I will face the below error:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383452293/7d7b29d0-08d0-4926-9cc2-3bcd3c6834f2.png" alt /></p>
<p><em>The operator avoids the change due to the quota excess</em></p>
<h3 id="heading-s3bucket">S3Bucket</h3>
<p>Now, let's create an S3Bucket whose owner is the S3User we created in the previous step:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">s3.snappcloud.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">S3Bucket</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">s3bucket-sample</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">s3-test</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">s3DeletionPolicy:</span> <span class="hljs-string">delete</span>
  <span class="hljs-attr">s3UserRef:</span> <span class="hljs-string">s3userclaim-sample</span>
</code></pre>
<p>The <code>s3DeletionPolicy</code> defaults to <code>delete</code>, which removes the corresponding bucket from storage when the S3Bucket object is deleted. Setting it to <code>retain</code> keeps the bucket on storage even after the S3Bucket object is deleted, guarding against unintentional deletion.</p>
<p>As mentioned before, sub-users have neither write nor read access to the bucket by default. However, you can grant them access by specifying the <code>s3SubuserBinding</code> field:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">s3.snappcloud.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">S3Bucket</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">s3bucket-sample</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">s3-test</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">s3DeletionPolicy:</span> <span class="hljs-string">delete</span>
  <span class="hljs-attr">s3SubuserBinding:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">access:</span> <span class="hljs-string">write</span>
      <span class="hljs-attr">name:</span> <span class="hljs-string">subuser1</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">access:</span> <span class="hljs-string">read</span>
      <span class="hljs-attr">name:</span> <span class="hljs-string">subuser2</span>
  <span class="hljs-attr">s3UserRef:</span> <span class="hljs-string">s3userclaim-sample</span>
</code></pre>
<p>Now, subuser1 has write access and subuser2 has read access to the <code>s3bucket-sample</code>, matching the bindings above. You can change these at any time and re-apply the YAML file.</p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>In conclusion, the Ceph S3 Operator presents a Kubernetes solution that simplifies the management and deployment of Ceph S3 storage in cloud-native environments. Automating complex processes and ensuring seamless integration with existing Kubernetes infrastructure enhances operational efficiency and paves the way for more resilient and scalable storage solutions. Embracing this operator signifies a step forward in optimizing storage strategies, unlocking new possibilities for developers and organizations keen on leveraging the best of both Ceph S3 and Kubernetes.</p>
<hr />
<p>Originally published at <a target="_blank" href="https://itnext.io/harnessing-ceph-s3-with-kubernetes-an-operator-solution-005b1103c753">itnext.io</a>.</p>
]]></content:encoded></item><item><title><![CDATA[ClickHouse Advanced Tutorial: Performance Comparison with MySQL]]></title><description><![CDATA[Introduction
Nothing is perfect. In terms of databases, you can't expect the best performance for every task and query from your deployed database. However, the vital step as a software developer is to know their strengths and weaknesses and how to d...]]></description><link>https://hamedkarbasi.com/clickhouse-advanced-tutorial-performance-comparison-with-mysql</link><guid isPermaLink="true">https://hamedkarbasi.com/clickhouse-advanced-tutorial-performance-comparison-with-mysql</guid><category><![CDATA[ClickHouse]]></category><category><![CDATA[MySQL]]></category><category><![CDATA[Databases]]></category><category><![CDATA[performance]]></category><dc:creator><![CDATA[Hamed Karbasi]]></dc:creator><pubDate>Thu, 08 Jun 2023 10:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755439375292/2ec0b927-e290-4311-9531-64ed6fd1923f.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><img src="https://www.azquotes.com/picture-quotes/quote-it-s-more-important-to-know-your-weaknesses-than-your-strengths-ray-l-hunt-102-72-52.jpg" alt /></p>
<h2 id="heading-introduction">Introduction</h2>
<p>Nothing is perfect. In terms of databases, you can't expect the best performance for every task and query from your deployed database. As a software developer, though, the vital step is to know each database's strengths and weaknesses and how to deal with them.</p>
<p>In this post, I will compare ClickHouse as a representative of OLAP databases with MySQL as a representative of OLTP. This will help us choose better solutions for our challenges according to our conditions and goals. Before jumping into the main content, let's discuss OLTP, OLAP, MySQL, and ClickHouse.</p>
<h3 id="heading-oltp">OLTP</h3>
<p>OLTP stands for Online Transaction Processing and is used for day-to-day operations, such as processing orders and updating customer information. OLTP is best for short, fast transactions and is optimized for quick response times. It ensures data accuracy and consistency and provides an efficient way to access data.</p>
<h3 id="heading-olap">OLAP</h3>
<p>OLAP stands for Online Analytical Processing and is used for data mining and analysis. It enables organizations to analyze large amounts of data from multiple perspectives and identify trends and patterns. OLAP is best for complex queries and data mining and can provide insights that are impossible to obtain with traditional reporting tools.</p>
<h3 id="heading-mysql">MySQL</h3>
<p>MySQL is a popular open-source relational database management system used by websites and applications to store and manage data. It holds data in tables and allows users to query it, and provides features such as triggers, stored procedures, and views. MySQL is easy to use and has a wide range of features for building powerful and efficient applications.</p>
<h3 id="heading-clickhouse">ClickHouse</h3>
<p>ClickHouse is an open-source column-oriented database management system developed by Yandex. It is designed to provide high performance for analytical queries. ClickHouse uses a SQL-like query language for querying data and supports different data types, including integers, strings, dates, and floats. It offers various features such as clustering, distributed query processing, and fault tolerance. It also supports replication and data sharding. You can learn more about this database in the first part of this series:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://hamedkarbasi.com/clickhouse-basic-tutorial-an-introduction">https://hamedkarbasi.com/clickhouse-basic-tutorial-an-introduction</a></div>
<p>Now we can talk about the performance comparison.</p>
<h2 id="heading-comparison-case-study">Comparison Case Study</h2>
<p>I've followed the <a target="_blank" href="https://github.com/ClickHouse/ClickBench">ClickBench</a> repository methodology for the case study. It uses the <em>hits</em> dataset obtained from the actual traffic recordings of one of the world's largest web analytics platforms. <code>hits</code> contains about 100M rows in a single flat table. This repository studies more than 20 databases regarding dataset load time, elapsed time for 43 OLAP queries, and occupied storage. You can access their visualized results <a target="_blank" href="https://github.com/ClickHouse/ClickBench">here</a>.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/ClickHouse/ClickBench">https://github.com/ClickHouse/ClickBench</a></div>
<p>To investigate ClickHouse and MySQL performance specifically, I separated 10M rows of the table and chose some of the predefined <a target="_blank" href="https://github.com/ClickHouse/ClickBench/blob/main/mysql/queries.sql">queries</a> that make the point clear. Those queries are mainly OLAP-style, so they only showcase ClickHouse's strengths over MySQL (i.e., MySQL loses in all of them). Hence, I added other queries showing the opposite (OLTP queries). Although I've limited the benchmark to these two databases, you can generalize the concept to other row-oriented and column-oriented DBMSs.</p>
<p><strong><em>Disclaimer:</em></strong> <em>This benchmark only clarifies the main difference between column-oriented and row-oriented databases regarding their performance and use cases. It should not be considered a reference for your use cases. Hence, you should perform your benchmarks with your queries to achieve the best decision.</em></p>
<h3 id="heading-system-specification">System Specification</h3>
<p>Databases are installed on Ubuntu 22.04 LTS on a system with the below specifications:</p>
<ul>
<li><p>CPU: Intel® Core™ i7-10510U CPU @ 1.80GHz × 8</p>
</li>
<li><p>RAM: 16 GiB</p>
</li>
<li><p>Storage: 256 GiB SSD</p>
</li>
</ul>
<h3 id="heading-benchmark-flow">Benchmark Flow</h3>
<ol>
<li><p>The database is created.</p>
</li>
<li><p>The table is created with the <a target="_blank" href="https://github.com/ClickHouse/ClickBench/blob/main/mysql/create.sql">defined DDL</a>.</p>
</li>
<li><p>Data (<code>hits.tsv</code>) is loaded into the table, and the load time is measured.</p>
</li>
<li><p>Queries are run, and each query's elapsed time is measured.</p>
</li>
</ol>
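<p>Step 4 can be sketched with a small shell helper. <code>run_timed</code> is a hypothetical wrapper of my own, and <code>sleep</code> stands in for a real query here; in the actual benchmark you would pass <code>clickhouse-client</code> or <code>mysql</code> invocations:</p>

```shell
# Time an arbitrary command in milliseconds (GNU date, nanosecond precision).
run_timed() {
  local start end
  start=$(date +%s%N)
  "$@" > /dev/null
  end=$(date +%s%N)
  echo "elapsed: $(( (end - start) / 1000000 )) ms"
}

# Real usage would look like:
#   run_timed clickhouse-client --query "SELECT COUNT(*) FROM hits"
#   run_timed mysql test -e "SELECT COUNT(*) FROM hits"
run_timed sleep 0.2
```
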
<h3 id="heading-queries">Queries</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Query Number</td><td>Statement</td><td>Type</td></tr>
</thead>
<tbody>
<tr>
<td>1</td><td><code>SELECT COUNT(*) FROM hits;</code></td><td>OLAP</td></tr>
<tr>
<td>2</td><td><code>SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits;</code></td><td>OLAP</td></tr>
<tr>
<td>3</td><td><code>SELECT URL, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate &gt;= '2013-07-01' AND EventDate &lt;= '2013-07-31' AND DontCountHits = 0 AND IsRefresh = 0 AND URL &lt;&gt; '' GROUP BY URL ORDER BY PageViews DESC LIMIT 10;</code></td><td>OLAP</td></tr>
<tr>
<td>4</td><td><code>SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM hits GROUP BY WatchID, ClientIP ORDER BY c DESC LIMIT 10;</code></td><td>OLAP</td></tr>
<tr>
<td>5</td><td><code>SELECT EventTime, WatchID FROM hits WHERE CounterID = 38 AND EventDate = '2013-07-15' AND UserID = '1387668437822950552' AND WatchID = '8899477221003616239';</code></td><td>OLTP</td></tr>
<tr>
<td>6</td><td><code>SELECT Title, URL, Referer FROM hits WHERE CounterID = 38 AND EventDate = '2013-07-15' AND UserID = '1387668437822950552' AND WatchID = '8899477221003616239';</code></td><td>OLTP</td></tr>
<tr>
<td>7</td><td><code>UPDATE hits SET Title='my title', URL='my url', Referer='my referer' WHERE CounterID = 38 AND EventDate = '2013-07-15' AND UserID = '1387668437822950552' AND WatchID = '8899477221003616239';</code></td><td>OLTP</td></tr>
</tbody>
</table>
</div><h2 id="heading-results">Results</h2>
<p>I'll study the results under four categories:</p>
<ul>
<li><p>Dataset Load</p>
</li>
<li><p>Table Size</p>
</li>
<li><p>Read Queries Execution</p>
</li>
<li><p>Update Query Execution: I discuss the update query (number 7) separately since it needs more attention.</p>
</li>
</ul>
<h3 id="heading-dataset-load">Dataset Load</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>ClickHouse</td><td>MySQL</td><td>Ratio</td></tr>
</thead>
<tbody>
<tr>
<td><strong>65s</strong></td><td>11m35s</td><td>x10.7</td></tr>
</tbody>
</table>
</div><p>Thanks to its LSM-tree-based storage and sparse indexes, ClickHouse loads data much faster than MySQL, which relies on a B-Tree. Note, however, that ClickHouse's insert efficiency shows in bulk inserts rather than in many small individual inserts: every insert creates an immutable part, and ClickHouse avoids changing, removing, or rewriting data for just a few rows.</p>
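<p>As a rule of thumb, batch many rows into a single <code>INSERT</code> rather than issuing row-by-row inserts, since each statement produces a new immutable part. Below is a minimal sketch; the column names follow the ClickBench <code>hits</code> schema, the actual table has many more required columns, and a realistic batch would carry thousands of rows:</p>
<pre><code class="lang-sql">-- One bulk INSERT creates one new part;
-- three single-row INSERTs would create three parts
INSERT INTO hits (WatchID, CounterID, EventDate) VALUES
    (1, 62, '2013-07-01'),
    (2, 62, '2013-07-01'),
    (3, 62, '2013-07-02');
</code></pre>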
<h3 id="heading-table-size">Table Size</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>ClickHouse (GiB)</td><td>MySQL (GiB)</td><td>Ratio</td></tr>
</thead>
<tbody>
<tr>
<td><strong>1.3</strong></td><td>6.32</td><td>x4.86</td></tr>
</tbody>
</table>
</div><p>The column-oriented structure enables effective <a target="_blank" href="https://clickhouse.com/docs/en/about-us/distinctive-features#data-compression"><em>Data Compression</em></a>, something row-oriented databases cannot match. That is why ClickHouse can do teams storing large amounts of data a practical favor by reducing their storage costs.</p>
<h3 id="heading-read-queries-execution">Read Queries Execution</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Query Number</td><td>ClickHouse (s)</td><td>MySQL (s)</td><td>Ratio</td></tr>
</thead>
<tbody>
<tr>
<td>1</td><td><strong>0.005</strong></td><td>7.79</td><td>x1558</td></tr>
<tr>
<td>2</td><td><strong>0.030</strong></td><td>16.0</td><td>x533.3</td></tr>
<tr>
<td>3</td><td><strong>0.193</strong></td><td>4.35</td><td>x22.5</td></tr>
<tr>
<td>4</td><td><strong>2.600</strong></td><td>180.93</td><td>x69.58</td></tr>
<tr>
<td>5</td><td>0.01</td><td><strong>0.00</strong></td><td>x0</td></tr>
<tr>
<td>6</td><td>0.011</td><td><strong>0.00</strong></td><td>x0</td></tr>
</tbody>
</table>
</div><p>ClickHouse's <a target="_blank" href="https://clickhouse.com/docs/en/optimize/sparse-primary-indexes#:~:text=At%20the%20very%20large%20scale,technique%20is%20called%20sparse%20index.">sparse index</a> and column-oriented structure outperform MySQL on all OLAP queries (numbers 1 to 4). That's why BI and data analysts would be more than happy with ClickHouse for their daily reports.</p>
<p>However, MySQL wins the battle when it comes to OLTP queries (numbers 5 and 6). The B-Tree (used by MySQL) indeed performs better for point queries: short transactions that touch only a few rows.</p>
<h3 id="heading-update-query-execution">Update Query Execution</h3>
<p>For the update query (number 7), we have to execute a different statement in ClickHouse, as it doesn't support a plain <code>UPDATE</code>; the <code>ALTER TABLE ... UPDATE</code> mutation has to be used instead:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> hits <span class="hljs-keyword">UPDATE</span> JavaEnable=<span class="hljs-number">0</span> <span class="hljs-keyword">WHERE</span> CounterID = <span class="hljs-number">38</span> <span class="hljs-keyword">AND</span> EventDate = <span class="hljs-string">'2013-07-15'</span> <span class="hljs-keyword">AND</span> UserID = <span class="hljs-string">'1387668437822950552'</span> <span class="hljs-keyword">AND</span> WatchID = <span class="hljs-string">'8899477221003616239'</span>;
</code></pre>
<p>Additionally, ClickHouse applies the update asynchronously. To see the result immediately, you have to run an <code>OPTIMIZE</code> command:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">OPTIMIZE</span> <span class="hljs-keyword">TABLE</span> hits <span class="hljs-keyword">FINAL</span>;
</code></pre>
<p>By running the query number 7 statement shown in the <a class="post-section-overview" href="#queries">queries table</a> for MySQL and the two SQL statements above for ClickHouse, we get the results below:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Query Number</td><td>ClickHouse (s)</td><td>MySQL (s)</td><td>Ratio</td></tr>
</thead>
<tbody>
<tr>
<td>7</td><td>26</td><td><strong>0.00</strong></td><td>0</td></tr>
</tbody>
</table>
</div><p>Again, ClickHouse's aversion to mutations makes it a loser for real-time updates (and similarly deletes) compared to MySQL. Consequently, other methods, such as deduplication with ReplacingMergeTree, can be utilized to handle updates. You can find valuable resources in the links below:</p>
<ul>
<li><p><a target="_blank" href="https://clickhouse.com/docs/en/guides/developer/deduplication">Row-level Deduplication Strategies for Upserts and Frequent Updates</a></p>
</li>
<li><p><a target="_blank" href="https://clickhouse.com/blog/handling-updates-and-deletes-in-clickhouse">Handling Updates and Deletes in ClickHouse</a></p>
</li>
<li><p><a target="_blank" href="https://altinity.com/blog/2020/4/14/handling-real-time-updates-in-clickhouse">Handling Real-Time Updates in ClickHouse</a></p>
</li>
</ul>
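<p>To make the deduplication idea concrete, here is a minimal, hypothetical sketch (the table and column names are mine, not from the benchmark): instead of mutating rows in place, you insert a new version and let <code>ReplacingMergeTree</code> keep the latest one per sort key:</p>
<pre><code class="lang-sql">-- Each "update" is just an insert of a newer version of the row
CREATE TABLE hits_dedup
(
    WatchID   UInt64,
    Title     String,
    UpdatedAt DateTime
)
ENGINE = ReplacingMergeTree(UpdatedAt)
ORDER BY WatchID;

INSERT INTO hits_dedup VALUES (8899477221003616239, 'old title', now() - 60);
INSERT INTO hits_dedup VALUES (8899477221003616239, 'my title',  now());

-- FINAL collapses versions at query time; background merges do it eventually
SELECT Title FROM hits_dedup FINAL WHERE WatchID = 8899477221003616239;
</code></pre>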
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this post, I benchmarked MySQL and ClickHouse databases to study some of their strengths and weaknesses that may help us choose a suitable solution. To summarize:</p>
<ul>
<li><p>MySQL performs better on point (OLTP) queries.</p>
</li>
<li><p>ClickHouse performs better on OLAP queries.</p>
</li>
<li><p>ClickHouse is not designed for frequent updates and deletes. You have to handle them with deduplication methods.</p>
</li>
<li><p>ClickHouse reduces the storage cost thanks to its column-oriented structure.</p>
</li>
<li><p>ClickHouse loads bulk inserts far faster than MySQL.</p>
</li>
</ul>
<hr />
<p><em>Originally published at</em> <a target="_blank" href="https://dev.to/hoptical/clickhouse-advanced-tutorial-performance-comparison-with-mysql-2cj2"><em>https://dev.to</em></a> <em>on June 8, 2023.</em></p>
]]></content:encoded></item><item><title><![CDATA[ClickHouse Basic Tutorial: Keys & Indexes]]></title><description><![CDATA[Photo by Maksym Kaharlytskyi on Unsplash
In the previous parts, we saw an introduction to ClickHouse and its features. Furthermore, we learned about its different table engine families and their most usable members. In this part, I will walk through ...]]></description><link>https://hamedkarbasi.com/clickhouse-basic-tutorial-keys-and-indexes</link><guid isPermaLink="true">https://hamedkarbasi.com/clickhouse-basic-tutorial-keys-and-indexes</guid><category><![CDATA[ClickHouse]]></category><category><![CDATA[Databases]]></category><category><![CDATA[Tutorial]]></category><dc:creator><![CDATA[Hamed Karbasi]]></dc:creator><pubDate>Fri, 02 Jun 2023 10:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755439378684/578a9973-f61b-4c5f-851d-a2e703154b42.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Photo by</em> <a target="_blank" href="https://unsplash.com/de/@qwitka?utm_source=medium&amp;utm_medium=referral"><em>Maksym Kaharlytskyi</em></a> <a target="_blank" href="https://unsplash.com/de/@qwitka?utm_source=medium&amp;utm_medium=referral"><em>on Unsplash</em></a></p>
<p>In the previous parts, we saw an introduction to ClickHouse and its features. Furthermore, we learned about its different table engine families and their most usable members. In this part, I will walk through the special keys and indexes in ClickHouse, which can help reduce query latency and database load significantly.</p>
<p>Note that these concepts apply only to the default table engine family: MergeTree.</p>
<h2 id="heading-primary-key">Primary Key</h2>
<p>ClickHouse indexes are based on <em>Sparse Indexing</em>, an alternative to the <a target="_blank" href="https://en.wikipedia.org/wiki/B-tree">B-Tree</a> index utilized by traditional DBMSs. In a B-Tree, every row is indexed, which is suitable for locating and updating a single row (the point queries common in OLTP tasks). This comes at the cost of poor high-volume insert performance and high memory and storage consumption. In contrast, the sparse index splits data into multiple <em>parts</em>, each divided into fixed-size groups of rows called <em>granules</em>. ClickHouse keeps one index entry per granule instead of per row, and that's where the term <em>sparse index</em> comes from. Given a query filtered on the primary keys, ClickHouse looks for the matching granules and loads them into memory in parallel. That brings notable performance on the range queries common in OLAP tasks. Additionally, as data is stored column by column across multiple files, it can be compressed, resulting in much lower storage consumption.</p>
<p>The sparse index is built on <a target="_blank" href="https://en.wikipedia.org/wiki/Log-structured_merge-tree">LSM trees</a>, allowing you to ingest high volumes of data per second. All this comes at the cost of being unsuitable for point queries, which is not what ClickHouse is designed for.</p>
<h3 id="heading-structure">Structure</h3>
<p>In the below figure, we can see how ClickHouse stores data:</p>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jgwpv7hn78pjm0zmiv9y.png" alt /></p>
<p><em><cite>ClickHouse Data Store Structure</cite></em></p>
<ul>
<li><p>Data is split into multiple parts (by the ClickHouse default or a user-defined partition key).</p>
</li>
<li><p>Parts are split into granules, which are a logical concept: ClickHouse doesn't physically split the data into them. Instead, it locates granules via marks. Granules' locations (start and end) are defined in mark files with the <code>mrk2</code> extension.</p>
</li>
<li><p>Index values are stored in the <code>primary.idx</code> file, which contains one row per granule.</p>
</li>
<li><p>Columns are stored as compressed blocks in <code>.bin</code> files: one file per column in the <code>Wide</code> format and a single file for all columns in the <code>Compact</code> format. Whether a part is Wide or Compact is determined by ClickHouse based on its size.</p>
</li>
</ul>
<p>Now let's see how ClickHouse finds the matching rows using primary keys:</p>
<ol>
<li><p>ClickHouse finds the matching granule marks using binary search over the <code>primary.idx</code> file.</p>
</li>
<li><p>It looks into the mark files to find the granules' locations in the <code>bin</code> files.</p>
</li>
<li><p>It loads the matching granules from the <code>bin</code> files into memory in parallel and searches for the matching rows within those granules.</p>
</li>
</ol>
<h3 id="heading-case-study">Case Study</h3>
<p>To clarify the flow mentioned above, let's create a table and insert data into it:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> default.projects
(

    <span class="hljs-string">`project_id`</span> UInt32,

    <span class="hljs-string">`name`</span> <span class="hljs-keyword">String</span>,

    <span class="hljs-string">`created_date`</span> <span class="hljs-built_in">Date</span>
)
<span class="hljs-keyword">ENGINE</span> = MergeTree
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> (project_id, created_date)
</code></pre>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> projects 
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> generateRandom(<span class="hljs-string">'project_id Int32, name String, created_date Date'</span>, <span class="hljs-number">10</span>, <span class="hljs-number">10</span>, <span class="hljs-number">1</span>)
<span class="hljs-keyword">LIMIT</span> <span class="hljs-number">10000000</span>;
</code></pre>
<p>First, if you don't specify primary keys separately, ClickHouse considers the sort keys (in <code>ORDER BY</code>) as the primary keys. Hence, in this table, <code>project_id</code> and <code>created_date</code> are the primary keys. Every time you insert data into this table, ClickHouse sorts it first by <code>project_id</code> and then by <code>created_date</code>.</p>
<p>If we look into the data structure stored on the hard drive, we face this:</p>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8u1cfv0vor7vj98dzrcj.png" alt class="image--center mx-auto" /></p>
<p><cite>Physical files stored in a part</cite></p>
<p>We have five parts, and one of them is: <code>all_1_1_0</code>. You can visit <a target="_blank" href="https://kb.altinity.com/engines/mergetree-table-engine-family/part-naming-and-mvcc/#part-names--multiversion-concurrency-control">this link</a> if you're curious about the naming convention. As you can see, columns are stored in <code>bin</code> files, and we see mark files named as primary keys along with the <code>primary.idx</code> file.</p>
<h4 id="heading-filter-on-the-first-primary-key">Filter on the first primary-key</h4>
<p>Now let's filter on <code>project_id</code>, which is the first primary key, and explain its indexes:</p>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rt3u2ool0pla83zjokbq.png" alt /></p>
<p><cite>Index analysis of a query on first primary key</cite></p>
<p>As you can see, the system has detected <code>project_id</code> as a primary key and ruled out 1224 granules out of 1225 using it!</p>
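<p>Where does the figure of 1225 granules come from? With MergeTree's default <code>index_granularity</code> of 8192 rows per granule, a back-of-the-envelope sketch for our 10-million-row table gives roughly the same number:</p>
<pre><code class="lang-sql">-- 10,000,000 rows / 8,192 rows per granule gives about 1221 granules;
-- each of the 5 parts may also end with a partial granule, which is
-- consistent with the 1225 granules reported by EXPLAIN
SELECT ceil(10000000 / 8192) AS approx_granules;
</code></pre>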
<h4 id="heading-filter-on-second-primary-key">Filter on second primary-key</h4>
<p>What if we filter on <code>created_date</code>: the second PK:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">EXPLAIN</span> <span class="hljs-keyword">indexes</span>=<span class="hljs-number">1</span>
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> projects <span class="hljs-keyword">WHERE</span> created_date=today()
</code></pre>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/meauo6583t2d9vclbb1u.png" alt /></p>
<p><cite>Index analysis of a query on second primary key</cite></p>
<p>The database has detected <code>created_date</code> as a primary key, but it couldn't filter out any granules. Why? Because ClickHouse uses binary search only for the first key and a <a target="_blank" href="https://github.com/ClickHouse/ClickHouse/blob/22.3/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp#L1444">generic exclusion search</a> for the other keys, which is much less efficient. So how can we make it more efficient?</p>
<p>If we swap <code>project_id</code> and <code>created_date</code> in the sort keys when defining the table, we achieve better filtering results for the non-first keys, since <code>created_date</code> has lower cardinality (fewer unique values) than <code>project_id</code>:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> default.projects
(

    <span class="hljs-string">`project_id`</span> UInt32,

    <span class="hljs-string">`name`</span> <span class="hljs-keyword">String</span>,

    <span class="hljs-string">`created_date`</span> <span class="hljs-built_in">Date</span>
)
<span class="hljs-keyword">ENGINE</span> = MergeTree
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> (created_date, project_id)
</code></pre>
<pre><code class="lang-sql"><span class="hljs-keyword">EXPLAIN</span> <span class="hljs-keyword">indexes</span>=<span class="hljs-number">1</span>
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> projects <span class="hljs-keyword">WHERE</span> project_id=<span class="hljs-number">700</span>
</code></pre>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tz9fgeo23rncr5cx3oah.png" alt /></p>
<p><cite>Index analysis of a query on second primary key on an improved sort keys table</cite></p>
<p>If we now filter on <code>project_id</code>, the second key, ClickHouse reads only 909 granules instead of the whole dataset.</p>
<p>So to summarize, always try to order the primary keys from <strong>low</strong> to <strong>high</strong> cardinality.</p>
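<p>When in doubt about key order, you can measure the cardinalities directly; a quick sketch:</p>
<pre><code class="lang-sql">-- Lower-cardinality columns should generally come first in ORDER BY
SELECT
    uniq(created_date) AS created_date_cardinality,
    uniq(project_id)   AS project_id_cardinality
FROM projects;
</code></pre>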
<h3 id="heading-order-key">Order Key</h3>
<p>I mentioned earlier that if you don't specify the <code>PRIMARY KEY</code> option, ClickHouse treats the sort keys as the primary keys. However, if you want to set the primary keys separately, they must be a prefix of the sort keys. As a result, the additional columns in the sort keys are used only for sorting and play no role in indexing.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> default.projects
(

    <span class="hljs-string">`project_id`</span> UInt32,

    <span class="hljs-string">`name`</span> <span class="hljs-keyword">String</span>,

    <span class="hljs-string">`created_date`</span> <span class="hljs-built_in">Date</span>
)
<span class="hljs-keyword">ENGINE</span> = MergeTree
PRIMARY <span class="hljs-keyword">KEY</span> (created_date, project_id)
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> (created_date, project_id, <span class="hljs-keyword">name</span>)
</code></pre>
<p>In this example, <code>created_date</code> and <code>project_id</code> columns are utilized in the sparse index and sorting, and <code>name</code> column is only used as the last item for sorting.</p>
<p>Use this option if you often use a column in the <code>ORDER BY</code> clause of your queries, since it eliminates the database's sorting effort at query time.</p>
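<p>For example, a query whose <code>ORDER BY</code> matches the table's sort order can stream rows in stored order instead of sorting at query time (a sketch against the table above):</p>
<pre><code class="lang-sql">-- No extra sort needed: rows are already stored in this order
SELECT *
FROM projects
ORDER BY created_date, project_id, name
LIMIT 100;
</code></pre>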
<h3 id="heading-partition-key">Partition Key</h3>
<p>A partition is a logical combination of parts in ClickHouse. By default, all parts belong to a single, unspecified partition. To see this, look into the <code>system.parts</code> table for the <code>projects</code> table defined in the previous section:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span>
    <span class="hljs-keyword">name</span>,
    <span class="hljs-keyword">partition</span>
<span class="hljs-keyword">FROM</span>
    system.parts
<span class="hljs-keyword">WHERE</span>
    <span class="hljs-keyword">table</span> = <span class="hljs-string">'projects'</span>;
</code></pre>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/epygdscxk5pahpfzgjnu.png" alt /></p>
<p><cite>Parts structure in an unpartitioned table</cite></p>
<p>You can see that the <code>projects</code> table has no particular partition. However, you can customize it using the <code>PARTITION BY</code> option:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> default.projects_partitioned
(

    <span class="hljs-string">`project_id`</span> UInt32,

    <span class="hljs-string">`name`</span> <span class="hljs-keyword">String</span>,

    <span class="hljs-string">`created_date`</span> <span class="hljs-built_in">Date</span>
)
<span class="hljs-keyword">ENGINE</span> = MergeTree
<span class="hljs-keyword">PARTITION</span> <span class="hljs-keyword">BY</span> toYYYYMM(created_date)
PRIMARY <span class="hljs-keyword">KEY</span> (created_date, project_id)
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> (created_date, project_id, <span class="hljs-keyword">name</span>)
</code></pre>
<p>In the above table, ClickHouse partitions data based on the month of the <code>created_date</code> column:</p>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qthhonsue7p6qj26mgxj.png" alt /></p>
<p><cite>Parts structure in a partitioned table</cite></p>
<h3 id="heading-index">Index</h3>
<p>ClickHouse creates a <em>min-max</em> index for the partition key and uses it as the first filtering layer when running a query. Let's see what happens when we filter data by a column present in the partition key:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">EXPLAIN</span> <span class="hljs-keyword">indexes</span>=<span class="hljs-number">1</span>
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> projects_partitioned <span class="hljs-keyword">WHERE</span> created_date=<span class="hljs-string">'2020-02-01'</span>
</code></pre>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4z17e0jmy8v8nvbp04ow.png" alt /></p>
<p><cite>Index analysis on a partitioned table</cite></p>
<p>You can see that the database has chosen one part out of 16 using the min-max index of the partition key.</p>
<h3 id="heading-usage">Usage</h3>
<p>Partitioning in ClickHouse is meant to bring data manipulation capabilities to a table. For instance, you can delete or move the parts belonging to partitions older than a year. This is far more efficient than on an unpartitioned table, since ClickHouse has physically split the data by month on storage, so such operations can be performed cheaply.</p>
<p>Although ClickHouse creates an additional index for the partition key, partitioning should never be treated as a query performance optimization: it loses that battle to putting the column in the sort keys. So if you wish to improve query performance, put those columns in the sort keys, and use a column as the partition key only if you have concrete plans for data manipulation based on it.</p>
<p>Finally, don't confuse partitions in ClickHouse with the same term in distributed systems, where data is split across different nodes. You should use <a target="_blank" href="https://clickhouse.com/docs/en/engines/table-engines/special/distributed">shards and distributed tables</a> if you want to achieve that.</p>
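<p>The data-manipulation operations mentioned above can look like this (a sketch against the <code>projects_partitioned</code> table; partition IDs follow the <code>toYYYYMM</code> format):</p>
<pre><code class="lang-sql">-- Drop all of January 2020 cheaply: whole part directories are removed
ALTER TABLE projects_partitioned DROP PARTITION 202001;

-- Or detach it first, keeping the data on disk for archiving elsewhere
ALTER TABLE projects_partitioned DETACH PARTITION 202001;
</code></pre>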
<h2 id="heading-skip-index">Skip Index</h2>
<p>You may have noticed that putting a column among the last items of the sort key doesn't help much, especially if you filter only on that column without the leading sort keys. What should you do in those cases?</p>
<p>Consider a dictionary you want to read. You can find words using the table of contents, sorted alphabetically. Those items are the sort keys of the table. You can easily find a word starting with <em>W</em>, but how can you find the pages containing words related to wars?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755439377491/ee19e0df-cb3b-4215-a9dd-eb5c7c4fbae2.jpeg" alt="A book with sticky notes" /></p>
<p>You can put marks or sticky notes on those pages, reducing your effort the next time. That's how a <a target="_blank" href="https://clickhouse.com/docs/en/optimize/skipping-indexes">Skip Index</a> works. It helps the database skip granules that don't contain the desired values of certain columns by creating additional indexes.</p>
<h3 id="heading-case-study-1">Case Study</h3>
<p>Consider the <code>projects</code> table defined in the <em>Order Key</em> section, where <code>created_date</code> and <code>project_id</code> were defined as primary keys. Now, if we filter on the <code>name</code> column, we encounter this:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">EXPLAIN</span> <span class="hljs-keyword">indexes</span>=<span class="hljs-number">1</span>
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> projects <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">name</span>=<span class="hljs-string">'hamed'</span>
</code></pre>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/eiofrdndocznh2d242u5.png" alt /></p>
<p><cite>Index analysis on a query on non-indexed column</cite></p>
<p>The result was expected. Now what if we define a skip index on it?</p>
<pre><code class="lang-sql"><span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> projects <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">INDEX</span> name_index <span class="hljs-keyword">name</span> <span class="hljs-keyword">TYPE</span> bloom_filter GRANULARITY <span class="hljs-number">1</span>;
</code></pre>
<p>The above command creates a skip index on the <code>name</code> column. I've used the bloom filter type because the column is a string. You can find more about the other types <a target="_blank" href="https://clickhouse.com/docs/en/optimize/skipping-indexes#skip-index-types">here</a>.</p>
<p>This command builds the index only for newly inserted data. To build it for already-inserted data, run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> projects MATERIALIZE <span class="hljs-keyword">INDEX</span> name_index;
</code></pre>
<p>Let's see the query analysis this time:</p>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/imwu2pw2hbl2zeh7cb9x.png" alt /></p>
<p><cite>Index analysis on a query on skip-indexed column</cite></p>
<p>As you can see, the skip index greatly improved granule exclusion and, with it, performance.</p>
<p>While the skip index performed well in this example, it can perform poorly in other cases. Its effectiveness depends on how well the indexed column correlates with the sort keys, as well as on settings such as the index granularity and index type.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In conclusion, understanding and utilizing ClickHouse's primary keys, order keys, partition keys, and skip index is crucial for optimizing query performance and scalability. Choosing appropriate primary keys, order keys, and partitioning strategies can enhance data distribution, improve query execution speed, and prevent overloading. Additionally, leveraging the skip index feature intelligently helps minimize disk I/O and reduce query execution time. By considering these factors in your ClickHouse schema design, you can unlock the full potential of ClickHouse for efficient and performant data solutions.</p>
<hr />
<p><em>Originally published at</em> <a target="_blank" href="https://dev.to/hoptical/clickhouse-basic-tutorial-keys-indexes-5d7a"><em>https://dev.to</em></a> <em>on June 2, 2023.</em></p>
]]></content:encoded></item><item><title><![CDATA[How to Schedule Database Backups with Cronjob and Upload to AWS S3]]></title><description><![CDATA[Introduction
The update procedure is a vital operation every team should consider. However, doing it manually can be exhausting. As a toil, you can automate it simply by creating a cronjob to take the backup and upload it to your desired object stora...]]></description><link>https://hamedkarbasi.com/how-to-schedule-database-backups-with-cronjob-and-upload-to-aws-s3</link><guid isPermaLink="true">https://hamedkarbasi.com/how-to-schedule-database-backups-with-cronjob-and-upload-to-aws-s3</guid><category><![CDATA[Backup]]></category><category><![CDATA[automation]]></category><category><![CDATA[Amazon S3]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Databases]]></category><dc:creator><![CDATA[Hamed Karbasi]]></dc:creator><pubDate>Tue, 23 May 2023 10:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755439381990/78236569-584a-4e7f-9044-e189c9a3e4fb.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>The backup procedure is a vital operation every team should have in place. However, doing it manually can be exhausting. Like any toil, you can automate it simply by creating a cronjob that takes the backup and uploads it to your desired object storage.</p>
<p>This article explains how to automate your database backup with a cronjob and upload it to S3-compatible cloud storage. Postgres is used as the database, but you can generalize the approach to any other database or data type you want.</p>
<h2 id="heading-step-1-create-configmap">Step 1: Create the ConfigMap</h2>
<p>To perform the backup and upload, we need a bash script. It first logs into the cluster via the <code>oc login</code> command, then gets the name of your desired database pod, executes the dump command, zips the result, and downloads it via <code>rsync</code>. Finally, it uploads the archive to AWS S3 object storage.</p>
<p>While running, and before dumping, the script prompts the user with an <em>Are you sure?</em> confirmation, which you can bypass with the <code>-y</code> option.</p>
<p>All credentials like <em>OKD Token</em> or <em>Postgres Password</em> are passed to the application as environment variables.</p>
<p>By putting this bash script in a ConfigMap, it can be mounted as a volume in the cronjob. Remember to replace <code>PROJECT_HERE</code> with your project name and customize the variables in the script according to your project's specifications.</p>
<pre><code class="lang-yml"><span class="hljs-attr">kind:</span> <span class="hljs-string">ConfigMap</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">bash-script</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">PROJECT_HERE</span>
<span class="hljs-attr">data:</span>
  <span class="hljs-attr">backup.sh:</span> <span class="hljs-string">&gt;
    #!/bin/bash
</span>
    <span class="hljs-comment"># This file provides a backup script for postgres</span>

    <span class="hljs-comment"># Variables: Modify your variables according to the okd projects and database secrets</span>

    <span class="hljs-string">NAMESPACE=PROJECT_HERE</span>

    <span class="hljs-string">S3_URL=https://s3.mycompany.com</span>

    <span class="hljs-string">STATEFULSET_NAME=postgres</span>

    <span class="hljs-string">BACKUP_NAME=backup-$(date</span> <span class="hljs-string">"+%F"</span><span class="hljs-string">)</span>

    <span class="hljs-string">S3_BUCKET=databases-backup</span>

    <span class="hljs-comment"># Exit the script anywhere faced the error </span>

    <span class="hljs-string">set</span> <span class="hljs-string">-e</span>

    <span class="hljs-comment"># Define the confirm option about user prompt (yes or no)</span>

    <span class="hljs-string">confirm=""</span>

    <span class="hljs-comment"># Parse command-line options</span>

    <span class="hljs-string">while</span> <span class="hljs-string">getopts</span> <span class="hljs-string">"y"</span> <span class="hljs-string">opt;</span> <span class="hljs-string">do</span>
        <span class="hljs-string">case</span> <span class="hljs-string">$opt</span> <span class="hljs-string">in</span>
        <span class="hljs-string">y)</span>
            <span class="hljs-string">confirm="y"</span>
            <span class="hljs-string">;;</span>
        <span class="hljs-string">\?)</span>
            <span class="hljs-string">echo</span> <span class="hljs-string">"Invalid option: -$OPTARG"</span> <span class="hljs-string">&gt;&amp;2</span>
            <span class="hljs-string">exit</span> <span class="hljs-number">1</span>
            <span class="hljs-string">;;</span>
        <span class="hljs-string">esac</span>
    <span class="hljs-string">done</span>

    <span class="hljs-comment"># Login to OKD</span>

    <span class="hljs-string">oc</span> <span class="hljs-string">login</span> <span class="hljs-string">${S3_URL}</span> <span class="hljs-string">--token=${OKD_TOKEN}</span>

    <span class="hljs-string">POD_NAME=$(oc</span> <span class="hljs-string">get</span> <span class="hljs-string">pods</span> <span class="hljs-string">-n</span> <span class="hljs-string">${NAMESPACE}</span> <span class="hljs-string">|</span> <span class="hljs-string">grep</span> <span class="hljs-string">${STATEFULSET_NAME}</span> <span class="hljs-string">|</span> <span class="hljs-string">cut</span> <span class="hljs-string">-d'</span> <span class="hljs-string">'
    -f1)

    echo The backup of database in pod ${POD_NAME} will be dumped in ${BACKUP_NAME}
    file.

    DUMP_COMMAND='</span><span class="hljs-string">PGPASSWORD="'${POSTGRES_USER_PASSWORD}'"</span> <span class="hljs-string">pg_dump</span> <span class="hljs-string">-U</span>
    <span class="hljs-string">'${POSTGRES_USER}'</span> <span class="hljs-string">'${POSTGRES_DB}'</span> <span class="hljs-string">&gt;</span> <span class="hljs-string">/bitnami/postgresql/backup/'${BACKUP_NAME}</span>

    <span class="hljs-string">GZIP_COMMAND='gzip</span> <span class="hljs-string">/bitnami/postgresql/backup/'${BACKUP_NAME}</span>

    <span class="hljs-string">REMOVE_COMMAND='rm</span> <span class="hljs-string">/bitnami/postgresql/backup/'${BACKUP_NAME}.gz</span>

    <span class="hljs-comment"># Prompt the user for confirmation if the -y option was not provided</span>

    <span class="hljs-string">if</span> [[ <span class="hljs-string">$confirm</span> <span class="hljs-type">!=</span> <span class="hljs-string">"y"</span> ]]<span class="hljs-string">;</span> <span class="hljs-string">then</span>
        <span class="hljs-string">read</span> <span class="hljs-string">-r</span> <span class="hljs-string">-p</span> <span class="hljs-string">"Are you sure you want to proceed? [y/N] "</span> <span class="hljs-string">response</span>
        <span class="hljs-string">case</span> <span class="hljs-string">"$response"</span> <span class="hljs-string">in</span>
        [<span class="hljs-string">yY</span>][<span class="hljs-string">eE</span>][<span class="hljs-string">sS</span>] <span class="hljs-string">|</span> [<span class="hljs-string">yY</span>]<span class="hljs-string">)</span>
            <span class="hljs-string">confirm="y"</span>
            <span class="hljs-string">;;</span>
        <span class="hljs-string">*)</span>
            <span class="hljs-string">echo</span> <span class="hljs-string">"Aborted"</span>
            <span class="hljs-string">exit</span> <span class="hljs-number">0</span>
            <span class="hljs-string">;;</span>
        <span class="hljs-string">esac</span>
    <span class="hljs-string">fi</span>

    <span class="hljs-comment"># Dump the backup and zip it</span>

    <span class="hljs-string">oc</span> <span class="hljs-string">exec</span> <span class="hljs-string">-n</span> <span class="hljs-string">${NAMESPACE}</span> <span class="hljs-string">"${POD_NAME}"</span> <span class="hljs-string">--</span> <span class="hljs-string">sh</span> <span class="hljs-string">-c</span> <span class="hljs-string">"${DUMP_COMMAND} &amp;&amp; ${GZIP_COMMAND}"</span>

    <span class="hljs-string">echo</span> <span class="hljs-string">Transfer</span> <span class="hljs-string">it</span> <span class="hljs-string">to</span> <span class="hljs-string">current</span> <span class="hljs-string">local</span> <span class="hljs-string">folder</span>

    <span class="hljs-string">oc</span> <span class="hljs-string">rsync</span> <span class="hljs-string">-n</span> <span class="hljs-string">${NAMESPACE}</span> <span class="hljs-string">${POD_NAME}:/bitnami/postgresql/backup/</span> <span class="hljs-string">/backup-files</span>
    <span class="hljs-string">&amp;&amp;</span>
        <span class="hljs-string">oc</span> <span class="hljs-string">exec</span> <span class="hljs-string">-n</span> <span class="hljs-string">${NAMESPACE}</span> <span class="hljs-string">"${POD_NAME}"</span> <span class="hljs-string">--</span> <span class="hljs-string">sh</span> <span class="hljs-string">-c</span> <span class="hljs-string">"${REMOVE_COMMAND}"</span>

    <span class="hljs-comment"># Send backup files to AWS S3</span>

    <span class="hljs-string">aws</span> <span class="hljs-string">--endpoint-url</span> <span class="hljs-string">"${S3_URL}"</span> <span class="hljs-string">s3</span> <span class="hljs-string">sync</span> <span class="hljs-string">/backup-files</span>
    <span class="hljs-string">s3://${S3_BUCKET}</span>
</code></pre>
<h2 id="heading-step-2-create-secrets">Step 2: Create secrets</h2>
<p>Database, AWS, and OC credentials should be kept as secrets. First, we’ll create a secret containing the AWS CA Bundles. After downloading the bundle, you can make a secret file from it:</p>
<pre><code class="lang-bash">$ oc create secret -n PROJECT_HERE generic certs --from-file ca-bundle.crt
</code></pre>
<p>You should replace <code>PROJECT_HERE</code> with your project name.</p>
<p>Now let’s create another secret for the remaining credentials. Note that <code>AWS_CA_BUNDLE</code> should be set to <code>/certs/ca-bundle.crt</code>, the path where the CA bundle will be mounted inside the container:</p>
<pre><code class="lang-yml"><span class="hljs-attr">kind:</span> <span class="hljs-string">Secret</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">mysecret</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">PROJECT_HERE</span>

<span class="hljs-attr">data:</span>
  <span class="hljs-attr">AWS_CA_BUNDLE:</span> 
  <span class="hljs-attr">OKD_TOKEN:</span> 
  <span class="hljs-attr">POSTGRES_USER_PASSWORD:</span> 
  <span class="hljs-attr">POSTGRES_USER:</span>
  <span class="hljs-attr">POSTGRES_DB:</span>  
  <span class="hljs-attr">AWS_SECRET_ACCESS_KEY:</span> 
  <span class="hljs-attr">AWS_ACCESS_KEY_ID:</span> 

<span class="hljs-attr">type:</span> <span class="hljs-string">Opaque</span>
</code></pre>
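One detail that often trips people up: values under <code>data</code> in a Kubernetes Secret must be base64-encoded (alternatively, the <code>stringData</code> field accepts plain values). A quick sketch of producing the encoded values — the credentials below are placeholders, not real ones:

```python
import base64

# Placeholder credentials -- substitute your real values.
secret_data = {
    "POSTGRES_USER": "postgres",
    "POSTGRES_DB": "mydb",
    "AWS_CA_BUNDLE": "/certs/ca-bundle.crt",
}

# Kubernetes expects every value under `data` to be base64-encoded.
encoded = {k: base64.b64encode(v.encode()).decode() for k, v in secret_data.items()}

for key, value in encoded.items():
    print(f"{key}: {value}")
```

You can paste the printed values into the manifest above, or skip manual encoding entirely with `oc create secret generic mysecret --from-literal=KEY=VALUE ...`, which encodes the values for you.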
<h2 id="heading-step-3-create-cronjob">Step 3: Create cronjob</h2>
<p>To create the cronjob, we need a Docker image capable of running <code>oc</code> and <code>aws</code> commands. You can find this image and its Dockerfile <a target="_blank" href="https://hub.docker.com/repository/docker/hamedkarbasi/aws-cli-oc/">here</a> if you want to customize it.</p>
<p>Now let’s create the cronjob:</p>
<pre><code class="lang-yml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">batch/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">CronJob</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">database-backup</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">PROJECT_HERE</span> 
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">schedule:</span> <span class="hljs-number">0</span> <span class="hljs-number">3</span> <span class="hljs-string">*</span> <span class="hljs-string">*</span> <span class="hljs-string">*</span>
  <span class="hljs-attr">jobTemplate:</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">template:</span>
        <span class="hljs-attr">spec:</span>
          <span class="hljs-attr">containers:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">backup</span>
            <span class="hljs-attr">image:</span> <span class="hljs-string">hamedkarbasi/aws-cli-oc:1.0.0</span>
            <span class="hljs-attr">command:</span> [<span class="hljs-string">"/bin/bash"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"/backup-script/backup.sh -y"</span>]
            <span class="hljs-attr">envFrom:</span>
              <span class="hljs-bullet">-</span> <span class="hljs-attr">secretRef:</span>
                  <span class="hljs-attr">name:</span> <span class="hljs-string">mysecret</span>
            <span class="hljs-attr">volumeMounts:</span>
              <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">script</span>
                <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/backup-script/backup.sh</span>
                <span class="hljs-attr">subPath:</span> <span class="hljs-string">backup.sh</span>
              <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">certs</span>
                <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/certs/ca-bundle.crt</span>
                <span class="hljs-attr">subPath:</span> <span class="hljs-string">ca-bundle.crt</span>
              <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">kube-dir</span>
                <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/.kube</span>
              <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">backup-files</span>
                <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/backup-files</span>
          <span class="hljs-attr">volumes:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">script</span>
              <span class="hljs-attr">configMap:</span>
                <span class="hljs-attr">name:</span> <span class="hljs-string">backup-script</span>
                <span class="hljs-attr">defaultMode:</span> <span class="hljs-number">0777</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">certs</span>
              <span class="hljs-attr">secret:</span> 
                <span class="hljs-attr">secretName:</span> <span class="hljs-string">certs</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">kube-dir</span>
              <span class="hljs-attr">emptyDir:</span> {}
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">backup-files</span>
              <span class="hljs-attr">emptyDir:</span> {}
          <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
</code></pre>
<p>Again, you should replace <code>PROJECT_HERE</code> with your project name and set the <code>schedule</code> parameter to your desired job frequency. By putting all manifests in a folder named <code>backup</code>, we can apply them to Kubernetes:</p>
<pre><code class="lang-bash">$ oc apply -f backup
</code></pre>
<p>This cronjob will run at 3:00 AM every night, dumping the database and uploading the backup to AWS S3.</p>
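The <code>schedule</code> field uses standard cron syntax: <code>0 3 * * *</code> means minute 0 of hour 3, every day. A small sketch of how such a daily schedule resolves to its next run time:

```python
from datetime import datetime, timedelta

def next_daily_run(now: datetime, hour: int = 3, minute: int = 0) -> datetime:
    """Next occurrence of a daily 'minute hour * * *' cron schedule."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot already passed -> tomorrow
    return candidate

# At 14:30, the next 03:00 slot is tomorrow morning.
print(next_daily_run(datetime(2023, 4, 25, 14, 30)))  # 2023-04-26 03:00:00
```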
<h2 id="heading-conclusion">Conclusion</h2>
<p>In conclusion, automating database backups to AWS S3 with a Kubernetes cronjob can save you time and effort while ensuring your valuable data is stored securely in the cloud. Following the steps outlined in this guide, you can easily set up a backup schedule that meets your needs and upload your backups to AWS S3 for safekeeping. Remember to test your backups regularly to ensure they can be restored when needed, and keep your AWS credentials and permissions secure to prevent unauthorized access. With these best practices in mind, you can have peace of mind knowing that your database backups are automated and securely stored in the cloud.</p>
<hr />
<p>Originally published at <a target="_blank" href="https://itnext.io/how-to-schedule-database-backups-with-cronjob-and-upload-to-aws-s3-ac1962329e27">itnext.io</a>.</p>
]]></content:encoded></item><item><title><![CDATA[ClickHouse Basic Tutorial: Table Engines]]></title><description><![CDATA[In this part, I will cover ClickHouse table engines. Like any other database, ClickHouse uses engines to determine a table's storage, replication, and concurrency methodologies. Every engine has pros and cons, and you should choose them by your need....]]></description><link>https://hamedkarbasi.com/clickhouse-basic-tutorial-table-engines</link><guid isPermaLink="true">https://hamedkarbasi.com/clickhouse-basic-tutorial-table-engines</guid><category><![CDATA[ClickHouse]]></category><category><![CDATA[Databases]]></category><category><![CDATA[Tutorial]]></category><dc:creator><![CDATA[Hamed Karbasi]]></dc:creator><pubDate>Sat, 29 Apr 2023 10:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755439389305/0ecb0c70-7c36-4c93-88c9-e0415df825b7.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this part, I will cover ClickHouse table engines. Like any other database, ClickHouse uses engines to determine a table's storage, replication, and concurrency methodologies. Every engine has pros and cons, and you should choose them by your need. Moreover, engines are categorized into families sharing the main features. As a practical article, I will deep dive into the most usable ones in every family and leave the others to your interest.</p>
<p>Now, let's start with the first and most usable family:</p>
<h2 id="heading-merge-tree-family">Merge-Tree Family</h2>
<p>As mentioned, this is the most common choice when you want to create a table in ClickHouse. It's based on the <a target="_blank" href="https://en.wikipedia.org/wiki/Log-structured_merge-tree">Log-Structured Merge-Tree</a> data structure. LSM trees are optimized for write-intensive workloads: they handle a large volume of writes by buffering them in memory and then periodically flushing them to disk in sorted order. This allows for faster writes of massive amounts of data and reduces the likelihood of disk fragmentation. They are considered an alternative to the <a target="_blank" href="https://en.wikipedia.org/wiki/B-tree">B-Tree</a> data structure, which is common in traditional relational databases like MySQL.</p>
<blockquote>
<p><em>Note: For all engines of this family, you can use</em> <code>Replicated</code> as a prefix to the engine name to create a replication of the table on every ClickHouse node.</p>
</blockquote>
<p>Now let's investigate common engines in this family.</p>
<h3 id="heading-mergetree">MergeTree</h3>
<p>Here is an example of a merge-tree DDL:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> inventory
(
    <span class="hljs-string">`id`</span> Int32,
    <span class="hljs-string">`status`</span> <span class="hljs-keyword">String</span>,
    <span class="hljs-string">`price`</span> <span class="hljs-keyword">String</span>,
    <span class="hljs-string">`comment`</span> <span class="hljs-keyword">String</span>
)
<span class="hljs-keyword">ENGINE</span> = MergeTree
PRIMARY <span class="hljs-keyword">KEY</span> (<span class="hljs-keyword">id</span>, price)
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> (<span class="hljs-keyword">id</span>, price, <span class="hljs-keyword">status</span>)
</code></pre>
<p>Merge-tree tables use <a target="_blank" href="https://clickhouse.com/docs/en/optimize/sparse-primary-indexes">sparse indexing</a> to optimize queries. Briefly, in sparse indexing, data is split into multiple parts. Every part is sorted by the <code>order by</code> keys (referred to as <em>sort keys</em>), where the first key has the highest priority in sorting. Every part is then broken down into groups called <em>granules</em>, and the primary-key values at each granule's boundaries are stored as <em>marks</em>. Since these marks are extracted from the sorted data, the primary key must be a prefix of the sort keys. For every query that filters on primary keys, ClickHouse performs a binary search on those marks to find the target granules as fast as possible. Finally, ClickHouse loads only the target granules into memory and scans them for the matching rows.</p>
<blockquote>
<p><em>Note: You can omit the</em> <code>PRIMARY KEY</code> in DDL, and ClickHouse will consider sort keys as primary keys.</p>
</blockquote>
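To make the mark-based lookup concrete, here is a toy model (not ClickHouse's actual implementation) of binary-searching sparse marks to find the one granule that may contain a key:

```python
import bisect

# Sorted primary-key values of one part, split into granules of 4 rows each.
rows = list(range(0, 40, 2))          # 20 sorted key values
granule_size = 4
granules = [rows[i:i + granule_size] for i in range(0, len(rows), granule_size)]

# The sparse index stores only each granule's first key (its "mark"),
# so the index stays tiny even for billions of rows.
marks = [g[0] for g in granules]

def find_granule(key):
    """Binary-search the marks to locate the granule that may hold `key`."""
    idx = bisect.bisect_right(marks, key) - 1
    return max(idx, 0)

g = find_granule(13)
print(g, granules[g])  # only this granule is loaded and scanned: 1 [8, 10, 12, 14]
```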
<h3 id="heading-replaingmergetree">ReplaingMergeTree</h3>
<h4 id="heading-ddl">DDL</h4>
<p>In this engine, rows with equal sort keys are replaced by the last inserted row. Consider the table below:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> inventory
(
    <span class="hljs-string">`id`</span> Int32,
    <span class="hljs-string">`status`</span> <span class="hljs-keyword">String</span>,
    <span class="hljs-string">`price`</span> <span class="hljs-keyword">String</span>,
    <span class="hljs-string">`comment`</span> <span class="hljs-keyword">String</span>
)
<span class="hljs-keyword">ENGINE</span> = ReplacingMergeTree
PRIMARY <span class="hljs-keyword">KEY</span> (<span class="hljs-keyword">id</span>)
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> (<span class="hljs-keyword">id</span>, <span class="hljs-keyword">status</span>);
</code></pre>
<p>Suppose that you insert a row in this table:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> inventory <span class="hljs-keyword">VALUES</span> (<span class="hljs-number">23</span>, <span class="hljs-string">'success'</span>, <span class="hljs-string">'1000'</span>, <span class="hljs-string">'Confirmed'</span>);
</code></pre>
<p>Now let's insert another row with the same sort keys:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> inventory <span class="hljs-keyword">VALUES</span> (<span class="hljs-number">23</span>, <span class="hljs-string">'success'</span>, <span class="hljs-string">'2000'</span>, <span class="hljs-string">'Cancelled'</span>);
</code></pre>
<p>Now the latter row will replace the previous one. Note that if you select the rows, you may still see both of them:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">from</span> inventory <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">id</span>=<span class="hljs-number">23</span>;
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755439384197/2993c1e2-193c-46f3-9e1c-9628875d1ec6.png" alt="Result of the replacing merge tree without Final modifier" /></p>
<p>That's because ClickHouse performs the replacement while merging the parts, which happens asynchronously in the background, not immediately. To see the final result right away, you can use the <code>FINAL</code> modifier:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">from</span> inventory <span class="hljs-keyword">FINAL</span> <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">id</span>=<span class="hljs-number">23</span>;
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755439384975/3b660614-195d-4c94-9eb0-f616b40605c2.png" alt="Result of the replacing merge tree with Final modifier" /></p>
<blockquote>
<p><em>Note: You can specify a</em> <a target="_blank" href="https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree#ver"><em>column as version</em></a> <em>while defining the table to replace rows accordingly.</em></p>
</blockquote>
<h4 id="heading-usage">Usage</h4>
<p>Replacing Merge Tree is widely used for deduplication. Because ClickHouse performs poorly with frequent updates, you can instead update a column by inserting a new row with equal sort keys, and ClickHouse will remove the stale rows in the background. Updating the sort keys themselves is challenging, however, because the old rows won't be replaced in that situation. In that case, you can use the Collapsing Merge Tree, explained in the next part.</p>
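As a mental model, the background merge for this engine behaves roughly like keeping only the last inserted row per sort key (assuming no version column is specified):

```python
# Toy model of ReplacingMergeTree's merge: rows with equal sort keys
# collapse to the most recently inserted one.
rows = [
    # (id, status, price, comment) -- sort key is (id, status)
    (23, "success", "1000", "Confirmed"),
    (23, "success", "2000", "Cancelled"),
]

def merge_replacing(rows, key=lambda r: (r[0], r[1])):
    latest = {}
    for row in rows:               # iteration order == insertion order
        latest[key(row)] = row     # later rows overwrite earlier ones
    return list(latest.values())

print(merge_replacing(rows))  # [(23, 'success', '2000', 'Cancelled')]
```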
<h3 id="heading-collapsingmergetree">CollapsingMergeTree</h3>
<p>In this engine, you define a <em>sign</em> column: a row inserted with <code>sign=-1</code> cancels a previously inserted row with the same sort keys, while rows with <code>sign=1</code> represent the current state.</p>
<h4 id="heading-ddl-1">DDL</h4>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> inventory
(
    <span class="hljs-string">`id`</span> Int32,
    <span class="hljs-string">`status`</span> <span class="hljs-keyword">String</span>,
    <span class="hljs-string">`price`</span> <span class="hljs-keyword">String</span>,
    <span class="hljs-string">`comment`</span> <span class="hljs-keyword">String</span>,
    <span class="hljs-string">`sign`</span> <span class="hljs-built_in">Int8</span>
)
<span class="hljs-keyword">ENGINE</span> = CollapsingMergeTree(<span class="hljs-keyword">sign</span>)
PRIMARY <span class="hljs-keyword">KEY</span> (<span class="hljs-keyword">id</span>)
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> (<span class="hljs-keyword">id</span>, <span class="hljs-keyword">status</span>);
</code></pre>
<p>Let's insert a row in this table:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> inventory <span class="hljs-keyword">VALUES</span> (<span class="hljs-number">23</span>, <span class="hljs-string">'success'</span>, <span class="hljs-string">'1000'</span>, <span class="hljs-string">'Confirmed'</span>, <span class="hljs-number">1</span>);
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755439385700/26fa721b-7fcb-479e-87fd-4f4243ba0cd6.png" alt="Data in Collapsing Merge Tree before update" /></p>
<p>Now to update the row:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> inventory <span class="hljs-keyword">VALUES</span> (<span class="hljs-number">23</span>, <span class="hljs-string">'success'</span>, <span class="hljs-string">'1000'</span>, <span class="hljs-string">'Confirmed'</span>, <span class="hljs-number">-1</span>), (<span class="hljs-number">23</span>, <span class="hljs-string">'success'</span>, <span class="hljs-string">'2000'</span>, <span class="hljs-string">'Cancelled'</span>, <span class="hljs-number">1</span>);
</code></pre>
<p>To see the results:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> inventory;
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755439386428/be1978ec-d582-42e8-927c-c6c586440916.png" alt="Data in Collapsing Merge Tree after update without final modifier" /></p>
<p>To see the final results immediately:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> inventory <span class="hljs-keyword">FINAL</span>;
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755439387372/04627aa4-f6ef-4739-9c6d-262ebb27033b.png" alt="Data in Collapsing Merge Tree after update with final modifier" /></p>
<h4 id="heading-usage-1">Usage</h4>
<p>Collapsing Merge Trees can handle updates and deletes in a more controlled manner. For example, you can update sort keys by inserting the same row with <code>sign=-1</code> and a row with the new sort keys with <code>sign=1</code>. There are two challenges with this engine:</p>
<ol>
<li><p>Since you need to insert the old row with <code>sign=-1</code>, you must first look it up by fetching it from the database or another data store.</p>
</li>
<li><p>In case of inserting multiple rows accidentally or deliberately, with the <code>sign</code> equal to 1 or -1, you may face unwanted results. That's why you should consider all situations explained <a target="_blank" href="https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/collapsingmergetree#table_engine-collapsingmergetree-collapsing-algorithm">here</a>.</p>
</li>
</ol>
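A much-simplified model of the collapsing step — a `sign=-1` row cancels a previously written row with the same sort key, and unmatched `sign=1` rows survive (ClickHouse's real algorithm handles more edge cases, as linked above):

```python
from collections import defaultdict

# (id, status, price, comment, sign) -- sort key is (id, status)
rows = [
    (23, "success", "1000", "Confirmed", 1),
    (23, "success", "1000", "Confirmed", -1),  # cancels the row above
    (23, "success", "2000", "Cancelled", 1),
]

def collapse(rows):
    state = defaultdict(list)
    for row in rows:
        key, sign = (row[0], row[1]), row[4]
        if sign == -1 and state[key]:
            state[key].pop()      # a -1 row cancels the latest matching +1 row
        else:
            state[key].append(row)
    return [r for group in state.values() for r in group]

print(collapse(rows))  # [(23, 'success', '2000', 'Cancelled', 1)]
```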
<h3 id="heading-aggreragatingmergetree">AggreragatingMergeTree</h3>
<p>Using this engine, you can materialize the aggregation of a table into another one.</p>
<h4 id="heading-ddl-2">DDL</h4>
<p>Consider this <em>inventory</em> table. We want another table that holds, for every item id, the maximum price and the total number of items.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> inventory
 (
    <span class="hljs-string">`id`</span> Int32,
    <span class="hljs-string">`status`</span> <span class="hljs-keyword">String</span>,
    <span class="hljs-string">`price`</span> Int32,
    <span class="hljs-string">`num_items`</span> UInt64
) <span class="hljs-keyword">ENGINE</span> = MergeTree <span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> (<span class="hljs-keyword">id</span>, <span class="hljs-keyword">status</span>);
</code></pre>
<p>Now let's materialize its results into another table via AggregatingMergeTree:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">MATERIALIZED</span> <span class="hljs-keyword">VIEW</span> agg_inventory
(
    <span class="hljs-string">`id`</span> Int32,
    <span class="hljs-string">`max_price`</span> AggregateFunction(<span class="hljs-keyword">max</span>, Int32),
    <span class="hljs-string">`sum_items`</span> AggregateFunction(<span class="hljs-keyword">sum</span>, UInt64)
)
<span class="hljs-keyword">ENGINE</span> = AggregatingMergeTree() <span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> (<span class="hljs-keyword">id</span>)
<span class="hljs-keyword">AS</span> <span class="hljs-keyword">SELECT</span>
    <span class="hljs-keyword">id</span>,
    maxState(price) <span class="hljs-keyword">as</span> max_price,
    sumState(num_items) <span class="hljs-keyword">as</span> sum_items
<span class="hljs-keyword">FROM</span> inventory2
<span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> <span class="hljs-keyword">id</span>;
</code></pre>
<p>Now let's insert rows into it and see the results:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> inventory2 <span class="hljs-keyword">VALUES</span> (<span class="hljs-number">3</span>, <span class="hljs-number">100</span>, <span class="hljs-number">2</span>), (<span class="hljs-number">3</span>, <span class="hljs-number">500</span>, <span class="hljs-number">4</span>);

<span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">id</span>, maxMerge(max_price) <span class="hljs-keyword">AS</span> max_price, sumMerge(sum_items) <span class="hljs-keyword">AS</span> sum_items 
<span class="hljs-keyword">FROM</span> agg_inventory <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">id</span>=<span class="hljs-number">3</span> <span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> <span class="hljs-keyword">id</span>;
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755439388131/1e38c08d-478a-48ac-a68f-2eff01f5aeef.png" alt="Output of aggregating merge tree" /></p>
<h4 id="heading-usage-2">Usage</h4>
<p>This engine helps you reduce the response time of heavy, fixed analytical queries by computing them at write time. That also decreases the database load at query time.</p>
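Conceptually, the `*State` combinators store a partial aggregation state per inserted part, and the `*Merge` combinators combine those states at query time. A toy model of that split:

```python
# Toy model of AggregatingMergeTree: each insert produces a partial
# aggregate state; reads merge the states instead of rescanning raw rows.
inserts = [
    [(3, 100, 2), (3, 500, 4)],   # (id, price, num_items) per batch
    [(3, 250, 1)],
]

# "State" step: one partial aggregate per batch, keyed by id.
states = []
for batch in inserts:
    part = {}
    for item_id, price, n in batch:
        max_p, total = part.get(item_id, (float("-inf"), 0))
        part[item_id] = (max(max_p, price), total + n)
    states.append(part)

# "Merge" step: combine the partial states at read time.
merged = {}
for part in states:
    for item_id, (max_p, total) in part.items():
        prev_max, prev_total = merged.get(item_id, (float("-inf"), 0))
        merged[item_id] = (max(prev_max, max_p), prev_total + total)

print(merged)  # {3: (500, 7)}
```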
<h2 id="heading-log-family">Log Family</h2>
<p>Lightweight engines with minimum functionality. They're the most effective when you need to quickly write many small tables (up to approximately 1 million rows) and read them later. Additionally, there are no indexes in this family. However, <code>Log</code> and <code>StripeLog</code> engines can break down data into multiple blocks to support multi-threading while reading data.</p>
<p>I will only look into the <em>TinyLog</em> engine. To check the others, you can visit <a target="_blank" href="https://clickhouse.com/docs/en/engines/table-engines#log">this link</a>.</p>
<h3 id="heading-tinylog">TinyLog</h3>
<p>This engine is mainly used in a write-once fashion, i.e., you write data once and then read it as often as you want. As ClickHouse reads the data in a single stream, it's better to keep the table size up to 1M rows.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> log_location
 (
    <span class="hljs-string">`id`</span> Int32,
    <span class="hljs-string">`long`</span> <span class="hljs-keyword">String</span>,
    <span class="hljs-string">`lat`</span> Int32
) <span class="hljs-keyword">ENGINE</span> = TinyLog;
</code></pre>
<h4 id="heading-usage-3">Usage</h4>
<p>You can use this engine as an intermediate state for batch operations.</p>
<h2 id="heading-integration-family">Integration Family</h2>
<p>The engines in this family are widely used to connect with other databases and brokers with the ability to fetch or insert data.</p>
<p>I'll cover MySQL and Kafka Engines, but you can study the others <a target="_blank" href="https://clickhouse.com/docs/en/engines/table-engines#integration-engines">here</a>.</p>
<h3 id="heading-mysql-engine">MySQL Engine</h3>
<p>With this engine, you can connect with a MySQL database through ClickHouse and read its data or insert rows.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> mysql_inventory
(
    <span class="hljs-string">`id`</span> Int32,
    <span class="hljs-string">`price`</span> Int32
)
<span class="hljs-keyword">ENGINE</span> = MySQL(<span class="hljs-string">'host:port'</span>, <span class="hljs-string">'database'</span>, <span class="hljs-string">'table'</span>, <span class="hljs-string">'user'</span>, <span class="hljs-string">'password'</span>)
</code></pre>
<h3 id="heading-kafka-engine">Kafka Engine</h3>
<p>Using this engine, you can make a connection to a Kafka Cluster and read its data with a defined consumer group. This engine is broadly used for CDC purposes.</p>
<p>To learn more about this feature, read <a target="_blank" href="https://medium.com/@hoptical/apply-cdc-from-mysql-to-clickhouse-d660873311c7">this</a> article specifically on this topic.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this article, we saw some of the most important engines of the ClickHouse database. It is clear that ClickHouse provides a wide range of engine options to suit various use-cases. The Merge Tree engine is the default engine and is suitable for most scenarios, but it can be replaced with other engines like AggregatingMergeTree, TinyLog, etc.</p>
<p>It's important to note that choosing the right engine for your use-case can significantly improve performance and efficiency. Therefore, it's worth taking the time to understand the strengths and limitations of each engine and select the one that best meets your needs.</p>
<hr />
<p><em>Originally published at</em> <a target="_blank" href="https://dev.to/hoptical/clickhouse-basic-tutorial-table-engines-30i1"><em>dev.to</em></a> <em>on April 29, 2023.</em></p>
]]></content:encoded></item><item><title><![CDATA[Step-by-Step Guide: Deploying Kafka Connect via Strimzi Operator on Kubernetes]]></title><description><![CDATA[Photo by Ryoji Iwata on Unsplash
Strimzi is almost the richest Kubernetes Kafka operator, which you can utilize to deploy Apache Kafka or its other components like Kafka Connect, Kafka Mirror, etc. This article will provide a step-by-step tutorial ab...]]></description><link>https://hamedkarbasi.com/step-by-step-guide-deploying-kafka-connect-via-strimzi-operator-on-kubernetes</link><guid isPermaLink="true">https://hamedkarbasi.com/step-by-step-guide-deploying-kafka-connect-via-strimzi-operator-on-kubernetes</guid><category><![CDATA[kafka connect]]></category><category><![CDATA[Strimzi]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Tutorial]]></category><category><![CDATA[openshift]]></category><dc:creator><![CDATA[Hamed Karbasi]]></dc:creator><pubDate>Tue, 25 Apr 2023 10:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755439392822/69b20859-6299-49d0-9b61-7d85433d3b1f.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Photo by <a target="_blank" href="https://unsplash.com/pt-br/@ryoji__iwata?utm_source=medium&amp;utm_medium=referral">Ryoji Iwata</a> <a target="_blank" href="https://unsplash.com/pt-br/@ryoji__iwata?utm_source=medium&amp;utm_medium=referral">on Unsplash</a></p>
<p><a target="_blank" href="https://unsplash.com/?utm_source=medium&amp;utm_medium=referral">Strimzi</a> is almost the richest Kubernetes Kafka operator, which you can utilize to deploy Apache Kafka or its other components like Kafka Connect, Kafka Mirror, etc. This article will provide a step-by-step tutorial about deploying Kafka Connect on Kubernetes. I brought all issues I encountered during the deployment procedure and their best mitigation.</p>
<blockquote>
<p>Note: Consider that this operator is based on <a target="_blank" href="https://kafka.apache.org/">Apache Kafka</a>, not the <a target="_blank" href="https://docs.confluent.io/platform/current/platform.html">Confluent Platform</a>. That's why you may need to add some Confluent artifacts, like the <a target="_blank" href="https://www.confluent.io/hub/confluentinc/kafka-connect-avro-converter">Confluent Avro Converter</a>, to get the most out of it.</p>
</blockquote>
<p>This article is based on <code>Strimzi v0.29.0</code>, which supports the following versions:</p>
<ul>
<li><p>Strimzi: 0.29.0</p>
</li>
<li><p>Apache Kafka &amp; Kafka Connect: Up to 3.2</p>
</li>
<li><p>Equivalent Confluent Platform: 7.2.4</p>
</li>
</ul>
<blockquote>
<p>Note: You can convert the Confluent Platform version to the Apache Kafka version and vice versa using the table provided <a target="_blank" href="https://docs.confluent.io/platform/current/installation/versions-interoperability.html#supported-versions-and-interoperability-for-cp">here</a>.</p>
</blockquote>
<h2 id="heading-installation">Installation</h2>
<h3 id="heading-openshift-gui-and-kubernetes-cli">Openshift GUI and Kubernetes CLI</h3>
<p>If you're using OpenShift, navigate to Operators &gt; Installed Operators &gt; Strimzi &gt; Kafka Connect.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755439391616/80d17d7e-a787-40dc-906d-1ad07cf1528c.png" alt="Openshift Strimzi Operator Page" /></p>
<p>Now you will see a form containing the Kafka Connect configuration. You can get the equivalent YAML by clicking on the YAML view; any update in the form view is applied to the YAML view on the fly. Although the form view is quite straightforward, it's strongly recommended not to use it for creating the instance directly. Use it only to convert your desired configuration into a YAML file, then deploy the resource with the <code>kubectl apply</code> command. To summarize:</p>
<ol>
<li><p>Enter the configuration in the form view</p>
</li>
<li><p>Click on Yaml view</p>
</li>
<li><p>Copy its contents to a Yaml file on your local (e.g. <code>kafka-connect.yaml</code>)</p>
</li>
<li><p>Run: <code>kubectl apply -f kafka-connect.yaml</code></p>
</li>
</ol>
<p>Now the KafkaConnect resource should be deployed or updated. The deployed resources consist of a Deployment and its pods, a Service, ConfigMaps, and Secrets.</p>
<p>Let's get through the minimum configuration and make it more advanced, step by step.</p>
<h2 id="heading-minimum-configuration">Minimum Configuration</h2>
<p>To deploy a simple minimum configuration of Kafka Connect, you can use the below Yaml:</p>
<pre><code class="lang-yml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">kafka.strimzi.io/v1beta2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">KafkaConnect</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">my-connect-cluster</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">&lt;YOUR_PROJECT_NAME&gt;</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">config:</span>
    <span class="hljs-attr">config.storage.replication.factor:</span> <span class="hljs-number">-1</span>
    <span class="hljs-attr">config.storage.topic:</span> <span class="hljs-string">okd4-connect-cluster-configs</span>
    <span class="hljs-attr">group.id:</span> <span class="hljs-string">okd4-connect-cluster</span>
    <span class="hljs-attr">offset.storage.replication.factor:</span> <span class="hljs-number">-1</span>
    <span class="hljs-attr">offset.storage.topic:</span> <span class="hljs-string">okd4-connect-cluster-offsets</span>
    <span class="hljs-attr">status.storage.replication.factor:</span> <span class="hljs-number">-1</span>
    <span class="hljs-attr">status.storage.topic:</span> <span class="hljs-string">okd4-connect-cluster-status</span>
  <span class="hljs-attr">bootstrapServers:</span> <span class="hljs-string">'kafka1:9092,kafka2:9092'</span>
  <span class="hljs-attr">version:</span> <span class="hljs-number">3.2</span><span class="hljs-number">.0</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">1</span>
</code></pre>
<p>The Kafka Connect REST API is exposed on port 8083 of the pod. To expose it on a private or internal network, define a Route for it on OKD.</p>
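<p>As a sketch, a Route like the following could expose the REST API; the service name <code>my-connect-cluster-connect-api</code> follows Strimzi's naming for a KafkaConnect named <code>my-connect-cluster</code>, but verify it against your own deployment:</p>

```yaml
# Hypothetical OKD/OpenShift Route exposing the Connect REST API
kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: kafka-connect-api
  namespace: <YOUR_PROJECT_NAME>
spec:
  to:
    kind: Service
    name: my-connect-cluster-connect-api  # assumption: Strimzi's <cluster>-connect-api service
  port:
    targetPort: 8083
```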
<h2 id="heading-rest-api-authentication">REST API Authentication</h2>
<p>With the configuration explained <a target="_blank" href="https://docs.confluent.io/platform/current/security/basic-auth.html#kconnect-rest-api">here</a>, you can add authentication to the Kafka Connect REST API. Unfortunately, that doesn't work with the Strimzi operator, as discussed <a target="_blank" href="https://github.com/strimzi/strimzi-kafka-operator/issues/3229">here</a>. So to secure Kafka Connect, you have two options:</p>
<ol>
<li><p>Use the <em>KafkaConnector</em> custom resource. The Strimzi operator lets you define a connector as a <code>KafkaConnector</code> kind in a YAML file. However, this may not be practical for use cases where updating, pausing, and stopping connectors via the REST API is necessary.</p>
</li>
<li><p>Put the insecure REST API behind an authenticated API Gateway like Apache APISIX or any other tool or self-developed application.</p>
</li>
</ol>
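<p>For reference, option 1 looks like the following sketch; the connector class and config values are assumptions for a Debezium MySQL source, so replace them with your own:</p>

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: my-source-connector            # hypothetical name
  labels:
    strimzi.io/cluster: my-connect-cluster  # must match the KafkaConnect name
spec:
  class: io.debezium.connector.mysql.MySqlConnector  # assumption
  tasksMax: 1
  config:
    database.hostname: mysql-host      # assumption
    database.port: 3306
```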
<h2 id="heading-jmx-prometheus-metrics">JMX Prometheus Metrics</h2>
<p>To expose JMX Prometheus metrics, which are useful for observing connector statuses in Grafana, add the below configuration:</p>
<pre><code class="lang-yml">  <span class="hljs-attr">metricsConfig:</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">jmxPrometheusExporter</span>
    <span class="hljs-attr">valueFrom:</span>
      <span class="hljs-attr">configMapKeyRef:</span>
        <span class="hljs-attr">key:</span> <span class="hljs-string">jmx-prometheus</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">configs</span>
  <span class="hljs-attr">jmxOptions:</span> {}
</code></pre>
<p>It uses a pre-defined config for Prometheus export. You can use this config:</p>
<pre><code class="lang-yml"><span class="hljs-attr">startDelaySeconds:</span> <span class="hljs-number">0</span>
<span class="hljs-attr">ssl:</span> <span class="hljs-literal">false</span>
<span class="hljs-attr">lowercaseOutputName:</span> <span class="hljs-literal">false</span>
<span class="hljs-attr">lowercaseOutputLabelNames:</span> <span class="hljs-literal">false</span>
<span class="hljs-attr">rules:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">pattern:</span> <span class="hljs-string">"kafka.connect&lt;type=connect-worker-metrics&gt;([^:]+):"</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">"kafka_connect_connect_worker_metrics_$1"</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">pattern:</span> <span class="hljs-string">"kafka.connect&lt;type=connect-metrics, client-id=([^:]+)&gt;&lt;&gt;([^:]+)"</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">"kafka_connect_connect_metrics_$2"</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">client:</span> <span class="hljs-string">"$1"</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">pattern:</span> <span class="hljs-string">"debezium.([^:]+)&lt;type=connector-metrics, context=([^,]+), server=([^,]+), key=([^&gt;]+)&gt;&lt;&gt;RowsScanned"</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">"debezium_metrics_RowsScanned"</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">plugin:</span> <span class="hljs-string">"$1"</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">"$3"</span>
    <span class="hljs-attr">context:</span> <span class="hljs-string">"$2"</span>
    <span class="hljs-attr">table:</span> <span class="hljs-string">"$4"</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">pattern:</span> <span class="hljs-string">"debezium.([^:]+)&lt;type=connector-metrics, context=([^,]+), server=([^&gt;]+)&gt;([^:]+)"</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">"debezium_metrics_$4"</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">plugin:</span> <span class="hljs-string">"$1"</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">"$3"</span>
    <span class="hljs-attr">context:</span> <span class="hljs-string">"$2"</span>
</code></pre>
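<p>The <code>metricsConfig</code> above reads this rules file from a ConfigMap named <code>configs</code> under the key <code>jmx-prometheus</code>, so that ConfigMap must exist first. A minimal sketch:</p>

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: configs
  namespace: kafka-connect
data:
  jmx-prometheus: |
    startDelaySeconds: 0
    ssl: false
    # ... the rest of the rules file shown above
```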
<h3 id="heading-service-for-external-prometheus">Service for External Prometheus</h3>
<p>If you intend to deploy Prometheus alongside Strimzi to collect the metrics, follow the instructions <a target="_blank" href="https://strimzi.io/docs/operators/latest/deploying.html#assembly-metrics-setup-str">here</a>. However, in the case of an external Prometheus, the story goes another way:</p>
<p>The Strimzi operator only creates Service port mappings for these ports:</p>
<ul>
<li><p>8083: Kafka Connect REST API</p>
</li>
<li><p>9999: JMX port</p>
</li>
</ul>
<p>Sadly, it doesn't create a mapping for port 9404, the Prometheus exporter HTTP port. <a target="_blank" href="https://github.com/strimzi/strimzi-kafka-operator/issues/8403">So we have to create a Service on our own</a>:</p>
<pre><code class="lang-yml"><span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">kafka-connect-jmx-prometheus</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">kafka-connect</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app.kubernetes.io/instance:</span> <span class="hljs-string">kafka-connect</span>
    <span class="hljs-attr">app.kubernetes.io/managed-by:</span> <span class="hljs-string">strimzi-cluster-operator</span>
    <span class="hljs-attr">app.kubernetes.io/name:</span> <span class="hljs-string">kafka-connect</span>
    <span class="hljs-attr">app.kubernetes.io/part-of:</span> <span class="hljs-string">strimzi-kafka-connect</span>
    <span class="hljs-attr">strimzi.io/cluster:</span> <span class="hljs-string">kafka-connect</span>
    <span class="hljs-attr">strimzi.io/kind:</span> <span class="hljs-string">KafkaConnect</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">ports:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">tcp-prometheus</span>
      <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">port:</span> <span class="hljs-number">9404</span>
      <span class="hljs-attr">targetPort:</span> <span class="hljs-number">9404</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">ClusterIP</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">strimzi.io/cluster:</span> <span class="hljs-string">kafka-connect</span>
    <span class="hljs-attr">strimzi.io/kind:</span> <span class="hljs-string">KafkaConnect</span>
    <span class="hljs-attr">strimzi.io/name:</span> <span class="hljs-string">kafka-connect-connect</span>
<span class="hljs-attr">status:</span>
  <span class="hljs-attr">loadBalancer:</span> {}
</code></pre>
<blockquote>
<p>Note: This method only works for single-pod deployments: you have to define a route for the service, and even with a headless service, the route returns one pod's IP at a time. Hence, Prometheus can't scrape all pods' metrics. That's why using a PodMonitor with an in-cluster Prometheus is recommended. This issue is discussed here</p>
</blockquote>
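<p>If you do run the Prometheus Operator in the cluster, a PodMonitor along these lines scrapes every Connect pod directly. This is a sketch: the port name <code>tcp-prometheus</code> is assumed to match the exporter port on the Strimzi pods, so verify it against your pod spec:</p>

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kafka-connect-metrics
  namespace: kafka-connect
spec:
  selector:
    matchLabels:
      strimzi.io/kind: KafkaConnect
  podMetricsEndpoints:
    - path: /metrics
      port: tcp-prometheus   # assumption: Strimzi's metrics port name
```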
<h2 id="heading-plugins-and-artifacts">Plugins and Artifacts</h2>
<p>To add plugins and artifacts, there are two ways:</p>
<h3 id="heading-operator-build-section">Operator Build Section</h3>
<p>To add plugins, you can use the operator's build section. It takes the plugin or artifact addresses, downloads them in the build stage (the operator creates the BuildConfig automatically), and adds them to the plugin directory of the image.</p>
<p>It supports <code>jar</code>, <code>tgz</code>, <code>zip</code>, and <code>maven</code> artifact types. However, in the case of Maven, a multi-stage Dockerfile is created, which is <a target="_blank" href="https://bugzilla.redhat.com/show_bug.cgi?id=1937243">problematic on OpenShift</a> and fails in the build stage. Hence, you should only use the other types that don't need a compile stage (i.e., jar, zip, tgz) and end up with a single-stage Dockerfile.</p>
<p>For example, to add the Debezium MySQL plugin, you can use the below configuration:</p>
<pre><code class="lang-yml"><span class="hljs-attr">spec:</span>  
  <span class="hljs-attr">build:</span>
    <span class="hljs-attr">output:</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">'kafkaconnect:1.0'</span>
      <span class="hljs-attr">type:</span> <span class="hljs-string">imagestream</span>
    <span class="hljs-attr">plugins:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">artifacts:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">tgz</span>
            <span class="hljs-attr">url:</span> <span class="hljs-string">&gt;-
              https://repo1.maven.org/maven2/io/debezium/debezium-connector-mysql/2.1.4.Final/debezium-connector-mysql-2.1.4.Final-plugin.tar.gz
</span>        <span class="hljs-attr">name:</span> <span class="hljs-string">debezium-connector-mysql</span>
</code></pre>
<blockquote>
<p>Note: The Strimzi operator is only able to download public artifacts. So if you wish to download a privately secured artifact that is not accessible by Kubernetes, you have to give up this method and follow the next one.</p>
</blockquote>
<h3 id="heading-changing-image">Changing Image</h3>
<p>The operator is able to use your desired image instead of its default one, so you can add your desired artifacts and plugins by building an image manually or via CI/CD. Another reason to use this method is that Strimzi uses the Apache Kafka image, not the Confluent Platform one, so the deployments lack useful Confluent packages such as the Confluent Avro Converter. You need to add them to your image and configure the operator to use your Docker image.</p>
<p>For example, If you want to add your customized Debezium MySQL Connector plugin from Gitlab Generic Packages and Confluent Avro Converter to the base image, first use this Dockerfile:</p>
<pre><code class="lang-dockerfile"><span class="hljs-string">ARG</span> <span class="hljs-string">CONFLUENT_VERSION=7.2.4</span>

<span class="hljs-comment"># Install confluent avro converter</span>
<span class="hljs-string">FROM</span> <span class="hljs-string">confluentinc/cp-kafka-connect:${CONFLUENT_VERSION}</span> <span class="hljs-string">as</span> <span class="hljs-string">cp</span>
<span class="hljs-comment"># Reassign version</span>
<span class="hljs-string">ARG</span> <span class="hljs-string">CONFLUENT_VERSION</span>
<span class="hljs-string">RUN</span> <span class="hljs-string">confluent-hub</span> <span class="hljs-string">install</span> <span class="hljs-string">--no-prompt</span> <span class="hljs-string">confluentinc/kafka-connect-avro-converter:${CONFLUENT_VERSION}</span>

<span class="hljs-comment"># Copy previous artifacts to the main Strimzi Kafka image</span>
<span class="hljs-string">FROM</span> <span class="hljs-string">quay.io/strimzi/kafka:0.29.0-kafka-3.2.0</span>
<span class="hljs-string">ARG</span> <span class="hljs-string">GITLAB_TOKEN</span>
<span class="hljs-string">ARG</span> <span class="hljs-string">CI_API_V4_URL=https://gitlab.snapp.ir/api/v4</span>
<span class="hljs-string">ARG</span> <span class="hljs-string">CI_PROJECT_ID=3873</span>
<span class="hljs-string">ARG</span> <span class="hljs-string">DEBEZIUM_CONNECTOR_MYSQL_CUSTOMIZED_VERSION=1.0</span>
<span class="hljs-string">USER</span> <span class="hljs-string">root:root</span>

<span class="hljs-comment"># Copy Confluent packages from previous stage</span>
<span class="hljs-string">RUN</span> <span class="hljs-string">mkdir</span> <span class="hljs-string">-p</span> <span class="hljs-string">/opt/kafka/plugins/avro/</span>
<span class="hljs-string">COPY</span> <span class="hljs-string">--from=cp</span> <span class="hljs-string">/usr/share/confluent-hub-components/confluentinc-kafka-connect-avro-converter/lib</span> <span class="hljs-string">/opt/kafka/plugins/avro/</span>

<span class="hljs-comment"># Connector plugin debezium-connector-mysql</span>
<span class="hljs-string">RUN</span> <span class="hljs-string">'mkdir'</span> <span class="hljs-string">'-p'</span> <span class="hljs-string">'/opt/kafka/plugins/debezium-connector-mysql'</span> <span class="hljs-string">\</span>
    <span class="hljs-string">&amp;&amp;</span> <span class="hljs-string">curl</span> <span class="hljs-string">--header</span> <span class="hljs-string">"${GITLAB_TOKEN}"</span> <span class="hljs-string">-f</span> <span class="hljs-string">-L</span> <span class="hljs-string">\</span>
    <span class="hljs-string">--output</span> <span class="hljs-string">/opt/kafka/plugins/debezium-connector-mysql.tgz</span> <span class="hljs-string">\</span>
    <span class="hljs-string">${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/debezium-customized/${DEBEZIUM_CONNECTOR_MYSQL_CUSTOMIZED_VERSION}/debezium-connector-mysql-customized.tar.gz</span> <span class="hljs-string">\</span>
    <span class="hljs-string">&amp;&amp;</span> <span class="hljs-string">'tar'</span> <span class="hljs-string">'xvfz'</span> <span class="hljs-string">'/opt/kafka/plugins/debezium-connector-mysql.tgz'</span> <span class="hljs-string">'-C'</span> <span class="hljs-string">'/opt/kafka/plugins/debezium-connector-mysql'</span> <span class="hljs-string">\</span>
    <span class="hljs-string">&amp;&amp;</span> <span class="hljs-string">'rm'</span> <span class="hljs-string">'-vf'</span> <span class="hljs-string">'/opt/kafka/plugins/debezium-connector-mysql.tgz'</span>

<span class="hljs-string">USER</span> <span class="hljs-number">1001</span>
</code></pre>
<p>Build the image, push it to the image stream or any other Docker registry, and configure the operator by adding the below line:</p>
<pre><code class="lang-yml"><span class="hljs-attr">spec:</span>  
  <span class="hljs-attr">image:</span> <span class="hljs-string">image-registry.openshift-image-registry.svc:5000/kafka-connect/kafkaconnect-customized:1.0</span>
</code></pre>
<h2 id="heading-kafka-authentication">Kafka Authentication</h2>
<p>Depending on the authentication type, you need different configurations. As an example, here is the configuration for Kafka with SASL over plaintext and the SCRAM-SHA-512 mechanism:</p>
<pre><code class="lang-yml"><span class="hljs-attr">spec:</span>
  <span class="hljs-attr">authentication:</span>
    <span class="hljs-attr">passwordSecret:</span>
      <span class="hljs-attr">password:</span> <span class="hljs-string">kafka-password</span>
      <span class="hljs-attr">secretName:</span> <span class="hljs-string">mysecrets</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">scram-sha-512</span>
    <span class="hljs-attr">username:</span> <span class="hljs-string">myuser</span>
</code></pre>
<p>Needless to say, you must provide the password in a Secret named <em>mysecrets</em>.</p>
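<p>A minimal sketch of that Secret, using <code>stringData</code> so Kubernetes does the Base64 encoding for you (the placeholder value is yours to fill in):</p>

```yaml
kind: Secret
apiVersion: v1
metadata:
  name: mysecrets
  namespace: kafka-connect
type: Opaque
stringData:
  kafka-password: <YOUR_KAFKA_PASSWORD>
```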
<h2 id="heading-handling-file-credentials">Handling File Credentials</h2>
<p>Since connectors need credentials to access databases, you have to define them as Secrets and access them through environment variables. However, if there are too many of them, you can put all credentials in a file and reference them in the connector with the <code>file</code> config provider:</p>
<p>1- Put all credentials as the value of a key named <em>credentials</em> in a secret file.</p>
<p><em>Credentials file:</em></p>
<pre><code class="lang-text">USERNAME_DB_1=user1
PASSWORD_DB_1=pass1

USERNAME_DB_2=user2
PASSWORD_DB_2=pass2
</code></pre>
<p><em>Secret file:</em></p>
<pre><code class="lang-yml"><span class="hljs-attr">kind:</span> <span class="hljs-string">Secret</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">mysecrets</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">kafka-connect</span>
<span class="hljs-attr">data:</span>
  <span class="hljs-attr">credentials:</span> <span class="hljs-string">&lt;BASE64</span> <span class="hljs-string">YOUR</span> <span class="hljs-string">DATA&gt;</span>
</code></pre>
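<p>To produce the Base64 value for the <code>credentials</code> key, you can encode the file with <code>base64</code>. This is a sketch; the <code>/tmp/credentials</code> path and the sample values are just for illustration:</p>

```shell
# Write the credentials file, then emit its Base64 for the Secret's data field.
# (Alternatively, use stringData in the Secret to skip manual encoding.)
printf 'USERNAME_DB_1=user1\nPASSWORD_DB_1=pass1\n' > /tmp/credentials
base64 -w 0 /tmp/credentials
```
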
<p>2- Configure the operator with the secret as volume:</p>
<pre><code class="lang-yml"><span class="hljs-attr">spec:</span>
  <span class="hljs-attr">config:</span>
    <span class="hljs-attr">config.providers:</span> <span class="hljs-string">file</span>
    <span class="hljs-attr">config.providers.file.class:</span> <span class="hljs-string">org.apache.kafka.common.config.provider.FileConfigProvider</span>  
  <span class="hljs-attr">externalConfiguration:</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">database_credentials</span>
        <span class="hljs-attr">secret:</span>
          <span class="hljs-attr">items:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">credentials</span>
              <span class="hljs-attr">path:</span> <span class="hljs-string">credentials</span>
          <span class="hljs-attr">optional:</span> <span class="hljs-literal">false</span>
          <span class="hljs-attr">secretName:</span> <span class="hljs-string">mysecrets</span>
</code></pre>
<p>3- Now in the connector, you can access PASSWORD_DB_1 with the below placeholder:</p>
<pre><code class="lang-java"><span class="hljs-string">"${file:/opt/kafka/external-configuration/database_credentials/credentials:PASSWORD_DB_1}"</span>
</code></pre>
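<p>For example, a connector created through the REST API could reference the credentials like this. This is a sketch: the connector class and field names are assumptions, and only the <code>${file:...}</code> placeholders come from the setup above:</p>

```json
{
  "name": "mysql-source",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.user": "${file:/opt/kafka/external-configuration/database_credentials/credentials:USERNAME_DB_1}",
    "database.password": "${file:/opt/kafka/external-configuration/database_credentials/credentials:PASSWORD_DB_1}"
  }
}
```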
<h2 id="heading-put-it-all-together">Put it all together</h2>
<p>If we put all configurations together, we'll have the below configuration for Kafka Connect:</p>
<blockquote>
<p>Service, route, and build configuration are omitted since they were discussed earlier in the article.</p>
</blockquote>
<pre><code class="lang-yml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">kafka.strimzi.io/v1beta2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">KafkaConnect</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">kafka-connect</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">kafka-connect</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">authentication:</span>
    <span class="hljs-attr">passwordSecret:</span>
      <span class="hljs-attr">password:</span> <span class="hljs-string">kafka-password</span>
      <span class="hljs-attr">secretName:</span> <span class="hljs-string">mysecrets</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">scram-sha-512</span>
    <span class="hljs-attr">username:</span> <span class="hljs-string">myuser</span>
  <span class="hljs-attr">config:</span>
    <span class="hljs-attr">config.providers:</span> <span class="hljs-string">file</span>
    <span class="hljs-attr">config.providers.file.class:</span> <span class="hljs-string">org.apache.kafka.common.config.provider.FileConfigProvider</span>
    <span class="hljs-attr">config.storage.replication.factor:</span> <span class="hljs-number">-1</span>
    <span class="hljs-attr">config.storage.topic:</span> <span class="hljs-string">okd4-connect-cluster-configs</span>
    <span class="hljs-attr">group.id:</span> <span class="hljs-string">okd4-connect-cluster</span>
    <span class="hljs-attr">offset.storage.replication.factor:</span> <span class="hljs-number">-1</span>
    <span class="hljs-attr">offset.storage.topic:</span> <span class="hljs-string">okd4-connect-cluster-offsets</span>
    <span class="hljs-attr">status.storage.replication.factor:</span> <span class="hljs-number">-1</span>
    <span class="hljs-attr">status.storage.topic:</span> <span class="hljs-string">okd4-connect-cluster-status</span>
  <span class="hljs-attr">bootstrapServers:</span> <span class="hljs-string">'kafka1:9092, kafka2:9092'</span>
  <span class="hljs-attr">metricsConfig:</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">jmxPrometheusExporter</span>
    <span class="hljs-attr">valueFrom:</span>
      <span class="hljs-attr">configMapKeyRef:</span>
        <span class="hljs-attr">key:</span> <span class="hljs-string">jmx-prometheus</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">configs</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">1Gi</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">1Gi</span>
  <span class="hljs-attr">readinessProbe:</span>
    <span class="hljs-attr">failureThreshold:</span> <span class="hljs-number">10</span>
    <span class="hljs-attr">initialDelaySeconds:</span> <span class="hljs-number">60</span>
    <span class="hljs-attr">periodSeconds:</span> <span class="hljs-number">20</span>
  <span class="hljs-attr">jmxOptions:</span> {}
  <span class="hljs-attr">livenessProbe:</span>
    <span class="hljs-attr">failureThreshold:</span> <span class="hljs-number">10</span>
    <span class="hljs-attr">initialDelaySeconds:</span> <span class="hljs-number">60</span>
    <span class="hljs-attr">periodSeconds:</span> <span class="hljs-number">20</span>
  <span class="hljs-attr">image:</span> <span class="hljs-string">image-registry.openshift-image-registry.svc:5000/kafka-connect/kafkaconnect-customized:1.0</span>
  <span class="hljs-attr">version:</span> <span class="hljs-number">3.2</span><span class="hljs-number">.0</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">2</span>
  <span class="hljs-attr">externalConfiguration:</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">database_credentials</span>
        <span class="hljs-attr">secret:</span>
          <span class="hljs-attr">items:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">credentials</span>
              <span class="hljs-attr">path:</span> <span class="hljs-string">credentials</span>
          <span class="hljs-attr">optional:</span> <span class="hljs-literal">false</span>
          <span class="hljs-attr">secretName:</span> <span class="hljs-string">mysecrets</span>
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In conclusion, deploying Kafka Connect using the Strimzi operator can be a powerful and efficient way to manage data integration in your organization. By leveraging the flexibility and scalability of Kafka, along with the ease of use and automation provided by the Strimzi operator, you can streamline your data pipelines and improve your data-driven decision-making. In this article, I've covered the key steps involved in deploying Kafka Connect via the Strimzi operator: creating a minimal custom resource, the REST API basic-authentication issue, Kafka authentication, JMX Prometheus metrics, plugins and artifacts, and handling file credentials. By following these steps, you can easily customize your Kafka Connect deployment to meet your specific needs.</p>
<hr />
<p><em>Originally published at</em> <a target="_blank" href="https://itnext.io/step-by-step-guide-deploying-kafka-connect-via-strimzi-operator-on-kubernetes-6357c123abe9"><em>itnext.io</em></a></p>
]]></content:encoded></item><item><title><![CDATA[ClickHouse Basic Tutorial: An Introduction]]></title><description><![CDATA[This is the first part of the ClickHouse Tutorial Series. In this series, I cover some practical and vital aspects of the ClickHouse database, a robust OLAP technology many enterprise companies utilize.
In this part, I'll talk about the main features...]]></description><link>https://hamedkarbasi.com/clickhouse-basic-tutorial-an-introduction</link><guid isPermaLink="true">https://hamedkarbasi.com/clickhouse-basic-tutorial-an-introduction</guid><category><![CDATA[ClickHouse]]></category><category><![CDATA[Databases]]></category><category><![CDATA[Tutorial]]></category><dc:creator><![CDATA[Hamed Karbasi]]></dc:creator><pubDate>Thu, 13 Apr 2023 10:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755437186240/c76e1347-462d-4248-ae89-8570945dc6ec.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is the first part of the <strong>ClickHouse Tutorial Series</strong>. In this series, I cover some practical and vital aspects of the ClickHouse database, a robust OLAP technology many enterprise companies utilize.</p>
<p>In this part, I'll talk about the main features, weaknesses, installation, and usage of ClickHouse. I'll also refer to some helpful links for those who want to dive into broader details.</p>
<h2 id="heading-what-is-clickhouse">What is ClickHouse</h2>
<p>ClickHouse is an open-source column-oriented database developed by Yandex. It is designed to provide high performance for analytical queries. ClickHouse uses a SQL-like query language for querying data and supports different data types, including integers, strings, dates, and floats. It offers various features such as clustering, distributed query processing, and fault tolerance. It also supports replication and data sharding. ClickHouse is used by companies such as Yandex, Facebook, and Uber for data analysis, machine learning, and more.</p>
<h3 id="heading-main-features">Main Features</h3>
<p>The main features of Clickhouse Database are:</p>
<h4 id="heading-column-oriented">Column-Oriented</h4>
<p>Data in ClickHouse is stored in <a target="_blank" href="https://clickhouse.com/docs/en/about-us/distinctive-features#true-column-oriented-database-management-system">columns instead of rows</a>, bringing at least two benefits:</p>
<ol>
<li><p>Every column is stored in a separate file; hence, stronger compression is possible on each column and on the whole table.</p>
</li>
<li><p>In range queries, which are common in analytical processing, the system can access and process data more easily, since data is sorted by some columns (i.e., the columns defined as sort keys). Additionally, it can parallelize processing across multiple cores while loading massive columns.</p>
</li>
</ol>
<p><img src="https://clickhouse.com/docs/assets/images/row-oriented-3e6fd5aa48e3075202d242b4799da8fa.gif#" alt /></p>
<p><em>Row-Oriented Database (Gif by</em> <a target="_blank" href="https://clickhouse.com/docs/en/faq/general/columnar-database"><em>ClickHouse</em></a><em>)</em></p>
<hr />
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cb05sjd69q2feq3w6gfn.gif" alt /></p>
<p><em>Columnar Database (Gif by</em> <a target="_blank" href="https://clickhouse.com/docs/en/faq/general/columnar-database"><em>ClickHouse</em></a><em>)</em></p>
<blockquote>
<p>Note: This should not be mistaken for wide-column databases like Cassandra, which store data in rows but let you denormalize intensive data into a table with many columns, leading to a NoSQL structure.</p>
</blockquote>
<h4 id="heading-data-compression">Data Compression</h4>
<p>Thanks to compression algorithms (<a target="_blank" href="https://github.com/facebook/zstd">zstd</a> and <a target="_blank" href="https://github.com/lz4/lz4">LZ4</a>), data occupies much less storage, even more than 20x smaller! You can study some of the benchmarks on ClickHouse and other databases storage <a target="_blank" href="https://benchmark.clickhouse.com/#eyJzeXN0ZW0iOnsiQXRoZW5hIChwYXJ0aXRpb25lZCkiOnRydWUsIkF0aGVuYSAoc2luZ2xlKSI6dHJ1ZSwiQXVyb3JhIGZvciBNeVNRTCI6dHJ1ZSwiQXVyb3JhIGZvciBQb3N0Z3JlU1FMIjp0cnVlLCJCeXRlSG91c2UiOnRydWUsIkNpdHVzIjp0cnVlLCJjbGlja2hvdXNlLWxvY2FsIChwYXJ0aXRpb25lZCkiOnRydWUsImNsaWNraG91c2UtbG9jYWwgKHNpbmdsZSkiOnRydWUsIkNsaWNrSG91c2UgKHdlYikiOnRydWUsIkNsaWNrSG91c2UiOnRydWUsIkNsaWNrSG91c2UgKHR1bmVkKSI6dHJ1ZSwiQ2xpY2tIb3VzZSAoenN0ZCkiOnRydWUsIkNsaWNrSG91c2UgQ2xvdWQiOnRydWUsIkNyYXRlREIiOnRydWUsIkRhdGFiZW5kIjp0cnVlLCJEYXRhRnVzaW9uIChzaW5nbGUpIjp0cnVlLCJBcGFjaGUgRG9yaXMiOnRydWUsIkRydWlkIjp0cnVlLCJEdWNrREIgKFBhcnF1ZXQpIjp0cnVlLCJEdWNrREIiOnRydWUsIkVsYXN0aWNzZWFyY2giOnRydWUsIkVsYXN0aWNzZWFyY2ggKHR1bmVkKSI6ZmFsc2UsIkdyZWVucGx1bSI6dHJ1ZSwiSGVhdnlBSSI6dHJ1ZSwiSHlkcmEiOnRydWUsIkluZm9icmlnaHQiOnRydWUsIktpbmV0aWNhIjp0cnVlLCJNYXJpYURCIENvbHVtblN0b3JlIjp0cnVlLCJNYXJpYURCIjpmYWxzZSwiTW9uZXREQiI6dHJ1ZSwiTW9uZ29EQiI6dHJ1ZSwiTXlTUUwgKE15SVNBTSkiOnRydWUsIk15U1FMIjp0cnVlLCJQaW5vdCI6dHJ1ZSwiUG9zdGdyZVNRTCAodHVuZWQpIjpmYWxzZSwiUG9zdGdyZVNRTCI6dHJ1ZSwiUXVlc3REQiAocGFydGl0aW9uZWQpIjp0cnVlLCJRdWVzdERCIjp0cnVlLCJSZWRzaGlmdCI6dHJ1ZSwiU2VsZWN0REIiOnRydWUsIlNpbmdsZVN0b3JlIjp0cnVlLCJTbm93Zmxha2UiOnRydWUsIlNRTGl0ZSI6dHJ1ZSwiU3RhclJvY2tzIjp0cnVlLCJUaW1lc2NhbGVEQiAoY29tcHJlc3Npb24pIjp0cnVlLCJUaW1lc2NhbGVEQiI6dHJ1ZX0sInR5cGUiOnsic3RhdGVsZXNzIjp0cnVlLCJtYW5hZ2VkIjp0cnVlLCJKYXZhIjp0cnVlLCJjb2x1bW4tb3JpZW50ZWQiOnRydWUsIkMrKyI6dHJ1ZSwiTXlTUUwgY29tcGF0aWJsZSI6dHJ1ZSwicm93LW9yaWVudGVkIjp0cnVlLCJDIjp0cnVlLCJQb3N0Z3JlU1FMIGNvbXBhdGlibGUiOnRydWUsIkNsaWNrSG91c2UgZGVyaXZhdGl2ZSI6dHJ1ZSwiZW1iZWRkZWQiOnRydWUsInNlcnZlcmxlc3MiOnRydWUsIlJ1c3QiOnRydWUsInNlYXJjaCI6dHJ1ZSwiZG9jdW1lbnQiO
nRydWUsInRpbWUtc2VyaWVzIjp0cnVlfSwibWFjaGluZSI6eyJzZXJ2ZXJsZXNzIjp0cnVlLCIxNmFjdSI6dHJ1ZSwiTCI6dHJ1ZSwiTSI6dHJ1ZSwiUyI6dHJ1ZSwiWFMiOnRydWUsImM2YS40eGxhcmdlLCA1MDBnYiBncDIiOnRydWUsImM1bi40eGxhcmdlLCAyMDBnYiBncDIiOnRydWUsImM1LjR4bGFyZ2UsIDUwMGdiIGdwMiI6dHJ1ZSwiYzZhLm1ldGFsLCA1MDBnYiBncDIiOnRydWUsIjE2IHRocmVhZHMiOnRydWUsIjIwIHRocmVhZHMiOnRydWUsIjI0IHRocmVhZHMiOnRydWUsIjI4IHRocmVhZHMiOnRydWUsIjMwIHRocmVhZHMiOnRydWUsIjQ4IHRocmVhZHMiOnRydWUsIjYwIHRocmVhZHMiOnRydWUsIm01ZC4yNHhsYXJnZSI6dHJ1ZSwiYzZhLjR4bGFyZ2UsIDE1MDBnYiBncDIiOnRydWUsInJhMy4xNnhsYXJnZSI6dHJ1ZSwicmEzLjR4bGFyZ2UiOnRydWUsInJhMy54bHBsdXMiOnRydWUsImRjMi44eGxhcmdlIjp0cnVlLCJTMiI6dHJ1ZSwiUzI0Ijp0cnVlLCIyWEwiOnRydWUsIjNYTCI6dHJ1ZSwiNFhMIjp0cnVlLCJYTCI6dHJ1ZX0sImNsdXN0ZXJfc2l6ZSI6eyIxIjp0cnVlLCIyIjp0cnVlLCI0Ijp0cnVlLCI4Ijp0cnVlLCIxNiI6dHJ1ZSwiMzIiOnRydWUsIjY0Ijp0cnVlLCIxMjgiOnRydWUsInNlcnZlcmxlc3MiOnRydWUsInVuZGVmaW5lZCI6dHJ1ZX0sIm1ldHJpYyI6InNpemUiLCJxdWVyaWVzIjpbdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZV19">here</a>.</p>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xauowhmmaq89hmgoxvht.png" alt /></p>
<p><em>ClickHouse columnar structure leads to storing and reading columns more efficiently (Graph by</em> <a target="_blank" href="https://altinity.com/blog/using-clickhouse-as-an-analytic-extension-for-mysql"><em>Altinity</em></a><em>)</em></p>
<h4 id="heading-scalability">Scalability</h4>
<p>ClickHouse scales well both vertically and horizontally. It can be scaled by adding <a target="_blank" href="https://altinitydb.medium.com/clickhouse-for-time-series-scalability-benchmarks-e181132a895b">extra replicas</a> and extra shards to process queries in a distributed way. ClickHouse supports multi-master asynchronous replication and can be deployed across multiple data centers. All nodes are equal, which allows for avoiding having single points of failure.</p>
<h3 id="heading-weaknesses">Weaknesses</h3>
<p>To mention some:</p>
<ul>
<li><p>Lack of a full-fledged UPDATE/DELETE implementation: ClickHouse is not designed for modifications and mutations, so you will see poor performance on those kinds of queries.</p>
</li>
<li><p>OLTP queries, such as point lookups, will not make you happy either, since ClickHouse is easily outperformed on them by traditional RDBMSs like MySQL.</p>
</li>
</ul>
<h3 id="heading-rivals-and-alternatives">Rivals and Alternatives</h3>
<p>To name a few:</p>
<ul>
<li><p>Apache Druid</p>
</li>
<li><p>ElasticSearch</p>
</li>
<li><p>SingleStore</p>
</li>
<li><p>Snowflake</p>
</li>
<li><p>TimescaleDB</p>
</li>
</ul>
<p>Surely, each one is suitable for different use cases and has its pros and cons, but I won't discuss their comparison here. However, you can study some valuable benchmarks <a target="_blank" href="https://benchmark.clickhouse.com/">here</a> and <a target="_blank" href="https://www.timescale.com/blog/what-is-clickhouse-how-does-it-compare-to-postgresql-and-timescaledb-and-how-does-it-perform-for-time-series-data/">here</a>.</p>
<h2 id="heading-quick-start">Quick Start</h2>
<h3 id="heading-installation">Installation</h3>
<p>I only cover the Docker approach here. For other methods on different distros, please follow <a target="_blank" href="https://clickhouse.com/docs/en/install">ClickHouse's official installation guide.</a></p>
<p>The docker-compose file:</p>
<pre><code class="lang-yml"><span class="hljs-attr">version:</span> <span class="hljs-string">'2'</span>
<span class="hljs-attr">services:</span>
  <span class="hljs-attr">clickhouse:</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">myclickhouse</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">clickhouse/clickhouse-server:latest</span>
    <span class="hljs-attr">ports:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"8123:8123"</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"9000:9000"</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">./clickhouse-data:/var/lib/clickhouse/</span>  
    <span class="hljs-attr">restart:</span> <span class="hljs-string">unless-stopped</span>
</code></pre>
<p>And then run it by:</p>
<pre><code class="lang-bash">docker compose up -d
</code></pre>
<p>As you can see, two ports have been exposed:</p>
<ul>
<li><p><strong>8123</strong>: HTTP API Port for HTTP requests, used by JDBC, ODBC, and web interfaces.</p>
</li>
<li><p><strong>9000</strong>: Native Protocol port (ClickHouse TCP protocol). Used by ClickHouse apps and processes like <em>clickhouse-server</em>, <em>clickhouse-client</em>, and native ClickHouse tools. Used for inter-server communication for distributed queries.</p>
</li>
</ul>
<p>It's up to your client driver which one to use. For example, <a target="_blank" href="https://dbeaver.io/">DBeaver</a> uses 8123, and the <a target="_blank" href="https://clickhouse-driver.readthedocs.io/en/latest/">Python ClickHouse-Driver</a> uses 9000.</p>
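As a quick illustration of the HTTP interface on port 8123, a query can be sent as a plain URL parameter. This is a minimal sketch; the localhost address assumes the Docker setup above is running on your machine.

```python
from urllib.parse import urlencode

# ClickHouse's HTTP interface accepts the SQL statement as a `query` parameter.
base = "http://localhost:8123/"
url = base + "?" + urlencode({"query": "SELECT 1"})
print(url)
# With the server running, urllib.request.urlopen(url).read() would
# return the query result over plain HTTP.
```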
<p>To continue the tutorial, we use clickhouse-client, which is available inside the container:</p>
<pre><code class="lang-bash">docker <span class="hljs-built_in">exec</span> -it myclickhouse clickhouse-client
</code></pre>
<h3 id="heading-database-and-table-creation">Database and Table Creation</h3>
<p>Create database test:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">DATABASE</span> <span class="hljs-keyword">test</span>;
</code></pre>
<p>Create the table <code>orders</code>:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> test.orders
(<span class="hljs-string">`OrderID`</span> Int64,
<span class="hljs-string">`CustomerID`</span> Int64,
<span class="hljs-string">`OrderDate`</span> DateTime,
<span class="hljs-string">`Comments`</span> <span class="hljs-keyword">String</span>,
<span class="hljs-string">`Cancelled`</span> <span class="hljs-built_in">Bool</span>)
<span class="hljs-keyword">ENGINE</span> = MergeTree
PRIMARY <span class="hljs-keyword">KEY</span> (OrderID, OrderDate)
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> (OrderID, OrderDate, CustomerID)
<span class="hljs-keyword">SETTINGS</span> index_granularity = <span class="hljs-number">8192</span>;
</code></pre>
<p>In the next parts, we'll talk about other configurations like <code>Engine</code>, <code>PRIMARY KEY</code>, <code>ORDER BY</code>, etc.</p>
<h3 id="heading-insert-data">Insert Data</h3>
<p>To insert sample data:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> test.orders 
<span class="hljs-keyword">VALUES</span> (<span class="hljs-number">334</span>, <span class="hljs-number">123</span>, <span class="hljs-string">'2021-09-15 14:30:00'</span>, <span class="hljs-string">'some comment'</span>, 
<span class="hljs-literal">false</span>);
</code></pre>
<h3 id="heading-read-data">Read Data</h3>
<p>Just like any other SQL query:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> OrderID, OrderDate <span class="hljs-keyword">FROM</span> test.orders;
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In the first part of the <em>ClickHouse Tutorial Series</em>, we discussed the traits, features, and weaknesses of ClickHouse. Then we saw how to set up an instance with minimum configuration, create a database and table, insert data into it, and read from it.</p>
<h2 id="heading-useful-links">Useful Links</h2>
<ul>
<li><p><a target="_blank" href="https://dev.to/taw/getting-started-with-clickhouse-3nf9">Getting Started with ClickHouse</a> <a target="_blank" href="https://dev.to/taw/getting-started-with-clickhouse-3nf9">by Tomasz Wegrzanowski</a></p>
</li>
<li><p><a target="_blank" href="https://dev.to/olena_kutsenko/introduction-to-clickhouse-8em">Introduction to ClickHouse</a> by <a target="_blank" href="https://dev.to/olena_kutsenko">olena_kutsenko</a></p>
</li>
</ul>
<hr />
<p><em>Originally published at</em> <a target="_blank" href="https://dev.to/hoptical/clickhouse-basic-tutorial-an-introduction-52il"><em>https://dev.to</em></a> <em>on April 13, 2023.</em></p>
]]></content:encoded></item><item><title><![CDATA[Apply CDC from MySQL to ClickHouse]]></title><description><![CDATA[Photo by Martin Adams on Unsplash
Introduction
Suppose that you have a database handling OLTP queries. To tackle intensive analytical BI reports, you set up an OLAP-friendly database such as ClickHouse. How do you synchronize your follower database (...]]></description><link>https://hamedkarbasi.com/apply-cdc-from-mysql-to-clickhouse-d660873311c7</link><guid isPermaLink="true">https://hamedkarbasi.com/apply-cdc-from-mysql-to-clickhouse-d660873311c7</guid><category><![CDATA[cdc]]></category><category><![CDATA[change data capture]]></category><category><![CDATA[ClickHouse]]></category><category><![CDATA[MySQL]]></category><category><![CDATA[debezium]]></category><dc:creator><![CDATA[Hamed Karbasi]]></dc:creator><pubDate>Fri, 17 Feb 2023 11:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383513314/9ad758de-4cd2-4eab-8692-c77a45e75fe4.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Photo by <a target="_blank" href="https://unsplash.com/@martinadams?utm_source=medium&amp;utm_medium=referral">Martin Adams</a> on <a target="_blank" href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></p>
<h2 id="heading-introduction">Introduction</h2>
<p>Suppose that you have a database handling <a target="_blank" href="https://en.wikipedia.org/wiki/Online_transaction_processing">OLTP</a> queries. To tackle intensive analytical BI reports, you set up an <a target="_blank" href="https://en.wikipedia.org/wiki/Online_analytical_processing">OLAP</a>-friendly database such as ClickHouse. How do you synchronize your follower database (which is ClickHouse here)? What challenges should you be prepared for?</p>
<p>Synchronizing two or more databases in a data-intensive application is one of the usual routines you may have encountered before or are dealing with now. Thanks to Change Data Capture (CDC) and technologies such as Kafka, this process is not sophisticated anymore. However, depending on the databases you’re utilizing, it could be challenging if the source database works in the OLTP paradigm and the target in the OLAP. In this article, I will walk through this process from MySQL as the source to ClickHouse as the destination. Although I’ve limited this article to those technologies, it’s pretty generalizable to similar cases.</p>
<h2 id="heading-system-design-overview">System Design Overview</h2>
<p>Contrary to how it sounds, it’s quite straightforward. The database changes are captured via <a target="_blank" href="https://debezium.io/">Debezium</a> and published as events to Apache Kafka. ClickHouse consumes those changes, in partial order, via the <a target="_blank" href="https://clickhouse.com/docs/en/engines/table-engines/integrations/kafka/">Kafka Engine</a>. Real-time and eventually consistent.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383508733/ba307241-6285-4b99-a3cb-f78428f293a1.jpeg" alt="CDC Architecture" class="image--center mx-auto" /></p>
<h2 id="heading-case-study">Case Study</h2>
<p>Imagine that we have an <em>orders</em> table in MySQL with the following DDL:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-string">`orders`</span> (
  <span class="hljs-string">`id`</span> <span class="hljs-built_in">int</span>(<span class="hljs-number">11</span>) <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
  <span class="hljs-string">`status`</span> <span class="hljs-built_in">varchar</span>(<span class="hljs-number">50</span>) <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
  <span class="hljs-string">`price`</span> <span class="hljs-built_in">varchar</span>(<span class="hljs-number">50</span>) <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
  PRIMARY <span class="hljs-keyword">KEY</span> (<span class="hljs-string">`id`</span>)
) <span class="hljs-keyword">ENGINE</span>=<span class="hljs-keyword">InnoDB</span> <span class="hljs-keyword">DEFAULT</span> <span class="hljs-keyword">CHARSET</span>=latin1
</code></pre>
<p>Users may create, delete and update any column or the whole record. We want to capture its changes and sink them to ClickHouse to synchronize them.</p>
<p>We’re going to use Debezium <em>v2.1</em> and the <em>ReplacingMergeTree</em> engine in ClickHouse.</p>
<h2 id="heading-implementation">Implementation</h2>
<h3 id="heading-step-1-cdc-with-debezium">Step 1: CDC with Debezium</h3>
<p>Most databases have a log to which every operation is written before being applied to the data (the Write-Ahead Log, or WAL). In MySQL, this file is called the Binlog. If you read that file, then parse it and apply it to your target database, you’re following the Change Data Capture (CDC) approach.</p>
<p>CDC is one of the best ways to synchronize two or more heterogeneous databases. It’s real-time, eventually consistent, and saves you from costlier alternatives such as batch backfills with Airflow. No matter what happens on the source, you can capture it in order and stay consistent with the original (eventually, of course!).</p>
<p>Debezium is a well-known tool for reading and parsing the Binlog. It simply integrates with Kafka Connect as a connector and produces every change on a Kafka topic.</p>
<p>To do so, you’ve to enable log-bin on the MySQL database and set up Kafka Connect, Kafka, and Debezium accordingly. Since it is well-explained in other articles like <a target="_blank" href="https://rmoff.net/2018/03/24/streaming-data-from-mysql-into-kafka-with-kafka-connect-and-debezium/">this</a> or <a target="_blank" href="https://medium.com/nagoya-foundation/simple-cdc-with-debezium-kafka-a27b28d8c3b8">this</a>, I’ll only focus on the Debezium configuration customized for our purpose: Capture the changes while being functional and parsable by ClickHouse.</p>
<p>Before showing the whole configuration, we should discuss three necessary configs:</p>
<h3 id="heading-extracting-new-record-state">Extracting New Record State</h3>
<p>By default, Debezium emits every record consisting of <em>before</em> and <em>after</em> states for every operation, which is hard to parse with a ClickHouse Kafka table. Additionally, it creates tombstone records (i.e., records with a Null value) for delete operations (again, unparsable by ClickHouse). The entire behavior is demonstrated in the table below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383509750/957f1979-03a9-4339-9a78-504de7b63263.png" alt="Records states for different operations in the default configuration." /></p>
<p>We use the ExtractNewRecordState transformer in the Debezium configuration to handle the problem. With this option, Debezium only keeps the <em>after</em> state for <strong>create/update</strong> operations and disregards the before state. As a drawback, it drops the <strong>delete</strong> record containing the previous state, along with the tombstone record mentioned earlier. In other words, you won’t capture the delete operation anymore. Don’t worry! We’ll tackle that in the next section.</p>
<pre><code class="lang-json"><span class="hljs-string">"transforms"</span>: <span class="hljs-string">"unwrap"</span>,
<span class="hljs-string">"transforms.unwrap.type"</span>: <span class="hljs-string">"io.debezium.transforms.ExtractNewRecordState"</span>
</code></pre>
<p>The picture below shows how the state <em>before</em> is dropped and <em>after</em> is flattened by using the <em>ExtractNewRecord</em> configuration.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383510751/74b165a5-e794-4182-bab0-1d697ce1d5db.png" alt /></p>
<p><em>Left: Record without ExtractNewRecord config; Right: Record with ExtractNewRecord config</em></p>
<h3 id="heading-rewriting-delete-events">Rewriting Delete Events</h3>
<p>To capture delete operations, we must add the <strong>rewrite</strong> config as below:</p>
<p>"transforms.unwrap.delete.handling.mode":"rewrite"</p>
<p>Debezium adds the field <code>__deleted</code> with this config, which is true for delete operations and false for the others. Hence, a deletion will contain the previous state as well as a <code>__deleted:true</code> field.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383511939/a08fe992-10b3-4a5d-abee-1e38c69275e7.png" alt /></p>
<p><em>The __deleted field is added after using the rewrite configuration</em></p>
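To make the two transform options above concrete, here’s a small Python simulation of what the SMT does to an event envelope. This is a simplified sketch of the behavior described in this section, not Debezium’s actual code.

```python
def extract_new_record_state(event, delete_mode="rewrite"):
    """Simplified sketch of Debezium's ExtractNewRecordState behavior."""
    op = event["op"]
    if op in ("c", "u", "r"):          # create / update / snapshot read
        flat = dict(event["after"])    # keep only the flattened `after` state
        flat["__deleted"] = "false"    # field added by the rewrite config
        return flat
    if op == "d":                      # delete
        if delete_mode == "rewrite":
            flat = dict(event["before"])   # previous state survives
            flat["__deleted"] = "true"
            return flat
        return None                    # without rewrite, the delete is dropped

update = {"op": "u", "before": {"id": 1, "status": "new"},
          "after": {"id": 1, "status": "paid"}}
delete = {"op": "d", "before": {"id": 1, "status": "paid"}, "after": None}

print(extract_new_record_state(update))  # flattened `after` state
print(extract_new_record_state(delete))  # previous state plus __deleted=true
```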
<h3 id="heading-handling-non-primary-keys-update">Handling Non-Primary Keys Update</h3>
<p>With the mentioned configurations, updating a record (any column except the primary key) emits a simple record with the new state. Having another relational database with the same DDL is OK, since the updated record replaces the previous one in the destination. But in the case of ClickHouse, the story goes wrong!</p>
<p>In our example, the source uses <em>id</em> as the primary key, and ClickHouse uses <em>id</em> and <em>status</em> as order keys. Replacement and uniqueness are only guaranteed for records with the same <em>id</em> and <em>status</em>! So what happens if the source updates the <em>status</em> column? We end up with duplicate records in ClickHouse, with equal <em>ids</em> but different <em>statuses</em>!</p>
<p>Fortunately, there is a way. By default, Debezium creates a delete record and a create record for an update on the primary key. So if the source updates the <em>id</em>, it emits a <strong>delete</strong> record with the previous <em>id</em> and a <strong>create</strong> record with the new <em>id</em>. The delete record, with its <code>__deleted=true</code> field, replaces our stale record in ClickHouse, and records implying deletion can then be filtered out in the view. We can extend this behavior to other columns with the option below:</p>
<pre><code class="lang-json"><span class="hljs-string">"message.key.columns"</span>: <span class="hljs-string">"inventory.orders:id;inventory.orders:status"</span>
</code></pre>
<p>Now by putting together all the above options and the usual ones, we’ll have a fully functional Debezium configuration capable of handling any change desired by ClickHouse:</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"name"</span>: <span class="hljs-string">"mysql-connector"</span>,
    <span class="hljs-attr">"config"</span>: {
        <span class="hljs-attr">"connector.class"</span>: <span class="hljs-string">"io.debezium.connector.mysql.MySqlConnector"</span>,
        <span class="hljs-attr">"database.hostname"</span>: <span class="hljs-string">"mysql"</span>,
        <span class="hljs-attr">"database.include.list"</span>: <span class="hljs-string">"inventory"</span>,
        <span class="hljs-attr">"database.password"</span>: <span class="hljs-string">"mypassword"</span>,
        <span class="hljs-attr">"database.port"</span>: <span class="hljs-string">"3306"</span>,
        <span class="hljs-attr">"database.server.id"</span>: <span class="hljs-string">"2"</span>,
        <span class="hljs-attr">"database.server.name"</span>: <span class="hljs-string">"dbz.inventory.v2"</span>,
        <span class="hljs-attr">"database.user"</span>: <span class="hljs-string">"root"</span>,
        <span class="hljs-attr">"message.key.columns"</span>: <span class="hljs-string">"inventory.orders:id;inventory.orders:status"</span>,
        <span class="hljs-attr">"name"</span>: <span class="hljs-string">"mysql-connector-v2"</span>,
        <span class="hljs-attr">"schema.history.internal.kafka.bootstrap.servers"</span>: <span class="hljs-string">"broker:9092"</span>,
        <span class="hljs-attr">"schema.history.internal.kafka.topic"</span>: <span class="hljs-string">"dbz.inventory.history.v2"</span>,
        <span class="hljs-attr">"snapshot.mode"</span>: <span class="hljs-string">"schema_only"</span>,
        <span class="hljs-attr">"table.include.list"</span>: <span class="hljs-string">"inventory.orders"</span>,
        <span class="hljs-attr">"topic.prefix"</span>: <span class="hljs-string">"dbz.inventory.v2"</span>,
        <span class="hljs-attr">"transforms"</span>: <span class="hljs-string">"unwrap"</span>,
        <span class="hljs-attr">"transforms.unwrap.delete.handling.mode"</span>: <span class="hljs-string">"rewrite"</span>,
        <span class="hljs-attr">"transforms.unwrap.type"</span>: <span class="hljs-string">"io.debezium.transforms.ExtractNewRecordState"</span>
  }
}
</code></pre>
<h3 id="heading-important-how-to-choose-the-debezium-key-columns">Important: How to choose the Debezium key columns?</h3>
<p>By changing the key columns of the connector, Debezium uses those columns as the topic keys instead of the default primary key of the source table. Consequently, different operations on the same database record may end up in different partitions in Kafka. Since records lose their order across partitions, this can lead to inconsistency in ClickHouse unless you ensure that the ClickHouse order keys and the Debezium message keys are the same.</p>
<p>The rule of thumb is as below:</p>
<ol>
<li><p>Design the partition key and order key based on your desired table design.</p>
</li>
<li><p>Extract the source columns that the partition and sort keys originate from, in case they are calculated during materialization.</p>
</li>
<li><p>Union all of those columns</p>
</li>
<li><p>Define the result of step 3 as the <em>message.key.columns</em> in the Debezium connector configuration.</p>
</li>
<li><p>Check whether the ClickHouse sort key has all those columns. If not, add them.</p>
</li>
</ol>
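The rule of thumb above can be sketched in a few lines. <code>debezium_key_columns</code> is a hypothetical helper, not part of Debezium; it unions the key columns and renders the <em>message.key.columns</em> value used in our connector configuration.

```python
def debezium_key_columns(table, partition_key, order_key):
    """Union the partition and order key columns (preserving order) and
    render them in the table:column form used by message.key.columns."""
    columns = []
    for column in partition_key + order_key:
        if column not in columns:  # de-duplicate while keeping order
            columns.append(column)
    return ";".join(f"{table}:{c}" for c in columns)

# For our orders example: partition/order keys built from id and status
print(debezium_key_columns("inventory.orders", ["id"], ["id", "status"]))
# inventory.orders:id;inventory.orders:status
```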
<h2 id="heading-step-2-clickhouse-tables">Step 2: ClickHouse Tables</h2>
<p>ClickHouse can sink Kafka records into a table by utilizing the <a target="_blank" href="https://clickhouse.com/docs/en/engines/table-engines/integrations/kafka/">Kafka Engine</a>. We need to define three tables: the Kafka table, the consumer materialized view, and the main table.</p>
<h3 id="heading-kafka-table">Kafka Table</h3>
<p>The Kafka table defines the record structure and the Kafka topic to be read.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> default.kafka_orders
(
    <span class="hljs-string">`id`</span> Int32,
    <span class="hljs-string">`status`</span> <span class="hljs-keyword">String</span>,
    <span class="hljs-string">`price`</span> <span class="hljs-keyword">String</span>,
    <span class="hljs-string">`__deleted`</span> Nullable(<span class="hljs-keyword">String</span>)
)
<span class="hljs-keyword">ENGINE</span> = Kafka(<span class="hljs-string">'broker:9092'</span>, <span class="hljs-string">'inventory.orders'</span>, <span class="hljs-string">'clickhouse'</span>, <span class="hljs-string">'AvroConfluent'</span>)
<span class="hljs-keyword">SETTINGS</span> format_avro_schema_registry_url = <span class="hljs-string">'http://schema-registry:8081'</span>
</code></pre>
<h3 id="heading-consumer-materializer">Consumer Materializer</h3>
<p>Every record of the Kafka table is only read once, since its consumer group bumps the offset and we can’t read it twice. So we need to define a main table and materialize every Kafka table record into it via a materialized view:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">MATERIALIZED</span> <span class="hljs-keyword">VIEW</span> default.consumer__orders <span class="hljs-keyword">TO</span> default.stream_orders
(
    <span class="hljs-string">`id`</span> Int32,
    <span class="hljs-string">`status`</span> <span class="hljs-keyword">String</span>,
    <span class="hljs-string">`price`</span> <span class="hljs-keyword">String</span>,
    <span class="hljs-string">`__deleted`</span> Nullable(<span class="hljs-keyword">String</span>)
) <span class="hljs-keyword">AS</span>
<span class="hljs-keyword">SELECT</span>
    <span class="hljs-keyword">id</span> <span class="hljs-keyword">AS</span> <span class="hljs-keyword">id</span>,
    <span class="hljs-keyword">status</span> <span class="hljs-keyword">AS</span> <span class="hljs-keyword">status</span>,
    price <span class="hljs-keyword">AS</span> price,
    __deleted <span class="hljs-keyword">AS</span> __deleted
<span class="hljs-keyword">FROM</span> default.kafka_orders
</code></pre>
<h3 id="heading-main-table">Main Table</h3>
<p>The main table has the source structure plus the <code>__deleted</code> field. I’m using a ReplacingMergeTree, since we need to replace stale records with their deleted or updated versions.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> default.stream_orders
(
    <span class="hljs-string">`id`</span> Int32,
    <span class="hljs-string">`status`</span> <span class="hljs-keyword">String</span>,
    <span class="hljs-string">`price`</span> <span class="hljs-keyword">String</span>,
    <span class="hljs-string">`__deleted`</span><span class="hljs-keyword">String</span>
)
<span class="hljs-keyword">ENGINE</span> = ReplacingMergeTree
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> (<span class="hljs-keyword">id</span>, price)
<span class="hljs-keyword">SETTINGS</span> index_granularity = <span class="hljs-number">8192</span>
</code></pre>
<h3 id="heading-view-table">View Table</h3>
<p>Finally, we need to filter out deleted records (since we don’t want to see them) and keep only the most recent record among those sharing the same sort key. The latter can be tackled with the <em>FINAL</em> modifier. But to avoid using the filter and FINAL in every query, we can define a simple view that does the job implicitly:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">VIEW</span> default.orders
(
    <span class="hljs-string">`id`</span> Int32,
    <span class="hljs-string">`status`</span> <span class="hljs-keyword">String</span>,
    <span class="hljs-string">`price`</span> <span class="hljs-keyword">String</span>,
    <span class="hljs-string">`__deleted`</span> <span class="hljs-keyword">String</span>
) <span class="hljs-keyword">AS</span>
<span class="hljs-keyword">SELECT</span> *
<span class="hljs-keyword">FROM</span> default.stream_orders
<span class="hljs-keyword">FINAL</span>
<span class="hljs-keyword">WHERE</span> __deleted = <span class="hljs-string">'false'</span>
</code></pre>
<p>Note: It’s inefficient to use Final for every query, especially in production. You can use aggregations to see the last records or wait for ClickHouse to merge records in the background.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this article, we saw how we could synchronize the ClickHouse database with MySQL via CDC and prevent duplication using a soft-delete approach.</p>
<hr />
<p>Originally published at <a target="_blank" href="https://medium.com/@hoptical/apply-cdc-from-mysql-to-clickhouse-d660873311c7">Medium</a>.</p>
]]></content:encoded></item><item><title><![CDATA[A Quick Start to Load Test with k6]]></title><description><![CDATA[Note: To access the files of this post please visit https://github.com/exaco/k6-loadtest

Introduction
After developing an overwhelming, ready-to-sell product, every organization will face a question titled: Is it ready for production yet or not? Bes...]]></description><link>https://hamedkarbasi.com/a-quick-start-to-load-test-with-k6-7137a0b52ca1</link><guid isPermaLink="true">https://hamedkarbasi.com/a-quick-start-to-load-test-with-k6-7137a0b52ca1</guid><category><![CDATA[Load Testing]]></category><category><![CDATA[k6]]></category><category><![CDATA[Quality Assurance]]></category><category><![CDATA[stress test]]></category><dc:creator><![CDATA[Hamed Karbasi]]></dc:creator><pubDate>Sat, 09 Oct 2021 10:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383506121/d3d1cf18-79f4-4eaa-9fe4-b04ef94e359d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>Note: To access the files of this post please visit <a target="_blank" href="https://github.com/exaco/k6-loadtest">https://github.com/exaco/k6-loadtest</a></p>
</blockquote>
<h2 id="heading-introduction">Introduction</h2>
<p>After developing a compelling, ready-to-sell product, every organization faces the question: <em>Is it ready for production yet?</em> Besides business-market fit and UI/UX acceptance, on the technical side we worry about the product under load. It works and is fully functional with one user. What about 10, 100, or 1,000 users working simultaneously? Would it crash?</p>
<p>The above questions are typically answered by the following QA/QC and performance-testing concepts:</p>
<p><strong>Smoke Test</strong>: Targets the system’s functionality under the slightest load: Is it working with only one user?</p>
<p><strong>Load Test</strong>: Targets the system under normal usage by the users. You should ask: “How many users typically use the system simultaneously, and how long do their sessions last?”</p>
<p><strong>Stress Test</strong>: What is the maximum capacity of the system? How many users, with what kind of behavior, does it take to degrade the system’s quality with respect to its SLOs: availability, request latency, throughput, etc.?</p>
<p><strong>Soak Test</strong>: Would the system sustain normal conditions for a long time (generally hours to days)? It works for 15 minutes under typical load-test conditions, but is it feasible for much longer?</p>
<p>Answering these questions gives you a better perspective on your product’s capability under different conditions, and helps the company’s business team schedule marketing and release plans according to the system’s capacity.</p>
<p>Now that we know the importance of load tests, the next question is: “What tools can we use?”</p>
<h2 id="heading-load-test-tools">Load Test Tools</h2>
<p>There are various load-testing tools; some popular ones are:</p>
<ul>
<li><p>k6.io: Created by Load Impact; written in Go; scriptable in JavaScript</p>
</li>
<li><p>Locust: Created by Jonathan Heyman; written in Python; scriptable in Python</p>
</li>
<li><p>JMeter: Created by the Apache Software Foundation; written in Java; scriptable (limited) in XML.</p>
</li>
</ul>
<p>We’ll investigate k6 here. It’s an open-source load-testing tool and SaaS for engineering teams. Some of its advantages are:</p>
<ul>
<li><p>Easy to write test scenarios, especially with a <a target="_blank" href="https://chrome.google.com/webstore/detail/k6-browser-recorder/phjdhndljphphehjpgbmpocddnnmdbda?hl=en">screen recorder</a></p>
</li>
<li><p>Capable of handling 30,000–40,000 concurrent virtual users, generating up to 300,000 requests per second (RPS), with only one instance.</p>
</li>
<li><p>Recently acquired by Grafana Labs, demonstrating its reputation.</p>
</li>
<li><p>Easily exportable to InfluxDB and visualizable in Grafana (beloved by SRE teams)</p>
</li>
</ul>
<h3 id="heading-what-tool-should-i-use">What tool should I use?</h3>
<p>If you want to test your APIs and backend services, k6 is a reasonable choice: it’s easy to use and capable. But remember, it can only test backend services, not frontend components! If you want to test the client side too, Selenium is the better choice. Keep in mind that Selenium can only perform functional (smoke) tests, not load or stress tests!</p>
<h2 id="heading-getting-started-with-k6">Getting started with k6</h2>
<p>To use k6 and acquire the results, follow these steps:</p>
<h4 id="heading-1-install-the-requirements">1. Install the requirements</h4>
<ul>
<li><p><a target="_blank" href="https://docs.influxdata.com/influxdb/v1.8/">Influxdb</a> v1.8: Timeseries database to store test results</p>
</li>
<li><p><a target="_blank" href="https://grafana.com/docs/grafana/latest/installation/">Grafana</a>: Visualizing the results</p>
</li>
</ul>
<p>To install the above tools, you can use the below docker-compose file:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-string">'3'</span>
<span class="hljs-attr">services:</span>
  <span class="hljs-attr">influxdb:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">influxdb:1.8</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-comment"># Mount for influxdb data directory and configuration</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">./influxdb_data:/var/lib/influxdb</span>
    <span class="hljs-attr">ports:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"8086:8086"</span>

  <span class="hljs-attr">grafana:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">grafana/grafana:8.1.4</span>
    <span class="hljs-attr">ports:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"3000:3000"</span>
</code></pre>
<ul>
<li><p>AUT: the application under test. As an example, we use a Flask app in conjunction with Vue.js, provided by https://github.com/testdrivenio/flask-vue-crud. The installation guide is in the repository <a target="_blank" href="https://github.com/testdrivenio/flask-vue-crud/blob/master/README.md">readme</a>.</p>
</li>
<li><p><a target="_blank" href="https://k6.io/docs/getting-started/installation/">k6</a>: Run the below commands:</p>
</li>
</ul>
<pre><code class="lang-bash">$ sudo apt-get update &amp;&amp; sudo apt-get install ca-certificates gnupg2 -y
$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
$ <span class="hljs-built_in">echo</span> <span class="hljs-string">"deb https://dl.k6.io/deb stable main"</span> | sudo tee /etc/apt/sources.list.d/k6.list
$ sudo apt-get update
$ sudo apt-get install k6
</code></pre>
<h4 id="heading-2-write-the-test-scenario">2. Write the test scenario</h4>
<p>k6 uses a JavaScript file as the test scenario. Don’t freak out if you’re not familiar with JS: k6 provides a friendly Chrome extension. If you’re testing a web application, install the extension <a target="_blank" href="https://chrome.google.com/webstore/detail/k6-browser-recorder/phjdhndljphphehjpgbmpocddnnmdbda?hl=en">here</a>.</p>
<p>The GIF below shows how to use the extension:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383493834/27be1d59-26c4-4cb1-87c6-d0c311ef2d04.gif" alt="Recording screen for writing test scenario" class="image--center mx-auto" /></p>
<p>When you finish recording, the extension redirects you to the page below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383495661/18bfb7f7-dcb3-4a68-80b8-6406b170c28a.png" alt="k6 test builder page" /></p>
<p>You may choose the k6 native test builder, which provides a graphical request editor and can launch tests in the cloud. But if you want to run k6 yourself, choose the script editor and copy the script into a file called <code>test.js</code>.</p>
<p>Remember that the recorded file is not yet a proper test: review all of the requests to make sure the scenario is realistic and generalizes. Furthermore, the k6 screen recorder only captures the requests sent by the browser and cannot reproduce your application’s frontend logic. For example, you may use XSRF tokens, or need to pass a value from one response into the next request. In our AUT, which uses book IDs from the GET request, we have to edit the PUT and DELETE requests to supply those IDs. So, always check and edit the test files according to your requirements.</p>
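<p>For instance, the ID-passing step can be handled with a small helper that extracts the book IDs from the GET response and builds the PUT/DELETE URLs from them. This is a plain JavaScript sketch; in a k6 script you would feed it <code>http.get(...).body</code>, and the <code>{ books: [{ id: ... }] }</code> response shape is an assumption based on the flask-vue-crud example app:</p>

```javascript
// Sketch of chaining recorded requests: extract book IDs from the GET
// response and build the PUT/DELETE URLs from them. The response shape
// ({ books: [{ id: ... }] }) is an assumption based on flask-vue-crud.
function bookUrls(baseUrl, getResponseBody) {
  const body = JSON.parse(getResponseBody);
  return (body.books || []).map((book) => `${baseUrl}/books/${book.id}`);
}

// In a k6 script you would pass http.get(...).body here and then call
// http.put(url, payload) or http.del(url) for each returned URL.
const sample = JSON.stringify({ books: [{ id: 'abc123' }, { id: 'def456' }] });
console.log(bookUrls('http://localhost:5001', sample));
```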
<h4 id="heading-3-run-the-test">3. Run the test</h4>
<p>After editing the Test file, you can run it via:</p>
<pre><code class="lang-bash">$ k6 run --vus=2 --duration=3m test.js
</code></pre>
<p><em>vus</em> is the number of simultaneous virtual users, and <em>duration</em> is the test duration.</p>
<p>Otherwise, you can define it in the test file with the <code>options</code> variable:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">let</span> options = {
    <span class="hljs-attr">stages</span>: [
        { <span class="hljs-attr">duration</span>: <span class="hljs-string">'2m'</span>, <span class="hljs-attr">target</span>: <span class="hljs-number">30</span> }, <span class="hljs-comment">// simulate ramp-up of traffic from 1 to 30 users over 2 minutes</span>
        { <span class="hljs-attr">duration</span>: <span class="hljs-string">'2m'</span>, <span class="hljs-attr">target</span>: <span class="hljs-number">30</span> }, <span class="hljs-comment">// stay at 30 users for 2 minutes</span>
        { <span class="hljs-attr">duration</span>: <span class="hljs-string">'2m'</span>, <span class="hljs-attr">target</span>: <span class="hljs-number">0</span> }, <span class="hljs-comment">// ramp-down to 0 users</span>
    ],
};
</code></pre>
<h4 id="heading-4-test-results">4. Test Results</h4>
<p>k6 will give you a summary of the results. This summary includes check statuses, the number of iterations, and the <code>http_req_duration</code> stats, which are the most important: they measure the full request round trip, and their <code>http_req_waiting</code> component is the time to first byte (TTFB).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383496977/513e13c1-3c2e-446c-a1c0-04688a9f5706.png" alt="k6 test result summary" /></p>
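<p>Beyond eyeballing the summary, k6 can enforce an SLO automatically through the <code>thresholds</code> option, failing the run when a budget is violated. A minimal sketch; the metric names and <code>'p(95)&lt;500'</code> syntax are standard k6 threshold expressions, while the 500 ms and 99% budgets are assumptions for illustration:</p>

```javascript
// Sketch: encode the SLO in the script so k6 fails the run when it is
// violated. The budgets below (500 ms p95, 99% check pass rate) are
// assumed example values, not recommendations.
const options = {
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests must finish under 500 ms
    checks: ['rate>0.99'],            // over 99% of checks must pass
  },
};

console.log(Object.keys(options.thresholds)); // prints [ 'http_req_duration', 'checks' ]
```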
<p>k6 lets you store more detailed results in InfluxDB and visualize your preferred insights in Grafana. To do that, add the <code>--out</code> option with the InfluxDB address, like below:</p>
<pre><code class="lang-bash">$ k6 run --out influxdb=http://localhost:8086/myk6db test.js
</code></pre>
<p>Now you can jump into Grafana and see the results. But before that, you need a dashboard. You can import the Grafana dashboard provided <a target="_blank" href="https://grafana.com/grafana/dashboards/15080">here</a>.</p>
<h2 id="heading-run-a-smoke-test">Run a smoke test</h2>
<p>In this smoke test, only one user uses the system for one minute. To launch it, add the <code>test_mode=smoke</code> environment variable:</p>
<pre><code class="lang-bash">$ k6 run -e test_mode=smoke --out influxdb=http://localhost:8086/myk6db test.js
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383498619/79e32801-7132-4eeb-a5df-70d5d02482f1.png" alt="Check status results for the smoke test." /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383500083/695fe695-baf0-4072-8712-e36bdda38153.png" alt="Endpoints status codes, success percentage, and duration latencies for the smoke test" /></p>
<p>It’s evident that all of the requests executed in under 10 ms and passed 100%.</p>
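<p>On the script side, the variable arrives as <code>__ENV.test_mode</code> (set with <code>k6 run -e ...</code>) and can be used to pick a load profile. The sketch below reads <code>process.env</code> instead so it also runs under plain Node; the profiles are assumed example values:</p>

```javascript
// Sketch of selecting a load profile from an environment variable. In a k6
// script you would read __ENV.test_mode; process.env is used here so the
// sketch runs under plain Node too. Profile numbers are assumptions.
const profiles = {
  smoke:  [{ duration: '1m', target: 1 }],
  stress: [{ duration: '2m', target: 200 }, { duration: '2m', target: 0 }],
};

function pickStages(mode) {
  return profiles[mode] || profiles.smoke; // default to the cheapest test
}

const mode = process.env.test_mode || 'smoke';
console.log(pickStages(mode));
```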
<h2 id="heading-run-a-stress-test">Run a stress test</h2>
<p>Now we run a stress test to see whether the system can handle 200 simultaneous users.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383501511/8806cea0-08dd-449b-bbf4-017930902cda.png" alt="Check status results for the stress test." /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755383504203/49c07dfe-130d-4d17-a2ad-55700c27c0e9.png" alt="Endpoints status codes, success percentage, and duration latencies for the stress test" /></p>
<p>The system is facing severe issues: latencies have increased from milliseconds to 9.3 s, and some endpoints were unable to finish their requests. Under some SLOs, this system would be considered a failure. Of course, it’s no surprise! We ran the Flask app in development mode; a production setup with several workers would raise its capacity considerably.</p>
<hr />
<p>Originally published at <a target="_blank" href="https://medium.com/exa-technical-blog/a-quick-start-to-load-test-with-k6-7137a0b52ca1">Medium</a>.</p>
]]></content:encoded></item></channel></rss>