Harnessing Ceph S3 with Kubernetes: An Operator Solution

Introduction
Ceph & Rados Gateway
Ceph is an open-source, distributed storage system designed for scalability and reliability, providing unified solutions for block, file, and object storage under a single platform. At its core lies the Reliable Autonomic Distributed Object Store (RADOS), which ensures data is stored fault-tolerantly across multiple storage devices.
Ceph RGW (Rados Gateway) is an integral part of Ceph that offers object storage accessible via RESTful APIs, compatible with Amazon S3 and OpenStack Swift protocols. By leveraging Ceph's distributed architecture, RGW provides a scalable, high-performance platform for storing and retrieving unstructured data such as images, videos, and backups. It is a crucial component for applications demanding highly available and scalable object storage solutions.
OK! What is the problem?
Nowadays, many private cloud providers utilize Ceph and RGW to deliver S3-compatible storage to their users. However, in production, with many users to serve, you (as the cloud administrator) end up with a stream of requests about S3 user, bucket, and quota management, which can be overwhelming:
Create or delete my user on S3
Increase or decrease my user quota
Create or delete a bucket
Create sub-users and give fine-grained access to my bucket
To address this, we propose a Kubernetes-native solution that enables users to handle such requests independently, putting less burden on administrators and more capability in users' hands.
Kubernetes Operators
Kubernetes Operators automate application management in Kubernetes environments. They are typically built with the Operator SDK, which simplifies creating and managing them by encapsulating operational knowledge in code. This approach efficiently manages complex applications, significantly enhancing automation and operational efficiency. By extending Kubernetes' capabilities with custom resources, Operators offer a powerful method for automating deployment, scaling, and management tasks, streamlining the lifecycle of distributed systems.
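The core idea behind every operator is a reconcile loop: compare the desired state declared in custom resources with the observed state of the world, and compute the actions that close the gap. The function below is an illustrative toy in Python, not the Operator SDK API:

```python
# Toy reconcile loop: given the desired state (from custom resources) and the
# observed state (from the cluster/storage backend), return the actions that
# drive observed toward desired. Real operators do this continuously per event.
def reconcile(desired: dict, observed: dict) -> list:
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(f"create {name}")        # missing resource
        elif observed[name] != spec:
            actions.append(f"update {name}")        # drifted resource
    for name in observed:
        if name not in desired:
            actions.append(f"delete {name}")        # orphaned resource
    return actions

print(reconcile({"bucket-a": {"quota": 5}}, {"bucket-b": {"quota": 1}}))
# ['create bucket-a', 'delete bucket-b']
```

An operator packages exactly this kind of logic, plus the domain knowledge of how to perform each action against the managed system.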
Enough! What is your solution?
We have developed the Ceph S3 Operator, an open-source Kubernetes operator crafted to streamline the management of S3 users and buckets within a Ceph cluster environment. After installation, users can provision their S3User or S3Bucket with the Kubernetes objects, enabling them to utilize GitOps solutions like ArgoCD to manage S3 users or buckets.
Features
S3 User Management
Bucket Management
Subuser Support
Bucket Policy Support
Quota Management
Webhook Integration
E2E Testing
Helm Chart and OLM Support
Ceph S3 Operator vs. COSI
When managing users and buckets via Kubernetes Custom Resource Definitions (CRDs) on Ceph storage with an S3-compatible API, you may come across the Container Object Storage Interface (COSI) as an alternative solution. This raises the question: Why did we choose to develop a new operator from scratch instead of utilizing the COSI standard or its implementations? Although COSI offers similar features, it has some significant limitations. COSI currently lacks support for essential features such as quota validation and bucket access policies, which are crucial for many teams to manage and control their storage resources effectively. By developing a dedicated operator, we aim to provide a more flexible and feature-rich solution that caters to a wider range of organizational needs.
Prerequisites
Kubernetes v1.23.0+
Ceph v14.2.10+: prior Ceph versions don't support the sub-user bucket policy. Nevertheless, other features are expected to work correctly on those earlier releases.
ClusterResourceQuota CRD (already installed in OpenShift clusters):
kubectl apply -f config/external-crd
Installation
Using Operator Lifecycle Manager (OLM):
If you have already installed OLM on your Kubernetes cluster, you can install the operator from OperatorHub:
1. Create a secret containing your Ceph cluster credentials:
apiVersion: v1
kind: Secret
metadata:
  name: s3-operator-controller-manager-config-override
  namespace: operators
stringData:
  config.yaml: |
    s3UserClass: ceph-default
    clusterName: production
    validationWebhookTimeoutSeconds: 10
    rgw:
      endpoint: <CEPH_HTTP_URL>
      accessKey: <S3_ADMIN_ACCESS_KEY>
      secretKey: <S3_ADMIN_SECRET_KEY>
type: Opaque
Remember to substitute the Ceph endpoint and the admin access and secret keys in the secret object above.
2. Create the Subscription:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ceph-s3-operator-subscription
  namespace: operators
spec:
  channel: stable
  name: ceph-s3-operator
  source: operatorhubio-catalog
  sourceNamespace: olm
  config:
    volumes:
      - name: config
        secret:
          items:
            - key: config.yaml
              path: config.yaml
          secretName: s3-operator-controller-manager-config-override
    volumeMounts:
      - mountPath: /s3-operator/config/
        name: config
3. Create the OperatorGroup:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: global-operators
  namespace: operators
Using Helm:
1. Create a custom-values.yaml file and specify your Ceph cluster configuration. You can also set other operator configurations, such as its resources, according to the chart's values.yaml (https://github.com/snapp-incubator/ceph-s3-operator/blob/main/charts/ceph-s3-operator/values.yaml).
# custom-values.yaml
controllerManagerConfig:
  configYaml: |
    s3UserClass: ceph-default
    clusterName: production
    validationWebhookTimeoutSeconds: 10
    rgw:
      endpoint: <CEPH_HTTP_URL>
      accessKey: <S3_ADMIN_ACCESS_KEY>
      secretKey: <S3_ADMIN_SECRET_KEY>
2. Install the Helm chart:
helm upgrade --install ceph-s3-operator \
oci://ghcr.io/snapp-incubator/ceph-s3-operator/helm-charts/ceph-s3-operator \
--version v0.3.7 \
--values custom-values.yaml
Whether installed via OLM or Helm, if everything goes well, you should see the controller pod running:
kubectl get pods -n operators

Usage
S3UserClaim
1. Since the operator validates S3 quotas against the cluster resource quota (CRQ) and the namespace resource quota (RQ), we need to create CRQ and RQ objects whose hard quotas consist of the parameters below:
s3/objects: Maximum number of objects the user can store
s3/size: Maximum total size, in bytes, of the data the user can store
s3/buckets: Maximum number of buckets the user can create
apiVersion: quota.openshift.io/v1
kind: ClusterResourceQuota
metadata:
  name: myteam
spec:
  quota:
    hard:
      s3/objects: 5k
      s3/size: 22k
      s3/buckets: 5k
  selector:
    labels:
      matchLabels:
        snappcloud.io/team: myteam
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: example-quota
  namespace: s3-test
spec:
  hard:
    s3/objects: 1000
    s3/size: 20000
    s3/buckets: 15
2. Now, we need to add the snappcloud.io/team label specified in the previous step to the namespace where we plan to deploy the operator objects (s3-test here):
kubectl label namespace s3-test snappcloud.io/team=myteam
3. Create an S3UserClaim object. With the specified quota, this creates a user, a read-only sub-user, and two other sub-users on Ceph RGW.
apiVersion: s3.snappcloud.io/v1alpha1
kind: S3UserClaim
metadata:
  name: s3userclaim-sample
  namespace: s3-test
spec:
  adminSecret: s3-sample-admin-secret
  quota:
    maxBuckets: 5
    maxObjects: 1000
    maxSize: 1000
  readonlySecret: s3-sample-readonly-secret
  s3UserClass: ceph-default
  subusers:
    - subuser1
    - subuser2

s3UserClaim
The user and sub-user credentials are kept in the secrets:

User and sub-user credentials
With the access and secret keys in the above secrets, we would have the following:
s3-sample-admin-secret: Credentials for the main user, which can create buckets under its tenant with full access.
s3-sample-readonly-secret: Credentials for a read-only sub-user that can only read the buckets and objects in the tenant.
s3userclaim-sample-subuserx: Credentials for subuser1 and subuser2, which can only list the buckets in the tenant and have no further access unless we grant it in the S3Bucket object (covered in the next section).
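These secrets can be consumed like any other Kubernetes secret, but keep in mind that secret values are base64-encoded. A minimal sketch of decoding them in Python; the field names and values below are made-up examples, not real credentials:

```python
import base64

# Sample of what e.g.
#   kubectl get secret s3-sample-admin-secret -n s3-test -o jsonpath='{.data}'
# might return. Field names and values here are assumptions for illustration.
secret_data = {
    "accessKey": "QUtJQUlPU0ZPRE5ON0VYQU1QTEU=",
    "secretKey": base64.b64encode(b"not-a-real-secret-key").decode(),
}

# Decode every field; the result can be fed to any S3-compatible client.
creds = {key: base64.b64decode(value).decode() for key, value in secret_data.items()}
print(creds["accessKey"])  # AKIAIOSFODNN7EXAMPLE
```

The decoded access and secret keys work with any S3-compatible client pointed at your RGW endpoint.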
Note: All user provisioning is handled via the S3UserClaim. There is also an S3User object, which is created automatically from the S3UserClaim instance and is not meant to be managed by users directly. With the claim, we follow a concept similar to the persistent volume claim (PVC) and persistent volume (PV).

S3User
If you try to create or edit the S3UserClaim with a quota exceeding either the ClusterResourceQuota or the ResourceQuota, the change will be rejected by the operator webhook. For example, if I try to increase the s3userclaim-sample maxObjects to 10K, which is greater than the corresponding values in the CRQ and RQ (5K and 1K), I will face the error below:

The operator rejects the change because the quota would be exceeded
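The essence of that validation can be sketched as a simple admission check; this is an illustrative toy, not the operator's actual code, which aggregates live usage across the team's namespaces:

```python
# Toy admission check: a claim is admitted only if the requested value plus
# what is already claimed in the same scope stays within the hard quota limit.
def admit(requested: int, already_claimed: int, hard_limit: int) -> bool:
    return requested + already_claimed <= hard_limit

# 10k objects against a ResourceQuota hard limit of 1000 objects: rejected.
print(admit(10_000, 0, 1_000))  # False
print(admit(900, 0, 1_000))     # True
```

The operator runs this kind of check twice: once against the namespace RQ and once against the team-wide CRQ, and the claim must pass both.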
S3Bucket
Now, let's create an S3Bucket whose owner is the S3User we created in the previous step:
apiVersion: s3.snappcloud.io/v1alpha1
kind: S3Bucket
metadata:
  name: s3bucket-sample
  namespace: s3-test
spec:
  s3DeletionPolicy: delete
  s3UserRef: s3userclaim-sample
The s3DeletionPolicy is delete by default, which deletes the corresponding bucket on storage when the S3Bucket object is deleted. Setting it to retain keeps the storage bucket even if the S3Bucket is deleted, guarding against unintentional deletion.
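For example, a bucket whose data should survive deletion of its Kubernetes object could be declared like this (the name s3bucket-retained is only an example):

```yaml
apiVersion: s3.snappcloud.io/v1alpha1
kind: S3Bucket
metadata:
  name: s3bucket-retained
  namespace: s3-test
spec:
  s3DeletionPolicy: retain
  s3UserRef: s3userclaim-sample
```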
As mentioned before, sub-users have neither write nor read access to the bucket by default. However, you can grant them access by specifying the s3SubuserBinding field:
apiVersion: s3.snappcloud.io/v1alpha1
kind: S3Bucket
metadata:
  name: s3bucket-sample
  namespace: s3-test
spec:
  s3DeletionPolicy: delete
  s3SubuserBinding:
    - access: write
      name: subuser1
    - access: read
      name: subuser2
  s3UserRef: s3userclaim-sample
Now, subuser1 has read/write access and subuser2 has read-only access to s3bucket-sample. You can change these bindings at any time and re-apply the YAML file.
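Under the hood, sub-user access of this kind is typically enforced with an S3 bucket policy. The sketch below builds the sort of read-only policy document that could back the read binding; the principal ARN format and statement layout are assumptions for illustration, not the operator's actual output (Ceph RGW documents its own supported policy subset):

```python
import json

def readonly_policy(tenant: str, user: str, subuser: str, bucket: str) -> dict:
    # The "user:subuser" principal form mirrors Ceph RGW's colon-separated
    # sub-user naming; treat the exact ARN format as an assumption.
    principal = f"arn:aws:iam::{tenant}:user/{user}:{subuser}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": [principal]},
            "Action": ["s3:ListBucket", "s3:GetObject"],  # read-only actions
            "Resource": [
                f"arn:aws:s3:::{bucket}",        # the bucket itself (list)
                f"arn:aws:s3:::{bucket}/*",      # the objects inside (get)
            ],
        }],
    }

policy = readonly_policy("s3-test", "s3userclaim-sample", "subuser2", "s3bucket-sample")
print(json.dumps(policy, indent=2))
```

The operator manages such policies for you when you apply the s3SubuserBinding shown above, so you never have to write them by hand.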
Conclusion
The Ceph S3 Operator presents a Kubernetes-native solution that simplifies the management and deployment of Ceph S3 storage in cloud-native environments. By automating complex processes and integrating seamlessly with existing Kubernetes infrastructure, it enhances operational efficiency and paves the way for more resilient and scalable storage solutions. Embracing this operator is a step forward in optimizing storage strategies, unlocking new possibilities for developers and organizations keen on leveraging the best of both Ceph S3 and Kubernetes.
Originally published at itnext.io.




