Harnessing Ceph S3 with Kubernetes: An Operator Solution

Introduction
Ceph & Rados Gateway
Ceph is an open-source, distributed storage system designed for scalability and reliability, providing unified solutions for block, file, and object storage under a single platform. At its core lies the Reliable Autonomic Distributed Object Store (RADOS), which ensures data is stored fault-tolerantly across multiple storage devices.
Ceph RGW (Rados Gateway) is an integral part of Ceph that offers object storage accessible via RESTful APIs, compatible with Amazon S3 and OpenStack Swift protocols. By leveraging Ceph's distributed architecture, RGW provides a scalable, high-performance platform for storing and retrieving unstructured data such as images, videos, and backups. It is a crucial component for applications demanding highly available and scalable object storage solutions.
OK! What is the problem?
Nowadays, many private cloud providers utilize Ceph and RGW to deliver S3-compatible storage to their users. However, in production, with many users to serve, you (as the cloud administrator) end up with a stream of requests about S3 user, bucket, and quota management, which can be overwhelming:
Create or delete my user on S3
Increase or decrease my user quota
Create or delete a bucket
Create sub-users and give fine-grained access to my bucket
To address this, we propose a Kubernetes-native solution that enables users to handle such requests independently, putting less burden on administrators and more capability in users' hands.
Kubernetes Operators
Kubernetes Operators automate application management in Kubernetes environments. They are typically built with the Operator SDK, which simplifies creating and managing them by encapsulating operational knowledge in code. This approach efficiently manages complex applications, significantly enhancing automation and operational efficiency. By extending Kubernetes' capabilities with custom resources, Operators offer a powerful method for automating deployment, scaling, and management tasks, streamlining the lifecycle of distributed systems.
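The core idea behind every operator is a reconcile loop: compare the desired state declared in custom resources with the observed state of the world, and compute the actions that close the gap. The function below is an illustrative toy in Python, not the Operator SDK API:

```python
# Toy reconcile loop: given the desired state (from custom resources) and the
# observed state (from the cluster/storage backend), return the actions that
# drive observed toward desired. Real operators do this continuously per event.
def reconcile(desired: dict, observed: dict) -> list:
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(f"create {name}")        # missing resource
        elif observed[name] != spec:
            actions.append(f"update {name}")        # drifted resource
    for name in observed:
        if name not in desired:
            actions.append(f"delete {name}")        # orphaned resource
    return actions

print(reconcile({"bucket-a": {"quota": 5}}, {"bucket-b": {"quota": 1}}))
# ['create bucket-a', 'delete bucket-b']
```

An operator packages exactly this kind of logic, plus the domain knowledge of how to perform each action against the managed system.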
Enough! What is your solution?
We have developed the Ceph S3 Operator, an open-source Kubernetes operator crafted to streamline the management of S3 users and buckets within a Ceph cluster environment. After installation, users can provision their S3User or S3Bucket with the Kubernetes objects, enabling them to utilize GitOps solutions like ArgoCD to manage S3 users or buckets.
Features
S3 User Management
Bucket Management
Subuser Support
Bucket Policy Support
Quota Management
Webhook Integration
E2E Testing
Helm Chart and OLM Support
Ceph S3 Operator vs. COSI
When managing users and buckets via Kubernetes Custom Resource Definitions (CRDs) on Ceph storage with an S3-compatible API, you may come across the Container Object Storage Interface (COSI) as an alternative solution. This raises the question: Why did we choose to develop a new operator from scratch instead of utilizing the COSI standard or its implementations? Although COSI offers similar features, it has some significant limitations. COSI currently lacks support for essential features such as quota validation and bucket access policies, which are crucial for many teams to manage and control their storage resources effectively. By developing a dedicated operator, we aim to provide a more flexible and feature-rich solution that caters to a wider range of organizational needs.
Prerequisites
Kubernetes v1.23.0+
Ceph v14.2.10+: prior Ceph versions don't support the sub-user bucket policy. Nevertheless, other features are expected to work correctly on those earlier releases.
ClusterResourceQuota CRD (already installed in OpenShift clusters):
kubectl apply -f config/external-crd
Installation
Using Operator Lifecycle Manager (OLM):
If you have already installed OLM on your Kubernetes cluster, you can install the operator from OperatorHub:
1. Create a secret containing your Ceph cluster credentials:
apiVersion: v1
kind: Secret
metadata:
  name: s3-operator-controller-manager-config-override
  namespace: operators
stringData:
  config.yaml: |
    s3UserClass: ceph-default
    clusterName: production
    validationWebhookTimeoutSeconds: 10
    rgw:
      endpoint: <CEPH_HTTP_URL>
      accessKey: <S3_ADMIN_ACCESS_KEY>
      secretKey: <S3_ADMIN_SECRET_KEY>
type: Opaque
Remember to substitute the Ceph endpoint and the admin access and secret keys in the secret object above.
2. Create the Subscription:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ceph-s3-operator-subscription
  namespace: operators
spec:
  channel: stable
  name: ceph-s3-operator
  source: operatorhubio-catalog
  sourceNamespace: olm
  config:
    volumes:
      - name: config
        secret:
          items:
            - key: config.yaml
              path: config.yaml
          secretName: s3-operator-controller-manager-config-override
    volumeMounts:
      - mountPath: /s3-operator/config/
        name: config
3. Create the OperatorGroup:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: global-operators
  namespace: operators
Using Helm:
1. Create a custom-values.yaml file and specify your Ceph cluster configuration. You can also set other operator configurations, such as its resources, according to the chart's values.yaml (https://github.com/snapp-incubator/ceph-s3-operator/blob/main/charts/ceph-s3-operator/values.yaml).
# custom-values.yaml
controllerManagerConfig:
  configYaml: |
    s3UserClass: ceph-default
    clusterName: production
    validationWebhookTimeoutSeconds: 10
    rgw:
      endpoint: <CEPH_HTTP_URL>
      accessKey: <S3_ADMIN_ACCESS_KEY>
      secretKey: <S3_ADMIN_SECRET_KEY>
2. Install the Helm chart:
helm upgrade --install ceph-s3-operator \
oci://ghcr.io/snapp-incubator/ceph-s3-operator/helm-charts/ceph-s3-operator \
--version v0.3.7 \
--values custom-values.yaml
Whether installed via OLM or Helm, if everything goes well, you should see the controller pod running:
kubectl get pods -n operators

Usage
S3UserClaim
1. Since the operator validates S3 quotas against the cluster resource quota (CRQ) and the namespace resource quota (RQ), we need to create CRQ and RQ objects whose hard quotas consist of the parameters below:
s3/objects: Maximum number of objects the user can store
s3/size: Maximum total size, in bytes, of the data the user can store
s3/buckets: Maximum number of buckets the user can create
apiVersion: quota.openshift.io/v1
kind: ClusterResourceQuota
metadata:
  name: myteam
spec:
  quota:
    hard:
      s3/objects: 5k
      s3/size: 22k
      s3/buckets: 5k
  selector:
    labels:
      matchLabels:
        snappcloud.io/team: myteam
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: example-quota
  namespace: s3-test
spec:
  hard:
    s3/objects: 1000
    s3/size: 20000
    s3/buckets: 15
2. Now, we need to add the snappcloud.io/team label specified in the previous step to the namespace where we plan to deploy the operator objects (s3-test here):
kubectl label namespace s3-test snappcloud.io/team=myteam
3. Create an S3UserClaim object. With the specified quota, this creates a user, a read-only sub-user, and two other sub-users on Ceph RGW.
apiVersion: s3.snappcloud.io/v1alpha1
kind: S3UserClaim
metadata:
  name: s3userclaim-sample
  namespace: s3-test
spec:
  adminSecret: s3-sample-admin-secret
  quota:
    maxBuckets: 5
    maxObjects: 1000
    maxSize: 1000
  readonlySecret: s3-sample-readonly-secret
  s3UserClass: ceph-default
  subusers:
    - subuser1
    - subuser2

s3UserClaim
The user and sub-user credentials are kept in the secrets:

User and sub-user credentials
With the access and secret keys in the above secrets, we would have the following:
s3-sample-admin-secret: Credentials for the main user, which can create buckets under its tenant with full access.
s3-sample-readonly-secret: Credentials for a read-only sub-user that can only read the buckets and objects in the tenant.
s3userclaim-sample-subuserx: Credentials for subuser1 and subuser2, which can only list the buckets in the tenant and have no further access unless we grant it in the S3Bucket object (covered in the next section).
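These secrets can be consumed like any other Kubernetes secret, but keep in mind that secret values are base64-encoded. A minimal sketch of decoding them in Python; the field names and values below are made-up examples, not real credentials:

```python
import base64

# Sample of what e.g.
#   kubectl get secret s3-sample-admin-secret -n s3-test -o jsonpath='{.data}'
# might return. Field names and values here are assumptions for illustration.
secret_data = {
    "accessKey": "QUtJQUlPU0ZPRE5ON0VYQU1QTEU=",
    "secretKey": base64.b64encode(b"not-a-real-secret-key").decode(),
}

# Decode every field; the result can be fed to any S3-compatible client.
creds = {key: base64.b64decode(value).decode() for key, value in secret_data.items()}
print(creds["accessKey"])  # AKIAIOSFODNN7EXAMPLE
```

The decoded access and secret keys work with any S3-compatible client pointed at your RGW endpoint.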
Note: All user provisioning is handled via the S3UserClaim. There is also an S3User object, which is created automatically from the S3UserClaim instance and is not meant to be managed by users directly. With the claim, we follow a concept similar to the persistent volume claim (PVC) and persistent volume (PV).

S3User
If you try to create or edit the S3UserClaim with a quota exceeding either the ClusterResourceQuota or the ResourceQuota, the change will be rejected by the operator webhook. For example, if I try to increase the s3userclaim-sample maxObjects to 10K, which is greater than the corresponding values in the CRQ and RQ (5K and 1K), I will face the error below:

The operator rejects the change because the quota would be exceeded
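The essence of that validation can be sketched as a simple admission check; this is an illustrative toy, not the operator's actual code, which aggregates live usage across the team's namespaces:

```python
# Toy admission check: a claim is admitted only if the requested value plus
# what is already claimed in the same scope stays within the hard quota limit.
def admit(requested: int, already_claimed: int, hard_limit: int) -> bool:
    return requested + already_claimed <= hard_limit

# 10k objects against a ResourceQuota hard limit of 1000 objects: rejected.
print(admit(10_000, 0, 1_000))  # False
print(admit(900, 0, 1_000))     # True
```

The operator runs this kind of check twice: once against the namespace RQ and once against the team-wide CRQ, and the claim must pass both.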
S3Bucket
Now, let's create an S3Bucket whose owner is the S3User we created in the previous step:
apiVersion: s3.snappcloud.io/v1alpha1
kind: S3Bucket
metadata:
  name: s3bucket-sample
  namespace: s3-test
spec:
  s3DeletionPolicy: delete
  s3UserRef: s3userclaim-sample
The s3DeletionPolicy is delete by default, which deletes the corresponding bucket on storage when the S3Bucket object is deleted. Setting it to retain keeps the storage bucket even if the S3Bucket is deleted, guarding against unintentional deletion.
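For example, a bucket whose data should survive deletion of its Kubernetes object could be declared like this (the name s3bucket-retained is only an example):

```yaml
apiVersion: s3.snappcloud.io/v1alpha1
kind: S3Bucket
metadata:
  name: s3bucket-retained
  namespace: s3-test
spec:
  s3DeletionPolicy: retain
  s3UserRef: s3userclaim-sample
```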
As mentioned before, sub-users have neither write nor read access to the bucket by default. However, you can grant them access by specifying the s3SubuserBinding field:
apiVersion: s3.snappcloud.io/v1alpha1
kind: S3Bucket
metadata:
  name: s3bucket-sample
  namespace: s3-test
spec:
  s3DeletionPolicy: delete
  s3SubuserBinding:
    - access: write
      name: subuser1
    - access: read
      name: subuser2
  s3UserRef: s3userclaim-sample
Now, subuser1 has read/write access and subuser2 has read-only access to s3bucket-sample. You can change these bindings at any time and re-apply the YAML file.
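Under the hood, sub-user access of this kind is typically enforced with an S3 bucket policy. The sketch below builds the sort of read-only policy document that could back the read binding; the principal ARN format and statement layout are assumptions for illustration, not the operator's actual output (Ceph RGW documents its own supported policy subset):

```python
import json

def readonly_policy(tenant: str, user: str, subuser: str, bucket: str) -> dict:
    # The "user:subuser" principal form mirrors Ceph RGW's colon-separated
    # sub-user naming; treat the exact ARN format as an assumption.
    principal = f"arn:aws:iam::{tenant}:user/{user}:{subuser}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": [principal]},
            "Action": ["s3:ListBucket", "s3:GetObject"],  # read-only actions
            "Resource": [
                f"arn:aws:s3:::{bucket}",        # the bucket itself (list)
                f"arn:aws:s3:::{bucket}/*",      # the objects inside (get)
            ],
        }],
    }

policy = readonly_policy("s3-test", "s3userclaim-sample", "subuser2", "s3bucket-sample")
print(json.dumps(policy, indent=2))
```

The operator manages such policies for you when you apply the s3SubuserBinding shown above, so you never have to write them by hand.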
Conclusion
The Ceph S3 Operator presents a Kubernetes-native solution that simplifies the management and deployment of Ceph S3 storage in cloud-native environments. By automating complex processes and integrating seamlessly with existing Kubernetes infrastructure, it enhances operational efficiency and paves the way for more resilient and scalable storage solutions. Embracing this operator is a step forward in optimizing storage strategies, unlocking new possibilities for developers and organizations keen on leveraging the best of both Ceph S3 and Kubernetes.
Originally published at itnext.io.




