HPE Developer Community Portal

Using Raw Block and Ephemeral Inline Volumes on Kubernetes

Michael Mattsson

With the release of the HPE CSI Driver for Kubernetes 1.2.0, quite a few new Container Storage Interface (CSI) concepts were introduced as fully supported features. As always, new capabilities introduce new YAML stanzas that need to be understood to take full advantage of them. In this blog post, we’ll explore how to expose raw block volumes and the different ways to declare an ephemeral inline volume for Kubernetes Pods.

Many CSI drivers support these capabilities. For the examples below, we’ll use a recent version of the HPE CSI Driver with the default StorageClass installed on Kubernetes 1.18. Do note that ephemeral inline volumes are still considered beta in Kubernetes.

Raw block volumes

Kubernetes supports running a diverse set of applications with various infrastructure requirements, such as compute, networking and storage. Historically, a “volume” on Kubernetes translates to a POSIX-like filesystem to store persistent data at a given path inside a Pod. With the introduction of raw block volumes, there’s now a way to present the underlying block device that the filesystem is normally created on. This is beneficial for applications that are capable of addressing the device directly to store data. It effectively removes the double-buffering effects that filesystems introduce, along with the POSIX semantics and filesystem internals. Applications that truly can take advantage of raw block volumes on Kubernetes are few and far between.

The concept of presenting a raw block volume to a Pod on Kubernetes is very similar to how Raw Device Mappings (RDMs) are presented on VMware vSphere, where a virtual machine gets unfettered direct access to a LUN on a storage fabric exposed to the VMware ESX host.

Let’s compare the Kubernetes minutiae needed to declare a regular volume versus a raw block volume.

apiVersion: v1 
kind: PersistentVolumeClaim 
metadata: 
  name: block-device 
spec: 
  accessModes: 
  - ReadWriteOnce 
  resources: 
    requests: 
      storage: 32Gi 
  volumeMode: Block 

This is a conventional Persistent Volume Claim (PVC). The only thing that stands out is the .spec.volumeMode. By default, volumeMode is set to Filesystem and is usually never called out explicitly. Setting the volumeMode attribute to Block will change this, presenting the device itself, once it is exposed to a Pod.
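For comparison, here is a minimal sketch of the same claim as a regular filesystem volume. The name filesystem-volume is hypothetical and not part of the original example. The volumeMode is spelled out only to make the contrast visible, since Filesystem is the default and the line could be dropped entirely.

```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: filesystem-volume    # hypothetical name, chosen for illustration
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 32Gi
  volumeMode: Filesystem     # the default; normally omitted
```

Apart from the name, the only difference from the block-device claim is the value of .spec.volumeMode.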

To be able to address the block device, there are additional details that need to be declared in the Pod specification. Let’s bring up a Pod as an example:

--- 
apiVersion: v1 
kind: Pod 
metadata: 
  name: ioping 
spec: 
  containers: 
  - name: ioping 
    image: hpestorage/ioping 
    command: [ "ioping" ] 
    args: [ "/dev/xvda" ] 
    volumeDevices: 
    - name: raw 
      devicePath: /dev/xvda 
  volumes: 
  - name: raw 
    persistentVolumeClaim: 
      claimName: block-device 

The .spec.volumes stanza is exactly the same as it would be for using a filesystem. What’s new is the .spec.containers.volumeDevices stanza and its devicePath, which take the place of the volumeMounts and mountPath used for a filesystem. Creating the above PVC and Pod would result in the following log output:

$ kubectl logs -f pod/ioping 
4 KiB <<< /dev/xvda (block device 32 GiB): request=1 time=1.10 ms (warmup) 
4 KiB <<< /dev/xvda (block device 32 GiB): request=2 time=1.01 ms 
4 KiB <<< /dev/xvda (block device 32 GiB): request=3 time=862.1 us 
4 KiB <<< /dev/xvda (block device 32 GiB): request=4 time=1.11 ms 
4 KiB <<< /dev/xvda (block device 32 GiB): request=5 time=895.1 us 
4 KiB <<< /dev/xvda (block device 32 GiB): request=6 time=1.11 ms 
4 KiB <<< /dev/xvda (block device 32 GiB): request=7 time=976.4 us 
4 KiB <<< /dev/xvda (block device 32 GiB): request=8 time=853.5 us (fast) 
4 KiB <<< /dev/xvda (block device 32 GiB): request=9 time=912.7 us 
^C 

It’s evident that we are indeed accessing a raw block device from inside the Pod.
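As a point of contrast, the sketch below shows what the same Pod might look like against a regular filesystem volume. It assumes a PVC named filesystem-volume (a hypothetical name) that uses the default volumeMode of Filesystem. The Pod name ioping-fs and the /data mount path are also assumptions made for illustration.

```yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: ioping-fs                    # hypothetical name
spec:
  containers:
  - name: ioping
    image: hpestorage/ioping
    command: [ "ioping" ]
    args: [ "/data" ]                # ioping targets a directory instead of a device
    volumeMounts:                    # volumeMounts replaces volumeDevices
    - name: fs
      mountPath: /data               # mountPath replaces devicePath
  volumes:
  - name: fs
    persistentVolumeClaim:
      claimName: filesystem-volume   # assumed PVC with volumeMode: Filesystem
```

The .spec.volumes stanza has the same shape in both cases. Only the container-level stanza changes.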

Real world example for raw block volumes: Rook

Rook is a Cloud Native Computing Foundation (CNCF) incubating project (with a graduation proposal in the works at the time of writing) that provides open source, cloud-native storage for Kubernetes. Rook provides object, file and block storage to Kubernetes using Ceph. Rook is complementary to the HPE CSI Driver, which only provides block storage (Nimble, Primera, 3PAR), whereas Rook gives the option to deploy a distributed filesystem on Kubernetes, backed by enterprise storage, to present additional data access protocols.

Let’s assume we have deployed the Rook Operator on the Kubernetes cluster. Creating a new CephCluster is done as follows:

--- 
apiVersion: ceph.rook.io/v1 
kind: CephCluster 
metadata: 
  name: rook-ceph 
  namespace: rook-ceph 
spec: 
  cephVersion: 
    image: ceph/ceph:v14.2.9 
  dataDirHostPath: /var/lib/rook 
  mon: 
    count: 3 
    volumeClaimTemplate: 
      spec: 
        storageClassName: hpe-standard 
        resources: 
          requests: 
            storage: 10Gi 
  storage: 
    storageClassDeviceSets:
    - name: set1 
      count: 3 
      portable: true 
      tuneSlowDeviceClass: false 
      volumeClaimTemplates: 
      - metadata: 
          name: data 
        spec: 
          resources: 
            requests: 
              storage: 32Gi 
          storageClassName: hpe-standard 
          volumeMode: Block 
          accessModes: 
            - ReadWriteOnce 

Pay attention to the volumeMode: Block attribute in the specification. We can further inspect the PVC created by the StatefulSet that has been declared:

$ kubectl get pvc -n rook-ceph -l ceph.rook.io/DeviceSetPVCId=set1-data-0 -o json | json items.0.spec
{ 
  "accessModes": [ 
    "ReadWriteOnce" 
  ], 
  "resources": { 
    "requests": { 
      "storage": "32Gi" 
    } 
  }, 
  "storageClassName": "hpe-standard", 
  "volumeMode": "Block", 
  "volumeName": "pvc-26e4e5d5-0e08-46c6-9a2e-679e0bde6264" 
}

It’s now possible to use the Ceph cluster to create filesystems and object stores.
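As a rough sketch of what creating a filesystem on top of this cluster could look like, the manifest below declares a CephFilesystem resource. The name myfs and the pool sizes are assumptions for illustration. The field names follow the Rook CephFilesystem custom resource as commonly documented, so check them against the Rook version you actually deploy.

```yaml
---
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs                 # hypothetical name
  namespace: rook-ceph       # same namespace as the CephCluster above
spec:
  metadataPool:
    replicated:
      size: 3                # assumed replica count, matching the three OSDs
  dataPools:
  - replicated:
      size: 3
  metadataServer:
    activeCount: 1
    activeStandby: true
```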

Note! Use Rook at your own risk. This is an example, not an endorsement.

Ephemeral inline volumes

The term ephemeral inline volume is quite a mouthful for what it is: a temporary placement of data that you don’t really care about long-term, most commonly referred to as a “scratch disk”. However, this is a very important construct for data-intensive applications, as Kubernetes administrators now have the ability to dictate the placement of IO-intensive applications that require temporary storage. Up until the introduction of ephemeral inline volumes, applications have simply used the union filesystem provided by the container runtime for scratch space, or used other shared mechanisms like hostPath or emptyDir. Sharing resources on the host has its challenges. First off, the Kubernetes admin has no means to put any sort of boundaries in place for an individual container. That, in turn, could lead to a single container consuming the entire host filesystem and starving other containers on the host of resources.
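For reference, the sketch below shows the emptyDir approach mentioned above, where the scratch space comes out of storage local to the node. The Pod name, container name and /scratch path are hypothetical and chosen only for illustration.

```yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: scratch-emptydir     # hypothetical name
spec:
  containers:
  - name: datelog
    image: nginx
    command: ["/bin/sh"]
    args: ["-c", "while true; do date >> /scratch/mydata.txt; sleep 1; done"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir: {}             # backed by the node's local storage, shared with other Pods
```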

The term “inline” means the volume declaration resides inside the Pod specification. Each Pod, regardless of replica count, will be given a dedicated ReadWriteOnce volume as per the declaration. Let’s see what it looks like.

--- 
apiVersion: v1 
kind: Pod 
metadata: 
  name: my-pod-inline-mount-2 
spec: 
  containers: 
    - name: pod-datelog-1 
      image: nginx 
      command: ["/bin/sh"]
      args: ["-c", "while true; do date >> /data/mydata.txt; sleep 1; done"] 
      volumeMounts: 
        - name: my-volume-1 
          mountPath: /data 
  volumes: 
    - name: my-volume-1 
      csi: 
       driver: csi.hpe.com 
       fsType: ext3 
       volumeAttributes: 
         csi.storage.k8s.io/ephemeral: "true" 
         inline-volume-secret-name: nimble-secret 
         inline-volume-secret-namespace: kube-system 
         accessProtocol: "iscsi" 
         size: "7Gi" 

The interesting part here is the .spec.volumes.csi stanza. This is the bare minimum set of parameters required to provision an inline volume. Any additional parameters supported by the Container Storage Provider (CSP) may be used here. Note that there’s no StorageClass at play here. All parameters, including the Secret, need to be part of the declaration. This is where a word of caution is warranted. Handing out the Secret to a user is the same as handing over credentials to the CSP backend!

CSI ephemeral inline volumes provide a means for the Kubernetes admin to make cluster users aware of how and where temporary storage resources may be provisioned. Handing out inline volumes to users is not a particularly good idea with the HPE Nimble Storage CSP at this time, as there’s no mechanism to create the necessary separation. With the HPE 3PAR/Primera CSP, it’s possible to create a separate Virtual Domain for inline volumes, and the user is essentially a tenant on the backend storage array.
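To illustrate the earlier point that each Pod gets its own dedicated volume regardless of replica count, here is a sketch of the same inline declaration placed in a Deployment template. The Deployment name, labels and replica count are assumptions for illustration. The volumeAttributes are copied unchanged from the Pod example above.

```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: datelog                      # hypothetical name
spec:
  replicas: 2                        # each replica is expected to get its own 7Gi volume
  selector:
    matchLabels:
      app: datelog
  template:
    metadata:
      labels:
        app: datelog
    spec:
      containers:
      - name: datelog
        image: nginx
        command: ["/bin/sh"]
        args: ["-c", "while true; do date >> /data/mydata.txt; sleep 1; done"]
        volumeMounts:
        - name: scratch
          mountPath: /data
      volumes:
      - name: scratch
        csi:
          driver: csi.hpe.com
          fsType: ext3
          volumeAttributes:
            csi.storage.k8s.io/ephemeral: "true"
            inline-volume-secret-name: nimble-secret
            inline-volume-secret-namespace: kube-system
            accessProtocol: "iscsi"
            size: "7Gi"
```

Because the volume is tied to the Pod lifecycle, scaling the Deployment down should also remove the volumes of the terminated Pods, along with any data written to them.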

Next steps

It’s always exciting to talk about new features and capabilities. Take the new CSI driver for a spin and let us know what you think. We hang out on the HPE DEV Slack community. Sign up on slack.hpedev.io if you’re an external HPE user or log in directly at hpedev.slack.com if you’re an HPE employee.