
Working with Benk: A storage provisioning and IO performance benchmark suite for Kubernetes
January 12, 2024

Recently, Hewlett Packard Enterprise (HPE) published an open source benchmark suite for storage drivers capable of dynamically provisioning persistent volumes on Kubernetes. The suite is called Benk, a name that plays on the word bench, as in benchmark, and Kubernetes. Benk is used internally at HPE to map performance metrics around the provisioning process itself as well as IO performance. It’s still a bit rough around the edges and not feature complete, but it’s a very useful tool for capturing performance across large swaths of configurations with a high degree of automation and repeatability, without much user attendance.
A few of the features include:
- Highly customizable rendering of Kubernetes resources through kustomize templates
- A simple Kubernetes batch job that manages all the provisioning, decommissioning and benchmarking
- A single configuration file per job that abstracts Kubernetes constructs and IO parameters
- Use of industry standard Flexible I/O tester (FIO) for filesystem benchmarking
- Build-your-own output templates using Jinja2, fed by rich metrics in JSON format
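To give a feel for the per-job configuration file mentioned above, here’s a minimal sketch of what a config.env may contain. Only workloadThreads and workloadBlockSize are referenced later in this post; consult kustomize/base/config-dist.env in the repository for the authoritative set of keys.

# minimal sketch of a per-job config.env; see config-dist.env for all keys
workloadThreads=1        # number of concurrent workload threads for FIO
workloadBlockSize=8k     # IO block size handed to FIO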
Let’s walk through a practical example of configuring, running and reporting a benchmark with Benk.
Prerequisites
At the time of writing, parts of Benk can only be run on Mac and Linux due to a dependency on Bash. It will, of course, run in WSL on Windows. Besides git for cloning the GitHub repository, a recent version of kubectl and Python 3.x need to be installed.
Some proficiency with scripts, JSON and Kubernetes is helpful to better understand the workflows.
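A quick sanity check from the shell confirms the prerequisites are in place:

git version
kubectl version --client
python3 --version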
Hello Benk!
The synopsis section of Benk on GitHub highlights the main steps to run the default job. It's a good smoke test to confirm the lights come on and determine whether the system under test is ready for a more complex job.
I'll walk you through each step and explain more as I go along. Descriptions are below the commands unless noted.
git clone https://github.com/hpe-storage/benk && cd benk
After cloning, the benk directory is considered your home: your logs will go into logs, your workload configurations live in kustomize/overlays and the Jinja2 templates are in jinja2.
pip3 install -r requirements.txt
This will install the required Python packages for the Benk outputter.py script. Python is not required if you don’t intend to render any output and just want to use Benk for load generation.
cp kustomize/base/config-dist.env kustomize/base/config.env
cp kustomize/base/storageclass-dist.yaml kustomize/base/storageclass.yaml
The two “dist” files are the base of your configuration. The config.env file contains information about your storage backend that will be inserted into the StorageClass Secret reference. While the base configuration is heavily biased towards the HPE CSI Driver for Kubernetes, any storage driver that uses a StorageClass for dynamic provisioning of filesystem Persistent Volume Claims (PVCs) can be used. Configuring your driver and any pertinent details in storageclass.yaml is at your discretion.
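For illustration, here’s a minimal StorageClass sketch using the standard Kubernetes CSI secret parameter keys. The provisioner and secret values are assumptions for the HPE CSI Driver; substitute whatever matches your driver and backend.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: benk-storageclass
provisioner: csi.hpe.com                                   # assumption: HPE CSI Driver
parameters:
  # standard CSI keys pointing at the Secret that holds backend credentials
  csi.storage.k8s.io/provisioner-secret-name: hpe-backend  # assumed Secret name
  csi.storage.k8s.io/provisioner-secret-namespace: hpe-storage
reclaimPolicy: Delete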
kubectl create ns benk
All namespaced resources will be provisioned into the “benk” Namespace.
kubectl apply -k kustomize/overlays/default
kubectl wait -n benk --for=condition=complete job/benk
This will create the default job; wait for it to complete. It runs an 80/20 read/write random 8K workload for 30 seconds on a single-replica Deployment using a single PVC.
kubectl logs -n benk job/benk | jq
This will render the log as colorized, indented JSON. The log is quite substantial and intentionally left out of this blog post. An example log entry from the above job is available on GitHub.
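Before writing templates, it can help to explore the log’s structure. Assuming each log entry is a JSON object, jq can list its top-level keys:

kubectl logs -n benk job/benk | jq 'keys'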
There’s also a tiny output template provided in the repository, jinja2/example-default.yaml.j2, that pulls some data out of the log file.
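For reference, a template along these lines could produce output similar to what’s shown below. The jobs variable and its fields are assumptions inferred from the rendered output, not the actual data model fed by outputter.py:

report:
  name: Example YAML template
  jobs:
{% for job in jobs %}
  - threads: {{ job.threads }}
    iops: {{ job.iops }}
{% endfor %}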
kubectl logs -n benk job/benk | ./src/benk/outputter.py -t jinja2/example-default.yaml.j2 -l-
---
report:
  name: Example YAML template
  logfile: <stdin>
  jobs:
  - threads: 1
    runtime: 49s
    iops: 1258
    bandwidth: 10MB/s
    bs: 8k
Congratulations! You've just completed the three cornerstones of running Benk: you configured the environment and a job, ran the job successfully, and distilled the results into something more easily read by humans.
The sequencer
The core of Benk, for now, is running preconfigured jobs with kustomize. That may seem like a lot of work for very little output, which is why the repository contains two important scripts, sequencer.sh and sequencer-cluster.sh. These scripts help you run multiple jobs and structure the output to create meatier reports.
First, copy the “default” kustomize directory into eight separate directories.
for i in {1..8}; do cp -a kustomize/overlays/default kustomize/overlays/mytest-${i}; done
Now, edit the config.env in each directory. I use vi to “:wn” myself through the templates.
vi kustomize/overlays/mytest-*/config.env
In each iteration, double workloadThreads so you end up with the sequence 1, 2, 4, 8, 16, 32, 64 and 128. Would it not be useful to run these in sequence and report the results with the same template used previously, to understand whether the system scales when adding more threads to the workload? (The entire demo environment runs in virtual machines on decade-old hardware, so please don’t judge.)
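If hand-editing eight files feels tedious, a small loop can write the doubling sequence for you. This sketch assumes workloadThreads is assigned with key=value syntax in config.env:

threads=1
for i in {1..8}; do
  # GNU sed shown; on macOS, use sed -i '' -e ...
  sed -i -e "s/^workloadThreads=.*/workloadThreads=${threads}/" kustomize/overlays/mytest-${i}/config.env
  threads=$((threads * 2))
done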
Run the sequence:
./sequencer.sh mytest-
What happens here is that the sequencer script uses globbing to find all overlays with the prefix “mytest-”. Be mindful of how you name things to avoid executing unexpected jobs.
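To preview exactly which overlays the glob will pick up:

ls -d kustomize/overlays/mytest-*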
Note that the same report template can be used for all the jobs.
./src/benk/outputter.py -l logs/run-mytest-<unique timestamp>.log -t jinja2/example-default.yaml.j2
---
report:
  name: Example YAML template
  logfile: logs/run-mytest-20240111161937.log
  jobs:
  - threads: 1
    runtime: 55s
    iops: 1664
    bandwidth: 13MB/s
    bs: 8k
  - threads: 2
    runtime: 49s
    iops: 3291
    bandwidth: 26MB/s
    bs: 8k
  - threads: 4
    runtime: 48s
    iops: 8044
    bandwidth: 63MB/s
    bs: 8k
  - threads: 8
    runtime: 53s
    iops: 12075
    bandwidth: 94MB/s
    bs: 8k
  - threads: 16
    runtime: 46s
    iops: 16357
    bandwidth: 128MB/s
    bs: 8k
  - threads: 32
    runtime: 51s
    iops: 17284
    bandwidth: 135MB/s
    bs: 8k
  - threads: 64
    runtime: 48s
    iops: 17489
    bandwidth: 137MB/s
    bs: 8k
  - threads: 128
    runtime: 54s
    iops: 18761
    bandwidth: 147MB/s
    bs: 8k
Observe the diminishing returns from adding more than 16 threads to this workload.
A/B comparison
A/B testing has gained popularity in human interaction testing for websites. It’s also a crucial methodology for analyzing systems performance: change a single parameter, rerun the exact same test and compare the outcomes. We’re in luck, as Benk allows reporting on two log files at once, with a slightly different data structure being fed to the Jinja2 templates.
Now, change the block size for the previous workload example. We’re going to test whether increasing the block size increases the bandwidth of the workload.
Change “workloadBlockSize” to “64k” with the same vi ceremony.
vi kustomize/overlays/mytest-*/config.env
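Alternatively, the same sed approach as before applies here, again assuming key=value syntax:

sed -i -e 's/^workloadBlockSize=.*/workloadBlockSize=64k/' kustomize/overlays/mytest-*/config.env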
Re-run the sequencer.
./sequencer.sh mytest-
You'll have to use a different template, as Jinja2 now has to manage two log files at a time. The syntax for the outputter.py script is also slightly different.
./src/benk/outputter.py -a logs/run-mytest-<unique timestamp>.log -b logs/run-mytest-<unique timestamp>.log -t jinja2/example-default-ab.md.j2
This report template summarizes the findings in a markdown table, suitable for inclusion in a GitHub pull request or similar to illustrate a discovery.
| Threads | A (MB/s) | B (MB/s) | Diff |
| ------- | -------- | -------- | ---- |
| 1       | 13       | 55       | 4.2x |
| 2       | 26       | 106      | 4.1x |
| 4       | 63       | 181      | 2.9x |
| 8       | 94       | 450      | 4.8x |
| 16      | 128      | 661      | 5.2x |
| 32      | 135      | 748      | 5.5x |
| 64      | 137      | 840      | 6.1x |
| 128     | 147      | 833      | 5.7x |
Clearly, performance increases nearly 5x across the board. There’s also a discrepancy in the dataset. Can you spot it?
A good exercise would be to factor latency into the report in order to understand its impact. There will be one: performance plateaus as you add more workloads, usually as a result of higher latency. But how high?
More sequencers and examples!
There’s also a sequencer-cluster.sh script that lets users orchestrate load generation across multiple clusters attached to one or many storage systems, to isolate problems with high concurrency.
You can learn more about multi-cluster scaling in the GitHub repository.
You'll also find more practical examples in the GitHub repository stemming from real world performance testing conducted by the HPE Hybrid Cloud solutions team.
Summary
Hopefully, this blog post has given you ideas to build on and use cases to explore. The possibilities are endless. Bear in mind that Benk is, by all means, provided as-is. To be fully transparent, this is a very early implementation that HPE chose to open source in order to better collaborate with customers and partners on isolating performance bottlenecks. For example, not all advertised features in config.env have been implemented yet, and the CLI is not yet complete.
HPE invites collaboration and accepts pull requests to Benk on GitHub. The first issue discusses how to collapse the initial configuration and reporting into a single, intuitive Python CLI.
Let us know what you’re building, or ask your questions. The team behind the tool is available on the HPE Developer Community Slack in the #Kubernetes channel. Sign up here and sign in at hpedev.slack.com.