Tom Phelan

Complex Stateful Applications on Kubernetes: KubeDirector version 0.2

September 9, 2019

Last summer, I wrote here about our BlueK8s initiative and a new open source project for deploying and managing complex stateful scale-out applications on Kubernetes: KubeDirector. KubeDirector enables data scientists familiar with data-intensive distributed applications such as Hadoop, Spark, Cassandra, TensorFlow, Caffe2, etc. to easily run these applications on Kubernetes.

In my blog post on the Kubernetes site in the fall, I introduced version 0.1 of KubeDirector and described how it works. Since then, we’ve seen a lot of interest in KubeDirector from the community, and we’re very excited about the progress so far. The BlueData team behind this effort is now part of HPE, and the KubeDirector project continues to move full steam ahead.

To that end, we just pushed out the next release and our first public update of KubeDirector: version 0.2. You can check out the full details on our GitHub releases page: https://github.com/bluek8s/kubedirector/releases/tag/v0.2.0

Some of the highlights of what’s new in version 0.2 of KubeDirector include:

- A fully deployable Cloudera 5.14.2 image is now available in the catalog of example applications
- Cluster launch performance has been enhanced through additional work on launch parallelization
- The “configcli” tool used in application setup is now included in the “nodeprep” directory
- We’ve made additional improvements to the Makefile support and functionality:
  - KubeDirector can now be built and deployed on Ubuntu systems
  - “make deploy” now waits for deployment to succeed before returning
  - “make teardown” now waits for teardown to finish before returning
- KubeDirector actions are now recorded as Kubernetes events and can be viewed by the standard “kubectl describe” command
- KubeDirector has been tested on the following Kubernetes platforms:
  - DigitalOcean Kubernetes (DOK)
  - Google Kubernetes Engine (GKE)
  - Amazon Elastic Container Service for Kubernetes (EKS)
  - Kubernetes version 1.13.2 on CentOS kernels
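
Because “make deploy” and “make teardown” now block until they finish, they are easier to chain in a script together with “kubectl describe”. Here is a minimal sketch of that idea in Python. It is not part of KubeDirector: the helper names are hypothetical, and the resource type name `kubedirectorcluster` and the cluster name used in the usage note are assumptions that you should check against your own installation. By default the helper only prints each command, so nothing touches your cluster until you opt in.

```python
import shlex
import subprocess

# Makefile targets named in the release notes; both now block until done.
DEPLOY_CMD = ["make", "deploy"]
TEARDOWN_CMD = ["make", "teardown"]


def describe_cmd(cluster_name, resource="kubedirectorcluster"):
    """Build the kubectl command whose output includes recorded events.

    Assumption: the custom resource type is addressable as
    "kubedirectorcluster"; adjust it if your installation differs.
    """
    return ["kubectl", "describe", resource, cluster_name]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    line = " ".join(shlex.quote(part) for part in cmd)
    print("$ " + line)
    if not dry_run:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)
    return line
```

For example, `run(DEPLOY_CMD, dry_run=False)` followed by `run(describe_cmd("my-spark-cluster"), dry_run=False)` would deploy KubeDirector from the repository checkout and then show the events for a cluster with that (made-up) name.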

See below for a screenshot of KubeDirector v0.2 running four pods of a Spark cluster:

[Screenshot: KubeDirector v0.2 running four pods of a Spark cluster]

One of those pods is a Jupyter notebook, as shown below:

[Screenshot: the Jupyter notebook running in one of the Spark cluster pods]

Join the Community

We’re working towards the next version of KubeDirector (and the broader BlueK8s initiative) and we’d welcome your help as developers, contributors, and adopters. Follow @BlueK8s on Twitter and get involved through these channels:

KubeDirector chat room on Slack
KubeDirector GitHub repo
