Be an Open Source Contributor

In the world of open source, you need to be fluent in Git. If you are not yet, we recommend reading the following three-part blog series:
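As a quick self-check on that fluency, here is a minimal sketch of the everyday Git cycle: initialize, stage, commit, review. The repository name and identity below are placeholders, not anything specific to HPE projects.

```shell
# Create a throwaway repository and make a first commit (names are placeholders)
git init demo-repo
cd demo-repo
git config user.email "you@example.com"   # identify yourself for this repo only
git config user.name  "Your Name"
echo "# Demo" > README.md
git add README.md                         # stage the change
git commit -m "Add README"                # record it in history
git log --oneline                         # review what you just did
```

If each of these commands already feels routine, you are in good shape; if not, the blog series and workshop above will get you there.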

We also offer a Workshop-on-Demand that you can take to get hands-on experience with Git. Feel free to register by clicking below:

When you get started with open source, you will either contribute to an existing project or create your own. If you create your own project, don’t forget to create a public repo for it. For HPE employees, be aware that there is an open source review process you need to follow before releasing any code as open source. Make sure you check this HPE-only website.

The HPE Developer Community has highlighted a number of contributors/maintainers in a series of blog posts. Read about their respective journeys and join the team in our Hall of Fame:

One last piece of advice: when joining an existing open source project as a contributor, you will have to make yourself known to the other contributors and maintainers. This takes time, so be patient. Start small by documenting issues in the code or the documentation and proposing a solution. You can also review and comment on proposed changes. In most cases, there will be a Slack or Gitter channel dedicated to the project. Don’t hesitate to join and start a discussion there. Once your name becomes associated with good feedback and proposals, it will be a lot easier to get your code contributions approved, merged, and accepted.
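In practice, a first contribution usually follows a fork-and-pull-request flow: fork the project, work on a small topic branch, push it, and open a pull request. The sketch below simulates that flow entirely with local repositories so it can be tried safely; the bare `upstream.git` repo stands in for the project's hosted repo, and all names are hypothetical.

```shell
# "upstream.git" stands in for the project's hosted repo; "fork" is your copy.
git init --bare upstream.git
git clone upstream.git fork
cd fork
git config user.email "you@example.com"
git config user.name  "Your Name"

# Start small: a focused topic branch with a single documentation fix.
git checkout -b docs/fix-wording
echo "Clarified contribution steps" > CONTRIBUTING.md
git add CONTRIBUTING.md
git commit -m "docs: clarify contribution steps"

# Publish the branch. On GitHub, you would now open a pull request
# from this branch of your fork against the upstream project.
git push origin docs/fix-wording
```

Keeping each branch focused on one small change makes your proposal easy to review, which is exactly how you build the reputation described above.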

And that’s when it gets really exciting.

We estimate that there are more than 200 million repositories on GitHub today, so finding the right place to engage is not that easy. If you are looking for a good project to contribute to, we suggest the following list (in alphabetical order). Enjoy the journey, and check out our Open Source page to learn more about some of the key projects HPE supports.

AgStack: AgStack consists of an open repository to create and publish models, with free and easy access to public data, interoperable frameworks for cross-project use, and topic-specific extensions and toolboxes. It leverages existing technologies, such as agriculture standards (AgGateway, UN-FAO, CAFA, USDA and NASA-AR); public data (Landsat, Sentinel, NOAA and Soilgrids); models (UC-ANR IPM); and open source projects like Hyperledger, Kubernetes, Open Horizon, Postgres, Django and more.

Apache Logging (Log4j): Apache Log4j is a Java-based logging utility originally written by Ceki Gülcü. It is part of Apache Logging Services, a project of the Apache Software Foundation.

Airflow: Apache Airflow (or simply Airflow) is a platform to programmatically author, schedule, and monitor workflows.

Arrow: Arrow is a library for typed functional programming in Kotlin.

Bailo: Manages the lifecycle of machine learning models to support scalability, impact, collaboration, compliance and sharing.

Calcite: Apache Calcite is a dynamic data management framework.

Chapel: Chapel is a modern programming language designed for productive parallel computing at scale. Chapel's design and implementation have been undertaken with portability in mind, permitting Chapel to run on multicore desktops and laptops, commodity clusters, and the cloud, in addition to the high-end supercomputers for which it was originally created.

CMF: The Common Metadata Framework (CMF) addresses the problems associated with tracking pipeline metadata from distributed sites, and tracks code, data and metadata together for end-to-end traceability.

HPE CSI: A Container Storage Interface (CSI) driver for Kubernetes. The HPE CSI Driver for Kubernetes allows you to use a Container Storage Provider (CSP) to perform data management operations on storage resources.

DataHub: The metadata platform for the modern data stack.

Determined AI: Determined is an open source deep learning training platform that makes building models fast and easy.

Drill: Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired, in part, by Google's Dremel.

Druid: Druid is a high-performance, real-time analytics database. Druid's main value add is to reduce time to insight and action.

Falco: The Falco Project, originally created by Sysdig, is an incubating CNCF open source cloud native runtime security tool. Falco makes it easy to consume kernel events and enrich those events with information from Kubernetes and the rest of the cloud native stack. Falco can also be extended to other data sources by using plugins. Falco has a rich set of security rules specifically built for Kubernetes, Linux, and cloud-native environments. If a rule is violated in a system, Falco will send an alert notifying the user of the violation and its severity.

Fluentd: Fluentd is a cloud native logging solution to unify data collection and consumption.

Grafana: The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.

Grommet: A React-based framework that provides accessibility, modularity, responsiveness, and theming in a tidy package.

Istio: An open platform to connect, manage, and secure microservices.

Jaeger: Jaeger, inspired by Dapper and OpenZipkin, is a distributed tracing platform created by Uber Technologies and donated to the Cloud Native Computing Foundation. It can be used for monitoring microservices-based distributed systems.

Julia: Julia is a high-level, high-performance, dynamic programming language. While it is a general-purpose language and can be used to write any application, many of its features are well suited for numerical analysis and computational science.

Jupyter Notebook: The Jupyter notebook is a web-based notebook environment for interactive computing.

Kafka: Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

KubeDirector: KubeDirector uses standard Kubernetes (K8s) facilities of custom resources and API extensions to implement stateful scaleout application clusters. This approach enables transparent integration with K8s user/resource management and existing K8s clients and tools.

Kubernetes: Kubernetes, also known as K8s, is an open source system for managing containerized applications across multiple hosts. It provides basic mechanisms for deployment, maintenance, and scaling of applications. Kubernetes builds upon a decade and a half of experience at Google running production workloads at scale using a system called Borg, combined with best-of-breed ideas and practices from the community.

LF Edge: LF Edge aims to establish an open, interoperable framework for edge computing independent of hardware, silicon, cloud, or operating system.

Linux: The Linux operating system.

LinuxKI: The LinuxKI Toolset (or LinuxKI for short) is an open-sourced, advanced, mission-critical performance troubleshooting tool for Linux. It is designed to identify performance issues beyond the typical performance metrics, enabling faster root-cause analysis for many performance issues.

LLVM: The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

ODIM: Open Distributed Infrastructure Management (ODIM) is a collaborative open source initiative that brings together a critical mass of infrastructure management and orchestration stakeholders to define and execute collaborative work in several areas.

OpenLineage: An open standard for lineage metadata collection.

OpenMetadata: An open standard for metadata, and a single place to discover, collaborate, and get your data right.

OpenTelemetry: OpenTelemetry is a vendor-neutral standard way to collect telemetry data for applications, their supporting infrastructures, and services.

OpenSHMEM: OpenSHMEM is an effort to create a specification for a standardized API for parallel programming in the Partitioned Global Address Space. Along with the specification, the project is also creating a reference implementation of the API.

OpenBMC: OpenBMC is a Linux distribution for management controllers used in devices such as servers, top-of-rack switches, or RAID appliances. It uses Yocto, OpenEmbedded, systemd, and D-Bus to allow easy customization for your platform.

PrestoDB: Presto is a distributed SQL query engine for big data.

Prometheus: Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.

RStudio: RStudio is an integrated development environment (IDE) for the R programming language.

SmartSim: SmartSim is a workflow library that makes it easier to use common Machine Learning (ML) libraries, like PyTorch and TensorFlow, in High Performance Computing (HPC) simulations and applications. SmartSim launches ML infrastructure on HPC systems alongside user workloads.

Spark: Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

SPIFFE/SPIRE: SPIRE (the SPIFFE Runtime Environment) is a toolchain of APIs for establishing trust between software systems across a wide variety of hosting platforms. SPIRE exposes the SPIFFE Workload API, which can attest running software systems and issue SPIFFE IDs and SVIDs to them. This in turn allows two workloads to establish trust between each other, for example by establishing an mTLS connection or by signing and verifying a JWT. SPIRE can also enable workloads to securely authenticate to a secret store, a database, or a cloud provider service.

ZooKeeper: ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
