Data/ML Engineer

It’s your job to ensure the right data powers the right applications at the right time and in the right place.

With a growing number and variety of workloads, how can you address every aspect of data logistics and processing that can make or break a data-intensive project, including analytics and AI/machine learning? And how can you do it easily and reliably?

On this page, we provide content to help you meet these challenges. You will find a rotating selection of foundational material, ideas to help you get inspired, as well as practical tips on key issues to improve efficiency and performance. You’ll also learn what Hewlett Packard Enterprise (HPE) offers.

The roles of the Data/ML Engineer and Data Scientist can overlap. You may also find content of interest to you on the Data Scientist page. Content on this page changes as new material becomes available or new topics arise, so check back regularly.

Get Inspired

A sampler of new ideas related to data/ML engineering:

Learn how industry innovation may affect your job.

Building a Foundation

Key to data science projects are a unifying data infrastructure that handles data logistics and the containerization of applications.

Simplify operations and workflows with the right data fabric and orchestrate containerized applications with open source Kubernetes.

Unit testing isn’t just for code: you need to unit test your data. Watch Deequ: Unit Tests for Data
Data locality helps support GPUs and other accelerators from a data point of view. Read How fine-grained data placement helps optimize application performance
Better connections between data producers and data consumers make data science more successful. Read Getting value from your data shouldn’t be this hard
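
To make the idea of unit testing your data more concrete, here is a minimal sketch in plain Python. It is not the Deequ API (Deequ runs on Apache Spark); the helper names `is_complete`, `is_unique`, `is_non_negative`, and `run_checks`, as well as the sample records, are hypothetical and chosen only for illustration. The sketch declares expectations about a dataset and evaluates them the way a unit test evaluates code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Mapping

Row = Mapping[str, Any]


@dataclass(frozen=True)
class Check:
    """A named expectation about a whole dataset (hypothetical helper type)."""
    name: str
    predicate: Callable[[list], bool]


def is_complete(column: str) -> Check:
    # Passes only if no row has a missing (None) value in the column.
    return Check(
        f"{column} is complete",
        lambda rows: all(r.get(column) is not None for r in rows),
    )


def is_unique(column: str) -> Check:
    # Passes only if no value in the column appears more than once.
    def predicate(rows: list) -> bool:
        values = [r.get(column) for r in rows]
        return len(values) == len(set(values))

    return Check(f"{column} is unique", predicate)


def is_non_negative(column: str) -> Check:
    # Passes only if every present value is >= 0; missing values are skipped.
    return Check(
        f"{column} is non-negative",
        lambda rows: all(
            r[column] >= 0 for r in rows if r.get(column) is not None
        ),
    )


def run_checks(rows: Iterable[Row], checks: Iterable[Check]) -> dict:
    """Evaluate each check against the data and report pass/fail by name."""
    materialized = list(rows)
    return {c.name: c.predicate(materialized) for c in checks}


# Illustrative sample data: one missing reading and one duplicate id.
sample = [
    {"id": 1, "reading": 20.5},
    {"id": 2, "reading": None},
    {"id": 2, "reading": 7.0},
]

results = run_checks(
    sample,
    [is_complete("reading"), is_unique("id"), is_non_negative("reading")],
)

for name, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")
```

In a real pipeline, checks like these would typically run before data is handed to downstream consumers, so that a failed expectation blocks the load rather than surfacing later as a model or report error.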

Study the technical paper HPE Ezmeral Data Fabric: Modern infrastructure for data storage and management
Read What’s your superpower for data management?
View the HPE Ezmeral Data Fabric platform page
Read Kuberneticized machine learning and AI using Kubeflow
Learn how management of large-scale Kubernetes clusters is made easier with HPE Ezmeral Runtime Enterprise

Addressing Key Concerns

What can I do to lower the entry barriers to developing new AI/ML/data science projects?

AI/ML projects can and should be run on the same system as analytics projects: Read “Chapter 3: AI and Analytics Together” in the free eBook AI and Analytics at Scale: Lessons from Real-World Production Systems

Who should be included on the team to ensure the success of the project?

Read The New Data Science Team: Who’s on First?

How do I handle data movement?

Read A better approach to major data motion: built-in data mirroring
Watch the webinar Data Motion at Scale: the Untold Story

What makes it easier to deal with edge computing in large-scale systems?

Read To the edge and back again: Meeting the challenges of edge computing

How do I ensure data trust and security?

New approaches are improving the connection between data producers and data consumers. See how in the video Dataspaces: connecting to data you can trust
Learn about the SPIFFE and SPIRE projects that are hosted by the Cloud Native Computing Foundation (CNCF)

How are others doing this?

Check out these real-world case studies

Skill Up

Munch & Learn technology talk

Monthly meetups where you can hear from experts on the newest technologies. Catch up on any you may have missed and register for upcoming talks.

Workshops-on-Demand

Free, in-depth, hands-on workshops that allow you to explore details of a technology by interacting with it. Designed to fit your schedule, these workshops are available 24/7 – from anywhere at any time.

HPE Ezmeral Data Fabric 101 – Get to know the basics around the data fabric

Documentation

The HPE Ezmeral Data Fabric platform page offers documentation and API information along with informative videos and tutorials. Additional documentation is listed below.

HPE Ezmeral Data Fabric 7.0 documentation

Engage

Ping us with your comments, questions, and requests for information.

HPE Dev Slack

Blog articles and tutorials

Abhishek Kumar Agarwal

Streamline and optimize ML workflows with HPE Ezmeral Unified Analytics

Sep 27, 2023

Isha Ghodgaonkar

End-to-end, easy-to-use pipeline for training a model on Medical Image Data using HPE Machine Learning Development Environment

Jun 16, 2023

Andrew Mendez

Production-ready object detection model training workflow with HPE Machine Learning Development Environment

Jun 16, 2023

Thirukkannan M

ML Ops – Deploying an ML model in HPE GreenLake Platform ML Ops service

Aug 8, 2022

Sweta Katkoria

How to Set Up an Automation Pipeline to View Historical Trend Data of Clusters with HPE GreenLake for Private Cloud Enterprise

Jun 9, 2022

Denis Choukroun

Deep Learning Model Training – A First-Time User’s Experience with Determined – Part 2

May 3, 2022

Denis Choukroun

Deep Learning Model Training – A First-Time User’s Experience with Determined - Part 1

Apr 14, 2022

Srikanth Venkata Seshu

Highlighting key features of HPE Ezmeral Runtime Enterprise Release 5.4

Mar 31, 2022

Neil Conway and Alex Putnam

Writing Deep Learning Tools for all Data Scientists, Not Just Unicorns

Feb 11, 2022

Dale Rensing

HPE Developer launches its Munch & Learn technical talks

Jan 27, 2022

Cenz Wong

Getting Started with DataTaps in Kubernetes Pods

Jul 6, 2021

Don Wake

On-Premise Adventures: How to build an Apache Spark lab on Kubernetes

Jun 15, 2021

Carol McDonald

Real-Time Streaming Data Pipelines with Apache APIs: Kafka, Spark Streaming, and HBase

Feb 19, 2021

Ranjit Lingaiah

How to Use Secondary Indexes in Spark With Open JSON Application Interface (OJAI)

Feb 5, 2021

Tugdual Grall

Setting Up Spark Dynamic Allocation on MapR

Feb 5, 2021

Will Ochandarena

Scaling with Kafka – Common Challenges Solved

Jan 29, 2021

Carol McDonald

Streaming Data Pipeline to Transform, Store and Explore Healthcare Dataset With Apache Kafka API, Apache Spark, Apache Drill, JSON and MapR Database

Jan 14, 2021

Michael Farnbach

Best Practices on Migrating from a Data Warehouse to a Big Data Platform

Dec 16, 2020

Nicolas Perez

Spark Data Source API: Extending Our Spark SQL Query Engine

Dec 16, 2020

Carol McDonald

Fast data processing pipeline for predicting flight delays using Apache APIs: Kafka, Spark Streaming and Machine Learning (part 1)

Oct 21, 2020

Terry He

How to Use a Table Load Tool to Batch Puts into HBase/MapR Database

Oct 15, 2020

Ian Downard

How to Persist Kafka Data as JSON in NoSQL Storage Using MapR Event Store and MapR Database

Sep 25, 2020

Magnus Pierre

CRUD with the New Golang Client for MapR Database

Sep 18, 2020

Carol McDonald

Datasets, DataFrames, and Spark SQL for Processing of Tabular Data

Aug 19, 2020

Carol McDonald

Tips and Best Practices to Take Advantage of Spark 2.x

Jul 8, 2020

Carol McDonald

Data Modeling Guidelines for NoSQL JSON Document Databases

Jul 8, 2020
