Alex Putnam

Build Transformative AI Applications at Scale with HPE Machine Learning Development Environment

November 4, 2021

Building and training optimized machine learning (ML) models at scale is considered the most demanding and critical stage of ML development. Doing it well requires researchers and data scientists to overcome many challenges typically encountered in High Performance Computing (HPC) environments.

These challenges often include properly setting up and managing a fast-moving ML software ecosystem and infrastructure spanning specialized compute, storage, network fabric, and tensor processors (e.g., GPUs). Additionally, users need to program, schedule, and train their models to maximize the use of the highly specialized infrastructure they have set up, which can create complexity and impede productivity.

To meet these challenges, ML engineers and data scientists are constantly looking for solutions that let them focus on building better models and shorten their time to production. HPE Machine Learning Development Environment is designed to help them do exactly that.

Built upon the widely popular open-source Determined Training Platform, HPE Machine Learning Development Environment reduces the complexity and cost of machine learning model development by removing the need to write infrastructure code. It also makes it easy for IT administrators to set up, manage, secure, and share AI compute clusters.

By adopting HPE Machine Learning Development Environment, model developers can:

Train models faster using state-of-the-art distributed training without changing the model code.
Automatically find high-quality models with advanced hyperparameter tuning from the creators of Hyperband.
Maximize GPU performance with smart scheduling and cut cloud GPU costs by seamlessly using spot/preemptible instances.
Track and reproduce work with experiment tracking that runs out-of-the-box, covering code versions, metrics, checkpoints, and hyperparameters.
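
To give a feel for the hyperparameter tuning mentioned above, here is a minimal sketch of successive halving, the early-stopping idea that Hyperband builds on. It is not the platform's implementation: the function names, the toy loss, and the learning-rate search space are all hypothetical and chosen only for illustration.

```python
import math
import random


def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of the configs each round; survivors get eta times more budget."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Score every surviving configuration at the current budget (lower is better).
        scored = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        keep = max(1, len(scored) // eta)
        survivors = scored[:keep]
        budget *= eta
    return survivors[0]


def toy_loss(cfg, budget):
    # Hypothetical stand-in for a validation loss: best near lr = 0.1,
    # and improving as the training budget grows.
    return abs(math.log10(cfg["lr"]) + 1.0) + 1.0 / budget


rng = random.Random(0)
# 27 random learning rates between 1e-4 and 1 (an assumed search space).
candidates = [{"lr": 10 ** rng.uniform(-4, 0)} for _ in range(27)]
best = successive_halving(candidates, toy_loss)
print(best)
```

With 27 candidates and eta=3, the pool shrinks from 27 to 9 to 3 to 1, so most of the training budget goes to the configurations that looked promising early. In a real workload, `evaluate` would train a model for `budget` epochs or steps and return its validation metric.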

HPE Machine Learning Development Environment integrates these features into an easy-to-use, high-performance machine learning environment, so you can spend your time building models instead of managing infrastructure.

To learn more about HPE Machine Learning Development Environment, visit our landing page and get in touch with our team of ML and distributed systems experts.

To learn more about the open-source project that powers HPE Machine Learning Development Environment, we invite you to check out the Determined Training Platform on GitHub, read the Documentation, and join the Determined Community Slack to get started. Stay tuned to the HPE DEV blog for more informative articles on this subject.
