Well-commented code for different types of training configurations
Faster large mini-batch distributed training without squeezing devices
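One common way to get a large effective mini-batch without exhausting device memory is gradient accumulation. The sketch below is a generic PyTorch illustration under that assumption, not necessarily this project's actual method; the toy model, data, and `accum_steps` value are hypothetical.

```python
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
accum_steps = 8  # effective batch = accum_steps x per-step batch size

opt.zero_grad()
for step in range(64):
    x, y = torch.randn(16, 10), torch.randn(16, 1)  # toy data
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()  # scale so the accumulated sum matches one big batch
    if (step + 1) % accum_steps == 0:
        opt.step()       # update only after accum_steps micro-batches
        opt.zero_grad()
```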
This project contains scripts/modules for distributed training
Short course: Introduction to Machine Learning
Everything is born from a simple experiment.
Showcases AI scaling techniques and MLflow integration for streamlined experiment tracking and management in machine learning workflows.
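As a minimal illustration of the MLflow tracking this repository describes, the sketch below logs parameters and metrics for one run; the experiment name, parameter names, and loss values are hypothetical.

```python
import mlflow

# Hypothetical experiment and values; any training loop can log this way.
mlflow.set_experiment("scaling-experiments")
with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("num_workers", 4)
    for step, loss in enumerate([0.9, 0.5, 0.3]):
        mlflow.log_metric("train_loss", loss, step=step)
```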
Tools for ML/MXNet on Kubernetes: a rework of the original tf-operator to support the MXNet framework.
Transfer Learning applied to Image Classification (VGG16 - Distributed Training on Multi-GPUs)
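A minimal sketch of that setup, assuming TensorFlow/Keras with `tf.distribute.MirroredStrategy` for the multi-GPU part; the classifier head and class count are hypothetical, and this is not necessarily the repository's exact code.

```python
import tensorflow as tf

# MirroredStrategy replicates the model on every local GPU and
# allreduces gradients after each step.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    base = tf.keras.applications.VGG16(weights="imagenet",
                                       include_top=False,
                                       input_shape=(224, 224, 3))
    base.trainable = False  # transfer learning: freeze the convolutional backbone
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),  # hypothetical class count
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(train_ds, epochs=5)  # train_ds: a tf.data.Dataset of labeled images
```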
Training Using Multiple GPUs
Compression-accelerated distributed DNN training system at large scales.
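Compression-accelerated systems trade a small amount of gradient fidelity for much less communication. As a generic illustration only (not this system's actual scheme), the sketch below implements top-k gradient sparsification in PyTorch; the compression ratio is hypothetical.

```python
import torch

def topk_compress(grad, ratio=0.01):
    # Keep only the largest-magnitude entries; transmit (values, indices).
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)
    return flat[indices], indices, grad.shape

def topk_decompress(values, indices, shape):
    # Scatter the surviving entries back into a dense zero tensor.
    out = torch.zeros(shape)
    out.view(-1)[indices] = values
    return out

g = torch.randn(256, 256)
vals, idx, shape = topk_compress(g, ratio=0.05)
g_hat = topk_decompress(vals, idx, shape)  # sparse approximation of g
```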
Access programming assignments and labs from the TensorFlow Advanced Techniques and TensorFlow Developer Specializations by deeplearning.ai on Coursera. 🚀🧠
Example of distributed PyTorch
Distributed training of a CNN using MNIST dataset, Tensorflow and Horovod
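A minimal sketch of that combination, using Horovod's Keras API on MNIST; the model architecture and hyperparameters are hypothetical, not taken from the repository.

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()
# Pin each process to one GPU (standard Horovod setup).
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

(x, y), _ = tf.keras.datasets.mnist.load_data()
x = x[..., None].astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
# Scale the LR by worker count; the wrapper allreduces gradients each step.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(loss="sparse_categorical_crossentropy", optimizer=opt)
model.fit(x, y, batch_size=64, epochs=1,
          callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)],
          verbose=1 if hvd.rank() == 0 else 0)
```

Launched with e.g. `horovodrun -np 4 python train.py`, each process trains one replica while Horovod averages the gradients.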
Distributed training using PyTorch DDP and suggestive resource allocation
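For the DDP half of that description, a minimal self-contained sketch; the toy model and data are hypothetical, and the resource-allocation part is the repository's own and not reproduced here.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, WORLD_SIZE, and MASTER_ADDR for each process.
    dist.init_process_group(backend="gloo")  # use "nccl" on GPU nodes
    model = torch.nn.Linear(10, 1)
    ddp_model = DDP(model)
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    for _ in range(3):
        opt.zero_grad()
        loss = ddp_model(torch.randn(8, 10)).sum()
        loss.backward()  # gradients are allreduced across ranks here
        opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Run with e.g. `torchrun --nproc_per_node=2 script.py`.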
General purpose Kubernetes operator for DL frameworks written in Python
Official DGL Implementation of "Distributed Graph Data Augmentation Technique for Graph Neural Network". KSC 2023
Distributed DP-Helmet: Scalable Differentially Private Non-interactive Averaging of Single Layers
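As background only, the sketch below shows a generic Gaussian-mechanism average of per-party layer weights in NumPy. It is not the DP-Helmet algorithm; the clipping norm and noise multiplier are hypothetical, and calibrating them to a real (ε, δ) budget is out of scope here.

```python
import numpy as np

def dp_average(layers, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    # Generic central-DP averaging: clip each party's weights, sum them,
    # add Gaussian noise scaled to the clipping norm, then average.
    rng = rng or np.random.default_rng()
    clipped = [w * min(1.0, clip_norm / (np.linalg.norm(w) + 1e-12))
               for w in layers]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(layers)

parties = [np.random.randn(64) for _ in range(10)]  # toy single-layer weights
avg = dp_average(parties, clip_norm=1.0, noise_multiplier=1.2)
```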
Project showcasing how to get started with Distributed XGBoost using PySpark in CML.
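To sketch what getting started can look like, assuming the `xgboost.spark` estimator that ships with XGBoost 1.7+; the dataset path, column names, and worker count are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from xgboost.spark import SparkXGBClassifier  # requires xgboost >= 1.7

spark = SparkSession.builder.appName("dist-xgb").getOrCreate()
df = spark.read.parquet("train.parquet")  # hypothetical dataset

features = VectorAssembler(
    inputCols=[c for c in df.columns if c != "label"],
    outputCol="features",
).transform(df)

# num_workers controls how many Spark tasks train XGBoost in parallel.
clf = SparkXGBClassifier(label_col="label", features_col="features",
                         num_workers=4, max_depth=6)
model = clf.fit(features)
```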
Metaflow On Kubernetes
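A minimal sketch of what running a Metaflow step on Kubernetes can look like, using Metaflow's `@kubernetes` decorator; the flow name and resource requests are hypothetical.

```python
from metaflow import FlowSpec, kubernetes, step

class HelloK8sFlow(FlowSpec):
    @kubernetes(cpu=1, memory=4096)  # schedule this step as a Kubernetes pod
    @step
    def start(self):
        print("running inside a pod")
        self.next(self.end)

    @step
    def end(self):
        print("done")

if __name__ == "__main__":
    HelloK8sFlow()
```

Run with `python flow.py run` against a Metaflow deployment configured for Kubernetes.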