Workshop Studio link: https://catalog.workshops.aws/optimize-foundation-model-deployment-on-amazon-sagemaker
Hosting foundation models (FMs) can be challenging. Larger models are often more accurate because they include billions of parameters, but their size can also result in higher inference latency or decreased throughput. Hosting an FM can require more accelerator memory and optimized kernels to achieve the best performance.
In this workshop, we demonstrate how to use SageMaker Deep Learning Containers (DLCs) and various strategies to optimize FM inference for cost and performance.
This workshop provides hands-on experience deploying foundation models using Amazon SageMaker. It covers the following topics:
- Lab 1: Hosting large models on Amazon SageMaker with the Large Model Inference (LMI) Deep Learning Container (DLC) and TensorRT-LLM.
- Lab 2: Deploying a Llama 2 13B SmoothQuant model with high performance on SageMaker using the SageMaker LMI container.
- Lab 3: Multi-LoRA adapter inference on Amazon SageMaker.
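The LMI DLC used in these labs is configured through a `serving.properties` file packaged with the model artifacts. The following is a minimal sketch for a TensorRT-LLM deployment in the spirit of Lab 1; the specific model ID and parallelism values are illustrative assumptions, not the workshop's exact configuration:

```
# serving.properties (illustrative example)
# Use the MPI engine, which LMI requires for the TensorRT-LLM backend
engine=MPI
# Hugging Face model ID or S3 URI of the model artifacts (assumed value)
option.model_id=meta-llama/Llama-2-13b-hf
# Shard the model across accelerators; value depends on instance type (assumed: 4 GPUs)
option.tensor_parallel_degree=4
# Enable TensorRT-LLM continuous (rolling) batching for higher throughput
option.rolling_batch=trtllm
# Cap concurrent requests per batch (assumed value)
option.max_rolling_batch_size=32
```

At deployment time, this file is uploaded alongside the model to Amazon S3, and the SageMaker endpoint is created from the LMI container image, which reads the properties at startup.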
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.