Releases: pytorch/executorch

v0.2.0

29 Apr 22:39

Full Changelog: v0.1.0...v0.2.0

Foundational Improvements

Large generative AI model support

  • Support generative AI models like Meta Llama 3 8B and Llama 2 7B on Android and iOS phones
  • 4-bit group-wise weight quantization
  • XNNPACK Delegate and kernels for best performance on CPU (WIP on other backends)
  • KV cache support through PyTorch mutable buffers (see the sketch after this list)
  • Custom ops for SDPA, with KV cache and multi-query attention
  • ExecuTorch Runtime + tokenizer and sampler
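
As a reference for the mutable-buffer KV cache mentioned above, here is a minimal sketch of the pattern that torch.export can capture as in-place buffer mutation; the module, shapes, and attention math are illustrative, not the actual Llama implementation:

```python
import torch

class KVCacheBlock(torch.nn.Module):
    """Toy block that keeps its KV cache in PyTorch mutable buffers."""

    def __init__(self, max_seq_len: int = 128, n_heads: int = 4, head_dim: int = 16):
        super().__init__()
        # Mutable buffers: exported as state and updated in place each step.
        self.register_buffer("k_cache", torch.zeros(1, n_heads, max_seq_len, head_dim))
        self.register_buffer("v_cache", torch.zeros(1, n_heads, max_seq_len, head_dim))

    def forward(self, k: torch.Tensor, v: torch.Tensor, pos: torch.Tensor):
        # In-place cache writes, captured by torch.export as buffer mutations.
        self.k_cache[:, :, pos] = k
        self.v_cache[:, :, pos] = v
        # Placeholder math standing in for attention over the cache.
        return torch.matmul(self.k_cache, self.v_cache.transpose(-1, -2))

ep = torch.export.export(
    KVCacheBlock(),
    (torch.randn(1, 4, 1, 16), torch.randn(1, 4, 1, 16), torch.tensor([0])),
)
```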

Core ExecuTorch improvements

  • Simplified setup experience
  • Support for PyTorch mutable buffers
  • Support for multi-gigabyte models
  • Constant data moved to its own .pte segment for more efficient serialization
  • Better kernel coverage in the portable lib and in the XNNPACK, Arm, Core ML, MPS, and HTP delegates
  • SDK: better profiling and debugging within delegates
  • API improvements/simplification
  • Dozens of fixes to fuzzer-identified .pte file-parsing issues
  • Vulkan delegate for mobile GPU
  • Data-type based selective build for optimizing binary size
  • Compatibility with torchtune
  • More models supported across different backends
  • Python code now available as the "executorch" pip package on PyPI (see the sketch below)
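
Since the package is on PyPI, here is a minimal AOT sketch using the v0.2.0-era Python APIs; the model is a stand-in and the exact API surface may differ slightly between releases:

```python
# pip install executorch
import torch
from executorch.exir import to_edge

class Mul(torch.nn.Module):
    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return x * y

# Export with core PyTorch, lower to the Edge dialect, then to an ExecuTorch program.
ep = torch.export.export(Mul(), (torch.randn(3), torch.randn(3)))
et_program = to_edge(ep).to_executorch()

# Serialize to a .pte file for the on-device runtime.
with open("mul.pte", "wb") as f:
    f.write(et_program.buffer)
```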

Hardware Acceleration Improvements

Arm

  • Significant boost in operator test coverage through the use of the TOSA reference model, as well as improved CI coverage
  • Added support for quantization with the ArmQuantizer
  • Added support for MobileNet v2 TOSA generation
  • Working towards MobileNet v2 execution on Ethos-U
  • Added support for multiple new operators on Ethos-U compiler
  • Added NCHW/NHWC conversion for Ethos-U targets until NHWC is supported by ExecuTorch
  • Arm backend example now works on macOS

Apple Core ML

  • [SDK] ExecuTorch SDK integration for a better debugging and profiling experience, using the new MLComputePlan API released in iOS 17.4 and macOS 14.4
  • [SDK] A model lowered to the CoreML backend can be profiled using the ExecuTorch Inspector without additional setup
  • [SDK] Profiling surfaces Core ML specific information for each operation in the model, including: supported compute devices, preferred compute device, and estimated cost for each compute device.
  • [SDK] The Core ML delegate backend also supports logging intermediate tensors for model debugging.
  • [Partitioner] Enables a developer to lower a model even if Core ML doesn’t support all the operations in the model.
  • [Partitioner] A developer will now be able to specify the operations that should be skipped by the Core ML backend when lowering the model.
  • [Quantizer] Leverages PyTorch 2.0 export-based quantization APIs.
  • [Quantizer] Encodes specific quantization rules in order to optimize the model for execution on Apple silicon
  • [Quantizer] Integrated with ExecuTorch Core ML delegate conversion pipeline
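
A hedged sketch of the export-based (PT2E) quantization flow those bullets describe; prepare_pt2e/convert_pt2e are the standard PyTorch 2 entry points, while the CoreMLQuantizer import path and its config are assumptions to verify against the Core ML backend docs:

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

# Assumed import path, for illustration only; check the Core ML backend for the real one.
from executorch.backends.apple.coreml.quantizer import CoreMLQuantizer

class TinyConv(torch.nn.Module):  # stand-in model
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, 3)

    def forward(self, x):
        return self.conv(x)

example_inputs = (torch.randn(1, 3, 32, 32),)
captured = capture_pre_autograd_graph(TinyConv().eval(), example_inputs)

quantizer = CoreMLQuantizer(config)  # quantization config construction elided
prepared = prepare_pt2e(captured, quantizer)
prepared(*example_inputs)            # calibration with representative inputs
quantized = convert_pt2e(prepared)   # ready for the Core ML conversion pipeline
```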

Apple MPS

  • Support for over 100 ops (parity with PyTorch MPS backend supported ops)
  • Support for iOS/iPadOS 14.4+ and macOS 12.4+
  • Support for MPSPartitioner
  • Support for the following dtypes: fp16, fp32, bfloat16, int8, int16, int32, int64, uint8, bool
  • Support for profiling (etrecord, etdump) through Inspector API
  • Full unit testing coverage for AOT and runtime for all supported operators
  • Enabled storiesllama (floating point) on MPS

Qualcomm

  • Added support for Snapdragon 8 Gen 3
  • Enabled on-device compilation (a.k.a. QNN online-prepare)
  • Enabled 4-bit and 16-bit quantization
  • Integrated Qualcomm AI Studio QNN profiling into the ExecuTorch flow
  • Enabled storiesllama on HTP-fp16, with Chen Lai from Meta as the main contributor
  • Added support for more operators
  • Additional models validated since v0.1.0:
    • FbNet
    • W2l (Wav2LetterModel)
    • SSD300_VGG16
    • ViT
    • Quantized MobileBert (contributed before the v0.1.0 cutoff but merged afterwards)

Cadence HiFi

  • Expanded operator support for Cadence HiFi targets
  • Added first small model (RNNT-emformer predictor) to the Cadence HiFi examples

Model Support

Validated with one or more delegates

  • Meta Llama 2 7B
  • Meta Llama 3 8B
  • Conformer
  • dcgan
  • Deeplab_v3
  • Edsr
  • Emformer_rnnt
  • functorch_dp_cifar10
  • Inception_v3
  • Inception_v4
  • LearningToPaint
  • lennard_jones
  • LSTM
  • maml_omniglot
  • mnasnet1_0
  • Mobilebert
  • Mobilenet_v2
  • Mobilenet_v3
  • phlippe_resnet
  • resnet18
  • resnet50
  • shufflenet_v2_x1_0
  • squeezenet1_1
  • SqueezeSAM
  • timm_efficientnet
  • Torchvision_vit
  • Wav2letter
  • Yolo v5

Tested with torch.export but not optimized for performance

  • Aquila 1 7B
  • Aquila 2 7B
  • Baichuan 1 7B
  • BioGPT
  • BLOOM 7B1
  • Chinese Alpaca 2 7B
  • Chinese LLaMA 2 7B
  • CodeShell
  • Deepseek
  • GPT Neo 1.3B
  • GPT NeoX 20B
  • GPT-2
  • GPT-J 6B
  • InternLM2 7B
  • Koala
  • MiniCPM 2B sft
  • Mistral 7B
  • Mixtral 8x7B MoE
  • Persimmon 8B chat
  • Phi 1
  • Phi 1.5
  • Phi 2
  • PLaMo 13B
  • Qwen 1.5 7B
  • Refact
  • RWKV 5 world 1B5
  • Stable LM 2 1.6B
  • Stable LM 3B
  • Starcoder
  • Starcoder 2
  • Vigogne (French)
  • Yi 6B

v0.1.0

17 Oct 03:39
Pre-release

Initial public release of ExecuTorch. See https://pytorch.org/executorch for documentation.

Important: This is a preview release

This is a preview version of ExecuTorch and should be used for testing and evaluation purposes only. It is not yet recommended for use in production settings. We welcome any feedback, suggestions, and bug reports from the community to help us improve the technology. Please use the PyTorch Forums for discussion and feedback about ExecuTorch using the tag #executorch, and our GitHub repository for bug reporting.

stable-2023-09-19

20 Sep 22:20
Pre-release

New models enabled (e2e tested via portable lib):

  • Emformer RNN-T Transcriber, Predictor, Joiner (as three modules)

Quantization:

  • Enabled quantization for inception_v4 and deeplab_v3 in examples with XNNPACKQuantizer (see the sketch below)
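
A sketch of that XNNPACKQuantizer flow under this era's PT2E quantization APIs; the tiny model stands in for inception_v4 / deeplab_v3, and the exact import paths may shift between nightlies:

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

class TinyNet(torch.nn.Module):  # stand-in for inception_v4 / deeplab_v3
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 4, 3)

    def forward(self, x):
        return torch.relu(self.conv(x))

example_inputs = (torch.randn(1, 3, 32, 32),)
m = capture_pre_autograd_graph(TinyNet().eval(), example_inputs)

quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())
m = prepare_pt2e(m, quantizer)
m(*example_inputs)   # calibration pass
m = convert_pt2e(m)  # quantized model, ready for lowering
```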

API changes:

  • Runtime API
    • Many runtime APIs changed to improve ergonomics and to better match the style guide. Most of these changes are non-breaking (unless indicated as breaking), since the old APIs are available but marked as deprecated. We recommend that users migrate off of the deprecated APIs before the next release.
      • For an example of how these API changes affected common use cases, see the edits made to examples/executor_runner/executor_runner.cpp under the "Files changed" tab of stable-2023-09-12...78f884f
    • Breaking behavioral change: MethodMeta
      • MethodMeta::num_non_const_buffers and MethodMeta::non_const_buffer_size no longer require adjusting by 1 to skip over the reserved zero index. This will require that users of MethodMeta remove adjustments while counting and iterating over non-const buffers.
      • Details about the change, including migration to adapt to the new behavior: 5762802
      • Also note that these methods have been renamed to num_memory_planned_buffers and memory_planned_buffer_size (see note below)
      • Note that the deprecated Program::num_non_const_buffers and Program::get_non_const_buffer_size methods did not change behavior re: skipping index zero. But they are deprecated, and will be removed in a future release, so we recommend that users migrate to the MethodMeta API and behavior.
    • MethodMeta method names changed from non_const to memory_planned
      • MethodMeta::num_non_const_buffers() is now MethodMeta::num_memory_planned_buffers()
      • MethodMeta::non_const_buffer_size(N) is now MethodMeta::memory_planned_buffer_size(N)
      • Changed in 6944c45
      • The old names are available but deprecated, and will be removed in a future release
    • Breaking code-compatibility change: Method's constructor and init() method are now private
      • Users should not have used these methods; Method instances should only be created by Program::load_method()
      • Changed in 4f3e5e6
    • MemoryManager constructor no longer requires const_allocator or kernel_temporary_allocator
      • A new constructor lets users avoid creating zero-sized allocators that they don't use
      • It also renames the parameters for the remaining allocators to make their uses more clear
      • Changed in 6944c45
      • Example migration to the new constructor: fedc04c
      • The old constructor is available but deprecated, and will be removed in a future release
    • Breaking code-compatibility change: MemoryManager is now final and cannot be subclassed
    • HierarchicalAllocator's constructor now takes an array of Span<uint8_t> instead of an array of MemoryAllocator
      • Changed in 58c8c92
      • Example migration to the new API: 0bce2cb
      • The old constructor is still available but deprecated, and will be removed in a future release
    • Breaking code-compatibility change: HierarchicalAllocator is now final and cannot be subclassed
    • Program::Load() renamed to Program::load()
      • Changed in 8a5f3e8
      • The old name is still available but deprecated, and will be removed in a future release
    • FileDataLoader::From() renamed to FileDataLoader::from()
      • Changed in e2dd0be
      • The old name is still available but deprecated, and will be removed in a future release
    • MmapDataLoader::From() renamed to MmapDataLoader::from()
      • Changed in 395e51a
      • The old name is still available but deprecated, and will be removed in a future release
  • Delegate API
    • File rename: runtime/backend/backend_registry.cpp -> runtime/backend/interface.cpp
    • Partition API update: Partitioner.partition now takes an ExportedProgram instead of a torch.fx.GraphModule, so parameters and buffers can be accessed within the partition function.
      • How to rebase: access the graph module via exported_program.graph_module (see the sketch after this list)
  • SDK
    • BundledProgram APIs updated so that users bundle test cases for a specific method by method name, rather than by method id as before
      • AOT: class BundledConfig(method_names: List[str], inputs: List[List[Any]], expected_outputs: List[List[Any]]). method_names is the newly added attribute.

      • Runtime: Replace the original method_idx with method_name

        • API for loading a bundled test input into an ExecuTorch program:

          ```cpp
          __ET_NODISCARD Error LoadBundledInput(
              Method& method,
              serialized_bundled_program* bundled_program_ptr,
              MemoryAllocator* memory_allocator,
              const char* method_name,
              size_t testset_idx);
          ```

        • API for verifying results against the bundled expected output:

          ```cpp
          __ET_NODISCARD Error VerifyResultWithBundledExpectedOutput(
              Method& method,
              serialized_bundled_program* bundled_program_ptr,
              MemoryAllocator* memory_allocator,
              const char* method_name,
              size_t testset_idx,
              double rtol = 1e-5,
              double atol = 1e-8);
          ```

      • Details and examples can be found at https://github.com/pytorch/executorch/blob/stable/docs/website/docs/tutorials/bundled_program.md
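
To make the Partitioner.partition change above concrete, a hedged sketch of a partitioner receiving an ExportedProgram; the tagging logic is a placeholder, and the PartitionResult field names should be checked against this tag:

```python
from executorch.exir.backend.partitioner import Partitioner, PartitionResult
from torch.export import ExportedProgram


class MyPartitioner(Partitioner):  # illustrative partitioner
    def partition(self, exported_program: ExportedProgram) -> PartitionResult:
        partition_tags = {}
        # New in this release: the graph module (with access to parameters and
        # buffers) hangs off the ExportedProgram instead of being passed directly.
        graph_module = exported_program.graph_module
        for node in graph_module.graph.nodes:
            ...  # tag delegable nodes and record them in partition_tags
        return PartitionResult(
            tagged_exported_program=exported_program,
            partition_tags=partition_tags,
        )
```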

Bug Fixes:

  • When exporting with enable_aot=True, all constant tensors will be lifted as inputs to the graph (in addition to the parameters and buffers).
  • Kwargs are now consistently placed in the call_spec of the exported program.

stable-2023-09-12

13 Sep 18:59
Pre-release

New models enabled (e2e tested via portable lib):

  • MobileBert

Export API

Runtime API

  • Method
    • Added set_output_data_ptr(), which is a simpler and safer way to set the output buffers if they were not memory-planned
    • Program::load_method() now accepts an optional EventTracer parameter for non-global profiling and event data collection

Delegation API

  • backend.init() and backend.execute() API changes
    • BackendInitContext is a newly added argument for backend.init(), and BackendExecutionContext is a newly added argument for backend.execute().
    • How to rebase on these API changes?
      • For backend.init, if the runtime allocator is not used, mark the context as unused with __ET_UNUSED. Otherwise, runtime_allocator can be accessed from the context.
      • For backend.execute, nothing has been added to the context yet, so mark it with __ET_UNUSED directly. We’ll add an event tracer for profiling via the context soon.
  • backend.preprocess() API changes
    • Updated backend.preprocess signature:
      • def preprocess(edge_program: ExportedProgram, compile_specs: List[CompileSpec]) -> PreprocessResult
    • How to rebase on this API change?
      • Wrap the result like PreprocessResult(processed_bytes=bytes); see the sketch at the end of this list
  • Partitioner.partition API changes
    • Updated Partition class definition: partition_tags moved from a class attribute into the PartitionResult
      • def partition(self, graph_module: GraphModule) -> PartitionResult
    • How to rebase on this API change?
      • Wrap both partition_tags and the tagged graph together in a PartitionResult; see the sketch at the end of this list
  • Example Quantizer and Delegate e2e demo
    • Added an example showing how to add a quantizer and use it with a delegate to fully delegate a quantized MobileNetV2 model to the example backend.
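
A combined sketch of the two "how to rebase" tips above, under this snapshot's API (partition still takes a GraphModule here); import paths approximate this era's layout, and the compile/tagging logic is a placeholder:

```python
from typing import List

from executorch.exir.backend.backend_details import BackendDetails, PreprocessResult
from executorch.exir.backend.compile_spec_schema import CompileSpec
from executorch.exir.backend.partitioner import Partitioner, PartitionResult
from torch.export import ExportedProgram
from torch.fx import GraphModule


class MyBackend(BackendDetails):  # illustrative backend
    @staticmethod
    def preprocess(
        edge_program: ExportedProgram, compile_specs: List[CompileSpec]
    ) -> PreprocessResult:
        processed_bytes = b""  # a real backend serializes the program here
        # Rebase tip: wrap the raw bytes in a PreprocessResult.
        return PreprocessResult(processed_bytes=processed_bytes)


class MyPartitioner(Partitioner):  # illustrative partitioner
    def partition(self, graph_module: GraphModule) -> PartitionResult:
        partition_tags = {}
        ...  # tag delegable nodes
        # Rebase tip: partition_tags moved off the class and into the result.
        return PartitionResult(
            tagged_graph=graph_module, partition_tags=partition_tags
        )
```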

XnnpackDelegate

  • In an effort to align better with the rest of the ExecuTorch AOT stack, XnnpackDelegate added preliminary support for handling graphs exported with the canonical capture config (i.e., CaptureConfig.enable_aot=True and CaptureConfig._unlift=False)

SDK

Misc

  • Linter enabled
  • pytest enabled. Rerun pip install . to install pytest and other deps
  • gtest enabled via buck, for example, run gtest for runtime/core
    • /tmp/buck2 test runtime/core/test/…
  • Index operator rewrite:
    • Fixed a bug related to null indices.
    • Implemented NumPy’s full advanced indexing functionality (it is now possible to use multidimensional indices, and masks that index only a subspace); see the sketch after this list.
  • Build/CMake
    • CMake release build mode with size optimization flags. We have an example in examples/selective_build/test_selective_build.sh
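
A quick illustration of what the index-operator rewrite enables, mirroring NumPy advanced indexing semantics (resulting shapes shown in comments):

```python
import torch

x = torch.arange(24).reshape(2, 3, 4)

# Multidimensional integer indices: the index shape folds into the result.
idx = torch.tensor([[0, 1], [1, 0]])
print(x[idx].shape)      # torch.Size([2, 2, 3, 4])

# Boolean mask indexing only a subspace: the mask selects along dim 1.
mask = torch.tensor([True, False, True])
print(x[:, mask].shape)  # torch.Size([2, 2, 4])

# Mixed integer index plus subspace mask.
print(x[1, mask].shape)  # torch.Size([2, 4])
```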

stable-2023-08-29

29 Aug 16:58
Pre-release

New models enabled (e2e tested via portable lib):

  • Wav2Letter
  • Inception V3 and Inception V4
  • Resnet18 and Resnet50

Quantization:

  • Enabled E2E quantization for MobileNet V2
  • MobileNet V3:
    • Requires bumping the PyTorch nightly version (dev20230828) to enable MobileNet V3 quantization. However, that nightly breaks ViT export, so this cut skips MobileNet V3 quantization until the ViT export breakage is resolved.

Delegation:

  • API update:
    • [breaking change] Delegate AOT APIs have moved from executorch/backends/ to executorch/exir/backend/. To address the breakage, update imports of executorch.backends.backend_details to executorch.exir.backend.backend_details, and executorch.backends.backend_api to executorch.exir.backend.backend_api (see the snippet after this list)
  • XNNPACK:
    • XNNPACK delegated models can run on Mac/Linux in OSS
    • XNNPACK lowering workflow examples have been added for MobileNet V2 (with quantization and delegation) and MobileNet V3 (with delegation)
    • Showcased preliminary XNNPACK performance stats on Linux x86 and Mac M1
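
Concretely, the import migration called out in the breaking change above is a path change only:

```python
# Before (pre stable-2023-08-29):
# from executorch.backends.backend_details import BackendDetails
# from executorch.backends.backend_api import to_backend

# After:
from executorch.exir.backend.backend_details import BackendDetails
from executorch.exir.backend.backend_api import to_backend
```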

Selective build:

  • Added buck2 examples demonstrating three APIs for selective build on any ExecuTorch runtime build
  • Run test_selective_build.sh

stable-2023-08-15

15 Aug 19:05
Pre-release
  • New models in example folder:
    • Torchvision ViT. Run the example from executorch dir:
      • python3 -m examples.export.export_example --model_name="vit"
      • buck2 run //examples/executor_runner:executor_runner -- --model_path vit.pte
  • Quantization workflow example added and validated to work with MV2:
    • python3 -m examples.quantization.example --model_name mv2
  • CMake build:
  • Custom ops:
    • Added examples for registering custom ops into EXIR and the ExecuTorch runtime (see the sketch after this list).
    • Note: buck2 in test_custom_ops.sh should point to the installed buck2 if it is not accessible in the system’s PATH
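
A hedged sketch of the Python half of custom-op registration using core torch.library APIs; the namespace, op name, and schema are illustrative, and the matching C++ kernel registration for the ExecuTorch runtime is not shown:

```python
import torch

# Define a custom op namespace and schema, then provide a CPU implementation.
lib = torch.library.Library("myops", "DEF")
lib.define("mul3(Tensor x) -> Tensor")

def mul3_impl(x: torch.Tensor) -> torch.Tensor:
    return x * 3

lib.impl("mul3", mul3_impl, "CPU")

# The op is now visible to tracing/export as torch.ops.myops.mul3.
print(torch.ops.myops.mul3(torch.ones(2)))  # tensor([3., 3.])
```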

stable-2023-08-01

09 Aug 00:42
Pre-release

Initial release to early users.