TorchServe

TorchServe is a performant, flexible, and easy-to-use tool for serving PyTorch eager-mode and TorchScripted models. It offers multi-model management with optimized worker-to-model allocation, and REST and gRPC support for batched inference. Models can be exported for optimized inference: TorchScript works out of the box, alongside ORT, IPEX, TensorRT, and FasterTransformer. A Performance Guide provides built-in support to optimize, benchmark, and profile PyTorch and TorchServe performance. An expressive handler architecture makes it trivial to support inference for your use case, with many handlers supported out of the box. System-level metrics are also supported out of the box, with Prometheus exports, custom metrics, and PyTorch profiler support.
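
To make the handler idea in the description more concrete, here is a minimal sketch of the preprocess, inference, postprocess flow that a TorchServe-style handler follows. It deliberately does not import TorchServe or PyTorch: the class name `SketchHandler` and the injected `model` callable are hypothetical stand-ins for illustration, and the assumption that each request carries its payload under a "body" or "data" key should be checked against the handler documentation for your version.

```python
from typing import Any, Callable, Dict, List, Optional


class SketchHandler:
    """Hypothetical stand-in that mimics the shape of a serving handler."""

    def __init__(self, model: Callable[[List[float]], List[float]]) -> None:
        # Assumption: a real handler loads its model during initialization;
        # here the model is simply injected as a plain callable.
        self.model = model

    def preprocess(self, requests: List[Dict[str, Any]]) -> List[float]:
        # Assumption: each request in the batch holds its payload
        # under "body" or, failing that, under "data".
        return [float(r.get("body", r.get("data"))) for r in requests]

    def inference(self, batch: List[float]) -> List[float]:
        # Run the whole batch through the model in one call.
        return self.model(batch)

    def postprocess(self, outputs: List[float]) -> List[Dict[str, float]]:
        # Return one response per request, in the same order.
        return [{"prediction": o} for o in outputs]

    def handle(self, requests: List[Dict[str, Any]],
               context: Optional[Any] = None) -> List[Dict[str, float]]:
        # Chain the three stages; `context` is unused in this sketch.
        return self.postprocess(self.inference(self.preprocess(requests)))


def double_all(values: List[float]) -> List[float]:
    # Toy "model" used only for the usage example.
    return [2 * v for v in values]


handler = SketchHandler(double_all)
responses = handler.handle([{"body": "1.5"}, {"data": 2}])
```

The point of the split is that a batch of requests enters as a list and leaves as a list of the same length, so the server can pair each response with its request.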

Features

REST and gRPC support for batched inference
Deploy complex DAGs with multiple interdependent models
Default way to serve PyTorch models
Export your model for optimized inference
Performance Guide
Metrics API
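
As a rough illustration of the REST and batched-inference features listed above, the sketch below only builds the URLs a client might call; it sends nothing over the network. The host and ports (8080 for inference, 8081 for management) are assumed defaults, the helper names `register_url` and `prediction_url` are hypothetical, and the archive name `my_model.mar` is a placeholder. The query parameters `batch_size`, `max_batch_delay`, and `initial_workers` are assumed to match the management API of your TorchServe version.

```python
from urllib.parse import quote, urlencode

INFERENCE_URL = "http://localhost:8080"   # assumed default inference address
MANAGEMENT_URL = "http://localhost:8081"  # assumed default management address


def register_url(mar_url: str, batch_size: int = 8,
                 max_batch_delay_ms: int = 50, initial_workers: int = 1) -> str:
    """Build the URL for a POST that registers a model with batching enabled."""
    params = {
        "url": mar_url,                         # location of the model archive
        "batch_size": batch_size,               # max requests grouped per batch
        "max_batch_delay": max_batch_delay_ms,  # ms to wait while filling a batch
        "initial_workers": initial_workers,     # workers started on registration
    }
    return f"{MANAGEMENT_URL}/models?{urlencode(params)}"


def prediction_url(model_name: str) -> str:
    """Build the URL for a POST that requests a prediction from a model."""
    return f"{INFERENCE_URL}/predictions/{quote(model_name, safe='')}"


reg = register_url("my_model.mar")
pred = prediction_url("my_model")
```

Both URLs would be used with HTTP POST: the first once, to register the model, and the second for each inference request, with the input sent as the request body.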

License

Apache License V2.0

Additional Project Details

Operating Systems

Windows

Programming Language

Java

Related Categories

Java Machine Learning Software, Java LLM Inference Tool

Registered

2022-08-05

Similar Business Software

PyTorch

Transition seamlessly between eager and graph modes with TorchScript, and accelerate the path to production with TorchServe. Scalable distributed training and performance optimization in research and production are enabled by the torch.distributed backend. A rich ecosystem of tools and libraries...

NVIDIA Triton Inference Server

NVIDIA Triton™ Inference Server delivers fast and scalable AI in production. As open-source inference serving software, Triton Inference Server streamlines AI inference by enabling teams to deploy trained AI models from any framework (TensorFlow, NVIDIA TensorRT®, PyTorch, ONNX, XGBoost, Python,...

AWS Neuron

It supports high-performance training on AWS Trainium-based Amazon Elastic Compute Cloud (Amazon EC2) Trn1 instances. For model deployment, it supports high-performance and low-latency inference on AWS Inferentia-based Amazon EC2 Inf1 instances and AWS Inferentia2-based Amazon EC2 Inf2...

NVIDIA TensorRT

NVIDIA TensorRT is an ecosystem of APIs for high-performance deep learning inference, encompassing an inference runtime and model optimizations that deliver low latency and high throughput for production applications. Built on the CUDA parallel programming model, TensorRT optimizes neural...

Intel Tiber AI Cloud

Intel® Tiber™ AI Cloud is a powerful platform designed to scale AI workloads with advanced computing resources. It offers specialized AI processors, such as the Intel Gaudi AI Processor and Max Series GPUs, to accelerate model training, inference, and deployment. Optimized for enterprise-level...

Qualcomm Cloud AI SDK

The Qualcomm Cloud AI SDK is a comprehensive software suite designed to optimize trained deep learning models for high-performance inference on Qualcomm Cloud AI 100 accelerators. It supports a wide range of AI frameworks, including TensorFlow, PyTorch, and ONNX, enabling developers to compile,...

Serve, optimize and scale PyTorch models in production
