cookbook

TensorRT Cookbook

General Introduction

This repository is presented for NVIDIA TensorRT beginners and developers, which provides TensorRT-related learning and reference materials, as well as code examples.
This README.md contains catalogue of the cookbook, you can search your interested subtopics and go to the corresponding directory to read. (not finished)
Related materials (slices, datasets, models and PDF files): Baidu Netdisk (Extraction code: gpq2)
Related video tutorial on Bilibili website: TensorRT-8.6.1, TensorRT-8.2.3, Hackathon2022
Chinese contents will be translated into English in the future ([\u4E00-\u9FA5]+)
Recommend order to read (and try) the subtopics if you are freshman to the TensorRT:
- 01-SimpleDemo/TensorRT8.6
- 04-BuildEngineByONNXParser/pyTorch-ONNX-TensorRT
- 07-Tool/Netron
- 07-Tool/trtexec
- 05-Plugin/BasicExample
- 05-Plugin/UseONNXParserAndPlugin-pyTorch
- ...

Steps to setup

We recommend to use NVIDIA-optimized Docker to run the examples. Steps to install
Start the container

docker run -it -e NVIDIA_VISIBLE_DEVICES=0 --gpus "device=0" --name trt-cookbook \
--shm-size 16G --ulimit memlock=-1 --ulimit stack=67108864 \
-v ~:/work \
nvcr.io/nvidia/pytorch:23.04-py3 /bin/bash

Inside thecontainer, go to directory of cookbook firstly

cd <PathToCookBook>
pip3 install -r requirements.txt

prepare the dataset we may need in some examples, they can be found from here or here or the Baidu Netdisk above.

cd 00-MNISTData

# download dataset (4 gz files in total) and put them here as <PathToCookBook>/00-MNISTData/*.gz

python3 extractMnistData.py  # extract some of the dataset into pictures for later use

There is also an 8.png from test set of the dataset in this directory, which is used as the input data of TensorRT
now we can go to other directory to try the examples we are interested and run the examples

Table of tested docker images

Notice that pyTorch and TensorFlow in NVIDIA-optimized Docker is somewhere different from the version installed directly by pip install.
The information of the table can be found from here

Name of Docker Image	python	CUDA	cuBLAS	cuDNN	TensorRT	Nsight-Systems	Lowest Driver	Comment
nvcr.io/nvidia/tensorrt:19.12-py3	3.6	10.2.89	10.2.2.89	7.6.5	6.0.1	2019.6.1	440.33.01	Last version with TensorRT 6
nvcr.io/nvidia/tensorrt:21.06-py3	3.8	11.3.1	11.5.1.109	8.2.1	7.2.3.4	2021.2.1.58	465.19.01	Last version with TensorRT 7
nvcr.io/nvidia/pytorch:23.02-py3	3.8	12.0.1	12.0.2	8.7.0	8.5.3	2022.5.1	525	Last version with pyTorch 1
nvcr.io/nvidia/tensorflow:23.03-tf1-py3	3.8	12.1.0	12.1.0	8.8.1.3	8.5.3	2023.1.1.127	530	Last version with TensorFlow 1
nvcr.io/nvidia/tensorrt:23.03-py3	3.8	12.1.0	12.1.0	8.8.1.3	8.5.3	2023.1.1.127	530	Last version with TensorRT 8.5
nvcr.io/nvidia/pytorch:23.06-py3	3.10	12.1.1	12.1.3.1	8.9.2	8.6.1.6	2023.2.3	530	cookbook prefer version
nvcr.io/nvidia/tensorflow:23.06-tf2-py3	3.10	12.1.1	12.1.3.1	8.9.2	8.6.1.6	2023.2.3	530
nvcr.io/nvidia/paddlepaddle:23.06-py3	3.10	12.1.1	12.1.3.1	8.9.2	8.6.1.6	2023.2.3	530
nvcr.io/nvidia/tensorrt:23.06-py3	3.10	12.1.1	12.1.3.1	8.9.2	8.6.1.6	2023.2.3	530

Important update of the repository

18th June 2023. Updated to TensorRT 8.6 GA. Finish TensorRT tutorial (slice + audio) for Bilibili.
17th March 2023. Freeze code of branch TensorRT-8.5
- Translate almost all contents into English (except 02-API/Layer/*.md)
- Come to development work of TensorRT 8.6 EA
10th October 2022. Updated to TensorRT 8.5 GA. Cookbook with TensorRT 8.4 is remained in branch old/TensorRT-8.4.
15th July 2022. Updated to TensorRT 8.4 GA. Cookbook with older version of TensorRT is remained in branch old/TensorRT-8.2.

Useful Links

Catalogue and introduction of the files

include

Common files used by some examples in the cookobook.

00-MNISTData -- Related dataset

The MNIST dataset used by Cookbook, which needs to be downloaded and preprocessed before running other example code

01-SimpleDemo -- Examples of a complete program using TensorRT

Basic steps to use TensorRT, including network construction, engine construction, serialization/deserialization, inference computation, etc.
Examples cover different versions of TensorRT, all with equivalent implementations in C++ and python

02-API -- Examples of TesnorRT core objects and API

API usage of TensorRT objects, mainly to show the functions and the effect of their parameters.

03-BuildEngineByTensorRTAPI -- Examples of rebuilding a model using layer API

Taking a complete handwritten digit recognition model based on MNIST dataset as an example, the process of rebuilding the trained models from various machine learning frameworks in TensorRT with layer APIs is demonstrated, including the weights extraction from the original model, the model rebuilding and weights loading etc.
All examples based on MNIST dataset can run in FP16 mode or INT8-PTQ mode.
Some examples of importing typical layers from various machine learning frameworks to TensorRT.

04-BuildEngineByONNXParser -- Examples of rebuilding a model using ONNX Parser

Taking a complete handwritten digit recognition model based on MNIST dataset as an example, the process of rebuilding the trained models from various machine learning frameworks in TensorRT with build-in ONNX parser is demonstrated, including the process of exporting the model into ONNX format and then into TensorRT.
All examples based on MNIST dataset can run in FP16 mode or INT8-PTQ/INT8-QAT mode.

05-Plugin -- Examples of writing customized plugins

Example of implementing customized plugins and applying them to TensorRT network.
The core example is usePluginV2DynamicExt， which represents the most basic usage of the mainstream plugin in TensorRT 8. All other examples are extended from this one. Please read this example first if you are fresh to write Plugin.
There attached a file test*Plugin.py in each example for unit test. It is stongly recommended to conduct unit test after writing plugin but before integrating it into the model.
Here are examples of combining ONNX parser and plugin to build the model in TensorRT.

06-FameworkTRT -- Examples of using build-in TensorRT API in Machine Learning Frameworks

Examples of using build-in TensorRT API in various machine learning frameworks.

07-Tool -- Assistant tools for development

08-Advance -- Examples of advanced features in TensorRT

09-BestPractice -- Examples of interesting TensorRT optimization

10-ProblemSolving[TODO]

Some error messages and corresponding solutions when deploying models to TensorRT.

50-Resource -- Resource of document

PDF version of the slides of TensorRT tutorial (in Chinese), as well as some other useful reference materials.
Latex is not supported by the the Markdown of GitLab/Github, so the formula in the *.md can not be rendered during online preview. All the *.md with Latex are exported as PDF and saved one copy in this directory.

51-Uncategorized

Unclassified things.

52-Deprecated

Some APIs and usages in TensorRT which have been deprecated or removed in the newer version of TensorRT. If you run them directly, you will get error messages.

99-NotFinish

Incomplete example code and plans of new example codes proposed by our readers.

Tree diagram of the repo

├── 00-MNISTData
│   ├── test
│   └── train
├── 01-SimpleDemo
│   ├── TensorRT6
│   ├── TensorRT7
│   ├── TensorRT8.0
│   └── TensorRT8.5
├── 02-API
│   ├── AlgorithmSelector
│   ├── AuxStream
│   ├── Builder
│   ├── BuilderConfig
│   ├── CudaEngine
│   ├── EngineInspector
│   ├── ErrorRecoder
│   ├── ExecutionContext
│   ├── GPUAllocator
│   ├── HostMemory
│   ├── INT8-PTQ
│   │   └── C++
│   ├── Layer
│   │   ├── ActivationLayer
│   │   ├── AssertionLayer
│   │   ├── CastLayer
│   │   ├── ConcatenationLayer
│   │   ├── ConstantLayer
│   │   ├── ConvolutionNdLayer
│   │   ├── DeconvolutionNdLayer
│   │   ├── EinsumLayer
│   │   ├── ElementwiseLayer
│   │   ├── FillLayer
│   │   ├── FullyConnectedLayer
│   │   ├── GatherLayer
│   │   ├── GridSampleLayer
│   │   ├── IdentityLayer
│   │   ├── IfConditionStructure
│   │   ├── LoopStructure
│   │   ├── LRNLayer
│   │   ├── MatrixMultiplyLayer
│   │   ├── NMSLayer
│   │   ├── NonZeroLayer
│   │   ├── NormalizationLayer
│   │   ├── OneHotLayer
│   │   ├── PaddingNdLayer
│   │   ├── ParametricReLULayer
│   │   ├── PluginV2Layer
│   │   ├── PoolingNdLayer
│   │   ├── QuantizeDequantizeLayer
│   │   ├── RaggedSoftMaxLayer
│   │   ├── ReduceLayer
│   │   ├── ResizeLayer
│   │   ├── ReverseSequenceLayer
│   │   ├── RNNv2Layer
│   │   ├── ScaleLayer
│   │   ├── ScatterLayer
│   │   ├── SelectLayer
│   │   ├── ShapeLayer
│   │   ├── ShuffleLayer
│   │   ├── SliceLayer
│   │   ├── SoftmaxLayer
│   │   ├── TopKLayer
│   │   └── UnaryLayer
│   ├── Logger
│   ├── Network
│   ├── ONNXParser
│   ├── OptimizationProfile
│   ├── OutputAllocator
│   ├── Profiler
│   ├── ProfilingVerbosity
│   ├── Refit
│   ├── Runtime
│   ├── TacticSource
│   ├── Tensor
│   └── TimingCache
├── 03-BuildEngineByTensorRTAPI
│   ├── MNISTExample-Paddlepaddle
│   ├── MNISTExample-pyTorch
│   │   └── C++
│   ├── MNISTExample-TensorFlow1
│   ├── MNISTExample-TensorFlow2
│   ├── TypicalAPI-Paddlepaddle
│   ├── TypicalAPI-pyTorch
│   ├── TypicalAPI-TensorFlow1
│   └── TypicalAPI-TensorFlow2
├── 04-BuildEngineByONNXParser
│   ├── Paddlepaddle-ONNX-TensorRT
│   ├── pyTorch-ONNX-TensorRT
│   │   └── C++
│   ├── pyTorch-ONNX-TensorRT-QAT
│   ├── TensorFlow1-Caffe-TensorRT
│   ├── TensorFlow1-ONNX-TensorRT
│   ├── TensorFlow1-ONNX-TensorRT-QAT
│   ├── TensorFlow1-UFF-TensorRT
│   ├── TensorFlow2-ONNX-TensorRT
│   └── TensorFlow2-ONNX-TensorRT-QAT-TODO
├── 05-Plugin
│   ├── API
│   ├── C++-UsePluginInside
│   ├── C++-UsePluginOutside
│   ├── LoadDataFromNpz
│   ├── MultipleVersion
│   ├── PluginProcess
│   ├── PluginReposity
│   │   ├── AddScalarPlugin-TRT8
│   │   ├── BatchedNMS_TRTPlugin-TRT8
│   │   ├── CCLPlugin-TRT6-StaticShape
│   │   ├── CCLPlugin-TRT7-DynamicShape
│   │   ├── CumSumPlugin-V2.1-TRT8
│   │   ├── GruPlugin
│   │   ├── LayerNormPlugin-TRT8
│   │   ├── Mask2DPlugin
│   │   ├── MaskPugin
│   │   ├── MaxPlugin
│   │   ├── MMTPlugin
│   │   ├── MultinomialDistributionPlugin-cuRAND-TRT8
│   │   ├── MultinomialDistributionPlugin-thrust-TRT8
│   │   ├── OneHotPlugin-TRT8
│   │   ├── ReducePlugin
│   │   ├── Resize2DPlugin-TRT8
│   │   ├── ReversePlugin
│   │   ├── SignPlugin
│   │   ├── SortPlugin-V0.0-useCubAlone
│   │   ├── SortPlugin-V1.0-float
│   │   ├── SortPlugin-V2.0-float4
│   │   ├── TopKAveragePlugin
│   │   └── WherePlugin
│   ├── PluginSerialize-TODO
│   ├── PythonPlugin
│   │   └── circ_plugin_cpp
│   ├── UseCuBLAS
│   ├── UseFP16
│   ├── UseINT8-PTQ
│   ├── UseINT8-QDQ
│   ├── UseONNXParserAndPlugin-pyTorch
│   ├── UsePluginV2DynamicExt
│   ├── UsePluginV2Ext
│   └── UsePluginV2IOExt
├── 06-UseFrameworkTRT
│   ├── TensorFlow1-TFTRT
│   ├── TensorFlow2-TFTRT
│   └── Torch-TensorRT
├── 07-Tool
│   ├── FP16FineTuning
│   ├── Netron
│   ├── NetworkInspector
│   │   └── C++
│   ├── NetworkPrinter
│   ├── NsightSystems
│   ├── nvtx
│   ├── OnnxGraphSurgeon
│   │   └── API
│   ├── Onnxruntime
│   ├── Polygraphy-API
│   ├── Polygraphy-CLI
│   │   ├── convertExample
│   │   ├── dataExample
│   │   ├── debugExample
│   │   ├── HelpInformation
│   │   ├── inspectExample
│   │   ├── runExample
│   │   ├── surgeonExample
│   │   └── templateExample
│   ├── trex
│   │   ├── model
│   │   ├── trex
│   │   └── trex.egg-info
│   └── trtexec
├── 08-Advance
│   ├── BuilderOptimizationLevel
│   ├── CreateExecutionContextWithoutDeviceMemory
│   ├── C++StaticCompilation
│   ├── CudaGraph
│   ├── DataFormat
│   ├── DynamicShapeOutput
│   ├── EmptyTensor
│   ├── Event
│   ├── ExternalSource
│   ├── HardwareCompatibility
│   ├── LabeledDimension
│   ├── MultiContext
│   ├── MultiOptimizationProfile
│   ├── MultiStream
│   ├── Safety-TODO
│   ├── Sparsity
│   │   └── pyTorch-ONNX-TensorRT-ASP
│   ├── StreamAndAsync
│   ├── StrictType
│   ├── TensorRTGraphSurgeon
│   ├── TorchOperation
│   └── VersionCompatibility
├── 09-BestPractice
│   ├── AdjustReduceLayer
│   ├── AlignSize
│   ├── ComputationInAdvance
│   │   └── Convert3DMMTo2DMM
│   ├── ConvertTranposeMultiplicationToConvolution
│   ├── EliminateSqueezeUnsqueezeTranspose
│   ├── IncreaseBatchSize
│   ├── UsingMultiOptimizationProfile
│   ├── UsingMultiStream
│   └── WorkFlowOnONNXModel
├── 10-ProblemSolving
│   ├── ParameterCheckFailed
│   ├── SliceNodeWithBoolIO
│   ├── WeightsAreNotPermittedSinceTheyAreOfTypeInt32
│   └── WeightsHasCountXButYWasExpected
├── 51-Uncategorized
├── 52-Deprecated
│   ├── BindingEliminate-TRT8
│   ├── ConcatenationLayerBUG-TRT8.4
│   ├── ErrorWhenParsePadNode-TRT-8.4
│   ├── FullyConnectedLayer-TRT8.4
│   ├── FullyConnectedLayerWhenUsingParserTRT-8.4
│   ├── MatrixMultiplyDeprecatedLayer-TRT8
│   ├── max_workspace_size-TRT8.4
│   ├── MultiContext-TRT8
│   ├── ResizeLayer-TRT8
│   ├── RNNLayer-TRT8
│   └── ShapeLayer-TRT8
├── 99-NotFinish
│   └── TensorRTElementwiseBug
└── include

Name		Name	Last commit message	Last commit date
parent directory ..
00-MNISTData		00-MNISTData
01-SimpleDemo		01-SimpleDemo
02-API		02-API
03-BuildEngineByTensorRTAPI		03-BuildEngineByTensorRTAPI
04-BuildEngineByONNXParser		04-BuildEngineByONNXParser
05-Plugin		05-Plugin
06-UseFrameworkTRT		06-UseFrameworkTRT
07-Tool		07-Tool
08-Advance		08-Advance
09-BestPractice		09-BestPractice
10-ProblemSolving		10-ProblemSolving
51-Uncategorized		51-Uncategorized
52-Deprecated		52-Deprecated
include		include
.clang-format		.clang-format
.gitattributes		.gitattributes
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.style.yapf		.style.yapf
README.md		README.md
clean.sh		clean.sh
requirements.txt		requirements.txt

Files

cookbook

Directory actions

More options

Directory actions

More options

Latest commit

History

cookbook

Folders and files

parent directory

TensorRT Cookbook

General Introduction

Steps to setup

Table of tested docker images

Important update of the repository

Useful Links

Catalogue and introduction of the files

include

00-MNISTData -- Related dataset

01-SimpleDemo -- Examples of a complete program using TensorRT

02-API -- Examples of TesnorRT core objects and API

03-BuildEngineByTensorRTAPI -- Examples of rebuilding a model using layer API

04-BuildEngineByONNXParser -- Examples of rebuilding a model using ONNX Parser

05-Plugin -- Examples of writing customized plugins

06-FameworkTRT -- Examples of using build-in TensorRT API in Machine Learning Frameworks

07-Tool -- Assistant tools for development

08-Advance -- Examples of advanced features in TensorRT

09-BestPractice -- Examples of interesting TensorRT optimization

10-ProblemSolving[TODO]

50-Resource -- Resource of document

51-Uncategorized

52-Deprecated

99-NotFinish

Tree diagram of the repo