- This repository provides TensorRT-related learning and reference materials, as well as code examples, for NVIDIA TensorRT beginners and developers.
- This README.md contains the catalogue of the cookbook; search for the subtopics you are interested in and go to the corresponding directory to read more. (not finished)
- Related materials (slides, datasets, models and PDF files): Baidu Netdisk (extraction code: gpq2)
- Related video tutorials on Bilibili: TensorRT-8.6.1, TensorRT-8.2.3, Hackathon2022
- Chinese content will be translated into English in the future ([\u4E00-\u9FA5]+)
- Recommended order to read (and try) the subtopics if you are new to TensorRT:
- 01-SimpleDemo/TensorRT8.6
- 04-BuildEngineByONNXParser/pyTorch-ONNX-TensorRT
- 07-Tool/Netron
- 07-Tool/trtexec
- 05-Plugin/BasicExample
- 05-Plugin/UseONNXParserAndPlugin-pyTorch
- ...
- We recommend using NVIDIA-optimized Docker images to run the examples. Steps to install:
- Start the container

```shell
docker run -it -e NVIDIA_VISIBLE_DEVICES=0 --gpus "device=0" --name trt-cookbook \
    --shm-size 16G --ulimit memlock=-1 --ulimit stack=67108864 \
    -v ~:/work \
    nvcr.io/nvidia/pytorch:23.04-py3 /bin/bash
```
- Inside the container, go to the cookbook directory first

```shell
cd <PathToCookBook>
pip3 install -r requirements.txt
```
- Prepare the dataset needed by some of the examples; it can be downloaded from here or here, or from the Baidu Netdisk above.

```shell
cd 00-MNISTData
# download the dataset (4 .gz files in total) and put them here as <PathToCookBook>/00-MNISTData/*.gz
python3 extractMnistData.py  # extract part of the dataset into pictures for later use
```
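The extraction step above turns the raw idx-format `.gz` files into image files. As a rough illustration of what such a script has to do (this is not the cookbook's `extractMnistData.py`, just a stdlib-only sketch of parsing the MNIST idx3-ubyte image format):

```python
import gzip
import struct

def parse_idx_images(data: bytes):
    """Parse an MNIST idx3-ubyte blob into a list of images.

    Header: big-endian magic (2051), image count, rows, cols,
    followed by raw uint8 pixels in row-major order.
    """
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    assert magic == 2051, "not an idx3-ubyte image file"
    images, offset = [], 16
    for _ in range(count):
        pixels = data[offset:offset + rows * cols]
        images.append([list(pixels[r * cols:(r + 1) * cols]) for r in range(rows)])
        offset += rows * cols
    return images

def load_idx_gz(path: str):
    """Load and parse a gzipped idx image file such as train-images-idx3-ubyte.gz."""
    with gzip.open(path, "rb") as f:
        return parse_idx_images(f.read())
```

The real script likely uses numpy and an image library to write `.jpg`/`.png` files; the sketch only shows the file layout the script must decode.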
- There is also an 8.png from the test set of the dataset in this directory, which is used as input data for the TensorRT examples.
- Now we can go to any other directory to try and run the examples we are interested in.
- Notice that pyTorch and TensorFlow in the NVIDIA-optimized Docker images differ somewhat from the versions installed directly via pip install.
- The information in the table below can be found here

| Name of Docker Image | python | CUDA | cuBLAS | cuDNN | TensorRT | Nsight-Systems | Lowest Driver | Comment |
|---|---|---|---|---|---|---|---|---|
| nvcr.io/nvidia/tensorrt:19.12-py3 | 3.6 | 10.2.89 | 10.2.2.89 | 7.6.5 | 6.0.1 | 2019.6.1 | 440.33.01 | Last version with TensorRT 6 |
| nvcr.io/nvidia/tensorrt:21.06-py3 | 3.8 | 11.3.1 | 11.5.1.109 | 8.2.1 | 7.2.3.4 | 2021.2.1.58 | 465.19.01 | Last version with TensorRT 7 |
| nvcr.io/nvidia/pytorch:23.02-py3 | 3.8 | 12.0.1 | 12.0.2 | 8.7.0 | 8.5.3 | 2022.5.1 | 525 | Last version with pyTorch 1 |
| nvcr.io/nvidia/tensorflow:23.03-tf1-py3 | 3.8 | 12.1.0 | 12.1.0 | 8.8.1.3 | 8.5.3 | 2023.1.1.127 | 530 | Last version with TensorFlow 1 |
| nvcr.io/nvidia/tensorrt:23.03-py3 | 3.8 | 12.1.0 | 12.1.0 | 8.8.1.3 | 8.5.3 | 2023.1.1.127 | 530 | Last version with TensorRT 8.5 |
| nvcr.io/nvidia/pytorch:23.06-py3 | 3.10 | 12.1.1 | 12.1.3.1 | 8.9.2 | 8.6.1.6 | 2023.2.3 | 530 | Version preferred by the cookbook |
| nvcr.io/nvidia/tensorflow:23.06-tf2-py3 | 3.10 | 12.1.1 | 12.1.3.1 | 8.9.2 | 8.6.1.6 | 2023.2.3 | 530 | |
| nvcr.io/nvidia/paddlepaddle:23.06-py3 | 3.10 | 12.1.1 | 12.1.3.1 | 8.9.2 | 8.6.1.6 | 2023.2.3 | 530 | |
| nvcr.io/nvidia/tensorrt:23.06-py3 | 3.10 | 12.1.1 | 12.1.3.1 | 8.9.2 | 8.6.1.6 | 2023.2.3 | 530 | |
- 18th June 2023. Updated to TensorRT 8.6 GA. Finished the TensorRT tutorial (slides + audio) for Bilibili.
- 17th March 2023. Froze code in branch TensorRT-8.5.
  - Translated almost all content into English (except 02-API/Layer/*.md)
  - Moved on to development work for TensorRT 8.6 EA
- 10th October 2022. Updated to TensorRT 8.5 GA. The cookbook for TensorRT 8.4 is kept in branch old/TensorRT-8.4.
- 15th July 2022. Updated to TensorRT 8.4 GA. The cookbook for older versions of TensorRT is kept in branch old/TensorRT-8.2.
- Others:
  - Common files used by some examples in the cookbook.
  - The MNIST dataset used by the cookbook, which needs to be downloaded and preprocessed before running other example code.
- Basic steps to use TensorRT, including network construction, engine building, serialization/deserialization, inference computation, etc.
- Examples cover different versions of TensorRT, all with equivalent implementations in C++ and Python.
- API usage of TensorRT objects, mainly showing the functions and the effects of their parameters.
- Taking a complete handwritten digit recognition model trained on the MNIST dataset as an example, these examples demonstrate how to rebuild trained models from various machine learning frameworks in TensorRT using the layer APIs, including extracting the weights from the original model, rebuilding the network, and loading the weights.
- All examples based on the MNIST dataset can run in FP16 mode or INT8-PTQ mode.
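The weight-extraction step described above boils down to walking the framework's parameter tree and saving each tensor's flat data under its full dotted name (the cookbook examples typically do this with numpy and `.npz` files). A framework-agnostic sketch using plain Python lists as tensors; all function names here are illustrative, not the cookbook's API:

```python
import json

def shape_of(x):
    """Shape of a nested list, e.g. [[1, 2, 3], [4, 5, 6]] -> [2, 3]."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0]
    return shape

def flatten(x):
    """Row-major flattening of a nested list into a flat list of scalars."""
    if not isinstance(x, list):
        return [x]
    out = []
    for item in x:
        out.extend(flatten(item))
    return out

def extract_weights(params, prefix=""):
    """Walk a nested parameter dict and map 'conv1.weight'-style full names
    to {"shape": ..., "data": ...} records ready to serialize."""
    flat = {}
    for name, value in params.items():
        full = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            flat.update(extract_weights(value, full))
        else:
            flat[full] = {"shape": shape_of(value), "data": flatten(value)}
    return flat

def save_weights(params, path):
    """JSON here stands in for numpy.savez in the real examples."""
    with open(path, "w") as f:
        json.dump(extract_weights(params), f)
```

On the TensorRT side, each saved record can then be reshaped and bound to the matching layer built with the layer APIs.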
- Some examples of importing typical layers from various machine learning frameworks into TensorRT.
- Taking a complete handwritten digit recognition model trained on the MNIST dataset as an example, these examples demonstrate how to rebuild trained models from various machine learning frameworks in TensorRT using the built-in ONNX parser, including exporting the model to ONNX format and then importing it into TensorRT.
- All examples based on the MNIST dataset can run in FP16 mode or INT8-PTQ/INT8-QAT mode.
- Examples of implementing custom plugins and applying them to a TensorRT network.
- The core example is UsePluginV2DynamicExt, which represents the most basic usage of the mainstream plugin interface in TensorRT 8. All other examples are extensions of this one. Please read this example first if you are new to writing plugins.
- Each example includes a file test*Plugin.py for unit testing. It is strongly recommended to run the unit test after writing a plugin and before integrating it into a model.
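The core of such a unit test is running the same input through the plugin (inside a small TensorRT engine) and through a CPU reference implementation, then comparing the two element-wise; the cookbook's tests typically do this with numpy. The comparison idea, sketched stdlib-only with an AddScalar-style reference (all names hypothetical):

```python
def outputs_match(plugin_out, reference_out, rtol=1e-3, atol=1e-5):
    """Element-wise closeness check between plugin output and CPU reference,
    mirroring numpy.allclose semantics for flat lists of floats."""
    if len(plugin_out) != len(reference_out):
        return False
    return all(abs(a - b) <= atol + rtol * abs(b)
               for a, b in zip(plugin_out, reference_out))

def add_scalar_reference(x, scalar):
    """CPU reference for a hypothetical AddScalar plugin: y[i] = x[i] + s."""
    return [v + scalar for v in x]
```

Testing against an independent CPU reference (rather than against the plugin itself) is what catches indexing, layout, and precision bugs before the plugin is buried inside a larger network.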
- Examples of combining the ONNX parser and plugins to build models in TensorRT.
- Examples of using the TensorRT interfaces built into various machine learning frameworks.
- Some error messages encountered when deploying models to TensorRT, and their corresponding solutions.
- PDF versions of the slides of the TensorRT tutorial (in Chinese), as well as some other useful reference materials.
- LaTeX is not supported by the Markdown renderer of GitLab/GitHub, so formulas in the *.md files cannot be rendered in the online preview. All *.md files containing LaTeX are also exported as PDF, with a copy saved in this directory.
- Uncategorized items.
- Some APIs and usages of TensorRT which have been deprecated or removed in newer versions of TensorRT. Running them directly produces error messages.
- Incomplete example code and plans for new examples proposed by our readers.
├── 00-MNISTData
│ ├── test
│ └── train
├── 01-SimpleDemo
│ ├── TensorRT6
│ ├── TensorRT7
│ ├── TensorRT8.0
│ └── TensorRT8.5
├── 02-API
│ ├── AlgorithmSelector
│ ├── AuxStream
│ ├── Builder
│ ├── BuilderConfig
│ ├── CudaEngine
│ ├── EngineInspector
│ ├── ErrorRecoder
│ ├── ExecutionContext
│ ├── GPUAllocator
│ ├── HostMemory
│ ├── INT8-PTQ
│ │ └── C++
│ ├── Layer
│ │ ├── ActivationLayer
│ │ ├── AssertionLayer
│ │ ├── CastLayer
│ │ ├── ConcatenationLayer
│ │ ├── ConstantLayer
│ │ ├── ConvolutionNdLayer
│ │ ├── DeconvolutionNdLayer
│ │ ├── EinsumLayer
│ │ ├── ElementwiseLayer
│ │ ├── FillLayer
│ │ ├── FullyConnectedLayer
│ │ ├── GatherLayer
│ │ ├── GridSampleLayer
│ │ ├── IdentityLayer
│ │ ├── IfConditionStructure
│ │ ├── LoopStructure
│ │ ├── LRNLayer
│ │ ├── MatrixMultiplyLayer
│ │ ├── NMSLayer
│ │ ├── NonZeroLayer
│ │ ├── NormalizationLayer
│ │ ├── OneHotLayer
│ │ ├── PaddingNdLayer
│ │ ├── ParametricReLULayer
│ │ ├── PluginV2Layer
│ │ ├── PoolingNdLayer
│ │ ├── QuantizeDequantizeLayer
│ │ ├── RaggedSoftMaxLayer
│ │ ├── ReduceLayer
│ │ ├── ResizeLayer
│ │ ├── ReverseSequenceLayer
│ │ ├── RNNv2Layer
│ │ ├── ScaleLayer
│ │ ├── ScatterLayer
│ │ ├── SelectLayer
│ │ ├── ShapeLayer
│ │ ├── ShuffleLayer
│ │ ├── SliceLayer
│ │ ├── SoftmaxLayer
│ │ ├── TopKLayer
│ │ └── UnaryLayer
│ ├── Logger
│ ├── Network
│ ├── ONNXParser
│ ├── OptimizationProfile
│ ├── OutputAllocator
│ ├── Profiler
│ ├── ProfilingVerbosity
│ ├── Refit
│ ├── Runtime
│ ├── TacticSource
│ ├── Tensor
│ └── TimingCache
├── 03-BuildEngineByTensorRTAPI
│ ├── MNISTExample-Paddlepaddle
│ ├── MNISTExample-pyTorch
│ │ └── C++
│ ├── MNISTExample-TensorFlow1
│ ├── MNISTExample-TensorFlow2
│ ├── TypicalAPI-Paddlepaddle
│ ├── TypicalAPI-pyTorch
│ ├── TypicalAPI-TensorFlow1
│ └── TypicalAPI-TensorFlow2
├── 04-BuildEngineByONNXParser
│ ├── Paddlepaddle-ONNX-TensorRT
│ ├── pyTorch-ONNX-TensorRT
│ │ └── C++
│ ├── pyTorch-ONNX-TensorRT-QAT
│ ├── TensorFlow1-Caffe-TensorRT
│ ├── TensorFlow1-ONNX-TensorRT
│ ├── TensorFlow1-ONNX-TensorRT-QAT
│ ├── TensorFlow1-UFF-TensorRT
│ ├── TensorFlow2-ONNX-TensorRT
│ └── TensorFlow2-ONNX-TensorRT-QAT-TODO
├── 05-Plugin
│ ├── API
│ ├── C++-UsePluginInside
│ ├── C++-UsePluginOutside
│ ├── LoadDataFromNpz
│ ├── MultipleVersion
│ ├── PluginProcess
│ ├── PluginReposity
│ │ ├── AddScalarPlugin-TRT8
│ │ ├── BatchedNMS_TRTPlugin-TRT8
│ │ ├── CCLPlugin-TRT6-StaticShape
│ │ ├── CCLPlugin-TRT7-DynamicShape
│ │ ├── CumSumPlugin-V2.1-TRT8
│ │ ├── GruPlugin
│ │ ├── LayerNormPlugin-TRT8
│ │ ├── Mask2DPlugin
│ │ ├── MaskPugin
│ │ ├── MaxPlugin
│ │ ├── MMTPlugin
│ │ ├── MultinomialDistributionPlugin-cuRAND-TRT8
│ │ ├── MultinomialDistributionPlugin-thrust-TRT8
│ │ ├── OneHotPlugin-TRT8
│ │ ├── ReducePlugin
│ │ ├── Resize2DPlugin-TRT8
│ │ ├── ReversePlugin
│ │ ├── SignPlugin
│ │ ├── SortPlugin-V0.0-useCubAlone
│ │ ├── SortPlugin-V1.0-float
│ │ ├── SortPlugin-V2.0-float4
│ │ ├── TopKAveragePlugin
│ │ └── WherePlugin
│ ├── PluginSerialize-TODO
│ ├── PythonPlugin
│ │ └── circ_plugin_cpp
│ ├── UseCuBLAS
│ ├── UseFP16
│ ├── UseINT8-PTQ
│ ├── UseINT8-QDQ
│ ├── UseONNXParserAndPlugin-pyTorch
│ ├── UsePluginV2DynamicExt
│ ├── UsePluginV2Ext
│ └── UsePluginV2IOExt
├── 06-UseFrameworkTRT
│ ├── TensorFlow1-TFTRT
│ ├── TensorFlow2-TFTRT
│ └── Torch-TensorRT
├── 07-Tool
│ ├── FP16FineTuning
│ ├── Netron
│ ├── NetworkInspector
│ │ └── C++
│ ├── NetworkPrinter
│ ├── NsightSystems
│ ├── nvtx
│ ├── OnnxGraphSurgeon
│ │ └── API
│ ├── Onnxruntime
│ ├── Polygraphy-API
│ ├── Polygraphy-CLI
│ │ ├── convertExample
│ │ ├── dataExample
│ │ ├── debugExample
│ │ ├── HelpInformation
│ │ ├── inspectExample
│ │ ├── runExample
│ │ ├── surgeonExample
│ │ └── templateExample
│ ├── trex
│ │ ├── model
│ │ ├── trex
│ │ └── trex.egg-info
│ └── trtexec
├── 08-Advance
│ ├── BuilderOptimizationLevel
│ ├── CreateExecutionContextWithoutDeviceMemory
│ ├── C++StaticCompilation
│ ├── CudaGraph
│ ├── DataFormat
│ ├── DynamicShapeOutput
│ ├── EmptyTensor
│ ├── Event
│ ├── ExternalSource
│ ├── HardwareCompatibility
│ ├── LabeledDimension
│ ├── MultiContext
│ ├── MultiOptimizationProfile
│ ├── MultiStream
│ ├── Safety-TODO
│ ├── Sparsity
│ │ └── pyTorch-ONNX-TensorRT-ASP
│ ├── StreamAndAsync
│ ├── StrictType
│ ├── TensorRTGraphSurgeon
│ ├── TorchOperation
│ └── VersionCompatibility
├── 09-BestPractice
│ ├── AdjustReduceLayer
│ ├── AlignSize
│ ├── ComputationInAdvance
│ │ └── Convert3DMMTo2DMM
│ ├── ConvertTranposeMultiplicationToConvolution
│ ├── EliminateSqueezeUnsqueezeTranspose
│ ├── IncreaseBatchSize
│ ├── UsingMultiOptimizationProfile
│ ├── UsingMultiStream
│ └── WorkFlowOnONNXModel
├── 10-ProblemSolving
│ ├── ParameterCheckFailed
│ ├── SliceNodeWithBoolIO
│ ├── WeightsAreNotPermittedSinceTheyAreOfTypeInt32
│ └── WeightsHasCountXButYWasExpected
├── 51-Uncategorized
├── 52-Deprecated
│ ├── BindingEliminate-TRT8
│ ├── ConcatenationLayerBUG-TRT8.4
│ ├── ErrorWhenParsePadNode-TRT-8.4
│ ├── FullyConnectedLayer-TRT8.4
│ ├── FullyConnectedLayerWhenUsingParserTRT-8.4
│ ├── MatrixMultiplyDeprecatedLayer-TRT8
│ ├── max_workspace_size-TRT8.4
│ ├── MultiContext-TRT8
│ ├── ResizeLayer-TRT8
│ ├── RNNLayer-TRT8
│ └── ShapeLayer-TRT8
├── 99-NotFinish
│ └── TensorRTElementwiseBug
└── include