
OpenOCR: A general OCR system with accuracy and efficiency. It supports 24 Scene Text Recognition methods trained from scratch on large-scale real datasets, and the latest methods will continue to be added.


OpenOCR: A general OCR system with accuracy and efficiency

If you find this project useful, please give us a star🌟.




We aim to establish a unified benchmark for training and evaluating models in scene text detection and recognition. Building on this benchmark, we introduce a general OCR system with accuracy and efficiency, OpenOCR. This repository also serves as the official codebase of the OCR team from the FVL Laboratory, Fudan University.

We sincerely welcome researchers to recommend OCR or related algorithms and to point out any factual errors or bugs. Upon receiving suggestions, we will promptly evaluate and carefully reproduce them. We look forward to collaborating with you to advance the development of OpenOCR and to keep contributing to the OCR community!

Features

Our STR algorithms

  • SMTR&FocalSVTR (Yongkun Du, Zhineng Chen*, Caiyan Jia, Xieping Gao, Yu-Gang Jiang. Out of Length Text Recognition with Sub-String Matching, AAAI 2025. Doc, Paper)
  • DPTR (Shuai Zhao, Yongkun Du, Zhineng Chen*, Yu-Gang Jiang. Decoder Pre-Training with only Text for Scene Text Recognition, ACM MM 2024. Paper)
  • IGTR (Yongkun Du, Zhineng Chen*, Yuchen Su, Caiyan Jia, Yu-Gang Jiang. Instruction-Guided Scene Text Recognition, TPAMI 2025. Doc, Paper)
  • SVTRv2 (Yongkun Du, Zhineng Chen*, Hongtao Xie, Caiyan Jia, Yu-Gang Jiang. SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition, 2024. Doc, Paper)
  • CDistNet (Tianlun Zheng, Zhineng Chen*, Shancheng Fang, Hongtao Xie, Yu-Gang Jiang. CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition, IJCV 2024. Paper)
  • MRN (Tianlun Zheng, Zhineng Chen*, Bingchen Huang, Wei Zhang, Yu-Gang Jiang. MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition, ICCV 2023. Paper, Code)
  • TPS++ (Tianlun Zheng, Zhineng Chen*, Jinfeng Bai, Hongtao Xie, Yu-Gang Jiang. TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition, IJCAI 2023. Paper, Code)
  • CPPD (Yongkun Du, Zhineng Chen*, Caiyan Jia, Xiaoting Yin, Chenxia Li, Yuning Du, Yu-Gang Jiang. Context Perception Parallel Decoder for Scene Text Recognition, TPAMI (accepted). PaddleOCR Doc, Paper)
  • SVTR (Yongkun Du, Zhineng Chen*, Caiyan Jia, Xiaoting Yin, Tianlun Zheng, Chenxia Li, Yuning Du, Yu-Gang Jiang. SVTR: Scene Text Recognition with a Single Visual Model, IJCAI 2022 (Long). PaddleOCR Doc, Paper)
  • NRTR (Fenfen Sheng, Zhineng Chen, Bo Xu. NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition, ICDAR 2019. Paper)

Quick Start

Note: OpenOCR supports inference with both ONNX and Torch, and the dependency environments of the two backends are isolated. ONNX inference does not require Torch to be installed, and vice versa.

1. ONNX Inference

Install OpenOCR and Dependencies:

pip install openocr-python
pip install onnxruntime

Usage:

from openocr import OpenOCR
onnx_engine = OpenOCR(backend='onnx', device='cpu')
img_path = '/path/img_fold'  # or a single image file: '/path/img_file'
result, elapse = onnx_engine(img_path)
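
When per-image results are needed, one option is to call the engine once per file. The sketch below assumes only what the usage snippet shows: the engine is a callable that takes a path and returns a `(result, elapse)` pair. The helper names `list_images` and `run_on_images` and the extension list are hypothetical, and `result` is treated as an opaque value because its structure is not described here.

```python
from pathlib import Path

# Assumed set of image extensions; adjust it to match your data.
IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png', '.bmp'}


def list_images(path):
    """Return a sorted list of image files for a file path or a folder path."""
    path = Path(path)
    if path.is_file():
        return [path]
    return sorted(p for p in path.iterdir()
                  if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)


def run_on_images(engine, path):
    """Call the engine once per image and return {file name: (result, elapse)}."""
    outputs = {}
    for image in list_images(path):
        # Same call shape as in the usage snippet: engine(path) -> (result, elapse).
        result, elapse = engine(str(image))
        outputs[image.name] = (result, elapse)
    return outputs
```

For example, `outputs = run_on_images(onnx_engine, '/path/img_fold')` would collect one entry per image in the folder.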

2. PyTorch Inference

Dependencies:

  • PyTorch version >= 1.13.0
  • Python version >= 3.7

conda create -n openocr python==3.8
conda activate openocr
# install gpu version torch
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# or cpu version
conda install pytorch torchvision torchaudio cpuonly -c pytorch

After installing the dependencies, install OpenOCR using either of the following two methods.

2.1. Python Modules

Install OpenOCR:

pip install openocr-python

Usage:

from openocr import OpenOCR
engine = OpenOCR()
img_path = '/path/img_fold'  # or a single image file: '/path/img_file'
result, elapse = engine(img_path)

# Server mode
# engine = OpenOCR(mode='server')
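
Because the ONNX and Torch environments are isolated, a script that must run in either one can check which runtime is importable before constructing the engine. The following is a minimal sketch: `openocr_kwargs` is a hypothetical helper, and preferring ONNX when both runtimes are present is an arbitrary choice. It only produces the two constructor forms shown above, `OpenOCR(backend='onnx', device='cpu')` and `OpenOCR()`.

```python
import importlib.util


def _is_installed(module_name):
    """True if the top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def openocr_kwargs(is_installed=_is_installed):
    """Pick constructor arguments based on which runtime is available."""
    if is_installed('onnxruntime'):
        # ONNX form from the usage section above.
        return {'backend': 'onnx', 'device': 'cpu'}
    if is_installed('torch'):
        # Torch form from the usage section above: no extra arguments.
        return {}
    raise RuntimeError('Neither onnxruntime nor torch is installed.')
```

The engine would then be created with `engine = OpenOCR(**openocr_kwargs())`.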

2.2. Clone this repository:

git clone https://github.com/Topdu/OpenOCR.git
cd OpenOCR
pip install -r requirements.txt
wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_det_repvit_ch.pth
wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_repsvtr_ch.pth
# Rec Server model
# wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_svtrv2_ch.pth
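
On systems without `wget`, the same files can be fetched from Python. This is a rough sketch using only the standard library: the release URL and file names are copied from the commands above, while `weight_url`, `fetch_missing`, and the skip-if-present behavior are hypothetical additions.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Release location and file names taken from the wget commands above.
RELEASE_URL = 'https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1'
WEIGHTS = ['openocr_det_repvit_ch.pth', 'openocr_repsvtr_ch.pth']


def weight_url(name):
    """Build the download URL for one weight file."""
    return f'{RELEASE_URL}/{name}'


def fetch_missing(names=WEIGHTS, target_dir='.', download=urlretrieve):
    """Download each file that is not already in target_dir; return the fetched names."""
    fetched = []
    target = Path(target_dir)
    for name in names:
        dest = target / name
        if dest.exists():
            continue  # already downloaded, skip it
        download(weight_url(name), str(dest))
        fetched.append(name)
    return fetched
```

Calling `fetch_missing()` from the repository root places the detection and recognition weights next to the code, as the `wget` commands do.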

Usage:

# In all commands below, the image argument can be a folder (/path/img_fold) or a single file (/path/img_file).
# OpenOCR system: Det + Rec model
python tools/infer_e2e.py --img_path=/path/img_fold
# Det model
python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.infer_img=/path/img_fold
# Rec model
python tools/infer_rec.py --c ./configs/rec/svtrv2/repsvtr_ch.yml --o Global.infer_img=/path/img_fold

Export ONNX model:

pip install onnx
python tools/toonnx.py --c configs/rec/svtrv2/repsvtr_ch.yml --o Global.device=cpu
python tools/toonnx.py --c configs/det/dbnet/repvit_db.yml --o Global.device=cpu

Inference with ONNXRuntime:

pip install onnxruntime
# OpenOCR system: Det + Rec model
python tools/infer_e2e.py --img_path=/path/img_fold --backend=onnx --device=cpu
# Det model
python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.backend=onnx Global.device=cpu Global.infer_img=/path/img_fold
# Rec model
python tools/infer_rec.py --c ./configs/rec/svtrv2/repsvtr_ch.yml --o Global.backend=onnx Global.device=cpu Global.infer_img=/path/img_fold
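
The detection and recognition commands share one pattern: a config file passed with `--c`, followed by `Global.<key>=<value>` overrides after `--o`. When these tools are driven from another script, that pattern can be assembled programmatically. The sketch below uses a hypothetical helper, `build_infer_cmd`, and assumes nothing beyond the flags that appear in the commands above.

```python
import sys


def build_infer_cmd(script, config, **overrides):
    """Build an argument list: <python> <script> --c <config> --o Global.k=v ..."""
    cmd = [sys.executable, script, '--c', config]
    if overrides:
        cmd.append('--o')
        # Keyword order is preserved, so overrides appear in the order given.
        cmd.extend(f'Global.{key}={value}' for key, value in overrides.items())
    return cmd


rec_cmd = build_infer_cmd(
    'tools/infer_rec.py',
    './configs/rec/svtrv2/repsvtr_ch.yml',
    backend='onnx',
    device='cpu',
    infer_img='/path/img_file',
)
```

Running `subprocess.run(rec_cmd, check=True)` from the repository root would then execute the same recognition command as above.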

Local Demo

pip install gradio==4.20.0
wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/OCR_e2e_img.tar
tar xf OCR_e2e_img.tar
# start demo
python demo_gradio.py

Reproduction schedule:

Scene Text Recognition

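| Method | Venue | Training | Evaluation | Contributor |
| CRNN | TPAMI 2016 | | | |
| ASTER | TPAMI 2019 | | | pretto0 |
| NRTR | ICDAR 2019 | | | |
| SAR | AAAI 2019 | | | pretto0 |
| MORAN | PR 2019 | | | |
| DAN | AAAI 2020 | | | |
| RobustScanner | ECCV 2020 | | | pretto0 |
| AutoSTR | ECCV 2020 | | | |
| SRN | CVPR 2020 | | | pretto0 |
| SEED | CVPR 2020 | | | |
| ABINet | CVPR 2021 | | | YesianRohn |
| VisionLAN | ICCV 2021 | | | YesianRohn |
| PIMNet | ACM MM 2021 | TODO | | |
| SVTR | IJCAI 2022 | | | |
| PARSeq | ECCV 2022 | | | |
| MATRN | ECCV 2022 | | | |
| MGP-STR | ECCV 2022 | | | |
| LPV | IJCAI 2023 | | | |
| MAERec(Union14M) | ICCV 2023 | | | |
| LISTER | ICCV 2023 | | | |
| CDistNet | IJCV 2024 | | | YesianRohn |
| BUSNet | AAAI 2024 | | | |
| DCTC | AAAI 2024 | TODO | | |
| CAM | PR 2024 | | | |
| OTE | CVPR 2024 | | | |
| CFF | IJCAI 2024 | TODO | | |
| DPTR | ACM MM 2024 | | | fd-zs |
| VIPTR | ACM CIKM 2024 | TODO | | |
| IGTR | TPAMI 2025 | | | |
| SMTR | AAAI 2025 | | | |
| CPPD | TPAMI Online Access | | | |
| FocalSVTR-CTC | 2024 | | | |
| SVTRv2 | 2024 | | | |
| ResNet+Trans-CTC | | | | |
| ViT-CTC | | | | |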
Contributors


Yiming Lei (pretto0), Xingsong Ye (YesianRohn), and Shuai Zhao (fd-zs) from the FVL Laboratory, Fudan University, with guidance from Dr. Zhineng Chen (Homepage), completed the majority of the algorithm reproduction work. We are grateful for their outstanding contributions.

Scene Text Detection (STD)

TODO

Text Spotting

TODO


Citation

If you find our method useful for your research, please cite:

@article{Du2024SVTRv2,
      title={SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition},
      author={Yongkun Du and Zhineng Chen and Hongtao Xie and Caiyan Jia and Yu-Gang Jiang},
      journal={CoRR},
      volume={abs/2411.15858},
      eprinttype={arXiv},
      year={2024},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.15858}
}

Acknowledgement

This codebase builds on PaddleOCR, PytorchOCR, and MMOCR. Thanks for their awesome work!
