OpenCV  4.5.2
Open Source Computer Vision
High Level API: TextDetectionModel and TextRecognitionModel

Prev Tutorial: How to run custom OCR model

Next Tutorial: Conversion of PyTorch Classification Models and Launch with OpenCV Python

Original author Wenqing Zhang
Compatibility OpenCV >= 4.5

Introduction

In this tutorial, we will introduce the APIs for TextRecognitionModel and TextDetectionModel in detail.


TextRecognitionModel:

In the current version, cv::dnn::TextRecognitionModel only supports CNN+RNN+CTC-based algorithms, and only the greedy decoding method for CTC is provided. For more information, please refer to the original paper.
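To make the decoding step concrete, here is a minimal, self-contained sketch of CTC greedy decoding, independent of OpenCV: take the argmax class at each timestep, collapse consecutive repeats, and drop the blank symbol. This is an illustration of the technique, not OpenCV's implementation; the blank index (0 here) and the class-to-vocabulary mapping are assumptions.

```cpp
#include <string>
#include <vector>

// Illustrative CTC greedy decoding (not the OpenCV implementation):
// argmax per timestep, collapse consecutive repeats, drop the blank.
// Assumption: class 0 is the blank; class i (i >= 1) maps to vocabulary[i-1].
std::string ctcGreedyDecode(const std::vector<std::vector<float>>& logits,
                            const std::string& vocabulary)
{
    const int blank = 0; // assumed blank index
    std::string result;
    int prev = blank;
    for (const std::vector<float>& timestep : logits)
    {
        // argmax over classes for this timestep
        int best = 0;
        for (int c = 1; c < (int)timestep.size(); ++c)
            if (timestep[c] > timestep[best]) best = c;
        // emit only when not blank and not a repeat of the previous class
        if (best != blank && best != prev)
            result += vocabulary[best - 1];
        prev = best;
    }
    return result;
}
```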

Before recognition, you should call setVocabulary() and setDecodeType().

cv::dnn::TextRecognitionModel::recognize() is the main function for text recognition.

TextDetectionModel:

cv::dnn::TextDetectionModel API provides these methods for text detection:

- cv::dnn::TextDetectionModel::detect() returns the results in std::vector<std::vector<Point>> (4-point quadrangles)
- cv::dnn::TextDetectionModel::detectTextRectangles() returns the results in std::vector<cv::RotatedRect>

In the current version, cv::dnn::TextDetectionModel supports these algorithms:

- use cv::dnn::TextDetectionModel_DB with "DB" models
- use cv::dnn::TextDetectionModel_EAST with "EAST" models

The provided pretrained models are variants of DB (w/o deformable convolution); their performance can be found in Table 1 of the paper. For more information, please refer to the official code.
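In DB post-processing, each detected text polygon is expanded ("unclipped") before it is returned; the DB paper computes the expansion distance as D = A × r / L, where A is the polygon area, L its perimeter, and r the unclip ratio (set via setUnclipRatio, 2.0 in the example below). A minimal sketch of that formula, assuming a simple polygon given as an ordered vertex list (OpenCV's actual post-processing additionally performs the polygon offsetting itself):

```cpp
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// Expansion distance from the DB paper: D = A * r / L,
// where A is the polygon area, L its perimeter, r the unclip ratio.
double unclipDistance(const std::vector<Pt>& poly, double unclipRatio)
{
    double twiceArea = 0.0, perimeter = 0.0;
    const size_t n = poly.size();
    for (size_t i = 0; i < n; ++i)
    {
        const Pt& a = poly[i];
        const Pt& b = poly[(i + 1) % n];
        twiceArea += a.x * b.y - b.x * a.y;          // shoelace formula
        perimeter += std::hypot(b.x - a.x, b.y - a.y);
    }
    double area = std::fabs(twiceArea) / 2.0;
    return area * unclipRatio / perimeter;
}
```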


You can train your own model with more data, and convert it into ONNX format. We encourage you to add new algorithms to these APIs.

Pretrained Models

TextRecognitionModel:

```
crnn.onnx:
    url: https://drive.google.com/uc?export=dowload&id=1ooaLR-rkTl8jdpGy1DoQs0-X0lQsB6Fj
    sha: 270d92c9ccb670ada2459a25977e8deeaf8380d3
    alphabet_36.txt: https://drive.google.com/uc?export=dowload&id=1oPOYx5rQRp8L6XQciUwmwhMCfX0KyO4b
    parameter setting: -rgb=0;
    description: The classification number of this model is 36 (0~9 + a~z).
                 The training dataset is MJSynth.

crnn_cs.onnx:
    url: https://drive.google.com/uc?export=dowload&id=12diBsVJrS9ZEl6BNUiRp9s0xPALBS7kt
    sha: a641e9c57a5147546f7a2dbea4fd322b47197cd5
    alphabet_94.txt: https://drive.google.com/uc?export=dowload&id=1oKXxXKusquimp7XY1mFvj9nwLzldVgBR
    parameter setting: -rgb=1;
    description: The classification number of this model is 94 (0~9 + a~z + A~Z + punctuations).
                 The training datasets are MJSynth and SynthText.

crnn_cs_CN.onnx:
    url: https://drive.google.com/uc?export=dowload&id=1is4eYEUKH7HR7Gl37Sw4WPXx6Ir8oQEG
    sha: 3940942b85761c7f240494cf662dcbf05dc00d14
    alphabet_3944.txt: https://drive.google.com/uc?export=dowload&id=18IZUUdNzJ44heWTndDO6NNfIpJMmN-ul
    parameter setting: -rgb=1;
    description: The classification number of this model is 3944 (0~9 + a~z + A~Z + Chinese characters + special characters).
                 The training dataset is ReCTS (https://rrc.cvc.uab.es/?ch=12).
```

More models can be found here; they are taken from clovaai. You can train more models with CRNN and convert them with torch.onnx.export.

TextDetectionModel:


We will release more models of DB here in the future.


Images for Testing

```
Text Recognition:
    url: https://drive.google.com/uc?export=dowload&id=1nMcEy68zDNpIlqAn6xCk_kYcUTIeSOtN
    sha: 89205612ce8dd2251effa16609342b69bff67ca3

Text Detection:
    url: https://drive.google.com/uc?export=dowload&id=149tAhIcvfCYeyufRoZ9tmc2mZDKE_XrF
    sha: ced3c03fb7f8d9608169a913acf7e7b93e07109b
```

Example for Text Recognition

Step1. Loading images and models with a vocabulary

```cpp
// Load a cropped text line image
// you can find cropped images for testing in "Images for Testing"
int rgb = IMREAD_COLOR; // This should be changed according to the model input requirement.
Mat image = imread("path/to/text_rec_test.png", rgb);

// Load model weights
TextRecognitionModel model("path/to/crnn_cs.onnx");

// The decoding method
// more methods will be supported in the future
model.setDecodeType("CTC-greedy");

// Load vocabulary
// vocabulary should be changed according to the text recognition model
std::ifstream vocFile;
vocFile.open("path/to/alphabet_94.txt");
CV_Assert(vocFile.is_open());
String vocLine;
std::vector<String> vocabulary;
while (std::getline(vocFile, vocLine)) {
    vocabulary.push_back(vocLine);
}
model.setVocabulary(vocabulary);
```

Step2. Setting Parameters

```cpp
// Normalization parameters
double scale = 1.0 / 127.5;
Scalar mean = Scalar(127.5, 127.5, 127.5);

// The input shape
Size inputSize = Size(100, 32);

model.setInputParams(scale, inputSize, mean);
```

Step3. Inference

```cpp
std::string recognitionResult = model.recognize(image);
std::cout << "'" << recognitionResult << "'" << std::endl;
```

Input image:

text_rec_test.png
Picture example

Output:

```
'welcome'
```

Example for Text Detection

Step1. Loading images and models

```cpp
// Load an image
// you can find some images for testing in "Images for Testing"
Mat frame = imread("/path/to/text_det_test.png");
```

Step2.a Setting Parameters (DB)

```cpp
// Load model weights
TextDetectionModel_DB model("/path/to/DB_TD500_resnet50.onnx");

// Post-processing parameters
float binThresh = 0.3;
float polyThresh = 0.5;
uint maxCandidates = 200;
double unclipRatio = 2.0;
model.setBinaryThreshold(binThresh)
     .setPolygonThreshold(polyThresh)
     .setMaxCandidates(maxCandidates)
     .setUnclipRatio(unclipRatio);

// Normalization parameters
double scale = 1.0 / 255.0;
Scalar mean = Scalar(122.67891434, 116.66876762, 104.00698793);

// The input shape
Size inputSize = Size(736, 736);

model.setInputParams(scale, inputSize, mean);
```

Step2.b Setting Parameters (EAST)

```cpp
TextDetectionModel_EAST model("EAST.pb");

float confThreshold = 0.5;
float nmsThreshold = 0.4;
model.setConfidenceThreshold(confThreshold)
     .setNMSThreshold(nmsThreshold);

double detScale = 1.0;
Size detInputSize = Size(320, 320);
Scalar detMean = Scalar(123.68, 116.78, 103.94);
bool swapRB = true;
model.setInputParams(detScale, detInputSize, detMean, swapRB);
```
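setNMSThreshold() controls the IoU cutoff for non-maximum suppression of overlapping candidate boxes. Below is a simplified, self-contained sketch of the idea using axis-aligned boxes (an assumption for clarity; the EAST pipeline in OpenCV actually works with rotated rectangles):

```cpp
#include <algorithm>
#include <vector>

struct Box { double x1, y1, x2, y2, score; };

// Intersection-over-union of two axis-aligned boxes
double iou(const Box& a, const Box& b)
{
    double ix = std::max(0.0, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    double iy = std::max(0.0, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    double inter = ix * iy;
    double areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    double areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    return inter / (areaA + areaB - inter);
}

// Keep the highest-scoring box, drop any box overlapping a kept box
// by more than nmsThreshold, and repeat for the remainder.
std::vector<Box> nms(std::vector<Box> boxes, double nmsThreshold)
{
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const Box& cand : boxes)
    {
        bool suppressed = false;
        for (const Box& k : kept)
            if (iou(cand, k) > nmsThreshold) { suppressed = true; break; }
        if (!suppressed) kept.push_back(cand);
    }
    return kept;
}
```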

Step3. Inference

```cpp
std::vector<std::vector<Point>> detResults;
model.detect(frame, detResults);

// Visualization
polylines(frame, detResults, true, Scalar(0, 255, 0), 2);
imshow("Text Detection", frame);
waitKey();
```

Output:

text_det_test_results.jpg
Picture example

Example for Text Spotting

After following the steps above, it is easy to get the detection results of an input image. Then, you can do transformation and crop text images for recognition. For more information, please refer to the Detailed Sample.

```cpp
// Transform and crop the detected quadrangle (vertices) from the
// detection input image (recInput), then recognize the cropped patch
Mat cropped;
fourPointsTransform(recInput, vertices, cropped);

String recResult = recognizer.recognize(cropped);
```
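fourPointsTransform is a helper defined in the sample (not part of the OpenCV API); it warps the detected quadrangle to an axis-aligned rectangle with a perspective transform, with the target size typically derived from the quadrangle's edge lengths. A small sketch of that size computation, assuming the four vertices are ordered around the quadrangle:

```cpp
#include <algorithm>
#include <cmath>

struct Vertex { double x, y; };

static double edgeLength(const Vertex& a, const Vertex& b)
{
    return std::hypot(b.x - a.x, b.y - a.y);
}

// Output size for a perspective crop of a quadrangle:
// width from the longer of the two "horizontal" edges,
// height from the longer of the two "vertical" edges.
void cropSize(const Vertex v[4], double& width, double& height)
{
    width  = std::max(edgeLength(v[0], v[1]), edgeLength(v[3], v[2]));
    height = std::max(edgeLength(v[0], v[3]), edgeLength(v[1], v[2]));
}
```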

Output Examples:

detect_test1.jpg
Picture example
detect_test2.jpg
Picture example

Source Code

The source code of these APIs can be found in the DNN module.

Detailed Sample

For more information, please refer to:

Test with an image

Examples:

```bash
example_dnn_scene_text_recognition -mp=path/to/crnn_cs.onnx -i=path/to/an/image -rgb=1 -vp=/path/to/alphabet_94.txt
example_dnn_scene_text_detection -mp=path/to/DB_TD500_resnet50.onnx -i=path/to/an/image -ih=736 -iw=736
example_dnn_scene_text_spotting -dmp=path/to/DB_IC15_resnet50.onnx -rmp=path/to/crnn_cs.onnx -i=path/to/an/image -iw=1280 -ih=736 -rgb=1 -vp=/path/to/alphabet_94.txt
example_dnn_text_detection -dmp=path/to/EAST.pb -rmp=path/to/crnn_cs.onnx -i=path/to/an/image -rgb=1 -vp=path/to/alphabet_94.txt
```

Test on public datasets

Text Recognition:

The download link for the testing images can be found in the Images for Testing section.

Examples:

```bash
example_dnn_scene_text_recognition -mp=path/to/crnn.onnx -e=true -edp=path/to/evaluation_data_rec -vp=/path/to/alphabet_36.txt -rgb=0
example_dnn_scene_text_recognition -mp=path/to/crnn_cs.onnx -e=true -edp=path/to/evaluation_data_rec -vp=/path/to/alphabet_94.txt -rgb=1
```

Text Detection:

The download links for the testing images can be found in the Images for Testing section.

Examples:

```bash
example_dnn_scene_text_detection -mp=path/to/DB_TD500_resnet50.onnx -e=true -edp=path/to/evaluation_data_det/TD500 -ih=736 -iw=736
example_dnn_scene_text_detection -mp=path/to/DB_IC15_resnet50.onnx -e=true -edp=path/to/evaluation_data_det/IC15 -ih=736 -iw=1280
```