Thursday, June 11, 2020

Text detection and localization with Tesseract OCR

Hello all, I hope you are doing well and keeping safe. I am writing a new blog post after a long break, still adjusting to the new lifestyle. Not sure when a vaccine will come for #covid19 and when we will get back to our normal life again. Anyway, let's get started. Today we will learn how to detect and localize text in an image using Tesseract OCR.

Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License. Originally developed by Hewlett-Packard as proprietary software in the 1980s, it was released as open source in 2005 and development has been sponsored by Google since 2006.[wikipedia]



Text detection is the process of detecting and localizing where text exists in an image. In this post, we will detect text and draw a bounding box wherever it is found. Before we actually start coding, we will learn how to install Tesseract on our system.
Step1: Installing Tesseract 4 depends on which version of Ubuntu you have. If you have Ubuntu 18.04 or later, it is super easy; just use this command:
              sudo apt install tesseract-ocr
But if you have an Ubuntu version lower than 18.04, use the commands below:
              sudo add-apt-repository ppa:alex-p/tesseract-ocr
              sudo apt-get update
              sudo apt install tesseract-ocr
Once the installation is done, you can check the Tesseract version with:
              tesseract -v
This is what my terminal showed when I ran ‘tesseract -v’.
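
If we would rather do this check from a script, here is a minimal sketch that uses only Python's standard library. The helper name ‘tesseract_version’ is my own, and it assumes the tesseract binary is on the PATH; when it is not, the helper simply returns None.

```python
import shutil
import subprocess


def tesseract_version():
    """Return the first line printed by `tesseract -v`, or None if not installed."""
    # shutil.which looks the executable up on the PATH, like the shell does.
    if shutil.which("tesseract") is None:
        return None
    # Capture both streams, since the version text may land on either one
    # depending on the Tesseract release.
    proc = subprocess.run(["tesseract", "-v"], capture_output=True, text=True)
    output = (proc.stdout or proc.stderr).strip()
    return output.splitlines()[0] if output else None


print(tesseract_version())
```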

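Step2: Once Tesseract is installed, we need pytesseract, which provides the Python binding so that we can call Tesseract from our Python code. We also install Pillow, which pytesseract uses to handle images, and imutils, a small set of OpenCV convenience functions. (The code also reads images with OpenCV, so it is assumed to be installed already.) Use the commands below:
            pip install pillow
            pip install pytesseract
            pip install imutils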
Great!! Now we can start coding in Python for text detection.

Step3: Open a new Python file, name it whatever you like, and import the necessary packages.
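
Before writing the imports, it can help to confirm that every package from Step2 is importable. This is a small sketch using only the standard library; the helper ‘missing_modules’ is a name I made up, and the module list (cv2, pytesseract, PIL, imutils) is my assumption about what this walkthrough needs.

```python
import importlib.util

# Import names assumed for this walkthrough (note that Pillow is imported
# as PIL and OpenCV as cv2).
REQUIRED_MODULES = ["cv2", "pytesseract", "PIL", "imutils"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All packages found, ready to import.")
```

Once nothing is reported missing, the script itself can begin with ‘import cv2’, ‘import pytesseract’ and ‘from pytesseract import Output’.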

Step4: Read the test image. By default, OpenCV reads an image in BGR format, but Tesseract needs RGB, so convert the image to RGB.
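
A minimal sketch of this step with OpenCV. The function name ‘load_rgb’ and the file name test.jpg are placeholders of mine, not part of the original code.

```python
def load_rgb(path):
    """Read an image from disk and return (bgr_image, rgb_image)."""
    import cv2  # imported here so the function can be defined without OpenCV present

    bgr = cv2.imread(path)  # OpenCV returns the channels in B, G, R order
    if bgr is None:
        # cv2.imread does not raise on a bad path; it returns None instead.
        raise FileNotFoundError(f"Could not read image: {path}")
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)  # reorder channels for Tesseract
    return bgr, rgb


# Example usage (assumes a file called test.jpg next to the script):
# image, rgb = load_rgb("test.jpg")
```

Keeping the BGR copy around is handy, because OpenCV's drawing and display functions expect that order later on.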


Step4 (continued): Now we detect the text with pytesseract's ‘image_to_data’ function. Its output needs some post-processing before we can draw the bounding boxes.
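
Here is a rough sketch of the call. ‘image_to_data’ and ‘Output.DICT’ are part of pytesseract; the wrapper name ‘detect_text’ is hypothetical.

```python
def detect_text(rgb_image):
    """Run Tesseract on an RGB image and return its results as a dict."""
    import pytesseract  # imported here so the function can be defined without it
    from pytesseract import Output

    # Output.DICT gives a dict of parallel lists such as "text", "conf",
    # "left", "top", "width" and "height", one entry per detected element.
    return pytesseract.image_to_data(rgb_image, output_type=Output.DICT)


# Example usage, with `rgb` being the RGB image from the previous step:
# results = detect_text(rgb)
# print(results.keys())
```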




Step5: We walk through the detected text and get the bounding box coordinates, the text itself, and the confidence with which it was detected. This part is called text localization.
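
As a sketch of that loop: the dict layout (parallel lists under "left", "top", "width", "height", "text" and "conf") is what pytesseract returns with ‘Output.DICT’, while the function name ‘iter_detections’ and the sample values are made up for illustration.

```python
def iter_detections(results):
    """Yield one dict per detected element: box coordinates, text, confidence."""
    for i in range(len(results["text"])):
        yield {
            "x": results["left"][i],
            "y": results["top"][i],
            "w": results["width"][i],
            "h": results["height"][i],
            "text": results["text"][i],
            # conf may arrive as a string or a number depending on the
            # pytesseract version, so normalise it to float.
            "conf": float(results["conf"][i]),
        }


# Hand-written sample in the same layout, just to show the loop running.
sample = {
    "left": [10, 60], "top": [20, 20], "width": [40, 55], "height": [15, 15],
    "text": ["Hello", "world"], "conf": ["96", "91.5"],
}
for det in iter_detections(sample):
    print(det)
```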




Step6: We can apply a confidence threshold to filter out the weak detections.
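
A small sketch of the filtering. The threshold ‘min_conf=50’ is purely an example value of mine, and ‘keep_confident’ is a hypothetical helper name; tune the threshold for your own images.

```python
def keep_confident(detections, min_conf=50.0):
    """Keep detections with non-empty text and confidence above min_conf."""
    kept = []
    for text, conf in detections:
        # Tesseract reports a confidence of -1 for layout boxes that carry
        # no text, so any non-negative threshold drops those too.
        if conf > min_conf and text.strip():
            kept.append((text, conf))
    return kept


# (text, confidence) pairs invented for the example.
sample = [("Hello", 96.0), ("", -1.0), ("w0r1d", 12.0), ("world", 91.0)]
print(keep_confident(sample))
```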
Step7: Draw the bounding boxes and write the corresponding text on the original image.
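
A sketch of the drawing step using OpenCV's ‘rectangle’ and ‘putText’. The helper names, the green colour and the font settings are my own choices. OpenCV's built-in Hershey fonts only cover ASCII, so other characters are stripped from the text first.

```python
def ascii_only(text):
    """Drop non-ASCII characters, which cv2.putText cannot draw properly."""
    return "".join(ch for ch in text if ord(ch) < 128).strip()


def draw_detection(image, x, y, w, h, text, color=(0, 255, 0)):
    """Draw a box at (x, y) of size (w, h) and write the text just above it."""
    import cv2  # imported here so ascii_only works without OpenCV present

    cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
    cv2.putText(image, ascii_only(text), (x, y - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, color, 2)
    return image


print(ascii_only("café ✓"))
```

Both OpenCV calls draw directly on the array they are given, so pass a copy if you want to keep the untouched image as well.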


Step8: Show the image. Congratulations!! You have detected text in an image.
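
For completeness, a sketch of the display step. ‘imshow’, ‘waitKey’, ‘destroyAllWindows’ and ‘imwrite’ are standard OpenCV calls; the function name, the window title and the optional ‘save_path’ argument are my additions.

```python
def show_image(image, title="Text detection", save_path=None):
    """Optionally save the annotated image, then show it until a key is pressed."""
    import cv2  # imported here so the function can be defined without OpenCV present

    if save_path is not None:
        cv2.imwrite(save_path, image)  # keep a copy on disk
    cv2.imshow(title, image)
    cv2.waitKey(0)  # 0 means wait indefinitely for a key press
    cv2.destroyAllWindows()


# Example usage, with `image` being the annotated image from the previous step:
# show_image(image, save_path="result.jpg")
```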

If you want the whole code along with the test image, you can download it from here. Do let me know your feedback and comments below. Stay connected for more blog posts; till then, stay home, stay safe.