
Thursday, June 11, 2020

Text detection and localization with Tesseract OCR

Hello all, I hope you are doing well and keeping safe. I am writing a new blog post after a long break, still adjusting to the new lifestyle. Not sure when a vaccine for COVID-19 will arrive and we will get back to our normal lives again. Anyway, let's get started. Today we will learn how to detect and localize text in an image using Tesseract OCR.

Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License. Originally developed by Hewlett-Packard as proprietary software in the 1980s, it was released as open source in 2005 and development has been sponsored by Google since 2006.[wikipedia]



Text detection is the process of detecting and localizing where in an image text exists. In this blog post, we will detect text and draw a bounding box wherever it is found. Before we actually start coding, we will learn how to install Tesseract on our system.
Step 1: Installing Tesseract 4 depends on which version of Ubuntu you have. If you have Ubuntu 18.04, it is super easy; just use this command:
              sudo apt install tesseract-ocr
But if your Ubuntu version is lower than 18.04, follow the commands below:
              sudo add-apt-repository ppa:alex-p/tesseract-ocr
              sudo apt-get update
              sudo apt install tesseract-ocr
Once the installation is done, you can check the Tesseract version with:
              tesseract -v
This is what my terminal showed when I ran ‘tesseract -v’.

Step 2: Once Tesseract is installed, we need to install the Python packages that connect it to our code: Pillow for image handling, pytesseract for the Python bindings to Tesseract, and imutils for convenience functions. Follow the commands below to install them:
              pip install pillow
              pip install pytesseract
              pip install imutils
Great!! Now we can start coding the text detection in Python.

Step 3: Open a new Python file, name it as you like, and import the necessary packages.

Step 4: Read the test image. By default OpenCV reads images in BGR format, but Tesseract expects RGB, so convert the image to RGB.


Step 5: Now we detect the text with Tesseract's ‘image_to_data’ function. We then need to post-process its output to draw the bounding boxes.
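The call itself needs a working Tesseract install, so it is shown here as a comment; the dict below only illustrates the shape of what image_to_data returns (the values are made up):

```python
# With Tesseract installed, the real call would be:
#   import pytesseract
#   from pytesseract import Output
#   results = pytesseract.image_to_data(rgb, output_type=Output.DICT)
# image_to_data returns a dict of parallel lists, one entry per detected
# word. The values below are invented to illustrate the structure.
results = {
    "left":   [34, 120],          # x coordinate of each box
    "top":    [20, 22],           # y coordinate of each box
    "width":  [80, 95],
    "height": [18, 19],
    "text":   ["Hello", "world"],
    "conf":   [96, 91],           # detection confidence
}
```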




Step 6: We walk through the detected text and get the bounding box coordinates, the text itself, and the confidence with which it was detected. This part is called text localization.




Step 7: We can apply a threshold to filter out weak detections.
Step 8: Draw the bounding boxes and write the corresponding text on the original image.


Step 9: Show the image. Congratulations!! You have detected text in an image.

If you want the whole code along with the test image, you can download it from here. Do let me know your feedback in the comments below. Stay connected for more blog posts; till then, stay home and stay safe.




Friday, December 20, 2019

Reading Image Frames from Saved Videos or a Camera Using OpenCV and Python

One of my friends was asking about reading image frames from videos, so I thought a quick blog post might be helpful for beginners. It is actually very easy; just follow the steps below:


Step 1. Installations



a. Install Python




If you do not yet have Python on your system, install it:


For Linux: sudo apt-get update && sudo apt-get install python3.6

For Windows: download the installer from the Python website and follow the instructions



b. Install OpenCV



For Linux:

sudo pip3 install opencv-python
 
For Windows:

pip3 install opencv-python






Step 2: Reading a saved video or the camera



- First, import OpenCV



- Read the video, either from the camera or from a saved file


- While a frame is available, show and save it. Finally, release the camera and destroy the window we used for display.


Hope you liked this post. I am posting the script below so that you can just copy and paste it. Leave your feedback below.

import cv2

# If reading from a saved video, specify the path to the file:
# cap = cv2.VideoCapture(r'D:\project\spoof\classification\test_video\test.avi')

# If reading from a camera, pass its id (0 is the default camera)
cap = cv2.VideoCapture(0)

frame_count = 0
while True:
    ret, frame = cap.read()
    if not ret:  # no more frames (end of video or camera error)
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    cv2.imshow('frame', gray)
    # save each frame under a unique name so earlier frames are not overwritten
    cv2.imwrite('frame_%d.jpg' % frame_count, gray)
    frame_count += 1
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()


Wednesday, May 29, 2019

Harry Potter's magical Cloak with opencv



Hi there! The last few blog posts were hardcore machine learning and AI. Today let's learn something interesting; let's do some magic using computer vision. I hope you all know about Harry Potter's ‘invisible cloak’, the one he uses to become invisible. We will see how we can do the same magic trick with the help of computer vision. I will code in Python and use the OpenCV library.
Below is the video for your reference:




The algorithm is very simple: we separate the foreground and background with segmentation, and then remove the foreground object from every frame. We use a red-coloured cloth as the foreground object; you can use any other colour of your choice, but you will need to tweak the code accordingly. We will use the following steps:

  1. Import necessary libraries, create output video
  2. Capture and store the background for every frame.
  3. Detect the red coloured part in every frame.
  4. Segment out the red coloured part with a mask image.
  5. Generate the final magical output.

Step1: Import necessary libraries, create output video

Import the libraries. OpenCV is a library of programming functions mainly aimed at real-time computer vision. NumPy is the fundamental package for scientific computing with Python; since machine learning deals with huge amounts of data, we use NumPy arrays, which are much faster than plain Python lists. Then prepare the output video.



Step2: Capture and store the background for every frame

The main idea is to replace the current frames’ red pixels with background pixels to generate the invisible effect. To do that first we need to store the background image for every frame.
The cap.read() method captures the current frame and stores it in ‘background’. The method also returns a Boolean, stored in ret: True if the frame was read correctly, False otherwise.
We capture the background in a for loop so that we have several frames to choose from; averaging over multiple frames also reduces noise.
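In the real script the frames come from cap.read() in a loop; here synthetic frames stand in to show the averaging idea:

```python
import numpy as np

# Three stand-in "background" frames with slightly different values;
# in the real script each would come from: ret, frame = cap.read()
frames = [np.full((4, 4, 3), v, dtype=np.uint8) for v in (98, 100, 102)]

# Averaging over several frames suppresses per-frame sensor noise.
background = np.mean(frames, axis=0).astype(np.uint8)
```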

Step3: Detect the red coloured part in every frame

Now we will focus on detecting the red part of the image. As RGB (Red-Green-Blue) values are highly sensitive to illumination, we will convert the RGB image to HSV (Hue-Saturation-Value) space. After we convert the frame to HSV space, we will specify a colour range to detect red.

In general, Hue values are distributed over a circle ranging between 0-360 degrees, but in OpenCV the range is 0-180. Red is represented by the values 0-30 as well as 150-180. We use the ranges 0-10 and 170-180 to avoid detecting skin as red, and then combine the two masks with an OR operation (in Python, + works here, since the two ranges never overlap).

Step4: Segment out the red coloured part with a mask image

Now that we know where the red part is in the frame from the mask image, we will use this mask to segment that part from the whole frame. We will apply a morphological open and a dilation for that.

Step5: Generate the final magical output

Finally, we replace the pixels of the detected red region with the corresponding pixel values of the static background we saved earlier, generating the output that creates the magical effect.

So now you can create your own invisible-cloak video. You can download the running Python code from here: full code

Hope you enjoyed this magical side of computer vision. Do let me know your feedback and suggestions in the comments below. Thank you!