Showing posts with label #ImageProcessing. Show all posts

Thursday, June 11, 2020

Text detection and localization with Tesseract OCR

Hello all, hope you are doing well and keeping safe. I am writing a new blog post after a long break, still adjusting to the new lifestyle. Not sure when a vaccine will come for #covid19 and we will get back to our normal lives again. Anyway, let's get started. Today we will learn how to detect and localize text in an image using Tesseract OCR.

Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License. Originally developed by Hewlett-Packard as proprietary software in the 1980s, it was released as open source in 2005 and development has been sponsored by Google since 2006.[wikipedia]



Text detection is the process of detecting and localizing where in an image text exists. In this blog post, we will detect text and draw a bounding box wherever text is found. Before we actually start coding, let's install Tesseract on our system.
Step 1: How you install Tesseract 4 depends on which version of Ubuntu you have. On Ubuntu 18.04 it is super easy; just use this command:
              sudo apt install tesseract-ocr
But if your Ubuntu version is lower than 18.04, use the following commands:
              sudo add-apt-repository ppa:alex-p/tesseract-ocr
              sudo apt-get update
              sudo apt install tesseract-ocr
Once the installation is done, you can check the Tesseract version with:
              tesseract -v
This is what my terminal showed when I ran ‘tesseract -v’.

Step 2: Once Tesseract is installed, we need a few Python packages: pytesseract, the wrapper that lets our Python code call the Tesseract engine; Pillow, the image library that pytesseract depends on; and imutils. Since Step 4 reads the image with OpenCV, install opencv-python as well:
            pip install pillow
            pip install pytesseract
            pip install imutils
            pip install opencv-python
Great! Now we can start coding text detection in Python.
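Before writing the detector, it can help to confirm that Python can actually reach the Tesseract binary. The sketch below is not from the original post; the helper names `tesseract_path` and `tesseract_version` are hypothetical. It uses the standard library's `shutil.which` and pytesseract's `get_tesseract_version()`.

```python
import shutil


def tesseract_path():
    """Return the path of the `tesseract` executable on PATH, or None if it is missing."""
    return shutil.which("tesseract")


def tesseract_version():
    """Ask pytesseract which Tesseract version it can reach (needs `pip install pytesseract`)."""
    import pytesseract  # imported lazily so tesseract_path() works without the package

    return pytesseract.get_tesseract_version()


if __name__ == "__main__":
    path = tesseract_path()
    if path is None:
        print("tesseract not found on PATH; re-check Step 1")
    else:
        print("tesseract binary:", path)
        try:
            print("version seen from Python:", tesseract_version())
        except ImportError:
            print("pytesseract is not installed; re-check Step 2")
```

If the binary is found but the version call fails, pytesseract is usually looking in the wrong place; pytesseract lets you point it at the executable via `pytesseract.pytesseract.tesseract_cmd`.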

Step 3: Open a new Python file (name it whatever you like) and import the necessary packages.

Step 4: Read the test image. By default OpenCV reads images in BGR channel order, but Tesseract expects RGB, so convert the image to RGB.
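The original import and read code was a screenshot that did not survive, so here is a minimal sketch of Steps 3 and 4, assuming OpenCV (`cv2`) as the post describes. The helper names `load_rgb` and `bgr_to_rgb` are hypothetical. `load_rgb` also returns the untouched BGR image, because Step 7 draws on the original image.

```python
import numpy as np


def load_rgb(path):
    """Read an image with OpenCV (BGR order); return (original_bgr, rgb_copy)."""
    import cv2  # requires opencv-python; imported lazily

    image = cv2.imread(path)  # cv2.imread returns None instead of raising on a bad path
    if image is None:
        raise FileNotFoundError(f"could not read image: {path}")
    return image, cv2.cvtColor(image, cv2.COLOR_BGR2RGB)


def bgr_to_rgb(image):
    """What COLOR_BGR2RGB does for a 3-channel image: reverse the channel axis."""
    return np.ascontiguousarray(image[:, :, ::-1])
```

`bgr_to_rgb` is only there to show what the conversion means; in the script itself `cv2.cvtColor` does the job.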


Then detect the text with Tesseract’s ‘image_to_data’ function. Its output needs some post-processing before we can draw the bounding boxes.
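A minimal sketch of the detection call, assuming the RGB image from Step 4. `detect_text` is a hypothetical wrapper name; `pytesseract.image_to_data` with `output_type=Output.DICT` is pytesseract's own API, and it returns a dict of parallel lists (one entry per page, block, paragraph, line or word).

```python
def detect_text(image_rgb):
    """Run Tesseract on an RGB image and return its layout results as a dict of lists."""
    import pytesseract  # imported lazily; requires the Tesseract binary from Step 1
    from pytesseract import Output

    # With Output.DICT, results["text"][i], results["conf"][i], results["left"][i],
    # results["top"][i], results["width"][i] and results["height"][i] all describe entry i.
    return pytesseract.image_to_data(image_rgb, output_type=Output.DICT)
```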




Step 5: Walk through the detected text and get each entry's bounding box coordinates, the text itself and the confidence with which it was detected. This part is called text localization.
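A sketch of the localization loop, assuming the dict returned by `image_to_data` in Step 4. `localize` is a hypothetical name, and the `(x, y, w, h, text, conf)` tuple layout is just a convenient choice. Rows with empty text (page, block and line entries) are skipped.

```python
def localize(results):
    """Collect (x, y, w, h, text, conf) for every word in Tesseract's dict output."""
    detections = []
    for i in range(len(results["text"])):
        text = str(results["text"][i]).strip()
        if not text:
            continue  # structural rows (page/block/line) carry no text
        x, y = results["left"][i], results["top"][i]
        w, h = results["width"][i], results["height"][i]
        # conf may arrive as a string ("96") or a number, depending on the pytesseract version
        conf = float(results["conf"][i])
        detections.append((x, y, w, h, text, conf))
    return detections
```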




Step 6: We can apply a confidence threshold to filter out weak detections.
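A sketch of the threshold filter, assuming detections are `(x, y, w, h, text, conf)` tuples. The name `filter_weak` and the default cut-off of 60 are hypothetical; tune the threshold for your images. Tesseract reports confidence on a 0 to 100 scale and uses -1 for rows that are not words.

```python
def filter_weak(detections, min_conf=60.0):
    """Keep only the (x, y, w, h, text, conf) detections with conf >= min_conf."""
    return [d for d in detections if d[5] >= min_conf]
```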
Step 7: Draw the bounding boxes and write the corresponding text on the original image.
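A sketch of the drawing step using OpenCV's `cv2.rectangle` and `cv2.putText`, assuming the original BGR image and `(x, y, w, h, text, conf)` tuples. The helper names, colours and font settings are hypothetical choices. Non-ASCII characters are stripped first because `cv2.putText`'s built-in Hershey fonts cannot render them.

```python
def ascii_only(text):
    """Drop characters cv2.putText cannot render with its Hershey fonts."""
    return "".join(c for c in text if ord(c) < 128).strip()


def draw_detections(image_bgr, detections):
    """Draw a green box and the recognized word for each (x, y, w, h, text, conf)."""
    import cv2  # requires opencv-python

    for x, y, w, h, text, conf in detections:
        cv2.rectangle(image_bgr, (x, y), (x + w, y + h), (0, 255, 0), 2)
        # put the label just above the box, but keep it inside the image
        cv2.putText(image_bgr, ascii_only(text), (x, max(y - 10, 10)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
    return image_bgr
```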


Step 8: Show the image. Congratulations! You have detected text in an image.
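A sketch of the final step. `cv2.imshow` needs a desktop session, so this version falls back to `cv2.imwrite` when none is detected. The `has_display` check is a Linux-only heuristic (an assumption, not from the post) that looks for an X11 or Wayland display, and `show_or_save` and `output.png` are hypothetical names.

```python
import os


def has_display():
    """Linux heuristic: a GUI window needs an X11 or Wayland display."""
    return bool(os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY"))


def show_or_save(image_bgr, out_path="output.png", window_name="Text detection"):
    """Show the annotated image, or save it to out_path when no display is available."""
    import cv2  # requires opencv-python

    if has_display():
        cv2.imshow(window_name, image_bgr)
        cv2.waitKey(0)  # block until any key is pressed
        cv2.destroyAllWindows()
    else:
        cv2.imwrite(out_path, image_bgr)
```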

If you want the whole code along with the test image, you can download it from here. Do let me know your feedback and comments below. Stay connected for more blog posts; till then, stay home, stay safe.




Saturday, December 28, 2019

A simple classifier to classify cars and aeroplanes with a CNN (Part 2: Inference)


Hello there, hope you are doing well. This post is a sequel to my previous post on building a classifier with a CNN. In that post we learned how to collect the data, organize it and train a model for classification. In this post we will learn how to use the trained model to actually classify cars and planes. When I started training CNNs, I had a hard time learning how to use the model and actually see the result. All the articles and blogs I was following only talked about how to train the network; no one talked about how to see the classification results. Enough talking, let's start:

If you followed my previous post, the model file (model.h5) was created with 96% accuracy and saved in the models folder. Now we will use that model for inference.

Step 1: We will start by importing the required libraries, as we did for training.
Step 2: In the test.py code we specify where the model and the test images are, load the model and its weights, and specify the image size we are dealing with.
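The original test.py is on GitHub and not reproduced here, so this is only a sketch of Steps 1 and 2 assuming TensorFlow's Keras. The test folder name `test_images` and the 150x150 image size are hypothetical placeholders: use whatever the training post used. `load_model` restores architecture and weights together only if the .h5 file was written with `model.save()`. If the training code used `save_weights()` instead, rebuild the model and call `model.load_weights()`.

```python
import os

# Assumed locations and size: adjust to match the training post.
MODEL_PATH = os.path.join("models", "model.h5")
TEST_DIR = "test_images"          # hypothetical folder holding the test images
IMG_WIDTH, IMG_HEIGHT = 150, 150  # hypothetical: must equal the size used in training


def load_trained_model(path=MODEL_PATH):
    """Load architecture + weights saved with model.save() into a single .h5 file."""
    from tensorflow.keras.models import load_model  # requires TensorFlow

    return load_model(path)


def list_test_images(folder=TEST_DIR, exts=(".jpg", ".jpeg", ".png")):
    """Return the image files in folder, sorted by name."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(exts)
    )
```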

Step 3: Now we define a prediction function that takes a test image as input and returns the prediction. As we have only two classes (cars and aeroplanes), the output is the probability of each of the two classes. We read those probabilities and show the result.
You can clone the whole project from GitHub here. Do let me know if you have any feedback or suggestions. Hope you enjoyed coding with me. Wishing you all a very happy new year 2020 in advance.
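A sketch of the prediction step, assuming a Keras model with a two-unit softmax output. Three things here are assumptions, not taken from the post: the class order in `CLASS_NAMES` (check `train_generator.class_indices` from training, since `flow_from_directory` assigns indices by sorted folder name), the 1/255 rescaling, and the 150x150 size. All must match how the model was trained.

```python
import numpy as np

CLASS_NAMES = ["aeroplanes", "cars"]  # hypothetical order; verify with class_indices


def preprocess(path, size=(150, 150)):
    """Load an image, resize it, scale to [0, 1] and add a batch dimension."""
    from tensorflow.keras.preprocessing.image import img_to_array, load_img

    image = load_img(path, target_size=size)  # target_size is (height, width)
    array = img_to_array(image) / 255.0       # assumes training used rescale=1./255
    return np.expand_dims(array, axis=0)      # shape (1, height, width, 3)


def interpret(probabilities, class_names=CLASS_NAMES):
    """Turn the model's class probabilities into (label, confidence)."""
    probs = np.asarray(probabilities, dtype=float).ravel()
    index = int(np.argmax(probs))
    return class_names[index], float(probs[index])


def predict(model, path, size=(150, 150)):
    """Classify one image file with a trained Keras model."""
    return interpret(model.predict(preprocess(path, size))[0])
```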


Wednesday, March 6, 2019

Basic Concepts of Artificial Intelligence, Machine Learning, Deep Learning


Today we will start our journey into the world of Artificial Intelligence (AI). We will learn the basic definitions of Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), Natural Language Processing (NLP), Computer Vision and Image Processing. Later we will go deeper into machine learning algorithms and how they work. This tutorial is for beginners; if you already know the basics of AI, skip it and go to the next lesson, where I will discuss different machine learning algorithms.


What is Artificial Intelligence (AI)?
-          Artificial intelligence (AI) is the ability of a machine or a computer program to think and learn by performing certain tasks. The concept of AI is based on the idea of building machines capable of thinking, acting and learning like humans. In other words, the goal is to create machines that can understand their environment and the problem at hand, and act intelligently according to the situation.

What is Machine Learning (ML)?
-          Machine Learning (ML) is an application of AI that gives systems the ability to automatically learn and improve their performance without being explicitly programmed. ML focuses on developing computer programs that can access data and learn from it by themselves. The main aim is to allow computers to learn automatically, without human intervention or assistance, and act accordingly.
-          The next question on your mind may be: how does the machine learn? The answer: much as humans do. First the machine gathers information and knowledge, then it uses that knowledge to make decisions. Past experience also helps it make better decisions in the future.
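To make "learning from data without being explicitly programmed" concrete, here is a toy illustration that is not from the original post. The program is never told the rule y = 3x; it only sees example pairs and repeatedly nudges a single parameter `w` to reduce its error (gradient descent on the mean squared error). The learning rate and number of epochs are arbitrary choices for this example.

```python
def learn_slope(xs, ys, lr=0.01, epochs=500):
    """Learn w in y = w * x from examples by gradient descent on the mean squared error."""
    w = 0.0  # start knowing nothing
    n = len(xs)
    for _ in range(epochs):
        # gradient of mean((w*x - y)^2) with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad  # move w a little in the direction that reduces the error
    return w


# "Experience": examples produced by a hidden rule (y = 3x) the program is never told.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]
w = learn_slope(xs, ys)
print(f"learned w = {w:.3f}")
print("prediction for x = 10:", round(w * 10, 2))
```

After training, the learned `w` is used to make a decision (a prediction) for an input it has never seen, which mirrors the gather-knowledge-then-decide idea above.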

What is Deep Learning (DL) or Deep Neural Network (DNN)?
-          Deep Learning (DL) is part of the broader family of machine learning and AI methods; it emulates the way human beings gain certain types of knowledge. Traditional machine learning algorithms are often linear, whereas deep learning algorithms are stacked in a hierarchy of increasing complexity and abstraction. Because this process loosely mimics a system of human neurons, deep learning is sometimes referred to as deep neural learning or deep neural networking (DNN). Let me explain the concept with an example below:
-          A baby starts learning what a cat is (and is not) by pointing at objects and saying the word "cat". The parent guides him by saying, "Yes, that is a cat," or, "No, that is not a cat." As the baby keeps pointing at objects, he becomes more aware of the features that all cats have. What the baby does, without knowing it, is clarify a complex abstraction (the concept of a cat) by building a hierarchy in which each level of abstraction is created with knowledge gained from the preceding level. A machine follows a more or less similar approach. Each algorithm in the hierarchy applies a nonlinear transformation to its input and uses what it learns to create a statistical model as output. Iterations continue until the output reaches an acceptable level of accuracy. The number of processing layers the data must pass through is what inspired the label "deep".
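A toy illustration of "layers, each applying a nonlinear transformation to the output of the previous one" (not from the original post). It computes XOR, which no single linear layer can represent because XOR is not linearly separable. Here the weights were picked by hand to keep the example small; in real deep learning they are found by the training iterations described above.

```python
def relu(v):
    """Nonlinear activation: negative values become 0."""
    return [max(0.0, x) for x in v]


def dense(inputs, weights, biases):
    """One layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def xor_network(x1, x2):
    # Layer 1 (hidden): two units, each followed by the nonlinear ReLU.
    h = relu(dense([x1, x2], weights=[[1, 1], [1, 1]], biases=[0, -1]))
    # Layer 2 (output): a linear combination of the features built by layer 1.
    (out,) = dense(h, weights=[[1, -2]], biases=[0])
    return out


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_network(a, b))
```

The hidden layer builds two intermediate features ("at least one input is on" and "both are on"), and the output layer combines them. That is the same level-by-level idea as the baby building the concept of a cat.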


What is Natural Language Processing (NLP)?
-          Natural Language Processing is the ability of a computer program to understand human language as it is spoken and written. NLP is also a component of AI. Developing NLP is challenging because computers have traditionally required humans to "speak" to them in a programming language or in unambiguous, highly structured, clear commands. Natural languages, on the other hand, are often ambiguous and vary in structure, dialect and regional usage, which makes them difficult to interpret.
-          Semantic analysis and Natural Language Processing can help machines automatically understand text. This supports larger goals such as translating information, extracting potentially valuable customer feedback, and turning insights from a tweet or a customer service log into business intelligence for customer support, corporate intelligence or knowledge management.

What is Computer Vision and Image Processing?
-          Computer vision is about giving the computer the ability to ‘see’ and ‘understand’ what it sees. In image processing you get an image as input and produce a processed image as output, whereas in computer vision you get an image (or video) as input and produce other quantitative data as output (e.g. geometric information about the objects in question). Computer vision tries to do what the human brain does with the input from the retina: understanding, predicting and detecting certain things. For example, given an input image, a computer vision system can classify the objects in it (cars, humans, trains, etc.) as a human does. There are many other applications, but this is just to give you a basic idea.
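A toy contrast between the two, not from the original post, using a tiny grayscale image stored as nested lists. `threshold` is image processing (image in, image out). `bright_object_box` is a very small computer vision step (image in, numbers out): it reports where the bright object is. Both function names and the threshold of 128 are illustrative choices.

```python
def threshold(image, level=128):
    """Image processing: image in, image out (each pixel becomes 0 or 255)."""
    return [[255 if p >= level else 0 for p in row] for row in image]


def bright_object_box(image, level=128):
    """Computer vision (tiny): image in, numbers out.

    Returns the bounding box (x, y, width, height) of all pixels >= level, or None.
    """
    coords = [(x, y) for y, row in enumerate(image) for x, p in enumerate(row) if p >= level]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


# A tiny 4x5 grayscale "image" containing a bright 2x2 object.
image = [
    [10, 10, 10, 10, 10],
    [10, 200, 220, 10, 10],
    [10, 210, 230, 10, 10],
    [10, 10, 10, 10, 10],
]
print(threshold(image))          # another image
print(bright_object_box(image))  # (1, 1, 2, 2)
```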

This covers the basic concepts. Please comment below if you have any questions or feedback. Stay tuned for more detailed posts on machine learning algorithms.

Next topic: Supervised, Unsupervised, Semi-Supervised and Reinforcement Learning algorithms