Wednesday, February 2, 2022

Multi-tasking Neural Networks

 

    Hello all, I hope you are all keeping safe and doing well. I have been seeing a lot of discussion around ‘multi-task learning’, so I thought I would share my understanding of it. ‘Multi-tasking Neural Networks’ is a huge topic, and a lot of research is going on around it. I have read some of the papers, and here I will try to give you all a brief idea of multi-task learning. I will keep it as simple as possible; if you have any questions, feel free to drop a line in the comment section and I will get back to you soon.

Let’s start by answering the main question: “What is multi-task learning, or what is a multi-task neural network?”

              A multi-tasking neural network is a single model built to solve a variety of different tasks. Let me explain it with the image below:


If you look at the image, I have grouped the different kinds of tasks by color. There are 4 different kinds of tasks:

  • Moving objects: cars, pedestrians…
  • Static objects: lanes, bridges…
  • Road signs: u-turns, left/right turns…
  • Traffic signals: red, green, yellow…

Now, the aim is to build a single model that does all these different tasks. To do that, we need to design an architecture (a neural network) that shares its initial layers but can still make predictions for each of the different tasks. Let’s put the architecture into a block diagram for better understanding.

In general, this is how a multi-tasking neural network looks: it has a common backbone and a separate head for each task. The backbone extracts features, and those features are then passed forward to the corresponding head for further processing.
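To make the backbone/head idea concrete, here is a minimal NumPy sketch of this kind of sharing: one shared layer computes the features once, and four small linear heads (one per task group from the image) read from them. The layer sizes, task names and random weights are hypothetical and there is no training; a real model would use a deep-learning framework and a convolutional backbone.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class MultiTaskNet:
    """Toy network: one shared backbone layer, one linear head per task.

    All sizes and task names are hypothetical, for illustration only.
    """

    def __init__(self, in_dim, hidden_dim, head_dims, seed=0):
        rng = np.random.default_rng(seed)
        # Shared backbone weights, used by every task
        self.W_backbone = rng.normal(0.0, 0.1, size=(in_dim, hidden_dim))
        # One small head per task, each with its own output size
        self.heads = {
            name: rng.normal(0.0, 0.1, size=(hidden_dim, out_dim))
            for name, out_dim in head_dims.items()
        }

    def forward(self, x):
        features = relu(x @ self.W_backbone)  # computed once, shared by all heads
        return {name: features @ W for name, W in self.heads.items()}


net = MultiTaskNet(
    in_dim=16,
    hidden_dim=32,
    head_dims={"moving_objects": 3, "static_objects": 2,
               "road_signs": 4, "traffic_signals": 3},
)
outputs = net.forward(np.ones((5, 16)))  # a batch of 5 dummy inputs
for task, out in outputs.items():
    print(task, out.shape)
```

Where a head "branches off" corresponds to how many layers sit inside the shared backbone; moving layers from the backbone into the heads trades shared computation for task-specific capacity.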

Now, there are many architectural decisions to take care of, such as how many layers the backbone should share, or where each head should branch off. In practice it takes a lot of trial and error to settle these layer choices and to find out how big the heads should be. Also, some tasks may help each other while others may hamper each other. So, before we build a multi-tasking network, it is important to understand the problem statement and group the tasks properly. A lot of research is going into finding which tasks can be put in a single network. One of the interesting papers is “Which Tasks Should Be Learned Together in Multi-task Learning?”; go ahead and read it for a better understanding.

Now let’s talk about the loss functions. Loss functions are always the most difficult part for me whenever I talk about neural networks. Let’s look at our problem: can we simply do

Loss = α1·L1 + α2·L2 + α3·L3 + α4·L4

where L1, L2, L3, L4 are the losses for the individual tasks?

The answer is NO, we cannot simply add up all the losses. Depending on the task, the losses can be on different scales; some tasks may have more data and some may have more noise. Also, if one loss is much larger than the others, it can dominate them. So, tuning the loss functions manually or choosing the weight of each loss would be a non-trivial job in itself. Thankfully, there are many research-based approaches that tune and regularize the loss functions automatically, depending on the tasks. One of the famous techniques for combining the losses is explained in this paper: “Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics”.
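The effect can be sketched in a few lines of plain Python. The losses below are made-up numbers on different scales. The second function is a simplified version of the uncertainty-weighting idea from the paper above: each task gets a log-variance s_i = log(σ_i²) and contributes 0.5·exp(-s_i)·L_i + 0.5·s_i (the paper's regression-style term; it uses a slightly different form for classification losses). In practice the s_i are trainable parameters learned together with the network; here we only plug in their closed-form optimum for fixed losses, s_i = log(L_i), to show how the learned weights balance the tasks.

```python
import math


def weighted_sum(losses, weights):
    """Fixed, hand-tuned weighting: a1*L1 + a2*L2 + ..."""
    return sum(w * l for w, l in zip(weights, losses))


def uncertainty_weighted(losses, log_vars):
    """Simplified uncertainty weighting with s_i = log(sigma_i^2).

    0.5 * exp(-s_i) * L_i shrinks the weight of large / noisy losses, while
    the 0.5 * s_i term stops s_i from growing without bound.
    """
    return sum(0.5 * math.exp(-s) * l + 0.5 * s for l, s in zip(losses, log_vars))


def best_log_vars(losses):
    # For fixed L, d/ds [0.5*exp(-s)*L + 0.5*s] = 0 gives s = log(L),
    # so each task's effective weight exp(-s) becomes 1 / L.
    return [math.log(l) for l in losses]


losses = [0.2, 5.0, 40.0, 1.0]  # hypothetical per-task losses
print(weighted_sum(losses, [1, 1, 1, 1]))  # dominated by the 40.0 term

s = best_log_vars(losses)
print([math.exp(-si) * l for si, l in zip(s, losses)])  # every task now contributes ~1.0
print(uncertainty_weighted(losses, s))
```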

Summary: Now let’s talk about the advantages and disadvantages of multi-tasking NNs and conclude today’s topic.

Multi-tasking NNs sound interesting and promising: they can do a variety of different tasks with a single model. In practice, however, they may not always be that helpful, because the tasks are tightly coupled: if we want to tune or change any layers for one task, we may have to redo the tests and re-tune all the other tasks trained together. On the other hand, if we train a separate model for each task, we can change whatever is required in one network without disturbing the other tasks.

So, to summarize: if we have enough data for each task and the tasks' features are quite similar, then we can use multi-task learning to build a single model. But if enough data is not available and we do not know much about the tasks, then it is better to train a separate model for each task.

I think that is enough for today. Feel free to hit me with lots of questions. Thanks and stay safe.

 

Thursday, December 30, 2021

Wish all of you a very Happy New Year 2022!

I am happy that 2021 is finally ending. It was the most difficult year of my life so far. I lost my only brother due to COVID-19 in July’21. Since then, it has been a never-ending fight with myself to accept this loss and move forward… The only hope is someday I will meet him again in the Heaven.


Professionally though, I had an awesome year. As a team, we learned new techniques every day, we experimented with different methods and enhanced our skills together. I cannot thank my team enough for supporting me like a real family when I needed it the most. I’m so grateful and look forward to scaling new heights with the team in 2022.


Hope everyone can move forward with humility to welcome the new year. Hope 2022 brings lots of success, joy and happiness to everyone… Wish all of you a very Happy New Year 2022!


Wednesday, November 4, 2020

Socket programming with Python (sending text messages and image files)

          Recently I was working with socket programming, and I was amazed to learn how two remotely placed machines can communicate with each other. With sockets we can not only exchange messages but also send/receive any kind of data, including images. So I thought of sharing this knowledge with you all.


          We will go directly to the implementation, as there are many descriptions of socket programming online. Today I will share two implementations: the first sends text messages between sockets, and the second sends an image from a server to a client machine via sockets. So let’s get started:

    1. Sending text messages between machines via sockets:
    Server side: In the server we have a bind() method, which binds the socket to a specific IP (or a local address if connected via LAN) and port, so that it can listen for incoming requests on that IP and port. The server also has a listen() method, which puts the server into listening mode so that it can receive incoming connections. Lastly, the server has accept(), which waits for a client and returns a new socket for that connection, and close(), which closes a socket once we are done. Please see the code below for an example. You can download the code from github
           Client side: The client code connects to the server machine at the specified IP/local address and prints the received message.
You need to run the server code first. Once you run it, it will print “Server started listening”; then run the client code, which will print “Hey! Welcome”. Here is the output from the terminals:
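Since the original code is only linked, here is a minimal sketch of the same exchange with Python's built-in socket module, assuming the server sends the greeting as soon as a client connects. To keep it self-contained and runnable on one machine, the server runs in a background thread on 127.0.0.1 and binds to port 0 so the OS picks a free port; across two machines you would run the server part on one (bound to its LAN IP and a fixed port) and run_client() on the other.

```python
import socket
import threading

HOST = "127.0.0.1"  # assumption: client and server on the same machine


def run_server(server_sock):
    conn, addr = server_sock.accept()  # wait for a client to connect
    with conn:
        conn.sendall(b"Hey! Welcome")  # greet the client
    server_sock.close()                # stop listening after one client


def run_client(port):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.connect((HOST, port))
        chunks = []
        while True:                    # read until the server closes
            chunk = client.recv(1024)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks).decode()


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind((HOST, 0))                 # port 0: let the OS choose a free port
server.listen(1)                       # switch to listening mode
port = server.getsockname()[1]
print("Server started listening")

worker = threading.Thread(target=run_server, args=(server,))
worker.start()
message = run_client(port)
worker.join()
print(message)
```

Reading in a loop until recv() returns an empty bytes object is safer than a single recv() call, because TCP does not guarantee that one send arrives in one piece.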

 
2. Sending images from the server machine to the client via sockets:
    We will use OpenCV to read the image data on the server and then pack the data with pickle to send it to the client machine. On the client machine we will unpack the data with pickle and then display it.

    Server code:
            The server code is similar to the code above, except for the part where we read the image with OpenCV and dump it with pickle.
    Then, in a while loop, we keep sending the data until all of it has been sent.
        After all the data is sent, we shut down and close the connection.
    Client code:
        On the client side we first receive the data size, then retrieve the data itself.

Then we convert the data for visualization and show it with OpenCV.
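The server and client steps above can be sketched as follows. This is an illustration under a few assumptions, not the post's exact code: a random NumPy array stands in for cv2.imread(...), the length prefix uses the struct format '!Q' (8 bytes, network byte order) instead of 'L', and both ends run in one process on localhost. recv_exact() is the loop that keeps receiving until the announced number of bytes has arrived. Only unpickle data from a sender you trust, since pickle.loads() can execute arbitrary code.

```python
import pickle
import socket
import struct
import threading

import numpy as np

HEADER = struct.Struct("!Q")  # 8-byte unsigned length, network byte order


def send_image(conn, image):
    payload = pickle.dumps(image)                      # serialise the array
    conn.sendall(HEADER.pack(len(payload)) + payload)  # size first, then data


def recv_exact(conn, n):
    """Keep calling recv() until exactly n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(min(65536, n - len(buf)))
        if not chunk:
            raise ConnectionError("connection closed before all data arrived")
        buf.extend(chunk)
    return bytes(buf)


def recv_image(conn):
    (size,) = HEADER.unpack(recv_exact(conn, HEADER.size))  # data size first
    return pickle.loads(recv_exact(conn, size))             # then the data


def serve_once(server_sock, image):
    conn, _ = server_sock.accept()
    with conn:
        send_image(conn, image)
        try:
            conn.shutdown(socket.SHUT_WR)  # tell the client we are done sending
        except OSError:
            pass                           # the client may already have closed
    server_sock.close()


# Stand-in for image = cv2.imread("test.jpg"): a random 480x640 BGR image.
image = np.random.default_rng(0).integers(0, 256, size=(480, 640, 3), dtype=np.uint8)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
address = server.getsockname()
worker = threading.Thread(target=serve_once, args=(server, image))
worker.start()

with socket.create_connection(address) as client:
    received = recv_image(client)
worker.join()
# On a real client you would now show it, e.g. cv2.imshow("received", received)
print(received.shape, received.dtype)
```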

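    Note:
While packing and unpacking data for sockets, be careful about the format. For example, for Linux -> Linux we can specify it as 'L' in the ‘struct.pack’ function, whereas when communicating with an R-Pi or Windows it should be '=L'. This is because plain 'L' uses the platform's native C unsigned long, which is 8 bytes on 64-bit Linux but 4 bytes on Windows and on 32-bit Raspberry Pi OS, whereas '=L' always uses the standard 4-byte size.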
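A quick way to see why the format matters is struct.calcsize(): plain 'L' follows the platform's C unsigned long, while a prefix such as '=' switches to the fixed standard size. The printed native size depends on the machine you run it on, and 921600 is just an example payload length (the byte size of a 640x480 3-channel image).

```python
import struct

# Native mode: size follows the C compiler of the running machine
# (8 on 64-bit Linux, 4 on Windows and 32-bit Raspberry Pi OS).
print("native 'L':", struct.calcsize("L"))

# Standard mode ('=', '<', '>', '!'): 'L' is always 4 bytes.
print("standard '=L':", struct.calcsize("=L"))

packed = struct.pack("=L", 921600)  # e.g. the byte length of a 640x480x3 image
print(len(packed), struct.unpack("=L", packed)[0])
```

'=' keeps the machine's native byte order; if the two ends could differ in endianness, '!L' (network byte order) pins that down as well.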

         You can download the whole code from github. Do share your comments and feedback below. Stay inside, stay safe and keep learning. Cheers.

Thursday, June 11, 2020

Text detection and localization with Tesseract OCR

Hello all, I hope you are doing well and keeping safe. I am writing a new blog post after a long break, still adjusting to the new lifestyle. I am not sure when a vaccine will come for #covid19 and we will get back to our normal life again. Anyway, let’s get started. Today we will learn how to detect and localize text in an image using Tesseract OCR.

Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License. Originally developed by Hewlett-Packard as proprietary software in the 1980s, it was released as open source in 2005 and development has been sponsored by Google since 2006.[wikipedia]



Text detection is the process of detecting and localizing where in an image text exists. In this blog, we will detect text and draw a bounding box wherever text is detected. Before we actually start coding, let’s learn how to install Tesseract on our system.
Step 1: How you install Tesseract 4 depends on which version of Ubuntu you have. If you have Ubuntu 18.04, it is super easy; just use this command:
              sudo apt install tesseract-ocr
But if you have an Ubuntu version lower than 18.04, follow the commands below:
              sudo add-apt-repository ppa:alex-p/tesseract-ocr
              sudo apt-get update
              sudo apt install tesseract-ocr
Once your installation is done, you can check the Tesseract version with:
              tesseract -v
This is what my terminal showed when I ran ‘tesseract -v’.

Step2: Once you have installed Tesseract, we need to install pytesseract, the Python wrapper that lets us call Tesseract from our Python code, along with Pillow (an imaging library that pytesseract uses) and imutils. Follow the commands below to install them:
            pip install pillow
            pip install pytesseract
            pip install imutils
Great!! now we can start coding with python for text detection.

Step3: Open a new Python file, name it whatever you want, and import the necessary packages.

Step4: Read the test image. By default OpenCV reads images in BGR format, but Tesseract expects RGB, so convert the image to RGB.

Step5: Now we detect the text with pytesseract’s ‘image_to_data’ function. We then need to post-process its output to draw the bounding boxes.

Step6: We walk through the detected text and get the bounding box coordinates, the text and the confidence with which it was detected. This part is called text localization.

Step7: We can put a threshold on the confidence to filter out weak detections.
Step8: Draw the bounding boxes and write the corresponding text on the original image.

Step9: Show the image. Congratulations!! You have detected text in an image.
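Steps 4 to 9 can be sketched as below, assuming pytesseract's image_to_data() with output_type=Output.DICT, which returns parallel lists keyed by 'text', 'conf', 'left', 'top', 'width' and 'height'. The confidence threshold of 60 and the drawing colours are illustrative choices, not values from the post. The filtering lives in its own function so it can be tried on a hand-made dictionary without Tesseract installed.

```python
def filter_detections(data, min_conf=60):
    """Turn image_to_data(..., output_type=Output.DICT) output into boxes.

    Returns a list of (text, (x, y, w, h), confidence) for non-empty text
    whose confidence is at least min_conf (threshold is an assumption).
    """
    results = []
    for i in range(len(data["text"])):
        text = data["text"][i].strip()
        conf = float(data["conf"][i])  # some versions return strings like '-1'
        if text and conf >= min_conf:  # drop empty entries and weak detections
            box = (data["left"][i], data["top"][i], data["width"][i], data["height"][i])
            results.append((text, box, conf))
    return results


def detect_and_draw(image_path, min_conf=60):
    # Imported here so filter_detections() works without OpenCV/Tesseract.
    import cv2
    import pytesseract
    from pytesseract import Output

    image = cv2.imread(image_path)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Tesseract expects RGB
    data = pytesseract.image_to_data(rgb, output_type=Output.DICT)
    for text, (x, y, w, h), conf in filter_detections(data, min_conf):
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(image, text, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 0, 255), 2)
    cv2.imshow("Text detection", image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    return image
```

For example, detect_and_draw("test.png") would display the annotated image (the file name is hypothetical).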

If you want the whole code with the test image, you can download it from here. Do let me know your feedback and comments below. Stay connected for more blog posts; till then, stay home and stay safe.




Saturday, December 28, 2019

A simple classifier to classify Cars and aeroplanes with CNN (Part 2: inference)


Hello there, I hope you are doing well. This is a follow-up to my earlier post on building a classifier with a CNN. In that post we learned how to collect the data, organize it and train a model for classification. In this post we will learn how to use the trained model to actually classify cars and planes. When I was starting to train CNNs, I had a difficult time learning how to use the model and actually see the results. All the articles and blogs I was following only talked about how to train the network; no one was actually talking about how to see the classification results. Enough talking, let’s start:

If you followed my previous post, the model file (model.h5) was created with 96% accuracy and saved in the models folder. Now we will use that model for inference.

Step1: We will start by importing the required libraries as we did for the training.
Step2: In the test.py code we specify where the model and the test images are, load the model and its weights, and specify the image size we are dealing with.

Step3: Now we define a prediction function that takes a test image as input and returns the prediction accordingly. As we have only two classes (cars and aeroplanes), the model’s output gives us the class probability. We read that probability and show the result.
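A minimal sketch of such a prediction function follows. It assumes TensorFlow 2.x, a model with a single sigmoid output (as the binary cross-entropy training in Part 1 suggests), training folders named cars and planes, and 150×150 inputs rescaled to [0, 1] during training. All of these are assumptions; adjust them to match your own training script. Keras assigns class indices in alphanumeric order of the folder names, so with those folders the output is the probability of planes.

```python
def label_from_probability(p, class_names=("cars", "planes"), threshold=0.5):
    """Turn one sigmoid output into a class name.

    Assumes class index 0 = 'cars' and 1 = 'planes' (alphanumeric folder
    order), so p is the predicted probability of 'planes'.
    """
    return class_names[1] if p >= threshold else class_names[0]


def predict_image(model_path, image_path, img_size=(150, 150)):
    """Load the saved model and classify one image (assumes TensorFlow 2.x)."""
    # Imported here so label_from_probability() can be used without TensorFlow.
    import numpy as np
    from tensorflow.keras.models import load_model
    from tensorflow.keras.utils import img_to_array, load_img

    model = load_model(model_path)
    # Same [0, 1] scaling as assumed for training
    image = img_to_array(load_img(image_path, target_size=img_size)) / 255.0
    probability = float(model.predict(np.expand_dims(image, axis=0))[0][0])
    return label_from_probability(probability), probability
```

For example, label, p = predict_image("models/model.h5", "test/car_01.jpg") (both paths hypothetical). Loading the model once and reusing it is faster if you classify many images.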
You can clone the whole project from github here. Do let me know if you have any feedback or suggestions. I hope you enjoyed coding with me. Wishing you all a very happy new year 2020 in advance.


Friday, December 20, 2019

Reading Images Frame by Frame from Saved Videos or a Camera Using OpenCV Python

One of my friends was asking about reading image frames from videos, so I thought a quick blog post might be very helpful for beginners. It is actually very easy; just follow the steps below:


Step 1. Installations

a. Install python

If you do not have Python on your system yet, please install it.

For linux: sudo apt-get update && sudo apt-get install python3.6

For windows: download the installation file from the Python website and follow the instructions

b. Install Opencv

For linux:

sudo pip3 install opencv-python

For Windows:

pip3 install opencv-python

Step2: Reading saved or camera video

- First, import OpenCV.

- Read the video either from the camera or from a saved video file.

- While frames are available, show each frame and save it. Finally, release the camera and close the window we used for display.

Hope you liked this post. I am posting the script below so that you can just copy and paste it. Leave your feedback below.

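import cv2

# If reading from a saved video, specify where the file is saved.
# Use a raw string (r'...') so backslashes such as '\t' are not treated as escapes.
#cap = cv2.VideoCapture(r'D:\project\spoof\classification\test_video\test.avi')

# If reading from a camera, the camera id is 0 here
cap = cv2.VideoCapture(0)

frame_count = 0
while True:
    ret, frame = cap.read()
    if not ret:  # no frame returned: end of the video or a camera error
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    cv2.imshow('frame', gray)
    # Save every frame under its own name instead of overwriting a single file
    cv2.imwrite('frame_{:05d}.jpg'.format(frame_count), gray)
    frame_count += 1
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()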


Saturday, October 12, 2019

A simple classifier to classify Cars and aeroplanes with CNN (Part 1)



Today we will build a simple supervised classifier with Keras to classify cars and aeroplanes. We will implement a simple CNN (Convolutional Neural Network) and train it on the dataset; once the model is generated, we can easily classify images. Here we are using only two classes, but you can classify as many classes as you want.

I am using a small dataset: 200 images of cars and 200 images of planes for training, and 50 images from each class for testing. You can use your own dataset with different classes if you want.
The images we provide contain a lot of information, or features. During training, the model learns the distinguishing features from the dataset, and with that information we can classify the images. So let’s get started.




We will divide this tutorial into two parts: in part 1 we will learn how to train on the dataset and generate the model file, and in part 2 we will use this model file for inference and real classification.

Step 1: Preparing Data-set

You can download the data from my github here: gitHub

Once you have the data-set we need to organize our data before we start actual training code. Below image shows the structure of folders for the data.


Photos of Cars:


Photos of Planes:

Step 2: Installing required Packages
  • Tensorflow > 1.13
  • Numpy
  • Keras


Step 3: Implementation 
First we will import the required libraries


Read the data-set


Initialize the CNN and write the layers: we will have one convolution layer followed by an activation function and a pooling layer, and then we repeat the same block.


Flattening, a dense layer, dropout and an activation at the end.

To compile the CNN we use the ‘rmsprop’ optimization method and the binary cross-entropy loss function.


Now we feed the images to the CNN we just created.


Finally, we train the classifier; the model will be saved as ‘model.h5’.
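Putting the layers described above together, a sketch with the current tf.keras API might look like the following. It assumes TensorFlow 2.x (rather than the TensorFlow 1.13 plus standalone Keras setup listed above), 150×150 RGB inputs, 32 filters per convolution, a 64-unit dense layer and dropout of 0.5; those sizes and the folder paths passed to train() are hypothetical. image_dataset_from_directory is swapped in for loading the train/test folders, with pixels rescaled to [0, 1].

```python
def build_model(img_size=(150, 150)):
    """Two conv/activation/pool blocks, then flatten, dense, dropout, sigmoid."""
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=img_size + (3,)),
        # Block 1: convolution -> activation -> pooling
        layers.Conv2D(32, (3, 3)),
        layers.Activation("relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        # Block 2: the same pattern repeated
        layers.Conv2D(32, (3, 3)),
        layers.Activation("relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        # Classifier: flatten -> dense -> dropout -> one sigmoid output
        layers.Flatten(),
        layers.Dense(64),
        layers.Activation("relu"),
        layers.Dropout(0.5),
        layers.Dense(1),
        layers.Activation("sigmoid"),
    ])
    model.compile(optimizer="rmsprop", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


def train(train_dir, test_dir, img_size=(150, 150), epochs=10,
          batch_size=16, model_path="model.h5"):
    import tensorflow as tf

    load = tf.keras.utils.image_dataset_from_directory
    # label_mode="binary": one float label per image; classes are inferred
    # from the sub-folder names in alphanumeric order (e.g. cars=0, planes=1).
    train_ds = load(train_dir, image_size=img_size,
                    batch_size=batch_size, label_mode="binary")
    test_ds = load(test_dir, image_size=img_size,
                   batch_size=batch_size, label_mode="binary")

    def scale(x, y):
        return x / 255.0, y  # rescale pixels to [0, 1]

    model = build_model(img_size)
    model.fit(train_ds.map(scale), validation_data=test_ds.map(scale),
              epochs=epochs)
    model.save(model_path)
    return model
```

For example, train("data/train", "data/test") (hypothetical folder names) would train for 10 epochs and write model.h5.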


If you run the above code the result should look something like this-



  After 10 epochs are done, the model will be saved with an accuracy of 96%.



You can download the whole code from my git repository here: gitHub

Stay tuned for the inference part. Do share your feedback in the comment section. See you soon. Regards.

Wednesday, May 29, 2019

Harry Potter's magical Cloak with opencv



Hi there, the last few blogs were hardcore machine learning and AI. Today let’s learn something interesting: let’s do some magic using computer vision. I hope you all know about Harry Potter’s ‘invisibility cloak’, the one he uses to become invisible. We will see how we can do the same magic trick with the help of computer vision. I will code in Python and use the OpenCV library.
Below is the video for your reference:




The algorithm is very simple: we separate the foreground from the background with segmentation and then remove the foreground object from every frame. We are using a red coloured cloth as the foreground object; you can use any other colour of your choice, but you will need to tweak the code accordingly. We will use the following steps:

  1. Import necessary libraries, create output video
  2. Capture and store the background for every frame.
  3. Detect the red coloured part in every frame.
  4. Segment out the red coloured part with a mask image.
  5. Generate the final magical output.

Step1: Import necessary libraries, create output video

Import the libraries. OpenCV is a library of programming functions mainly aimed at real-time computer vision. NumPy is the fundamental package for scientific computing with Python; in machine learning we deal with huge amounts of data, so we use NumPy arrays, which are much faster than plain Python lists. Then prepare the output video.



Step2: Capture and store the background for every frame

The main idea is to replace the red pixels of the current frame with background pixels to generate the invisibility effect. To do that, we first need to capture and store the background image (while the cloak is not in view) so that it is available for every frame.
The cap.read() method captures the current frame and stores it in the variable ‘background’. It also returns a Boolean, stored in ret: True if the frame was read correctly, else False.
We capture the background in a for loop so that we have several frames of the background, as averaging over multiple frames also reduces noise.
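A minimal sketch of the background capture with the averaging described above, assuming cap is an already opened cv2.VideoCapture and that the cloak is out of view during these frames. The frame count of 30 is an arbitrary choice.

```python
import numpy as np


def average_background(frames):
    """Pixel-wise mean of several frames, returned as an 8-bit image."""
    if not frames:
        raise ValueError("no frames were captured for the background")
    stack = np.stack(frames).astype(np.float32)
    return stack.mean(axis=0).round().astype(np.uint8)


def capture_background(cap, n_frames=30):
    """Read n_frames from an opened cv2.VideoCapture and average them."""
    frames = []
    for _ in range(n_frames):
        ret, frame = cap.read()  # ret is False if the frame could not be read
        if ret:
            frames.append(frame)
    return average_background(frames)
```

For example, background = capture_background(cv2.VideoCapture(0)). Converting to float32 before averaging avoids the overflow that adding uint8 frames directly would cause.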

Step3: Detect the red coloured part in every frame

Now we will focus on detecting the red part of the image. As RGB (Red-Green-Blue) values are highly sensitive to illumination, we will convert the frame (which OpenCV stores in BGR order) to the HSV (Hue – Saturation – Value) space. After converting the frame to HSV, we specify a colour range to detect the red colour.

In general, Hue values are distributed over a circle ranging from 0 to 360 degrees, but in OpenCV the range is 0-180. Red is represented by both 0-30 and 150-180. We use the ranges 0-10 and 170-180 to avoid detecting skin as red, and then combine the two masks with an OR operation. (In Python, + also works here because the two ranges never overlap, but cv2.bitwise_or or | states the intent more clearly.)
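The two hue ranges can be expressed as a single NumPy function; it produces the same 0/255 mask that two cv2.inRange() calls combined with an OR would. The saturation and value lower bounds (120 and 70) are assumed values that ignore dull and dark pixels; tune them for your lighting. The input is expected to be an HSV frame from cv2.cvtColor(frame, cv2.COLOR_BGR2HSV).

```python
import numpy as np


def red_mask(hsv, s_min=120, v_min=70):
    """255 where an OpenCV-style HSV pixel (H in 0-180) looks red, else 0.

    Equivalent to OR-ing two cv2.inRange() masks:
      (0, s_min, v_min)   .. (10, 255, 255)
      (170, s_min, v_min) .. (180, 255, 255)
    """
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    vivid = (s >= s_min) & (v >= v_min)  # ignore dull / dark pixels
    mask1 = vivid & (h <= 10)            # red at the low end of the hue circle
    mask2 = vivid & (h >= 170)           # red at the high end
    combined = mask1 | mask2             # bitwise OR of the two masks
    return combined.astype(np.uint8) * 255
```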

Step4: Segment out the red coloured part with a mask image

Now that we know from the mask image where the red part of the frame is, we use this mask to segment that part out of the whole frame. We apply a morphological opening and a dilation for that.

Step5: Generate the final magical output

Finally, we will replace the pixels of the detected red coloured region with corresponding pixel values of the static background, which we saved earlier and finally generate the output which creates the magical effect.
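Steps 4 and 5 can be sketched like this: clean_mask() applies the morphological opening and dilation with OpenCV (the 3×3 kernel and the iteration counts are assumed values), and apply_cloak() does the final replacement with np.where, which gives the same result as combining cv2.bitwise_and() on the mask and its inverse. frame, background and mask are assumed to have the same height and width.

```python
import numpy as np


def clean_mask(mask):
    """Opening removes small specks; dilation grows the cloak region a bit."""
    import cv2  # imported lazily; only needed for this step

    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=2)
    return cv2.dilate(mask, kernel, iterations=1)


def apply_cloak(frame, background, mask):
    """Show the stored background where mask is non-zero, the live frame elsewhere."""
    cloak = mask.astype(bool)[..., None]  # add a channel axis to broadcast over BGR
    return np.where(cloak, background, frame)
```

Per frame, the pieces would combine as output = apply_cloak(frame, background, clean_mask(mask)).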

So now you can create your own video with invisible cloak. You can download the running python code from here: full code

Hope you enjoyed the magical aspect of computer vision. Do let me know your feedback and suggestion in the comment below. Thank you


Saturday, April 27, 2019

Linear Regression Implementation with python


Hello all, I hope that from the last few posts you already have a good theoretical understanding of Machine Learning algorithms. Today, we will do a Simple Linear Regression implementation with Python. It won’t take much time, and I will try to explain every step in simple words.
It is called Simple Linear Regression because it considers only one feature of the input data to make the prediction. For example, here we will consider a housing price dataset. As it is simple regression, it only considers the size of a house to predict its price. Multiple regression, on the other hand, may consider several features to predict the price, such as locality, front/back facing house, etc. Below is the input data we will use for the prediction: house_size (x) is the input, ranging from 1k to 14k square meters, and price (y) is the price of the house, ranging from 300 to 1100 dollars.






A scatter plot of the housing data looks like this:


Now we must find a line that fits this scatter plot, known as the regression line, so that we can predict the house price for any given size (x). The equation of the regression line looks like this:

-          h(x_i) = B0 + B1*x_i

where h(x_i) represents the prediction for the i-th input x_i, and B0, B1 are the regression coefficients. To make predictions, we need to estimate the regression coefficients (B0, B1). For the implementation we follow the steps below:
  • Step1: Import the libraries. NumPy is the fundamental package for scientific computing with Python; in machine learning we deal with huge amounts of data, so we use NumPy arrays, which are much faster than plain Python lists. Matplotlib is a plotting library in Python; we will use it for visualization.

  • Step2: Take the means of house_size (x) and price (y). Then calculate the cross-deviation SS_xy = Σ x_i*y_i - n*mean(x)*mean(y) and the deviation about x, SS_xx = Σ x_i² - n*mean(x)².
  • Step3: Calculate the regression coefficients that minimize the sum of squared prediction errors: B1 = SS_xy / SS_xx and B0 = mean(y) - B1*mean(x) (explained in a previous blog here:  )
  • Step4: Plot the scattered points on the graph in red. The x-axis represents the size of the house (house_size) and the y-axis represents the price (figure above).

  • Step5: Compute the predicted values with the estimated coefficients and plot the resulting regression line in purple.

  • Step6: Lastly, write the main function and call it. The final output of the code is:

Estimated coefficients:
b_0 = 295.95147839272175 
b_1 = 57.31614859742229
                And the graph should look like this-


You can download the full code(linearRegression.py) from github here: source code
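For reference, steps 2 to 5 can be sketched in a few lines of NumPy. The toy data below is hypothetical (it lies exactly on y = 2 + 3x so the result is easy to check) and is not the housing dataset from the post, so it will not reproduce the coefficients printed above.

```python
import numpy as np


def estimate_coefficients(x, y):
    n = np.size(x)
    mean_x, mean_y = np.mean(x), np.mean(y)
    # Cross-deviation and deviation about x
    ss_xy = np.sum(y * x) - n * mean_y * mean_x
    ss_xx = np.sum(x * x) - n * mean_x * mean_x
    b_1 = ss_xy / ss_xx           # slope
    b_0 = mean_y - b_1 * mean_x   # intercept
    return b_0, b_1


def plot_regression_line(x, y, b):
    import matplotlib.pyplot as plt  # only needed for plotting

    plt.scatter(x, y, color="r", marker="o", s=30)  # data points in red
    plt.plot(x, b[0] + b[1] * x, color="purple")    # regression line
    plt.xlabel("house_size")
    plt.ylabel("price")
    plt.show()


x = np.array([1, 2, 3, 4, 5], dtype=float)  # hypothetical sizes
y = 2 + 3 * x                              # hypothetical prices
b = estimate_coefficients(x, y)
print("b_0 =", b[0], "b_1 =", b[1])
```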
Hope you enjoyed today’s post. Stay tuned for more python implementation. Do let me know your feedbacks and comments below.
I want to share some good news: my blog was featured in the top4 machine learning blogs; please look at number 19 here: https://blog.feedspot.com/machine_learning_blogs/

Next blog:Harry Potter's magical Cloak with opencv

Saturday, April 13, 2019

Top 10 interview questions in Machine Learning(ML) and Artificial Intelligence (AI)


Hello Folks, if you have read my previous three posts on Artificial Intelligence (AI), then congratulations, you have the basic knowledge of Machine Learning algorithms; if not, please read them. Today I would like to discuss some of the most commonly asked interview questions in the field of Machine Learning and AI, which should help you crack your Machine Learning interviews. Most of the basics are already covered; the rest we will learn here.
Let’s get started



  1. What is Gradient Descent?
-             Gradient descent is an optimization algorithm that minimizes a given (differentiable) function. It starts with an initial set of parameters and iteratively moves them in the direction of the negative gradient, towards the set of parameters that gives the minimum of that function (in general, a local minimum). It is a little difficult to visualize, so I will try to give an example with figures for better understanding.
-              In the above figure the blue dots are the actual house prices (y_Actual) corresponding to the house size, the green line is the predicted house price (y_Prediction), and the yellow dotted lines are the prediction errors (prediction error = y_Prediction - y_Actual). So, the aim is to improve the prediction by minimizing the prediction error (y_Prediction - y_Actual), usually through a cost such as the mean squared error. Gradient descent is the algorithm used to minimize this prediction error and optimize the function.
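A minimal sketch of gradient descent for the house-price line h(x) = B0 + B1*x, minimising the mean squared prediction error. The learning rate, iteration count and toy data (points exactly on y = 2 + 3x) are hypothetical choices; with a learning rate that is too large, the updates diverge instead of converging.

```python
import numpy as np


def gradient_descent(x, y, lr=0.05, n_iters=2000):
    """Fit y ~ b0 + b1*x by minimising the mean squared prediction error."""
    b0, b1 = 0.0, 0.0                 # initial parameters
    n = len(x)
    for _ in range(n_iters):
        error = (b0 + b1 * x) - y     # y_Prediction - y_Actual
        # Gradients of MSE = mean(error**2) with respect to b0 and b1
        grad_b0 = 2.0 / n * np.sum(error)
        grad_b1 = 2.0 / n * np.sum(error * x)
        # Step against the gradient
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1


x = np.arange(1.0, 6.0)   # hypothetical house sizes
y = 2 + 3 * x             # hypothetical prices on an exact line
b0, b1 = gradient_descent(x, y)
print(b0, b1)
```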


  1. What are the differences between Random forest and Gradient boosting? Or explain the difference between bagging and boosting algorithms.
The difference between Random Forest and Gradient boosting is as follows-
-              Random forest uses bagging: each tree is trained on a random bootstrap sample of the data. Gradient boosting uses boosting instead: each new tree is fit to the errors of the ensemble built so far (more precisely, to the negative gradient of the loss, which for squared error is simply the residuals), so it concentrates on the examples the previous trees got wrong.
-              Because all the trees in a random forest are built without any consideration of the other trees, it is incredibly easy to parallelize, which means it can train really quickly. Gradient boosting, in contrast, is iterative: each tree relies on the results of the trees before it in order to correct what they got wrong. So the trees in boosting cannot be built in parallel with each other, and it takes much longer to train.
-              The final prediction of a random forest is typically an unweighted average or an unweighted vote of its trees, while gradient boosting uses a weighted sum of the trees' outputs (each scaled by the learning rate).
-              Lastly, random forest is easier to tune, faster to train and harder to overfit, while gradient boosting is harder to tune, slower to train, and easier to overfit.
So, with that why would you go with gradient boosting? Well, the trade-off is that gradient boosting is typically more powerful and better-performing if tuned properly.
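To make the bagging vs. boosting contrast concrete, here is a small NumPy sketch using one-split regression trees (stumps) on hypothetical 1-D data. bagging() trains independent stumps on bootstrap samples and averages them (a real random forest also picks random feature subsets at each split, which a 1-D toy cannot show); gradient_boost() trains stumps one after another, each on the residuals of the ensemble so far, and adds them scaled by a learning rate. The tree count and learning rate are arbitrary.

```python
import numpy as np


def fit_stump(x, target):
    """Best single-split regression stump on 1-D x (least squared error)."""
    best = None
    for t in np.unique(x)[:-1]:                # candidate split points
        left = x <= t
        lv, rv = target[left].mean(), target[~left].mean()
        sse = np.sum((target - np.where(left, lv, rv)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:                           # all x equal: constant prediction
        m = target.mean()
        return lambda z: np.full(np.shape(z), m)
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)


def bagging(x, y, n_trees=50, seed=0):
    """Bagging: independent stumps on bootstrap samples, unweighted average."""
    rng = np.random.default_rng(seed)
    stumps = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(x), size=len(x))  # bootstrap sample
        stumps.append(fit_stump(x[idx], y[idx]))    # ignores the other trees
    return lambda z: sum(s(z) for s in stumps) / len(stumps)


def gradient_boost(x, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_trees):
        residual = y - pred              # negative gradient of 0.5 * (y - pred)**2
        stump = fit_stump(x, residual)   # needs the previous trees: sequential
        pred = pred + learning_rate * stump(x)
        stumps.append(stump)
    return lambda z: base + learning_rate * sum(s(z) for s in stumps)


x = np.arange(10.0)
y = np.where(x < 5, 1.0, 3.0)   # hypothetical toy data: a single step
rf, gb = bagging(x, y), gradient_boost(x, y)
print("bagging MSE: ", np.mean((rf(x) - y) ** 2))
print("boosting MSE:", np.mean((gb(x) - y) ** 2))
```

The loop in bagging() could be spread across processes because no iteration reads another's result; the loop in gradient_boost() cannot, since each stump is fit to residuals that depend on every earlier stump.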

  1. What are the benefits of using gradient boosting?
-              Well, it's one of the most powerful machine learning algorithms out there. Like random forest, it accepts various types of inputs, which makes it very flexible. It can be used for both classification and regression, and it outputs feature importances, which can be super useful. But it's not perfect. Some of its drawbacks are that it takes longer to train because its trees can't be built in parallel, it's more likely to overfit because it obsesses over the examples it got wrong, and it can get lost pursuing outliers that don't really represent the overall population.

  1. What are Bias and Variance?
-              The prediction error in machine learning algorithms can be divided into three types-
o             Bias error,
o             Variance error and
o             Irreducible error
-              The irreducible error cannot be reduced whatever algorithm is used. So, we will focus on the bias and variance errors.
-              Bias is the error introduced by the simplifying assumptions a model makes to make the target function easier to approximate. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
-              Variance is the amount that the estimate of the target function will change given different training data. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (over-fitting).

  1. What is Bias Variance trade-off?
-              The bias-variance trade-off is an important aspect of machine learning algorithms. To get an accurate model, an engineer’s goal is to reduce both bias and variance as much as possible. However, this is not feasible in real life. If a learning algorithm has low bias, it must be very flexible so that it can fit any data. But if the learning algorithm is too flexible, it will fit every training dataset closely, noise included, and increase the variance error. So, there should be a trade-off between bias and variance when selecting models of different flexibility or complexity and when selecting appropriate training sets, to minimize these sources of error!
  1. Explain the difference between L1 and L2 regularization
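-              L1 regularization adds the sum of the absolute weights (λΣ|w|) to the loss, while L2 adds the sum of the squared weights (λΣw²). L2 tends to spread the penalty across all the weights, shrinking every weight towards zero without making it exactly zero, while L1 is sparser: it drives many weights to exactly 0, so it also acts as a form of feature selection.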
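The difference shows up clearly in the simplest case: a single weight w pulled towards an unregularised value z, minimising 0.5*(w - z)² plus the penalty. Under that simplifying assumption (it also holds weight by weight for orthonormal features), L1 gives soft thresholding and L2 gives a proportional shrink. The weights and λ = 1 are hypothetical.

```python
import numpy as np


def l1_shrink(z, lam):
    """argmin_w 0.5*(w - z)**2 + lam*|w|  (soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def l2_shrink(z, lam):
    """argmin_w 0.5*(w - z)**2 + 0.5*lam*w**2."""
    return z / (1.0 + lam)


z = np.array([3.0, 0.5, -2.0, 0.1])  # hypothetical unregularised weights
print("L1:", l1_shrink(z, 1.0))      # small weights become exactly 0
print("L2:", l2_shrink(z, 1.0))      # every weight shrinks, none becomes 0
```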
  1. Difference between K-means and KNN (K Nearest Neighbors) algorithms
-              The main difference is that K-means clustering is unsupervised, whereas KNN is a supervised machine learning algorithm. This means KNN needs labelled data to make predictions, but K-means doesn't, as it is unsupervised.
-              K-means is used for clustering problems, whereas KNN is used for classification and regression problems.
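A toy NumPy sketch of both algorithms shows the difference in their inputs: knn_predict() needs the labels y_train, while kmeans() only sees the points. The data, k values and iteration count are hypothetical, and these are bare-bones versions rather than library implementations.

```python
import numpy as np


def knn_predict(X_train, y_train, x, k=3):
    """Supervised: majority vote among the labels of the k nearest points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]


def kmeans(X, k=2, n_iters=10, seed=0):
    """Unsupervised: no labels; alternately assign points and move centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = np.argmin(d, axis=1)
        # Move each centroid to the mean of its points (keep it if it has none)
        centroids = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
    return centroids, assign


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])                  # labels: used only by KNN
print(knn_predict(X, y, np.array([0.5, 0.5])))   # classify a new point
centroids, assign = kmeans(X, k=2)               # cluster without labels
print(assign)
```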

  1. What are different Machine Learning techniques?
-              The different types of machine learning algorithms are-
o   Supervised Machine Learning Algorithms,
o   Unsupervised Machine Learning Algorithms,
o   Semi-Supervised Machine Learning Algorithms,
o   Reinforcement Learning Algorithms
  1. Difference Between Supervised and Unsupervised machine learning algorithms
-              please read my previous post here :Supervised, Un-Supervised, Semi-Supervised machine and Reinforcement Learning algorithms

  1. What are most commonly used Machine Learning Algorithms?
-              please read my previous post here:10 Most Commonly Used Machine Learning Algorithms

If you have any other question which I can add to this list, please let me know in the comment section. Any feedback or suggestion is always welcome. Stay tuned for next post. Regards, Mostafiz

Next post:Linear Regression Implementation with python