Wednesday, February 2, 2022

Multi-tasking Neural Networks

 

    Hello all, I hope you are all keeping safe and doing well. I have been seeing a lot of discussion around 'multi-task learning', so I thought I would share my understanding of it. Multi-task neural networks are a huge topic, and a lot of research is going on around it. I have read some of the papers, and here I will try to give you a brief idea of what multi-task learning is. I will keep it as simple as possible. If you have any questions, feel free to drop a line in the comment section and I will get back to you soon.

Let's start by answering the main question: "What is multi-task learning, or what is a multi-task neural network?"

A multi-task neural network is a single model built to solve a variety of different tasks. Let me explain it with the image below:


If you look at the image, I have grouped the different kinds of tasks with different colors. There are 4 different kinds of tasks:

  • Moving objects: cars, pedestrians, …
  • Static objects: lanes, bridges, …
  • Road signs: U-turns, left/right turns, …
  • Traffic signals: red, green, yellow, …

Now, the aim is to build a single model that does all of these different tasks. To do that, we need to design an architecture (a neural network) that shares its initial layers across all tasks but can still make a separate prediction for each task. Let's put the architecture in a block diagram for better understanding.

In general, this is what a multi-task neural network looks like: it has a common backbone and a separate head for each task. The backbone extracts features, and those features are passed forward to each task's head, which does the task-specific processing and makes the prediction.
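To make the backbone/head idea concrete, here is a minimal sketch in plain Python (no deep-learning framework) of this kind of sharing: one shared dense layer acts as the backbone, and each of the four task groups gets its own dense output layer. The class name `MultiTaskNet`, the layer sizes and the per-task output sizes are all hypothetical. A real perception model would use a convolutional backbone with detection or segmentation heads, and would be trained in a framework such as PyTorch or TensorFlow.

```python
import random

def linear(weights, bias, x):
    """Dense layer: y[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

def init_layer(n_in, n_out, rng):
    """Random weights and zero bias for an n_in -> n_out dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

class MultiTaskNet:
    """Hypothetical toy model: one shared backbone layer, one output head per task."""

    def __init__(self, n_inputs, n_hidden, head_sizes, seed=0):
        rng = random.Random(seed)
        self.backbone = init_layer(n_inputs, n_hidden, rng)       # shared by all tasks
        self.heads = {task: init_layer(n_hidden, n_out, rng)      # one per task
                      for task, n_out in head_sizes.items()}

    def forward(self, x):
        # The backbone runs once; its features are reused by every head.
        features = relu(linear(*self.backbone, x))
        return {task: linear(*head, features) for task, head in self.heads.items()}

# Hypothetical number of outputs for each of the four task groups.
HEAD_SIZES = {"moving_objects": 3, "static_objects": 2,
              "road_signs": 4, "traffic_signals": 3}

net = MultiTaskNet(n_inputs=8, n_hidden=16, head_sizes=HEAD_SIZES)
outputs = net.forward([0.1] * 8)
for task, scores in outputs.items():
    print(task, len(scores))
```

One input goes in and four task-specific outputs come out; only the heads differ between tasks.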

Now, there are many architectural considerations to take care of, such as how many layers the backbone should share, or where each head should branch off. In practice it takes a lot of trial and error to settle these layer choices and to decide how big the heads should be. Also, some tasks may help each other while others may hurt each other's performance. So, before we build a multi-task network, it is important to understand the problem statement and group the tasks properly. A lot of research is going on into finding which tasks can be put in a single network. One interesting paper is "Which Tasks Should Be Learned Together in Multi-task Learning?"; go ahead and read it for a better understanding.
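One side of the "where should the heads branch off" question can be measured directly: model size. The sketch below uses a hypothetical helper, `count_params` (not from any library), for a stack of fully connected layers. Layers before `branch_after` are shared, layers after it are copied once per task, and every task has its own output layer, so branching at 0 is the same as training separate models. The widths and head sizes are made up, and the sketch says nothing about accuracy: whether a given branch point helps or hurts the tasks still has to be found by experiment.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def count_params(widths, head_sizes, branch_after):
    """Total parameters when the first `branch_after` trunk layers are shared.

    widths: [input, hidden1, ..., hiddenN] of the trunk.
    Trunk layers after the branch point are duplicated for every task,
    and each task also gets its own output layer.
    """
    layers = list(zip(widths[:-1], widths[1:]))
    shared = sum(dense_params(a, b) for a, b in layers[:branch_after])
    per_task_trunk = sum(dense_params(a, b) for a, b in layers[branch_after:])
    heads = sum(dense_params(widths[-1], n_out) for n_out in head_sizes)
    return shared + len(head_sizes) * per_task_trunk + heads

widths = [64, 128, 128, 64]   # hypothetical input size and hidden widths
head_sizes = [3, 2, 4, 3]     # hypothetical outputs for the four task groups
for k in range(len(widths)):
    print(k, count_params(widths, head_sizes, k))
# 0 133132   (nothing shared: same size as four separate models)
# 1 108172
# 2 58636
# 3 33868    (whole trunk shared, only the output layers are per task)
```

Sharing more layers makes the model much smaller, but it also ties the tasks more tightly together, which is exactly where the trial and error comes in.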

Now let's talk about the loss functions. For me, loss functions are always the most difficult part of working with neural networks. Coming back to our problem, can we simply do:

Loss = α1·L1 + α2·L2 + α3·L3 + α4·L4, where L1, L2, L3 and L4 are the losses of the four tasks?

The answer is NO: we cannot simply add up all the losses, or pick the weights α1…α4 by hand and keep them fixed. Depending on the task, the losses can be on different scales, some tasks have more data, and some have more noise. If one loss is much larger than the others, it can dominate them, and training will mostly optimize that one task. So tuning the loss weights manually is a non-trivial job in itself. Thankfully, there are many research-based approaches that balance the losses automatically, depending on the tasks. One famous technique for weighting the losses is explained in this paper: "Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics".
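To illustrate the idea behind the uncertainty-weighting paper, here is a minimal sketch in plain Python. It uses the form commonly written for regression tasks: each task i gets a learnable s_i = log(σ_i²), and the combined loss is Σ_i (0.5·exp(-s_i)·L_i + 0.5·s_i); the paper scales the classification case slightly differently. As a simplifying assumption, the per-task losses are held fixed and only the s_i are updated by gradient descent, so we can watch how the weights settle. The loss values and the learning rate are made up.

```python
import math

def uncertainty_weighted_loss(losses, log_vars):
    """Combined loss: sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i, with s_i = log(sigma_i^2)."""
    return sum(0.5 * math.exp(-s) * L + 0.5 * s for L, s in zip(losses, log_vars))

def grad_log_vars(losses, log_vars):
    """Partial derivative of the combined loss w.r.t. each s_i: -0.5 * exp(-s_i) * L_i + 0.5."""
    return [-0.5 * math.exp(-s) * L + 0.5 for L, s in zip(losses, log_vars)]

# Hypothetical per-task losses on very different scales.
# A plain sum (254.85) would be almost entirely the first task.
task_losses = [250.0, 3.0, 0.8, 0.05]
log_vars = [0.0] * len(task_losses)   # start with sigma_i^2 = 1, i.e. equal weights

lr = 0.5
for step in range(2000):
    grads = grad_log_vars(task_losses, log_vars)
    log_vars = [s - lr * g for s, g in zip(log_vars, grads)]

weights = [0.5 * math.exp(-s) for s in log_vars]   # effective weight on each L_i
print([round(w, 4) for w in weights])
print(round(uncertainty_weighted_loss(task_losses, log_vars), 4))
```

Setting the gradient to zero gives s_i = log(L_i), so each effective weight ends up at about 0.5 / L_i: the task with the huge loss is scaled down and the task with the tiny loss is scaled up, until every weighted term contributes about 0.5. In a real network the s_i would be trainable parameters updated together with the network weights, and the L_i would change at every step.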

Summary: Now let's talk about the advantages and disadvantages of multi-task NNs and wrap up today's topic.

Multi-task NNs sound interesting and promising: they can do a variety of different tasks with a single model. In practice, however, they are not always that helpful, because the tasks are tightly coupled. If we want to tune or change any layers for one task, we may have to redo the tests and re-tune all the other tasks trained together with it. On the other hand, if we train a separate model for each task, we can change whatever is required for a particular network without disturbing the other tasks.

So, to summarize: if we have enough data for each task and the tasks' features are quite similar, then we can use multi-task learning to build a single model. But if not enough data is available and we do not know much about the tasks, it is better to train a separate model for each task.

I think that is enough for today. Feel free to hit me with lots of questions. Thanks, and stay safe.