Hello all,
I hope you are all keeping safe and doing well. I am seeing a lot of discussion
around ‘multi-task learning’, so I thought I would share my understanding of it.
‘Multi-task neural networks’ is a huge topic with a lot of ongoing research.
I have read some of the papers, and here I will try to give
you a brief idea of multi-task learning. I will keep it as
simple as possible. If you have any questions, feel free to drop a line in the
comment section and I will get back to you soon.
Let’s start by answering the main question: “What is multi-task learning, or what is a
multi-task neural network?”
A multi-task neural network is a single model built to solve several
different tasks at once. Let me
explain it with the image below:
In the image, I have grouped the different kinds of tasks by color. There are 4 kinds of tasks:
- Moving objects: cars, pedestrians, …
- Static objects: lanes, bridges, …
- Road signs: U-turns, left/right turns, …
- Traffic signals: red, green, yellow, …
Now, the aim is to build a single model that does all of these different tasks.
To do that, we need an architecture, i.e. a neural network, that shares its
initial layers across tasks but can still make a separate prediction for each
task. Let’s put
the architecture into a block diagram for better understanding.
In general, this is how a multi-task neural network looks: it has a common
backbone and a separate head for each task. The backbone extracts features, and
those features are passed to the corresponding head for further
processing.
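The backbone-plus-heads idea can be sketched in a few lines of plain Python. This is only a toy illustration, not a real perception model: the class name `MultiTaskNet`, the helper functions, the layer sizes and the head output sizes are all assumptions made up for this example, and the weights are random rather than trained.

```python
import random

def linear(weights, bias, x):
    """Dense layer: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(values):
    return [max(0.0, v) for v in values]

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

class MultiTaskNet:
    """One shared backbone layer followed by one small head per task."""

    def __init__(self, n_inputs, n_features, head_sizes, seed=0):
        rng = random.Random(seed)
        # Shared by every task.
        self.backbone = init_layer(n_inputs, n_features, rng)
        # One independent output layer per task.
        self.heads = {task: init_layer(n_features, n_out, rng)
                      for task, n_out in head_sizes.items()}

    def forward(self, x):
        # Features are computed once and reused by every head.
        features = relu(linear(*self.backbone, x))
        return {task: linear(*head, features)
                for task, head in self.heads.items()}

# Illustrative output sizes for the four task groups from the image.
head_sizes = {
    "moving_objects": 2,
    "static_objects": 2,
    "road_signs": 3,
    "traffic_signals": 3,
}
net = MultiTaskNet(n_inputs=8, n_features=16, head_sizes=head_sizes)
outputs = net.forward([0.1] * 8)
for task, prediction in outputs.items():
    print(task, len(prediction))
```

The design point to notice is that `forward` runs the backbone only once per input, so the heads share both the computation and the learned features. In a real project you would build the same structure with a deep learning framework and a much deeper backbone.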
Now, there are many architectural decisions to make, such as
how many layers the backbone should share and where the heads should branch
off. In practice it takes a lot of trial and error to settle these
layer choices and to find out how big the heads should be. Also, some tasks
may help each other while others may hurt each other. So, before we build our
multi-task
network, it is important to understand the problem statement and group the tasks
properly. There is a lot of research on finding which tasks can be put
in a single network. One interesting paper is “Which Tasks Should Be Learned Together in Multi-task Learning?”; go ahead and read it for a better
understanding.
Now let’s talk about the loss functions. Loss functions are the most difficult
part for me whenever I talk about neural networks. Coming to our problem, can
we simply do:
Loss = α1L1 + α2L2 + α3L3 + α4L4 ? Here L1, L2, L3, L4 are the losses
for the individual tasks and α1 to α4 are their weights.
The answer is NO, we cannot simply sum up all the losses. Depending
on the task, losses can be on different scales; some tasks can have more data,
and some may have more noise. Also, if one loss is much larger than the others,
it can dominate
them. So, tuning the loss functions manually or hand-setting the weights of the
losses is a non-trivial job in itself. Thankfully, there are many research-based
approaches that tune and regularize the loss weights automatically,
depending on the tasks. One well-known technique for weighting
the losses is explained in this paper: “Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics”.
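To make the scale problem concrete, here is a small sketch in plain Python. The loss values are made up for illustration, and the function names are hypothetical. The second function is a simplified version of uncertainty-style weighting, assuming regression-like losses: each task gets a value s = log(σ²), and its loss is scaled by exp(-s) with s added as a penalty. In real training the s values would be learnable parameters updated by the optimizer; here they are set by hand just to show the effect.

```python
import math

def weighted_sum(losses, weights):
    """Fixed weights: Loss = sum of alpha_i * L_i."""
    return sum(weights[task] * losses[task] for task in losses)

def uncertainty_weighted(losses, log_vars):
    """Simplified uncertainty-style weighting with s_i = log(sigma_i^2).

    exp(-s_i) shrinks the loss of a task with large s_i, while the
    + s_i term penalises making s_i arbitrarily large.
    """
    return sum(0.5 * (math.exp(-log_vars[task]) * losses[task] + log_vars[task])
               for task in losses)

# Made-up per-task losses on very different scales.
losses = {
    "moving_objects": 250.0,
    "static_objects": 1.2,
    "road_signs": 0.8,
    "traffic_signals": 0.5,
}

# Equal weights: one task contributes almost the whole total.
equal_weights = {task: 1.0 for task in losses}
total = weighted_sum(losses, equal_weights)
share = losses["moving_objects"] / total
print(f"moving_objects share of the plain sum: {share:.1%}")

# With s = 0 for every task, the result is just half the plain sum.
zero_log_vars = {task: 0.0 for task in losses}
baseline = uncertainty_weighted(losses, zero_log_vars)

# For a fixed loss L, the term exp(-s) * L + s is smallest at s = log(L),
# which rescales the large loss to a size comparable with the others.
balanced_log_vars = dict(zero_log_vars, moving_objects=math.log(250.0))
balanced = uncertainty_weighted(losses, balanced_log_vars)
print(f"baseline={baseline:.2f}, balanced={balanced:.2f}")
```

With equal weights the first task makes up roughly 99% of the total, so its gradients would drown out the other three. Once its s value grows, the same task contributes a term of similar size to the rest, which is the behaviour the automatic weighting approaches aim for.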
Summary: Now let’s talk about the advantages and disadvantages
of multi-task NNs and conclude today’s topic.
Multi-task NNs sound interesting and promising: one model can handle
a variety of different tasks. But in practice this may not
always be that helpful, because the tasks are tightly coupled. If we want to
tune or change any layers for one task, we may have to retest and retune all the
other tasks trained with it. On the other hand, if we train a separate model for
each task, we can change whatever is required for a particular network without
disturbing the other tasks.
So, to summarize: if we have enough data for each task and
the tasks’ features are quite similar, then we can use multi-task learning to
build a single model. But if enough data is not available and we do not know
much about the tasks, then it is better to train a separate model for each task.
I think that is enough for today. Feel free to hit me with lots
of questions. Thanks and stay safe.