Saturday, April 13, 2019

Top 10 interview questions in Machine Learning(ML) and Artificial Intelligence (AI)


Hello Folks, if you read my previous three posts on Artificial Intelligence (AI), then congratulations you have the basic knowledge about the Machine Learning algorithms if not please read them. Today I would like to discuss about some most commonly used interview question on the field of Machine Learning and AI. Which would help you crack your interviews in machine Learning. Most of the basic things are already covered, remaining we will learn here.
Let’s get started



  1. What is Gradient Decent?
-             Gradient decent is an optimization algorithm which minimizes any given function. Given a function Gradient decent starts with an initial set of parameters and iteratively move to the set of parameters which provides minimum for that particular function. It is little difficult to visualize; I will try to give an example with figures for better understanding.
-              In the above figure the blue dots are actual house prices(y_Actual) corroding to the house size, green line is the predicted house price(y_Prediction) and yellow dotted lines are prediction errors (prediction error= y_Prediction - y_Actual). So, the aim is to improve the prediction by minimizing the prediction error (y_Predict - y_Actual). Gradient decent is the algorithm which is used to minimize the prediction error and optimize the function.


  1. What are the differences between Random forest and Gradient boosting? Or explain the difference between bagging and boosting algorithms.
The difference between Random Forest and Gradient boosting is as follows-
-              Randam forest uses bagging and samples randomly, whereas gradient boosting uses bagging, boosting samples with an increased weight on the ones that it got wrong previously
-              Because all the trees in random forest are built without any consideration for any of the other trees, this is incredibly easy to parallelize, which means that it can train really quick. Whereas gradient boosting is iterative in that it relies on the results of the tree before it, in order to apply a higher weight to the ones that the previous tree got incorrect. So, boosting can't be parallelized, and it takes much longer to train.
-              The final predictions for random forest are typically an unweighted average or an unweighted voting, while boosting uses a weighted voting. 
-              Lastly, random forest is easier to tune, faster to train and harder to overfit, while gradient boosting is harder to tune, slower to train, and easier to overfit.
So, with that why would you go with gradient boosting? Well, the trade-off is that gradient boosting is typically more powerful and better-performing if tuned properly.

  1. What are the benefits of using gradient boosting?
-              Well, it's one of the most powerful machine learning classifiers out there. It also accepts various types of inputs just like random forest, so it makes it very flexible. It can also be used for classification or regression, and the outputs feature importance which can be super useful. But it's not perfect. Some of the drawbacks are that it takes longer to train because it can't be parallelized, it's more likely to overfit because it obsesses over those ones that it got wrong, and it can get lost pursuing those outliers that don't really represent the overall population.

  1. What are Bias and Variance?
-              The prediction error in machine learning algorithms can be divided into three types-
o             Bias error,
o             Variance error and
o             Irreducible error
-              The irreducible error cannot be reduced whatever algorithm is used. So, we will focus into Bias and variance error.
-              Bias is the assumptions made by the model to make the target function easier to approximate. High bias can cause an algorithm to miss the relevant relations between features and target outputs (under fitting).
-              Variance is the amount that the estimate of the target function will change given different training data. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (over-fitting).

  1. What is Bias Variance trade-off?
-              The bias and variance trade-off is an import aspect of machine learning algorithm. To get an accurate model, an engineer’s goal is to reduce the bias and variance as much as possible. However, it is not feasible in real life. If a learning algorithm has low bias it must be very flexible so the it can fir any data. But if the learning algorithm is too flexible it will fit ever training data set and increase the variance error. So, there should be a trade-off between bias and variance when selecting models of different flexibility or complexity and in selecting appropriate training sets to minimize these sources of error!
  1. Explain the difference between L1 and L2 regularization
-              L2 regularization tends to spread error among all the terms, while L1 is more binary/sparser, with many variables either being assigned a 1 or 0 in weighting.
  1. Difference between KMEAN and KNN(K Nearest Neighbor) algorithms
-              The main difference is Kmean clustering is unsupervised whereas KNN is supervised machine learning algorithm. Which means KNN needs labelled data for prediction but Kmean doesn’t need as it is unsupervised.
-              Kmean is used for clustering problem whereas KNN is a supervised learning algorithm used for classification and regression problem.

  1. What are different Machine Learning techniques?
-              The different type of machine learning algorithms are-
o   Supervised Machine Learning Algorithms,
o   Unsupervised Machine Learning ALgoritms,
o   Semi-Supervised Machine Learning Algorithms,
o   Re-inforcement Machine Learning algorithms
  1. Difference Between Supervised and Unsupervised machine learning algorithms
-              please read my previous post here :Supervised, Un-Supervised, Semi-Supervised machine and Reinforcement Learning algorithms

  1. What are most commonly used Machine Learning Algorithms?
-              please read my previous post here:10 Most Commonly Used Machine Learning Algorithms

If you have any other question which I can add to this list, please let me know in the comment section. Any feedback or suggestion is always welcome. Stay tuned for next post. Regards, Mostafiz

Next post:Linear Regression Implementation with python





6 comments: