Technical: #ReinforcementLearning

Hello folks! If you have read my previous three posts on Artificial Intelligence (AI), congratulations: you already have the basic knowledge of Machine Learning algorithms. If not, please read them first. Today I would like to discuss some of the most commonly asked interview questions in the field of Machine Learning and AI, which should help you crack your machine learning interviews. Most of the basics are already covered in those posts; the rest we will learn here.

Let’s get started

What is Gradient Descent?

- Gradient descent is an optimization algorithm that minimizes a differentiable function. It starts from an initial set of parameters. It then repeatedly moves those parameters a small step in the direction of the negative gradient, which is the direction of steepest descent. It stops when it reaches a set of parameters that gives a minimum of that function; for non-convex functions, this may be a local minimum. It is a little difficult to visualize, so I will try to explain it with an example and a figure.
- In the above figure, the blue dots are the actual house prices (y_Actual) corresponding to the house sizes. The green line is the predicted house price (y_Prediction), and the yellow dotted lines are the prediction errors (prediction error = y_Prediction - y_Actual). The aim is to improve the prediction by minimizing these errors, usually through a cost function such as the mean of the squared errors. Gradient descent is the algorithm used to minimize this cost and so find the best parameters for the prediction line.
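To make the idea concrete, here is a minimal sketch of gradient descent. It fits a straight line (y_Prediction = w * size + b) by minimizing the mean squared prediction error. Several things are assumptions chosen for illustration and do not come from the figure: the toy house data, the learning rate of 0.05, the 2000 iterations, and the function name.

```python
# Toy data (an assumption, not the data in the figure): house size in
# 1000 sq ft vs price, generated exactly from price = 2 * size + 1.
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
prices = [3.0, 5.0, 7.0, 9.0, 11.0]


def gradient_descent(xs, ys, learning_rate=0.05, n_iters=2000):
    """Fit y_prediction = w * x + b by minimizing the mean squared error."""
    w, b = 0.0, 0.0  # initial set of parameters
    n = len(xs)
    for _ in range(n_iters):
        # prediction error = y_prediction - y_actual, for every house
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # partial derivatives of MSE = (1/n) * sum(error ** 2)
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # move the parameters a small step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = gradient_descent(sizes, prices)
print(f"w = {w:.3f}, b = {b:.3f}")
```

The toy prices were generated exactly from price = 2 * size + 1, so the learned parameters should end up very close to w = 2 and b = 1. If the learning rate is made too large, the updates overshoot. The parameters then diverge instead of converging.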

What are the differences between Random Forest and Gradient Boosting? Or: explain the difference between bagging and boosting algorithms.

The differences between Random Forest and Gradient Boosting are as follows:

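- Random forest uses bagging: each tree is trained on a random bootstrap sample of the data, with a random subset of features considered at each split. Gradient boosting uses boosting: trees are added one at a time, and each new tree focuses on the examples the previous trees got wrong. AdaBoost does this by increasing the weight of the misclassified samples. Gradient boosting does it by fitting each new tree to the residual errors of the current ensemble.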

- All the trees in a random forest are built independently of one another. This makes training easy to parallelize, so it can train really quickly. Gradient boosting, by contrast, is iterative: each tree relies on the results of the trees before it, so that it can correct the examples they got wrong. As a result, the trees can't be built in parallel, and training takes much longer.

- The final prediction of a random forest is typically an unweighted average (regression) or an unweighted majority vote (classification). Boosting instead combines its trees with weights, as a weighted vote or a weighted sum.

- Lastly, random forest is easier to tune, faster to train and harder to overfit, while gradient boosting is harder to tune, slower to train, and easier to overfit.

So why would you choose gradient boosting? The trade-off is that gradient boosting is typically more powerful and performs better when it is tuned properly.
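The structural difference between bagging and boosting can be sketched in plain Python with one-split regression trees ("stumps"). This is a simplified illustration under stated assumptions. It is not how libraries such as scikit-learn or XGBoost implement random forest or gradient boosting. The bagging part omits the random feature selection of a real random forest, because the toy data has only one feature. The helper names, toy data, and settings are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree: a threshold and the mean target on each side."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every threshold that leaves points on both sides
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - mean_l) ** 2 for y in left) + sum((y - mean_r) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, mean_l, mean_r)
    if best is None:  # all inputs identical: fall back to predicting the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, mean_l, mean_r = best
    return lambda x: mean_l if x <= t else mean_r


def bagging(xs, ys, n_trees=50, seed=0):
    """Bagging: independent stumps on bootstrap samples, combined by an unweighted average."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


def gradient_boosting(xs, ys, n_trees=50, learning_rate=0.5):
    """Boosting: each new stump is fit to the residuals of the ensemble built so far."""
    base = sum(ys) / len(ys)
    trees = []

    def predict(x):
        return base + learning_rate * sum(tree(x) for tree in trees)

    for _ in range(n_trees):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]  # depends on all earlier trees
        trees.append(fit_stump(xs, residuals))
    return predict


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [float(i) for i in range(20)]
ys = [float(i) for i in range(20)]  # toy target: y = x
bagged = bagging(xs, ys)
boosted = gradient_boosting(xs, ys)
print("bagging training MSE: ", round(mse(bagged, xs, ys), 3))
print("boosting training MSE:", round(mse(boosted, xs, ys), 3))
```

In bagging, every stump is fit to its own bootstrap sample without looking at the others, so that loop could run in parallel. In boosting, each stump is fit to the residuals left by all previous stumps, so the loop is inherently sequential. On this toy data the boosted model should reach a lower training error. Driving the training error down this aggressively also hints at why boosting is easier to overfit.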

What are the benefits of using gradient boosting?

- Well, it's one of the most powerful machine learning algorithms out there. Like random forest, it accepts various types of inputs, which makes it very flexible. It can be used for both classification and regression, and it outputs feature importances, which can be super useful. But it's not perfect. It takes longer to train because its trees can't be built in parallel. It's more likely to overfit because it obsesses over the examples it got wrong. And it can get lost chasing outliers that don't really represent the overall population.

What are Bias and Variance?

- The prediction error of a machine learning algorithm can be divided into three components:

o Bias error,

o Variance error and

o Irreducible error

- The irreducible error comes from noise in the data itself and cannot be reduced, whatever algorithm is used. So we will focus on the bias and variance errors.
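For squared-error loss, the three components can be written out explicitly. Assume the data are generated as y = f(x) + ε, where ε is noise with mean 0 and variance σ². Taking the expectation over different training sets and the noise, the expected error of a learned model f̂ at a point x decomposes as:

```latex
\mathbb{E}\left[\big(y - \hat{f}(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

No choice of model can remove the σ² term. Only the first two terms depend on the algorithm.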

- Bias is the error introduced by the simplifying assumptions a model makes to make the target function easier to approximate. High bias can cause an algorithm to miss the relevant relations between the features and the target outputs (underfitting).

- Variance is the amount by which the estimate of the target function would change if different training data were used. High variance can cause an algorithm to model the random noise in the training data rather than the intended outputs (overfitting).

What is the Bias-Variance trade-off?

- The bias-variance trade-off is an important aspect of machine learning algorithms. To get an accurate model, an engineer’s goal is to reduce both bias and variance as much as possible. In practice, however, it is not feasible to minimize both at once. A learning algorithm with low bias must be very flexible so that it can fit any data. But if the algorithm is too flexible, it will fit each training data set too closely, noise included, and the variance error increases. So there has to be a trade-off between bias and variance to minimize the total error. This applies both when selecting models of different flexibility or complexity and when selecting appropriate training sets.
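The trade-off can be observed numerically. The sketch below repeatedly draws fresh noisy training sets. At a single test point, it measures the squared bias and the variance of k-nearest-neighbour regression for different values of k. Everything here is an assumption for illustration: the sine target, the noise level, the test point, and the helper names. A small k gives a very flexible model. With k = 20 every training point is used, so the model simply predicts the training mean, which is the least flexible choice.

```python
import math
import random


def true_f(x):
    return math.sin(2 * math.pi * x)  # assumed "true" target function


XS = [i / 19 for i in range(20)]  # fixed training inputs in [0, 1]
X0 = 0.25                         # test point, where true_f(X0) is 1
NOISE_SD = 0.3                    # assumed noise level (irreducible error)


def knn_predict(xs, ys, x0, k):
    """k-NN regression: average the targets of the k training points nearest to x0."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k


def bias_variance(k, n_trials=2000, seed=0):
    """Estimate bias^2 and variance of the prediction at X0 over many training sets."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trials):
        ys = [true_f(x) + rng.gauss(0.0, NOISE_SD) for x in XS]  # fresh noisy training set
        preds.append(knn_predict(XS, ys, X0, k))
    mean_pred = sum(preds) / n_trials
    bias_sq = (mean_pred - true_f(X0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / n_trials
    return bias_sq, variance


results = {}
for k in (1, 5, 20):
    b2, var = bias_variance(k)
    results[k] = (b2, var)
    print(f"k={k:2d}  bias^2={b2:.4f}  variance={var:.4f}  sum={b2 + var:.4f}")
```

With these settings, bias² should grow and variance should shrink as k increases. The intermediate k = 5 should give the smallest sum of the two.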

Explain the difference between L1 and L2 regularization

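- L1 regularization penalizes the sum of the absolute values of the weights, while L2 penalizes the sum of their squares. As a result, L1 tends to produce sparse models: it drives many weights to exactly 0, which effectively performs feature selection. L2 instead spreads the penalty across all the weights. It shrinks them all toward 0 but usually makes none of them exactly 0.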
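The sparsity difference is easiest to see in the simplest case. Assume orthonormal features, so that each weight can be solved on its own, and write z for the unregularized (least-squares) weight. Under that assumption, the L1-penalized solution is a soft-thresholding of z. The L2-penalized solution is a uniform rescaling of z. The penalty strength `lam` and the example weights are made up for illustration.

```python
def l1_solution(z, lam):
    """argmin over w of 0.5*(w - z)**2 + lam*|w|: soft-thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small weights are set to exactly zero


def l2_solution(z, lam):
    """argmin over w of 0.5*(w - z)**2 + 0.5*lam*w**2: uniform shrinkage."""
    return z / (1.0 + lam)


unregularized = [3.0, -0.4, 0.2, -2.5, 0.05]  # hypothetical least-squares weights
lam = 0.5
l1 = [l1_solution(z, lam) for z in unregularized]
l2 = [l2_solution(z, lam) for z in unregularized]
print("L1:", l1)
print("L2:", [round(w, 3) for w in l2])
```

With lam = 0.5, L1 sets every weight whose magnitude is below 0.5 to exactly 0 and shrinks the rest by 0.5. L2 divides every weight by the same factor of 1.5 and leaves none at 0.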

Difference between K-Means and KNN (K-Nearest Neighbors) algorithms

- The main difference is that K-means clustering is an unsupervised algorithm, whereas KNN is a supervised machine learning algorithm. This means KNN needs labelled data for prediction, but K-means doesn’t, because it is unsupervised.

- K-means is used for clustering problems, whereas KNN is used for classification and regression problems.
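The contrast shows up directly in the function signatures. K-means receives only points and discovers groups. KNN receives points together with their labels and uses them to predict the label of a new point. This is a minimal pure-Python sketch with hypothetical helper names and toy data. It uses a naive initialization (the first k points) rather than smarter schemes such as k-means++.

```python
import math


def kmeans(points, k, n_iters=10):
    """Unsupervised: group unlabeled points into k clusters and return the centroids."""
    centroids = list(points[:k])  # naive initialization: the first k points
    for _ in range(n_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assignment step: each point joins its nearest centroid
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
    return centroids


def knn_classify(train_points, train_labels, query, k=3):
    """Supervised: predict a label by majority vote among the k nearest labelled points."""
    order = sorted(range(len(train_points)), key=lambda i: math.dist(train_points[i], query))
    votes = [train_labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


points = [(0, 0), (5, 5), (0, 1), (5, 6), (1, 0), (6, 5), (1, 1), (6, 6)]
labels = ["A", "B", "A", "B", "A", "B", "A", "B"]  # only KNN gets these

print(kmeans(points, k=2))                       # [(0.5, 0.5), (5.5, 5.5)]
print(knn_classify(points, labels, (0.8, 0.2)))  # A
```

Note that `kmeans` only returns cluster centres; it never learns names such as "A" or "B", so giving meaning to the clusters is left to us. KNN predicts a label directly because it was trained on labelled data.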

What are different Machine Learning techniques?

- The different types of machine learning algorithms are:

o Supervised Machine Learning Algorithms,

o Unsupervised Machine Learning Algorithms,

o Semi-Supervised Machine Learning Algorithms,

o Reinforcement Learning Algorithms

For details, please read my previous post here: Supervised, Un-Supervised, Semi-Supervised machine and Reinforcement Learning algorithms

Difference Between Supervised and Unsupervised machine learning algorithms

- Please read my previous post here: Supervised, Un-Supervised, Semi-Supervised machine and Reinforcement Learning algorithms

What are most commonly used Machine Learning Algorithms?

- Please read my previous post here: 10 Most Commonly Used Machine Learning Algorithms

If you have any other questions that I could add to this list, please let me know in the comment section. Any feedback or suggestions are always welcome. Stay tuned for the next post. Regards, Mostafiz

Next post: Linear Regression Implementation with python

Congratulations!!! Now you know what Artificial Intelligence and Machine Learning are. Now we can go a little deeper and learn about different Machine Learning algorithms.

The most important question that comes to a beginner's mind is “which algorithm should I use?” The answer varies depending on many factors, including: the size, quality, and nature of the data; the available computational time; the urgency of the task; and what you want to do with the data. Even an experienced data scientist cannot tell which algorithm will perform best without trying different algorithms.

Before going into the algorithms, let's first see what supervised, unsupervised, semi-supervised, and reinforcement learning algorithms are.

What is Supervised Machine Learning?

- In supervised machine learning, the machine is given a set of data for which the correct outputs are already known, so it has an idea of the relation between the input and the output. Supervised learning problems are categorized into “regression” and “classification” problems. In regression problems, the machine predicts a numeric (continuous) output, whereas in classification problems the predicted output is discrete. For example, if the machine is given a dataset of house prices with respect to house size, it can predict the price of a house it hasn't seen (regression). Similarly, if some images are labelled as dogs and cats, the machine can learn the relation between the images and the labels and classify new images as dog or cat (classification). The image below may give you a better understanding.

What is Un-Supervised Machine Learning?

- Unsupervised learning allows the machine to approach a problem with little or no idea of what the output should look like. It can derive structure and relations from the given dataset and find hidden patterns or groupings in the data. It is mainly used for clustering, dimensionality reduction, feature learning, density estimation, etc. Example: K-means clustering.

What is Semi-Supervised Machine Learning?

- Semi-supervised machine learning algorithms fall somewhere between supervised and unsupervised learning. They use both labeled and unlabeled data for training: typically a small amount of labeled data and a large amount of unlabeled data. Systems that use this method can considerably improve learning accuracy compared with using the small labeled set alone. Semi-supervised learning is usually chosen when acquiring labeled data requires skilled and relevant human resources, such as experts annotating examples. Acquiring unlabeled data, by contrast, generally doesn’t require additional resources. Example: speech recognition.
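One simple way to use unlabeled data is self-training (pseudo-labelling). The model repeatedly takes the unlabeled example it is most confident about, gives it the predicted label, and adds it to the training set. The post does not name a specific method, so this is just one illustrative technique. It is sketched here with a 1-nearest-neighbour rule on one-dimensional toy data; the helper names and data are hypothetical.

```python
def self_train(labeled, unlabeled):
    """Self-training with a 1-nearest-neighbour rule on 1-D points.

    labeled: dict mapping point -> label (the small labeled set)
    unlabeled: list of points without labels (the large unlabeled set)
    """
    labeled = dict(labeled)
    remaining = list(unlabeled)
    while remaining:
        # most confident prediction = the unlabeled point closest to any labeled point
        best_point, best_label, best_dist = None, None, float("inf")
        for u in remaining:
            for p, lab in labeled.items():
                d = abs(u - p)
                if d < best_dist:
                    best_point, best_label, best_dist = u, lab, d
        labeled[best_point] = best_label  # accept the pseudo-label
        remaining.remove(best_point)
    return labeled


def nearest_label(labeled, x):
    """Plain supervised 1-NN prediction using only the given labeled points."""
    return min(labeled.items(), key=lambda item: abs(item[0] - x))[1]


seed_labels = {0.0: "cat", 6.0: "dog"}                             # small labeled set
unlabeled = [0.5 * i for i in range(1, 9)] + [6.5, 7.0, 7.5, 8.0]  # larger unlabeled set

result = self_train(seed_labels, unlabeled)
print("supervised only, x=4.0:", nearest_label(seed_labels, 4.0))  # dog
print("self-trained,    x=4.0:", result[4.0])                      # cat
```

With only the two labeled points, a plain 1-nearest-neighbour classifier would call 3.5 and 4.0 "dog", because they are closer to 6.0 than to 0.0. Self-training instead follows the dense chain of unlabeled points and labels the whole left cluster "cat". That is the kind of gain unlabeled data can provide when the clusters really correspond to the classes.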

What is Reinforcement Learning?

- Reinforcement learning is a learning method in which an agent interacts with its environment by producing actions and discovering errors or rewards. Trial-and-error search and delayed reward are its most relevant characteristics. This method allows machines and software agents to automatically determine the ideal behaviour within a specific context in order to maximize their performance. It is employed by various software systems and machines to find the best possible behaviour or path to take in a specific situation.
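A classic way to see trial and error with delayed reward is tabular Q-learning, one of many reinforcement learning algorithms. It is chosen here for illustration; the post does not prescribe one. In the hypothetical five-cell corridor below, the agent starts in cell 0 and receives a reward of 1 only when it reaches cell 4. It therefore has to learn which early moves eventually lead to that reward. The environment, the hyperparameters (alpha, gamma, epsilon), and the helper names are all assumptions.

```python
import random

N_STATES = 5        # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Deterministic environment: walls at both ends, reward only at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # delayed reward: nothing until the goal
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: trial and error
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted value of the best next action
            target = reward if done else reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            if done:
                break
    return q


q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # greedy action in each non-goal cell
```

After training, the greedy policy should choose +1 (move right) in every non-goal cell. The discount factor gamma makes a reward received sooner worth more, and that is what pushes the agent toward the shortest path.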

Okay, we are now all set to learn the most popular machine learning algorithms, so stay tuned for that. Please comment below with any suggestions and feedback.

Next post: 10 Most Commonly Used Machine Learning Algorithms

Technical

Saturday, April 13, 2019

Top 10 interview questions in Machine Learning(ML) and Artificial Intelligence (AI)

Tuesday, March 12, 2019

Supervised, Un-Supervised, Semi-Supervised machine and Reinforcement Learning algorithms