Posts

Boosting and the AdaBoost algorithm

Boosting is an ensemble technique that attempts to create a strong classifier from a number of weak classifiers. This is done by building a model from the training data, then creating a second model that attempts to correct the errors of the first. Models are added until the training set is predicted perfectly or a maximum number of models is reached. AdaBoost was the first really successful boosting algorithm developed for binary classification, and it is the best starting point for understanding boosting. Modern boosting methods build on AdaBoost, most notably stochastic gradient boosting machines. AdaBoost is used with short decision trees. After the first tree is created, the performance of the tree on each training instance is used to weight how much attention the next tree should pay to that instance. Training data that is hard to predict is given more weight, whereas easy-to-predict instances are given less weight. Mode...
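To make this concrete, here is a minimal sketch, assuming scikit-learn is installed and using a synthetic dataset, of boosting short decision trees (decision stumps) with AdaBoost:

```python
# Minimal AdaBoost sketch: boosting decision stumps on a synthetic
# binary classification problem (assumes scikit-learn is installed).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each weak learner is a short tree (depth 1); AdaBoost reweights the
# training instances after each tree is fit, so later trees focus on the
# instances the earlier trees got wrong.
model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # "base_estimator" in older scikit-learn versions
    n_estimators=50,
    learning_rate=1.0,
    random_state=42,
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```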

Random forest algorithm

Random Forest is one of the most popular and most powerful machine learning algorithms. It is a type of ensemble machine learning algorithm called Bootstrap Aggregation, or bagging. The bootstrap is a powerful statistical method for estimating a quantity from a data sample, such as a mean: you take lots of samples of your data, calculate the mean of each, then average all of your mean values to give you a better estimate of the true mean. In bagging, the same approach is used, but for estimating entire statistical models, most commonly decision trees. Multiple samples of your training data are taken, then a model is constructed for each sample. When you need to make a prediction for new data, each model makes a prediction and the predictions are averaged to give a better estimate of the true output value. Random forest is a tweak on this approach where decision trees are created so that rather than selecting optimal split points, suboptimal splits are made by intro...
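As a rough illustration, assuming scikit-learn and a synthetic dataset, a random forest of bagged trees might look like this:

```python
# Minimal bagging / random forest sketch (assumes scikit-learn is installed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=1)

# Each tree is trained on a bootstrap sample of the rows, and each split
# considers only a random subset of the features (max_features); that random
# restriction is the tweak that makes the individual trees less correlated.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=1)
scores = cross_val_score(forest, X, y, cv=5)
print("mean cross-validation accuracy:", scores.mean())
```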

Sequence prediction

Sequence prediction is different from other types of supervised learning problems. The sequence imposes an order on the observations that must be preserved when training models and making predictions. Some examples include:
- weather forecasting
- DNA sequence classification
- image captioning
- language translation
The recurrent connections in LSTMs add memory to the network and allow it to learn the ordered nature of observations within input sequences. In a sense, this capability unlocks sequence prediction for neural networks and deep learning. Sequential prediction is also a very fast pattern-matching algorithm. It has linear running time and, if implemented as a Folded Pattern Matcher, only needs to visit matching entries. During a search, it is able to find all matches along with their match size. Its speed makes it viable for use in a Virtual Guns array. This algorithm requires that inputs are discrete and capped, like in Symbolic Pattern Mat...
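As a small sketch of the LSTM side of sequence prediction, assuming TensorFlow/Keras is available and using a toy numeric series rather than a real task, a next-value predictor could look like this:

```python
# Minimal LSTM sequence-prediction sketch (assumes TensorFlow/Keras).
# Toy task: given 3 consecutive values of a simple series, predict the next one.
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential

series = np.arange(0, 100, dtype="float32") / 100.0      # toy increasing series
window = 3
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X.reshape((X.shape[0], window, 1))                    # (samples, timesteps, features)

model = Sequential([
    Input(shape=(window, 1)),
    LSTM(32),   # recurrent connections give the network memory of the order
    Dense(1),   # one output: the predicted next value in the sequence
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=50, verbose=0)

print(model.predict(np.array([0.10, 0.11, 0.12]).reshape(1, window, 1)))
```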

Decision tree

A decision tree is a schematic, tree-shaped diagram used to determine a course of action or show a statistical probability. Each branch of the decision tree represents a possible decision, occurrence or reaction. The tree is structured to show how and why one choice may lead to the next, with the branches indicating that the options are mutually exclusive. Decision trees are an important type of algorithm for predictive modeling in machine learning. The representation of the decision tree model is a binary tree; this is your binary tree from algorithms and data structures, nothing too fancy. Each node represents a single input variable (x) and a split point on that variable (assuming the variable is numeric). Decision trees are also fast at prediction time compared to many other algorithms, because making a prediction only requires walking the tree from the root down to a leaf. The leaf nodes of the tree contain an output variable (y) which is used to make a...
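To show how prediction is just a walk from the root to a leaf, here is a toy, hand-built tree; the features, split points and class labels below are made up purely for illustration:

```python
# A hand-built binary decision tree as nested dicts: each internal node tests
# one input variable against a split point; leaves hold the output value.
# (Hypothetical toy tree, just to show how prediction walks root -> leaf.)
tree = {
    "feature": 0, "split": 2.5,           # test: is x[0] < 2.5 ?
    "left":  {"value": "class_A"},        # leaf
    "right": {
        "feature": 1, "split": 7.0,       # test: is x[1] < 7.0 ?
        "left":  {"value": "class_B"},
        "right": {"value": "class_C"},
    },
}

def predict(node, x):
    # Walk down the tree until a leaf is reached; the number of comparisons
    # is at most the depth of the tree, which is why prediction is fast.
    while "value" not in node:
        branch = "left" if x[node["feature"]] < node["split"] else "right"
        node = node[branch]
    return node["value"]

print(predict(tree, [1.0, 9.0]))   # -> class_A
print(predict(tree, [4.0, 9.0]))   # -> class_C
```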

Dear YouTube,

Dear YouTube, Thank you. First of all, thank you very much for this open platform; I have learned a lot from it, and I am sure most people in this world have as well. The changes it has brought and the effects it has had on the modern world are truly praiseworthy. But today I want to address a very serious problem worldwide, and especially in developing countries. The problem is with YOUTUBE TRENDING. I don't know how to solve this, but I have a feeling that machine learning can do it. I suggest making thumbnail editing available only to verified users and not to others, because most of the thumbnails on today's videos do not even appear in the video itself; people are being fooled. I am ashamed to even look at the trending topic, that's how bad it is. Most of the people in my country use YouTube without logging in, especially children and teenagers. Think about the negative impact those vulgar thumbnails have on their minds. Secondly, the problem is with Video T...

More on Logistic Regression

Logistic Regression is a classification algorithm traditionally limited to only two-class classification problems. As the name suggests, it quantifies qualitative data, turning class membership into a quantitative (probability) estimate. If you have more than two classes, then the Linear Discriminant Analysis (LDA) algorithm is the preferred linear classification technique. The representation of LDA is pretty straightforward. It consists of statistical properties of your data, calculated for each class. For a single input variable this includes:
1. The mean value for each class.
2. The variance calculated across all classes.
Predictions are made by calculating a discriminant value for each class and making a prediction for the class with the largest value. The technique assumes that the data has a Gaussian distribution (bell curve), so it is a good idea to remove outliers from your data beforehand. It's a simple and powerful method for classification p...
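As a sketch of the single-variable case described above, using toy data, equal-sized classes, and a simple average of the per-class variances as the pooled variance, the discriminant calculation might look like this:

```python
# Minimal single-variable LDA sketch (numpy only): a per-class mean, a variance
# pooled across classes, and a discriminant score per class; predict the class
# with the largest score. A toy illustration, not a full implementation.
import numpy as np

x = np.array([1.2, 0.8, 1.0, 3.9, 4.2, 4.0])   # one input variable
y = np.array([0,   0,   0,   1,   1,   1])     # two classes

classes = np.unique(y)
means = {k: x[y == k].mean() for k in classes}
priors = {k: np.mean(y == k) for k in classes}
# variance pooled across classes (the "variance calculated across all classes")
var = np.mean([np.var(x[y == k], ddof=1) for k in classes])

def discriminant(value, k):
    # linear discriminant for a single Gaussian input variable
    return value * means[k] / var - means[k] ** 2 / (2 * var) + np.log(priors[k])

new_value = 3.5
scores = {k: discriminant(new_value, k) for k in classes}
print(max(scores, key=scores.get))   # the class with the largest discriminant value
```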

Logistic regression

Logistic regression is another technique borrowed by machine learning from the field of statistics. It is the go-to method for binary classification problems (problems with two class values). Logistic regression is like linear regression in that the goal is to find the values for the coefficients that weight each input variable. It is not only the cause (input) variables that may be qualitative in nature; in some cases the effect (output) variable is qualitative as well. For example, smoking x cigarettes per day, or for x years, may affect whether or not a person develops cancer symptoms. The logistic model is used to predict the probability that an event occurs by fitting data to a logistic curve. Unlike linear regression, the prediction for the output is transformed using a non-linear function called the logistic function. The logistic function looks like a big S and will transform any value into the range 0 to 1. This is useful because we can apply a rule to the output of the logistic funct...
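As a minimal sketch of that transform, where the intercept and coefficient below are made-up values rather than fitted ones, applying the logistic function and a 0.5 threshold looks like this:

```python
# Minimal sketch of the logistic (sigmoid) transform: a linear combination of
# the inputs is squashed into the range 0..1, then a threshold rule turns the
# probability into a class label. (Toy, assumed coefficients.)
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical "fitted" coefficients: intercept b0 and one weight b1
b0, b1 = -4.0, 0.9
x = 6.0                               # e.g. cigarettes smoked per day

probability = logistic(b0 + b1 * x)   # always between 0 and 1
label = 1 if probability >= 0.5 else 0
print(round(probability, 3), label)   # ~0.802 -> class 1
```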