
Showing posts from August, 2017

Boosting and the AdaBoost algorithm

Boosting is an ensemble technique that attempts to create a strong classifier from a number of weak classifiers. This is done by building a model from the training data, then creating a second model that attempts to correct the errors of the first model. Models are added until the training set is predicted perfectly or a maximum number of models is reached. AdaBoost was the first really successful boosting algorithm developed for binary classification, and it is the best starting point for understanding boosting. Modern boosting methods build on AdaBoost, most notably stochastic gradient boosting machines. AdaBoost is used with short decision trees. After the first tree is created, the performance of the tree on each training instance is used to weight how much attention the next tree should pay to each training instance. Training data that is hard to predict is given more weight, whereas easy-to-predict instances are given less weight. Mode...
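
As a minimal sketch of the idea, assuming scikit-learn is available, the snippet below fits an AdaBoost ensemble of short decision trees (depth-1 stumps, the classic weak learner) on a synthetic dataset; the dataset and hyperparameters are illustrative choices, not from the post.

```python
# Minimal AdaBoost sketch, assuming scikit-learn is installed.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative binary-classification data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Decision stumps (max_depth=1) are the classic AdaBoost weak learner;
# each new stump focuses on instances the previous ones got wrong.
model = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    random_state=42,
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```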

Random forest algorithm

Random forest is one of the most popular and most powerful machine learning algorithms. It is a type of ensemble machine learning algorithm called Bootstrap Aggregation, or bagging. The bootstrap is a powerful statistical method for estimating a quantity, such as a mean, from a data sample: you take many samples of your data, calculate the mean of each, then average all of those mean values to get a better estimate of the true mean. In bagging, the same approach is used, but for estimating entire statistical models, most commonly decision trees. Multiple samples of your training data are taken, and a model is constructed for each sample. When you need to make a prediction for new data, each model makes a prediction and the predictions are averaged to give a better estimate of the true output value. Random forest is a tweak on this approach in which decision trees are created so that, rather than selecting optimal split points, suboptimal splits are made by intro...
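
To make both halves of the idea concrete, here is a small sketch assuming NumPy and scikit-learn: first the bootstrap applied to a mean, then a random forest, where limiting each split to a random subset of features (max_features) is what distinguishes it from plain bagging. The data and settings are illustrative.

```python
# Bootstrap-of-a-mean and random forest sketch, assuming NumPy and scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=200)

# Bootstrap: resample with replacement many times, take each sample's mean,
# then average the means for a better estimate of the true mean.
boot_means = [rng.choice(data, size=len(data), replace=True).mean()
              for _ in range(1000)]
print("bootstrap estimate of the mean:", np.mean(boot_means))

# Random forest: bagged decision trees, with each split restricted to a
# random subset of features so the individual trees are decorrelated.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=42)
forest.fit(X, y)
print("training accuracy:", forest.score(X, y))
```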

Sequence prediction

Sequence prediction is different from other types of supervised learning problems. The sequence imposes an order on the observations that must be preserved when training models and making predictions. Some examples include:

- weather forecasting
- DNA sequence classification
- image captioning
- language translation

The recurrent connections in LSTMs add memory to the network and allow it to learn the ordered nature of observations within input sequences. In a sense, this capability unlocks sequence prediction for neural networks and deep learning. Sequential prediction is a very fast Pattern Matching algorithm. It has linear running time and, if implemented as a Folded Pattern Matcher, only needs to visit matching entries. During a search, it is able to find all matches along with their match size. Its speed makes it viable for use in a Virtual Guns array. This algorithm requires that inputs are discrete and capped, as in Symbolic Pattern Mat...
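
To show the LSTM side of this in code, here is a minimal sketch assuming TensorFlow/Keras is installed. The toy task (predict the next value of a scaled ramp from a window of three previous values) and all hyperparameters are illustrative, not from the post.

```python
# Minimal LSTM sequence-prediction sketch, assuming TensorFlow/Keras.
import numpy as np
from tensorflow import keras

# Toy data: for each window of 3 consecutive values, predict the next one.
seq = np.arange(0, 100, dtype="float32") / 100.0
X = np.array([seq[i:i + 3] for i in range(len(seq) - 3)])
y = seq[3:]
X = X.reshape((-1, 3, 1))  # (samples, timesteps, features)

model = keras.Sequential([
    keras.layers.Input(shape=(3, 1)),
    keras.layers.LSTM(32),   # recurrent layer: carries memory of the order
    keras.layers.Dense(1),   # regression head predicting the next value
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=200, verbose=0)

# Predict the value that follows 0.95, 0.96, 0.97 (expect roughly 0.98).
window = np.array([0.95, 0.96, 0.97], dtype="float32").reshape(1, 3, 1)
print(model.predict(window, verbose=0))
```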