Posts

Showing posts from June, 2017

Dear YouTube,

Dear YouTube, thank you. First of all, thank you very much for this open platform. I have learned a lot from it, and I am sure most people in the world have too. The changes YouTube has brought to the modern world are truly praiseworthy. But today I want to address a very serious problem worldwide, and especially in developing countries: YOUTUBE TRENDING. I don’t know how to solve it, but I have a feeling that machine learning can. I suggest restricting thumbnail editing to verified users only, because many thumbnails on today’s videos do not even appear in the video itself, and people are being fooled. I am ashamed even to look at the trending topics; that’s how bad it is. Most people in my country use YouTube without logging in, especially children and teenagers. Think about the negative impact those vulgar thumbnails have on their minds. Secondly, the problem is with Video T...

More on Logistic Regression

Logistic Regression is a classification algorithm traditionally limited to two-class classification problems. It expresses a qualitative (categorical) outcome in quantitative terms, as a probability. If you have more than two classes, Linear Discriminant Analysis (LDA) is the preferred linear classification technique. The representation of LDA is pretty straightforward. It consists of statistical properties of your data, calculated for each class. For a single input variable this includes:
1. The mean value for each class.
2. The variance calculated across all classes.
Predictions are made by calculating a discriminant value for each class and predicting the class with the largest value. The technique assumes that the data has a Gaussian distribution (bell curve), so it is a good idea to remove outliers from your data beforehand. It's a simple and powerful method for classification p...
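A minimal sketch of single-input LDA in plain Python, assuming the commonly used discriminant D_k(x) = x*mu_k/sigma^2 - mu_k^2/(2*sigma^2) + ln(P_k). Besides the per-class mean and the shared variance listed above, this form also uses each class's prior probability P_k, which is an assumed extra not named in the excerpt. The function names are hypothetical.

```python
import math
from collections import defaultdict

def fit_lda_1d(xs, ys):
    """Estimate per-class statistics for single-input LDA.

    Returns (means, variance, priors):
      means    - mean of x within each class
      variance - one pooled variance shared by all classes
      priors   - fraction of training rows in each class (assumed extra,
                 used by the common form of the discriminant)
    """
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    n, k = len(xs), len(groups)
    means = {c: sum(v) / len(v) for c, v in groups.items()}
    # Pooled variance: squared deviations from each point's own class mean,
    # summed over every class and divided by (n - k).
    ss = sum((x - means[c]) ** 2 for c, v in groups.items() for x in v)
    variance = ss / (n - k)
    priors = {c: len(v) / n for c, v in groups.items()}
    return means, variance, priors

def discriminant(x, mean, variance, prior):
    """D_k(x) = x*mu_k/sigma^2 - mu_k^2/(2*sigma^2) + ln(P_k)."""
    return x * mean / variance - mean ** 2 / (2 * variance) + math.log(prior)

def predict_lda_1d(x, means, variance, priors):
    """Predict the class whose discriminant value is largest."""
    return max(means, key=lambda c: discriminant(x, means[c], variance, priors[c]))
```

For example, with three made-up classes, fit_lda_1d([1, 2, 3, 7, 8, 9, 13, 14, 15], [0, 0, 0, 1, 1, 1, 2, 2, 2]) gives class means 2, 8 and 14 with a pooled variance of 1.0, and predict_lda_1d then assigns 8.1 to class 1 and 14.2 to class 2. Nothing in the code limits it to two classes.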

Logistic regression

Logistic regression is another technique borrowed by machine learning from the field of statistics. It is the go-to method for binary classification problems (problems with two class values). Logistic regression is like linear regression in that the goal is to find the values for the coefficients that weight each input variable. Not only the cause (input) variables can be qualitative in nature; in some cases the effect (output) variable is qualitative too. For example, smoking x cigarettes per day, or for x years, may affect whether or not a person shows cancer symptoms. The logistic model predicts the probability that an event occurs by fitting data to a logistic curve. Unlike linear regression, the prediction for the output is transformed using a non-linear function called the logistic function. The logistic function looks like a big S and will transform any value into the range 0 to 1. This is useful because we can apply a rule to the output of the logistic funct...
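A minimal sketch of the pieces described above for a single input x: the S-shaped logistic function, a threshold rule on its output, and a way to find the coefficients b0 and b1. Assumptions: the threshold is 0.5, the coefficients are fitted with plain stochastic gradient descent on the log-loss (one of several possible estimation methods; the excerpt does not say which one the post uses), and the function names, learning rate, epoch count and toy data are all hypothetical.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: squashes any real z into (0, 1)."""
    # Branch on the sign of z so math.exp never gets a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, b0, b1):
    """Probability of class 1 for input x: logistic(b0 + b1 * x)."""
    return logistic(b0 + b1 * x)

def predict_class(x, b0, b1, threshold=0.5):
    """Rule on the logistic output: class 1 if p >= threshold, else class 0.
    The 0.5 default is an assumption, not taken from the post."""
    return 1 if predict_proba(x, b0, b1) >= threshold else 0

def fit_sgd(xs, ys, lr=0.1, epochs=1000):
    """Hypothetical trainer: stochastic gradient descent on the log-loss.
    For one example, d(loss)/d(b0) = (p - y) and d(loss)/d(b1) = (p - y) * x."""
    b0 = b1 = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = predict_proba(x, b0, b1) - y
            b0 -= lr * error
            b1 -= lr * error * x
    return b0, b1

# Made-up toy data: cigarettes per day vs. symptom present (1) or absent (0).
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
b0, b1 = fit_sgd(xs, ys)
```

After fitting, predict_class(1, b0, b1) returns 0 and predict_class(9, b0, b1) returns 1, while predict_proba always stays strictly between 0 and 1, which is what makes the threshold rule meaningful.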

Linear regression

Linear regression is a linear model with a constant slope, unlike logistic regression. It is one of the most well-known and well-understood algorithms in statistics and machine learning. Predictive modeling is primarily concerned with minimizing the error of a model, or making the most accurate predictions possible, at the expense of explainability. We borrow, reuse and steal algorithms from many different fields, including statistics, and use them towards these ends. The representation of linear regression is an equation that describes a line that best fits the relationship between the input variables (x) and the output variable (y), found by choosing specific weightings for the input variables called coefficients (B). With a single input, the fitted model plots as a straight line. For example: y = B0 + B1 * x. We will predict y given the input x, and the goal of the linear regression learning algorithm is to find the values for the coefficients B0 and B1. Different techn...
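The excerpt stops before naming the estimation techniques. One common choice for the single-input model y = B0 + B1 * x is ordinary least squares, which has a closed-form solution: B1 = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2) and B0 = mean_y - B1 * mean_x. A minimal sketch, with hypothetical function names:

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = B0 + B1 * x (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # B1 = covariance(x, y) / variance(x); the 1/n factors cancel,
    # so plain sums of deviations are enough.
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    b1 = num / den
    # The least-squares line always passes through (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b0, b1

def predict(x, b0, b1):
    """Predict y for input x with the fitted line."""
    return b0 + b1 * x
```

For example, fit_simple_linear([1, 2, 3, 4, 5], [1, 3, 2, 3, 5]) gives B0 of about 0.4 and B1 of about 0.8 (up to floating-point rounding), so predict(6, b0, b1) is about 5.2.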

Bias and variance trade-off:

The bias-variance trade-off is easiest to understand with darts. I guess everybody is familiar with darts: you throw a small arrow at a circular board and try to stick it in the center, also called the bull’s eye. Now imagine you threw 10 such arrows. There are 4 possibilities:
1. All 10 arrows are scattered, but around the center: low bias, high variance.
2. All 10 arrows are close together, but away from the center: high bias, low variance.
3. All 10 arrows are away from the center and widely scattered: high bias, high variance.
4. All 10 arrows are near the center and tightly grouped: low bias, low variance, which is what we want.
Generally, parametric algorithms have a high bias, making them fast to learn and easier to understand but generally less flexible. In turn, they have lower predictive performance on complex problems that fail to meet...
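To make the dartboard picture concrete, here is a small sketch that scores a set of throws. Assumptions: throws are (x, y) tuples, the bull's eye sits at the origin unless another center is given, bias is measured as the distance from the average throw to the center, and variance as the mean squared distance of the throws from their own average. The function name is hypothetical.

```python
from math import hypot

def dart_stats(throws, center=(0.0, 0.0)):
    """Return (bias, variance, mse) for a list of (x, y) dart throws."""
    n = len(throws)
    # The average landing point of all throws.
    mx = sum(x for x, _ in throws) / n
    my = sum(y for _, y in throws) / n
    # Bias: how far the average throw lands from the bull's eye.
    bias = hypot(mx - center[0], my - center[1])
    # Variance: how spread out the throws are around their own average.
    variance = sum((x - mx) ** 2 + (y - my) ** 2 for x, y in throws) / n
    # Mean squared distance from the bull's eye; equals bias**2 + variance.
    mse = sum((x - center[0]) ** 2 + (y - center[1]) ** 2 for x, y in throws) / n
    return bias, variance, mse
```

For example, four throws at (1, 0), (-1, 0), (0, 1) and (0, -1) give bias 0 and variance 1 (case 1 above, scattered around the center), while shifting the same pattern to (4, 0), (2, 0), (3, 1) and (3, -1) keeps variance at 1 but raises bias to 3. In both cases the overall squared error splits exactly into bias squared plus variance, which is why reducing one at the expense of the other is a trade-off.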