"Computers can do only what they are told to do"
How many times have you heard the statement above? It basically means that computers can only do what programmers know how to do and can explain to the computer, step by step. However, surprisingly, this claim is NOT true.
News Flash: Computers can now perform tasks that programmers cannot explain to them. Computers can solve tasks that the programmers cannot even begin to comprehend.
Paradox?
The computer program "learns".
There is a story about Arthur Samuel, who in the 1950s created a checkers game that he could play by himself. But this got boring, so he decided to have the computer "learn" to play checkers. At first the computer was bad at choosing moves, and the process of evaluating the cost and benefit of each move was slow. Samuel soon decided to have another computer complete the evaluation process as well. This iterative, automated evaluation process eventually made the computer a better checkers player than its programmer.
Here the programmer is not JUST telling the computer what to do; the programmer is telling the computer to develop a capability.
So what do you need to use machine learning and Python successfully? Well, you need a little bit of math. But don't worry. Some machine learning practitioners will make you believe you have to understand complex equations. That is not the case. If you remember four important things, you should be able to do the rest easily. So here are the four areas that you should focus on:
1) Conceptual simplification of equations (some algebra)
2) A few concepts related to randomness and chance (probability)
3) Graphing data on a grid (geometry)
4) A compact notation to express some arithmetic (symbols)
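For instance, the summation symbol Σ is just a compact way of writing a loop over additions. Here is a tiny Python sketch of that correspondence (the numbers are made up for illustration):

```python
# The summation symbol (capital sigma) is just a compact way to write a loop.
# For example, the mean of x_1 ... x_n is (1/n) times the sum of the x_i values.
values = [2, 4, 6, 8]          # x_1 .. x_4

total = 0
for x in values:               # this loop is exactly what the sigma notation abbreviates
    total += x

mean = total / len(values)     # the same idea Python exposes as sum(values) / len(values)
print(total, mean)             # 20 2.5
```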
Let me introduce you to some of the names of machine learning methods you may have heard of.
OK...so let's take a sneak peek at some potential examples of machine learning applications for cardiovascular disease:
Develop heart disease within ten years: yes/no (random forest prediction models)
Develop heart disease of a given severity (None, Grade I, II, or III)
Show some level of a specific indicator for heart disease within ten years: % of coronary artery blockage
Python Modules to focus on:
itertools, collections, and functools
Python number crunching tools:
numpy, pandas, matplotlib, seaborn
sklearn (scikit-learn) implements many different learning algorithms and evaluation strategies and gives them a uniform interface.
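To give a feel for that uniform interface, here is a minimal sketch: two very different algorithms are trained and used with exactly the same calls (the built-in iris data is used purely as an example):

```python
# Minimal sketch of sklearn's uniform interface: every estimator uses .fit and .predict.
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)        # any labeled dataset will do

for model in (GaussianNB(), DecisionTreeClassifier()):
    model.fit(X, y)                       # same call, regardless of the algorithm
    print(type(model).__name__, model.predict(X[:3]))
```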
We cannot leave the discussion without a brief rundown of basic statistics, especially the normal distribution curve: it appears in error functions, in the binomial approximation (which produces the familiar bell shape), and in the Central Limit Theorem. We can summarize the properties of the Central Limit Theorem for sample means with the following statements:
1. We sample from any distribution with mean μ and standard deviation σ.
2. Provided that n is large (n ≥ 30 is a rule of thumb), the sampling distribution of the sample mean will be approximately normal, with mean μ and standard deviation equal to σ/√n.
3. If the population distribution itself is normal, the sampling distribution of the sample mean will be exactly normal for any sample size.
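A quick simulation makes statement 2 concrete. The sketch below, using numpy with arbitrary choices of distribution and sample size, draws many samples from a clearly non-normal population and checks that the sample means cluster around the population mean with spread close to σ/√n:

```python
# Illustration of the Central Limit Theorem with numpy (sample sizes are arbitrary).
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)   # skewed, clearly non-normal
mu, sigma = population.mean(), population.std()

n = 30                                                   # the "n >= 30" rule of thumb
sample_means = [rng.choice(population, size=n).mean() for _ in range(5_000)]

print("population mean:", round(mu, 3))
print("mean of sample means:", round(np.mean(sample_means), 3))   # close to mu
print("sigma / sqrt(n):", round(sigma / np.sqrt(n), 3))
print("std of sample means:", round(np.std(sample_means), 3))     # close to sigma/sqrt(n)
```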
If you want more information, you can watch talks by Raymond Hettinger, a core Python developer.
The main objective is to build and evaluate learning systems. How do you evaluate these systems?
First Step? Classification
Let me get you used to the jargon:
So, if there are two target classes, then this is considered BINARY CLASSIFICATION
Yes vs. No
True vs. False
Red vs. Black
What about more than 2 target classes?
Then this is considered a multiclass problem.
Already you can see the shift in jargon: what statistical science calls variables and categories shows up in machine learning as features and target classes, and a model that predicts between two classes is a binary classifier.
When practicing with sklearn, it is a good idea to start with the built-in Iris dataset (often called Fisher's Iris dataset after the renowned statistician Sir Ronald Fisher). The measurements were collected by Edgar Anderson, and the lengths and widths of the sepals and petals are used to predict which species of iris each flower is.
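Loading it takes one line; a quick peek (just a sketch for orientation) shows 150 measured flowers with 4 measurements each and three species to predict:

```python
# Peek at the classic Iris dataset that ships with sklearn.
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)          # (150, 4): 150 flowers, 4 measurements each
print(iris.feature_names)       # sepal/petal lengths and widths (cm)
print(iris.target_names)        # the three iris species to predict
```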
With the advent of standardized testing, even in many placement or entrance exams, whole educational institutions are built around passing the test. However, this approach leaves students unable to handle problems they have never seen before. We want students to do well in the real world. So instead, as instructors, we should have students practice problems and go over why they missed the ones they did.
Let's avoid teaching students so that they can only pass the test.
Instead, we can harness the power of train_test_split, a tool from the sklearn arsenal that segments the data (see the sketch after this list):
1. A portion of the data will be used to study and build understanding (training)
2. A portion of the data will be used to test what was learned (testing)
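Here is a minimal sketch of that split; the 25% test portion and the random_state value are arbitrary choices for illustration:

```python
# Hold back a portion of the data for testing with sklearn's train_test_split.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)   # 25% held out as the "exam"

print(len(X_train), "training examples")      # used to study and build understanding
print(len(X_test), "testing examples")        # never shown to the model during training
```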
Another important distinction is the target class (the outcome variable) versus the features (usually the explanatory variables).
The k-NN classification model and the Naive Bayes classifier are major supervised machine learning methods. k-NN was first proposed by Evelyn Fix and Joseph Hodges in 1951. The important thing to note is that each value you choose for k in k-NN gives you a completely separate model. k is considered a hyperparameter: something used to control the machine learning process itself, set before training rather than learned from the data.
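In code the distinction is visible: k is handed to the classifier up front, before it ever sees any data. A small sketch (the values of k and the iris data are just for illustration):

```python
# k is a hyperparameter: we choose it before training; it is not learned from the data.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

model_k1 = KNeighborsClassifier(n_neighbors=1).fit(X, y)   # one model...
model_k9 = KNeighborsClassifier(n_neighbors=9).fit(X, y)   # ...and a completely separate one
print(model_k1.predict(X[:1]), model_k9.predict(X[:1]))    # they can disagree on the same flower
```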
Also, each row is typically called an example, observation, or data point, while each column (not including the label/dependent variable) is often called a predictor, dimension, independent variable, or feature.
An unsupervised machine learning algorithm makes use of input data without any labels —in other words, no teacher (label) telling the child (computer) when it is right or when it has made a mistake so that it can self-correct.
Unlike supervised learning that tries to learn a function that will allow us to make predictions given some new unlabeled data, unsupervised learning tries to learn the basic structure of the data to give us more insight into the data.
The KNN algorithm hinges on the assumption that similar things exist in close proximity, and on that assumption being true enough for the algorithm to be useful. KNN captures the idea of similarity (sometimes called distance, proximity, or closeness) with some mathematics: calculating the distance between points (Euclidean versus Manhattan, for example) on a graph. This can then be extrapolated to geospatial learning.
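As a quick illustration of those two distance measures (the two points are made up):

```python
# Euclidean ("as the crow flies") versus Manhattan ("city block") distance.
import math

p = (1, 2)
q = (4, 6)

euclidean = math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)   # straight-line distance
manhattan = abs(p[0] - q[0]) + abs(p[1] - q[1])                  # sum of axis-wise distances

print(euclidean)   # 5.0
print(manhattan)   # 7
```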
To select the K that’s right for your data, we run the KNN algorithm several times with different values of K and choose the K that reduces the number of errors we encounter while maintaining the algorithm’s ability to accurately make predictions when it’s given data it hasn’t seen before.
Here are some things to keep in mind:
As we decrease the value of K toward 1, our predictions become less stable. Just think for a minute: if you select k=1 and that single nearest neighbor happens to be a doctor, while the rest of your neighbors happen to be lawyers, then with k=1 only that one doctor gets a vote, so the prediction is "doctor" even though lawyers dominate the neighborhood.
Conversely, as we increase the value of K, our predictions become more stable due to majority voting (more lawyers get a say), and thus more likely to be accurate (up to a certain point). Eventually, we begin to witness an increasing number of errors. It is at this point that we know we have pushed the value of K too far.
In cases where we are taking a majority vote (e.g. picking the mode in a classification problem) among labels, we usually make K an odd number to have a tiebreaker.
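One simple way to run that search, sketched with sklearn on held-out iris data (the candidate values of k are arbitrary):

```python
# Try several odd values of k and keep the one that predicts held-out data best.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for k in (1, 3, 5, 7, 9, 11):                        # odd values help break ties
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores[k] = model.score(X_test, y_test)           # accuracy on unseen data

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```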
Advantages
The algorithm is simple and easy to implement.
There’s no need to build a model, tune several parameters, or make additional assumptions.
The algorithm is versatile. It can be used for classification, regression, and search (as we will see in the next section).
Disadvantages
The algorithm gets significantly slower as the number of examples and/or predictors/independent variables increases.
Naive Bayes Model/Classifier
Think about two casino tables: one rigged and one regular.
The rigged table yields unfair results for both the cards and the dice. However, once you know which table you are at, the observation from the cards and the observation from the dice are independent of each other. So, in summary, the cards and the dice are conditionally independent given the table.
Main ideas of Naive Bayes:
-Treats the features as if they are conditionally independent of each other given the class
-This conditional independence lets us multiply the individual feature probabilities together (written out below)
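Written out, conditional independence given the class C means the joint likelihood of the features factors into a product, and Naive Bayes picks the class that maximizes that product times the class probability:

```latex
P(x_1, x_2, \dots, x_n \mid C) = P(x_1 \mid C)\, P(x_2 \mid C) \cdots P(x_n \mid C)
\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```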
The likelihood of the features for a given class can be calculated from the training data: from the training data we store the probabilities of seeing particular feature values within each target class. For testing, we look up the probabilities of the observed feature values for a potential target class and multiply them together, along with the overall class probability. We do that for each possible class, and then choose the class with the highest overall probability.
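Here is a small hand-rolled sketch of that recipe; the class priors, feature probabilities, and the spam/ham labels are invented purely to show the bookkeeping (a real implementation would also smooth zero counts and work with log-probabilities):

```python
# Tiny hand-rolled Naive Bayes for one test example (made-up training counts).
# Training step: store P(class) and P(feature value | class) from counted data.
class_prior = {"spam": 0.4, "ham": 0.6}
likelihood = {
    "spam": {"word_free": 0.30, "word_meeting": 0.05},
    "ham":  {"word_free": 0.05, "word_meeting": 0.40},
}

# Testing step: for each class, multiply its prior by the likelihood of each observed feature.
observed = ["word_free", "word_meeting"]
scores = {}
for cls, prior in class_prior.items():
    score = prior
    for feature in observed:
        score *= likelihood[cls][feature]   # the "naive" conditional independence at work
    scores[cls] = score

print(scores)                                # spam: 0.006, ham: 0.012
print("predicted class:", max(scores, key=scores.get))
```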
Uses:
Text classification: words in a sentence depend on each other and on their order. We don't pick words at random; we intentionally put the right words together, in the right order, to communicate ideas.
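Even so, treating the words as if they were independent works surprisingly well in practice. A minimal sklearn sketch, with tiny made-up sentences and labels:

```python
# Toy text classification with a bag-of-words Naive Bayes (example sentences are made up).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win free money now", "free prize waiting", "meeting at noon", "lunch with the team"]
labels = ["spam", "spam", "ham", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)                       # word counts stand in for features
print(model.predict(["free lunch meeting"]))   # a prediction even though word order is ignored
```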