Popular technology writers often conflate machine learning and artificial intelligence. Machine learning is certainly a crucial subset of artificial intelligence, but machine learning alone does not amount to artificial intelligence. AI is a much broader field that involves expression, not merely recognizing and mimicking human speech or identifying a person in a photo, which is really the current state of ML.
Deep learning projects use increasingly specialized ML techniques, and ensemble methods such as random forests combine many simple models in one method; this increased sophistication can easily be mistaken for intelligence. So, if we are to see deep learning demystified, we must examine the actual code involved. This we shall do later. For now, let’s explore some of the flagrant myths and realities at hand.
We need to put a filter on the hype that ML is AI, and we need to wrangle the misconceptions in the same way a data scientist wrangles data features, choosing the useful from the ambiguous. In this ByteScout article, we are going to technically evaluate the realistic limitations of ML. Right now, we have machine learning algorithms that can recognize patterns in enormous datasets. But developers often strain to munge or wrangle datasets into a meaningful and usable form before current algorithms can work.
Reality: Machine Learning is a Small Subset of AI
Applied machine learning is a straightforward mathematical endeavor, and there is no magic involved. Furthermore, ML is not intelligence. As early as 1959, Arthur Samuel described machine learning as giving a computer the ability to “learn without being explicitly programmed.” Yet, as we will see in this ByteScout article, machine learning code today requires an enormous amount of explicit coding and a mountain of training data. A program that can set a goal, search for data on its own, wrangle the data, find patterns, make inferences, reach conclusions, and take actions based on those conclusions is a fantasy that may or may not be realized in the future. For now, we will set aside the most popular misconceptions and look at the deep reality of deep learning.
One common misconception occurs when the output from a nonlinear ML algorithm supposedly shows insight that developers cannot understand. When the pattern turns out to be meaningful, there is a eureka reaction, and people claim that the algorithm is doing something beyond what it was actually programmed to do. The “hidden layers” in a neural network, it is claimed, are doing something magical, something intelligent. This is not even wrong; it is a misconception compounded by misunderstanding.
Reality: Hidden Layers are NOT Even Hidden
Hidden layers are explicitly programmed like all other aspects of the current ML methodology. Hidden layers in a neural network are not hidden at all. They are called hidden only because they sit between the input and the output, so their values do not appear in the training data, and because there are often too many parameters in the equations to be practically visualized by the human brain. But this is like calculations in 4-dimensional or n-dimensional space: the calculations prepared by human beings contain the intelligence, but the machines that execute those calculations are doing exactly as they are instructed. If there is a surprising result, it may be caused by a bug, or by a happy accident on the part of a coder casting about for a new method. The former is far more common than the latter. What we find in machine learning research today are explicit programming methods such as:
Even a cursory study of these methods reveals that there is no magic beneath the surface. In fact, there is only mathematics. Here we have a case of machine learning demystified.
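To make the point about hidden layers concrete, here is a minimal sketch of a forward pass through a tiny network, written with NumPy only. The layer sizes, the random weights, the sigmoid activation, and the input values are all assumptions chosen for illustration; they do not come from any particular framework or from a trained model. The sketch only shows that a "hidden" layer is a matrix multiplication followed by a simple function, and that its values can be printed like any other variable.

```python
import numpy as np

# Hypothetical tiny network: 3 inputs -> 4 hidden units -> 1 output.
# The weights are random here; in practice, training would set them.
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 4))   # input-to-hidden weights
b_hidden = np.zeros(4)               # hidden biases
W_out = rng.normal(size=(4, 1))      # hidden-to-output weights
b_out = np.zeros(1)                  # output bias


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x):
    """Return (hidden activations, output) for one input vector."""
    hidden = sigmoid(x @ W_hidden + b_hidden)   # the "hidden" layer
    output = sigmoid(hidden @ W_out + b_out)
    return hidden, output


x = np.array([0.5, -1.0, 2.0])       # made-up input features
hidden, output = forward(x)

# Nothing is concealed: every intermediate value can be printed.
print("hidden activations:", hidden)
print("output:", output)
```

Real networks have many more layers and parameters, which makes them hard to visualize, but each layer follows the same pattern of explicit arithmetic.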
Later in this feature, we will explore the random forest method in detail with an example. In reality, data scientists who use these methods see far more incongruous results than meaningful ones, often because data features were chosen incorrectly. We read fantastic stories of breakthroughs because they are more interesting than persistent bugs or data which cannot be wrangled. Why is this so poorly understood?
Reality: Developers Painstakingly Choose Features
Machine learning code does not even know what your data is about! It does not know, nor does it matter to the computations, whether you feed it blood serum data or ethnicity data. Many writers have made a quantum leap and now conflate “deep artificial neural networks with multiple hidden layers” and artificial intelligence. This sort of dramatic language beguiles bystanders, but it also contains little or no reality! A “neural network” is only a very loose analogy to the brain, but the analogy causes unfortunate confusion.
Blogs and chat about current AI widely exaggerate the capability of AI today. This appears to be an attempt to dramatize state-of-the-art AI, but it is really nothing more than a machine learning mystery. ML is good at pattern matching, but it is not even close to true AI. Inference is perhaps the most important deficit of current AI. In humans, inference arises from many faculties centralized by the amygdala.
Computers have nothing of the kind, nor are their “neurons” affected by judgment as arises from the frontal lobe of the human brain. A straightforward study of both the central nervous system and artificial neural networks leads inexorably to ML demystified!
Reality: Not Even Close
If you read carefully between the lines of Big Data stories, you will find people like Andrew Ng stating soberly that AI now is mostly old regression techniques repackaged with faster computers and larger datasets. The exaggeration that this is AI effectively deflates the ambition we might have about future developments in AI and lowers the standard of expectation: consider how quickly a conversational agent today fails to understand a remark involving humor, wordplay, or inference. We should hope for a lot more sophistication in the future!
Provided that you use a narrow list of words, stay in the context the program is coded for, and avoid wordplay or humor, you can get basic information from a conversational agent. But this is not intelligence. There is no creativity, no inference, no emotion; and yes, like it or not, emotion is a huge part of human intelligence, because emotional priorities influence reasoning.
Reality: CAs are Nowhere Near a Meaningful Conversation
A useful way to compare the limitations of machine learning with intelligence is a thought experiment about the “hello world” of machine learning code. We can train TensorFlow to recognize handwritten numbers to nearly 100% accuracy, but that program has no idea how to write the numbers it can recognize. To understand how TensorFlow would potentially write a number 5, have a look at Figure 1 below. In fact, writing is a completely different proposition, which illustrates that handwriting is like a fingerprint where the uniqueness of people is concerned; the machine learning code can distinguish a 7 from a 1, but it has nothing of its own to express. Recognition and expression are totally different processes. To see a demonstration in Python of how TensorFlow impressively recognizes handwritten numbers, please have a look at our tutorial on machine learning in Python. There you can see what ML can realistically accomplish.
Let’s go ahead and outline some of
Figure 1. This graph shows the pixel activations as TensorFlow represents a number 5 in its convolutional neural network in the MNIST hello world demo.
The comparison of the parameters in a regression model to human neurons arises partly from Hebbian theory. This is a generalized concept in which neurons adapt to a regular stimulus through repeated activation. But this is too vague to explain how humans learn. It resembles old voltage measurements like evoked potentials; the theory is inadequate to explain why a comedian’s non sequitur causes a riot of laughter. In order to fully realize how far we are from true AI today, let’s look at a list of the ordinary components of intelligence which algorithms cannot imitate:
Now that we have surveyed the state of the art in prosaic terms, let’s look at a machine learning model commonly used by developers today in order to have a concrete example of its value and application in the field. In this presentation, we will use the random forest model to predict the spending behavior of college students. The random forest method is exciting because it often provides strong accuracy among classical ML models on tabular data. The method evolved from CART, or Classification and Regression Trees: a random forest combines many such decision trees.
The goal of this project is to predict how spending behavior changes across several features of data, including education level and gender combined. The first step is to import the required Python libraries and read in the data file. Additional explanatory comments are contained in the code sample below. Please follow along with our example by pasting the code into your favorite Python IDE. After the code sample we will wrap up with the interpretation of results:
    # ML example using the Random Forest method, which is based on the
    # Classification and Regression Tree (CART) decision tree algorithm.

    # Load required libraries (We explain each later on...)
    from sklearn.metrics import confusion_matrix
    # train_test_split moved from sklearn.cross_validation to model_selection
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    import pandas as pd

    # Read the data file into a Pandas DataFrame
    # Change the path to your local file (raw string keeps the backslashes)
    df = pd.read_csv(r'C:\Users\Mark\Desktop\hospitals\example_data.csv')

    # Get familiar with the stats. This awesome Pandas method
    # auto calcs a lot of averages and other stats from the Frame
    print(df.describe())

    # Simple wrangling to clean up this data
    # Here we will delete unused column 3
    # The target data col shows the result 1
    # when student reduced spending last month...
    print(df.dtypes.index)
    df.drop('Unnamed: 3', axis=1, inplace=True)

    # Change 0,1 to A,B values for convenience
    df['ReducSpend'] = df['ReducSpend'].replace([0, 1], ['A', 'B'])

    # Have a look at the "wrangled" dataset with this Pandas
    # method to print the labels and first 5 data rows
    print(df.head())

    # Count Target Variable Values
    print(df.ReducSpend.value_counts())

    # Find % Values of Target Variable Levels
    # round(df.ReducSpend.value_counts()*100/len(df.axes[0]),2)

    # Next we divide the data set of 476 samples into
    # a training set and a testing set...
    Train, Test = train_test_split(df, test_size=0.3, random_state=170)

    # Have a look at the training set...
    # Train.head()

    # Here we divide the CSV file data into two parts
    # 1. a target array, and 2. a feature set
    # Keep Target and Independent Variable into different array
    Train_IndepVars = Train.values[:, 3:5]
    Train_TargetVar = Train.values[:, 5]

    # Now the arrays are set up to use the SKLearn Random Forest Model
    # which is RandomForestClassifier
    rf_model = RandomForestClassifier(max_depth=10, n_estimators=10)
    rf_model.fit(Train_IndepVars, Train_TargetVar)

    # Score the Random Forest Model with SKLearn's predict method.
    # To keep it simple, this example predicts on the same training rows,
    # which overstates accuracy; use the Test split for a real estimate.
    predictions = rf_model.predict(Train_IndepVars)
    print(predictions)

    # Now print a matrix to calc whether this model is fit for
    # general predictions:
    print("Confusion matrix ", confusion_matrix(Train_TargetVar, predictions))
As shown, many of the methods for measuring the accuracy are predefined methods of the SKLearn Python library. For a more in-depth discussion of machine learning techniques please have a look at our tutorial on the mechanics of data mining.
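The listing above scores the model on the same rows it was trained on, which tends to flatter the result. As a hedged sketch of the alternative, the following block scores a random forest on held-out rows. The CSV file from the example is not available here, so the block uses synthetic stand-in data: the two features, the "A"/"B" target, and the noise level are all assumptions made for illustration, not the article's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption): two numeric features and an
# "A"/"B" target that depends on them, plus some label noise.
rng = np.random.default_rng(170)
X = rng.normal(size=(476, 2))
signal = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=476)
y = np.where(signal > 0, "B", "A")

# Same split proportions as the example: 70% training, 30% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=170
)

rf_model = RandomForestClassifier(max_depth=10, n_estimators=10, random_state=0)
rf_model.fit(X_train, y_train)

# Compare accuracy on rows the model has seen with rows it has not.
train_acc = accuracy_score(y_train, rf_model.predict(X_train))
test_acc = accuracy_score(y_test, rf_model.predict(X_test))
test_cm = confusion_matrix(y_test, rf_model.predict(X_test), labels=["A", "B"])

print("accuracy on training rows:", train_acc)
print("accuracy on held-out rows:", test_acc)
print("held-out confusion matrix:\n", test_cm)
```

The held-out figure is the one that indicates whether the model is fit for general predictions; the training figure mostly measures how well the trees memorized their input.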
Applied machine learning code requires a veritable mountain of training data. Moreover, that data must be munged or wrangled, so that the machine learning code can use it. Machine learning code cannot know if a data feature is chosen incorrectly by the human programmer; it will go ahead and try to use the street number in your address to predict your chance of diabetes if the programmer does not catch the error. In fact, in my previous tutorial, I showed how SKLearn datasets provided for pre-trained models may not contain headers on the data columns! I noticed that in one research paper it would have been very easy to accidentally reverse the order of a blood serum test for diabetes. The computer does not know the difference! In fact, the computer does not even know that our data features are related to diabetes.
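The street-number scenario above is easy to reproduce. The following sketch, under stated assumptions, trains a random forest on a feature that has no relationship to the target at all: both the street numbers and the diabetes labels are randomly generated here and are entirely hypothetical. The point is only that the library accepts the data and returns a prediction without any complaint.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 200

# Hypothetical data: street numbers and diabetes labels drawn independently,
# so there is no real relationship between them.
street_number = rng.integers(1, 9999, size=n).reshape(-1, 1)
has_diabetes = rng.integers(0, 2, size=n)

# The model trains without any error about the meaningless feature.
model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(street_number, has_diabetes)

# It will still produce an answer for any address it is given.
prediction = model.predict([[221]])
print("prediction for street number 221:", prediction[0])
```

Catching this kind of mistake is the job of the person who chooses the features, because nothing in the computation depends on what the numbers mean.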
Right now machine learning algorithms can actually test software programs for failures through a combination of collecting visual data by watching an expert operator and then detecting changes when a variation occurs. However, automation testing cannot autonomously decide not to deploy the code changes which caused the error. The program still needs a human operator to reach the conclusion that the anticipated input and the actual input differ in a way that causes an actual failure of the system.
Similarly, new autopilot vehicles use a mountain of training data to analyze incoming data from cameras and GPS, and thus calculate appropriate speed and direction and decide how to avoid accidents.
Face recognition and natural language processing show results that resemble intelligence, but their methods are straightforward mathematics, most of which are derived from such concepts as linear regression. Many writers now make the quantum leap of describing machine learning methods as artificial intelligence. Whether because they don’t understand the true nature of ML or because they intend to make their subject more dramatic and interesting, this inaccuracy only increases misconception in an area that is already quite difficult to understand. In this ByteScout article, machine learning demystified is equally.
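To show how plain the underlying mathematics is, here is a minimal sketch of linear regression fitted two ways with NumPy: once in a single step by solving the normal equation, and once by repeated gradient descent steps on the mean squared error. The data, the true coefficients (2 and 3), the learning rate, and the iteration count are all made up for illustration.

```python
import numpy as np

# Made-up data that follows y = 2 + 3x plus a little noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=100)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# 1) One step: solve the normal equation (X^T X) theta = X^T y.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# 2) Many steps: gradient descent on the mean squared error.
theta_gd = np.zeros(2)
learning_rate = 0.5
for _ in range(5000):
    gradient = (2.0 / len(y)) * X.T @ (X @ theta_gd - y)
    theta_gd -= learning_rate * gradient

print("normal equation: ", theta_normal)
print("gradient descent:", theta_gd)
```

Both routes arrive at essentially the same intercept and slope, and neither involves anything beyond matrix arithmetic.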
Andrew Ng leads foundational courses in machine learning in which he explains in rigorous detail the mathematics of the most important fundamentals, such as logistic regression, gradient descent, and the use of a matrix transpose to minimize a cost function in a single step in the Octave programming language. His courses rectify misconceptions about ML by working from fundamentals up to libraries, whereas many popular MOOCs go straight to libraries like TensorFlow without covering the mechanics in depth; this leaves students with a void of assumptions, because TensorFlow and other ML frameworks like Apache MXNet do a lot of work beneath the surface which the student cannot tinker with in order to gain a clearer concept. Stay tuned to ByteScout for the latest developments in the newest techniques.