Did you know that new drugs are now designed by machine learning models? ML is pervasive in every industry because leading companies can no longer compete without exploiting the benefits it offers. ML algorithms can recognize patterns in data that people cannot see, and often cannot even understand. PayPal uses ML to identify potential fraud and estimate risk in customer behavior. Uber uses ML to estimate arrival times based on myriad conditions logged in prior trips. Google’s RankBrain uses a neural net to derive the intent of keywords in a search. The core of many industries is already dependent on ML for viability. What are the most important tools today in the implementation of machine learning? TensorFlow and Caffe power prolific implementations on the most important websites we use today, and both are open source! ML libraries distributed with pre-trained models are in high demand because they are usable right out of the package. Pre-trained models provide example apps and demos for developers to learn from and apply to their own apps. In this ByteScout article, we will explore the top 12 deep learning libraries and frameworks distributed with pre-trained models ready to use right now.

Let’s begin by clarifying several key terms. There is a lot of new terminology in this rapidly evolving field, which we will explain along the way. Within the broad scope of artificial intelligence, machine learning is a subset of methods that enables a system to learn from data without being explicitly programmed with fixed rules. Within the scope of machine learning, deep learning is a subset of increasingly specialized methods. One example is the deep belief network (DBN), a class of deep neural network that is currently used in applications such as new drug discovery. DBNs use both supervised and unsupervised training methods.

The algorithms we will explore here use supervised learning methods, in which the training data consists of labeled pairs: input features together with the correct output for each. These pairs provide correct examples for an inference function to learn from. The function will then classify or predict unlabeled features in future scenarios. The model’s performance on held-out test data, compared with its performance on the training data, is used to estimate the probable accuracy of its predictions. Popular open source deep learning libraries like TensorFlow include in their distribution a variety of training data for developers. We will look at the most important of these pre-trained models and how they are applied today.
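To make the train-then-test idea concrete, here is a minimal sketch in plain Python. The data, the single-feature threshold rule, and the function names are all invented for illustration; real deep learning models learn far richer functions, but the workflow of fitting on labeled pairs and then scoring on held-out pairs is the same.

```python
# Toy supervised learning: labeled (feature, label) pairs, a held-out test set,
# and an accuracy estimate. The data and the threshold rule are illustrative only.

def train_threshold(pairs):
    """Learn a cut-off halfway between the mean feature value of each class."""
    zeros = [x for x, label in pairs if label == 0]
    ones = [x for x, label in pairs if label == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def predict(threshold, x):
    """Classify a feature value as 1 if it lies above the learned threshold."""
    return 1 if x > threshold else 0

def accuracy(threshold, pairs):
    """Fraction of labeled pairs that the rule classifies correctly."""
    hits = sum(1 for x, label in pairs if predict(threshold, x) == label)
    return hits / len(pairs)

train_pairs = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
test_pairs = [(1.5, 0), (4.0, 0), (5.5, 0), (8.5, 1)]

threshold = train_threshold(train_pairs)
train_accuracy = accuracy(threshold, train_pairs)
test_accuracy = accuracy(threshold, test_pairs)
print(threshold, train_accuracy, test_accuracy)
```

In this toy run the rule fits the training pairs perfectly but misses one of the four test pairs, which is why accuracy is reported on data the model has not seen.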


The mathematical heart of deep learning is vector math and matrix operations like the “transpose of a matrix,” and TensorFlow is named with a special data type in mind: the tensor. Tensors generalize vectors and matrices to any number of dimensions. In ordinary programming terms, they are multidimensional indexed arrays. The parameters of the sets of linear equations that process the feature inputs of a “neural net” are the elements of the tensor. This is important because this math runs fastest on GPUs, which are optimized for matrix operations. Nvidia, the top video gaming chip maker, just announced an extraordinary new GPU which has 21 billion transistors and will increase the speed of TensorFlow and similar deep learning software by up to ninefold.
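As a rough illustration of tensors as indexed arrays, the sketch below uses nested Python lists. The helper names (shape, transpose, matmul) are our own and are not TensorFlow calls; a real library performs the same operations on optimized, GPU-backed arrays.

```python
def shape(tensor):
    """Return the dimensions of a rectangular nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return dims

def transpose(matrix):
    """Swap the rows and columns of a rank-2 nested list."""
    return [list(column) for column in zip(*matrix)]

def matmul(a, b):
    """Multiply two rank-2 nested lists (rows of a times columns of b)."""
    b_columns = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns]
            for row in a]

vector = [1, 2, 3]                            # rank 1: one index
matrix = [[1, 2, 3], [4, 5, 6]]               # rank 2: two indices
cube = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]   # rank 3: three indices

print(shape(vector), shape(matrix), shape(cube))
print(transpose(matrix))
print(matmul(matrix, transpose(matrix)))
```

Every output element of matmul is an independent sum of products, which is the kind of work a GPU can spread across many cores at once.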

Another crucial speed component in DL is parallelism. Although TensorFlow does not support the widely used OpenCL, it does support CUDA, the Nvidia API which gives developers direct access to the GPU’s virtual instruction set. The independent nodes in a TensorFlow graph are designed to run in parallel, ideally on multiple GPUs. Parallelism in TensorFlow can be farmed out across multiple machines and multiple GPUs to achieve optimal running speed. How do we access TensorFlow?

Scanning through the DL libraries in this review, a salient feature common to nearly all of them is the use of Python as an interface language. Python is easy to read, and as we will see in the example below, the code to invoke methods of these DL libraries is nearly transparent. TensorFlow is a library of DL methods which is written in C++ and Python, and Python is the natural choice for invoking its methods. Let’s have a look at the “hello world” of deep learning. One of the data sets distributed with TensorFlow is the classic MNIST training and test data, intended for developing a function to recognize handwritten numbers. After you pip install tensorflow, open a Python editor and enter the following code to download the MNIST data (the tutorials module imported here ships with TensorFlow 1.x releases):

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

The result is the mnist.train.images tensor, an array of dimension [55000, 784]: each of the 55000 images of numerals, originally 28 by 28 pixels, has been flattened into a row of 784 pixel values. As we will see later in the equivalent MXNet version, flattening discards the two-dimensional layout of the image, which does indeed affect the activation values of the function, but this is a good simple way to get started, and it’s useful to compare the results.
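The flattening step can be sketched in a few lines of plain Python. The blank image and the helper name flatten_image are made up for illustration; the MNIST loader performs the equivalent step for you.

```python
def flatten_image(image):
    """Concatenate the rows of a 2-D image into one flat list of pixels."""
    return [pixel for row in image for pixel in row]

# A blank 28 x 28 image with a single bright pixel at row 2, column 5.
image = [[0.0] * 28 for _ in range(28)]
image[2][5] = 1.0

flat = flatten_image(image)
print(len(flat))        # 28 * 28 = 784 values
print(flat.index(1.0))  # row 2, column 5 lands at 2 * 28 + 5 = 61
```

Note that the pixel directly below the bright one (row 3, column 5) ends up 28 positions away in the flat list, so vertical neighbors are no longer adjacent.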

Now you can import TensorFlow to begin working with the MNIST training data set in Python (TensorFlow offers a nice Python tutorial for MNIST). In the standard setup, we create placeholders for the inputs. Next, a Softmax regression with weighted sums and biases is implemented. Then we start a training session:

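import tensorflow as tf1  # written against the TensorFlow 1.x API
# Placeholder for a batch of flattened 28x28 images
x = tf1.placeholder(tf1.float32, [None, 784])
# Weights and biases of the softmax regression, initialized to zero
W = tf1.Variable(tf1.zeros([784, 10]))
b = tf1.Variable(tf1.zeros([10]))
y = tf1.nn.softmax(tf1.matmul(x, W) + b)
# Placeholder for the correct one-hot labels
y_ = tf1.placeholder(tf1.float32, [None, 10])
cross_entropy = tf1.reduce_mean(-tf1.reduce_sum(y_ * tf1.log(y), reduction_indices=[1]))
train_step = tf1.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf1.InteractiveSession()
# Variables must be initialized before the first training step
sess.run(tf1.global_variables_initializer())
# Train for 1000 steps on random batches of 100 images
for _ in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})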

TensorFlow is a complex library of functions, and a few of them are all we need to compute the accuracy of our model on the test data:

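correct_prediction = tf1.equal(tf1.argmax(y,1), tf1.argmax(y_,1))
# Accuracy is the fraction of test images whose predicted digit matches the label
accuracy = tf1.reduce_mean(tf1.cast(correct_prediction, tf1.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))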

Because we are operating an example bequeathed to us by the architects of TF itself, we know in advance what to expect: if you run the above tutorial on your own machine, you should get an accuracy of about 92%. With significant refinements to the method shown above, we can then improve the accuracy to greater than 99%.
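For readers who want to see what the softmax and cross-entropy expressions in the training code compute, here is a minimal sketch in plain Python for a single example. The scores and the three-class label are invented, and the function names are ours rather than TensorFlow’s.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot_label, probabilities):
    """Penalty that shrinks as the probability of the true class grows."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probabilities))

scores = [2.0, 1.0, 0.1]   # made-up evidence for three classes
label = [1, 0, 0]          # one-hot: the true class is class 0
probs = softmax(scores)
loss = cross_entropy(label, probs)
print(probs, loss)
```

Training adjusts the weights and biases so that, averaged over each batch, this loss goes down.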

TensorFlow at full speed is excellent at handwriting recognition in part because it supports a method called recurrent neural networks. RNNs can process input sequences of arbitrary length, which makes it possible to process handwriting in which letters are connected. TF also supports a variety of fast math processing like automatic differentiation. This is important because the regression problems in deep learning involve the optimization of a function, and as you know from calculus, optimization means finding the minimum or maximum of a function. The most basic regression analysis uses a form of optimization called “gradient descent,” which requires billions of differentiations over a training data set to find the minimum value of a cost function in the learning algorithm. For this reason, DL libraries need auto differentiation at the computational core.
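Gradient descent itself is simple enough to sketch in plain Python. The toy cost function below has one parameter and its derivative is written out by hand; this is an illustrative assumption, since a DL library would obtain the gradient through automatic differentiation over millions of parameters.

```python
def cost(w):
    """A toy cost function with its minimum at w = 3."""
    return (w - 3.0) ** 2

def cost_gradient(w):
    """Derivative of the cost, worked out by hand for this sketch."""
    return 2.0 * (w - 3.0)

def gradient_descent(start, learning_rate, steps):
    """Repeatedly step against the gradient to reduce the cost."""
    w = start
    for _ in range(steps):
        w -= learning_rate * cost_gradient(w)
    return w

w_min = gradient_descent(start=0.0, learning_rate=0.1, steps=100)
print(w_min, cost(w_min))
```

Each step moves w a fraction of the way toward the minimum, so the estimate settles very close to 3 after 100 steps.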

Clicking through the features list of TensorFlow, it is important to mention that convolutional nets and deep belief nets are also supported. CNNs are a variation on the multilayer perceptron in which small filters with shared weights slide across an image, which reduces the number of parameters and the processing time needed to detect objects in images. CNNs are also applied to natural language processing.
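The core operation of a convolutional net can be sketched in plain Python. The tiny image and the edge-detecting kernel below are hand-picked for illustration; in a real CNN the kernel values are learned during training, and the function name convolve2d is our own.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products.

    No padding and a stride of 1, so the output is smaller than the input.
    As in most deep learning libraries, the kernel is not flipped.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(k_rows) for j in range(k_cols)))
        output.append(row)
    return output

# A 4 x 4 image that is dark on the left and bright on the right.
image = [[0, 0, 1, 1] for _ in range(4)]
# A hand-picked kernel that responds to a dark-to-bright vertical edge.
edge_kernel = [[-1, 1], [-1, 1]]

feature_map = convolve2d(image, edge_kernel)
print(feature_map)
```

The same four kernel weights are reused at every position, and the feature map is non-zero only in the column where the edge sits.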

Talk about Convolution

Looking across the deep learning spectrum, it can be difficult to distinguish progenitor from progeny. Keras and Lasagne were both built on the Theano framework. And an original developer of Theano contributed to TensorFlow. Wolfram looks like an original provider of DL solutions on the surface, but it turns out that Wolfram’s DL framework is actually MXNet, which Apache developed as an open source DL framework. So in effect, Wolfram scooped up a free framework and now packages it inside the expensive, proprietary Wolfram language. Still, the company’s adoption of MXNet emphasizes several brilliant features of the Apache framework: MXNet was designed for maximum cross-platform flexibility, and it supports the greatest variety of interface languages of all the frameworks in our ByteScout survey, including Python, C++, R, JavaScript, Julia, Scala, Perl, MATLAB (Octave), and Go. The only other framework that comes close to this number of interface languages is Microsoft Cognitive Toolkit (formerly known as CNTK). Finally, MXNet is based on CXXNet, which in turn is based on “mshadow,” described in its GitHub repository as, “Matrix Shadow: Lightweight CPU/GPU Matrix and Tensor Template Library in C++/CUDA for (Deep) Machine Learning.” Getting to the core of these libraries reveals that they contain mix-and-match combinations of each other’s components! A lot of them were also designed by the same handful of people.


All of the DL libraries and frameworks mentioned in the previous paragraph are distributed with pre-trained models, but one of them stands out as superlative, and that is Apache’s MXNet. Although Apache does not have the power to push its influence like TensorFlow’s parent company (its current GitHub stats are a small fraction of TF’s), the methods of this library are undeniably brilliant. Distributed training figured among the primary design goals of MXNet’s architects; it was designed with industrial scale training in mind. MXNet’s architects also showed foresight with features such as the mix of symbolic and imperative programming. This last feature is expressed best in Gluon, MXNet’s high-level interface. MXNet also features mirroring, enabling developers to balance computation time and memory. Now, for the purpose of comparison, let’s write “Hello world” again, this time with MXNet. The full tutorial in MXNet’s documentation is remarkably similar to its TensorFlow counterpart, with several exotic terms tossed in to keep us curious:

# Load the MNIST training and test data:
import mxnet as mx1
mnist = mx1.test_utils.get_mnist()
# Initialize an iterator for training data and one for test data:
batch_size = 100
train_iter = mx1.io.NDArrayIter(mnist['train_data'], mnist['train_label'], batch_size, shuffle=True)
val_iter = mx1.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)
data = mx1.sym.var('data')
# Flatten data from 4-D shape to 2-D (batch_size, num_channel*width*height)
data = mx1.sym.flatten(data=data)

# The next step enables neural networks to classify inputs which are not linearly separable. 
# This example uses the ReLU activation function.
# The first fully-connected layer and activation function:
fc1  = mx1.sym.FullyConnected(data=data, num_hidden=128)
act1 = mx1.sym.Activation(data=fc1, act_type="relu")
# The second fully-connected layer and activation function:
fc2  = mx1.sym.FullyConnected(data=act1, num_hidden = 64)
act2 = mx1.sym.Activation(data=fc2, act_type="relu")

# MNIST’s 10 classes
fc3  = mx1.sym.FullyConnected(data=act2, num_hidden=10)
# Softmax cross entropy loss
mlp  = mx1.sym.SoftmaxOutput(data=fc3, name='softmax')

# Perform 10 passes through entire training data set:
import logging
logging.getLogger().setLevel(logging.DEBUG)  # logging to stdout
# create a trainable module on CPU
mlp_model = mx1.mod.Module(symbol=mlp, context=mx1.cpu())
mlp_model.fit(train_iter,  # train data
              eval_data=val_iter,  # validation data
              optimizer='sgd',  # use SGD to train
              optimizer_params={'learning_rate':0.1},  # use fixed learning rate
              eval_metric='acc',  # report accuracy during training
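              batch_end_callback = mx1.callback.Speedometer(batch_size, 100),  # log progress every 100 batches
              num_epoch=10)  # train for 10 passes over the data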

# We can now use the training model to make predictions on the test data set:
test_iter = mx1.io.NDArrayIter(mnist['test_data'], None, batch_size)
prob = mlp_model.predict(test_iter)
assert prob.shape == (10000, 10)
# Print the results:
test_iter = mx1.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)
# predict accuracy of mlp
acc = mx1.metric.Accuracy()
mlp_model.score(test_iter, acc)
assert acc.get()[1] > 0.96

If you are running the script along with us at this point, you should get an accuracy of about 96% or slightly better. The MXNet doc goes on to demo an implementation of a convolutional net which improves this accuracy, going beyond the TF presentation of a trained model for didactic purposes. MXNet developers are encouraged to use available GPUs to accelerate computation speed.
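The comment in the script about ReLU and inputs that are not linearly separable can be illustrated with the classic XOR problem. The sketch below is plain Python, not MXNet, and the weights are chosen by hand rather than learned; it only shows that two ReLU hidden units are enough to compute a function no single linear layer can.

```python
def relu(value):
    """Rectified linear unit: pass positives through, clamp negatives to 0."""
    return max(0.0, value)

def xor_network(x1, x2):
    """Two ReLU hidden units followed by a linear output unit."""
    h1 = relu(x1 + x2)          # active when at least one input is on
    h2 = relu(x1 + x2 - 1.0)    # active only when both inputs are on
    return h1 - 2.0 * h2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), xor_network(x1, x2))
```

Without the ReLU calls, the network would collapse into a single linear expression of x1 and x2, which cannot output 1 for the mixed inputs and 0 for both matching inputs.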

Although we feature here the deep learning libraries and frameworks distributed with pre-trained models, keep in mind that by now all of the popular DL libraries have a multitude of community members contributing their own trained models and data sets on GitHub. Even Wolfram developers have produced sample implementations in their proprietary language to support the surge of interest in DL.

GitHub Top ……..

The growth of deep learning categories on GitHub is astonishing. The most popular GitHub repositories related to deep learning are a measure of what is trending in the field. TensorFlow is the clear leader and ranks higher by a factor of five than Keras and Caffe in the number two and three positions. Theano, PyTorch, Sonnet, and MXNet are barely clinging to positive rankings on GitHub. The remainder actually have negative overall rankings.

Who is Using DL?

The largest retailer in the world is using deep learning to calculate customer satisfaction. The stores have Emotion AI software collecting imagery from cash register cameras which will provide automated customer sentiment feedback to management. LPM reported previously that the same retailer used face recognition software with store cameras to detect shoplifters with prior convictions inside test stores. The program would identify a prior offender on store premises and send a photo to store security staff in the retail area to assist in locating and identifying the shopper at risk. The company later stated that the program was canceled. Inevitably such programs will become commonplace, but likely they will be concealed inside thick wrappers to avoid public perceptions of an Orwellian big brother.

On a smaller scale, there are hundreds of websites with DL widgets including trained models like VGG16. You can drop a picture of a bird on VGG16 and it will likely tell you the species of bird in the picture. That’s impressive when it works and weird when it is completely off. It’s a wild world of new words and unfamiliar objects. Garbage cans and recycling bins in Quebec now contain sensors which detect patterns in the trash! A swarm of confused students suddenly challenged to write their own neural nets are desperately posting their assignments to freelance websites, ready to pay top dollar for help. For good reason: the math of deep learning is like linear regression on steroids. Developers claim that unpredictable outcomes are not necessarily bugs. Very often a correct result looks like a bug because humans can’t see what DL sees!

Amazon and Microsoft, two of the top enterprises, partnered to offer the Gluon interface, which supports the deep learning library MXNet featured in the mini tutorial above. Gluon will eventually support the Microsoft Cognitive Toolkit as well. The primary goal of Gluon is to provide a framework for building deep learning into next-generation cloud applications. But so many new SaaSs have appeared under the DL sun that it seems like the next generation has already arrived! Stay tuned to ByteScout, where we will keep you informed of the latest technologies and development methods around deep learning.