Friday, November 24, 2017

Free Deep Learning Book (MIT Press)

The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular. The online version of the book is now complete and will remain available online for free.
The book is also available on Amazon and directly from MIT Press.


We plan to offer lecture slides accompanying all chapters of this book. We currently offer slides for only some chapters. If you are a course instructor and have your own lecture slides that are relevant, feel free to contact us if you would like to have your slides linked or mirrored from this site.
  1. Introduction
    • Presentation of Chapter 1, based on figures from the book [.key] [.pdf]
    • Video of lecture by Ian and discussion of Chapter 1 at a reading group in San Francisco organized by Alena Kruchkova
  2. Linear Algebra [.key][.pdf]
  3. Probability and Information Theory [.key][.pdf]
  4. Numerical Computation [.key] [.pdf] [youtube]
  5. Machine Learning Basics [.key] [.pdf]
  6. Deep Feedforward Networks [.key] [.pdf]
    • Video (.flv) of a presentation by Ian and a group discussion at a reading group at Google organized by Chintan Kaur.
  7. Regularization for Deep Learning [.pdf] [.key]
  8. Optimization for Training Deep Models
    • Gradient Descent and Structure of Neural Network Cost Functions [.key] [.pdf]
      These slides describe how gradient descent behaves on different kinds of cost function surfaces. Intuition for the structure of the cost function can be built by examining a second-order Taylor series approximation of the cost function. This quadratic function can give rise to issues such as poor conditioning and saddle points. Visualization of neural network cost functions shows how these and some other geometric features of neural network cost functions affect the performance of gradient descent.
    • Tutorial on Optimization for Deep Networks [.key] [.pdf]
      Ian's presentation at the 2016 Re-Work Deep Learning Summit. Covers Google Brain research on optimization, including visualization of neural network cost functions, Net2Net, and batch normalization.
    • Batch Normalization [.key] [.pdf]
    • Video of lecture / discussion: This video covers a presentation by Ian and group discussion on the end of Chapter 8 and entirety of Chapter 9 at a reading group in San Francisco organized by Taro-Shigenori Chiba.
  9. Convolutional Networks
    • Convolutional Networks [.key][.pdf]
      A presentation summarizing Chapter 9, based directly on the textbook itself.
    • Video of lecture / discussion: This video covers a presentation by Ian and group discussion on the end of Chapter 8 and entirety of Chapter 9 at a reading group in San Francisco organized by Taro-Shigenori Chiba.
  10. Sequence Modeling: Recurrent and Recursive Networks
    • Sequence Modeling [.pdf] [.key]
      A presentation summarizing Chapter 10, based directly on the textbook itself.
    • Video of lecture / discussion. This video covers a presentation by Ian and a group discussion of Chapter 10 at a reading group in San Francisco organized by Alena Kruchkova.
  11. Practical Methodology [.key][.pdf] [youtube]
  12. Applications [.key][.pdf]
  13. Linear Factors [.key][.pdf]
  14. Autoencoders [.key][.pdf]
  15. Representation Learning [.key][.pdf]
  16. Structured Probabilistic Models for Deep Learning [.key][.pdf]

Friday, November 17, 2017

Explainable Artificial Intelligence (XAI)

Dramatic success in machine learning has led to a torrent of Artificial Intelligence (AI) applications. Continued advances promise to produce autonomous systems that will perceive, learn, decide, and act on their own. However, the effectiveness of these systems is limited by the machines’ current inability to explain their decisions and actions to human users. The Department of Defense is facing challenges that demand more intelligent, autonomous, and symbiotic systems. Explainable AI—especially explainable machine learning—will be essential if future warfighters are to understand, appropriately trust, and effectively manage an emerging generation of artificially intelligent machine partners.
The Explainable AI (XAI) program aims to create a suite of machine learning techniques that:
  • Produce more explainable models, while maintaining a high level of learning performance (prediction accuracy); and
  • Enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners.
New machine-learning systems will have the ability to explain their rationale, characterize their strengths and weaknesses, and convey an understanding of how they will behave in the future. The strategy for achieving that goal is to develop new or modified machine-learning techniques that will produce more explainable models. These models will be combined with state-of-the-art human-computer interface techniques capable of translating models into understandable and useful explanation dialogues for the end user. Our strategy is to pursue a variety of techniques in order to generate a portfolio of methods that will provide future developers with a range of design options covering the performance-versus-explainability trade space.
Figure 1: XAI Concept
The XAI program will focus the development of multiple systems on addressing challenge problems in two areas: (1) machine learning problems to classify events of interest in heterogeneous, multimedia data; and (2) machine learning problems to construct decision policies for an autonomous system to perform a variety of simulated missions. These two challenge problem areas were chosen to represent the intersection of two important machine learning approaches (classification and reinforcement learning) and two important operational problem areas for the Department of Defense (intelligence analysis and autonomous systems).
XAI research prototypes will be tested and continually evaluated throughout the course of the program. At the end of the program, the final delivery will be a toolkit library consisting of machine learning and human-computer interface software modules that could be used to develop future explainable AI systems. After the program is complete, these toolkits would be available for further refinement and transition into defense or commercial applications.

Tuesday, November 14, 2017

Demystifying AI, Machine Learning, and Deep Learning

Learn about AI, machine learning, supervised learning, unsupervised learning, classification, decision trees, clustering, deep learning, and algorithms.

Deep learning, machine learning, artificial intelligence — all buzzwords that represent the future of analytics. In this post, we will explain what machine learning and deep learning are at a high level with some real-world examples. In future posts, we will explore vertical use cases. The goal is not to turn you into a data scientist but to give you a better understanding of what you can do with machine learning. Machine learning is becoming more accessible to developers, and data scientists work with domain experts, architects, developers, and data engineers, so it is important for everyone to have a good understanding of the possibilities. Every piece of information that your business generates has the potential to add value. This post and future posts are meant to provoke a review of your own data to identify new opportunities.
ML examples

What Is Artificial Intelligence?

Throughout the history of AI, its definition has been continuously revised. AI is an umbrella term (the idea started in the 1950s); machine learning is a subset of AI, and deep learning is a subset of machine learning.
In 1985, when I was a student interning at the NSA, AI was also a very hot topic. At the NSA, I even took an MIT video (VCR) class on AI about expert systems. Expert systems capture an expert's knowledge in a rules engine. Rules engines have a wide use in industries such as finance and healthcare, and more recently for event processing, but when data is changing, rules can become difficult to update and maintain. Machine learning has the advantage that it learns from the data, and it can provide data-driven probabilistic predictions.
According to Ted Dunning, it is better to use precise terminology like machine learning or deep learning instead of the term "AI" because before we get something to work well, we call it AI; afterward, we always call it something else. AI is better used as a word for the next frontier.

How Has Analytics Changed in the Last 10 Years?

According to Thomas Davenport in the HBR, analytical technology has changed dramatically over the last decade, with more powerful and less expensive distributed computing across commodity servers, streaming analytics, and improved machine learning technologies, enabling companies to store and analyze both far more data and many different types of it.
Traditionally, data was stored on a RAID system, sent to a multi-core server for processing, and sent back for storage, which caused a bottleneck on data transfer and was expensive. With file and table storage like MapR-XD and MapR-DB, data is distributed across a cluster and Hadoop technologies like MapReduce, Pig, and Hive send the computing task to where the data resides.
Technologies like Apache Spark speed up parallel processing of distributed data even more with iterative algorithms by caching data in-memory across iterations and using lighter weight threads.
Apache Spark
MapR Event Streams, a new distributed messaging system for streaming event data at scale, combined with Stream processing like Apache Spark streaming or Apache Flink speed up parallel processing of real-time events with machine learning models.
MapR Streams
Graphics Processing Units (GPUs) have sped up multi-core servers for parallel processing. A GPU has a massively parallel architecture consisting of thousands of smaller, more efficient cores designed to handle many tasks simultaneously, whereas a CPU consists of a few cores optimized for sequential serial processing. In terms of potential performance, the evolution from the Cray-1 to today’s clusters with many GPUs represents roughly a millionfold increase over what was once the fastest computer on the planet, at a tiny fraction of the cost.

What Is Machine Learning?

Machine learning uses algorithms to find patterns in data and then uses a model that recognizes those patterns to make predictions on new data.
In general, machine learning may be broken down into supervised learning, unsupervised learning, and approaches in between. Supervised learning algorithms use labeled data, while unsupervised learning algorithms find patterns in unlabeled data. Semi-supervised learning uses a mixture of labeled and unlabeled data. Reinforcement learning trains algorithms to maximize rewards based on feedback.
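As a toy sketch of the supervised setting (all numbers invented, not from the post), a labeled set of transaction amounts can be turned into a trivial predictive rule:

```python
# Toy supervised learning: labeled examples in, a predictive rule out.
# All amounts and labels are invented for illustration.
labeled = [(20.0, "not_fraud"), (25.0, "not_fraud"),
           (850.0, "fraud"), (900.0, "fraud")]

fraud_amounts = [amt for amt, label in labeled if label == "fraud"]
ok_amounts = [amt for amt, label in labeled if label == "not_fraud"]

# "Train" the simplest possible model: a threshold between the two classes.
threshold = (min(fraud_amounts) + max(ok_amounts)) / 2

def predict(amount):
    """Label a new, unseen transaction using the learned threshold."""
    return "fraud" if amount > threshold else "not_fraud"
```

An unsupervised algorithm, by contrast, would receive only the amounts and would have to discover the two groups itself.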

Supervised Learning

Supervised algorithms use labeled data in which both the input and target outcome, or label, are provided to the algorithm.
Supervised Learning
Supervised learning is also called predictive modeling or predictive analytics because you build a model that is capable of making predictions. Some examples of predictive modeling are classification and regression. Classification identifies which category an item belongs to (for example, whether a transaction is fraud or not fraud) based on labeled examples of known items (for example, transactions known to be fraud or not). Logistic regression predicts a probability — for example, the probability of fraud. Linear regression predicts a numeric value — for example, the amount of fraud.
Some examples of classification include:
  • Credit card fraud detection (fraud, not fraud).
  • Credit card application (good credit, bad credit).
  • Email spam detection (spam, not spam).
  • Text sentiment analysis (happy, not happy).
  • Predicting patient risk (high-risk patient, low-risk patient).
  • Classifying a tumor as malignant or not.
Some examples of logistic regression (or other algorithms) include:
  • Given historical car insurance fraudulent claims and features of the claims such as the age of the claimant, the claimed amount, and the severity of the accident, predict the probability of fraud.
  • Given patient characteristics, predict the probability of congestive heart failure.
Some examples of linear regression include:
  • Given historical car insurance fraudulent claims and features of the claims such as the age of the claimant, the claimed amount, and the severity of the accident, predict the amount of fraud.
  • Given historical real estate sales prices and features of houses (i.e. square feet, number of bedrooms, location), predict a house’s price.
  • Given historical neighborhood crime statistics, predict crime rate.
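A minimal sketch of the two kinds of regression using scikit-learn, with invented insurance-claim data (the feature names and values are hypothetical, not from the post):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical claim features: [claimant_age, claimed_amount_in_$1000s]
X = np.array([[25, 50], [45, 5], [30, 40], [60, 3], [22, 60], [50, 8]])
is_fraud = np.array([1, 0, 1, 0, 1, 0])         # label for classification
fraud_amount = np.array([48, 0, 35, 0, 55, 0])  # numeric regression target

clf = LogisticRegression().fit(X, is_fraud)     # predicts a probability
reg = LinearRegression().fit(X, fraud_amount)   # predicts a numeric value

new_claim = np.array([[24, 55]])
p_fraud = clf.predict_proba(new_claim)[0, 1]    # probability of fraud
predicted_amount = reg.predict(new_claim)[0]    # predicted fraud amount
```

The same features feed both models; only the target changes — a class label for logistic regression, a number for linear regression.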
There are other supervised and unsupervised learning algorithms shown below, which we won’t go over, but we will look at one example of each in more detail.

Classification Example: Debit Card Fraud

Classification takes a set of data with known labels and pre-determined features and learns how to label new records based on that information. Features are the “if” questions that you ask. The label is the answer to those questions.
Let’s go through an example of debit card fraud.
  • What are we trying to predict?
    • Whether a debit card transaction is fraud.
    • Fraud is the label (true or false).
  • What are the “if” questions or properties that you can use to make predictions?
    • Is the amount spent today > historical average?
    • Are there transactions in multiple countries today?
    • Is the number of transactions today > historical average?
    • Is the number of new merchant types today high compared to the last three months?
    • Are there multiple purchases today from merchants with a risky category code?
    • Is there unusual signature activity today compared to the historical use of a PIN?
    • Are there purchases in new states compared to the last three months?
    • Are there foreign purchases today compared to the last three months?
To build a classifier model, you extract the features of interest that most contribute to the classification.
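One way to sketch that feature extraction in code — the field names and thresholds below are invented for illustration:

```python
# Turn one day's raw transactions into the yes/no features listed above.
# All field names and thresholds are hypothetical.

def extract_features(todays_txns, history):
    return {
        "amount_gt_avg": sum(t["amount"] for t in todays_txns) > history["avg_daily_spend"],
        "multiple_countries": len({t["country"] for t in todays_txns}) > 1,
        "txn_count_gt_avg": len(todays_txns) > history["avg_daily_txns"],
        "risky_merchant_purchases": sum(t["risky_merchant"] for t in todays_txns) > 1,
    }

todays_txns = [
    {"amount": 400.0, "country": "US", "risky_merchant": True},
    {"amount": 350.0, "country": "BR", "risky_merchant": True},
]
history = {"avg_daily_spend": 120.0, "avg_daily_txns": 3}
features = extract_features(todays_txns, history)
```

Each feature becomes one input column for the classifier; the known fraud/not-fraud outcome becomes the label.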

Decision Trees

Decision trees create a model that predicts the class or label based on several input features. Decision trees work by evaluating a question containing a feature at every node and selecting a branch to the next node based on the answer. A possible decision tree for predicting debit card fraud is shown below. The feature questions are the nodes, and the answers “yes” or “no” are the branches in the tree to the child nodes. (Note that a real tree would have more nodes.)
  • Q1: Is the amount spent in 24 hours > average?
    • Yes → go to Q2
  • Q2: Are there multiple purchases today from risky merchants?
    • Yes → fraud = 90%
    • No → fraud = 50%
Decision Tree
Decision trees are popular because they are easy to visualize and explain. The accuracy of models can be improved by combining algorithms with ensemble methods. An ensemble example is a random forest, which combines multiple random subsets of decision trees.
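The two-question tree above can be written directly as nested conditions; a trained decision tree learns such splits and leaf probabilities from data (the 10% baseline on the "no" branch of Q1 is an invented placeholder, not from the example):

```python
# The example decision tree as nested conditions. A real, trained tree
# would have more nodes and learn its thresholds and leaf values from data.

def p_fraud(amount_24h_gt_avg, risky_merchant_purchases):
    if amount_24h_gt_avg:                 # Q1
        if risky_merchant_purchases:      # Q2
            return 0.90                   # Yes / Yes leaf from the example
        return 0.50                       # Yes / No leaf from the example
    return 0.10                           # hypothetical low-risk baseline
```

An ensemble such as a random forest would train many such trees on random subsets of the data and features and combine their votes.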

Unsupervised Learning

Unsupervised learning, also sometimes called descriptive analytics, does not have labeled data provided in advance. These algorithms discover similarities or regularities in the input data. An example of unsupervised learning is grouping similar customers based on purchase data.
Unsupervised Learning


In clustering, an algorithm classifies inputs into categories by analyzing similarities between input examples. Some clustering use cases include:
  • Search results grouping.
  • Grouping similar customers.
  • Grouping similar patients.
  • Text categorization.
  • Network security anomaly detection (finds what is not similar, the outliers from clusters).
The K-means algorithm groups observations into K clusters, with each observation assigned to the cluster whose mean (the cluster center) is nearest.
An example of clustering is a company that wants to segment its customers in order to better tailor products and offerings. Customers could be grouped on features such as demographics and purchase histories. Clustering with unsupervised learning is often combined with supervised learning in order to get more valuable results. For example, in this banking customer 360 use case, customers were first segmented based on answers to a survey. The customer groups were analyzed and labeled with customer personas. These labels were then linked by customer ID with features such as types of accounts and purchases. Finally, supervised machine learning was applied and tested with the labeled customers, making it possible to link the survey-derived customer personas with their banking actions and provide insights.
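A minimal one-dimensional K-means sketch following the assign-to-nearest-mean, recompute-the-mean loop described above (the customer spend figures are invented):

```python
# Minimal 1-D K-means (K = 2) on invented monthly customer spend figures:
# assign each point to the nearest center, recompute each cluster's mean,
# and repeat until the centers settle.

def kmeans_1d(points, centers, iterations=10):
    for _ in range(iterations):
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        centers = [sum(members) / len(members)
                   for members in clusters.values() if members]
    return sorted(centers)

spend = [10, 12, 11, 95, 100, 98]        # two obvious customer segments
centers = kmeans_1d(spend, [0.0, 50.0])  # centers settle near 11 and 97.7
```

In practice a library implementation (e.g. scikit-learn's KMeans) handles multiple features, initialization, and convergence checks, but the loop is the same idea.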

Deep Learning

Deep learning is the name for multilayered neural networks, which are networks composed of several “hidden layers” of nodes between the input and output. There are many variations of neural networks, which you can learn more about on this neural network cheat sheet. Improved algorithms, GPUs, and massively parallel processing (MPP) have given rise to networks with thousands of layers. Each node takes input data and a weight and outputs a confidence score to the nodes in the next layer, until the output layer is reached, where the error of the score is calculated. With backpropagation, inside a process called gradient descent, the errors are sent back through the network and the weights are adjusted, improving the model. This process is repeated thousands of times, adjusting the model’s weights in response to the error it produces, until the error cannot be reduced any further.
During this process, the layers learn the optimal features for the model, which has the advantage that features do not need to be predetermined. However, this has the disadvantage that the model’s decisions are not explainable. Because explaining the decisions can be important, researchers are developing new ways to understand the black box of deep learning.
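The forward/backward process described above can be sketched as a tiny one-hidden-layer network trained by gradient descent on the XOR problem (layer size, learning rate, and iteration count are arbitrary choices, not from the post):

```python
import numpy as np

# Toy network: 2 inputs -> 8 hidden sigmoid units -> 1 output, trained on XOR.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def forward(X):
    h = sigmoid(X @ W1 + b1)          # hidden layer scores
    return h, sigmoid(h @ W2 + b2)    # output layer score

initial_loss = np.mean((forward(X)[1] - y) ** 2)

for _ in range(5000):
    h, out = forward(X)
    # Backpropagation: send the output error back through each layer...
    d_out = (out - y) * out * (1 - out)        # gradient at the output
    d_h = (d_out @ W2.T) * h * (1 - h)         # gradient at the hidden layer
    # ...and adjust the weights a little in the downhill direction.
    W2 -= h.T @ d_out;  b2 -= d_out.sum(axis=0)
    W1 -= X.T @ d_h;    b1 -= d_h.sum(axis=0)

final_loss = np.mean((forward(X)[1] - y) ** 2)
```

Repeating the update thousands of times drives the error down, exactly the loop the paragraph describes; real frameworks automate the gradient computation and scale it to many layers.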
There are different variations of deep learning algorithms, which can be used with the Distributed Deep Learning Quick Start Solution from MapR to build data-driven applications such as the following:
Deep Learning QSS
  • Deep neural networks for improved traditional algorithms.
    • Finance: Enhanced fraud detection through identification of more complex patterns.
    • Manufacturing: Enhanced identification of defects based on deeper anomaly detection.
  • Convolutional neural networks for images.
    • Retail: In-store activity analysis of video to measure traffic.
    • Satellite images: Labeling terrain and classifying objects.
    • Automotive: Recognition of roadways and obstacles.
    • Healthcare: Diagnostic opportunities from x-rays, scans, etc.
    • Insurance: Estimating claim severity based on photographs.
  • Recurrent neural networks for sequenced data.
    • Customer satisfaction: Transcription of voice data to text for NLP analysis.
    • Social media: Real-time translation of social and product forum posts.
    • Photo captioning: Search archives of images for new insights.
    • Finance: Predicting behavior via time series analysis (also enhanced recommendation systems).
