Big Data Vietnam: August 2018

Friday, August 10, 2018

Deep Learning resources for self-study

Biologische Neuronen und technische Neuronenmodelle
Convolutional Neural Networks (CNN)
R-CNN Modell:
"Rich feature hierarchies for accurate object detection and semantic segmentation"
Paper

Fast R-CNN Modell:
"Fast R-CNN"
Paper
Capsule Networks
"Dynamic Routing Between Capsules"
Paper
Deep Learning Bibliotheken
Crashkurs Deep Learning Bibliotheken: TensorFlow und Keras

Faster R-CNN Modell:
"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks"
Paper
Reservoir Computing: Echo State Networks
Einführung/Crash-Kurs: GPU Programmierung

Yolo und Yolo9000 Modell:
"You Only Look Once: Unified, Real-Time Object Detection"
Paper1
"YOLO9000: Better, Faster, Stronger"
Paper2
Reservoir Computing: Liquid State Machines
ILSVRC Benchmark
Imagenet Large Scale Visual Recognition Challenge (ILSVRC): Wie funktioniert der Wettbewerb?
Link

SSD Modell:
"Single Shot MultiBox Detector"
Paper
Generative Adversarial Networks
Deep Learning Bibliotheken
Crashkurs Deep Learning Bibliotheken: Caffe/Caffe2 sowie Torch/PyTorch

Mask R-CNN Modell:
"Mask R-CNN"
Paper
Neuromorphische Chips

Wednesday, August 1, 2018

Ứng dụng Machine Learning như thế nào ?

Trong bài viết này, tôi sẽ mô tả cách phân tích có liên quan đến Học máy. Tôi sẽ cố gắng làm sáng tỏ một số điều vô nghĩa xung quanh ML, và giải thích quá trình và các loại học máy. Cuối cùng, tôi sẽ chia sẻ một vài video mô tả cấp độ tiếp theo của Trí tuệ nhân tạo - Deep Learning .

Phân tích (Analytics) và học máy (Machine Learning)

Ví dụ: Làm sao dự đoán giả thiết "nếu ta giảm giá thêm 5% thì sẽ tăng doanh số bán hàng lên 10% ?" bằng dữ liệu bạn có từ 6 tháng qua ?

Analytics có 3 kỹ thuật chính, bao gồm:

Analytics mô tả (Descriptive Analytics): Để xác định điều gì đã xảy ra? Điều này thường liên quan đến các báo cáo giúp mô tả những gì đã xảy ra. Ví dụ: để so sánh doanh thu của tháng này với cùng thời điểm năm ngoái.

Phân tích chẩn đoán (Diagnostic Analytics): Cố gắng giải thích lý do tại sao điều này xảy ra, thường liên quan đến việc sử dụng trang tổng quan với khả năng OLAP để tìm hiểu và điều tra dữ liệu cùng với các kỹ thuật Khai phá dữ liệu để tìm mối tương quan.

Phân tích dự đoán (Predictive Analytics): Cố gắng ước tính điều gì có thể xảy ra. Có thể phân tích dự báo được sử dụng để chọn bạn làm người đọc tiềm năng của bài viết này dựa trên tiêu đề, sở thích và liên kết công việc của bạn cho người khác.

Học máy (ML) phù hợp với không gian Predictive Analytics.

Học máy là gì?
Học máy là một tập hợp con của Trí tuệ nhân tạo AI, nhờ đó máy học được từ kinh nghiệm quá khứ, tức là. Dữ liệu. Không giống như lập trình truyền thống, nơi mà nhà phát triển cần dự đoán và mã hóa mọi điều kiện tiềm năng, giải pháp Machine Learning có hiệu quả điều chỉnh đầu ra dựa trên dữ liệu.

Thuật toán Học máy không có nghĩa là viết mã, nhưng nó xây dựng một mô hình máy tính (ML model) về hành vi của dữ liệu, sau đó nó sửa đổi dựa trên cách nó được đào tạo.

Làm thế nào nó hoạt động?
Phần mềm lọc spam là một ví dụ tuyệt vời. Nó sử dụng kỹ thuật Machine Learning để tìm hiểu cách nhận dạng spam từ hàng triệu thư. Nó hoạt động bằng cách sử dụng các kỹ thuật thống kê để giúp xác định các mẫu.

Ví dụ: nếu cứ 85 trong tổng số 100 email, bao gồm các từ “rẻ” và “Viagra” được tìm thấy là thư spam, chúng ta có thể nói với sự tự tin 85% rằng chúng thực sự là spam. Kết hợp điều này với một số chỉ báo khác (ví dụ: từ người gửi bạn chưa bao giờ nhận được thư) và thử nghiệm thuật toán chống lại một tỷ email khác, chúng ta có thể cải thiện độ tin cậy và chính xác theo thời gian.

Spam free diet: machine learning, a subset of AI (Artificial Intelligence) helps keep your inbox (relatively) free of spam.

Trong thực tế, Google cho biết nó bây giờ dừng lại khoảng 99,99% thư rác được gửi đi.

Ví dụ về máy học

Hiện có hàng trăm ứng dụng đã sẵn sàng bao gồm: -

Tiếp thị được nhắm mục tiêu (Targeted Marketing): Được Google và Facebook sử dụng để nhắm mục tiêu quảng cáo dựa trên sở thích cá nhân và bởi Netflix đề xuất phim để xem và Amazon đề xuất các sản phẩm để mua.
Ghi điểm tín dụng (Credit Scoring): Ngân hàng sử dụng dữ liệu thu nhập (ước tính từ nơi bạn sống), tuổi và tình trạng hôn nhân của bạn để dự đoán liệu bạn có mặc định cho khoản vay không.
Phát hiện gian lận thẻ (Card Fraud Detection): Được sử dụng để ngăn chặn sử dụng gian lận thẻ tín dụng hoặc thẻ ghi nợ trực tuyến dựa trên thói quen chi tiêu trước đó và có khả năng của bạn.
Phân tích giỏ hàng (Basket Analysis): Được sử dụng để dự đoán ưu đãi đặc biệt nào bạn có nhiều khả năng sử dụng dựa trên thói quen mua hàng triệu khách hàng tương tự.

Trong một trường hợp gây tranh cãi, nhà bán lẻ Hoa Kỳ Target đã sử dụng phân tích giỏ 25 sản phẩm mỹ phẩm và sức khỏe khác nhau để dự đoán thành công hành vi mua hàng của phụ nữ đang mang thai kể cả ngày đến hạn có độ chính xác cao. Điều này phản tác dụng khi cha của một cô gái trẻ phàn nàn rằng Target đã khuyến khích các bà mẹ tuổi teen sau khi cô bị spam emails với những lời đề nghị đặc biệt liên quan đến việc mang thai.

Tóm lại, bạn cần (theo thứ tự ưu tiên):

Mục tiêu (A Goal). Vấn đề bạn đang cố giải quyết. Ví dụ, thẻ tín dụng này có bị đánh cắp không? Giá cổ phiếu có tăng hay giảm? Phim nào khách hàng sẽ thích nhất?
Rất nhiều dữ liệu (Lots of Data). Ví dụ: để dự đoán chính xác giá trị gia đình, bạn sẽ cần giá lịch sử chi tiết cùng với chi tiết thuộc tính mở rộng.
Một chuyên gia (An Expert). Bạn sẽ cần một chuyên gia chuyên ngành hiểu được câu trả lời đúng để xác minh kết quả được tạo và xác nhận khi mô hình đủ chính xác.
Một mô hình (A Pattern). Bạn đang tìm kiếm một mẫu trong dữ liệu. Nếu không có mẫu, bạn có thể có dữ liệu sai hoặc không đầy đủ hoặc có thể không có mẫu nào cả.

Các loại máy học

Phân tích dự đoán (Predictive analytics) cố gắng dự đoán kết quả trong tương lai dựa trên dữ liệu lịch sử và phương pháp phổ biến nhất được gọi là Học tập được giám sát (Supervised Learning). Image title

Các loại máy học là:

Học tập được giám sát (Supervised Learning): Được sử dụng khi chúng ta biết câu trả lời chính xác từ dữ liệu trong quá khứ, nhưng cần phải dự đoán kết quả trong tương lai. Ví dụ: sử dụng giá nhà cũ để dự đoán giá trị hiện tại và tương lai. (ví dụ: Zillow dựa trên Hoa Kỳ hoặc Vương quốc Anh dựa trên Zoopla). Sử dụng hiệu quả quá trình cải tiến thống kê dựa trên thử nghiệm và lỗi, máy dần dần cải thiện độ chính xác bằng cách kiểm tra kết quả dựa vào một tập hợp các giá trị do người giám sát cung cấp.
Học tập không được giám sát (Unsupervised Learning): Nơi không có câu trả lời đúng khác biệt, nhưng chúng ta muốn khám phá điều gì đó mới mẻ từ dữ liệu. Thường được sử dụng để phân loại hoặc nhóm dữ liệu, ví dụ: để phân loại nhạc trên Spotify, để giúp đề xuất những album bạn có thể nghe. Sau đó nó sẽ phân loại người nghe, để xem liệu họ có nhiều khả năng nghe Radiohead hay Justin Bieber hơn không. (Radiohead mỗi lần!).
Học tập tăng cường (Reinforcement Learning): Không cần chuyên gia miền nhưng liên quan đến các cải tiến liên tục đối với mục tiêu được xác định trước. Đó là một kỹ thuật thường triển khai mạng Neural, ví dụ, DeepMind, trong đó AphaGo đã chơi một triệu trò chơi của Go chống lại chính nó để cuối cùng trở thành nhà vô địch thế giới.

Quá trình học máy

Không giống như hình ảnh tương lai của các máy học chơi cờ, hầu hết Machine Learning (hiện tại) khá mất thời gian và được minh họa trong sơ đồ dưới đây:

Có khả năng việc học máy trong tương lai sẽ được áp dụng để giúp tăng tốc quá trình, đặc biệt là trong lĩnh vực thu thập và làm sạch dữ liệu, nhưng các bước chính vẫn là:

Xác định vấn đề (Define Problem): Như được chỉ ra trong bài viết khác của tôi, luôn luôn bắt đầu với một vấn đề được xác định rõ ràng và mục tiêu trong tâm trí.
Thu thập dữ liệu (Collect data): Khối lượng và số lượng dữ liệu thích hợp càng lớn, mô hình học máy sẽ càng chính xác hơn. Điều này có thể đến từ bảng tính, tệp văn bản và cơ sở dữ liệu ngoài các nguồn dữ liệu có sẵn trên thị trường.
Chuẩn bị dữ liệu (Prepare data): Bao gồm phân tích, làm sạch và hiểu dữ liệu. Loại bỏ hoặc sửa chữa các ngoại lệ (giá trị cực kỳ sai); điều này thường mất tới 60% tổng thời gian và công sức. Dữ liệu sau đó được chia thành hai phần riêng biệt, dữ liệu Đào tạo và Kiểm tra.
Đào tạo mô hình (Train Model): Chống lại một tập hợp dữ liệu đào tạo - được sử dụng để xác định các mẫu hoặc tương quan trong dữ liệu hoặc đưa ra dự đoán, đồng thời nâng cao độ chính xác bằng cách sử dụng phương pháp thử nghiệm lặp lại và sửa lỗi.
Đánh giá mô hình(Evaluate model): Bằng cách so sánh độ chính xác của kết quả với tập hợp dữ liệu thử nghiệm. Điều quan trọng là không đánh giá mô hình dựa trên dữ liệu được sử dụng để đào tạo hệ thống để đảm bảo kiểm tra không thiên vị và độc lập.
Triển khai và cải tiến (Deply & Improve): Việc này có thể liên quan đến việc thử một thuật toán hoàn toàn khác hoặc thu thập nhiều hoặc khối lượng dữ liệu lớn hơn. Ví dụ: bạn có thể cải thiện dự đoán giá nhà bằng cách ước tính giá trị của các cải tiến nhà tiếp theo bằng cách sử dụng dữ liệu do chủ nhà cung cấp.
Tóm lại, hầu hết các quy trình Máy học thực tế tròn và liên tục, vì dữ liệu bổ sung được thêm vào hoặc các tình huống thay đổi, bởi vì thế giới không bao giờ đứng yên, và luôn có chỗ để cải thiện.

Tóm lược
Sơ đồ dưới đây minh họa các chiến lược chính được các hệ thống Máy học sử dụng.

Tóm lại, thành phần quan trọng của bất kỳ hệ thống học máy nào là dữ liệu. Với sự lựa chọn của các thuật toán bổ sung, lập trình thông minh và số lượng lớn dữ liệu chính xác hơn - Big Data thắng mọi lúc.

Cảm ơn bạn đã đọc
Bạn cũng có thể quan tâm đến video dài 14 phút này trên Deep Mind của Google, giải thích cách các nhà khoa học ở Cambridge phát triển hệ thống Trí tuệ nhân tạo sử dụng học tập tăng cường để tự rèn luyện bản thân trong các trò chơi trên máy tính bao gồm Space Invaders. Rất gợi nhớ đến bộ phim "War Games" năm 1980.

All You Need to Know About Neural Networks

a high-level algorithms expert will provide a brief overview of the evolution of neural networks and discuss the latest approaches in the field.

Neural networks and deep learning technologies underpin most of the advanced intelligent applications today. In this article, Dr. Sun Fei (Danfeng), a high-level algorithms expert from Alibaba's Search Department, will provide a brief overview of the evolution of neural networks and discuss the latest approaches in the field. The article is primarily centered on the following five items:

The Evolution of Neural Networks
Sensor Models
Feed-forward Neural Networks
Back-propagation
Deep Learning Basics

1. The Evolution of Neural Networks

Before we dive into the historical development of neural networks, let's first introduce the concept of a neural network. A neural network is primarily a computing model that simulates the workings of the human brain at a simplified level. This type of model uses a large number of computational neurons which connect via layers of weighted connections. Each layer of neurons is capable of performing large-scale parallel computing and passing information between them.

The timeline below shows the evolution of neural networks:

The origin of neural networks goes back to even before the development of computing itself, with the first neural networks appearing in the 1940s. We will go through a bit of history to help everyone gain a better understanding of the basics of neural networks.

The first generation of neural network neurons worked as verifiers. The designers of these neurons just wanted to confirm that they could build neural networks for computation. These networks cannot be used for training or learning; they simply acted as logic gate circuits. Their input and output was binary and weights were predefined.

The second phase of neural network development came about in the 1950s and 1960s. This involved Roseblatt's seminal work on sensor models and Herbert's work on learning principles.

2. Sensor Models

Sensor models and the neuron models as we mentioned above were similar but had some key differences. The activation algorithm in a sensor model can be either a break algorithm or a sigmoid algorithm, and its input can be a real number vector instead of the binary vectors used by the neuron model. Unlike the neuron model, the sensor model is capable of learning. Next, we will talk about some of the special characteristics of the sensor model.

We can think of the input value (x1..., xn) as a coordinate in N dimensional space, while wTx-w0 = 0 is a hyperplane in N dimensional space. Obviously, if wTx-w0 < 0, then the point falls below the hyperplane, while if wTx-w0 > 0, then the point falls above the hyperplane.

The sensor model corresponds to the hyperplane of a classifier and is capable of separating different types of points in N dimensional space. Looking at the figure below, we can see that the sensor model is a linear classifier.

The sensor model is capable of easily performing classification for basic logical operations like AND, OR, and NOT.

Can we classify all logical operations through the sensor model? The answer is, of course not. For example, Exclusive OR operations are very difficult to classify through a single linear sensor model, which is one of the main reasons that neural networks quickly entered a low point in development soon after the first peak. Several authors, including Minsky, discussed this problem on the topic of sensor models. However, a lot of people misunderstood the authors on this subject.

In reality, authors like Minsky pointed out that one could implement Exclusive OR operations through multiple layers of sensor models; however, since the academic world lacked effective methods to study multi-layer sensor models at the time, the development of neural networks dipped into its first low point.

The figure below shows intuitively how multiple layers of sensor models can achieve Exclusive OR operations:

3. Feed-Forward Neural Networks

Entering into the 1980s, due to the expressive ability of sensor model neural networks being limited to linear classification tasks, the development of neural networks began to enter the phase of multi-layered sensors. A classic multi-layered neural network is a feed-forward neural network.

We can see from the figure below that it involves an input layer, a hidden layer with an undefined number of nodes, and an output layer.

We can express any logical operation by a multi-layer sensor model, but this introduces the issue of weighted learning between the three layers. When xk is transferred from the input layer to the weighted vkj on the hidden layer and then passed through an activation algorithm like sigmoid, then we can retrieve the corresponding value hj from the hidden layer. Likewise, we can use a similar operation to derive the yi node value from the output layer using the hj value. For learning, we need the weighted information from the w and v matrices so that we can finally obtain the estimated value y and the actual value d.

If you have a basic understanding of machine learning, you will understand why we use a gradient descent to learn a model. The principle behind applying a gradient descent to the sensor model is fairly simple, as we can see from the below figure. First, we have to determine the model's loss.

The example uses a square root loss and seeks to close the gap between the simulated value y and the real value d. For the sake of convenient computing, in most situations, we use the root relationship E = 1/2 (d-y)^2 = 1/2 (d-f(x))^2.

According to the gradient descent principle, the rate of the weighting update cycle is: wj ← wi + α(d − f(x))f′(x)xi, where α is the rate of learning which we can adjust manually.

4. Back-Propagation

How do we learn all of the parameters in a multilayer feed-forward neural network? The parameters for the top layer are very easy to obtain. One can achieve the parameters by comparing the difference between the estimated and real values output by the computing model and using the gradient descent principles to obtain the parameter results. The problem comes when we try to obtain parameters from the hidden layer. Even though we can compute the output from the model, we have no way of knowing what the expected value is, so we have no way of effectively training a multi-layer neural network. This issue plagued researchers for a long time, leading to the lack of development of neural networks after the 1960s.

Later, in the 70s, a number of scientists independently introduced the idea of a back-propagation algorithm. The basic idea behind this type of algorithm is actually quite simple. Even though at the time there was no way to update according to the expected value from the hidden layer, one could update the weights between the hidden and other layers via errors passed from the hidden layer. When computing a gradient, since all of the nodes in the hidden layer are related to multiple nodes on the output layer, so all of the layers on the previous layers are accumulated and processed together.

Another advantage of back-propagation is that we can perform gradients and weighting of nodes on the same layer at the same time since they are unrelated. We can express the entire process of back-propagation in pseudocode as below:

Next, let's talk about some of the other characteristics of a back-propagation neural network. A back-propagation is actually a chain rule. It can easily generalize any computation that has a map. According to the gradient function, we can use a back-propagation neural network to produce a local optimized solution, but not a global optimized solution. However, from a general perspective, the result produced by a back-propagation algorithm is usually a satisfactorily optimized solution. The figure below is an intuitive representation of a back-propagation algorithm:

Under most circumstances, a back-propagation neural network will find the smallest possible value within scope; however, if we leave that scope we may find an even better value. In actual application, there are a number of simple and effective ways to address this kind of issue, for example, we can try different randomized initialization methods. Moreover, in practice, among the models frequently used in the field of modern deep learning, the method of initialization has significant influence on the final result. Another method of forcing the model to leave the optimized scope is to introduce random noises during training or use a hereditary algorithm to prevent the training model from stopping at an un-ideal optimized position.

A back-propagation neural network is an excellent model of machine learning, and when speaking about machine learning, we can't help but notice a basic issue that's frequently encountered throughout the process of machine learning, that is the issue of overfitting. A common manifestation of overfitting is that during training, even when the loss of the model constantly drops, the loss and error in the test group rises. There are two typical methods to avoid overfitting:

Early stopping: we can separate a validation group ahead of time, and run it against this already verified group during training. We can then observe the loss of the model and, if the loss has already stopped dropping in the verification group but is still dropping in the training group, then we can stop the training early to prevent overfitting.
Regularization: we can add rules to the weights within the neural network. The dropout method, which is popular these days, involves randomly dropping some nodes or sides. This method, which we can consider as a form of regularization, is extremely effective at preventing overfitting.

Even though neural networks were very popular during the 1980s, they, unfortunately, entered another low point in development in the 1990s. A number of factors contributed to this low point. For example, the Support Vector Machines, which were a popular model in the 1990s, took the stage at all kinds of major conferences and found application in a variety of fields. Support Vector Machines have an excellent statistical learning theory and are easy to intuitively understand. They are also very effective and produce near-ideal results.

Amidst this shift, the rise of the statistical learning theory behind Support Vector Machines applied no small amount of pressure to the development of neural networks. On the other hand, from the perspective of neural networks themselves, even though you can use back-propagation networks to train any neural network in theory, in actual application, we notice that as the number of layers in the neural network increases, the difficulty of training the network increases exponentially. For example, in the beginning of the 1990s, people noticed that in a neural network with a relatively large number of layers, it was common to see gradient loss or gradient explosion.

A simple example of gradient loss, for example, would be where each layer in a neural network is a sigmoid structure layer, therefore its loss during back-propagation is chained into a sigmoid gradient. When a series of elements are strung together, then if one of the gradients is very small, the gradient will then become smaller and smaller. In reality, after propagating one or two layers, this gradient disappears. Gradient loss leads to the parameters in deep layers to stop changing, making it very difficult to get meaningful results. This is one of the reasons that a multi-layer neural network can be very difficult to train.

The academic world has studied this issue in-depth, and come to the conclusion that the easiest way to handle it is by changing the activation algorithm. In the beginning, we tried to use a Rectified activation algorithm since the sigmoid algorithm is an index method which can easily bring about the issue of gradient loss. Rectified, on the other hand, replaces the sigmoid function and replaces max (0,x). From the figure below we can see that the gradient for estimates above 0 is 1, which prevents the issue of gradient disappearance. However, when the estimate is lower than 0, we can see that the gradient is 0 again, so the ReLU algorithm must be imperfect. Later, a number of improved algorithms came out, including Leaky ReLU and Parametric Rectifier (PReLU). When the estimate x is smaller than 0, we can convert it to a coefficient like 0.01 or α to prevent it from actually being 0.

With the development of neural networks, we later came up with a number of methods that solve the issue of passing gradients on a structural level. For example, the Metamodel, LSTM model, and modern image analysis use a number of cross-layer linking methods to more easily propagate gradients.

5. Deep Learning Basics

From the second low point in development in the 1990s to 2006, neural networks once again entered the consciousness of the masses, this time in even more force than before. A monumental occurrence during this rise of neural networks was the two theses on multi-layer neural networks (now called “deep learning”) submitted by Hinton in a number of places including Salahundinov.

One of these theses solved the issue of setting initialization values for neural networks. The solution, put simply, is to consider the input value as x, and the output value as decoded x, then through this method find a better initialization point. The other thesis raised a method of quickly training a deep neural network. Actually, there are a number of factors contributing to the modern popularity of neural networks, for example, the enormous growth in computing resources and availability of data. In the 1980s, it was very difficult to train a large scale neural network due to the lack of data and computing resources.

The early rise of neural networks was driven by three monumental figures, namely Hinton, Bengio, and LeCun. Hinton's main accomplishment was in the Restricted Boltzmann Machine and Deep Autoencoder. Bengio's major contribution was a series of breakthroughs in using the metamodel for deep learning. This was also the first field in which deep learning experienced a major breakthrough.

In 2013, language modeling, based on the metamodel, was already capable of outperforming even the most effective method at the time, the probability model. The main accomplishment of LeCun was research related to CNN. The primary appearance of deep learning was in a number of major summits like NIPS, ICML, CVPR, ACL, where it attracted no small amount of attention. This included the appearance of Google Brain, Deep Mind, and Facebook AI, which all placed the center of their research on the field of deep learning.

The first breakthrough to come about after deep learning entered the consciousness of the masses was in the field of speech recognition. Before we began using deep learning, models were all trained on previously defined statistical databases. In 2010, Microsoft used a deep learning neural network for speech recognition. We can see from the figure below that two error indicators both dropped by 2/3, an obvious improvement. Based on the newest ResNet technology, Microsoft has already reduced this indicator to 6.9%, with improvements coming year by year.

In the field of image classification, the CNN model experienced a major breakthrough in the form of ImageNet in 2012. In ImageNet, the Image classification is tested using a massive data collection and then sorted into 1000 types. Before the application of deep learning, the best error rate for image classification system was 25.8% (in 2011), which came down to a mere 10%, thanks to the work done by Hinton and his students in 2012 using CNN.

From the graph, we can see that since 2012, this indicator has experienced a major breakthrough each year, all of which have been achieved using the CNN model.
These massive achievements owe in large part to the multi-layered structure of modern systems, as they allow for independent learning and the ability to express data through a layered abstraction structure. The abstracted features can be applied to a variety of tasks, contributing significantly to the current popularity of deep learning.

Next, we will introduce two classic and common types of deep learning neural networks: One is the Convolutional Neural Network (CNN), and the other is the Recurrent Neural Network

Convolutional Neural Networks

There are two core concepts to Convolutional Neural Networks. One is convolution and the other is pooling. At this point, some may ask why we don't simply use feed-forward neural networks rather than CNN. Taking a 1000x1000 image, for example, a neural network would have 1 million nodes on the hidden layer. A feed-forward neural network, then, would have 10^12 parameters. At this point, it's nearly impossible for the system to learn since it would require an absolutely massive number of estimations.

However, a large number of images have characteristics like this. If we use CNN to classify images, then because of the concept of convolution, each node on the hidden layer only needs to connect and scan the features of one location of the image. If each node on the hidden layer connects to 10*10 estimations, then the final number of parameters is 100 million, and if the local parameters accessed by multiple hidden layers can be shared, then the number of parameters is decreased significantly.

Looking at the image below, the difference between feed-forward neural networks and CNN is obviously massive. The models in the image are, from left to right, fully connected, normal, feed-forward, fully connected feed-forward, and CNN modeled neural networks. We can see that the connection weight parameters of nodes on the hidden layer of a CNN neural network can be shared.

Another operation is pooling. A CNN will, on the foundation of the principle of convolution, form a hidden layer in the middle, namely the pooling layer. The most common pooling method is Max Pooling, wherein nodes on the hidden layer choose the largest output value. Because multiple kernels are pooling, we get multiple hidden layer nodes in the middle.

What is the benefit? First of all, pooling further reduces the number of parameters, and secondly, it provides a certain amount of translation invariance. As shown in the image, if one of the nine nodes shown in the image were to experience translation, then the node produced on the pooling layer would remain unchanged.

These two characteristics of CNN have made it popular in the field of image processing, and it has become a standard in the field of image processing. The example of the visualized car below is a great example of the application of CNN in the field of image classification. After entering the original image of the car into the CNN model, we can pass some simple and rough features like edges and points through the convolution and ReLU activation layer. We can intuitively see that the closer they are to the output image from the uppermost output layer, the closer they are to the contours of a car. This process will finally retrieve a hidden layer representation and connect it to the classification layer, after which it will receive a classification for the image, like the car, truck, airplane, ship, and horse shown in the image.

The image below is a neural network used in the early days by LeCun and other researchers in the field of handwriting recognition. This network found application in the US postal system in the 1990s. Interested readers can log into LeCun's website to see the dynamic process of handwriting recognition.

While CNN has become incredibly popular in the field of image recognition, it has also become instrumental in text recognition over the past two years. For example, CNN is currently the basis of the most optimal solution for text classification. In terms of determining the class of a piece of text, all one really needs to do is look for indications from keywords in the text, which is a task that is well suited to the CNN model.

CNN has widespread real-world applications, for example in investigations, self-driving cars, Segmentation, and Neural Style. Neural Style is a fascinating application. For example, there is a popular app in the App Store called Prisma, which allows users to upload an image and convert it into a different style. For example, it can be converted to the style of Van Goh's Starry Night. This process relies heavily on CNN.

Recursive Neural Networks

As for the foundational principles behind recursive neural networks, we can see from the image below that the output from such a network relies not only on output x but the status of the hidden layer, which is updated according to the previous input x. The expanded image shows the entire process. The hidden layer from the first input is S(t-1), which influences the next input, X(t). The main advantage of the recursive neural network model is that we can use it in sequential data operations like text, language, and speech where the state of the current data is influenced by previous data states. This type of data is very difficult to handle using a feed-forward neural network.

Speaking of recursive neural networks, we would be remiss not to bring up the LSTM model we mentioned earlier. LSTM is not actually a complete neural network. Simply put, it is the result of an RNN node that has undergone complex processing. An LSTM has three gates, namely the input gate, the regret gate, and the output gate.
Each of these gates is used to process the data in a cell and determine whether or not the data in the cell should be input, regretted, or output.

Finally, let's talk a bit about a cross-discipline application of neural networks which is gaining widespread acceptance. This application involves converting an image into a text description of the image or a title describing it. We can describe the specific implementation process by using a CNN model first to extract information about the image and produce a vector representation. Later on, we can pass that vector as input to an already trained recursive neural network to produce the description of the image.

Summary

In this article, we talked about the evolution of neural networks and introduced several basic concepts and approaches in this field.

Source: https://dzone.com/articles/all-you-need-to-know-about-neural-networks-part-2

Big Data Vietnam

Pages