Monday, April 2, 2018

Apache Hadoop and Spark: Introduction and Use Cases for Data Analysis



Slides by Trieu Nguyen, covering:
  • Growth of big datasets
  • Introduction to Apache Hadoop and Spark for developing applications
  • Components of Hadoop: HDFS, MapReduce and HBase
  • Capabilities of Spark and how it differs from a typical MapReduce solution (a minimal sketch follows below)
  • Some Spark use cases for data analysis
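
To make the MapReduce comparison concrete, here is a minimal word-count sketch in PySpark: the same job as a classic Hadoop MapReduce program, but in a handful of lines. It assumes a local Spark installation and a hypothetical input.txt.

```python
# Word count in PySpark - the canonical MapReduce example, expressed
# as a few chained transformations. input.txt is a hypothetical file.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()

counts = (spark.sparkContext.textFile("input.txt")
          .flatMap(lambda line: line.split())   # "map": emit words
          .map(lambda word: (word, 1))          # "map": (word, 1) pairs
          .reduceByKey(lambda a, b: a + b))     # "reduce": sum the counts

print(counts.take(10))
spark.stop()
```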

Wednesday, January 10, 2018

Machine Learning on Big Data

There is a revolution underway in machine learning and big data. From every coffee you buy to everything you click on (not to mention buy) online, everything is being tracked and analyzed. From these analyses, many inferences are drawn in order to offer you new and better choices based on what you like.

Technologies like machine learning and artificial intelligence used to just sit in the lab and never get put into practice - but not anymore. With the rise of big data, these technologies have gone mainstream. And using them, you can predict almost anything, from which ad a user will click on to whether a tumor is cancerous based solely on image recognition.

Let's look at some common use cases where we apply machine learning and routine analytics to big data on a daily basis. Along the way, I will also mention how they are explained in the book Big Data Analytics With Java.

Recommendation Engines


I like watching Marco Polo on Netflix, so I am recommended similar movies and shows that I might like (see the image above). This is one of the most common use cases of machine learning: the machine learns from our historical data and makes appropriate recommendations for us.
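
A common way to build such a recommender is collaborative filtering. The sketch below uses Spark's ALS implementation on a hypothetical ratings.csv with userId, movieId and rating columns, purely to illustrate the idea:

```python
# Collaborative-filtering recommender with ALS (Spark MLlib).
# ratings.csv and its column names are assumptions for illustration.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("recs").getOrCreate()
ratings = spark.read.csv("ratings.csv", header=True, inferSchema=True)

als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
          coldStartStrategy="drop")   # skip users/items unseen in training
model = als.fit(ratings)

# Top 5 recommendations for every user.
model.recommendForAllUsers(5).show(truncate=False)
```

ALS factorizes the user-item rating matrix into latent taste factors, which is exactly the "learn from our historical data" idea above.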

Frequently Bought Together


Look at the image above. As you may know, whenever you buy a product on any e-commerce store and go to that item's detail page, you are shown other products that are frequently sold together with it. This gives the user more options to buy along with the current item, and it is done to increase sales.
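
"Frequently bought together" suggestions are typically mined with association rules. Here is a minimal sketch using Spark's FP-Growth on a few invented baskets:

```python
# Market-basket analysis with FP-Growth (Spark MLlib).
# The tiny in-memory baskets are toy data for illustration.
from pyspark.sql import SparkSession
from pyspark.ml.fpm import FPGrowth

spark = SparkSession.builder.appName("baskets").getOrCreate()
baskets = spark.createDataFrame([
    (0, ["phone", "case", "charger"]),
    (1, ["phone", "case"]),
    (2, ["laptop", "mouse"]),
], ["id", "items"])

fp = FPGrowth(itemsCol="items", minSupport=0.5, minConfidence=0.6)
model = fp.fit(baskets)
model.associationRules.show()   # e.g. a rule like phone -> case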

Predictive Analytics


Machine learning has been used heavily to predict the future value of almost anything, as long as historical data is available to train the models. The value could be anything: the amount needed for a marketing campaign, the budget required to launch a new product, or the price of a product. The book Big Data Analytics With Java uses a real-world case study on predicting house prices based on a set of different variables, using data released by King County in Washington State.
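
Under the hood this is ordinary supervised regression. A minimal scikit-learn sketch follows; the file name and column names are assumptions for illustration, not the book's actual code:

```python
# Predicting house prices with linear regression (scikit-learn).
# kc_house_data.csv and its columns are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("kc_house_data.csv")
X = df[["sqft_living", "bedrooms", "bathrooms"]]  # predictor variables
y = df["price"]                                   # target to predict

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```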

Spam Detection and Sentiment Analysis


Spam detection is a common use case. Gmail does it for us, and we all benefit from it daily. Look at the two emails shown in the image above: the one on the left is clearly spam, while the one on the right is perfectly fine.

Using the same kind of algorithm that is used to detect spam, Big Data Analytics With Java builds a sample case study that determines the sentiment (positive or negative) of users from a set of tweets about different movies. See the figure below.
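
Both spam detection and sentiment analysis boil down to text classification. As a rough sketch (the tiny training set below is invented for illustration, not the book's data), a bag-of-words model with a Naive Bayes classifier looks like this:

```python
# Text classification with bag-of-words + Naive Bayes (scikit-learn).
# The toy training texts and labels are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts  = ["loved the movie", "great acting", "terrible plot", "waste of time"]
labels = ["positive", "positive", "negative", "negative"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["what a great film", "terrible waste"]))
```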



Social Analytics and Regular Graph Analytics


When you search for a destination on your GPS, a graph-search algorithm runs to find the shortest path to your destination. Running graph algorithms on a small piece of data is one thing, but running them on massive amounts of data requires specialized software such as GraphFrames on big data. Also, in today's world of social networks, we have huge social graphs of people that can connect us to the people we know: our friends, friends of our friends, and so on. The image above shows a very simple social graph, but it hints at how complex these graphs can get.

Big Data Analytics With Java has an extensive chapter on graph analytics, including a case study on a real dataset of airports and connecting flights. Using this dataset, we run analyses such as the PageRank algorithm to find the best-connected airports, the shortest paths between destinations in the graph, and much more.
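
For a concrete flavor of this kind of analysis, here is a minimal GraphFrames sketch on a toy airport graph (the airports and routes are invented, and the graphframes package must be available to Spark):

```python
# PageRank and shortest paths on a toy airport graph (GraphFrames).
# Requires the graphframes Spark package; the data is invented.
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("airports").getOrCreate()
vertices = spark.createDataFrame([("JFK",), ("LAX",), ("ORD",)], ["id"])
edges = spark.createDataFrame(
    [("JFK", "ORD"), ("ORD", "LAX"), ("JFK", "LAX")], ["src", "dst"])

g = GraphFrame(vertices, edges)
g.pageRank(resetProbability=0.15, maxIter=10).vertices.show()  # best-connected
g.shortestPaths(landmarks=["LAX"]).show()                      # hops to LAX
```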

Image Classification and Natural Language Processing



Image classification and NLP are hard and interesting problems to solve. Artificial neural networks are extremely good, and getting even better, in these areas. In fact, some convolutional neural networks can classify handwritten digits with 99% accuracy.
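
As an illustration, a minimal convolutional network for handwritten-digit classification might look like the Keras sketch below; reaching the ~99% accuracy mentioned above takes more epochs and tuning than this toy setup.

```python
# A small convolutional network for MNIST digit classification (Keras).
import tensorflow as tf
from tensorflow.keras import layers, models

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0   # shape (n, 28, 28, 1), scaled to [0, 1]
x_test = x_test[..., None] / 255.0

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, validation_data=(x_test, y_test))
```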

Conclusion
The use cases and examples above are just a few; there are many more analytics use cases out there. Artificial intelligence and other analytic processes are being woven into our everyday routines to such a degree that it is clear we will see these techniques used ever more widely in the near future.

Tuesday, January 9, 2018

How Volkswagen is using artificial intelligence for ad buying decisions

Cars aren’t the only thing Volkswagen wants to automate. Artificial intelligence is managing the brand’s media buys in Germany and proving to be more effective than its media agency.

Whenever Volkswagen uses the recommendations from Blackwood Seven, a Danish media agency that uses AI and predictive analytics to forecast ad spend decisions, it sells more cars than it would have if it had gone with its media agency’s recommendation, according to Lutz Kothe, the head of marketing and PR for Volkswagen’s passenger cars. Kothe said his team uses Blackwood Seven’s algorithm to buy the right ads based on sales.

Defining a role for the AI platform, however, has taken Volkswagen a while. Two years ago, the advertiser ditched MediaCom and gave its digital business in Germany to Blackwood Seven. At the time, Kothe wasn’t sure the algorithm would be able to deliver better media recommendations than its agency, but the signs were promising just months into the deal, according to Kothe. Between September and December 2016, Volkswagen used the algorithm’s media recommendations for a campaign for its up! model, which led to a 14 percent rise in orders from Volkswagen’s dealerships versus what those orders would’ve been had the campaign run solely on its agency’s recommendations. In some instances, the difference between the algorithm’s and its agency’s car orders has been as high as 20 percent, revealed Kothe.

Since those early campaigns, Volkswagen has applied the algorithm to all its media strategies in Germany. Every car model there from the advertiser has a different strategy, predicting which media investments will give the best returns for its marketers. Those forecasts are based on Volkswagen’s transactional data — or incoming orders — as well as market data such as fuel price, competitor prices and overall car registrations from sources such as Nielsen. Like other AI platforms, the more data fed into the algorithm, the smarter Blackwood Seven’s recommendations become.

Volkswagen’s marketing team bought those recommendations for display, search and social directly from the platform in 2017. By circumventing the agency and going direct to the media owner, the car manufacturer avoided any hidden rebate costs it might have incurred from buying via an agency. In 2018, however, Volkswagen will only use the AI platform to plan its campaigns, not buy them — a move that comes nearly a year after the advertiser appointed PHD as its global media agency.

Kothe would not comment on the prospect of Volkswagen eventually replacing all its agencies with AI platforms, instead stressing how important AI could be to finding marginal gains in media.

What separates Blackwood Seven from Volkswagen’s agencies, Kothe said, is the AI platform’s capacity to process reams of data. Volkswagen’s marketers pull data from more than 1,400 touch points, yet they relied on the “personal interpretations coming from the agency,” which wasn’t based wholly on the data, Kothe said. The biggest advantage the algorithm gives Volkswagen is the ability to better predict what the brand’s media investments will do, he added. The more precise the recommendations become, the more “we can steer our media activity into the best areas,” he concluded.

This means the brand can invest less for some campaigns and still see a sales uplift, Kothe said. While he would not reveal details on those campaigns, he noted how radio, which had been regarded as somewhat of a stale media for Volkswagen in Germany, has been used more frequently over other media to launch new cars, per the algorithm’s forecasts.

But trusting a machine hasn’t been easy. After two years of working with Blackwood Seven and another year before that of early discussions, Volkswagen’s German team is only now talking to other markets about the algorithm’s performance. While hype continues to build around AI replacing media agencies, the reality is it won’t happen anytime soon. Blackwood Seven’s failed launch in the U.K. is a testament to that; the startup closed its office there last month after just a year, despite offering to give marketers more control of their media spend at the height of the transparency crisis.

Friday, December 29, 2017

Data Scientist Skill Set

1         Background

Data science is first and foremost a talent-based discipline and capability. Platforms, tools and IT infrastructure play an important but secondary role. Nevertheless, software and technology companies around the globe spend significant amounts of money talking business managers into buying or licensing their products which often times results in unsatisfying outcomes that do not come close to realizing the full potential of data science.
Talent is key - but unfortunately very rare and hard to identify. If you are trying to hire a data scientist these days, you face a serious risk of recruiting someone with the wrong or an insufficient skill set. On top of that, talent is even more crucial for small or medium-sized companies whose data science teams are likely to stay relatively small. Wasting one or two headcounts on wrong profiles might render an entire team inefficient.
The demand for data scientists has risen dramatically in recent years [1, 2, 3, 4, 5]:
  • New technologies have significantly improved our ability to manage and process data, including new types of data as well as large quantities of data.
  • A shift in mindset took place in business environments [6] regarding the utilization of data: from data as a reporting and business analytics necessity towards a valuable resource enabling smart decision making.
  • Last but not least, exciting new intellectual developments have taken place in relevant related academic disciplines like machine learning [7, 8] and natural language processing.
Due to high demand, the term ‘data scientist’ has developed into a recruiting buzzword that is being broadly abused these days. Experienced lead data scientists share a painful experience when trying to fill a vacant position: out of a hundred applicants, typically only a handful match the requirements to qualify for an interview. Some candidates already feel qualified to call themselves ‘data scientists’ after finishing a six-week online course on a statistical computing language. Unqualified individuals often times end up being hired by managers who themselves lack data science experience - leading to disappointment, frustration and an erosion of the term ‘data science’.

2         Who is a Data Scientist?

The data scientist skill set described in the following is based on the idea that it fundamentally rests on three pillars, each representing a skill set mostly orthogonal to the remaining two.
Following this idea, a solid data scientist needs to have the following three well-established skill sets:
  1. Technical skills,
  2. Analytical skills and
  3. Business skills.
Although technical skills are often times the focus of data science role descriptions, they represent only the basis of a data scientist’s skill set. Analytical skills are much harder to acquire (and to test) but represent the crucial core of a data scientist’s ability to solve business problems utilizing scientific approaches. Business skills enable a data scientist to thrive in corporate environments.

2.1        Technical skills | Basis

Technical skills are the basis of a data scientist’s skill set. They include coding skills in languages such as R or Python, the ability to handle various computational architectures, including different types of databases and operating systems, but also other skills such as parallel computing or high-performance computing.
The ability to handle data is a necessity for data scientists. It includes data management, data consolidation, data cleansing and data modelling amongst others. As there is often times a high demand for these skills in corporate environments, it comes with the risk of focusing data scientists on data management tasks - thus distracting them from their actual work.
Almost more important than a candidate’s current technical skill set is their mindset. A key factor is intellectual agility, which provides candidates with the ability to adapt to new computational environments in a short amount of time. This includes learning new coding languages, dealing with new types of databases or data structures, and keeping up with current technological developments like moving from relational databases to object-analytical approaches.
A data scientist with a static technical skill set will not thrive for long, as the discipline requires constant adaptation and learning. Strong candidates show a healthy appetite for developing their technical skills. When a candidate focuses on a tool discussion during an interview, it can be an indication of a narrow technical comfort zone with firm constraints.
Unfortunately, data science job profiles are often times narrowly focused on technical skills; this is caused by a) the misperception that a successful data scientist’s secret lies exclusively in the ability to handle a specific set of tools and b) a lack of knowledge on the hiring manager’s end as to what the right skill set looks like in the first place. Focusing on technical skills when evaluating candidates poses a significant risk.

2.2        Analytical skills | Core

Scientific problem solving is an essential part of data science. Analytical skills represent the ability to succeed at this complex and highly non-linear discipline. Establishing thorough analytical skills requires a high amount of commitment and dedication (a limiting factor contributing to the global shortage of data scientists).
Analytical skills include expertise in academic disciplines like computer science, machine learning, advanced statistics, probability theory, causal inference, artificial intelligence, feature extraction and others (including strong mathematical skills). The list can be extended almost infinitely [9, 10, 11] and has been the subject of many debates.
Covering all potentially useful analytical disciplines is a lifetime achievement for any data scientist and not a requirement for a successful candidate. Rather, a data scientist needs a healthy mix of analytical skills to succeed. For instance, an expert on Markov chains and an expert on Bayesian networks might both be able to develop a solution for the very same business problem, each utilizing their respective strengths and thus fundamentally different methods.
Analytical skills are typically developed through pursuing excellence in a highly quantitative academic field such as computer science, theoretical physics, computational math or bioinformatics. These skills are trained in academic institutions through exposure to hard, unsolved research problems that require a high level of intellectual curiosity and dedication to tackle and eventually solve. This is typically done over the course of a PhD.
Mastering a quantitative research question that nobody else has solved before is a non-linear process, inevitably accompanied by failing over and over again. However, this process of scientific problem solving shapes the analytical mind and builds the expertise to later succeed in data science. It typically consists of iterative cycles of
  1. implementing and adapting an analytical approach
  2. applying it and observing it fail, then
  3. investigating the problems and
  4. building an understanding why it failed and where the limitations of the approach lie
    5. coming up with a better, more refined approach.
These iterations are accompanied by key learnings and represent small steps towards the project goal, effectively zig-zagging towards the final solution.
A key requirement for analytical excellence is the right mindset: a data scientist needs an intrinsic, high level of curiosity and a strong appetite for intellectual challenges. Data scientists need to be able to pick up new methods and mathematical techniques in a short amount of time and then apply them to the problem at hand - often times within the limited time frame of an ongoing project.
A good way to test analytical skills during an interview process is to provide potential candidates with a business problem and real data, then ask them to spend a few hours working on it remotely. Discussing the code they wrote, the approach they chose, the solution they built and the insights they generated is a great way to evaluate their potential, and it gives the candidates a first feeling for their potential new tasks at the same time.

2.3        Business Skills | Enablement

Business skills enable data scientists to thrive in a corporate environment.
It is important for data scientists to communicate effectively with business users, utilizing business language while avoiding a shift towards a conversation that is too technical. Healthy data science projects start and end with the discussion of a business problem supported by a valid business case.
Data scientists need to have a good understanding of business processes as it will be required to make sure the solution they build can be integrated and ultimately consumed by the respective business users. Careful and smart change management almost always plays a role in data science projects as well. A solid portion of entrepreneurship and out-of-the-box thinking helps data scientists to consider business problems from new angles utilizing analytical methods that their business partners do not know about. Last but not least, many big and successful data science projects that ultimately lead to significant impact were achieved through ‘connecting the dots’ by data scientists who built up internal knowledge by working on different projects across departments and functions.
Candidates who come with strong technical and analytical skills are often times highly intelligent individuals looking for intellectual challenges. Even if they have no experience in an industry or in navigating a corporate environment, they can pick up required business skills in a short amount of time - given that they have a healthy appetite for solving business cases. Building strong analytical or technical skills takes orders of magnitude longer.
When trying to determine whether a candidate has an intrinsic interest in business questions or would prefer to work in an academic setting, it can help to ask yourself the following questions:
  • How well can the candidate explain data science methods like deep learning to business users?
  • When discussing a business problem can the candidate communicate effectively in business terms while thinking about potential mathematical or technical approaches?
  • Will the business users who collaborate with the data scientist in the future respect him or her as a partner at eye level?
  • Would you feel comfortable sending the candidate on their own to present to your manager?
  • Do you think the candidate will succeed in your business environment?

3         Recruiting

Data science requires a mix of different skills. In the end, this mix needs to be adapted to the requirements and the situation at hand, and to the business problems that represent the biggest potential value for your company. Big data, for instance, is a strong buzzword, but in many companies data is under-utilized to such a degree that a data science team can focus on low-hanging fruit, in the form of small and structured data sets, for one or two years and still have a strong business impact.
A key characteristic of candidates that has not been mentioned so far and which can be hard to evaluate is attitude. Hiring data scientists for business consultant positions will require a different mindset and attitude than hiring for integration into an analytics unit or even to supplement a business team.

4         References

[1] NY Times: Data Science: The Numbers of Our Lives, by Claire Cain Miller. http://nyti.ms/1TfCFmX
[2] TechCrunch: How To Stem The Global Shortage Of Data Scientists. http://tcrn.ch/1TUIqsB
[3] Bloomberg: Help Wanted: Black Belts in Data. http://bloom.bg/1Xt8bTO
[4] McKinsey on US opportunities for growth. http://bit.ly/1WAonmD
[5] McKinsey on big data and data science. http://bit.ly/1VXQJHD
[6] Big Data at Work: Dispelling the Myths, Uncovering the Opportunities; Thomas H. Davenport; Harvard Business Review Press (2014).
[7] Andrew Ng on Deep Learning. http://bit.ly/1Tg3g74
[8] Andrew Ng on Deep Learning Applications. http://bit.ly/1Wza02H
[9] Data scientist Venn diagram by Drew Conway. http://bit.ly/1Xd6MAn
[10] Swami Chandrasekaran’s data scientist skill map. http://bit.ly/1ZUGUIF
[11] Forbes: The best machine learning engineers have these 9 traits in common. http://onforb.es/1VXR9Og

Sunday, December 17, 2017

Guide to machine learning and big data jobs in finance from J.P.Morgan

[Figure: Minimum spanning tree for 31 JPM tradable risk premia indices]
Financial services jobs go in and out of fashion. In 2001 equity research for internet companies was all the rage. In 2006, structuring collateralised debt obligations (CDOs) was the thing. In 2010, credit traders were popular. In 2014, compliance professionals were it. In 2017, it’s all about machine learning and big data. If you can get in here, your future in finance will be assured.
J.P. Morgan’s quantitative investing and derivatives strategy team, led by Marko Kolanovic and Rajesh T. Krishnamachari, has just issued the most comprehensive report ever on big data and machine learning in financial services.
Titled ‘Big Data and AI Strategies’ and subtitled ‘Machine Learning and Alternative Data Approach to Investing’, the report says that machine learning will become crucial to the future functioning of markets. Analysts, portfolio managers, traders and chief investment officers all need to become familiar with machine learning techniques. If they don’t, they’ll be left behind: traditional data sources like quarterly earnings and GDP figures will become increasingly irrelevant, as managers using newer datasets and methods will be able to predict them in advance and trade ahead of their release.
At 280 pages, the report is too long to cover in detail, but we’ve pulled out the most salient points for you below.

1. Banks will need to hire excellent data scientists who also understand how markets work

J.P. Morgan cautions against the fashion for banks and finance firms to prioritize data analysis skills over market knowledge. Doing so is dangerous. Understanding the economics behind the data and the signals is more important than developing complex technical solutions.

2. Machines are best equipped to make trading decisions in the short and medium term

J.P. Morgan notes that human beings are already all but excluded from high frequency trading. In future, they say machines will become increasingly prevalent over the medium term too: “Machines have the ability to quickly analyze news feeds and tweets, process earnings statements, scrape websites, and trade on these instantaneously.” This will help erode demand for fundamental analysts, equity long-short managers and macro investors.
In the long term, however, humans will retain an advantage: “Machines will likely not do well in assessing regime changes (market turning points) and forecasts which involve interpreting more complicated human responses such as those of politicians and central bankers, understanding client positioning, or anticipating crowding,” says J.P. Morgan. If you want to survive as a human investor, this is where you will need to make your niche.

4. An army of people will be needed to acquire, clean, and assess the data 

Before machine learning strategies can be implemented, data scientists and quantitative researchers need to acquire and analyze the data with the aim of deriving tradable signals and insights.
J.P. Morgan notes that data analysis is complex. Today’s datasets are often bigger than yesterday’s. They can include anything from data generated by individuals (social media posts, product reviews, search trends, etc.), to data generated by business processes (company exhaust data, commercial transactions, credit card data, etc.) and data generated by sensors (satellite image data, foot and car traffic, ship locations, etc.). These new forms of data need to be analyzed before they can be used in a trading strategy. They also need to be assessed for ‘alpha content’ – their ability to generate alpha. Alpha content will be partially dependent upon the cost of the data, the amount of processing required and how well-used the dataset is already.

5. There are different kinds of machine learning. And they are used for different purposes

Machine learning has various iterations, including supervised learning, unsupervised learning and deep and reinforcement learning.
The purpose of supervised learning is to establish a relationship between two datasets and to use one dataset to forecast the other. The purpose of unsupervised learning is to try to understand the structure of data and to identify the main drivers behind it. The purpose of deep learning is to use multi-layered neural networks to analyze a trend, while reinforcement learning encourages algorithms to explore and find the most profitable trading strategies.
[Figure: J.P. Morgan’s classification of machine learning methods]

6. Supervised learning will be used to make trend-based predictions using sample data

In a finance context, J.P. Morgan says supervised learning algorithms are provided with historical data and asked to find the relationship that has the best predictive power. Supervised learning algorithms come in two varieties: regression and classification methods.
Regression-based supervised learning methods try to predict outputs based on input variables. For example, they might look at how the market will move if inflation spikes.
Classification methods work the other way around and try to identify which category an observation belongs to.
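
As an illustration of the two varieties, the sketch below fits a regression model and a classification model on synthetic data (generated data, not market data):

```python
# Regression vs. classification on synthetic data (scikit-learn).
from sklearn.datasets import make_regression, make_classification
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a continuous output from input variables.
Xr, yr = make_regression(n_samples=200, n_features=3, noise=0.1)
reg = LinearRegression().fit(Xr, yr)

# Classification: assign each observation to a category.
Xc, yc = make_classification(n_samples=200, n_features=3,
                             n_informative=2, n_redundant=0)
clf = LogisticRegression().fit(Xc, yc)

print(reg.predict(Xr[:2]), clf.predict(Xc[:2]))
```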

7. Unsupervised learning will be used to identify relationships between a large number of variables

In unsupervised learning, a machine is given an entire set of returns from assets and doesn’t know which are the dependent and the independent variables. At a high level, unsupervised learning methods are categorized as clustering or factor analyses.
Clustering involves splitting a dataset into smaller groups based on some notion of similarity. For example, it can involve identifying historical regimes with high and low volatility, rising and falling rates, or rising and falling inflation.
Factor analyses aim to identify the main drivers of the data or to identify the best representation of the data. For example, yield curve movements can be described by the parallel shift of yields, the steepening of the curve, and the convexity of the curve. In a multi-asset portfolio, factor analysis will identify the main drivers, such as momentum, value, carry, volatility or liquidity.
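
A brief illustrative sketch of both flavors, using k-means for clustering and PCA as a simple stand-in for factor analysis, on synthetic "returns":

```python
# Clustering and factor extraction on synthetic returns (scikit-learn).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
returns = rng.normal(size=(250, 10))   # 250 days x 10 assets, synthetic

# Clustering: group similar days (e.g. candidate regimes).
regimes = KMeans(n_clusters=2, n_init=10).fit_predict(returns)

# Factor analysis: find the main drivers of co-movement.
pca = PCA(n_components=3).fit(returns)
print(regimes[:10], pca.explained_variance_ratio_)
```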

8. Deep learning systems will undertake tasks that are hard for people to define but easy to perform

Deep learning is effectively an attempt to artificially recreate human intelligence. J.P. Morgan says deep learning is particularly well suited to the pre-processing of unstructured big data sets (for instance, it can be used to count cars in satellite images or to identify sentiment in a press release). A deep learning model could use a hypothetical financial data series to estimate the probability of a market correction.
Deep learning methods are based on neural networks, which are loosely inspired by the workings of the human brain. In a network, each neuron receives inputs from other neurons and ‘computes’ a weighted average of these inputs. The relative weighting of the different inputs is guided by past experience.
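
In code, a single neuron's computation is just a weighted sum of its inputs plus a bias, passed through a nonlinearity; a minimal sketch:

```python
# One artificial neuron: weighted sum of inputs plus bias,
# passed through a nonlinearity (here a sigmoid).
import numpy as np

def neuron(inputs, weights, bias):
    z = np.dot(weights, inputs) + bias   # weighted combination of inputs
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid activation

x = np.array([0.5, -1.2, 3.0])           # outputs of upstream neurons
w = np.array([0.4, 0.1, -0.6])           # weights learned from experience
print(neuron(x, w, bias=0.2))
```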
[Figure: J.P. Morgan neural network illustration]

9. Reinforcement learning will be used to choose a successive course of actions to maximize the final reward

The goal of reinforcement learning is to choose a course of successive actions in order to maximize the final (or cumulative) reward. Unlike supervised learning (which is typically a one step process), the reinforcement learning model doesn’t know the correct action at each step.
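
As a toy illustration of the idea (a made-up two-state environment, not J.P. Morgan's model), tabular Q-learning looks like this:

```python
# Tabular Q-learning on a made-up 2-state, 2-action environment.
import numpy as np

n_states, n_actions = 2, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration

def step(state, action):            # toy dynamics with invented rewards
    reward = 1.0 if action == state else -1.0
    return (state + 1) % n_states, reward

state = 0
for _ in range(1000):
    # Explore occasionally; otherwise take the best-known action.
    if np.random.rand() < eps:
        action = np.random.randint(n_actions)
    else:
        action = int(Q[state].argmax())
    next_state, reward = step(state, action)
    # Update toward reward plus discounted best future value.
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max()
                                 - Q[state, action])
    state = next_state
print(Q)
```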
J.P. Morgan’s electronic trading group has already developed algorithms using reinforcement learning. The diagram below shows the bank’s machine learning model (we suspect it’s blurry on purpose).
[Figure: J.P. Morgan’s machine learning model for algorithmic trading]

10. You won’t need to be a machine learning expert, but you will need to be an excellent quant and an excellent programmer

J.P. Morgan says the skill set for the role of data scientist is virtually the same as for any other quantitative researcher. Existing buy-side and sell-side quants with backgrounds in computer science, statistics, maths, financial engineering, econometrics and natural sciences should therefore be able to reinvent themselves. Expertise in quantitative trading strategies will be the crucial skill. “It is much easier for a quant researcher to change the format/size of a dataset, and employ better statistical and Machine Learning tools, than for an IT expert, Silicon Valley entrepreneur, or academic to learn how to design a viable trading strategy,” say Kolanovic and Krishnamachari.
By comparison, J.P. Morgan notes that you won’t need to know about machine learning in any great detail: most of the machine learning methods are already coded (e.g. in R), so you just need to apply the existing models. As a start, the authors suggest looking at small datasets using GUI-based software like Weka. Python also has extensive libraries like Keras (keras.io), and there are open-source machine learning libraries like TensorFlow and Theano.

11. These are the coding languages and data analysis packages you’ll need to know

If you’re only planning to learn one coding language related to machine learning, J.P. Morgan suggests you choose R, along with the related packages below. However, C++, Python and Java also have machine learning applications as shown below.
[Tables: machine learning packages for R, C++, Python and Java]

12. And these are some examples of popular machine learning code using Python

[Images: example machine learning code in Python]

13. Support functions are going to need to understand big data too

Lastly, J.P. Morgan notes that support functions need to know about big data too. The report says that too many recruiters and hiring managers are incapable of distinguishing between an ability to talk broadly about artificial intelligence and an ability to actually design a tradable strategy. At the same time, compliance teams will need to be able to vet machine learning models and to ensure that data is properly anonymized and doesn’t contain private information. The age of machine learning in finance is upon us.

Friday, November 24, 2017

Free Deep Learning Book (MIT Press)

The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular. The online version of the book is now complete and will remain available online for free.
The book is also available on Amazon and from MIT Press.

Lectures

We plan to offer lecture slides accompanying all chapters of this book. We currently offer slides for only some chapters. If you are a course instructor and have your own lecture slides that are relevant, feel free to contact us if you would like to have your slides linked or mirrored from this site.
  1. Introduction
    • Presentation of Chapter 1, based on figures from the book [.key] [.pdf]
    • Video of lecture by Ian and discussion of Chapter 1 at a reading group in San Francisco organized by Alena Kruchkova
  2. Linear Algebra [.key][.pdf]
  3. Probability and Information Theory [.key][.pdf]
  4. Numerical Computation [.key] [.pdf] [youtube]
  5. Machine Learning Basics [.key] [.pdf]
  6. Deep Feedforward Networks [.key] [.pdf]
    • Video (.flv) of a presentation by Ian and a group discussion at a reading group at Google organized by Chintan Kaur.
  7. Regularization for Deep Learning [.pdf] [.key]
  8. Optimization for Training Deep Models
    • Gradient Descent and Structure of Neural Network Cost Functions [.key] [.pdf]
      These slides describe how gradient descent behaves on different kinds of cost function surfaces. Intuition for the structure of the cost function can be built by examining a second-order Taylor series approximation of the cost function. This quadratic function can give rise to issues such as poor conditioning and saddle points. Visualization of neural network cost functions shows how these and some other geometric features of neural network cost functions affect the performance of gradient descent.
    • Tutorial on Optimization for Deep Networks [.key] [.pdf]
      Ian's presentation at the 2016 Re-Work Deep Learning Summit. Covers Google Brain research on optimization, including visualization of neural network cost functions, Net2Net, and batch normalization.
    • Batch Normalization [.key] [.pdf]
    • Video of lecture / discussion: This video covers a presentation by Ian and group discussion on the end of Chapter 8 and entirety of Chapter 9 at a reading group in San Francisco organized by Taro-Shigenori Chiba.
  9. Convolutional Networks
    • Convolutional Networks [.key][.pdf]
      A presentation summarizing Chapter 9, based directly on the textbook itself.
    • Video of lecture / discussion: This video covers a presentation by Ian and group discussion on the end of Chapter 8 and entirety of Chapter 9 at a reading group in San Francisco organized by Taro-Shigenori Chiba.
  10. Sequence Modeling: Recurrent and Recursive Networks
    • Sequence Modeling [.pdf] [.key]
      A presentation summarizing Chapter 10, based directly on the textbook itself.
    • Video of lecture / discussion. This video covers a presentation by Ian and a group discussion of Chapter 10 at a reading group in San Francisco organized by Alena Kruchkova.
  11. Practical Methodology [.key][.pdf] [youtube]
  12. Applications [.key][.pdf]
  13. Linear Factors [.key][.pdf]
  14. Autoencoders [.key][.pdf]
  15. Representation Learning [.key][.pdf]
  16. Structured Probabilistic Models for Deep Learning [.key][.pdf]

Featured Post

Big Data: Market demand, career orientation, and the nature of the work

A collection of questions from readers of the page https://www.facebook.com/bigdatavn and from Barcamp talks. 1. People working in...