Big Data Vietnam: Guide to machine learning and big data jobs in finance from J.P.Morgan

Main Points

Banks will need to hire excellent data scientists who also understand how markets work
Machines are best equipped to make trading decisions in the short and medium term
An army of people will be needed to acquire, clean, and assess the data
There are different kinds of machine learning. And they are used for different purposes
Supervised learning will be used to make trend-based predictions using sample data
Unsupervised learning will be used to identify relationships between a large number of variables
Deep learning systems will undertake tasks that are hard for people to define but easy to perform
Reinforcement learning will be used to choose a successive course of actions to maximize the final reward
You won’t need to be a machine learning expert, you will need to be an excellent quant and an excellent programmer
These are the coding languages and data analysis packages you’ll need to know
And these are some examples of popular machine learning codes using Python
Support functions are going to need to understand big data too

Minimum Spanning Tree for 31 JPM tradable risk premia indices

Financial services jobs go in and out of fashion. In 2001 equity research for internet companies was all the rage. In 2006, structuring collateralised debt obligations (CDOs) was the thing. In 2010, credit traders were popular. In 2014, compliance professionals were it. In 2017, it’s all about machine learning and big data. If you can get in here, your future in finance will be assured.

J.P. Morgan’s quantitative investing and derivatives strategy team, led by Marko Kolanovic and Rajesh T. Krishnamachari, has just issued the most comprehensive report ever on big data and machine learning in financial services.

Titled ‘Big Data and AI Strategies’ and subtitled ‘Machine Learning and Alternative Data Approach to Investing’, the report says that machine learning will become crucial to the future functioning of markets. Analysts, portfolio managers, traders and chief investment officers all need to become familiar with machine learning techniques. If they don’t, they’ll be left behind: traditional data sources like quarterly earnings and GDP figures will become increasingly irrelevant, because managers using newer datasets and methods will be able to predict them in advance and trade ahead of their release.

At 280 pages, the report is too long to cover in detail, but we’ve pulled out the most salient points for you below.

1. Banks will need to hire excellent data scientists who also understand how markets work

J.P. Morgan cautions against the fashion for banks and finance firms to prioritize data analysis skills over market knowledge. Doing so is dangerous. Understanding the economics behind the data and the signals is more important than developing complex technical solutions.

2. Machines are best equipped to make trading decisions in the short and medium term

J.P. Morgan notes that human beings are already all but excluded from high frequency trading. In future, it says, machines will become increasingly prevalent over the medium term too: “Machines have the ability to quickly analyze news feeds and tweets, process earnings statements, scrape websites, and trade on these instantaneously.” This will help erode demand for fundamental analysts, equity long-short managers and macro investors.

In the long term, however, humans will retain an advantage: “Machines will likely not do well in assessing regime changes (market turning points) and forecasts which involve interpreting more complicated human responses such as those of politicians and central bankers, understanding client positioning, or anticipating crowding,” says J.P. Morgan. If you want to survive as a human investor, this is where you will need to carve out your niche.

4. An army of people will be needed to acquire, clean, and assess the data

Before machine learning strategies can be implemented, data scientists and quantitative researchers need to acquire and analyze the data with the aim of deriving tradable signals and insights.

J.P. Morgan notes that data analysis is complex. Today’s datasets are often bigger than yesterday’s. They can include anything from data generated by individuals (social media posts, product reviews, search trends, etc.), to data generated by business processes (company exhaust data, commercial transactions, credit card data, etc.) and data generated by sensors (satellite image data, foot and car traffic, ship locations, etc.). These new forms of data need to be analyzed before they can be used in a trading strategy. They also need to be assessed for ‘alpha content’ – their ability to generate alpha. Alpha content will depend partly on the cost of the data, the amount of processing required and how widely used the dataset already is.

5. There are different kinds of machine learning. And they are used for different purposes

Machine learning comes in various forms, including supervised learning, unsupervised learning, deep learning and reinforcement learning.

The purpose of supervised learning is to establish a relationship between two datasets and to use one dataset to forecast the other. The purpose of unsupervised learning is to try to understand the structure of data and to identify the main drivers behind it. The purpose of deep learning is to use multi-layered neural networks to analyze a trend, while reinforcement learning encourages algorithms to explore and find the most profitable trading strategies.

JPMorgan machine learning classification

6. Supervised learning will be used to make trend-based predictions using sample data

In a finance context, J.P. Morgan says supervised learning algorithms are provided with historical data and asked to find the relationship that has the best predictive power. Supervised learning algorithms come in two varieties: regression and classification methods.

Regression-based supervised learning methods try to predict outputs based on input variables. For example, they might look at how the market will move if inflation spikes.
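To make the regression idea concrete, here is a minimal sketch of a one-variable least-squares fit in plain Python. It is an illustration only: the inflation and return figures are made up rather than taken from the report, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical history: inflation surprise (%) vs. next-month equity return (%).
inflation_surprise = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
equity_return = [1.4, 1.2, 1.0, 0.8, 0.6, 0.4]

intercept, slope = fit_line(inflation_surprise, equity_return)

# Use the fitted relationship to forecast the return after a 0.5% inflation spike.
predicted = intercept + slope * 0.5
```

A real model would use many input variables and be checked on data it has not seen, but the principle is the same: learn the relationship from history, then apply it to a new input.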

Classification methods work backwards and try to identify which category a set of observations belongs to.
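One of the simplest ways to picture classification is a nearest-centroid classifier, sketched below in plain Python. This is an assumption chosen for brevity, not a method the report singles out. The features (momentum and volatility scores), the "buy"/"sell" labels and the function names are all hypothetical.

```python
def train_centroids(samples, labels):
    """Average the feature vectors of each class to get one centroid per label."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))


# Hypothetical labelled history: (momentum score, volatility score) per stock.
samples = [(0.8, 0.1), (0.9, 0.2), (0.7, 0.15),
           (-0.6, 0.5), (-0.8, 0.6), (-0.7, 0.4)]
labels = ["buy", "buy", "buy", "sell", "sell", "sell"]

centroids = train_centroids(samples, labels)
new_stock_label = classify(centroids, (0.5, 0.2))
```

The output is a category rather than a number, which is what separates classification from regression.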

7. Unsupervised learning will be used to identify relationships between a large number of variables

In unsupervised learning, a machine is given an entire set of returns from assets and doesn’t know which are the dependent and the independent variables. At a high level, unsupervised learning methods are categorized as clustering or factor analyses.

Clustering involves splitting a dataset into smaller groups based on some notion of similarity. For example, it can involve identifying historical regimes with high and low volatility, rising and falling rates, or rising and falling inflation.
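As a rough sketch of the volatility-regime example, the code below runs a tiny one-dimensional k-means in plain Python. K-means is one common clustering method and is used here as an assumption; the volatility readings are invented and `kmeans_1d` is a hypothetical helper.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Cluster scalar values into k groups (k must be at least 2)."""
    # Deterministic start: spread the initial centres across the sorted data.
    ordered = sorted(values)
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            groups[nearest].append(v)
        # Move each centre to the mean of its group; keep it if the group is empty.
        centres = [sum(g) / len(g) if g else centres[j]
                   for j, g in enumerate(groups)]
    assignments = [min(range(k), key=lambda j: abs(v - centres[j]))
                   for v in values]
    return centres, assignments


# Hypothetical monthly volatility readings (index points).
volatility = [9, 10, 11, 12, 28, 30, 35, 10, 32]

centres, assignments = kmeans_1d(volatility, k=2)
# Cluster 0 is the low-volatility regime, cluster 1 the high-volatility regime.
```

Nobody tells the algorithm which months were "calm" or "stressed"; it infers the two regimes from the data alone.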

Factor analyses aim to identify the main drivers of the data or to identify the best representation of the data. For example, yield curve movements can be described by the parallel shift of yields, steepening of the curve, and convexity of the curve. In a multi-asset portfolio, factor analysis will identify the main drivers such as momentum, value, carry, volatility, or liquidity.
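The yield curve example can be sketched with principal component analysis, used here as a common stand-in for factor analysis. The code finds the first principal component of some yield changes by power iteration, in plain Python. The yield changes are invented for illustration and the function names are hypothetical.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(m)]
            for i in range(m)]


def first_principal_component(cov, iterations=200):
    """Dominant eigenvector of `cov`, found by power iteration."""
    size = len(cov)
    vec = [1.0] * size
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(size)) for i in range(size)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    return vec


# Hypothetical daily yield changes in basis points: (2-year, 5-year, 10-year).
yield_changes = [
    (5, 6, 5), (-4, -5, -4), (3, 3, 4), (-6, -5, -6),
    (2, 1, 2), (-1, -2, -1), (4, 5, 3), (-3, -3, -3),
]

loadings = first_principal_component(covariance_matrix(yield_changes))
```

With data like this, where all three maturities tend to move together, the loadings come out with the same sign and similar size. That is the "parallel shift" factor; steepening and convexity would show up in the second and third components.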

8. Deep learning systems will undertake tasks that are hard for people to define but easy to perform

Deep learning is effectively an attempt to artificially recreate human intelligence. J.P. Morgan says deep learning is particularly well suited to the pre-processing of unstructured big data sets (for instance, it can be used to count cars in satellite images, or to identify sentiment in a press release). A deep learning model could use a hypothetical financial data series to estimate the probability of a market correction.

Deep learning methods are based on neural networks, which are loosely inspired by the workings of the human brain. In a network, each neuron receives inputs from other neurons, and ‘computes’ a weighted average of these inputs. The relative weighting of different inputs is guided by past experience.
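A single artificial neuron can be sketched in a few lines of Python. This version takes a weighted sum of its inputs plus a bias and squashes the result with a sigmoid, which is one common choice of activation and an assumption here. The input values and weights are made up; in a real network the weights would be learned from training data.

```python
import math


def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs, passed through a sigmoid activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))


# Hypothetical signals feeding the neuron, and hypothetical learned weights.
signals = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, 0.1]

activation = neuron(signals, weights)  # a value between 0 and 1
```

A deep network stacks many layers of such neurons, with the outputs of one layer becoming the inputs of the next.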

9. Reinforcement learning will be used to choose a successive course of actions to maximize the final reward

The goal of reinforcement learning is to choose a course of successive actions in order to maximize the final (or cumulative) reward. Unlike supervised learning (which is typically a one step process), the reinforcement learning model doesn’t know the correct action at each step.
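For a feel of how this works, below is a minimal sketch of tabular Q-learning, one standard reinforcement learning algorithm, on a toy problem. It is not J.P. Morgan's model: the five-state "corridor", the reward of 1 at the far end and all parameter values are assumptions chosen purely for illustration.

```python
import random

N_STATES = 5        # states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)    # 0 = step left, 1 = step right


def step(state, action):
    """Move one cell; the only reward is 1.0 for reaching the last state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _step in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])          # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Q-learning update: move towards reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


q_table = train()
# Greedy policy for each non-terminal state: 0 = left, 1 = right.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

The agent is never told that "right" is the correct move in any state. It only sees a reward at the very end, and the learned policy should nevertheless come to favour stepping right everywhere.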

J.P. Morgan’s electronic trading group has already developed algorithms using reinforcement learning. The diagram below shows the bank’s machine learning model (we suspect it’s blurry on purpose).

JPMorgan algorithmic trading architecture

10. You won’t need to be a machine learning expert, you will need to be an excellent quant and an excellent programmer

J.P. Morgan says the skillset for the role of data scientist is virtually the same as for any other quantitative researcher. Existing buy side and sell side quants with backgrounds in computer science, statistics, maths, financial engineering, econometrics and natural sciences should therefore be able to reinvent themselves. Expertise in quantitative trading strategies will be the crucial skill. “It is much easier for a quant researcher to change the format/size of a dataset, and employ better statistical and Machine Learning tools, than for an IT expert, silicon valley entrepreneur, or academic to learn how to design a viable trading strategy,” say Kolanovic and Krishnamachari.

By comparison, J.P. Morgan notes that you won’t need to know about machine learning in any great detail. Most of the machine learning methods are already coded (e.g. in R): you just need to apply the existing models. As a start, they suggest you can look at small datasets using GUI-based software like Weka. Python also has extensive libraries like Keras (keras.io). And there are open source machine learning libraries like TensorFlow and Theano.

11. These are the coding languages and data analysis packages you’ll need to know

If you’re only planning to learn one coding language related to machine learning, J.P. Morgan suggests you choose R, along with the related packages below. However, C++, Python and Java also have machine learning applications as shown below.

12. And these are some examples of popular machine learning codes using Python

13. Support functions are going to need to understand big data too

Lastly, J.P. Morgan notes that support functions need to know about big data too. The report says that too many recruiters and hiring managers are incapable of distinguishing between an ability to talk broadly about artificial intelligence and an ability to actually design a tradeable strategy. At the same time, compliance teams will need to be able to vet machine learning models and to ensure that data is properly anonymized and doesn’t contain private information. The age of machine learning in finance is upon us.

Sunday, December 17, 2017