Intro to Support Vector Machines with a Trading Example


Like any other machine learning algorithm, a support vector machine (SVM) takes data as input, attempts to find and recognize patterns, and then tells us what it learned. Support vector machines fall into the category of supervised learning, which means the algorithm learns a function that maps a given input to an output. More specifically, an SVM is a classification algorithm.

Before we can start implementing trading algorithms and seeking alpha, let’s figure out how an SVM works.

Maximal Margin Classifier

The support vector machine algorithm grew out of the maximal margin classifier. The maximal margin classifier uses the distance from a given decision boundary to classify an input: the greater the distance, or margin, the better the classifier handles the data. On a Cartesian plane, the boundary can be thought of as a line. In three-dimensional space it is a plane, but beyond that it becomes hard to visualize. The boundary is better thought of as a hyperplane, specifically one of dimension p-1, where p is the dimension of the data points.
Our boundary, or hyperplane, is known as a separating hyperplane, because it is used to separate the data points into desired categories. In general, there are many hyperplanes that can separate a given data set, but the one we care about is the maximal margin hyperplane or the optimal separating hyperplane. This separating hyperplane is the one with the largest minimum distance from each data point in the training set. By using this hyperplane to classify a data point from the test set, we have the maximal margin classifier.

[Figure: a separating hyperplane dividing the two classes of points]

The line in the graph above represents the hyperplane. Notice that it completely separates all of the points in the blue and purple regions of the graph.

Now, the maximal margin classifier works, to a degree. If you have a data set that cannot be separated by a hyperplane, you can no longer use it. You may also run into data that has more than two categories, which makes a single linear boundary useless.

[Figure: a data set that cannot be separated by a single linear boundary]

At this point, you have to consider your options:

  1. You can base your classifier on the separating hyperplane as explained earlier. But the hyperplane doesn’t exist…so you have no classifier.
  2. Consider a classifier that isn’t perfect, but that works most of the time

Support Vector Classifiers

I like the second option too. By using a classifier that isn’t perfect, you can at least handle most observations correctly and give the model some flexibility when it is presented with new data.

This evolution of the maximal margin classifier is known as the support vector classifier (SVC), or the soft margin classifier. Instead of being exact but not very robust in its classification, the SVC allows some observations to be on the wrong side of the margin and/or hyperplane (which is where the "soft" comes from), for the sake of getting the classification mostly correct.

Without getting into too much math, the algorithm determines which side of the hyperplane an observation will lie on by solving an optimization problem that involves a tuning parameter, the width of the margin (which it tries to maximize), and slack variables.

The tuning parameter controls the bias-variance tradeoff. When it is small, few margin violations are tolerated and the margin is narrow, so the classifier fits the training data closely: low bias, high variance. A larger tuning parameter is the opposite: it allows more observations to be on the wrong side of the margin, giving higher bias and lower variance.

Slack variables in particular are pretty cool. They allow individual data points to be on the wrong side of the margin or hyperplane. They are also used to transform the inequality constraints into equalities. The value a slack variable takes on tells us about the behavior of a given data point: if the slack variable for a data point equals 0, the point is on the right side of the margin; if it is greater than 0 but less than 1, the point is on the wrong side of the margin but the right side of the hyperplane; if it is greater than 1, the point is on the wrong side of the hyperplane.
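
For reference, here is a sketch of that optimization problem in the notation of An Introduction to Statistical Learning [4], where M is the width of the margin, C is the tuning parameter, and the ε_i are the slack variables:

```latex
\begin{aligned}
\max_{\beta_0,\beta_1,\dots,\beta_p,\;\epsilon_1,\dots,\epsilon_n,\;M} \quad & M \\
\text{subject to} \quad & \sum_{j=1}^{p} \beta_j^2 = 1, \\
& y_i \left( \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} \right) \ge M(1 - \epsilon_i) \quad \text{for all } i, \\
& \epsilon_i \ge 0, \qquad \sum_{i=1}^{n} \epsilon_i \le C .
\end{aligned}
```

Each ε_i measures how badly observation i violates the margin, and C acts as a budget on the total amount of violation the classifier will tolerate.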

The main reason this optimization matters is its effect on the hyperplane. The only observations that affect the hyperplane, and in turn how data points are classified, are those that lie on the margin or on the wrong side of it. An observation that lies strictly on the correct side of the margin has no effect on it. The classifier gets its name from the former data points, as they are known as support vectors.

Finally, Support Vector Machines

The support vector machine builds on the optimization behind the support vector classifier by enlarging the feature space using kernels.

Kernels, like the previous optimization, involve a fair bit of math. Put simply, a kernel tells us how similar two data points are: given a way to compare a pair of observations, it assigns them a similarity score. Kernels allow the data to be processed in simpler terms rather than being handled explicitly in a higher-dimensional space. More specifically, the algorithm only needs the inner products (kernel values) between all pairs of data points in the feature space. By using kernels instead of explicitly enlarging the feature space, the algorithm can be much more efficient: it applies one function to pairs of data points instead of constructing many new functions of the original features.


Many different kernels exist, including the linear kernel, the polynomial kernel, the radial basis function (RBF) kernel, and graph kernels. For example, the linear kernel compares a pair of data points using their bivariate correlation, while the polynomial kernel effectively fits a support vector classifier in a higher-dimensional space. A support vector classifier is simply an SVM with a polynomial kernel of degree 1.
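
This isn’t part of the article’s trading code, but as a quick illustration of how kernels are swapped in practice, here is a minimal scikit-learn sketch (the toy data set is made up for the example):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Toy two-class data that no straight line can separate
X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    # degree only matters for the polynomial kernel
    clf = SVC(kernel=kernel, degree=3, C=1.0)
    clf.fit(X, y)
    print(kernel, "training accuracy:", clf.score(X, y))
```

The linear kernel will struggle on this data, while the polynomial and RBF kernels can bend the boundary around the inner circle.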

Basically, the main goal of the Support Vector Machine is to construct a hyperplane, which it then uses to classify data. Despite generally being categorized as a classification algorithm, there is an extension of the Support Vector Machine used for regression, known as Support Vector Regression.

Support Vector Machines for Trading


Before I get into this application, this is by no means advice on how/what you should trade. That’s on you.

We’ll begin by gathering our data.

We’ll use a time period going back about five years, October 28, 2014 to October 28, 2019. The stocks that we will get data for are the components of the Dow Jones Industrial Average.

Yahoo Finance used to be really easy to get data from, but most packages no longer work, so we’ll also create a web scraper in the process.

The first thing we’ll do is import all of the packages we’ll need.
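
The original import cell isn’t reproduced here, but based on the packages used throughout the article it would look roughly like this:

```python
import time

import requests
from bs4 import BeautifulSoup

import numpy as np
import pandas as pd
import talib
import matplotlib.pyplot as plt

from sklearn.preprocessing import MaxAbsScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score
```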

Then we’ll use the requests package to scrape the contents of this page on Yahoo Finance. The page contains the names of the companies that make up the Dow Jones Industrial Average, as well as their tickers. Next, we’ll use BeautifulSoup4 to make the information in Dow_Content searchable.
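
A sketch of what that might look like is below; the components-page URL is an assumption, and the class name is a placeholder you would replace with whatever the page currently uses:

```python
import requests
from bs4 import BeautifulSoup

# Yahoo Finance page listing the Dow Jones Industrial Average components (assumed URL)
components_url = "https://finance.yahoo.com/quote/%5EDJI/components"

# Download the raw HTML, then make Dow_Content searchable with BeautifulSoup
Dow_Content = requests.get(components_url).content
Dow_Soup = BeautifulSoup(Dow_Content, "html.parser")

# Grab the table cells holding the component names and tickers;
# "placeholder-cell-class" stands in for the class name found by inspecting the page
Dow_Table = Dow_Soup.find_all("td", {"class": "placeholder-cell-class"})
```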

The lines above parse the data gathered from the web page and search for the bit of HTML code that corresponds to the table on the page. This can be found by right-clicking on the area of the page, inspecting the element, and, with a little investigation, finding the class name used above.

There will be two types of lines that the search will come across:

  1. Lines containing the ticker
  2. Lines containing the company name with no ticker

We don’t care for the latter, so when the loop finds them, it ignores that bit and moves on. A few string operations to trim the extra fat and we have our ticker. Each ticker is then added to a list for safekeeping.
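
A sketch of that loop, under the same assumptions as above (the exact HTML structure will differ, but the logic matches the description):

```python
Dow_Tickers = []

for cell in Dow_Table:
    link = cell.find("a")
    if link is None:
        # This line holds only the company name, no ticker: skip it
        continue
    # Trim whitespace and stray characters around the ticker symbol
    ticker = link.get_text().strip()
    Dow_Tickers.append(ticker)

print(Dow_Tickers)  # e.g. ['MMM', 'AXP', 'AAPL', ...]
```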

Yahoo Finance uses a Unix timestamp in its URLs, so we make use of the time package to convert our start and end dates to the desired format. It can take either a struct_time (more about that here) or a tuple of 9 time arguments. We don’t really care for anything past the date here.
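
With the time package, the conversion might look like this:

```python
import time

# 9-element time tuples: (year, month, day, hour, minute, second,
# weekday, yearday, isdst); only the date fields matter for our purposes
start = int(time.mktime((2014, 10, 28, 0, 0, 0, 0, 0, 0)))
end = int(time.mktime((2019, 10, 28, 0, 0, 0, 0, 0, 0)))
```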

The ScrapeYahoo function takes four arguments:

  1. data_df, your designated data structure to store the output
  2. ticker, a string representing a given stock
  3. start, a Unix timestamp representing the start date
  4. end, a Unix timestamp representing the current day

It combines these with the base URL for Yahoo Finance and gets the data from the desired web page. Instead of processing it like we did earlier, we parse the JSON data from the page. Yahoo Finance uses cookies now, and simply using the HTML code will throw an error.

The lines that follow parse the content of the JSON data. Something that helped a lot while I was initially exploring the data set was the keys() method for Python dictionaries; it made traversing the JSON data much easier. You can read about it here.

The Stock_Data dictionary will hold our parsed data. The keys in the dictionary will be the tickers of the stocks. For each stock, the ScrapeYahoo function will create a data frame containing open, high, low, close, and volume data.
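
The original function isn’t reproduced here, so the sketch below is a reconstruction from the description. It assumes Yahoo’s JSON chart endpoint, and the exact JSON keys are assumptions based on that endpoint’s usual layout:

```python
import pandas as pd
import requests

def ScrapeYahoo(data_df, ticker, start, end):
    """Fetch daily OHLCV data for `ticker` between two Unix timestamps
    and store it in the `data_df` dictionary, keyed by the ticker."""
    base_url = "https://query1.finance.yahoo.com/v8/finance/chart/"
    params = {"period1": start, "period2": end, "interval": "1d"}
    headers = {"User-Agent": "Mozilla/5.0"}  # a bare request tends to get rejected

    response = requests.get(base_url + ticker, params=params, headers=headers)
    content = response.json()  # parse the JSON payload instead of raw HTML

    # content.keys(), result.keys(), etc. are handy for exploring this structure
    result = content["chart"]["result"][0]
    quote = result["indicators"]["quote"][0]

    data_df[ticker] = pd.DataFrame(
        {
            "Open": quote["open"],
            "High": quote["high"],
            "Low": quote["low"],
            "Close": quote["close"],
            "Volume": quote["volume"],
        },
        index=pd.to_datetime(result["timestamp"], unit="s"),
    )

# One data frame per ticker, keyed by the ticker symbol
Stock_Data = {}
for ticker in Dow_Tickers:
    ScrapeYahoo(Stock_Data, ticker, start, end)
```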

We have historical price data, now what? Recall that the support vector machine is a classification algorithm. We’re going to attempt to create the features for our model with the help of technical analysis.

Technical analysis is a methodology that uses past data to forecast the future direction of price. In general, technical indicators use price and volume data in their calculations. The motivation for the indicators chosen comes from the papers listed in the references section at the end of the article.

One very important thing to pay attention to before moving on: look-ahead bias. We already have all of the closing data, which is what will be used for the calculations. In a real-world scenario, the most you have is the previous day’s close. We have to make sure our calculations don’t use data that technically had not occurred yet. To do this, we will lag the data, that is, shift it back one day.
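
In pandas, lagging is a one-line shift(1); a tiny self-contained example of the idea:

```python
import pandas as pd

prices = pd.DataFrame({"Close": [100.0, 101.5, 99.8, 102.2]})

# Shift the series back one day: row t now holds day t-1's close,
# which is what would actually be known when day t begins
prices["Close_Lagged"] = prices["Close"].shift(1)
print(prices)
```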

Technical Analysis

We will make use of the talib library to perform the technical analysis calculations.


We’ll be using our returns column to calculate our label for each trading day. If the return is 0 or positive, the day is labelled 1; otherwise it is labelled 0. Notice that the returns column uses opening prices as opposed to closing prices, to avoid look-ahead bias.
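
The article doesn’t list the exact indicators (they come from the referenced papers), so the ones below are illustrative stand-ins; the Returns and Signal columns follow the description above, and the indicators are lagged by a day:

```python
import numpy as np
import talib

def add_features(df):
    """Add a few technical indicators plus the Returns and Signal columns.
    The indicator choices are illustrative, not the article's exact set."""
    close = df["Close"].values.astype(float)
    high = df["High"].values.astype(float)
    low = df["Low"].values.astype(float)

    df["RSI"] = talib.RSI(close, timeperiod=14)
    df["SMA"] = talib.SMA(close, timeperiod=10)
    df["WILLR"] = talib.WILLR(high, low, close, timeperiod=14)

    # Lag the indicators one day so day t only uses information known at t-1
    df[["RSI", "SMA", "WILLR"]] = df[["RSI", "SMA", "WILLR"]].shift(1)

    # Open-to-open returns, then the label: 1 if non-negative, 0 otherwise
    df["Returns"] = df["Open"].pct_change()
    df["Signal"] = np.where(df["Returns"] >= 0, 1, 0)
    return df

for ticker in Stock_Data:
    Stock_Data[ticker] = add_features(Stock_Data[ticker])
```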

Training the Model

Before we start setting up our model, the data must be normalized. By doing so, all of the features are scaled and given equal importance when the SVM calculates its distances.

We used the MaxAbsScaler, which scales each feature by its maximum absolute value.

Next, we created a dictionary to store the training and testing data. If NaN values aren’t dropped, the model will not run. The variable X will contain all of the features of the model, which will then be scaled. It is important to drop the Signal and Returns columns: we are predicting the Signal, so if it is kept as a feature the model will be almost perfect, and if we keep the Returns column it will influence the model too much, since the Signal column was calculated directly from the returns. Y is what we want to predict, so we assign it the column containing the signals.

Our model will use 70% of the data to train on and the remaining 30% to test on.
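
A sketch of that preparation step; the column names match those above, while the no-shuffle split is an assumption made to preserve the time ordering of the data:

```python
from sklearn.preprocessing import MaxAbsScaler
from sklearn.model_selection import train_test_split

Model_Data = {}

for ticker, df in Stock_Data.items():
    df = df.dropna()  # the model will not run with NaN rows

    # Features: everything except the label and the raw returns
    X = df.drop(columns=["Signal", "Returns"])
    y = df["Signal"]

    # Scale each feature by its maximum absolute value
    X_scaled = MaxAbsScaler().fit_transform(X)

    # 70% of the data to train on, 30% to test on, keeping chronological order
    X_train, X_test, y_train, y_test = train_test_split(
        X_scaled, y, test_size=0.3, shuffle=False
    )
    Model_Data[ticker] = (X_train, X_test, y_train, y_test)
```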

The model is defined by the model variable (in case you were confused). I left various kernel configurations in the notebook linked below that you can play around with. The model is fit to the training data and used to predict values in the Signal column.

Lastly, we add the accuracy, precision, and recall to our model dictionary for each stock.
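
Putting those pieces together for each ticker might look like the sketch below; the RBF kernel is just one of the configurations, and swapping in the others only changes the SVC arguments:

```python
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score

Model_Results = {}

for ticker, (X_train, X_test, y_train, y_test) in Model_Data.items():
    model = SVC(kernel="rbf", C=1.0, gamma="scale")
    model.fit(X_train, y_train)

    predictions = model.predict(X_test)
    Model_Results[ticker] = {
        "accuracy": accuracy_score(y_test, predictions),
        "precision": precision_score(y_test, predictions),
        "recall": recall_score(y_test, predictions),
        "predictions": predictions,
    }
```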

Almost there! The next bit of code calculates returns using the signals from the SVM model. By using the iloc method for Pandas data frames, it’s much easier to append the signals to the end than the way we did it earlier.

We’ll calculate returns relative to how the market performed and use it as our benchmark. Portfolio performance is gauged by use of the Sharpe Ratio.
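
A rough sketch of that evaluation: the predicted signals are appended to the tail of each data frame with iloc, strategy returns are earned only on days the model predicted an up move, buy-and-hold returns serve as the benchmark, and the Sharpe ratio is annualized over roughly 252 trading days. The helper name and details below are assumptions:

```python
import numpy as np

def evaluate(df, predictions):
    """Append the test-set predictions with iloc, then compare the
    strategy's returns against simply holding the stock."""
    out = df.dropna().copy()
    out["SVM_Signal"] = 0
    out.iloc[-len(predictions):, out.columns.get_loc("SVM_Signal")] = predictions

    # Earn the day's return only when the model signalled an up move
    out["Strategy_Returns"] = out["Returns"] * out["SVM_Signal"]

    # Annualized Sharpe ratio over the test period only
    test = out.iloc[-len(predictions):]
    sharpe = np.sqrt(252) * test["Strategy_Returns"].mean() / test["Strategy_Returns"].std()

    strategy = out["Strategy_Returns"].cumsum()
    benchmark = out["Returns"].cumsum()
    return strategy, benchmark, sharpe
```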

Finally, graphing the outcomes of the predictions.
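
Reusing the hypothetical evaluate helper and the Model_Results dictionary from the sketches above, the plotting step could be as simple as this (MMM is just an example ticker):

```python
import matplotlib.pyplot as plt

strategy, benchmark, sharpe = evaluate(
    Stock_Data["MMM"], Model_Results["MMM"]["predictions"]
)

plt.figure(figsize=(10, 5))
plt.plot(strategy, label="SVM strategy")
plt.plot(benchmark, label="Buy and hold")
plt.title("MMM cumulative returns (Sharpe: {:.2f})".format(sharpe))
plt.legend()
plt.show()
```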

A few examples of the output are as follows:

[Figures: SVM strategy returns versus the benchmark for two of the Dow stocks]

Conclusion

The model isn’t perfect by any means, but it does work for some equities in the Dow Jones Industrial Average. A few ways that this could be improved:

  1. Use the technical indicators to create signals instead of only returns
  2. Adapt the model for long/short scenarios
  3. Use different technical indicators
  4. Create a portfolio including position sizing, transaction costs, slippage etc.

Financial markets are a wonderfully complex place and our model is fairly simple. This example simply classified whether previous returns were indicative of future price direction.

There’s lots of room for improvement, so take a shot at it. A link to the notebook will be in the references, so feel free to play with the code.

I hope this was helpful in your understanding of support vector machines!

References

[1] R. Rosillo, J. Giner, D. De la Fuente and R. Pino, Trading System Based on Support Vector Machines in the S&P 500 Index (2012), Proceedings of the 2012 International Conference on Artificial Intelligence, ICAI 2012

[2] B. Henrique, V. Sobreiro and H. Kimura, Stock price prediction using support vector regression on daily and up to the minute prices (2018), The Journal of Finance and Data Science

[3] X. Di, Stock Trend Prediction with Technical Indicators using SVM (2014)

[4] G. James, D. Witten, T. Hastie and R. Tibshirani, An Introduction to Statistical Learning with Applications in R (2017)

Python Code

EDIT: Fixed an issue with using “shift” throughout various parts of the code. Thanks to those that called it out, I appreciate the feedback.
