Ordinary Least Squares

Package Installation

In [1]:
pip install pandas
Requirement already satisfied: pandas in c:\users\annef\anaconda3\lib\site-packages (1.0.5)
Requirement already satisfied: pytz>=2017.2 in c:\users\annef\anaconda3\lib\site-packages (from pandas) (2020.1)
Requirement already satisfied: python-dateutil>=2.6.1 in c:\users\annef\anaconda3\lib\site-packages (from pandas) (2.8.1)
Requirement already satisfied: numpy>=1.13.3 in c:\users\annef\anaconda3\lib\site-packages (from pandas) (1.18.5)
Requirement already satisfied: six>=1.5 in c:\users\annef\anaconda3\lib\site-packages (from python-dateutil>=2.6.1->pandas) (1.15.0)
Note: you may need to restart the kernel to use updated packages.
In [7]:
pip install pandas-datareader
Requirement already satisfied: pandas-datareader in c:\users\annef\anaconda3\lib\site-packages (0.9.0)
Requirement already satisfied: requests>=2.19.0 in c:\users\annef\anaconda3\lib\site-packages (from pandas-datareader) (2.24.0)
Requirement already satisfied: lxml in c:\users\annef\anaconda3\lib\site-packages (from pandas-datareader) (4.5.2)
Requirement already satisfied: pandas>=0.23 in c:\users\annef\anaconda3\lib\site-packages (from pandas-datareader) (1.0.5)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\users\annef\anaconda3\lib\site-packages (from requests>=2.19.0->pandas-datareader) (1.25.9)
Requirement already satisfied: chardet<4,>=3.0.2 in c:\users\annef\anaconda3\lib\site-packages (from requests>=2.19.0->pandas-datareader) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in c:\users\annef\anaconda3\lib\site-packages (from requests>=2.19.0->pandas-datareader) (2.10)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\annef\anaconda3\lib\site-packages (from requests>=2.19.0->pandas-datareader) (2020.6.20)
Requirement already satisfied: pytz>=2017.2 in c:\users\annef\anaconda3\lib\site-packages (from pandas>=0.23->pandas-datareader) (2020.1)
Requirement already satisfied: python-dateutil>=2.6.1 in c:\users\annef\anaconda3\lib\site-packages (from pandas>=0.23->pandas-datareader) (2.8.1)
Requirement already satisfied: numpy>=1.13.3 in c:\users\annef\anaconda3\lib\site-packages (from pandas>=0.23->pandas-datareader) (1.18.5)
Requirement already satisfied: six>=1.5 in c:\users\annef\anaconda3\lib\site-packages (from python-dateutil>=2.6.1->pandas>=0.23->pandas-datareader) (1.15.0)
Note: you may need to restart the kernel to use updated packages.

Data Acquisition

First, we need to import the packages and modules.

In [1]:
import pandas as pd
from pandas_datareader import data
import matplotlib.pyplot as plt
from random import randint
import numpy as np
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

Now we define a function that downloads the price data for the company we want to look at (here NFLX). It takes the start and end dates as parameters.

In [2]:
def get_data(start_date, end_date):
    """Download daily NFLX price data from Yahoo Finance for the given date range."""
    symbol = 'NFLX'
    df = data.get_data_yahoo(symbol, start_date, end_date)
    return df

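Now we define a function that fits the ordinary least squares (OLS) line y = ax + b to the adjusted closing prices. Depending on the argument k, it returns either the fitted values of the line (k=1, the default) or only the two parameters a (slope) and b (intercept) (k=0).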
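
For reference, these are the standard closed-form least-squares estimates for a straight line. The function below accumulates the numerator and denominator of the slope as a_1 and a_2. The index convention (i running from 0 to n-1 over the trading days) is an assumption chosen to match the code:

```latex
a = \frac{\sum_{i=0}^{n-1} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=0}^{n-1} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - a\,\bar{x},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=0}^{n-1} x_i,
\quad
\bar{y} = \frac{1}{n}\sum_{i=0}^{n-1} y_i
```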

In [5]:
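def OLS(df, k=1):
    # x: trading-day positions 0, 1, 2, ...; y: adjusted closing prices
    x_real = np.arange(len(df.index))
    y_real = np.array(df['Adj Close'])
    # Sample means of x and y
    x_bar = 1/len(x_real) * sum(x_real)
    y_bar = 1/len(y_real) * sum(y_real)
    # a_1: sum of cross-deviations, a_2: sum of squared deviations of x
    a_1 = 0
    a_2 = 0
    for i in range(len(x_real)):
        a_1 += (x_real[i]-x_bar) * (y_real[i]-y_bar)
        a_2 += (x_real[i]-x_bar)**2

    a = a_1/a_2          # slope
    b = y_bar - a*x_bar  # intercept
    print("Ordinary least square equation: y = {0:.2f}x+{1:.2f}".format(a, b))
    # Fitted value on the line for every x
    lx = x_real.tolist()
    eq = [a*i+b for i in lx]

    # k == 0: return the parameters; k == 1: return the fitted values
    if k == 0:
        return a, b
    elif k == 1:
        return eq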
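
As a sanity check, the closed-form slope and intercept can be compared against NumPy's own least-squares fit. The following is a minimal sketch, not part of the original notebook: the helper `ols_line` and the price values are hypothetical and only illustrate the comparison.

```python
import numpy as np

def ols_line(x, y):
    """Slope and intercept of the least-squares line (closed form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    a = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b = y_bar - a * x_bar
    return a, b

# Made-up prices, purely illustrative (not NFLX data)
y = np.array([330.0, 332.5, 329.0, 335.0, 338.5, 337.0])
x = np.arange(len(y))

a, b = ols_line(x, y)
# np.polyfit with deg=1 returns the coefficients as [slope, intercept]
a_ref, b_ref = np.polyfit(x, y, deg=1)
print(np.allclose([a, b], [a_ref, b_ref]))
```

If the two results agree, the hand-written sums are most likely implemented correctly.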

We now fetch the data and fit the corresponding linear equation.

In [6]:
df = get_data('2020-01-01','2020-10-01')
eq = OLS(df)
Ordinary least square equation: y = 1.05x+327.43

Finally, we can plot the data together with the fitted OLS line. Since the stock market is closed on weekends and on public holidays, the dates have gaps. That is why we first define x as a plain array of consecutive integers and attach the dates as tick labels afterwards.

In [7]:
x = np.arange(len(df.index))
fig, ax = plt.subplots(figsize=(12, 6))
# x holds integer positions rather than dates, so plain plot() with circle markers is used
ax.plot(x, df['Adj Close'], 'o')
fig.suptitle('NFLX')
ax.set_xlabel('Date')
ax.set_ylabel("Adj Close")

# Overlay the fitted OLS line
ax.plot(x, eq)

# Label the x axis with the dates, one tick every 14 trading days
xt = np.arange(0, len(df.index), step=14)
xl = df.index[xt].date

ax.set_xticks(xt, minor=False)
ax.set_xticklabels(xl, minor=False, rotation=45)

plt.show()

Predictions

If we now want to predict the stock price on 1 December 2020, we can use our OLS equation to calculate the predicted price. However, we first need to convert the date into an integer. Since the stock market is closed on Saturdays and Sundays, as well as on public holidays, we need to find out which integer corresponds to 1 December 2020.

In [8]:
df = get_data('2020-01-01','2020-12-01')
df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 232 entries, 2020-01-02 to 2020-12-01
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   High       232 non-null    float64
 1   Low        232 non-null    float64
 2   Open       232 non-null    float64
 3   Close      232 non-null    float64
 4   Volume     232 non-null    int64  
 5   Adj Close  232 non-null    float64
dtypes: float64(5), int64(1)
memory usage: 12.7 KB
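
Instead of counting entries by hand, the position of a date can also be looked up in the index. The sketch below uses a small, made-up business-day index in place of the downloaded frame. With the real data, the same call on `df.index` should work, assuming the date is a trading day that is present in the index.

```python
import pandas as pd

# Small illustrative index of business days (weekends skipped);
# real market data would also skip public holidays.
idx = pd.bdate_range("2020-01-02", "2020-01-10")

# 0-based position of a given trading day in the index
pos = idx.get_loc(pd.Timestamp("2020-01-06"))
print(len(idx), pos)

# Positions are 0-based, so the last of n entries has position n - 1
last_pos = idx.get_loc(idx[-1])
print(last_pos == len(idx) - 1)
```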

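We see that there are 232 entries from the start of the year through 1 December 2020. Because the integer positions start at 0, 1 December strictly corresponds to x = 231. The cell below uses x = 232, which shifts the result by one slope step (about 1.05). Refitting on the data up to 30 September and evaluating the OLS equation gives the following predicted stock price: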

In [8]:
df = get_data('2020-01-01','2020-09-30')
a, b = OLS(df, 0)
y = a*232+b
y
Ordinary least square equation: y = 1.05x+327.44
Out[8]:
571.8261369870274

However, the actual stock price on 1 December 2020 was 503.73 (see Yahoo Finance), so the prediction is roughly 13.5% too high. The error arises because the stock price is not a linear function of time, so a straight line fitted to the first nine months extrapolates poorly. Still, the OLS equation gives a rough indication of the direction of the stock price in the near future.
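
The size of the error can be checked with a short calculation. This sketch takes the predicted value from the cell above (rounded) and the actual price quoted in the text, and measures the error relative to the actual price. That choice of reference is an assumption, and measuring against the prediction instead would give a slightly smaller figure.

```python
predicted = 571.83  # OLS prediction from the cell above (rounded)
actual = 503.73     # price quoted in the text for 1 December 2020

# Signed relative error with respect to the actual price
rel_error = (predicted - actual) / actual
print(f"{rel_error:+.1%}")  # prints "+13.5%"
```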

Graph of the stock prices from January to the end of November 2020, combined with the best-fitting linear regression line:

In [59]:
df = get_data('2020-01-01','2020-11-28')
eq = OLS(df, 1)

x = np.arange(len(df.index))
fig, ax = plt.subplots(figsize=(12, 6))
# x holds integer positions rather than dates, so plain plot() with circle markers is used
ax.plot(x, df['Adj Close'], 'o')
fig.suptitle('NFLX')
ax.set_xlabel('Date')
ax.set_ylabel("Adj Close")

# Overlay the fitted OLS line
ax.plot(x, eq)

# Label the x axis with the dates, one tick every 14 trading days
xt = np.arange(0, len(df.index), step=14)
xl = df.index[xt].date

ax.set_xticks(xt, minor=False)
ax.set_xticklabels(xl, minor=False, rotation=45)

plt.show()
Ordinary least square equation: y = 0.87x+340.81