Random Forest Regression

Install packages

pip install sklearn

Requirement already satisfied: sklearn in /srv/conda/envs/notebook/lib/python3.6/site-packages (0.0)
Requirement already satisfied: scikit-learn in /srv/conda/envs/notebook/lib/python3.6/site-packages (from sklearn) (0.23.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from scikit-learn->sklearn) (2.1.0)
Requirement already satisfied: scipy>=0.19.1 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from scikit-learn->sklearn) (1.5.3)
Requirement already satisfied: numpy>=1.13.3 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from scikit-learn->sklearn) (1.19.4)
Requirement already satisfied: joblib>=0.11 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from scikit-learn->sklearn) (0.17.0)
Note: you may need to restart the kernel to use updated packages.

pip install pandas

Requirement already satisfied: pandas in /srv/conda/envs/notebook/lib/python3.6/site-packages (1.1.5)
Requirement already satisfied: python-dateutil>=2.7.3 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from pandas) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from pandas) (2020.4)
Requirement already satisfied: numpy>=1.15.4 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from pandas) (1.19.4)
Requirement already satisfied: six>=1.5 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)
Note: you may need to restart the kernel to use updated packages.

pip install numpy

Requirement already satisfied: numpy in /srv/conda/envs/notebook/lib/python3.6/site-packages (1.19.4)
Note: you may need to restart the kernel to use updated packages.

pip install yfinance

Requirement already satisfied: yfinance in /srv/conda/envs/notebook/lib/python3.6/site-packages (0.1.55)
Requirement already satisfied: pandas>=0.24 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from yfinance) (1.1.5)
Requirement already satisfied: numpy>=1.15 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from yfinance) (1.19.4)
Requirement already satisfied: multitasking>=0.0.7 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from yfinance) (0.0.9)
Requirement already satisfied: lxml>=4.5.1 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from yfinance) (4.6.2)
Requirement already satisfied: requests>=2.20 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from yfinance) (2.24.0)
Requirement already satisfied: python-dateutil>=2.7.3 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from pandas>=0.24->yfinance) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from pandas>=0.24->yfinance) (2020.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from requests>=2.20->yfinance) (1.25.10)
Requirement already satisfied: chardet<4,>=3.0.2 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from requests>=2.20->yfinance) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from requests>=2.20->yfinance) (2020.6.20)
Requirement already satisfied: idna<3,>=2.5 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from requests>=2.20->yfinance) (2.10)
Requirement already satisfied: six>=1.5 in /srv/conda/envs/notebook/lib/python3.6/site-packages (from python-dateutil>=2.7.3->pandas>=0.24->yfinance) (1.15.0)
Note: you may need to restart the kernel to use updated packages.

Import packages

from sklearn.ensemble import RandomForestRegressor
import pandas as pd
import numpy as np
import yfinance as yf
from datetime import date, timedelta

Data Acquisition for the Random Forest Regressor

# We set the stock we want to work with (Netflix, ticker NFLX)
data = yf.Ticker('NFLX')

# date.today() returns today's date; it is used later to fetch the most recent prices
today = date.today()

# We download the daily price history of the selected ticker between a start and an end date
# (the end date is exclusive; when start is given, period="max" has no effect, so it is omitted)
df = data.history(start="2015-01-01", end="2020-12-01")

# We view the first five rows of the collected data
df.head()

Create Testing and Training Data

# We want to predict the closing price n trading days after each day
n = 1
# With n = 1, the data from day m is used to predict the close of the next trading day, m+1

# We work with the Close column, so we turn it into a list
close = df["Close"].tolist()

# The target for row m is the close n rows later, so we drop the first n closes
close_n_days = close[n:]

# We drop the last n rows of the DataFrame, which have no future close to predict.
# Side by side, row m now holds day m's prices and the close of day m+n.
# (.copy() avoids pandas' SettingWithCopyWarning when we add a column below)
df = df.iloc[:len(df) - n].copy()

df["Close in n days"] = close_n_days
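The list-slicing approach above can also be written with pandas' `Series.shift`, which keeps the target aligned with the date index. A minimal sketch on made-up prices (the values and dates are hypothetical, not NFLX data):

```python
import pandas as pd

n = 1

# Hypothetical toy closing prices standing in for df["Close"]
df = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.5, 12.0, 13.0]},
    index=pd.date_range("2020-01-01", periods=5, freq="D"),
)

# shift(-n) moves every value up by n rows, so row m holds the close of row m+n
df["Close in n days"] = df["Close"].shift(-n)

# The last n rows have no future close (their target is NaN), so we drop them
df = df.iloc[:-n].copy()

print(df)
```

Both versions give each row the close from n rows later; `shift` simply avoids building an intermediate Python list.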

# We take the first p percent of the rows (in date order) as training data
p = 90
df_percentage = int(len(close_n_days) * p / 100)  # number of training rows
training = [i < df_percentage for i in range(len(close_n_days))]

df['Training Set'] = training

# We show the rows around the boundary between training and testing data
df[(df_percentage-2):(df_percentage+3)]

Creating dataframes with Test Rows and Training Rows

# We split the dataframe into two separate dataframes, one for training and one for testing
train, test = df[df['Training Set']], df[~df['Training Set']]

Displaying the Number of Rows for the Testing and Training Dataframes

print('Number of rows in the training data: ', len(train))
print('Number of rows in the testing data: ', len(test))

Number of rows in the training data:  1339
Number of rows in the testing data:  149
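Because this is a time series, the split has to keep date order instead of sampling rows at random; otherwise the model would train on days that come after the days it is tested on. scikit-learn's `train_test_split` with `shuffle=False` makes the same chronological cut. A sketch using a made-up index array with this notebook's row count (1339 + 149 = 1488) reproduces the counts above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the 1488 dataframe rows (1339 training + 149 testing)
rows = np.arange(1488)

# shuffle=False keeps chronological order: the first 90% train, the last 10% test
train_idx, test_idx = train_test_split(rows, test_size=0.1, shuffle=False)

print(len(train_idx), len(test_idx))  # 1339 149
```

With a float `test_size`, scikit-learn rounds the test size up (ceil(0.1 × 1488) = 149), which happens to match the `int(...)` rounding down of the training size used above.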

Create a List of the Feature Columns' Names

# The features are the Open, High, Low and Close prices; they are used to predict the closing price in n days
features = ['Open', 'High', 'Low', 'Close']
X = train[features]
y = train['Close in n days']

Creating the Random Forest Regressor

regr = RandomForestRegressor(n_estimators=50)
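The regressor above is created without a `random_state`, so each run draws different bootstrap samples and the scores and predictions below can change slightly between runs. A sketch on made-up data showing that passing `random_state` (a real `RandomForestRegressor` parameter; the value 42 is an arbitrary choice) makes two fits produce identical predictions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data: 200 rows, 4 features, target driven by the last feature
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 3] + rng.normal(scale=0.1, size=200)

# Fixing random_state makes the random bootstrap samples (and other random choices) repeatable
regr_a = RandomForestRegressor(n_estimators=50, random_state=42).fit(X, y)
regr_b = RandomForestRegressor(n_estimators=50, random_state=42).fit(X, y)

same = np.allclose(regr_a.predict(X), regr_b.predict(X))
print(same)  # True
```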

Training the Regressor

regr.fit(X, y)

RandomForestRegressor(n_estimators=50)

Calculating the Coefficient of Determination R^2 on the Training Data

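The coefficient of determination (R squared) measures how much of the variance in the target the predictions explain: 1 means perfect predictions, and 0 means the model does no better than always predicting the mean. Note that the score below is computed on the same training data the model was fit on, so it overstates how well the model predicts unseen days; scoring the test rows with regr.score(test[features], test['Close in n days']) gives a fairer measure.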
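For reference, R squared is defined as R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean; `regr.score` returns this quantity. A small sketch on hypothetical values that computes it by hand and checks it against `sklearn.metrics.r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical true and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares: 0.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean: 20.0
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))  # 0.975 0.975
```

Because SS_res can exceed SS_tot, R squared can also be negative for a model that does worse than predicting the mean.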

regr.score(X, y)

0.9994430722470299

Applying the Trained Regressor to the Testing Data

# We apply the model to our testing dataframe
preds = regr.predict(test[features])

print('First five test values: ', preds[0:5])

First five test values:  [408.10320129 412.15060059 412.70239868 423.90580017 426.19799805]

# We can compare the predicted values above to the real values shown below
test['Close in n days'].head()
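The side-by-side comparison can be made numeric with the mean absolute error (MAE). This sketch uses only the five prediction/actual pairs printed in this section, so it is an illustration rather than a full evaluation:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Values copied from the outputs in this section: first five predictions and actual closes
preds = np.array([408.10320129, 412.15060059, 412.70239868, 423.90580017, 426.19799805])
actual = np.array([419.850006, 415.269989, 428.149994, 424.679993, 434.260010])

mae = mean_absolute_error(actual, preds)
print(round(mae, 2))           # 7.83
print(np.all(preds < actual))  # True: every prediction is below the actual close
```

On the whole test set, the same metric would be mean_absolute_error(test['Close in n days'], regr.predict(test[features])).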

Date
2020-04-29    419.850006
2020-04-30    415.269989
2020-05-01    428.149994
2020-05-04    424.679993
2020-05-05    434.260010
Name: Close in n days, dtype: float64

Predicting for One Specific Example

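# We fetch the last week of prices: the window spans several calendar days so that
# weekends and holidays still leave at least two trading days. The end date is exclusive,
# so the most recent row is the last trading day before today.
start_day = today - timedelta(days=7)
pred_data = data.history(start=start_day, end=today)

# The second-to-last trading day is the input row (and only that row)
pred_data_y = pred_data.iloc[-2:-1]

# We again only take the Open, High, Low and Close features
X = pred_data_y[features]

# We predict the closing price of the last trading day in the window
preds = regr.predict(X)
print('Predicted closing value for the last trading day: ', preds[0])

Predicted closing value for the last trading day:  424.46599548339844

print("Actual closing value for the last trading day: ")

# The last row holds the close we tried to predict
pred_data_t = pred_data.iloc[-1:]

print(pred_data_t["Close"].iloc[0])

Actual closing value for the last trading day: 
534.4500122070312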
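One likely contributor to the large gap between the predicted and actual close is that a random forest regressor cannot extrapolate: each tree predicts the average of training targets in a leaf, so the forest's prediction never exceeds the largest "Close in n days" value in the training rows, which end on 2020-04-28 here. A toy sketch (hypothetical data where the target simply equals the feature) shows the ceiling:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data: the target equals the feature, for values 0..99
X_train = np.arange(100, dtype=float).reshape(-1, 1)
y_train = X_train.ravel()

regr = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# Ask for a value far outside the training range; the "true" answer would be 200
pred = regr.predict(np.array([[200.0]]))[0]
print(pred)  # never above 99, the largest target seen in training
```

A price that has risen above everything in the training window will therefore be underpredicted, consistent with every test prediction shown earlier falling below the actual close.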

Output of df.head() (first five rows of the downloaded data):

Date	Open	High	Low	Close	Volume	Dividends	Stock Splits
2015-01-02	49.151428	50.331429	48.731430	49.848572	13475000	0	0.0
2015-01-05	49.258572	49.258572	47.147144	47.311428	18165000	0	0.0
2015-01-06	47.347141	47.639999	45.661430	46.501427	16037700	0	0.0
2015-01-07	47.347141	47.421429	46.271427	46.742859	9849700	0	0.0
2015-01-08	47.119999	47.835712	46.478573	47.779999	9601900	0	0.0

Output of the boundary slice (the last two training rows and the first three testing rows):

Date	Open	High	Low	Close	Volume	Dividends	Stock Splits	Close in n days	Training Set
2020-04-27	425.000000	429.000000	420.839996	421.380005	6277500	0	0.0	403.829987	True
2020-04-28	419.989990	421.000000	402.910004	403.829987	10101200	0	0.0	411.890015	True
2020-04-29	399.529999	415.859985	393.600006	411.890015	9693100	0	0.0	419.850006	False
2020-04-30	410.309998	424.440002	408.000000	419.850006	7954000	0	0.0	415.269989	False
2020-05-01	415.100006	427.970001	411.730011	415.269989	8299900	0	0.0	428.149994	False