Deep Trading with TensorFlow V
Do you want to know how to build a multi-layered neural network? As deep as you want?
In the next post, we will use real market data. In this one, we will still use non-trading data, because we are aiming for a solid grasp of the basic concepts of TensorFlow. But we will use data from a very real and current problem.
OK, remember to keep in mind our other posts, which make up a systematic and complete framework for dealing with supervised machine learning problems:
https://todotrader.com/deeptrading-with-tensorflow/
https://todotrader.com/deeptrading-with-tensorflow-I/
https://todotrader.com/deeptrading-with-tensorflow-II/
https://todotrader.com/deeptrading-with-tensorflow-III/
https://todotrader.com/deeptrading-with-tensorflow-IV/
Implementing a multiple hidden layer Neural Network
The progress of the model can be saved during and after training. This means that training can be resumed where it left off, avoiding long training times. Saving also means that you can share your model and others can recreate your work.
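The idea behind checkpointing can be sketched with plain NumPy: persist the weight arrays to disk, then reload them to resume. This is only an illustration of the concept; the checkpoint file name and weight shapes below are hypothetical, and TensorFlow provides its own saving machinery.

```python
import os
import tempfile
import numpy as np

# Hypothetical weights of a small two-layer network
rng = np.random.RandomState(0)
weights = {"W1": rng.randn(8, 16), "b1": np.zeros(16),
           "W2": rng.randn(16, 1), "b2": np.zeros(1)}

# "Save" the model: write every array into a single .npz checkpoint file
ckpt = os.path.join(tempfile.gettempdir(), "model_checkpoint.npz")
np.savez(ckpt, **weights)

# "Restore" the model: reload the arrays and resume from where we left off
restored = {k: v for k, v in np.load(ckpt).items()}
```

Restoring reproduces the exact weight values, so training (or prediction) can continue as if it had never stopped.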
In the last post, we presented a simple regression problem that we solved with a neural network with a single hidden layer, predicting one of the variables involved.
This type of prediction problem is called "regression analysis", as opposed to other types of problems such as "classification" (see the Artificial Intelligence Taxonomy in my post, https://todotrader.com/artificial-intelligence-trading-systems/)
Regression analysis can help us to model the relationship between a dependent variable (which we are trying to predict) and one or more independent variables (the input of the model). The regression analysis can show if there is a significant relationship between the independent variables and the dependent variable, and the importance of their interrelation: when the independent variables move, how much can we expect the dependent variable to move?
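This "how much does the dependent variable move" idea can be made concrete with an ordinary least-squares fit on synthetic data; the coefficients 2.0 and -3.0 below are arbitrary choices for the illustration, not anything from the concrete dataset.

```python
import numpy as np

rng = np.random.RandomState(42)

# Two independent variables; the dependent variable is y = 2*x1 - 3*x2 + noise
X = rng.randn(200, 2)
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.1 * rng.randn(200)

# Least-squares fit recovers how much y moves per unit move of each input
X1 = np.column_stack([X, np.ones(len(X))])   # add an intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

print(coef[:2])  # close to [2, -3]
```

The fitted coefficients are exactly the "expected move of the dependent variable per unit move of each independent variable" described above; a neural network generalizes this to non-linear relationships.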
We will illustrate how to create a neural network with multiple fully connected hidden layers, save it, and make predictions with it.
We will use a more complex dataset than the iris data for this exercise: the "Concrete Compressive Strength Data Set" from https://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength
Data Characteristics:
The actual concrete compressive strength (MPa) for a given mixture at a specific age (days) was determined in the laboratory. Data is in raw form (not scaled).
Summary Statistics:
Number of instances (observations): 1030
Number of attributes: 9
Attribute breakdown: 8 quantitative input variables and 1 quantitative output variable
Missing attribute values: None
We will build a three-hidden-layer neural network to predict the ninth attribute, the concrete compressive strength, from the other eight.
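Because the data come in raw form, with attributes on very different scales (cement in hundreds of kg/m³, superplasticizer in single digits), min-max normalization and a train/test split are the usual first steps before training. Here is a minimal sketch on synthetic arrays of the same shape; the 80/20 split ratio is an assumption for the illustration, not necessarily what we will use later.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.uniform(0, 1000, size=(1030, 8))   # synthetic stand-in for X_raw
y = rng.uniform(0, 80, size=1030)          # synthetic stand-in for y_raw

# Shuffle once, then split 80/20 into train and test sets (assumed ratio)
idx = rng.permutation(len(X))
cut = int(0.8 * len(X))
train_idx, test_idx = idx[:cut], idx[cut:]
X_train, X_test = X[train_idx], X[test_idx]

# Min-max scale each column to [0, 1], using *training* statistics only,
# so that no information from the test set leaks into the model
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)
X_train_s = (X_train - col_min) / (col_max - col_min)
X_test_s = (X_test - col_min) / (col_max - col_min)

print(X_train_s.min(), X_train_s.max())  # 0.0 1.0
```

Note that test-set values can fall slightly outside [0, 1], which is expected when scaling with training-set statistics.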
Load configuration
In [1]:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
#from sklearn.datasets import load_iris
from tensorflow.python.framework import ops
import pandas as pd
Ingest raw data
In [2]:
# Dataset "Concrete Compressive Strength Data Set" from: https://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength
df = pd.read_excel(r'../data/raw/Concrete_Data.xls')  # the dataset is distributed as a legacy Excel (.xls) file
#Simplifying column names
df.columns = ["Cement", "Blast Furnace Slag", "Fly Ash", "Water", "Superplasticizer",
"Coarse Aggregate", "Fine Aggregate", "Age", "Strength"]
# We get a pandas dataframe to better visualize the datasets
df
Out[2]:
| Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
---|---|---|---|---|---|---|---|---|---|
0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.986111 |
1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.887366 |
2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.269535 |
3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.052780 |
4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.296075 |
5 | 266.0 | 114.0 | 0.0 | 228.0 | 0.0 | 932.0 | 670.0 | 90 | 47.029847 |
6 | 380.0 | 95.0 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 43.698299 |
7 | 380.0 | 95.0 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 28 | 36.447770 |
8 | 266.0 | 114.0 | 0.0 | 228.0 | 0.0 | 932.0 | 670.0 | 28 | 45.854291 |
9 | 475.0 | 0.0 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 28 | 39.289790 |
10 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 90 | 38.074244 |
11 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 28 | 28.021684 |
12 | 427.5 | 47.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 43.012960 |
13 | 190.0 | 190.0 | 0.0 | 228.0 | 0.0 | 932.0 | 670.0 | 90 | 42.326932 |
14 | 304.0 | 76.0 | 0.0 | 228.0 | 0.0 | 932.0 | 670.0 | 28 | 47.813782 |
15 | 380.0 | 0.0 | 0.0 | 228.0 | 0.0 | 932.0 | 670.0 | 90 | 52.908320 |
16 | 139.6 | 209.4 | 0.0 | 192.0 | 0.0 | 1047.0 | 806.9 | 90 | 39.358048 |
17 | 342.0 | 38.0 | 0.0 | 228.0 | 0.0 | 932.0 | 670.0 | 365 | 56.141962 |
18 | 380.0 | 95.0 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 90 | 40.563252 |
19 | 475.0 | 0.0 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 180 | 42.620648 |
20 | 427.5 | 47.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 180 | 41.836714 |
21 | 139.6 | 209.4 | 0.0 | 192.0 | 0.0 | 1047.0 | 806.9 | 28 | 28.237490 |
22 | 139.6 | 209.4 | 0.0 | 192.0 | 0.0 | 1047.0 | 806.9 | 3 | 8.063422 |
23 | 139.6 | 209.4 | 0.0 | 192.0 | 0.0 | 1047.0 | 806.9 | 180 | 44.207822 |
24 | 380.0 | 0.0 | 0.0 | 228.0 | 0.0 | 932.0 | 670.0 | 365 | 52.516697 |
25 | 380.0 | 0.0 | 0.0 | 228.0 | 0.0 | 932.0 | 670.0 | 270 | 53.300632 |
26 | 380.0 | 95.0 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 41.151375 |
27 | 342.0 | 38.0 | 0.0 | 228.0 | 0.0 | 932.0 | 670.0 | 180 | 52.124386 |
28 | 427.5 | 47.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 28 | 37.427515 |
29 | 475.0 | 0.0 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 7 | 38.603761 |
… | … | … | … | … | … | … | … | … | … |
1000 | 141.9 | 166.6 | 129.7 | 173.5 | 10.9 | 882.6 | 785.3 | 28 | 44.611855 |
1001 | 297.8 | 137.2 | 106.9 | 201.3 | 6.0 | 878.4 | 655.3 | 28 | 53.524711 |
1002 | 321.3 | 164.2 | 0.0 | 190.5 | 4.6 | 870.0 | 774.0 | 28 | 57.218234 |
1003 | 366.0 | 187.0 | 0.0 | 191.3 | 6.6 | 824.3 | 756.9 | 28 | 65.909079 |
1004 | 279.8 | 128.9 | 100.4 | 172.4 | 9.5 | 825.1 | 804.9 | 28 | 52.826962 |
1005 | 252.1 | 97.1 | 75.6 | 193.8 | 8.3 | 835.5 | 821.4 | 28 | 33.399596 |
1006 | 164.6 | 0.0 | 150.4 | 181.6 | 11.7 | 1023.3 | 728.9 | 28 | 18.033934 |
1007 | 155.6 | 243.5 | 0.0 | 180.3 | 10.7 | 1022.0 | 697.7 | 28 | 37.363394 |
1008 | 160.2 | 188.0 | 146.4 | 203.2 | 11.3 | 828.7 | 709.7 | 28 | 35.314271 |
1009 | 298.1 | 0.0 | 107.0 | 186.4 | 6.1 | 879.0 | 815.2 | 28 | 42.644091 |
1010 | 317.9 | 0.0 | 126.5 | 209.7 | 5.7 | 860.5 | 736.6 | 28 | 40.062003 |
1011 | 287.3 | 120.5 | 93.9 | 187.6 | 9.2 | 904.4 | 695.9 | 28 | 43.798273 |
1012 | 325.6 | 166.4 | 0.0 | 174.0 | 8.9 | 881.6 | 790.0 | 28 | 61.235811 |
1013 | 355.9 | 0.0 | 141.6 | 193.3 | 11.0 | 801.4 | 778.4 | 28 | 40.868690 |
1014 | 132.0 | 206.5 | 160.9 | 178.9 | 5.5 | 866.9 | 735.6 | 28 | 33.306517 |
1015 | 322.5 | 148.6 | 0.0 | 185.8 | 8.5 | 951.0 | 709.5 | 28 | 52.426376 |
1016 | 164.2 | 0.0 | 200.1 | 181.2 | 12.6 | 849.3 | 846.0 | 28 | 15.091251 |
1017 | 313.8 | 0.0 | 112.6 | 169.9 | 10.1 | 925.3 | 782.9 | 28 | 38.461040 |
1018 | 321.4 | 0.0 | 127.9 | 182.5 | 11.5 | 870.1 | 779.7 | 28 | 37.265488 |
1019 | 139.7 | 163.9 | 127.7 | 236.7 | 5.8 | 868.6 | 655.6 | 28 | 35.225329 |
1020 | 288.4 | 121.0 | 0.0 | 177.4 | 7.0 | 907.9 | 829.5 | 28 | 42.140084 |
1021 | 298.2 | 0.0 | 107.0 | 209.7 | 11.1 | 879.6 | 744.2 | 28 | 31.875165 |
1022 | 264.5 | 111.0 | 86.5 | 195.5 | 5.9 | 832.6 | 790.4 | 28 | 41.542308 |
1023 | 159.8 | 250.0 | 0.0 | 168.4 | 12.2 | 1049.3 | 688.2 | 28 | 39.455954 |
1024 | 166.0 | 259.7 | 0.0 | 183.2 | 12.7 | 858.8 | 826.8 | 28 | 37.917043 |
1025 | 276.4 | 116.0 | 90.3 | 179.6 | 8.9 | 870.1 | 768.3 | 28 | 44.284354 |
1026 | 322.2 | 0.0 | 115.6 | 196.0 | 10.4 | 817.9 | 813.4 | 28 | 31.178794 |
1027 | 148.5 | 139.4 | 108.6 | 192.7 | 6.1 | 892.4 | 780.0 | 28 | 23.696601 |
1028 | 159.1 | 186.7 | 0.0 | 175.6 | 11.3 | 989.6 | 788.9 | 28 | 32.768036 |
1029 | 260.9 | 100.5 | 78.3 | 200.6 | 8.6 | 864.5 | 761.5 | 28 | 32.401235 |
1030 rows × 9 columns
In [3]:
# Now our usual X, y variables
X_raw = df[df.columns[0:8]].values
y_raw = df[df.columns[8]].values
# Dimensions of dataset
print("Dimensions of dataset")
n = X_raw.shape[0]
p = X_raw.shape[1]
print("n=",n,"p=",p)
Dimensions of dataset
n= 1030 p= 8
In [4]:
X_raw.shape # Array 1030x8. Each element is an 8-dimensional data point: Cement, Blast Furnace Slag, Fly Ash,…
Out[4]:
(1030, 8)
In [5]:
y_raw.shape # Vector of 1030 elements. Each element is a 1-dimensional (scalar) data point: Strength
Out[5]:
(1030,)
In [6]:
# We can confirm the data are right with a simple visualization.
X_raw
Out[6]:
array([[ 540. , 0. , 0. , ..., 1040. , 676. , 28. ],
[ 540. , 0. , 0. , ..., 1055. , 676. , 28. ],
[ 332.5, 142.5, 0. , ..., 932. , 594. , 270. ],
...,
[ 148.5, 139.4, 108.6, ..., 892.4, 780. , 28. ],
[ 159.1, 186.7, 0. , ..., 989.6, 788.9, 28. ],
[ 260.9, 100.5, 78.3, ..., 864.5, 761.5, 28. ]])
In [7]:
y_raw
Out[7]:
array([79.98611076, 61.88736576, 40.26953526, ..., 23.69660064,
32.76803638, 32.40123514])
Basic data pre-processing
Checking multicollinearity of pairs
Plotting the pairwise scatterplots
Pairwise scatter plots and a correlation heatmap are the usual visual tools for checking multicollinearity. We can use the pairplot function from the seaborn library to plot the pairwise scatterplots of all feature combinations.
In [8]:
# Visualization
sns.set_style("whitegrid");
sns.pairplot(df);
plt.show()
As you can see, it is a pretty difficult problem. There are almost no strong correlations between Strength and the other features. Visually, we can see some correlation of Strength with the Cement content (which is not surprising, of course).
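The visual impression can be checked numerically by ranking each feature's correlation with the target via df.corr(). The snippet below applies the pattern to a tiny synthetic frame (the column values and the made-up relationship are NOT the real concrete data, just an illustration of the technique).

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(1)

# Tiny synthetic frame mimicking the structure of the concrete dataset
cement = rng.uniform(100, 550, 200)
age = rng.uniform(1, 365, 200)
water = rng.uniform(120, 250, 200)
strength = 0.1 * cement + 0.05 * age + rng.randn(200)  # made-up relationship

df_demo = pd.DataFrame({"Cement": cement, "Age": age,
                        "Water": water, "Strength": strength})

# Rank features by their linear correlation with the target
corr_with_target = (df_demo.corr()["Strength"]
                    .drop("Strength")
                    .sort_values(ascending=False))
print(corr_with_target)
```

On the real data, the same one-liner on df would give the numeric counterpart of what the pairplot and heatmap show visually.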
Plotting a diagonal correlation matrix
Now we are going to plot the diagonal correlation matrix with Seaborn:
In [9]:
# Correlation
sns.set(style="white")

# Compute the correlation matrix
corr = df.corr()

# Generate a mask for the upper triangle
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))

# Generate a custom diverging colormap
cmap = sns.diverging_palette(220, 10, as_cmap=True)

# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ff0696ddb38>