Deep Trading with TensorFlow VIII
Introduction
After training and testing your model, a great question arises: how will you deploy the model and make predictions for new data samples? Luckily for us, TensorFlow was developed for production, and it provides a robust solution for model deployment, known as TensorFlow Serving. In this approach, we have three steps:
- Export your model from Tensorflow for serving (production repository)
- Create and launch a Docker container with your model (Tensorflow server for model hosting)
- Deploy it with Kubernetes into a cloud platform (for example Google Cloud, Amazon AWS, or Azure Kubernetes Service, AKS). This deployment option is more professional, but it is not included in this tutorial.
Credits to the kind people who own the references for this tutorial.
You can get all files related to this tutorial in my github repository:
https://github.com/parrondo/deeptrading-tfserving-python (NOT YET AVAILABLE!! But it will be soon)
Trading system with TensorFlow Serving
Below we can see the components explained in this tutorial for deploying trading systems with Tensorflow serving. We will use the model of Deep Trading with TensorFlow VI, but you can use whatever model you train. The workflow is as follows:
TensorFlow
Well, you already know perfectly well what TensorFlow is: an open-source library for the development of Machine Learning and especially Deep Learning models, created and supported by Google. We create the model with TensorFlow in our research/test environment and write it to our research/test repository of models.
Docker
Docker is a containerization engine and provides a convenient way to pack your stuff with all dependencies together to be deployed locally or in the cloud. The documentation is very comprehensive and you should check it for the details. We will use a prebuilt TensorFlow Serving Docker image to host the model (see the "Serving the model" section below).
TensorFlow Serving
TensorFlow Serving hosts the model and provides remote access to it. TensorFlow Serving has proper documentation on its architecture and useful tutorials. Unfortunately, they use simple examples and give little explanation of what you need to do to serve your own trading models.
TensorFlow Serving has two peculiarities. On the one hand, the server implements a gRPC interface, so we need to create a client that can communicate over gRPC. On the other hand, it operates on models stored as Protobuf.
Proto…what? Yes, Protocol Buffers (or Protobuf), which allows efficient data serialization. It is an open-source piece by Google 🙂
Below is a capture of the proposed TensorFlow Serving setup.
Kubernetes
Kubernetes is open-source software created at Google. It provides container orchestration, giving you automated horizontal scaling, service discovery, load balancing, and more. In short, it automates the management of your web services in the cloud.
Save our model
We wrote our model when running tutorial VI. In the snippets below, we have extracted the lines that write the model for the research/test and production environments.
We pass the pathname where we want the model stored to the builder. The last part of the path is the model version. We use it when retraining the model on real data.
TIP: Do not forget to delete all versions you don’t need.
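For orientation, after a couple of exports the production model directory should look roughly like this (a sketch of the standard SavedModel layout; the exact root depends on your production_dir):

models/
  07_First_Forex_Prediction/
    1/
      saved_model.pb
      variables/
    2/
      saved_model.pb
      variables/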
The model path has two possibilities controlled with the boolean variable “to_production”:
For the research/test environment it is:
model_path = model_dir + "07_First_Forex_Prediction"
# Save model weights to disk
save_path=saver.save(sess,model_path)
print("Model saved in file: %s"%save_path)
print("First Optimization Finished!")
For the production environment it is:
# Imports assumed to be defined earlier in the notebook (TensorFlow 1.x SavedModel API):
#   import os, sys
#   import tensorflow as tf
#   import matplotlib.pyplot as plt
#   from tensorflow.python.saved_model import builder as saved_model_builder
#   from tensorflow.python.saved_model import signature_constants, signature_def_utils, tag_constants
#   from tensorflow.python.saved_model.utils import build_tensor_info

production_model_path = production_dir + "models/" + "07_First_Forex_Prediction"

# Running a new session for predictions and export model to production
print("Starting prediction session...")
with tf.Session() as sess:
    # Initialize variables
    sess.run(init)
    # Try to restore a model if any.
    try:
        saver.restore(sess, model_path)
        print("Model restored from file: %s" % model_path)
        # We try to predict the close price of test samples
        feed_dict = {X: X_test_std}
        prediction = sess.run(y_hat, feed_dict)
        print(prediction)
        %matplotlib inline
        # Plot prices over time
        plt.plot(y_test, 'k-', label='y_test')
        plt.plot(prediction, 'r--', label='prediction')
        plt.title('Price over time')
        plt.legend(loc='upper right')
        plt.xlabel('Time')
        plt.ylabel('Price')
        plt.show()
        if to_production:
            # Pick out the model input and output
            X_tensor = sess.graph.get_tensor_by_name("X" + ':0')
            y_tensor = sess.graph.get_tensor_by_name("out_layer" + ':0')
            model_input = build_tensor_info(X_tensor)
            model_output = build_tensor_info(y_tensor)
            # Create a signature definition for tfserving
            signature_definition = signature_def_utils.build_signature_def(
                inputs={"X": model_input},
                outputs={"out_layer": model_output},
                method_name=signature_constants.PREDICT_METHOD_NAME)
            # Pick the first version number that is not already on disk
            model_version = 1
            export_model_dir = production_model_path + "/" + str(model_version)
            while os.path.exists(export_model_dir):
                model_version += 1
                export_model_dir = production_model_path + "/" + str(model_version)
            builder = saved_model_builder.SavedModelBuilder(export_model_dir)
            builder.add_meta_graph_and_variables(
                sess,
                [tag_constants.SERVING],
                signature_def_map={
                    signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                        signature_definition})
            # Save the model so we can serve it with a model server :)
            builder.save()
    except Exception:
        print("Unexpected error:", sys.exc_info()[0])
Obviously, “07_First_Forex_Prediction” is the name of the model. Feel free to change the name at your convenience, but I recommend you stick to a naming rule.
In the example above (see the complete code in Deep Trading with TensorFlow V) we created the placeholders and operations on the default graph. Then we started a session and ran the operations.
It’s time to save the model. Here’s an outline for the code:
- We have to define the input and output tensors. See the lines after the comment # Pick out the model input and output.
- Create a signature definition from the input and output tensors. The signature definition is what the model builder uses to save something a model server can load. See the lines after the comment # Create a signature definition for tfserving.
First we have to figure out which nodes are the input and output nodes. From our deep learning model we can grab the input tensor:
X = tf.placeholder(dtype=tf.float32, shape=[None, n_features], name="X")
and the output tensor:
out_layer = tf.nn.relu(tf.add(tf.matmul(layer_4, variables['W5']), variables['bias5']), name="out_layer")
The names "X" and "out_layer" are the strings defining the input and output placeholders of our model. We can use whatever strings we like to name the inputs and outputs of our models; 'inputs' and 'outputs' are also proper names. TensorFlow has defined some constants for us that we can use. These constants are defined in signature_constants.py, and there are three sets of them: prediction, classification, and regression. If we peek into signature_constants.py, we'll see that the input and output constants are 'inputs' and 'outputs'.
One place where we do have to use a string that TensorFlow has defined is the third keyword parameter, method_name. It must be one of tensorflow/serving/predict, tensorflow/serving/classify, or tensorflow/serving/regress. These are also defined in signature_constants.py as:
CLASSIFY_METHOD_NAME
PREDICT_METHOD_NAME
REGRESS_METHOD_NAME.
It is not clear why we need this to save the model, and the documentation is quite weak here. The model server will give you an error if you don’t use one of these constants.
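For reference, this is what those constants look like in TensorFlow 1.x (a quick sketch; the values in the comments are the strings defined in signature_constants.py):

from tensorflow.python.saved_model import signature_constants

# Default keys and method names defined by TensorFlow (TF 1.x)
print(signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY)  # 'serving_default'
print(signature_constants.PREDICT_INPUTS)                     # 'inputs'
print(signature_constants.PREDICT_OUTPUTS)                    # 'outputs'
print(signature_constants.PREDICT_METHOD_NAME)                # 'tensorflow/serving/predict'
print(signature_constants.CLASSIFY_METHOD_NAME)               # 'tensorflow/serving/classify'
print(signature_constants.REGRESS_METHOD_NAME)                # 'tensorflow/serving/regress'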
Run the code, and we should have our model ready for serving. If the code runs without errors, you can find the model in <your_path>/models/07_First_Forex_Prediction/1.
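If you want to double-check what was exported, the saved_model_cli tool that ships with TensorFlow can inspect the saved signature (a quick sanity check; replace <your_path> with your actual production path):

$ saved_model_cli show --dir <your_path>/models/07_First_Forex_Prediction/1 --all

It should list a serving_default signature with an input named X and an output named out_layer, matching the signature definition we built above.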
Serving the model
If we finished the previous step without problems, we are ready to serve the model now.
We will use a simple CPU-compiled server for this tutorial.
You can find the model server we're going to use here. You can install it following the instructions for your operating system, but I rather recommend using a Docker image with the server. We can use the excellent one provided by Bitnami. Their TensorFlow Serving containers are designed to work well together, are well documented, and are continuously updated when new versions are made available. I have been using them for years with excellent and robust results.
We recommend Docker Compose, which is a tool for defining and running multi-container Docker applications. It is very versatile. With Compose, we use a YAML file to configure our application’s services. Then, with a single command, you create and start all the services from your configuration. To learn more about all the features of Compose, see the list of features.
The recommended way to get the Bitnami TensorFlow Serving Docker Image is to pull the prebuilt image from the Docker Hub Registry.
$ docker pull bitnami/tensorflow-serving:latest
To serve the model all we need to do is run the file run_model.sh.
The content of the script is:
#!/usr/bin/env bash
docker-compose up
# To stop: docker stop tensorflow-serving && docker rm tensorflow-serving
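The docker-compose.yml file is what wires the container to our exported model; the repository's file is not reproduced here, but a minimal sketch could look like the following (shown with the official tensorflow/serving image conventions; if you stick with the bitnami/tensorflow-serving image instead, check its documentation for the expected volume paths and environment variables):

version: "3"
services:
  tensorflow-serving:
    image: tensorflow/serving:latest
    container_name: tensorflow-serving
    ports:
      - "8500:8500"   # gRPC
      - "8501:8501"   # REST API
    environment:
      - MODEL_NAME=07_First_Forex_Prediction
    volumes:
      - ./models/07_First_Forex_Prediction:/models/07_First_Forex_Prediction

The important points are that port 8500 (gRPC, used by the client below) is published and that the versioned model directory we exported is mounted where the server expects to find it.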
We should see an output like this; if so, we are now serving the model with TensorFlow Serving!
Congrats! You are now serving a model with TensorFlow Serving.
The model server runs a gRPC service, and below we'll tell you what you need to know about it and how to send requests.
To stop the server, you need to open a new terminal tab and run:
./stop_model.sh
The content of the script is:
#!/usr/bin/env bash
docker-compose down -v
and the results are:
Send prediction requests to your model
You don’t need to understand or know anything about gRPC to complete this tutorial. Of course that knowledge is important if you want to modify the code.
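For the curious, this is roughly what such a client does under the hood: it builds a PredictRequest protobuf and sends it over a gRPC channel. A minimal sketch, assuming TensorFlow 1.x and the tensorflow-serving-api package are installed (the whole point of the ready-made client below is that you do not need them):

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Open a gRPC channel to the model server started above
channel = grpc.insecure_channel('0.0.0.0:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build the PredictRequest protobuf
request = predict_pb2.PredictRequest()
request.model_spec.name = '07_First_Forex_Prediction'
request.model_spec.signature_name = 'serving_default'
# 'X' must match the input name used in the signature definition
sample = np.array([[1.87374825, 1.87106024, 1.87083053, 1.86800846]], dtype=np.float32)
request.inputs['X'].CopyFrom(tf.contrib.util.make_tensor_proto(sample))

# Send the request and read the 'out_layer' output tensor
response = stub.Predict(request, timeout=10.0)
print(response.outputs['out_layer'])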
For now, let’s focus on using a client built by Epigram AI from this Github repository (All credits to the author):
https://github.com/epigramai/tfserving-python-predict-client
The client works without TensorFlow installed, so create a new environment using conda or virtualenv (or a similar environment tool).
Come on, let's install the client:
As usual, we use the terminal or an Anaconda Prompt for the following steps:
$ conda create --name myenv
$ conda activate myenv
(myenv)$ pip install git+https://github.com/parrondo/deeptrading-tfserving-python.git
Make sure the model server is running with the model you want to test. Start it again if you stopped the container when we cleaned up with the stop_model.sh script.
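A quick way to confirm the server is actually up is to query TensorFlow Serving's model status endpoint (this assumes the REST port 8501 is exposed, as in the compose sketch above, and a TensorFlow Serving version that includes the REST API; otherwise a plain docker ps at least tells you the container is running):

$ curl http://localhost:8501/v1/models/07_First_Forex_Prediction

If the model is loaded correctly, the response reports the version and state AVAILABLE.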
Example 1
Below you can see the Python file example1.py:
#
# Example 1 file
#
# This client sends one request with 3 samples to the Tensorflow server.
# The request data is a numpy array with three rows, each containing
# an OHLC sample, to test version 1 of
# the model '07_First_Forex_Prediction'.
# Note the 'in_tensor_name': 'X', which must match the input tensor name of your model.
#
# Author: R.M.Parrondo
#https://github.com/parrondo/deeptrading-tfserving-python
#
import logging
import numpy as np
from predict_client.prod_client import ProdClient
logging.basicConfig(level=logging.DEBUG,
format='%(asctime)s - %(levelname)s - %(name)s - %(message)s')
# In each file/module, do this to get the module name in the logs
logger = logging.getLogger(__name__)
# Make sure you have a model running on localhost:8500
host = '0.0.0.0:8500'
model_name = '07_First_Forex_Prediction'
model_version = 1
client = ProdClient(host, model_name, model_version)
data = np.array([[1.87374825, 1.87106024, 1.87083053, 1.86800846],
[1.87224386, 1.8729944 , 1.87405399, 1.8712318 ],
[1.86558156, 1.86289375, 1.86008567, 1.86521489]])
req_data = [{'in_tensor_name': 'X', 'in_tensor_dtype': 'DT_FLOAT', 'data': data}]
prediction = client.predict(req_data, request_timeout=10)
for k in prediction:
    logger.info('Prediction key: {}, shape: {}'.format(k, prediction[k].shape))
    logger.info('Prediction key: {}, value: {}'.format(k, prediction[k]))
and the result is:
Example 2
And this is the Python file example2.py:
#
# Example 2 file
#
# This client sends several requests, controlled by 'repetitions', to the Tensorflow server.
# The content of each request is a numpy array with one row of four columns
# representing an OHLC sample, to test version 1 of
# the model '07_First_Forex_Prediction'.
# Note the 'in_tensor_name': 'X', which must match the input tensor name of your model.
#
# Author: R.M.Parrondo
#https://github.com/parrondo/deeptrading-tfserving-python
#
import logging
import numpy as np
from predict_client.prod_client import ProdClient
# generate random floating point values
from random import seed
from random import random
# seed random number generator
seed(1)
logging.basicConfig(level=logging.DEBUG,
format='%(asctime)s - %(levelname)s - %(name)s - %(message)s')
# In each file/module, do this to get the module name in the logs
logger = logging.getLogger(__name__)
# Make sure you have a model running on localhost:8500
host = '0.0.0.0:8500'
model_name = '07_First_Forex_Prediction'
model_version = 1
client = ProdClient(host, model_name, model_version)
# Generate random prices between mini and maxi
repetitions = 10  # number of requests to send
# Change mini and maxi in order to get prices in the range you need
mini = 0.9
maxi = 1.5
for _ in range(repetitions):
    # Generating random OHLC data
    o = mini + (random() * (maxi - mini))
    h = o + random() * 0.01
    l = o - random() * 0.01
    c = (h + l) / 2
    # Constructing the data array. You can use a batch of M samples,
    # so data must be an array of dimension M x 4.
    data = np.array([[o, h, l, c]])
    req_data = [{'in_tensor_name': 'X', 'in_tensor_dtype': 'DT_FLOAT', 'data': data}]
    prediction = client.predict(req_data, request_timeout=10)
    for k in prediction:
        # logger.info('Prediction key: {}, shape: {}'.format(k, prediction[k].shape))
        # logger.info('Prediction key: {}, value: {}'.format(k, prediction[k]))
        print("data =", data, "prediction =", prediction[k])
and the corresponding results:
Note that you can see only the last prediction in this capture.
And now what?
Well, we can now send requests to our model and receive the responses. The next step is receiving real data in our TensorFlow Serving client from our broker (or market data supplier), making actual predictions, and sending orders to the broker.
We will use the API of the broker Darwinex, but you can select whichever you want. The aim of this series is that you end up with the knowledge you need.
References:
1) https://medium.com/epigramai/tensorflow-serving-101-pt-1-a79726f7c103
2) https://medium.com/epigramai/tensorflow-serving-101-pt-2-682eaf7469e7
3) https://github.com/epigramai/tfserving-python-predict-client
4) https://bitnami.com/stack/tensorflow-serving
5) https://github.com/bitnami/bitnami-docker-tensorflow-serving
6) https://www.tensorflow.org/tfx/serving/serving_basic
7) https://becominghuman.ai/creating-restful-api-to-tensorflow-models-c5c57b692c10