Training

The training component automatically schedules and launches training jobs.

To automate the training process, we will use Prefect to schedule and run the training jobs, and Kedro to handle the different steps, which are represented as modular pipelines.

Setup

Before we start, check that you are in the right environment. If you are not, follow these instructions to set up the required environment.

Once you have activated your working environment and make-us-rich is installed, we are ready to start.

Initialize the training component

To initialize the training component, run the following command:

mkrich init training

This command will create a directory named mkrich-training in your current working directory.

All required files for the training component will be created in this directory.
Now navigate to the directory and open it in your favorite text editor:

cd mkrich-training

# I'm using VS Code
code .

Don't worry if you don't use VS Code, you can open the directory in any text editor of your choice.

Configure the env variables

Binance API

The training component requires some environment variables to be set.

This component uses the Binance exchange API to fetch the required data. You can easily create an account on Binance by following the instructions and then get your API key and secret.

Once your Binance account is created, you will be able to access the API key and secret in your account settings. Please follow these FAQ instructions to get your API key and secret.

Warning

Never share your API key and secret with anyone. I recommend creating a read-only API key and secret to limit the security risk. Read-only means that the API key and secret cannot be used to trade.

Creating and validating your account can take a while. Please be patient, it's worth it. Once validation is complete, access to the Binance API is available immediately.

Once you have your API key and secret, you can set them as environment variables in the dedicated env file located in /mkrich-training/conf/base/.env-binance.
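As a quick sanity check, the env file can be read back with a few lines of Python. This is only a minimal sketch that assumes the file uses plain KEY=VALUE lines; the variable names API_KEY and SECRET_KEY in the sample are hypothetical, so use the names actually present in your .env-binance file.

```python
from pathlib import Path
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path: str) -> Dict[str, str]:
    """Read an env file from disk and return its variables."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # API_KEY and SECRET_KEY are hypothetical names used for illustration.
    sample = "# Binance credentials\nAPI_KEY=abc123\nSECRET_KEY='s3cr3t'\n"
    parsed = parse_env_text(sample)
    # Print only the variable names so that secrets never end up in logs.
    print(sorted(parsed))
```

On a real setup you would call load_env_file("conf/base/.env-binance") from the mkrich-training directory instead of parsing the sample string.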

Info

To avoid credentials exposure, a .gitignore file is created during the initialization process to exclude the conf/ folder from the git repository.

Object Storage

In this project we will use Minio as object storage. It is a good choice because it is easy to use and free. Minio implements the S3 API, so AWS S3 or any other S3-compatible service should work as well.

Because running an object storage service is out of the scope of this project, we won't describe how to set up the service here and assume you already have it.

Note

If you don't have any object storage service, check online resources; there are many tutorials available.

Finally, if you don't succeed, you can still open an issue on the make-us-rich repository to ask for help. Maybe I will be able to help you set up your object storage service 🤗.

When you have your object storage service ready, you can change all required env variables in the env file located in /mkrich-training/conf/base/.env-minio.

ACCESS_KEY and SECRET_KEY are the credentials used to access your object storage service. I recommend creating a dedicated user with limited permissions for this purpose.
The user only needs to upload and download files in the bucket you defined as BUCKET in the env file. ENDPOINT is the URL of your object storage service.
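Before launching anything, it can help to check that none of the four settings was left empty. The sketch below assumes the values from .env-minio have already been loaded into the process environment; how make-us-rich loads them internally is not described here, and the helper name missing_settings is made up for this example.

```python
import os
from typing import List, Mapping

# The four settings named in the .env-minio file.
REQUIRED_KEYS = ("ACCESS_KEY", "SECRET_KEY", "BUCKET", "ENDPOINT")


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required keys that are absent or empty in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    if missing:
        print("Missing object storage settings: " + ", ".join(missing))
    else:
        print("All object storage settings are present.")
```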

Launch

There are two steps to make the training component fully functional.

Setup Prefect

Prefect is a useful tool for automating and monitoring the execution of tasks. We will use it together with Kedro to automate the training process.

Kedro is a Python framework for building modular pipelines.

Because I already configured everything for you, you can launch the training component by running the following command:

mkrich run training

Tip

Be sure to be in the mkrich-training directory before running the command.

When the training component is launched, you will get access to a nice dashboard that will show you the status of registered flows. A flow is a set of tasks that will be executed in sequence by a Prefect worker, also known as an agent.

By default, the dashboard is available at http://127.0.0.1:8080. You can navigate to the flows tab to see the registered flows. If everything is working, you should see three different flows: btc_usdt, eth_usdt and chz_usdt, which belong to the make-us-rich project.

All registered flows are scheduled to run every hour by default. You can change this behavior by editing the schedule of each flow.
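To make the idea of an hourly schedule concrete, here is a small standard-library sketch that lists the run times produced by a fixed interval. It does not use the Prefect API; the function name next_runs and the start date are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import List


def next_runs(start: datetime, interval: timedelta, count: int) -> List[datetime]:
    """Return the next `count` run times, spaced by `interval` after `start`."""
    return [start + interval * (i + 1) for i in range(count)]


if __name__ == "__main__":
    start = datetime(2022, 5, 4, 12, 0)
    # One run per hour, which matches the default described above.
    for run in next_runs(start, timedelta(hours=1), 3):
        print(run.isoformat())
```

Changing the interval to, say, timedelta(hours=6) shows how a less frequent schedule would space the runs.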

For the moment, no flow is running because Prefect needs an available agent to run the flows. So let's create a worker to run the flows.

Create a local agent

To create a local agent, you need to run the following command:

mkrich start agent

This will start a local agent that will run the registered flows in parallel in the background.

If you don't want to run a local agent, you can refer to the Prefect documentation to create a remote agent or an agent in a Docker container.

Stop the training component

You can simply stop the agent by closing the terminal where you started it.

To stop the entire training component, run the following command:

mkrich stop

This command will stop all containers launched by Prefect and remove them.

Training pipeline

Steps of the training pipeline

There are 5 steps for the pipeline to complete:

🪙 Fetching data from Binance API.
🔨 Preprocessing data:
- Extract features from fetched data.
- Split extracted features.
- Scale split features.
- Create sequences with scaled train features.
- Create sequences with scaled test features.
- Split train sequences as train and validation sequences.
🏋️ Training model.
🔄 Converting model to ONNX format.
📁 Uploading converted model to object storage service.

Once all the steps are completed, you will have a trained model ready to be used and available in the object storage.

Models are used by the serving component to make predictions.
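The preprocessing sub-steps listed above (split, scale, create sequences, then split train sequences into train and validation) can be sketched in plain Python. This is not the project's implementation: the min-max scaler, the window length, the split ratios and every function name below are assumptions chosen for illustration, and feature extraction is skipped by working on a single made-up price series.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_ordered(items: Sequence[T], train_ratio: float) -> Tuple[List[T], List[T]]:
    """Split time-ordered items into a head and a tail without shuffling."""
    cut = int(len(items) * train_ratio)
    return list(items[:cut]), list(items[cut:])


def fit_min_max(train: Sequence[float]) -> Tuple[float, float]:
    """Compute scaling bounds from the training data only."""
    return min(train), max(train)


def scale(values: Sequence[float], bounds: Tuple[float, float]) -> List[float]:
    """Apply min-max scaling with previously fitted bounds."""
    low, high = bounds
    span = (high - low) or 1.0  # avoid division by zero on constant data
    return [(v - low) / span for v in values]


def make_sequences(values: Sequence[float], window: int) -> List[Tuple[List[float], float]]:
    """Build (input window, next value) pairs with a sliding window."""
    return [
        (list(values[i:i + window]), values[i + window])
        for i in range(len(values) - window)
    ]


if __name__ == "__main__":
    prices = [float(p) for p in range(100, 120)]  # 20 made-up prices
    train, test = split_ordered(prices, train_ratio=0.8)
    bounds = fit_min_max(train)  # fitted on train, reused for test
    train_seq = make_sequences(scale(train, bounds), window=3)
    test_seq = make_sequences(scale(test, bounds), window=3)
    # Hold out the tail of the train sequences for validation.
    train_seq, val_seq = split_ordered(train_seq, train_ratio=0.75)
    print(len(train_seq), len(val_seq), len(test_seq))
```

Fitting the scaling bounds on the train split only, and reusing them for the test split, keeps information from the test period out of training.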

Training parameters

You can change the training parameters by editing the dedicated file located in /mkrich-training/conf/base/parameters.yml.

GPU or CPU

The run_on_gpu parameter defines if the training will be done on a GPU or not. Set it to True if you have a GPU and you want to use it for training, otherwise set it to False.
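One way such a flag is commonly turned into trainer settings is shown below. This is a hedged sketch, not the project's code: the helper name trainer_device_options is hypothetical, and the returned option names mirror the accelerator and devices arguments of the PyTorch Lightning Trainer on the assumption that the project configures its trainer this way.

```python
from typing import Any, Dict


def trainer_device_options(run_on_gpu: bool) -> Dict[str, Any]:
    """Translate the run_on_gpu flag into device options for a trainer."""
    if run_on_gpu:
        # Assumption: a single GPU is used when the flag is enabled.
        return {"accelerator": "gpu", "devices": 1}
    return {"accelerator": "cpu"}


if __name__ == "__main__":
    print(trainer_device_options(run_on_gpu=False))
```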

WandB logging

The training pipeline will log the training process to WandB.

You will need to create an account on WandB in order to use this feature. It's free for a personal user and you can create as many projects as you want.

Once registered, you need to authenticate through the wandb CLI tool. You can do it by running the following command:

wandb login

Don't forget to change the wandb_project parameter in the parameters.yml file if you want to use a different name for your project.

Tip

PyTorch Lightning has a WandB integration feature which allows us to automatically log all the training process to the WandB platform.

Find help

If you need help to understand or to use the training component, please open an issue on the make-us-rich repository. I will be able to help you!

Last update: 2022-05-04