Attention

The DEEP platform is being sunset.

The DEEP-Hybrid-DataCloud project has ended and its platform and software are being decommissioned during 2023, as they have been superseded by the AI4EOSC platform and the AI4OS software stack.

Please refer to the following links for further information:

Train a model locally

This is a step-by-step guide on how to train a module from the Marketplace with your own dataset on your local machine.

In this tutorial we will see how to retrain a generic image classifier on a custom dataset to create a phytoplankton classifier. If you want to follow along, you can download the toy phytoplankton dataset here.

If you are new to Machine Learning, you might want to check some useful Machine Learning resources we compiled to help you get started.

Requirements
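To follow this tutorial you essentially need Docker installed on your local machine (and, for GPU training, the NVIDIA drivers together with the NVIDIA container runtime, which is used later in this guide). A quick sanity check, assuming Docker is already on your PATH:

$ docker --version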

1. Choose a module from the Marketplace

The first step is to choose a model from the DEEP Open Catalog marketplace. Make sure to select a module with the trainable tag. For educational purposes we are going to use a general model to identify images. This will allow us to see the general workflow.

Once we have chosen the model at the DEEP Open Catalog marketplace, we will find that it has an associated Docker container on DockerHub. In the example we are following here, the container is deephdc/deep-oc-image-classification-tf. So let’s pull the Docker image from DockerHub:

$ docker pull deephdc/deep-oc-image-classification-tf

Docker images usually have tags depending on whether they are built from the master or test branch and whether they run on CPU or GPU. The usual tags are:

  • latest or cpu: master + cpu

  • gpu: master + gpu

  • cpu-test: test + cpu

  • gpu-test: test + gpu

You typically want to run your training on master with a GPU:

$ docker pull deephdc/deep-oc-image-classification-tf:gpu

Tip

Instead of pulling from Dockerhub, it’s also possible to build the image yourself:

$ git clone https://github.com/deephdc/deep-oc-image-classification-tf
$ cd deep-oc-image-classification-tf
$ docker build -t deephdc/deep-oc-image-classification-tf .

2. Prepare your dataset

For this tutorial, we will assume that you also have your data stored locally. If your data is in a remote storage, check the rclone docs to see how you can copy it to your local machine.
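For instance, a copy with rclone could look like the following sketch (rshare is a hypothetical remote name configured for your storage, and the paths are placeholders to adapt to your setup):

$ rclone copy rshare:/phytoplankton-dataset /home/ubuntu/data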

When training a model, the data usually has to be in a specific format and folder structure. It’s usually helpful to read the README in the source code of the module (in this case located here) to learn the correct way to set it up.

In the case of the image classification module, we will create the following folders:

  • A folder called models where the new training weights will be stored after the training is completed

  • A folder called data that contains two different folders:

    • The subfolder images containing the input images needed for the training

    • The subfolder dataset_files containing a couple of files:

      • train.txt listing the relative paths to the training images

      • classes.txt listing the categories for the training

Again, the folder structure and its content will of course depend on the module to be used. This structure is just an example in order to complete the workflow for this tutorial; a sketch of the resulting layout is shown below.
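A minimal sketch of that layout (the image names and the exact line format of the text files are placeholders; check the module’s README for the precise format):

models/
data/
├── images/
│   ├── image_01.jpg
│   ├── image_02.jpg
│   └── ...
└── dataset_files/
    ├── train.txt      # e.g. one relative image path (plus label) per line: "image_01.jpg 0"
    └── classes.txt    # e.g. one category name per line: "diatom"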

3. Run your module

When running the Docker container, you have to make sure that the data folder is accessible from inside the container. This is done via the Docker volume -v flag:

$ docker run -ti -p 5000:5000 -p 6006:6006 -p 8888:8888 -v path_to_local_folder:path_to_docker_folder deephdc/deep-oc-image-classification-tf

We also need to make the GPUs visible from inside the container, using either the --runtime=nvidia or the --gpus all flag.

In our case, the final command, mounting the data folder and the model weights folder (where we will later retrieve the newly trained model), looks as follows:

$ docker run -ti -p 5000:5000 -p 6006:6006 -p 8888:8888 -v /home/ubuntu/data:/srv/image-classification-tf/data -v /home/ubuntu/models:/srv/image-classification-tf/models --runtime=nvidia deephdc/deep-oc-image-classification-tf:gpu

4. Open the DEEPaaS API and train the model

Go to http://0.0.0.0:5000/ui and look for the train POST method. Modify the training parameters you wish to change and execute.
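Equivalently, the training can be launched from the command line. A minimal sketch with curl, assuming the DEEPaaS V2 endpoint layout and that the model is listed as image-classification-tf (check the /ui page for the exact model name and the available training parameters):

$ curl -X POST "http://0.0.0.0:5000/v2/models/image-classification-tf/train" -H "accept: application/json"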

If some kind of monitoring tool is available for this model, you will be able to follow the training progress at http://0.0.0.0:6006.

5. Test and export the newly trained model

Once the training has finished, you can test it directly with the predict POST method. For this, you first have to kill the process running deepaas and launch it again:

$ kill -9 $(ps aux | grep '[d]eepaas-run' | awk '{print $2}')
$ kill -9 $(ps aux | grep '[t]ensorboard' | awk '{print $2}')  # optionally also kill monitoring process
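Then relaunch the API. The command below is the one used in the deephdc containers; adjust the flags if your setup differs:

$ deepaas-run --listen-ip 0.0.0.0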

This is because the user inputs for deepaas are generated when deepaas is launched, so it is not aware of the newly trained model. Once deepaas is restarted, head to the predict POST method, select your new model weights and upload the image you want to classify.

If you are satisfied with your model, then it’s time to save it to your remote storage, so that you still have access to it if your machine is deleted. For this we have to create a tar file of the model folder (in this case, the folder name is the timestamp at which the training was launched) so that we can later download it into our Docker container.
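A minimal sketch, assuming the training produced a folder named after a hypothetical timestamp:

$ cd /home/ubuntu/models
$ tar -cvf 2023-01-01_120000.tar 2023-01-01_120000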

For the next step, you need to make the tar file publicly available through a URL so it can be downloaded into your Docker container. In Nextcloud, go to the tar file you just created: ➜ Share Link ➜ (Create a new share link)
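The file can then be fetched with, for example, curl (the share URL below is a hypothetical placeholder; for Nextcloud share links, appending /download usually returns the file directly):

$ curl -L "https://your-nextcloud-host/s/AbCdEf123/download" -o 2023-01-01_120000.tar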

6. Next steps

The next steps are the same as in the remote training tutorial.