During the last Netguru team dinner, a friend showed me a neat feature of his fancy Samsung device: Gesture Control, which takes a selfie when the camera detects your hand. I've been interested in machine learning for some time now, so I thought of implementing this myself. To solve the problem I've used the Object Detection API's SSD MultiBox model with a MobileNet feature extractor, pretrained on the COCO (Common Objects in Context) dataset. Follow these steps to create a simple hand detection app and see the results of my experiment:
Before we can start training the model, we need some input data for training and evaluation, in the format accepted by the Object Detection API: TFRecord. Additionally, we need to specify a label map, which maps class ids to class names. The labels should be identical for the training and evaluation datasets. To accomplish this step I've used this script, which fetches a dataset of human hand pictures from http://www.robots.ox.ac.uk/~vgg/data/hands/index.html and creates the necessary files. The output of this script: hands_train.record, hands_val.record, hands_test.record and hands_label_map.pbtxt.
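For a single-class detector the label map is tiny. Here is a minimal sketch of how hands_label_map.pbtxt could be generated by hand (the script above already produces it; this just shows the expected format):

```python
# Minimal sketch: write the single-entry label map the Object Detection API expects.
# Class ids must start at 1; id 0 is reserved for the background class.
label_map = """item {
  id: 1
  name: 'hand'
}
"""

with open("hands_label_map.pbtxt", "w") as f:
    f.write(label_map)
```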
When the input files are ready, we can start configuring our model. The Object Detection API uses protobuf files to configure the training and evaluation jobs; more info about the configuration pipeline can be found here. The Object Detection API provides several sample configurations. Those configurations are a good starting point: with minimal effort you get a working configuration. As I wrote at the beginning of this post, I've used ssd_mobilenet_v1_coco.config. I've changed the following parameters:
num_classes to 1 because I wanted to detect only one type of object: a hand.
num_steps to 15000 because running locally can take forever :D
fine_tune_checkpoint to the location of the previously downloaded frozen model ssd_mobilenet_v1_coco_2017_11_17/model.ckpt. Frozen models can be downloaded here.
input_path and label_map_path of train_input_reader and eval_input_reader to the previously generated hands_train.record, hands_test.record and hands_label_map.pbtxt files.
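Put together, the changed parts of ssd_mobilenet_v1_coco.config look roughly like this (a sketch: only the modified fields are shown, the rest of the sample config stays unchanged, and the paths assume the files sit next to the config):

```
model {
  ssd {
    num_classes: 1
    # other model fields unchanged
  }
}
train_config {
  fine_tune_checkpoint: "ssd_mobilenet_v1_coco_2017_11_17/model.ckpt"
  num_steps: 15000
  # other training fields unchanged
}
train_input_reader {
  tf_record_input_reader {
    input_path: "hands_train.record"
  }
  label_map_path: "hands_label_map.pbtxt"
}
eval_input_reader {
  tf_record_input_reader {
    input_path: "hands_test.record"
  }
  label_map_path: "hands_label_map.pbtxt"
}
```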
I've trained the model on my local machine. To do this, I've used the training script from the library:
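At the time of writing, the Object Detection API ships a train.py script and an export_inference_graph.py script for turning the resulting checkpoint into a frozen graph. A sketch of the two commands, run from the models/research directory (the paths and the checkpoint number are examples, adjust them to your setup):

```shell
# Train locally using the pipeline config prepared above.
python object_detection/train.py \
  --logtostderr \
  --pipeline_config_path=ssd_mobilenet_v1_coco.config \
  --train_dir=train_output

# Export the trained checkpoint as a frozen graph for use on mobile.
python object_detection/export_inference_graph.py \
  --input_type image_tensor \
  --pipeline_config_path=ssd_mobilenet_v1_coco.config \
  --trained_checkpoint_prefix=train_output/model.ckpt-15000 \
  --output_directory=exported_model
```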
Now it's time to use our model to detect a hand in the mobile device's camera preview. To implement this quickly I've used the demo project from the TensorFlow repo. I've cloned it and imported the project from the examples/android directory. The project can be built using different build systems: bazel, cmake, makefile, or none. I've built the project using cmake; to do this I've changed the nativeBuildSystem variable in build.gradle to cmake (I had problems with the other build systems).
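The change in examples/android/build.gradle is a one-liner on the variable mentioned above:

```groovy
// Switch the demo's native build from the default to cmake.
def nativeBuildSystem = 'cmake'
```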
In order to use the frozen model and labels, we need to put them in the assets directory, then assign the asset names to the TF_OD_API_MODEL_FILE and TF_OD_API_LABELS_FILE variables in the DetectorActivity class. Additionally, I've changed the camera preview to use the front camera and implemented a toast message that pops up when a hand is detected.
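In DetectorActivity the change amounts to repointing two constants at the new assets (the file names below are examples; use whatever names you gave your exported graph and labels file):

```java
// In DetectorActivity: point the detector at the custom hand model in assets/.
private static final String TF_OD_API_MODEL_FILE =
    "file:///android_asset/frozen_hand_graph.pb";
private static final String TF_OD_API_LABELS_FILE =
    "file:///android_asset/hands_labels.txt";
```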
Results of my experiment:
Thanks to the Object Detection API I was able to implement a hand detector app easily. The most confusing part was installing all the dependencies. In this article I've listed all the steps required to train a custom model and use it in an Android application. I hope it will help you create some cool apps using machine learning.