Hand Detection using ObjectDetection API on Android

Przemek Dąbrowski

Updated Jan 4, 2023 • 8 min read

During the last Netguru team dinner a friend showed me a neat feature of his fancy Samsung device - Gesture Control to Take a Selfie (the phone detects your hand and takes the photo).

For some time now I've been interested in machine learning, and I thought of implementing this feature myself. To solve the problem I've used the Object Detection API with an SSD MultiBox model and a MobileNet feature extractor, pretrained on the COCO (Common Objects in Context) dataset. Follow these steps to create a simple hand detection app and see the results of my experiment:

    1. The first step is to install all the necessary dependencies and clone the Object Detection API repository from https://github.com/tensorflow/models.
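
      A rough sketch of the installation steps I followed, based on the Object Detection API installation guide at the time (package versions and exact paths may differ on your machine):

      # Install TensorFlow and the libraries the API depends on
      pip install tensorflow pillow lxml matplotlib
      # Clone the repository and compile the protobuf definitions
      git clone https://github.com/tensorflow/models.git
      cd models/research
      protoc object_detection/protos/*.proto --python_out=.
      # Make the object_detection package visible to Python
      export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
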
    2. Before we can start training the model we need some input data for training and evaluation, in the format accepted by the Object Detection API - TFRecord. Additionally, we should specify a label map, which maps class ids to class names; the labels should be identical for the training and evaluation datasets. To accomplish this step I've used this script, which fetches a dataset of human hand pictures from http://www.robots.ox.ac.uk/~vgg/data/hands/index.html and creates the necessary files. The output of this script: hands_train.record, hands_val.record, hands_test.record and hands_label_map.pbtxt.
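
      For reference, a single-class label map like hands_label_map.pbtxt looks roughly like this (assuming the script names the class simply "hand"):

      item {
        id: 1
        name: 'hand'
      }
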
    3. When the input files are ready we can start configuring our model. The Object Detection API uses protobuf files to configure the training and evaluation jobs; more info about the configuration pipeline can be found here. The Object Detection API provides several sample configurations. They are a good starting point - with minimal effort you get a working configuration. As I wrote at the beginning of this post, I've used ssd_mobilenet_v1_coco.config and changed the following parameters (the relevant config fragments are sketched after this list):
      1. num_classes to 1, because I wanted to detect only one type of object - a hand.
      2. num_steps to 15000 because running locally can take forever :D
      3. fine_tune_checkpoint to location of earlier downloaded frozen model ssd_mobilenet_v1_coco_2017_11_17/model.ckpt.
      4. input_path and label_map_path of train_input_reader and eval_input_reader to point to the previously generated hands_train.record, hands_test.record and hands_label_map.pbtxt files.
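
      For illustration, the relevant fragments of my modified config looked roughly like this (abbreviated with "..."; the record and label map paths are assumptions that depend on where you generated those files):

      model {
        ssd {
          num_classes: 1
          ...
        }
      }
      train_config: {
        fine_tune_checkpoint: "ssd_mobilenet_v1_coco_2017_11_17/model.ckpt"
        num_steps: 15000
        ...
      }
      train_input_reader: {
        tf_record_input_reader {
          input_path: "hands_train.record"
        }
        label_map_path: "hands_label_map.pbtxt"
      }
      eval_input_reader: {
        tf_record_input_reader {
          input_path: "hands_test.record"
        }
        label_map_path: "hands_label_map.pbtxt"
      }
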
    4. I’ve trained the model on my local machine; to do this I’ve used the training script from the library:

      python object_detection/train.py \
      --logtostderr \
      --pipeline_config_path=object_detection/ssd_mobilenet_v1_hands.config \
      --train_dir=object_detection/training/

    5. The library also provides a script to evaluate the model during and after training:

      python object_detection/eval.py \
      --logtostderr \
      --pipeline_config_path=object_detection/ssd_mobilenet_v1_hands.config \
      --checkpoint_dir=object_detection/training/ \
      --eval_dir=object_detection/training/

    6. After the work is done we can freeze our trained model using the following script:

      python object_detection/export_inference_graph.py \
      --input_type image_tensor \
      --pipeline_config_path object_detection/ssd_mobilenet_v1_hands.config \
      --trained_checkpoint_prefix object_detection/training/model.ckpt-15000 \
      --output_directory object_detection/frozen/

    7. Now it’s time to use our model to detect a hand in the mobile device’s camera preview. To implement this quickly I’ve used a demo project from the TensorFlow repo: I’ve cloned it and imported the project from the examples/android directory. The project can be built with different build systems: bazel, cmake, makefile, or none. I’ve built it with cmake; to do this I changed the nativeBuildSystem variable in build.gradle to cmake (I had problems with the other build systems).
    8. In order to use the frozen model and labels, we need to put them in the assets directory, then assign the asset names to the TF_OD_API_MODEL_FILE and TF_OD_API_LABELS_FILE variables in the DetectorActivity class, as shown below. Additionally, I’ve changed the camera preview to use the front camera and implemented a toast message that pops up when a hand is detected.
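
      For reference, the changes in DetectorActivity look roughly like the sketch below. The asset file names are assumptions (they must match whatever you copy into assets/ - the export step produces frozen_inference_graph.pb, and the demo reads labels from a plain-text list), and the toast helper is my own addition rather than part of the demo:

      // In DetectorActivity.java from the examples/android demo.
      // Asset names below are assumptions - use the names of the files you added.
      private static final String TF_OD_API_MODEL_FILE =
          "file:///android_asset/frozen_inference_graph.pb";
      private static final String TF_OD_API_LABELS_FILE =
          "file:///android_asset/hands_labels_list.txt";

      // My own helper (needs import android.widget.Toast): shows a toast once a
      // detection crosses the confidence threshold used for drawing boxes.
      private void showHandDetectedToast(final float confidence) {
        runOnUiThread(new Runnable() {
          @Override
          public void run() {
            Toast.makeText(DetectorActivity.this,
                "Hand detected (" + (int) (confidence * 100) + "%)",
                Toast.LENGTH_SHORT).show();
          }
        });
      }
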
    9. Results of my experiment:
      [demo recording: the app detecting a hand in the camera preview]

Thanks to the ObjectDetection API I was able to easily implement a hand detector app. The most confusing part was installing all the dependencies. In this article I’ve listed all the required steps to train a custom model and use it in an Android application. I hope this article will help you create some cool apps using machine learning.
