Steves Computer Vision Blog: Classifying everything using your RPi Camera: Deep Learning with the Pi

(P.S. Those of you wanting to learn more about Deep Learning, check out this book:

https://www.kickstarter.com/projects/adrianrosebrock/deep-learning-for-computer-vision-with-python-eboo)

For those who don't want to read, the code can be found on my github with a readme:
https://github.com/StevenHickson/RPi_CaffeQuery
You can also read about it on my Hackaday io page here.

What is object classification?

Object classification has been a very popular topic the past couple years. Given an image, we want a computer to be able to tell us what that image is showing. The newest trend has been using convolutional neural networks in order to classify networks trained with a large amount of data.

One of the bigger frameworks for this is the Caffe framework. For more on this see the Caffe home page.
You can test out there web demo here. It isn't great at people but it is very good at cats, dogs, objects, and activities.

Why is this useful?

There are all kinds of autonomous tasks you can do with the RPi camera. Perhaps you want to know if your dog is in your living room, so the Pi can take his/her picture or tell him/her they are a good dog. Perhaps you want your RPi to recognize whether there is fruit in your fruit drawer so it can order you more when it is empty. The possibilities are endless.

How do convolutional neural networks work (a VERY simple overview)?

Convolutional neural networks are based loosely off how the human brain works. They are built of layers of many neurons that are "activated" by certain inputs. The input layer is connected in a network through a series of interconnected neurons in hidden layers like so:

[1]

Each neuron sends its signal to any other neuron it is connected to which is then multiplied by the connection weight and run through a sigmoid function. The training of the network is done by changing the weights in order to minimize the error function based on a set of inputs with a known set of outputs using back propagation.

How do we get this on the Pi?

Well I went ahead and compiled Caffe on the RPi. Unfortunately since it doesn't have code to optimize the network with it's GPU, the classification takes ~20-25s per image, which is far too much.

Note: I did find a different optimized CNN network for the RPi by Pete Warden here. It looks great but it still takes about 3 seconds per image, which still doesn't seem fast enough.

You will also need the Raspberry Pi camera which you can get from here:
Raspberry PI 5MP Camera Board Module

A better option: Using the web demo with python

So we can take advantage of the Caffe web demo and use that to reduce the processing time even further. With this method, the image classification takes ~1.5s, which is usable for a system.

How does the code work?

We make a symbolic link from /dev/shm/images/ to our /var/www for apache and forward our router port 5050 to the Pi port 80.

Then we use raspistill to take an image and save it to memory as /dev/shm/images/test.jpg. Since this is symlinked in /var/www, we should be able to see it at http://YOUR-EXTERNAL-IP:5005/images/test.jpg.

Then we use grab to qull up the Caffe demo framework with our image and get the classification results. This is done in queryCNN.py which gets the results.

What does the output look like?

Given a picture of some of my Pi components, I get this, which is pretty accurate:

Where can I get the code?

https://github.com/StevenHickson/RPi_CaffeQuery

[1] http://white.stanford.edu/teach/index.php/An_Introduction_to_Convolutional_Neural_Networks

Consider donating to further my tinkering since I do all this and help people out for free.

Places you can find me

Steves Computer Vision Blog

Tuesday, March 31, 2015

Classifying everything using your RPi Camera: Deep Learning with the Pi