Real-time image recognition on a single-core Cortex-A9 @ 1 GHz

Pedestrian detection 800% faster, running at 12 fps on a Cortex-A15

We just made FAST9 faster!

FAST9 is a keypoint detection algorithm proposed by Edward Rosten and Tom Drummond (Dept. of Engineering, University of Cambridge, UK). The primary motivation behind the scheme was to make keypoint detection computationally feasible on an embedded processor.

FAST9 checks whether a sequence of neighbouring points (in this case 9, on a circle of 16 pixels around the candidate) are all brighter or all darker than the central pixel by a particular threshold. This involves a lot of if-else conditions, which are computationally very expensive on an embedded processor. Rosten and Drummond observed that one doesn't really have to check all the conditions to verify whether a point is a corner: a few quick checks often allow an early exit when the point under consideration is clearly not a corner. To arrive at the most effective sequence of quick checks, they proposed using machine learning.
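The corner criterion itself can be sketched in plain C++. The 16 offsets below are the standard radius-3 Bresenham circle used by FAST; this is the exhaustive check (no machine-learned early exit and no NEON), shown only to illustrate the test:

```cpp
#include <cstdint>

// Bresenham circle of radius 3 around the candidate pixel: 16 offsets.
static const int CX[16] = { 0, 1, 2, 3, 3, 3, 2, 1, 0,-1,-2,-3,-3,-3,-2,-1};
static const int CY[16] = {-3,-3,-2,-1, 0, 1, 2, 3, 3, 3, 2, 1, 0,-1,-2,-3};

// FAST9 criterion: true if some arc of 9 contiguous circle pixels is
// entirely brighter than centre+t, or entirely darker than centre-t.
bool is_fast9_corner(const uint8_t* img, int stride, int x, int y, int t)
{
    int c = img[y * stride + x];
    unsigned brighter = 0, darker = 0;      // 16-bit membership masks
    for (int i = 0; i < 16; ++i) {
        int p = img[(y + CY[i]) * stride + (x + CX[i])];
        if (p > c + t) brighter |= 1u << i;
        if (p < c - t) darker   |= 1u << i;
    }
    // Look for a run of 9 consecutive set bits, with wraparound.
    for (int s = 0; s < 16; ++s) {
        unsigned run = 0;
        for (int i = 0; i < 9; ++i)
            run |= 1u << ((s + i) & 15);
        if ((brighter & run) == run || (darker & run) == run)
            return true;
    }
    return false;
}
```

Note how a straight edge fails the test (at most 7 or 8 of the circle pixels differ on one side), which is exactly why FAST responds to corners rather than edges.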

Uncanny Vision set out to make this keypoint detection scheme even faster by utilizing the ARM NEON instruction set. We used a different technique to optimize FAST9: the performance of our implementation does not depend on the image or the threshold, which is not the case with the authors' original implementation.

The original implementation has two steps. The first is to detect potential corner points based on a threshold. The second is to rate the detected points with a score, which can then be used to select the best among the detected corners. The original paper proposes a few possible scoring functions; OpenCV implements one particular option from among these. Uncanny Vision chose a different scoring function (also from among the options suggested in the paper) that suits our implementation.
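For illustration, here is a sketch of one of the scoring options discussed in the paper: the sum of absolute differences between the contiguous-arc pixels and the centre, less the threshold. To be clear, this is our reading of one option from the paper, not necessarily the exact variant either OpenCV or UncannyCV ships; the circle offsets are the same as in the FAST9 test:

```cpp
#include <cstdint>

// Radius-3 Bresenham circle offsets, as in FAST9.
static const int CX[16] = { 0, 1, 2, 3, 3, 3, 2, 1, 0,-1,-2,-3,-3,-3,-2,-1};
static const int CY[16] = {-3,-3,-2,-1, 0, 1, 2, 3, 3, 3, 2, 1, 0,-1,-2,-3};

// Sum-of-absolute-differences corner score: over every arc of 9
// contiguous circle pixels that is entirely brighter (or darker) than
// the centre by threshold t, sum |p - c| - t and return the maximum.
// Returns 0 if the point is not a FAST9 corner at threshold t.
int fast_sad_score(const uint8_t* img, int stride, int x, int y, int t)
{
    int c = img[y * stride + x];
    int d[16];
    for (int i = 0; i < 16; ++i)
        d[i] = img[(y + CY[i]) * stride + (x + CX[i])] - c;

    int best = 0;
    for (int s = 0; s < 16; ++s) {          // each arc start, wrapping
        int sumB = 0, sumD = 0;
        bool allB = true, allD = true;
        for (int i = 0; i < 9; ++i) {
            int v = d[(s + i) & 15];
            if (v >  t) sumB += v - t;  else allB = false;
            if (v < -t) sumD += -v - t; else allD = false;
        }
        if (allB && sumB > best) best = sumB;
        if (allD && sumD > best) best = sumD;
    }
    return best;
}
```

A score like this lets the caller keep only the strongest N corners, which is what the second step of the pipeline is for.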

Profile details for FAST9, measured on a Cortex-A8 (BeagleBoard running Linux, clocked at 720 MHz) with a 640×480 image, are as follows:


Threshold = 20, Total corners = 8702

OpenCV
FAST9: 21.5 ms
FAST score (OpenCV version): 11.7 ms
Total time = 21.5 + 11.7 = 33.2 ms

UncannyCV
FAST9 + FAST score (Uncanny version): 18.4 ms

Threshold = 10, Total corners = 23388

OpenCV
FAST9: 24.6 ms
FAST score (OpenCV version): 24.9 ms
Total time = 24.6 + 24.9 = 49.5 ms

UncannyCV
FAST9 + FAST score (Uncanny version): 17.9 ms

ORB for real-time tracking on an ARM processor

Feature detection and matching is at the heart of several computer vision algorithms; in particular it's used for object recognition, structure from motion, image stitching, etc. SIFT and SURF are the two popular algorithms in this space. However, both are patented, which makes them unattractive to many users. More importantly, both algorithms are computationally very intensive and too slow for use in real-time applications.

ORB was proposed as an alternative to these two algorithms. The original paper (Ethan Rublee, Vincent Rabaud, Kurt Konolige, Gary R. Bradski: ORB: An efficient alternative to SIFT or SURF. ICCV 2011: 2564-2571) shows that it's an order of magnitude faster than SURF and two orders of magnitude faster than SIFT. ORB is rotation invariant, but only partially scale invariant. We decided to test ORB for feature detection and tracking. The objective was to track objects in a moving video on a Cortex-A9 ARM processor, and to see whether ORB could serve as an alternative to, say, Lucas-Kanade optical flow.

The ORB feature descriptor and matching, despite being significantly faster than SIFT and SURF, are not fast enough to meet real-time requirements for tracking on a Cortex-A9. Naturally, we were excited at the prospect of making ORB run in real time on our OMAP4 PandaBoard.
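The matching step is a brute-force nearest-neighbour search under Hamming distance over 256-bit binary descriptors, which is what makes ORB matching so much cheaper than SIFT/SURF's floating-point distances. A minimal sketch in plain C++ using the compiler's popcount builtin (not the NEON version):

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// One ORB descriptor: 256 bits packed into four 64-bit words.
struct OrbDesc { uint64_t w[4]; };

// Hamming distance between two descriptors: XOR then count set bits.
static inline int hamming(const OrbDesc& a, const OrbDesc& b)
{
    int d = 0;
    for (int i = 0; i < 4; ++i)
        d += __builtin_popcountll(a.w[i] ^ b.w[i]);
    return d;
}

// Brute-force matcher: for each query descriptor, return the index of
// the nearest train descriptor (-1 if the train set is empty).
std::vector<int> match(const std::vector<OrbDesc>& query,
                       const std::vector<OrbDesc>& train)
{
    std::vector<int> out(query.size(), -1);
    for (size_t q = 0; q < query.size(); ++q) {
        int best = 257;                     // > maximum possible distance
        for (size_t t = 0; t < train.size(); ++t) {
            int d = hamming(query[q], train[t]);
            if (d < best) { best = d; out[q] = (int)t; }
        }
    }
    return out;
}
```

Each distance is just four XORs and four popcounts, so the inner loop vectorizes well, which is why NEON-optimized versions of this search can be so much faster than a scalar one.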

Though the OMAP4 has a dual-core 1 GHz A9 processor, for our experiments we ran the application in a single thread to ensure only a single core was used (using both cores would make the results harder to interpret).

We used only a single scale for ORB, since a single scale is sufficient for frame-to-frame tracking in video. The keypoint detector is FAST9, as in the original paper. We had optimized FAST9 and written about it in a previous blog post.

Below are the results:

On a single-core Cortex-A9 (1 GHz), single scale (since it's for a tracking application), 500 keypoints.

UncannyCV

FAST9 keypoint detection and ORB descriptor creation – 33ms

ORB descriptor matching – 5.5ms

Total time = 33 + 5.5 = 38.5ms

OpenCV

FAST9 keypoint detection and ORB descriptor creation – 116ms

ORB descriptor matching (already NEON-optimized by OpenCV) – 71ms

Total time = 116 + 71 = 187ms

For single scale and 1500 keypoints.

UncannyCV

FAST9 keypoint detection and ORB descriptor creation – 48.3ms

ORB descriptor matching – 58.5ms

Total time = 48.3 + 58.5 = 106.8ms

OpenCV

FAST9 keypoint detection and ORB descriptor creation – 148.3ms

ORB descriptor matching (already NEON-optimized by OpenCV) – 493.7ms

Total time = 148.3 + 493.7 = 642ms

The images we used for our testing are two consecutive frames from the Daimler pedestrian detection dataset.

Setting up OpenCV 2.4.2

For the last few days I had been trying to upgrade Angstrom from v2010.07 to v2012.05 on the BeagleBoard. The old Angstrom booted with an older u-boot.bin and TI X-loader 1.9.2; the boot sequence has since changed a lot, and X-loader is no longer used at all. My aim was to install the latest Angstrom v2012.05, which has a graphical interface along with the console, get OpenCV 2.4.2 running, and make it possible to compile an OpenCV project on the embedded processor. There were some difficulties in getting it all done, so I thought I would bring it all together in a single place. The host system is Ubuntu.

http://downloads.angstrom-distribution.org/demo/beagleboard/ lists all the root-filesystem images. Six versions are available at present, but sadly only 3 of them support a GUI; of those, 2 are outdated and the latest has trouble booting.

Angstrom-systemd-GNOME-image-eglibc-ipk-v2012.05-beagleboard.rootfs.tar.bz2 seems to be the right file for us, but many users have reported trouble with it; most notably, the GUI either works with errors or does not work at all!

The same file is also present in the 'untested' folder, and that copy satisfies all our criteria.

In short, the following steps will set up the BeagleBoard:

  1. Download http://downloads.angstrom-distribution.org/demo/beagleboard/untested/Angstrom-systemd-GNOME-image-eglibc-ipk-v2012.05-beagleboard.rootfs.tar.bz2

  2. Download http://downloads.angstrom-distribution.org/demo/beagleboard/MLO

  3. Download http://downloads.angstrom-distribution.org/demo/beagleboard/u-boot.img

  4. Download http://downloads.angstrom-distribution.org/demo/beagleboard/mkcard.txt

  5. Insert the card in the host machine.

  6. Type 'sudo sh mkcard.txt /dev/sdX' in a terminal, where 'X' is the letter of the SD card's device.

  7. On completion, mount both the partitions, by entering into those folders.

  8. Copy MLO to 1st partition followed by copying u-boot.img

  9. sudo tar -xjv -C /media/rootfs -f /path/to/Angstrom-systemd-GNOME-image-eglibc-ipk-v2012.05-beagleboard.rootfs.tar.bz2
  10. Copy the uImage file from /boot on the 2nd partition to the first partition.

  11. Unmount both the partitions.

  12. Insert the card in the BeagleBoard and reboot while pressing the 'User' button on the board.

  13. On booting press ‘Enter’

  14. In case we want to have the boot files in NAND memory, do the following; otherwise skip this step:

    1. mmc rescan 0

    2. fatload mmc 0 82000000 MLO

    3. nandecc hw

    4. nand erase 0 80000

    5. nand write 82000000 0 2000

    6. nand write 82000000 20000 20000

    7. nand write 82000000 40000 20000

    8. nand write 82000000 60000 20000

    9. fatload mmc 0 0x82000000 u-boot.img

    10. nandecc hw

    11. nand erase 80000 170000

    12. nand write 0x82000000 80000 170000

    13. nand erase 260000 20000

  15. The following steps set up the environment variables:

  1. setenv bootargs 'console=ttyO2,115200n8, root=/dev/mmcblk0p2 init=/init rw rootwait video=omapfb:vram:12M'
    setenv bootcmd 'mmc rescan;fatload mmc 0 82000000 uImage;bootm 82000000'
    saveenv
  2. boot

You should now have Angstrom successfully loaded, displaying the login prompt in the GUI.

Once connected to the network, several packages need to be installed to get OpenCV running.

Open a terminal and type:

  1. opkg update
  2. opkg install g++ gcc
  3. (If typing 'g++' results in 'command not found', rename or make a copy of /usr/bin/arm-angstrom-linux-gnueabi-g++ and name it g++.)
  4. (Similarly, if typing 'gcc' results in 'command not found', rename or make a copy of /usr/bin/arm-angstrom-linux-gnueabi-gcc and name it gcc.)
  5. opkg install opencv libopencv-calib3d2.4 libopencv-contrib2.4 libopencv-core2.4 libopencv-features2d2.4 libopencv-flann2.4 libopencv-gpu2.4 libopencv-highgui2.4 libopencv-imgproc2.4 libopencv-legacy2.4 libopencv-ml2.4 libopencv-nonfree2.4 libopencv-objdetect2.4 libopencv-photo2.4 libopencv-stitching2.4 libopencv-ts2.4 libopencv-video2.4 libopencv-videostab2.4
  6. (Try compiling an OpenCV file. If the headers are not detected, edit the /usr/pkgconfig/opencv.pc file, renaming all the *.so files to *.so.2.4.2.)
  7. opkg install task-native-sdk make

The above steps set up everything required to compile OpenCV code on the BeagleBoard.

At this point it is advisable to create a backup of the SD card as a '.gz' image. If the card gets corrupted later, a working version can be restored quickly from the backup; the '.gz' image also preserves the card's partitioning.

 

5 new algorithms

Five additional algorithms were released in Jan 2013.

UncannyCV

ULTRA FAST COMPUTER VISION LIBRARY ON ARM
 
A software library of image-processing and computer-vision kernels optimized for the ARM architecture, primarily using the NEON instruction set. The library is aimed at computer-vision algorithm developers who want to get onto an embedded platform quickly.


Sales office in Japan

We have a sales office in Japan now:

BTG CONSULTING,

4F 4-8-6 Roppongi,

Minato-ku,

Tokyo 106-0032, Japan

(TEL) 03-6439-1188