Hi guys, today I am going to introduce some recent object detection techniques that use CNNs. This post is based on Stanford University's CS231n course.
1. What is Object Detection?
Object detection means figuring out whether the input image contains any objects from a fixed set of categories we are interested in, and localizing those objects.
Object detection differs from the classification + localization problem in two ways:
- there may be a varying number of outputs for each input image
- you don’t know ahead of time how many objects you expect to find in each image
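To make the "varying number of outputs" point concrete, here is a minimal sketch. The `Detection` class, the box convention, the score threshold, and the candidate values are all hypothetical and made up for illustration; they are not taken from CS231n or from any real model.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    label: str    # one of the fixed categories we care about
    score: float  # confidence in [0, 1]
    box: Box


def keep_confident(candidates: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Keep only the candidates whose score reaches the threshold (an assumed cutoff)."""
    return [d for d in candidates if d.score >= threshold]


# Made-up candidates for two images, not real model output.
image_a = [Detection("cat", 0.9, (10, 20, 110, 220))]
image_b = [
    Detection("dog", 0.8, (5, 5, 60, 80)),
    Detection("dog", 0.7, (70, 10, 140, 90)),
    Detection("cat", 0.2, (0, 0, 30, 30)),
]

# The output length differs per image, so it cannot be a fixed-size vector.
print(len(keep_confident(image_a)), len(keep_confident(image_b)))
```

Classification + localization would always return exactly one label and one box, whereas here the result is a list whose length depends on the image.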
2. Evolution of Object Detection Techniques Using CNN
- R-CNN first runs a region proposal method to get about 2,000 Regions of Interest (ROIs). This step is not a learned network. Rather, it uses traditional computer vision techniques like Selective Search, which looks for blob-like regions.
- Warp each ROI to a fixed size, because the rest of the R-CNN pipeline requires a fixed-size, square input.
- Run each ROI through a convolutional neural network, classify it with SVMs, and predict a bounding box regression. By “bounding box regression” I mean predicting a correction (offset) to the bounding box proposed at the region proposal stage. Note that the final predicted bounding box can sometimes extend outside the region of interest.
* There is also a “background” class, so when there is no object in the ROI, R-CNN predicts background to say that there was no object there.
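The offset correction in the last step can be sketched as follows. This is a minimal illustration, assuming the parameterization described in the R-CNN paper (center shifts scaled by the proposal size, width and height scaled in log space). The function name `apply_offsets`, the box convention, and the example numbers are hypothetical.

```python
import math
from typing import Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def apply_offsets(proposal: Box, offsets: Tuple[float, float, float, float]) -> Box:
    """Refine a proposal with predicted offsets (tx, ty, tw, th).

    Assumed parameterization: tx, ty shift the box center by a fraction of the
    proposal's width/height; tw, th rescale width/height via exp().
    """
    x0, y0, x1, y1 = proposal
    w, h = x1 - x0, y1 - y0
    cx, cy = x0 + 0.5 * w, y0 + 0.5 * h

    tx, ty, tw, th = offsets
    new_cx = cx + tx * w
    new_cy = cy + ty * h
    new_w = w * math.exp(tw)
    new_h = h * math.exp(th)

    # Convert back from center/size form to corner form.
    return (
        new_cx - 0.5 * new_w,
        new_cy - 0.5 * new_h,
        new_cx + 0.5 * new_w,
        new_cy + 0.5 * new_h,
    )


# Made-up proposal and offsets: shift right a little and double the width.
proposal = (10.0, 20.0, 110.0, 220.0)
refined = apply_offsets(proposal, (0.1, 0.0, math.log(2.0), 0.0))
print(refined)
```

With these made-up numbers the refined box comes out at roughly (-30, 20, 170, 220), which is wider than the proposal on both sides. This is how the final box can end up outside the original region of interest.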
- Fast R-CNN: rather than processing each region of interest separately, we run the entire image through some convolutional layers all at once, which gives a high-resolution convolutional feature map for the entire image.
- We still use region proposals from a fixed, non-learned method like Selective Search,