{"id":9669,"date":"2020-09-15T06:36:39","date_gmt":"2020-09-15T06:36:39","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/?p=9669"},"modified":"2023-11-07T09:17:27","modified_gmt":"2023-11-07T09:17:27","slug":"the-nuts-and-bolts-of-deep-learning-algorithms-for-object-detection","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/the-nuts-and-bolts-of-deep-learning-algorithms-for-object-detection\/","title":{"rendered":"The Nuts and Bolts of Deep Learning Algorithms for Object Detection"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"9669\" class=\"elementor elementor-9669\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-2906e0bd elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"2906e0bd\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-2465809\" data-id=\"2465809\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-19978ddb elementor-widget elementor-widget-text-editor\" data-id=\"19978ddb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>You just got a new drone and you want it to be super smart! 
Maybe it should detect whether workers are properly wearing their helmets or how big the cracks on a factory rooftop are.<\/p>\n\n\n\n<p>In this blog post, we\u2019ll look at the basic methods of object detection (Exhaustive Search, R-CNN, Fast R-CNN and Faster R-CNN) and try to understand the technical details of each model. The best part? We\u2019ll do all of this without any formula, allowing readers with all levels of experience to follow along!<\/p>\n\n\n\n<p>Finally, we will follow this post with a second one, where we will take a deeper dive into Single Shot Detector (SSD) networks and see how this can be deployed\u2026 on a drone.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2266b9a elementor-widget elementor-widget-heading\" data-id=\"2266b9a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Our First Steps Into Object Detection<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e3caeab elementor-widget elementor-widget-heading\" data-id=\"e3caeab\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Is It a Bird? 
Is It a Plane?<strong>\u2014\u00a0<\/strong>Image Classification<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1298597 elementor-widget elementor-widget-image\" data-id=\"1298597\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1400\/0*o8QFPl64GIUk1ZCE\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f94edc3 elementor-widget elementor-widget-text-editor\" data-id=\"f94edc3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>Object detection (or recognition) builds on image classification. Image classification is the task of \u2014 you guessed it\u2014classifying an image (via a grid of pixels like shown above) into a class category. 
For a refresher on image classification, we refer the reader to\u00a0<a href=\"https:\/\/towardsdatascience.com\/wtf-is-image-classification-8e78a8235acb\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">this post<\/a>.<\/p>\n\n\n\n<p>Object recognition is the process of identifying and classifying objects inside an image, which looks something like this:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b587e94 elementor-widget elementor-widget-image\" data-id=\"b587e94\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1097\/0*SNvltLm8eGH5F5AQ\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-28258ce elementor-widget elementor-widget-text-editor\" data-id=\"28258ce\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>For the model to learn both the class and the position of the object in the image, the target has to be a five-dimensional label (class, x, y, width, height).<\/p>\n\n\n<hr class=\"wp-block-separator\" \/>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5624b86 elementor-widget elementor-widget-heading\" data-id=\"5624b86\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">The Inner Workings of Object Detection Methods<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-09767f1 elementor-widget elementor-widget-heading\" 
data-id=\"09767f1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">A Computationally Expensive Method: Exhaustive Search<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-1bb48d7 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"1bb48d7\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-a64f830\" data-id=\"a64f830\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-44a9e58 elementor-widget elementor-widget-text-editor\" data-id=\"44a9e58\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>The simplest object detection method is using an image classifier on various subparts of the image. Which ones, you might ask? Let\u2019s consider each of them:<\/p>\n\n\n\n<p>1. 
First, take the image on which you want to perform object detection.<\/p>\n\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4c9bdb5 elementor-widget elementor-widget-image\" data-id=\"4c9bdb5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/648\/1*PoaFBde3tyE9qoHCESp_Nw.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fe1b040 elementor-widget elementor-widget-text-editor\" data-id=\"fe1b040\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n\n<p>2. Then, divide this image into different sections, or \u201cregions\u201d, as shown below:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-174eeb0 elementor-widget elementor-widget-image\" data-id=\"174eeb0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/632\/1*Zu_UDQCwfMh4k9eoTDSGWg.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ef704b9 elementor-widget elementor-widget-text-editor\" data-id=\"ef704b9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>3. Consider each region as an individual image.<\/p>\n\n\n\n<p>4. Classify each image using a classic image classifier.<\/p>\n\n\n\n<p>5. 
Finally, combine all the images with the predicted label for each region where one object has been detected.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-4f93e5e elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"4f93e5e\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-06f7aa3\" data-id=\"06f7aa3\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-657b02d elementor-widget elementor-widget-image\" data-id=\"657b02d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/613\/1*lCNPjw1U9W9Vk0J5JDdq_Q.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cd8dc13 elementor-widget elementor-widget-text-editor\" data-id=\"cd8dc13\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>One problem with this method is that objects can have different aspect ratios and spatial locations, which can lead to unnecessarily expensive computations of a large number of regions. 
It presents too big of a bottleneck in terms of computation time to be used for real-life problems.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-adca8ad elementor-widget elementor-widget-heading\" data-id=\"adca8ad\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Region Proposal Methods and Selective Search<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5140586 elementor-widget elementor-widget-text-editor\" data-id=\"5140586\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>A more recent approach is to break down the problem into two tasks: detect the areas of interest first and then perform image classification to determine the category of each object.<\/p>\n\n\n\n<p>The first step usually consists in applying\u00a0<strong>region proposal methods<\/strong>. These methods output bounding boxes that are likely to contain objects of interest. If the object has been properly detected in one of the region proposals, then the classifier should detect it as well. That\u2019s why it\u2019s important for these methods to not only be fast, but also to have a very high recall.<\/p>\n\n\n\n<p>These methods also use a clever architecture where part of the image preprocessing is the same for the object detection and for the classification tasks, making them faster than simply chaining two algorithms. 
One of the most frequently used region proposal methods is\u00a0<a href=\"https:\/\/www.researchgate.net\/publication\/262270555_Selective_Search_for_Object_Recognition\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\"><strong>selective search<\/strong><\/a>:<\/p>\n\n\n\n<p>Its first step is to apply <a href=\"http:\/\/cs.brown.edu\/people\/pfelzens\/segment\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>image segmentation<\/strong><\/a>, as shown here:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ac5af0a elementor-widget elementor-widget-image\" data-id=\"ac5af0a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/700\/1*DDjorNRY9F82TLLRdwNh7w.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-22228ef elementor-widget elementor-widget-text-editor\" data-id=\"22228ef\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>From the image segmentation output, selective search will successively:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Create bounding boxes from the segmented parts and add them to the list of region proposals.<\/li>\n<li>Combine several small adjacent segments to larger ones based on four types of similarity: color, texture, size, and shape.<\/li>\n<li>Go back to step one until the section covers the entire image.<\/li>\n<\/ol>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f4831c8 elementor-widget elementor-widget-image\" data-id=\"f4831c8\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1273\/1*7BzlA5qJ-ENZYsVsXKNwlw.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-083152c elementor-widget elementor-widget-text-editor\" data-id=\"083152c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>Now that we understand how selective search works, let\u2019s introduce some of the most popular object detection algorithms that leverage it.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9c1fbcc elementor-widget elementor-widget-heading\" data-id=\"9c1fbcc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">A First Object Detection Algorithm: R-CNN<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bcd2f45 elementor-widget elementor-widget-text-editor\" data-id=\"bcd2f45\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p><a href=\"https:\/\/arxiv.org\/pdf\/1311.2524.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Ross Girshick et al<\/a>. proposed Region-CNN (R-CNN) which allows the combination of selective search and CNNs. Indeed, for each region proposal (2000 in the paper), one forward propagation generates an output vector through a CNN. This vector will be fed to a\u00a0<strong>one-vs-all classifier<\/strong>\u00a0(i.e. 
one classifier per class, for instance one classifier where labels = 1 if the image is a dog and 0 if not, a second one where labels = 1 if the image is a cat and 0 if not, etc.). <strong>SVM<\/strong>\u00a0is the classification algorithm used by R-CNN.<\/p>\n\n\n\n<p>But how do you label the region proposals? Of course, if a proposal perfectly matches our ground truth we can label it as 1, and if a given object is not present at all, we can then label it 0 for this object. What if a part of an object is present in the image? Should we label the region as 0 or 1? To make sure we are training our classifier on regions that we can realistically have when predicting an image (and not only perfectly matching regions), we are going to look at the\u00a0<strong>intersection over union<\/strong>\u00a0(IoU) of the boxes predicted by the selective search and the ground truth:<\/p>\n\n\n\n<p>The IoU is a metric computed as the area of overlap between the predicted and the ground truth boxes divided by the area of their union. 
It rewards successful pixel detection and penalizes false positives in order to prevent algorithms from selecting the whole image.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fa2dbfb elementor-widget elementor-widget-image\" data-id=\"fa2dbfb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/600\/0*siV0gKnY7XopdCue\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0961882 elementor-widget elementor-widget-text-editor\" data-id=\"0961882\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>Going back to our R-CNN method, if the IoU is lower than a given threshold (0.3), then the associated label would be 0.<\/p>\n\n\n\n<p>After running the classifier on all region proposals, R-CNN proposes to refine the bounding box (bbox) using a class-specific\u00a0<strong>bbox regressor<\/strong>. The bbox regressor can fine-tune the position of the bounding box boundaries. For example, if the selective search has detected a dog but only selected half of it, the bbox regressor, which is aware that dogs have four legs, will ensure that the whole body is selected.<\/p>\n\n\n\n<p>Also thanks to the new bbox regressor prediction, we can discard overlapping proposals using\u00a0<a href=\"https:\/\/towardsdatascience.com\/non-maximum-suppression-nms-93ce178e177c\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>non-maximum suppression<\/strong><\/a>\u00a0(NMS). Here, the idea is to identify and delete overlapping boxes of the same object. 
NMS sorts the proposals by classification score for each class and computes the IoU of the highest-scoring predicted box with all the other predicted boxes (of the same class). It then discards any of those proposals whose IoU is higher than a given threshold (e.g., 0.5). This step is then repeated for the next highest-scoring remaining box.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bf04fcc elementor-widget elementor-widget-image\" data-id=\"bf04fcc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1558\/0*5QRD-lMelOskSzB9.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-748cc8f elementor-widget elementor-widget-text-editor\" data-id=\"748cc8f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>To sum up, R-CNN follows these steps:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Create region proposals from selective search (i.e., predict the parts of the image that are likely to contain an object).<\/li>\n<li>Run these regions through a pre-trained model and then an SVM to classify the sub-image.<\/li>\n<li>Run the positive predictions through a bounding box regressor, which improves box accuracy.<\/li>\n<li>Apply NMS at prediction time to get rid of overlapping proposals.<\/li>\n<\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-924062c elementor-widget elementor-widget-image\" data-id=\"924062c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/2370\/1*T4NkJFHyrWSfXpkBDEfJDA.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7691f03 elementor-widget elementor-widget-text-editor\" data-id=\"7691f03\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>There are, however, some issues with R-CNN:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>This method still needs to classify all the region proposals which can lead to computational bottlenecks \u2014 it\u2019s not possible to use it for a real-time use case.<\/li>\n<li>No learning happens at the selective search stage, which can lead to bad region proposals for certain types of datasets.<\/li>\n<\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-84d134a elementor-widget elementor-widget-heading\" data-id=\"84d134a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">A Marginal Improvement: Fast R-CNN<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0484b54 elementor-widget elementor-widget-text-editor\" data-id=\"0484b54\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>Fast R-CNN \u2014 as its name indicates \u2014 is faster than R-CNN. 
It is based on R-CNN with two differences:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Instead of feeding the CNN for every region proposal,\u00a0<strong>you feed the CNN only once<\/strong>\u00a0by taking the whole image to generate a convolutional feature map (take a vector of pixels and transform it into another vector using a filter, which will give you a convolutional feature map; you can find more info\u00a0<a href=\"https:\/\/towardsdatascience.com\/a-beginners-guide-to-convolutional-neural-networks-cnns-14649dbddce8\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">here<\/a>). Next, the region proposals are identified with selective search and then reshaped to a fixed size by a Region of Interest pooling (<a href=\"https:\/\/towardsdatascience.com\/region-of-interest-pooling-f7c637f409af\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">RoI pooling<\/a>) layer so that they can be used as input to the fully connected layer.<\/li>\n<li>Fast R-CNN uses a\u00a0<strong>softmax layer instead of SVM<\/strong>\u00a0to classify region proposals, which is faster and more accurate.<\/li>\n<\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8cca5b6 elementor-widget elementor-widget-heading\" data-id=\"8cca5b6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Here is the architecture of the network:<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-670783d elementor-widget elementor-widget-image\" data-id=\"670783d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" 
src=\"https:\/\/miro.medium.com\/proxy\/0*iduNRkh-qO5_633O\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7f971c4 elementor-widget elementor-widget-text-editor\" data-id=\"7f971c4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>As we can see in the figure below, Fast R-CNN is way faster at training and testing than R-CNN. However, a bottleneck still remains due to the selective search method.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-390b0ad elementor-widget elementor-widget-image\" data-id=\"390b0ad\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1400\/0*spkvY9VVWBMoAmcV\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8f31c33 elementor-widget elementor-widget-heading\" data-id=\"8f31c33\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">How Fast Can R-CNN Get? \u2014 FASTER R-CNN<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cd59547 elementor-widget elementor-widget-text-editor\" data-id=\"cd59547\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>While Fast R-CNN was a lot faster than R-CNN, the bottleneck remains with selective search as it is very time consuming. 
Therefore,\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1506.01497.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Shaoqing Ren et al<\/a>. came up with Faster R-CNN to solve this and proposed to replace selective search with a very small convolutional network called\u00a0<strong>Region Proposal Network<\/strong>\u00a0(RPN) to find the regions of interest.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-28dae6d elementor-widget elementor-widget-image\" data-id=\"28dae6d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/960\/0*7PUKxNSC0-Skulh6\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2f59afa elementor-widget elementor-widget-text-editor\" data-id=\"2f59afa\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>In a nutshell, RPN is a small network that directly finds region proposals.<\/p>\n\n\n\n<p>One naive approach to this would be to create a <a href=\"https:\/\/www.experfy.com\/blog\/time-series-classification-with-deep-learning\/\" target=\"_blank\" rel=\"noreferrer noopener\">deep learning<\/a> model which outputs x_min, y_min, x_max, and y_max to get the bounding box for one region proposal (so 8,000 outputs if we want 2,000 regions). 
However, there are two fundamental problems:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The images can have very different sizes and ratios, so creating a model that correctly predicts raw coordinates can be tricky.<\/li>\n<li>There are some coordinate ordering constraints in our prediction (x_min &lt; x_max, y_min &lt; y_max).<\/li>\n<\/ul>\n\n\n\n<p>To overcome this, we are going to use\u00a0<strong>anchors:<\/strong><\/p>\n\n\n<p>Anchors are predefined boxes of different ratios and scales all over the image. For example, for a given central point, we usually start with three sizes (e.g., 64px, 128px, 256px) and three different width\/height ratios (1\/1, 1\/2, 2\/1). In this example, we would end up having nine different boxes for a given pixel of the image (the center of our boxes).<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1fdd61b elementor-widget elementor-widget-image\" data-id=\"1fdd61b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/702\/0*Z9VBfzYGG774Va4Q\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-39d1809 elementor-widget elementor-widget-text-editor\" data-id=\"39d1809\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n\n<p>So how many anchors would we have in total for one image?<\/p>\n\n\n\n<p>It is paramount to understand that we are not going to create anchors on the raw images, but on the output feature map of the last convolutional layer. For instance, it would be wrong to say that for a 1,000*600 input image we would have one anchor per pixel so 1,000*600*9 = 5,400,000 anchors. 
Indeed, since we are going to create them on the feature map, there is a subsampling ratio to take into account (the reduction factor between the input and output dimensions, due to strides in the convolutional layers).<\/p>\n\n\n\n<p>In our example, if we take this ratio to be 16 (like in VGG16) we would have nine anchors per spatial position of the feature map so \u201conly\u201d around 20,000 anchors (5,400,000 \/ 16\u00b2). This means that two consecutive pixels in the output features correspond to two points which are 16 pixels apart in the input image. Note that this subsampling ratio is a tunable parameter of Faster R-CNN.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0175008 elementor-widget elementor-widget-image\" data-id=\"0175008\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/800\/0*RfC8Lx-BHEGzbyDv\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-41c496d elementor-widget elementor-widget-text-editor\" data-id=\"41c496d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>The remaining question now is how to go from those 20,000 anchors to 2,000 region proposals (taking the same number of region proposals as before), which is the goal of our RPN.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-03b3d51 elementor-widget elementor-widget-heading\" data-id=\"03b3d51\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title 
elementor-size-default\">How to Train the Region Proposal Network<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a6d6346 elementor-widget elementor-widget-text-editor\" data-id=\"a6d6346\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>To achieve this, we want our RPN to tell us whether a box contains an object or is background, as well as the accurate coordinates of the object. The output predictions are the probability of being background, the probability of being foreground, and the deltas Dx, Dy, Dw, Dh (the differences between the anchor and the final proposal).<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list -->\n<ul>\n<li>First, we remove the cross-boundary anchors (i.e. the anchors which are cut off by the border of the image), which leaves us with around 6,000 anchors.<\/li>\n<li>We label an anchor positive if either of the following two conditions holds:<\/li>\n<\/ul>\n<!-- \/wp:list -->\n\n<!-- wp:paragraph -->\n<p>\u2192 The anchor has the highest IoU with a ground truth box among all anchors.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3cc00f3 elementor-widget elementor-widget-text-editor\" data-id=\"3cc00f3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>\u2192 The anchor has an IoU of at least 0.7 with a ground truth box.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list -->\n<ul>\n<li>We label an anchor negative if its IoU is less than 0.3 with all ground truth boxes.<\/li>\n<li>We disregard all the remaining anchors.<\/li>\n<li>We train the binary classifier and the bounding box regressor on the labeled anchors.<\/li>\n<\/ul>\n<!-- \/wp:list 
-->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-38fa9d4 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"38fa9d4\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-35b2935\" data-id=\"35b2935\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-24c3e9d elementor-widget elementor-widget-heading\" data-id=\"24c3e9d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Finally, a few remarks about the implementation:<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-df2596f elementor-widget elementor-widget-text-editor\" data-id=\"df2596f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:list {\"ordered\":true} -->\n<ol>\n<li>We want the number of positive and negative anchors to be balanced in our mini-batch.<\/li>\n<li>We use a multi-task loss, which makes sense since we want to minimize both losses: the error of mistakenly predicting foreground or background, and the localization error of our box.<\/li>\n<li>We initialize the convolutional layers using weights from a pre-trained model.<\/li>\n<\/ol>\n<!-- \/wp:list 
-->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8acac0e elementor-widget elementor-widget-heading\" data-id=\"8acac0e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">How to Use the Region Proposal Network<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f3fbc34 elementor-widget elementor-widget-text-editor\" data-id=\"f3fbc34\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:list -->\n<ul>\n<li>All the anchors (around 20,000) are scored, so we get a refined bounding box and a probability of being foreground (i.e., being an object) for each of them.<\/li>\n<li>Non-maximum suppression (see the R-CNN section) is applied to remove highly overlapping proposals.<\/li>\n<li>Proposal selection: finally, only the top N proposals sorted by score are kept (with N=2,000, we are back to our 2,000 region proposals).<\/li>\n<\/ul>\n<!-- \/wp:list -->\n\n<!-- wp:paragraph -->\n<p>We finally have our 2,000 proposals, as in the previous methods. Despite appearing more complex, this prediction step is much faster and more accurate than the previous methods.<\/p>\n<!-- \/wp:paragraph -->\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4a75786 elementor-widget elementor-widget-text-editor\" data-id=\"4a75786\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p>The next step is to create a model similar to Fast R-CNN (i.e. RoI pooling, and a classifier + bbox regressor), using the RPN instead of selective search. 
However, we don\u2019t want to do exactly as before, i.e. take the 2,000 proposals, crop them, and pass them through a pre-trained base network. Instead, we\u00a0<strong>reuse the existing convolutional feature map<\/strong>. Indeed, one of the advantages of using an RPN as a proposal generator is that the weights and the CNN can be shared between the RPN and the main detector network. Training alternates between the two networks in four steps:<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list {\"ordered\":true} -->\n<ol>\n<li>The RPN is initialized from a pre-trained network and then fine-tuned.<\/li>\n<li>The detector network is initialized from a pre-trained network and then fine-tuned, using the region proposals from the RPN.<\/li>\n<li>The RPN is initialized using the weights from the second model and then fine-tuned (this is going to be our final RPN model).<\/li>\n<li>Finally, the detector network is fine-tuned (RPN weights are fixed). The CNN feature maps are shared between the two networks (see next figure).<\/li>\n<\/ol>\n<!-- \/wp:list -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5b6d4fc elementor-widget elementor-widget-image\" data-id=\"5b6d4fc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1181\/1*0en9RVt_cA4l0V5LhTfRlA.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-aac16a2 elementor-widget elementor-widget-text-editor\" data-id=\"aac16a2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p>To sum up, Faster R-CNN is more accurate than the previous methods and is about 10 times faster than Fast R-CNN, which is a big improvement and a first step toward real-time 
scoring.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-dc07fa6 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"dc07fa6\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-2c966f5\" data-id=\"2c966f5\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-f962129 elementor-widget elementor-widget-image\" data-id=\"f962129\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1268\/0*V_gnW-fQsu_IG3pV\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5a7b10a elementor-widget elementor-widget-text-editor\" data-id=\"5a7b10a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p>Even still, region proposal detection models won\u2019t be enough for an embedded system since these models are heavy and not fast enough for most real-time scoring cases \u2014 the last example is about five images per second.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>In our next post, we will discuss faster methods like SSD and real use cases with image detection from drones.<\/p>\n<!-- 
\/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>This blog post looks at the basic methods of object detection and tries to explain the technical details of each model without any formula, allowing readers with all levels of experience to follow along! <\/p>\n","protected":false},"author":914,"featured_media":9670,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[625,626,623,624],"ppma_author":[3809],"class_list":["post-9669","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-fast-r-cnn","tag-faster-r-cnn","tag-object-detection","tag-r-cnn"],"authors":[{"term_id":3809,"user_id":914,"is_guest":0,"slug":"augustin-ador","display_name":"Augustin Ador","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/09\/Augustin-Ador-150x150.jpg","user_url":"https:\/\/www.dataiku.com\/%20","last_name":"Ador","first_name":"Augustin","job_title":"","description":"Augustin Ador is Lead Data Scientist Central Europe at Dataiku, the centralized data platform that moves businesses along their data journey from analytics at scale to enterprise 
AI."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/9669","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/914"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=9669"}],"version-history":[{"count":6,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/9669\/revisions"}],"predecessor-version":[{"id":33948,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/9669\/revisions\/33948"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/9670"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=9669"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=9669"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=9669"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=9669"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}