Detectron2 is Facebook AI Research's collection of state-of-the-art object detection algorithms. It is based on the Mask R-CNN benchmark and is written in Python using PyTorch. It supports bounding box detection, DensePose estimation, instance segmentation, keypoint detection, and other computer vision tasks.
Examples of its usage and research papers are located here.
Unfortunately, installation is limited to Unix-like operating systems. However, if you have the Windows Subsystem for Linux installed, this should not be a problem.
conda install -c conda-forge detectron2;
pip install -U iopath==0.1.4 omegaconf opencv-python;
To make use of Detectron2, we begin by creating a Python script made up of the following steps.
As usual, we start by importing the relevant libraries.
import cv2
import matplotlib.pyplot as plt

from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog
Next, we need to load the pre-trained model. Here we choose the Mask R-CNN model and trained weights from the COCO Instance Segmentation collection, and set the match score threshold.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.DEVICE = "cpu"  # run Detectron2 on the CPU

# create predictor
predictor = DefaultPredictor(cfg)
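To build intuition for what `SCORE_THRESH_TEST` does, here is a stdlib-only sketch of the same filtering idea on mock (class, confidence) pairs. The `detections` list and its values are purely illustrative, not Detectron2 API:

```python
# Mock detections as (class_id, confidence) pairs; illustrative values only,
# not part of the Detectron2 API.
detections = [(0, 0.95), (0, 0.15), (17, 0.42), (2, 0.05)]

SCORE_THRESH_TEST = 0.2

# Keep only detections whose confidence meets the threshold, mirroring
# what cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST controls internally.
kept = [(c, s) for c, s in detections if s >= SCORE_THRESH_TEST]

print(kept)  # the 0.15 and 0.05 detections are dropped
```

Raising the threshold trades recall for precision: a higher value drops uncertain matches, a lower value keeps more (but noisier) detections.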
Reading the Image File
Finally we need to read our photographs. For this we leverage the OpenCV library.
file = 'filename.jpeg'
image = cv2.imread(file)  # OpenCV loads images in BGR order, which DefaultPredictor expects by default

# predict categories
output = predictor(image)
Once we have run the predictor, we can check our results. To do this we extract each individual matched element and overlay them on the original image:
v = Visualizer(image[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
out = v.draw_instance_predictions(output["instances"].to("cpu"))
Once we have each segment, we use matplotlib to plot the result:
plt.imshow(out.get_image())
plt.show()
This produces the following:
Using this method it is also possible to explore each match individually. We start by creating a list of prediction categories and a bounding box per match.
# pair each predicted class with its bounding box
matches = zip(output['instances'].get('pred_classes'),
              output['instances'].get('pred_boxes'))
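To see what this pairing produces, here is a small stand-in using plain lists in place of `pred_classes` and `pred_boxes` (the values are illustrative; real boxes are tensors of `[x0, y0, x1, y1]` coordinates):

```python
# Illustrative stand-ins for output['instances'].get('pred_classes') and
# .get('pred_boxes'): class ids and [x0, y0, x1, y1] boxes.
pred_classes = [0, 17, 0]
pred_boxes = [[10, 20, 50, 80], [5, 5, 30, 30], [60, 15, 90, 70]]

# zip pairs each class id with its corresponding box
matches = list(zip(pred_classes, pred_boxes))
print(matches[0])  # (0, [10, 20, 50, 80])

# In the COCO label set, class 0 is "person"; keeping only those boxes
# mirrors the filtering loop below.
people = [box for class_id, box in matches if class_id == 0]
print(len(people))  # 2
```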
Iterating through the values and selecting only those which correspond to people (category 0)…
for l, j in matches:
    # skip all categories which do not correspond to people
    if int(l) != 0:
        continue
    # get the bounding box for this person
    i = [int(k) for k in j]
…we can use the bounding box to crop the original image and save each individual match should we wish:
# crop the original image using the bounding box [x0, y0, x1, y1]
img = image[i[1]:i[3], i[0]:i[2]]  # rows are y coordinates, columns are x
plt.imshow(img[:, :, ::-1])  # flip BGR to RGB for matplotlib
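The crop itself is just array slicing: rows are selected by the y range and columns by the x range. Here is a tiny stdlib-only sketch with a nested-list "image" and an illustrative box, standing in for the numpy array returned by `cv2.imread`:

```python
# A tiny 4x5 "image" of single values stands in for the numpy array
# returned by cv2.imread; the box coordinates are illustrative.
image = [[r * 10 + c for c in range(5)] for r in range(4)]

x0, y0, x1, y1 = 1, 1, 4, 3  # bounding box [x0, y0, x1, y1]

# Selecting rows y0:y1 and columns x0:x1 mirrors image[i[1]:i[3], i[0]:i[2]]
crop = [row[x0:x1] for row in image[y0:y1]]

print(crop)  # [[11, 12, 13], [21, 22, 23]]
```

With a real numpy image the same idea is a single slice, `image[y0:y1, x0:x1]`, since the first axis indexes rows.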
As with all machine learning algorithms, we cannot expect 100% accuracy, especially since many of our photos may be of poor quality, or contain unconventional angles/positions.
An example is seen below: here the boat's helm is misidentified because it is presented in an unusual position (with respect to the algorithm's training dataset, at least).
In this article we looked at how to use Facebook's Detectron2 Mask R-CNN to detect people in an image. This means that we can now auto-describe contents or flag images we may need to check before publication.
We did, however, mention that for completeness it is probably still a good idea to glance over unmatched photos before discarding them, just to make sure.