# Practice №2. Object detection in images using OpenVINO

In this practice, we address the object detection problem and build a solution using OpenVINO.

To enable OpenVINO support in Jupyter Notebook, open a command line, activate your virtual environment (if you use one), run the __setupvars__ script to set up the OpenVINO environment, and then start Jupyter Notebook.

For Windows:

```bash
<virtual_environment>/Scripts/activate.bat
"C:\Program Files (x86)\Intel\openvino_2021\bin\setupvars.bat"
jupyter notebook
```

For Linux:
```bash
source <virtual_environment>/bin/activate
source /opt/intel/openvino_2021/bin/setupvars.sh
jupyter notebook
```
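
Before starting the practice, it can help to confirm that the notebook kernel actually sees the OpenVINO environment. The following is a minimal sketch, assuming that the __setupvars__ script exports the `INTEL_OPENVINO_DIR` environment variable; the helper name `openvino_env_status` is hypothetical and not part of OpenVINO.

```python
import importlib.util
import os


def openvino_env_status(environ=os.environ):
    """Report whether OpenVINO looks reachable from the current Python process."""
    return {
        # Assumption: setupvars exports INTEL_OPENVINO_DIR with the install path
        "install_dir": environ.get("INTEL_OPENVINO_DIR"),
        # True if a top-level 'openvino' package can be located on sys.path
        "module_found": importlib.util.find_spec("openvino") is not None,
    }


status = openvino_env_status()
print(status)
```

If `install_dir` is `None` or `module_found` is `False`, the notebook was probably started without running __setupvars__ in the same command line session.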

In [None]:
# Check that OpenVINO is installed and can be imported
import openvino

## Dependencies installation

To run a model with OpenVINO, you need to convert it from the format of its original framework to the OpenVINO Intermediate Representation (IR). The conversion requires the training framework of the model to be installed. You can install the dependencies for a single framework of interest or for all frameworks at once using one of the following commands.

Installing PIP packages for downloading models from the Open Model Zoo

In [None]:
#!pip install -r "/opt/intel/openvino_2021/deployment_tools/open_model_zoo/tools/downloader/requirements.in"
!pip install -r "C:\Program Files (x86)\Intel\openvino_2021\deployment_tools\open_model_zoo\tools\downloader\requirements.in"

Installing PIP packages for converting models

*openvino_2021/deployment_tools/model_optimizer* directory contains dependency files that should be installed to convert models from different frameworks:

- requirements_caffe.txt
- requirements_kaldi.txt
- requirements_mxnet.txt
- requirements_onnx.txt
- requirements_tf.txt
- requirements_tf2.txt
- requirements.txt (to install all frameworks together)

In [None]:
# To convert Caffe models, install dependencies from requirements_caffe.txt
#!pip install -r /opt/intel/openvino_2021/deployment_tools/model_optimizer/requirements_caffe.txt
!pip install -r "C:\Program Files (x86)\Intel\openvino_2021\deployment_tools\model_optimizer\requirements_caffe.txt"

## Using OpenVINO Samples and Demos in Python

After you have installed all the dependencies, you can proceed to download and convert models from the Open Model Zoo.

You can use the Model Downloader tool to download models.

To see all available models, use the __--print_all__ parameter.

In [None]:
#!python /opt/intel/openvino_2021/deployment_tools/tools/model_downloader/downloader.py --print_all
!python "C:\Program Files (x86)\Intel\openvino_2021\deployment_tools\tools\model_downloader\downloader.py" --print_all

Download the __ssd300__ model by running the downloader.py script with the --name ssd300 parameter.

In [None]:
#!python /opt/intel/openvino_2021/deployment_tools/tools/model_downloader/downloader.py --name ssd300
!python "C:\Program Files (x86)\Intel\openvino_2021\deployment_tools\tools\model_downloader\downloader.py" --name ssd300 

Models from the Open Model Zoo are converted to the OpenVINO format with a single run of the __converter.py__ script.

After that, the files __ssd300.xml__ and __ssd300.bin__ will appear in the directory _notebook_dir_/public/ssd300/FP32.

In [None]:
#!python /opt/intel/openvino_2021/deployment_tools/tools/model_downloader/converter.py --name ssd300
!python "C:\Program Files (x86)\Intel\openvino_2021\deployment_tools\tools\model_downloader\converter.py" --name ssd300 
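
Once the converter has finished, you may want to verify that both IR files are in place before loading the model. The sketch below assumes the directory layout described above (`public/<model>/<precision>/`); the helper name `find_ir_files` is hypothetical.

```python
from pathlib import Path


def find_ir_files(model_name, precision="FP32", root="public"):
    """Return (xml_path, bin_path, missing) for a converted Open Model Zoo model."""
    model_dir = Path(root) / model_name / precision
    xml_path = model_dir / f"{model_name}.xml"
    bin_path = model_dir / f"{model_name}.bin"
    # Collect the paths that do not exist as regular files
    missing = [p for p in (xml_path, bin_path) if not p.is_file()]
    return xml_path, bin_path, missing


xml_path, bin_path, missing = find_ir_files("ssd300")
print("missing files:", [str(p) for p in missing])
```

An empty list means that both the .xml and the .bin file were found relative to the notebook directory.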

Download an image for object detection using the command below or use your own image.

In [None]:
# Install wget (for Windows)
!pip install wget

# Download dog image via command line
!python -m wget http://hpc-education.unn.ru/files/courses/intel-dl-cv-course/dog.jpg 

In [None]:
input_image = 'dog.jpg'

from PIL import Image
from IPython.display import display

display(Image.open(input_image)) 

### Run OpenVINO object detection sample

In [None]:
# Run OpenVINO object detection sample
#!python /opt/intel/openvino_2021/inference_engine/samples/python/object_detection_sample_ssd/object_detection_sample_ssd.py -i dog.jpg -m public/ssd300/FP32/ssd300.xml
!python "C:\Program Files (x86)\Intel\openvino_2021\inference_engine\samples\python\object_detection_sample_ssd\object_detection_sample_ssd.py" -i dog.jpg -m public/ssd300/FP32/ssd300.xml

In [None]:
# display image with detections
input_image = 'out.bmp'
from PIL import Image
from IPython.display import display

display(Image.open(input_image)) 

## Developing the object detection application using OpenVINO

To develop the application, create a class __InferenceEngineDetector__ with a constructor and the methods ___prepare_image__, __detect__, and __draw_detection__.

In [None]:
import cv2
import numpy as np
from openvino.inference_engine import IECore

class InferenceEngineDetector:
    def __init__(self, configPath = None, weightsPath = None,
            device='CPU', extension=None, classesPath = None):
        pass

    def _prepare_image(self, image, h, w):
        pass

    def detect(self, image):
        pass

    def draw_detection(self, detections, image, confidence=0.5, 
                       draw_text=True):
        pass

### Loading deep model from file
 
To load the model, we need to implement the constructor of the InferenceEngineDetector class. The constructor receives the following required and optional parameters:

- configPath is a path to the .xml file of the model description.

- weightsPath is a path to the .bin file of the model weights.

- device is the name of the device used for inference, for example 'CPU' or 'GPU'.

- extension is an optional path to a library with custom layer implementations for the CPU device.

- classesPath is an optional path to the file containing class names for the given detection model.

The constructor performs the following actions:

1.	Creating an object of the class IECore.


    self.ie = IECore() 

2.	Creating an object of the class IENetwork with parameters corresponding to the model paths.


    self.net = self.ie.read_network(model=configPath, weights=weightsPath)  
    
3.	Loading the created IENetwork object into IECore, which prepares the model for execution on the target device.


    self.exec_net = self.ie.load_network(network=self.net,
                                         device_name=device)

4.	Loading the class names from the file located at the path classesPath.


    if classesPath:
        self.classes = [line.rstrip('\n') for line in open(classesPath)]
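
The one-line version above leaves the file open until the garbage collector closes it. As a minimal sketch of a slightly more defensive variant, the hypothetical helper `read_class_names` below reads names from any open text stream, so it can be used inside a `with open(...)` block; the sample class names are made up for illustration.

```python
import io


def read_class_names(stream):
    """Return one class name per line, without trailing line endings."""
    return [line.rstrip("\r\n") for line in stream]


# Stand-in for a class names file; real files depend on the model
sample = io.StringIO("background\naeroplane\nbicycle\n")
classes = read_class_names(sample)
print(classes)
```

In the constructor this could be used as `with open(classesPath) as f: self.classes = read_class_names(f)`, which closes the file as soon as the names are read.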

In [None]:
class InferenceEngineDetector:
    def _prepare_image(self, image, h, w):
        pass
    def detect(self, image):
        pass
    def draw_detection(self, detections, image, confidence=0.5, 
                       draw_text=True):
        pass
    
# Constructor implementation   
    def __init__(self, configPath = None, weightsPath = None,
            device='CPU', extension=None, classesPath = None):
        self.ie = IECore()
        self.net = self.ie.read_network(model=configPath, weights=weightsPath)
        if extension:
            self.ie.add_extension(extension, 'CPU')
        self.exec_net = self.ie.load_network(network=self.net,
                                             device_name=device)
        if classesPath:
            self.classes = [line.rstrip('\n') for line in open(classesPath)]
        return

### Loading and preprocessing image

The next step is to implement the ___prepare_image__ method. 

First, it is necessary to resize the image to the size of the network input.

    image = cv2.resize(image, (w, h))
    
Deep models expect images in a per-channel (planar) layout rather than a pixel-by-pixel (interleaved) layout, so input images have to be converted from the format RGBRGBRGB... to the format RRR...GGG...BBB... You can use the __transpose()__ function to do this.


    image = image.transpose((2, 0, 1))
    
In general, the model input is a 4-dimensional tensor, for example [1,3,300,300], where the first dimension is the number of images in a batch (a subset of images processed simultaneously), 3 is the number of color channels of the image, and 300, 300 are the height and width of the image. However, if a 3-dimensional tensor [3,300,300] is passed to the network input, the OpenVINO Toolkit automatically adds the fourth dimension.

It is also worth remembering one fact about the OpenCV library: OpenCV stores images in the BGR format, not RGB. If the model is taken from the Open Model Zoo and converted with default parameters, the channel order is already taken into account in the model. However, if the model does not come from the Open Model Zoo, the red and blue channels of the image may have to be swapped.
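
The layout changes described above can be tried on a tiny array without loading a real image. This is a minimal sketch using NumPy only; the 2x3 "image" is synthetic, and the explicit batch dimension and the channel swap are shown for illustration rather than being required by the ssd300 model.

```python
import numpy as np

# Synthetic 2x3 image in OpenCV layout: (height, width, channels), channels = B, G, R
image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# Swap the first and last channels (BGR -> RGB); only needed if the model expects RGB
rgb = image[:, :, ::-1]

# Interleaved HWC layout -> planar CHW layout
chw = image.transpose((2, 0, 1))

# Add the batch dimension explicitly: CHW -> NCHW
batch = np.expand_dims(chw, axis=0)

print(image.shape, chw.shape, batch.shape)  # (2, 3, 3) (3, 2, 3) (1, 3, 2, 3)
```

After the transpose, `chw[0]` holds all values of the first channel, `chw[1]` the second, and `chw[2]` the third.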


In [None]:
class InferenceEngineDetector:
    def __init__(self, configPath = None, weightsPath = None,
            device='CPU', extension=None, classesPath = None):
        self.ie = IECore()
        self.net = self.ie.read_network(model=configPath, weights=weightsPath)
        if extension:
            self.ie.add_extension(extension, 'CPU')
        self.exec_net = self.ie.load_network(network=self.net,
                                             device_name=device)
        if classesPath:
            self.classes = [line.rstrip('\n') for line in open(classesPath)]
        return
    def detect(self, image):
        pass
    def draw_detection(self, detections, image, confidence=0.5, 
                       draw_text=True):
        pass    


# Image preprocessing  
    def _prepare_image(self, image, h, w):
        if image.shape[:-1] != (h, w):
            image = cv2.resize(image, (w, h))
        image = image.transpose((2, 0, 1))
        return image

### Inferring model

The next step is the implementation of the __detect__ method, which launches the deep model inference on the device specified in the constructor. The sequence of operations for the __detect__ method is as follows:

1. Get information about the model input and output. 


    input_blob = next(iter(self.net.input_info))
    out_blob = next(iter(self.net.outputs))

2. From the model input, obtain the input dimension required by the model for the image. 


    n, c, h, w = self.net.input_info[input_blob].input_data.shape
    
3. Preprocess image using the function ___prepare_image__. 


    blob = self._prepare_image(image, h, w)
    
4. Infer the model in synchronous mode. 


    output = self.exec_net.infer(inputs = {input_blob: blob})
    
5. Extract the tensor with the detection result from the model output. 


    output = output[out_blob]
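
To see the data flow of steps 4 and 5 without OpenVINO installed, the executable network can be replaced by a stand-in object. This is only an illustrative sketch: the class `FakeExecNet` and the blob names `data` and `detection_out` are hypothetical, and the returned detection is made up.

```python
import numpy as np


class FakeExecNet:
    """Hypothetical stand-in for the object returned by IECore.load_network()."""

    def infer(self, inputs):
        # One made-up detection row: image 0, class 12, score 0.9, normalized box
        row = [0, 12, 0.9, 0.1, 0.2, 0.6, 0.8]
        # Like the real call, return a dict that maps output names to arrays
        return {"detection_out": np.array([[[row]]], dtype=np.float32)}


input_blob = "data"                                # hypothetical input name
blob = np.zeros((3, 300, 300), dtype=np.float32)   # preprocessed CHW image
output = FakeExecNet().infer(inputs={input_blob: blob})

out_blob = next(iter(output))                      # name of the first (only) output
detections = output[out_blob]
print(out_blob, detections.shape)                  # detection_out (1, 1, 1, 7)
```

The point of the sketch is that the inference result is a dictionary, so the detection tensor has to be extracted by the output name.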

In [None]:
class InferenceEngineDetector:
    def __init__(self, configPath=None, weightsPath=None,
            device='CPU', extension=None, classesPath=None):
        self.ie = IECore()
        self.net = self.ie.read_network(model=configPath, weights=weightsPath)
        self.exec_net = self.ie.load_network(network=self.net,
                                             device_name=device)
        if classesPath:
            self.classes = [line.rstrip('\n') for line in open(classesPath)]
        return

    def _prepare_image(self, image, h, w):
        if image.shape[:-1] != (h, w):
            image = cv2.resize(image, (w, h))
        image = image.transpose((2, 0, 1))
        return image
    
    def draw_detection(self, detections, image, confidence=0.5, 
                       draw_text=True):
        pass

# Main inference function
    def detect(self, image):
        input_blob = next(iter(self.net.input_info))
        out_blob = next(iter(self.net.outputs))
        n, c, h, w = self.net.input_info[input_blob].input_data.shape
        blob = self._prepare_image(image, h, w)
        output = self.exec_net.infer(inputs={input_blob: blob})
        detection = output[out_blob]
        return detection


### Processing detection results

The output of most SSD-based models is a tensor of size [1,1,N,7], in which each row (the last dimension of the tensor) contains the following parameters: [image_number, classid, score, left, top, right, bottom], where ‘image_number’ is the index of the image in the batch; ‘classid’ is the class identifier; ‘score’ is the confidence that an object of this class is located in the selected area; ‘left, top, right, bottom’ are the coordinates of the bounding box, normalized to the range from 0 to 1.

To process the output, you should implement the draw_detection method, which draws the bounding boxes on the image. The sequence of actions is as follows. In a loop through the output rows:

1.	Extract the current row of the matrix.

2.	Extract the confidence of the detected object (third parameter in the row).

3.	If the confidence is greater than a threshold value (0.5 is recommended), then get the class identifier and the coordinates of the bounding box. The class identifier can be used to get the class name. To obtain the coordinates of the bounding boxes in the coordinate system associated with the image, it is necessary to multiply the normalized values obtained from the output tensor by the height and width of the input image.

4.	Draw a rectangle on the image using OpenCV. To draw a rectangle, use the cv2.rectangle function. A description of the parameters and an example of using this function are given below.


    cv2.rectangle(img, point1, point2, color, line_width)

- img is the image on which to draw the detections.
- point1 = (x,y) is a tuple of two integers corresponding to the coordinates of the top left corner of the bounding box.
- point2 = (x,y) is a tuple of two integers corresponding to the coordinates of the bottom right corner of the bounding box.
- color = (B,G,R) is a tuple of three integers from 0 to 255, which determines the color of the line.
- line_width = 1 is an integer that determines the thickness of the line in pixels.

To display an object class name on the image, use the cv2.putText function. A description of the parameters and an example of using the function are given below.


    cv2.putText(img, text, point, cv2.FONT_HERSHEY_COMPLEX, text_size, color, 1)

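- img is the image on which to draw the detections.
- text is the text of the label.
- point is a tuple of two integers with the coordinates of the bottom left corner of the text.
- cv2.FONT_HERSHEY_COMPLEX is the font type.
- text_size = 0.45 is a floating-point number that determines the size of the text.
- color is a tuple of three integers from 0 to 255, which determines the color of the text.
- the last argument (1) is an integer that determines the thickness of the text lines.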
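
The filtering and scaling part of the loop (steps 1 to 3) can be checked separately from the drawing. The sketch below uses NumPy only and assumes the [1,1,N,7] output layout described above; the helper name `filter_detections` and the sample detections are hypothetical.

```python
import numpy as np


def filter_detections(detections, image_width, image_height, threshold=0.5):
    """Return (class_id, score, (left, top, right, bottom)) in pixels for confident rows."""
    scale = np.array([image_width, image_height, image_width, image_height])
    results = []
    for row in detections[0, 0]:          # iterate over the N detection rows
        score = float(row[2])
        if score > threshold:
            # Normalized [0, 1] coordinates -> pixel coordinates of the input image
            left, top, right, bottom = (int(v) for v in row[3:7] * scale)
            results.append((int(row[1]), score, (left, top, right, bottom)))
    return results


# Two made-up rows: one confident detection and one below the threshold
sample = np.array([[[[0, 12, 0.875, 0.25, 0.25, 0.75, 0.5],
                     [0, 7, 0.25, 0.0, 0.0, 0.5, 0.5]]]], dtype=np.float32)
print(filter_detections(sample, image_width=640, image_height=480))
# [(12, 0.875, (160, 120, 480, 240))]
```

Keeping this step separate from cv2.rectangle and cv2.putText makes it easier to verify the coordinate conversion on known values.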

In [None]:
class InferenceEngineDetector:
    def __init__(self, configPath=None, weightsPath=None,
            device='CPU', extension=None, classesPath=None):
        self.ie = IECore()
        self.net = self.ie.read_network(model=configPath, weights=weightsPath)
        if extension:
            self.ie.add_extension(extension, 'CPU')
        self.exec_net = self.ie.load_network(network=self.net,
                                             device_name=device)
        if classesPath:
            self.classes = [line.rstrip('\n') for line in open(classesPath)]
        return

    def _prepare_image(self, image, h, w):
        if image.shape[:-1] != (h, w):
            image = cv2.resize(image, (w, h))
        image = image.transpose((2, 0, 1))
        return image

    def detect(self, image):
        input_blob = next(iter(self.net.input_info))
        out_blob = next(iter(self.net.outputs))
        n, c, h, w = self.net.input_info[input_blob].input_data.shape
        blob = self._prepare_image(image, h, w)
        output = self.exec_net.infer(inputs={input_blob: blob})
        detection = output[out_blob]
        return detection

# Processing detection results
    def draw_detection(self, detections, img, confidence=0.5, draw_text=True):
        (h, w) = img.shape[:2]
        for i in range(0, detections.shape[2]):
            conf = detections[0, 0, i, 2]
            if conf > confidence:
                box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
                (startX, startY, endX, endY) = box.astype("int")
                try:
                    text = "Class {} {:.2f}%".format(
                        self.classes[int(detections[0, 0, i, 1])], conf * 100)
                except (AttributeError, IndexError):
                    # No class names loaded or id out of range: show the numeric id
                    text = "Class {} {:.2f}%".format(
                        int(detections[0, 0, i, 1]), conf * 100)
                y = startY - 10 if startY - 10 > 10 else startY + 10
                cv2.rectangle(img, (startX, startY), (endX, endY),
                              (0, 255, 0), 2)
                if draw_text:
                    cv2.putText(img, text, (startX, y),
                                cv2.FONT_HERSHEY_COMPLEX, 0.45, (0, 0, 255), 1)
        return img    

### Implementing sample 

To detect objects in your image, you should create an object of the __InferenceEngineDetector__ class, passing it the paths to the object detection model, then read the image and run the model inference using the __detect__ method.


After you have obtained the detections, you need to draw the bounding rectangles on the original image using the __draw_detection__ method and display the result on the screen.

In [None]:
model_path = 'public/ssd300/FP32/ssd300.xml'
weights_path = 'public/ssd300/FP32/ssd300.bin'
device = 'CPU' # 'CPU' or 'GPU'
input_image = 'dog.jpg'  # the original image, not the sample output 'out.bmp'

ie_detector = InferenceEngineDetector(configPath=model_path, weightsPath=weights_path, device=device) 
img = cv2.imread(input_image) 
detections = ie_detector.detect(img) 
image_with_detections = ie_detector.draw_detection(detections, img)


image_with_detections = cv2.cvtColor(image_with_detections, cv2.COLOR_BGR2RGB) # Converting BGR to RGB
display(Image.fromarray(image_with_detections))



You can use the resulting InferenceEngineDetector class in your own deep learning projects.



### Additional tasks

The developed detection sample contains the minimum required functionality. As additional tasks, it is proposed to provide support for the following features:

1.	Supporting other deep models for object detection included in the Open Model Zoo.

2.	Measuring the time required for image processing.

It is proposed to solve these tasks independently using the documentation and examples included in the OpenVINO Toolkit.
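
As a possible starting point for the timing task, the sketch below averages the wall-clock time of several calls using `time.perf_counter` from the standard library. The helper name `time_call` is hypothetical, and `sum` is used only as a stand-in for the function being measured.

```python
import time


def time_call(func, *args, repeats=10, **kwargs):
    """Call func repeatedly and return (last_result, average_seconds_per_call)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()
    for _ in range(repeats):
        result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed / repeats


# Stand-in workload; replace with the function you want to measure
result, avg_seconds = time_call(sum, range(1000), repeats=5)
print(f"result={result}, average time: {avg_seconds * 1000:.3f} ms")
```

With the detector from this practice, a call such as `time_call(ie_detector.detect, img)` would measure preprocessing and inference together. The first call may be slower than the following ones, so it can be worth timing it separately.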

