# Practice №4. Solving the problem of video analytics, including detection, recognition and tracking of objects in the video

This practice presents a solution to the problem of counting vehicles of different classes in video and develops a corresponding application using OpenVINO.

To enable OpenVINO support in Jupyter Notebook, open a command line, activate the virtual environment (if any), run the __setupvars__ script to initialize the OpenVINO environment, and then start Jupyter Notebook from the same command line.

For Windows:

```bash
<virtual_environment>\Scripts\activate.bat
"C:\Program Files (x86)\Intel\openvino_2021\bin\setupvars.bat"
jupyter notebook
```

For Linux:
```bash
source <virtual_environment>/bin/activate
source /opt/intel/openvino_2021/bin/setupvars.sh
jupyter notebook
```

In [None]:
# Checking that OpenVINO is installed and connected
import openvino

## Problem statement of counting vehicles on video

In this practice, we consider the problem of counting vehicles of different classes in video from a camera at a crossroad. The input is a sequence of video frames. It is required to count the number of vehicles of each category (“car”, “bus”, “motorbike” and others) that have been observed in the video up to the current moment.

### Algorithm for counting vehicles of different classes on video
The base algorithm for counting vehicles of different classes on video is based on object detection and tracking. It consists of several steps that are performed for each newly received frame.

1.	Object detection. Any deep model capable of detecting vehicles of different classes (“car”, “bus”, “train”, and “motorbike”) can be used. Objects that do not belong to the set of vehicle classes must be filtered out, either at the detection stage or at the stage of calculating statistics. Note that the detector can predict different classes for the same object on different frames. In our solution, the class of an object is the class predicted on the last processed frame. For vehicle detection, the SSD300 model and the InferenceEngineDetector class can be used.

2.	Object tracking. You can use tracking-by-matching implementation of the tracker from the practice “Tracking objects on video using deep neural networks”. 

3.	Calculating statistics for vehicles of different classes, which were observed from the beginning of the video to the current frame. To calculate statistics, it is required to create counters for each class of vehicles and to iterate over all the tracks, increasing the counters of the corresponding vehicle class.

4.	Displaying the frame with the detected vehicles and their tracks, together with the collected statistics.
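
The filtering part of step 1 can be sketched without running a model. The snippet below assumes an SSD-style output in which every row has the layout `[image_id, class_id, confidence, x_min, y_min, x_max, y_max]`, which is the layout the code in this practice indexes into. The rows, the `VEHICLE_CLASS_IDS` set, and the helper name `keep_vehicles` are made up for illustration and are not part of the practice code.

```python
# Hypothetical class ids treated as vehicles; the real ids depend on the
# label file of the detector you use.
VEHICLE_CLASS_IDS = {6, 7, 19}


def keep_vehicles(rows, class_ids, threshold=0.5):
    """Keep rows that are confident enough and belong to an allowed class."""
    kept = []
    for row in rows:
        class_id = int(row[1])
        confidence = row[2]
        if confidence > threshold and class_id in class_ids:
            kept.append(row)
    return kept


# Made-up detections: [image_id, class_id, confidence, x_min, y_min, x_max, y_max]
rows = [
    [0, 7.0, 0.92, 0.10, 0.20, 0.30, 0.40],   # confident, allowed class
    [0, 15.0, 0.88, 0.50, 0.50, 0.60, 0.90],  # confident, class not in the set
    [0, 6.0, 0.30, 0.60, 0.10, 0.90, 0.50],   # allowed class, low confidence
]
vehicles = keep_vehicles(rows, VEHICLE_CLASS_IDS)
print(len(vehicles))
```

Filtering at this point keeps non-vehicle objects out of the tracker entirely. Filtering later, at the statistics stage, keeps the tracker generic but makes it store tracks that are never counted.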



In [None]:
import cv2
import numpy as np
from openvino.inference_engine import IECore
from scipy.optimize import linear_sum_assignment
import math
from collections import namedtuple

class InferenceEngineDetector:
    def __init__(self, configPath=None, weightsPath=None,
            device='CPU', extension=None, classesPath=None):
        self.ie = IECore()
        self.net = self.ie.read_network(model=configPath, weights=weightsPath)
        if extension:
            self.ie.add_extension(extension, 'CPU')
        self.exec_net = self.ie.load_network(network=self.net,
                                             device_name=device)
        if classesPath:
            with open(classesPath) as f:
                self.classes = [line.rstrip('\n') for line in f]
        else:
            # No label file: draw_detection falls back to numeric class ids
            self.classes = None

    def _prepare_image(self, image, h, w):
        if image.shape[:-1] != (h, w):
            image = cv2.resize(image, (w, h))
        image = image.transpose((2, 0, 1))
        return image

    def detect(self, image):
        input_blob = next(iter(self.net.input_info))
        out_blob = next(iter(self.net.outputs))
        n, c, h, w = self.net.input_info[input_blob].input_data.shape
        blob = self._prepare_image(image, h, w)
        output = self.exec_net.infer(inputs={input_blob: blob})
        detection = output[out_blob]
        return detection

    def draw_detection(self, detections, img, confidence=0.5, draw_text=True):
        (h, w) = img.shape[:2]
        for i in range(0, detections.shape[2]):
            conf = detections[0, 0, i, 2]
            if conf > confidence:
                box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
                (startX, startY, endX, endY) = box.astype("int")
                try:
                    text = "Class {} {:.2f}%".format(
                        self.classes[int(detections[0, 0, i, 1])], conf * 100)
                except (AttributeError, TypeError, IndexError):
                    # Class names are unavailable: show the numeric class id
                    text = "Class {} {:.2f}%".format(
                        detections[0, 0, i, 1], conf * 100)
                y = startY - 10 if startY - 10 > 10 else startY + 10
                cv2.rectangle(img, (startX, startY), (endX, endY),
                              (0, 255, 0), 2)
                if draw_text:
                    cv2.putText(img, text, (startX, y),
                                cv2.FONT_HERSHEY_COMPLEX, 0.45, (0, 0, 255), 1)
        return img

DetectedObject = namedtuple(
    'DetectedObject', 
    ['confidence', 'frame_idx', 'object_id', 'timestamp', 
     'class_id', 'x_left', 'y_bottom', 'x_right', 'y_top',])    
    
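class Tracklet():
    def __init__(self, detection=None):
        # Avoid a mutable default argument; start empty if no detection is given
        self._trackedObjects = []
        if detection is not None:
            self._trackedObjects.append(detection)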
    def __len__(self):
        return len(self._trackedObjects)
    def __getitem__(self, position):
        return self._trackedObjects[position]
    def add_new_detection(self, detection):
        self._trackedObjects.append(detection)
    
    
class MatchingTracker:
    
    def _affinity(self, obj1, obj2):
        shp_aff = self._shape_affinity(obj1, obj2, self._shape_affinity_weight)
        mot_aff = self._place_affinity(obj1, obj2, self._place_affinity_weight)
        return shp_aff * mot_aff      
    
    def _shape_affinity(self, obj1, obj2, weight):
        obj1_width = obj1.x_right-obj1.x_left
        obj2_width = obj2.x_right-obj2.x_left
        obj1_height = obj1.y_top - obj1.y_bottom
        obj2_height = obj2.y_top - obj2.y_bottom
        w_dist = abs(obj1_width - obj2_width) / (obj1_width + obj2_width)
        h_dist = abs(obj1_height - obj2_height) / (obj1_height + obj2_height)
        return math.exp(-weight * (w_dist + h_dist))
    
    def _place_affinity(self, o1, o2, weight):
        obj1_x = (o1.x_left + o1.x_right) * 0.5
        obj1_y = (o1.y_top + o1.y_bottom) * 0.5
        obj2_x = (o2.x_left + o2.x_right) * 0.5
        obj2_y = (o2.y_top + o2.y_bottom) * 0.5
        obj2_width = o2.x_right-o2.x_left
        obj2_height = o2.y_top - o2.y_bottom
        x_dist = ((obj1_x - obj2_x) ** 2) / (obj2_width ** 2)
        y_dist = ((obj1_y - obj2_y) ** 2) / (obj2_height ** 2)
        return math.exp(-weight * (x_dist + y_dist))
    
    def _compute_dissimilarity_matrix(self, tracks, detections):
        size = max(len(tracks), len(detections))
        diss_mat = np.zeros(shape=(size, size), dtype=float)
        for i in range(len(tracks)):
            for j in range(len(detections)):
                diss_mat[i, j] = 1.0 - \
                    self._affinity(tracks[i][-1], detections[j])
        return diss_mat
    
    def _solve_assignment_problem(self, tracks, detections):
        dissimilarity_mat = self._compute_dissimilarity_matrix(
            tracks, detections)
        row_ind, col_ind = linear_sum_assignment(dissimilarity_mat)
        return row_ind, col_ind

    def filter_detections(self, detect_mat, threshold=0.5):
        detect_mat = detect_mat[0, 0, :, :]
        detections = []
        for i in range(detect_mat.shape[0]):
            # Parse one row of the detection output:
            # [image_id, class_id, confidence, x_min, y_min, x_max, y_max]
            conf = detect_mat[i, 2]
            if conf > threshold:
                detection = DetectedObject(
                    detect_mat[i, 2],
                    -1,
                    -1,
                    -1,
                    detect_mat[i, 1],
                    detect_mat[i, 3],
                    detect_mat[i, 4],
                    detect_mat[i, 5],
                    detect_mat[i, 6])
                detections.append(detection)
        return detections  

    def add_new_track(self, detection):
        track = Tracklet(detection)
        self._tracks.append(track)
        
    def process_new_frame(self, frame, detections, timestamp):
        if self._tracks and detections:
            row_indexes, col_indexes = self._solve_assignment_problem(
                self._tracks, detections)
            # For each assignment
            for i in range(len(row_indexes)):
                row_id = row_indexes[i]
                col_id = col_indexes[i]
                # If we find existing track and existing detection
                if row_id < len(self._tracks) and col_id < len(detections):
                    # Add detection to track if objects are close
                    dist = 1.0 - \
                        self._affinity(
                            self._tracks[row_id][-1], detections[col_id])
                    if dist < self._dist_weight:
                        self._tracks[row_id].add_new_detection(
                            detections[col_id])
                elif col_id < len(detections):
                    self.add_new_track(detections[col_id])
        else:
            # Add new tracks
            for detection in detections:
                self.add_new_track(detection)
        return

    def __init__(self, dist_weight=0.5, shape_affinity_weight=0.5,
            place_affinity_weight=0.7):
        self._tracks = []
        self._tracks_counter = 0
        self._dist_weight = dist_weight
        self._shape_affinity_weight = shape_affinity_weight
        self._place_affinity_weight = place_affinity_weight
        return
    
    def draw_active_tracks(self, image):
        # image.shape is (height, width, channels)
        h, w = image.shape[:2]
        for track in self._tracks:
            # Draw one track as a chain of segments between box centers
            for i in range(len(track)-1):
                cv2.line(img=image,
                    pt1=(int((track[i].x_left + track[i].x_right) * w) // 2,
                         int((track[i].y_bottom + track[i].y_top) * h) // 2),
                    pt2=(int((track[i+1].x_left + track[i+1].x_right) * w) // 2,
                         int((track[i+1].y_bottom + track[i+1].y_top) * h) // 2),
                    color=(0, 255, 0), thickness=3)
        return image

## Developing the application for counting vehicles of different classes on video

It is assumed that the InferenceEngineDetector and MatchingTracker modules, which implement the object detection and tracking algorithms, have already been developed in the previous practical work. Let's take a closer look at the Videoanalytics class. The class contains the following methods:

- \_\_init\_\_ is the class constructor. It stores the list of vehicle classes to be counted and loads the names of these classes for display on the screen.
- count_objects_per_classes is a method that counts vehicles of each class.
- draw_videoanalytics is a method for displaying the vehicle counting results on the video.

The constructor implementation is trivial, so you are encouraged to develop it yourself. The implementation of the other methods is described below.

In [None]:
import cv2
import numpy as np


class Videoanalytics():
    def __init__(self, classIds, classesPath=None):
        pass
    def count_objects_per_classes(self, tracker):
        pass
    def draw_videoanalytics(self, frame, tracker):
        pass

__Counting vehicles of different classes on video__

Consider the implementation of the count_objects_per_classes method for counting the number of vehicles of each class. This method iterates over all tracks obtained as a result of tracking, and counts the number of objects of each class.
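
The counting logic can also be tried in isolation on toy data. The sketch below is a minimal stand-alone variant that uses `collections.Counter`. The `Box` tuple and the function name `count_per_class` are hypothetical stand-ins that mimic only the `class_id` field of the tracked detections, and the class of a track is taken from its last detection, as described above.

```python
from collections import Counter, namedtuple

# Stand-in for a tracked detection; only the field used here.
Box = namedtuple('Box', ['class_id'])


def count_per_class(tracks, class_ids):
    """Count tracks whose most recent detection belongs to one of class_ids."""
    counts = Counter()
    for track in tracks:
        class_id = int(track[-1].class_id)
        if class_id in class_ids:
            counts[class_id] += 1
    return dict(counts)


tracks = [
    [Box(7.0), Box(7.0)],
    [Box(6.0), Box(7.0)],   # the last detection decides: class 7
    [Box(15.0)],            # not in class_ids, ignored
    [Box(6.0)],
]
counts = count_per_class(tracks, [6, 7])
print(counts)
```

Casting `class_id` to `int` keeps the dictionary keys consistent even when the detector returns class ids as floating-point values.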

In [None]:
import cv2
import numpy as np

class Videoanalytics():

    def draw_videoanalytics(self, frame, tracker):
        pass

    # Constructor
    def __init__(self, classIds, classesPath=None):
        self.classIds = classIds
        if classesPath:
            with open(classesPath) as f:
                self.classes = [line.rstrip('\n') for line in f]
        else:
            self.classes = None

    # Counting objects
    def count_objects_per_classes(self, tracker):
        results = {}
        for tracklet in tracker._tracks:
            # The class of a track is the class of its most recent detection
            classid = int(tracklet[-1].class_id)
            # Increment the counter of this class
            if classid in self.classIds:
                results[classid] = results.get(classid, 0) + 1
        return results

__Displaying of vehicle counting results__

The draw_videoanalytics method displays the results of counting vehicles. It consists of three parts.

1.	Displaying detected objects of vehicle classes. It is required to iterate over all tracks.

    1.1.	Searching for the last bounding box corresponding to the object location in the track.

    1.2.	If the object class is in the list of vehicle classes and the object is found in the current frame, then displaying the bounding box.


2.	Displaying the tracks of vehicles. It is supposed to iterate over all tracks.

    2.1.	Searching for the last bounding box corresponding to the object location in the track.

    2.2.	If the object class is in the list of vehicle classes and the object is found in the current frame, then iterate over all bounding boxes in the track, calculate their centers, and draw a line between the centers from neighboring frames.


3.	Displaying statistics.

    3.1.	Count the number of vehicles of each class using the count_objects_per_classes method.

    3.2.	For each vehicle class, create a line to display the following information: “Class <class_name>: <number_of_objects>”.

    3.3.	Displaying lines with counting information in the upper left corner of the frame.
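
The geometry in part 2 and the text formatting in part 3 can be sketched without OpenCV. The example below assumes boxes are given as normalized `(x_min, y_min, x_max, y_max)` tuples. The helper names `box_center`, `track_segments`, and `stat_lines`, the frame size, and the class names are hypothetical and chosen only for illustration.

```python
def box_center(box, frame_w, frame_h):
    """Pixel center of a normalized (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = box
    cx = int((x_min + x_max) * frame_w) // 2
    cy = int((y_min + y_max) * frame_h) // 2
    return cx, cy


def track_segments(track, frame_w, frame_h):
    """Pairs of consecutive centers; each pair is one line segment to draw."""
    centers = [box_center(box, frame_w, frame_h) for box in track]
    return list(zip(centers[:-1], centers[1:]))


def stat_lines(counts, class_names=None):
    """One text line per class, ordered by class id."""
    lines = []
    for class_id in sorted(counts):
        name = class_names[class_id] if class_names else class_id
        lines.append('Class {}: {}'.format(name, counts[class_id]))
    return lines


# Made-up track of three boxes on a 640x480 frame
track = [(0.25, 0.25, 0.5, 0.5), (0.5, 0.25, 0.75, 0.5), (0.5, 0.5, 1.0, 1.0)]
segments = track_segments(track, 640, 480)
lines = stat_lines({7: 2, 6: 1}, {6: 'bus', 7: 'car'})
print(segments)
print(lines)
```

Each pair in `segments` could be passed as `pt1` and `pt2` to a line-drawing call, and each entry of `lines` could be drawn at a fixed vertical offset from the previous one in the upper left corner.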

In [None]:
import cv2
import numpy as np

class Videoanalytics():
    def __init__(self, classIds, classesPath=None):
        self.classIds = classIds
        if classesPath:
            self.classes = [line.rstrip('\n') for line in open(classesPath)]
        else:
            self.classes = None
        return

    def count_objects_per_classes(self, tracker):
        results = {}
        for tracklet in tracker._tracks:
            # Get object class from tracklet
            classid = int(tracklet[-1].class_id)
            # Add +1 to counter for object
            if classid in self.classIds:
                if classid in results:
                    results[classid] += 1
                else:
                    results[classid] = 1
        return results
    
    # Draw videoanalytics
    def draw_videoanalytics(self, frame, tracker):
        (h, w) = frame.shape[:2]

        # Draw detections of chosen classes
        for tracklet in tracker._tracks:
            # Get object class from tracklet
            classid = int(tracklet[-1].class_id)
            if classid in self.classIds:
                # Draw bbox of the last detection
                x_left = int(tracklet[-1].x_left * w)
                y_bottom = int(tracklet[-1].y_bottom * h)
                x_right = int(tracklet[-1].x_right * w)
                y_top = int(tracklet[-1].y_top * h)
                cv2.rectangle(frame, (x_left, y_bottom), (x_right, y_top),
                    (0, 255, 0), 2)
        
        # Draw tracklets of chosen classes
        for track in tracker._tracks:
            # Get object class from tracklet
            classid = int(track[-1].class_id)
            if classid in self.classIds:
                # Draw one tracklet from segments
                for i in range(len(track)-1):
                    x1 = int((track[i].x_left + track[i].x_right) * w) // 2
                    y1 = int((track[i].y_bottom + track[i].y_top) * h) // 2
                    x2 = int((track[i+1].x_left + track[i+1].x_right) * w) // 2
                    y2 = int((track[i+1].y_bottom + track[i+1].y_top) * h) // 2
                    cv2.line(img=frame, pt1=(x1,y1), pt2=(x2,y2), 
                        color=(0, 255, 0), thickness=3)

        # Draw statistics
        counts = self.count_objects_per_classes(tracker)
        for i, elem in enumerate(counts):
            if self.classes:
                class_index = int(elem)
                text = 'Class {}: {} objects'.format(
                    self.classes[class_index], counts[elem])
            else:
                text = 'Class {}: {} objects'.format(elem, counts[elem])
            text_pos = (0, i * 30 + 30)
            cv2.putText(frame, text, text_pos,
                        cv2.FONT_HERSHEY_COMPLEX, 1.0, (0, 255, 255), 2)
        return frame

    

### Implementing sample

1.	Creating an object of the InferenceEngineDetector class.

2.	Creating an object of the MatchingTracker class with empirically selected affinity weight parameters for the input video.

3.	Creating an object of the Videoanalytics class with the necessary parameters.

4.	Loading the video.

5.	Performing actions for each video frame:

    5.1.	Detecting objects in the image.
    
    5.2.	Filtering detections using the tracker.filter_detections method.
    
    5.3.	Tracking objects using the tracker.process_new_frame method.
    
    5.4.	Displaying detections and tracks of vehicles on the frame and displaying the statistics on the frame using the videoanalytics.draw_videoanalytics method.

In [None]:
model_path = 'public/ssd300/FP32/ssd300.xml'
weights_path = 'public/ssd300/FP32/ssd300.bin'
device = 'CPU' # 'CPU' or 'GPU'
input_video = 'camera.mp4'
classes_for_videoanalytics = [2,6,7,19]

ie_detector = InferenceEngineDetector(configPath=model_path, weightsPath=weights_path, device=device) 

tracker = MatchingTracker()

videoanalytics = Videoanalytics(classes_for_videoanalytics)

# Open input video
cap = cv2.VideoCapture(input_video)

# Setup output_video
fps = cap.get(cv2.CAP_PROP_FPS)
target_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
target_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
video_output_size = (target_width, target_height)
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # codec tag that matches the .mp4 container
output_video = cv2.VideoWriter('output.mp4', fourcc, fps, video_output_size)




# Main loop
timestamp = 0
while cap.isOpened():
    timestamp += 1
    ret, frame = cap.read()

    # Stop when the video ends or a frame cannot be read
    if not ret:
        break
    
    detections_mat = ie_detector.detect(frame)
    detections = tracker.filter_detections(detections_mat, 0.5)

    tracker.process_new_frame(frame, detections, timestamp)

    result_image = videoanalytics.draw_videoanalytics(frame, tracker)

    output_video.write(result_image)

cap.release()
output_video.release()

In [None]:
# play result video
from IPython.display import Video
output_video = 'output.mp4'
Video(output_video, embed=True) 

You can use the resulting Videoanalytics class paired with the MatchingTracker and InferenceEngineDetector classes in your own deep learning projects.


### Additional tasks


The developed sample contains the minimum required functionality. As additional tasks, it is proposed to provide support for the following features:

1.	Storing information about all detected objects in tracks is inefficient. It is proposed to filter out non-vehicles immediately after detection and to track only vehicles.

2.	The detector can predict different classes for the same object on different frames. It is proposed to implement a scheme that determines the object class from the full track information rather than from the last frame only. The simplest scheme is voting: the object class is the one assigned to the majority of the bounding boxes in the track.
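
As a starting point for the voting scheme, the idea can be sketched on a plain list of per-frame class ids. The function name `vote_class` and the sample ids are hypothetical. This is one possible reading of majority voting (the most frequent class wins, and ties go to the class seen first), not a prescribed solution.

```python
from collections import Counter


def vote_class(class_ids):
    """Most frequent class id in a track; ties go to the id seen first.

    Assumes the track contains at least one detection.
    """
    counts = Counter(int(class_id) for class_id in class_ids)
    return counts.most_common(1)[0][0]


# Made-up per-frame predictions for one track: mostly class 7, sometimes 6
track_class_ids = [7.0, 7.0, 6.0, 7.0, 6.0]
voted = vote_class(track_class_ids)
print(voted)
```

In the tracker, the list of ids would come from the `class_id` field of every detection stored in a track, so a single misclassified last frame no longer changes the counted class.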


It is proposed to solve these tasks independently using the documentation and examples included in the OpenVINO Toolkit.