# Practice №3. Video object tracking using OpenVINO

In this practice, we solve the problem of object tracking via constructing object movement trajectories and develop a solution using OpenVINO.

To enable OpenVINO support in Jupyter Notebook, open a command line, activate the virtual environment (if any), run the __setupvars__ script to initialize the OpenVINO environment, and then start Jupyter Notebook.

For Windows:

```bash
<virtual_enviroment>/Scripts/activate.bat
"C:\Program Files (x86)\Intel\openvino_2021\bin\setupvars.bat"
jupyter notebook
```

For Linux:
```bash
source <virtual_enviroment>/bin/activate
source /opt/intel/openvino_2021/bin/setupvars.sh
jupyter notebook
```

In [None]:
# Checking that OpenVINO is installed and connected
import openvino

Load a sample video for object tracking using the code below, or use your own video.

In [None]:
# Install wget (for Windows)
!pip install wget

# Download test video from a road camera
!python -m wget http://hpc-education.unn.ru/files/courses/intel-dl-cv-course/camera.mp4

In [None]:
input_video = 'camera.mp4'

from IPython.display import Video

Video(input_video, embed=True) 

### Run OpenVINO object tracking sample

Open Model Zoo contains a large object tracking demo that uses two deep models: one to detect objects in the image and another to calculate the "similarity" of objects.

For tracking we will use models trained on pedestrians.

In [None]:
# Install requirements for demo
!pip install -r "C:\Program Files (x86)\Intel\openvino_2021\deployment_tools\open_model_zoo\demos\python_demos\multi_camera_multi_target_tracking\requirements.txt"


In [None]:
# Download object detection model
#!python /opt/intel/openvino_2021/deployment_tools/tools/model_downloader/downloader.py --name person-detection-retail-0013
!python "C:\Program Files (x86)\Intel\openvino_2021\deployment_tools\tools\model_downloader\downloader.py" --name person-detection-retail-0013

# Download person-reidentification model
#!python /opt/intel/openvino_2021/deployment_tools/tools/model_downloader/downloader.py --name person-reidentification-retail-0277  
!python "C:\Program Files (x86)\Intel\openvino_2021\deployment_tools\tools\model_downloader\downloader.py" --name person-reidentification-retail-0277 

# Start demo and save result video to file
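#!python /opt/intel/openvino_2021/deployment_tools/open_model_zoo/demos/python_demos/multi_camera_multi_target_tracking/multi_camera_multi_target_tracking.py --no_show --output_video result.mp4 -i camera.mp4 --m_d intel/person-detection-retail-0013/FP32/person-detection-retail-0013.xml --m_reid intel/person-reidentification-retail-0277/FP32/person-reidentification-retail-0277.xml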
!python "C:\Program Files (x86)\Intel\openvino_2021\deployment_tools\open_model_zoo\demos\python_demos\multi_camera_multi_target_tracking\multi_camera_multi_target_tracking.py" --no_show --output_video result.mp4 -i camera.mp4 --m_d intel/person-detection-retail-0013/FP32/person-detection-retail-0013.xml --m_reid intel/person-reidentification-retail-0277/FP32/person-reidentification-retail-0277.xml

In [None]:
# Watch result video with tracking
input_video = 'result.mp4'

from IPython.display import Video

Video(input_video, embed=True) 

## Developing the multi object tracking application using OpenVINO 

In this practice, we will develop an algorithm for tracking by detecting objects and finding matches (tracking-by-detection, tracking-by-matching).

### General scheme of tracking algorithm
The input of object tracking is a sequence of video frames. The output is a set of object location sequences on the input frames. The formal problem statement is given in the lecture “Deep models for tracking objects in the video” of this course.
The general scheme of the algorithm consists of several stages. It is assumed that, at the start of each step, partial tracks have already been constructed for the objects detected so far. A track is a sequence of object locations (bounding boxes) on video frames.

1.	Extract current video frame.

2.	Detect objects in this frame.

3.	Calculate the similarity matrix between the detected objects and objects for which separate parts of the tracks are constructed.

4.	Using the given similarity matrix, answer the following questions:

    4.1.	What tracks does the detected object correspond to?
    
    4.2.	What objects appeared for the first time on the video (do not correspond to any of the existing tracks)?
    
    4.3.	For which tracks on the new frame was the object not detected (the track ended because of the object leaving the camera's field of view)?
    
5.	In accordance with the answers, update the positions in the tracks, create new tracks for newly discovered objects.

6.	Repeat from the first step until the video ends.

The first step of the described scheme is performed by standard libraries. The second step requires solving the object detection problem using deep learning models; detection based on deep learning was described in the previous practice. Steps 3–5 are described in detail below.
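
The overall loop can be sketched in a few lines. This is only an illustration of the control flow: `run_tracking`, `toy_detect` and `toy_update` are hypothetical names, and the toy update pairs detections with tracks by index instead of performing real matching.

```python
def run_tracking(frames, detect, update_tracks):
    """Generic tracking-by-detection loop; detect and update_tracks come from the caller."""
    tracks = []
    for frame_idx, frame in enumerate(frames):        # steps 1 and 6
        detections = detect(frame)                    # step 2
        update_tracks(tracks, detections, frame_idx)  # steps 3-5
    return tracks

# Toy stand-ins: a "frame" is already a list of object positions, and the
# update simply pairs detections with tracks by index (no real matching).
def toy_detect(frame):
    return list(frame)

def toy_update(tracks, detections, frame_idx):
    for k, det in enumerate(detections):
        if k < len(tracks):
            tracks[k].append((frame_idx, det))
        else:
            tracks.append([(frame_idx, det)])

frames = [[10], [11, 50], [12, 51]]
result = run_tracking(frames, toy_detect, toy_update)
print(result)
# [[(0, 10), (1, 11), (2, 12)], [(1, 50), (2, 51)]]
```

In the real application, the detector from the previous practice would take the place of `toy_detect`, and the matching logic described below would take the place of `toy_update`.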

### Calculating similarity matrix

Similarity matrix A between N tracks and M objects is a matrix of the size N×M, where each element a_ij represents the similarity coefficient between the track Ti and the object Rj. Using this matrix, you can find the best matches between the detected objects and existing tracks.

The easiest way to calculate the similarity coefficient of the track Ti and the new object Rj is shown below:

1.	Extract the last object position (bounding box) from the track Ti.

2.	Compare the object in the track and the detected object Rj according to some features.

To compare objects, one or more of the following features can be used: location, shape, and appearance. Let D be the distance between the centers of the bounding boxes (the intersections of their diagonals), (w1,h1) the width and height of the last bounding box in the track, (w2,h2) the width and height of the detected bounding box, and C1, C2 the weights of the features in the similarity coefficient. Then the similarity coefficient for the location feature is calculated as follows:


$$\mathrm{affinity}_{\mathrm{place}} = e^{-C_1 \frac{D^2}{w_1 h_1}},$$

and the formula for calculating the similarity coefficient by the shape feature is represented below:


$$\mathrm{affinity}_{\mathrm{shapes}} = e^{-C_2\left(\frac{|w_1-w_2|}{w_1}+\frac{|h_1-h_2|}{h_1}\right)}.$$

The similarity coefficient is determined as follows:


$$\mathrm{affinity} = \mathrm{affinity}_{\mathrm{place}} \cdot \mathrm{affinity}_{\mathrm{shapes}}.$$
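
As a quick numeric check, the sketch below evaluates the location and shape coefficients for a few boxes. The function names, the `(x_center, y_center, w, h)` tuple layout and the weights C1 = 0.7, C2 = 0.5 are assumptions made for illustration. Absolute differences are used in the shape term so that every coefficient stays in the range (0, 1].

```python
import math

def affinity_place(box1, box2, c1):
    """Location similarity: exp(-C1 * D^2 / (w1 * h1))."""
    x1, y1, w1, h1 = box1
    x2, y2, _, _ = box2
    d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2  # squared distance between centers
    return math.exp(-c1 * d2 / (w1 * h1))

def affinity_shapes(box1, box2, c2):
    """Shape similarity: exp(-C2 * (|w1-w2|/w1 + |h1-h2|/h1))."""
    _, _, w1, h1 = box1
    _, _, w2, h2 = box2
    return math.exp(-c2 * (abs(w1 - w2) / w1 + abs(h1 - h2) / h1))

def affinity(box1, box2, c1=0.7, c2=0.5):
    """Total similarity: product of the location and shape terms."""
    return affinity_place(box1, box2, c1) * affinity_shapes(box1, box2, c2)

track_box = (0.50, 0.50, 0.10, 0.20)   # last box in the track
near_box = (0.52, 0.51, 0.10, 0.20)    # same size, slightly shifted
far_box = (0.90, 0.10, 0.05, 0.10)     # far away and half the size

print(affinity(track_box, track_box))  # identical boxes give 1.0
print(affinity(track_box, near_box) > affinity(track_box, far_box))  # True
```

A nearby box of the same size keeps a coefficient close to 1, while a distant, differently sized box drops towards 0, which is what the matching step relies on.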


### Searching for the best matches by similarity matrix

The problem of finding matches from the similarity matrix reduces to the assignment problem:

- There are N agents {1,...,N} and N tasks {1,...,N} to be distributed among these agents.
- Only one task can be assigned to each agent, and each task can be assigned to only one agent; assigning task j=f(i) to agent i has cost a(i,j)≥0.
- The assignment problem is to find a feasible set of assignments A={(i1,j1),…,(iN,jN)} of the minimum total cost: ∑a(i,f(i)) → min.

Finding the best matches between tracks and detections from the similarity matrix is a problem of maximizing the total similarity. To reduce it to the assignment problem, perform the following steps:

1.	Make matrix A square by adding empty rows or columns filled with zeros.

2.	Reduce the maximization problem to a minimization problem. Since 0≤a_ij≤1, it is enough to replace each element of the similarity matrix with a_ij'=1-a_ij.

To solve the assignment problem, you can use the linear_sum_assignment function of the scipy.optimize package (the Hungarian algorithm is the classic method for this problem).
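
The reduction can be checked by hand on a tiny matrix. The sketch below pads the similarity matrix, converts it to costs, and then tries every permutation. This brute-force search is a stand-in chosen for clarity, not the Hungarian algorithm, and is only practical for very small matrices. The name `best_matches` is hypothetical.

```python
from itertools import permutations

def best_matches(similarity):
    """Match tracks (rows) to detections (columns), maximizing total similarity.

    Brute force over permutations: only suitable for tiny matrices.
    """
    n_tracks = len(similarity)
    n_dets = len(similarity[0]) if similarity else 0
    size = max(n_tracks, n_dets)
    # Step 1: pad with zero similarity to make the matrix square.
    padded = [[similarity[i][j] if i < n_tracks and j < n_dets else 0.0
               for j in range(size)] for i in range(size)]
    # Step 2: turn maximization into minimization, a'_ij = 1 - a_ij.
    cost = [[1.0 - a for a in row] for row in padded]
    # Step 3: pick the permutation f with the minimum total cost.
    best = min(permutations(range(size)),
               key=lambda f: sum(cost[i][f[i]] for i in range(size)))
    # Keep only pairs that refer to a real track and a real detection.
    return [(i, j) for i, j in enumerate(best) if i < n_tracks and j < n_dets]

# 2 tracks, 3 detections
similarity = [
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.1],
]
print(best_matches(similarity))  # [(0, 0), (1, 1)]
```

The third detection ends up paired with the padding row, so it is left unmatched and becomes a candidate for a new track.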

### Updating tracks

Once the matches between tracks and objects have been constructed, the tracks must be updated.

1.	If the similarity coefficient between the object and the track exceeds a threshold (usually in the range 0.02 – 0.5), then append the object to the track.

2.	If the object is not appended to any track, then create a new track and append the detected object to this track.
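
These two rules can be sketched as follows. Tracks are plain lists and detections are strings here, and `update_tracks` with its `threshold` default is a hypothetical helper used only to illustrate the rules.

```python
def update_tracks(tracks, detections, matches, similarity, threshold=0.5):
    """tracks: list of lists of detections; matches: (track_idx, det_idx) pairs."""
    appended = set()
    # Rule 1: append a matched detection if it is similar enough to the track.
    for t, d in matches:
        if similarity[t][d] > threshold:
            tracks[t].append(detections[d])
            appended.add(d)
    # Rule 2: every detection that was not appended starts a new track.
    for d, det in enumerate(detections):
        if d not in appended:
            tracks.append([det])
    return tracks

tracks = [["a0"], ["b0"]]
detections = ["a1", "x", "c0"]
matches = [(0, 0), (1, 1)]
similarity = [[0.9, 0.0, 0.1], [0.1, 0.2, 0.0]]
result = update_tracks(tracks, detections, matches, similarity)
print(result)
# [['a0', 'a1'], ['b0'], ['x'], ['c0']]
```

Detection `x` was matched to the second track, but with a similarity of only 0.2, so it starts its own track instead of extending `b0`.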


In [None]:
# Install scipy to solve the assignment problem
!pip install scipy

### Developing data structures

For this practice, create two classes: Tracklet and MatchingTracker.

The base entity, which contains information about the object location in the image, is the named tuple DetectedObject.

- confidence is a confidence of the object location in the selected area (float).
- frame_idx is a frame number (integer).
- object_id is a track identifier (integer, takes value -1 if the object is not assigned to any track).
- timestamp is a timestamp (integer, in milliseconds), it is required to track the point in time when an object was detected.
- class_id is an object class identifier (integer, used to display the class name on the screen).
- x_left, y_bottom, x_right, y_top are coordinates of the bounding boxes in the range from 0 to 1.

In [None]:
from collections import namedtuple
DetectedObject = namedtuple(
    'DetectedObject', 
    ['confidence', 'frame_idx', 'object_id', 'timestamp', 'class_id', 'x_left', 'y_bottom', 'x_right', 'y_top',])
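
A short usage sketch of the named tuple is shown below. The field values are made up for illustration, and the definition is repeated so that the snippet runs on its own.

```python
from collections import namedtuple

DetectedObject = namedtuple(
    'DetectedObject',
    ['confidence', 'frame_idx', 'object_id', 'timestamp', 'class_id',
     'x_left', 'y_bottom', 'x_right', 'y_top'])

# Hypothetical detection: a box found on frame 12 at 480 ms, not yet in a track
det = DetectedObject(confidence=0.87, frame_idx=12, object_id=-1,
                     timestamp=480, class_id=1,
                     x_left=0.40, y_bottom=0.30, x_right=0.50, y_top=0.60)

width = det.x_right - det.x_left   # fields are read by name
# Named tuples are immutable; _replace returns a modified copy
assigned = det._replace(object_id=3)
print(det.object_id, assigned.object_id)  # -1 3
```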

A class to represent a track is Tracklet. This class stores a list of DetectedObject objects and provides several methods.

- __init__ is a constructor that creates an internal list for storing DetectedObject items.
- __len__ is a method to get the length of the object list.
- __getitem__ is a method to get an object by its index.
- add_new_detection is a method to add a new DetectedObject to the list.

A common practice in Python is to define a data type as a named tuple and wrap a collection of such tuples in a class that stores them and provides access to the elements. To make tracks behave like native Python sequences, the standard methods __len__ and __getitem__ are implemented.


In [None]:
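class Tracklet():
    def __init__(self, detection=None):
        # Detections (DetectedObject) that make up the track, oldest first
        self._trackedObjects = []
        # Avoid a mutable default argument; start empty if nothing is passed
        if detection is not None:
            self._trackedObjects.append(detection)
    def __len__(self):
        return len(self._trackedObjects)
    def __getitem__(self, position):
        return self._trackedObjects[position]
    def add_new_detection(self, detection):
        self._trackedObjects.append(detection)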
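
The sketch below shows what implementing `__len__` and `__getitem__` provides in practice. A minimal copy of the class is repeated so that the snippet runs on its own, and strings stand in for DetectedObject items.

```python
class Tracklet:
    """Minimal copy of the class above, repeated so the example runs alone."""
    def __init__(self, detection):
        self._trackedObjects = [detection]
    def __len__(self):
        return len(self._trackedObjects)
    def __getitem__(self, position):
        return self._trackedObjects[position]
    def add_new_detection(self, detection):
        self._trackedObjects.append(detection)

track = Tracklet("box@frame0")
track.add_new_detection("box@frame1")
track.add_new_detection("box@frame2")

print(len(track))        # 3
print(track[-1])         # box@frame2 (the last position, used for matching)
print(track[0:2])        # ['box@frame0', 'box@frame1']
print(list(track))       # iteration works through __getitem__
```

Negative indexing is what lets the tracker take the last position of a track with `track[-1]`.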


The main part of this application is the MatchingTracker class. This class creates and stores tracks, and also provides matching between tracks and objects detected in the frame. The MatchingTracker class contains the following methods:

- __init__ is a constructor that creates an internal list of Tracklets.
- add_new_track is a method to add a new track.
- filter_detections is a method that filters the detection output and creates objects of type DetectedObject from the detection output.
- _shape_affinity is a method that calculates the similarity coefficient of two objects based on their size.
- _place_affinity is a method that calculates the similarity coefficient of two objects based on their location.
- _affinity is a method that calculates the total similarity coefficient based on shape and location affinities.
- _compute_dissimilarity_matrix is a method for constructing a two-dimensional dissimilarity matrix, in which each element is 1 minus the similarity coefficient between a track and a detected object.
- _solve_assignment_problem is a method to search for the best matches of tracks and detections by solving the assignment problem.
- process_new_frame is a public method that processes the new frame.
- draw_active_tracks is a method for drawing existing tracks on the frame.

In [None]:
class MatchingTracker:
    """
    Class that stores and processes tracks based on images
    """

    def __init__(self, dist_weight=0.02, shape_affinity_weight=0.5,
            place_affinity_weight=0.7):
        pass
    def add_new_track(self, detection):
        pass
    def filter_detections(self, detect_mat, threshold=0.5):
        pass
    def process_new_frame(self, frame, detections, timestamp):
        pass
    def _solve_assignment_problem(self, tracks, detections):
        pass
    def _compute_dissimilarity_matrix(self, tracks, detections):
        pass
    def _affinity(self, obj1, obj2):
        pass
    def _shape_affinity(self, obj1, obj2, weight):
        pass
    def _place_affinity(self, o1, o2, weight):
        pass     
    def draw_active_tracks(self, image):
        pass


__Developing method to calculate the similarity coefficient between tracks and detections based on the place affinity__

The method for calculating the similarity coefficient between a track and a detection based on place affinity takes the last object location in the track, the detection, and the weight “place affinity weight score”. This method calculates the similarity coefficient according to the formula presented in the algorithm described in the previous section. The result of the method is a value from 0 to 1.


__Developing method to calculate the similarity coefficient between tracks and detections based on the shape affinity__

The method for calculating the similarity coefficient between a track and a detection based on shape affinity takes the last object location in the track, the detection, and the weight “shape affinity weight score”. This method calculates the similarity coefficient according to the formula presented in the algorithm described in the previous section. The result of the method is a value from 0 to 1.


__Developing method to calculate the total similarity coefficient between track and detection__

The _affinity method calculates the total similarity coefficient between a track and a detection as the product of the similarity coefficients obtained with the previous two methods. The result of this function is a value from 0 to 1.

In [None]:
import math

class MatchingTracker:
    """
    Class that stores and processes tracks based on images
    """

    def __init__(self, dist_weight=0.02, shape_affinity_weight=0.5,
            place_affinity_weight=0.7):
        pass
    def add_new_track(self, detection):
        pass
    def filter_detections(self, detect_mat, threshold=0.5):
        pass
    def process_new_frame(self, frame, detections, timestamp):
        pass
    def _solve_assignment_problem(self, tracks, detections):
        pass
    def _compute_dissimilarity_matrix(self, tracks, detections):
        pass
    
# Calculate affinity
    def _affinity(self, obj1, obj2):
        shp_aff = self._shape_affinity(obj1, obj2, self._shape_affinity_weight)
        mot_aff = self._place_affinity(obj1, obj2, self._place_affinity_weight)
        return shp_aff * mot_aff
    
# Calculate shape affinity    
    def _shape_affinity(self, obj1, obj2, weight):
        obj1_width = obj1.x_right-obj1.x_left
        obj2_width = obj2.x_right-obj2.x_left
        obj1_height = obj1.y_top - obj1.y_bottom
        obj2_height = obj2.y_top - obj2.y_bottom
        w_dist = abs(obj1_width - obj2_width) / (obj1_width + obj2_width)
        h_dist = abs(obj1_height - obj2_height) / (obj1_height + obj2_height)
        return math.exp(-weight * (w_dist + h_dist))
    
# Calculate place affinity
    def _place_affinity(self, o1, o2, weight):
        obj1_x = (o1.x_left + o1.x_right) * 0.5
        obj1_y = (o1.y_top + o1.y_bottom) * 0.5
        obj2_x = (o2.x_left + o2.x_right) * 0.5
        obj2_y = (o2.y_top + o2.y_bottom) * 0.5
        obj2_width = o2.x_right-o2.x_left
        obj2_height = o2.y_top - o2.y_bottom
        x_dist = ((obj1_x - obj2_x) ** 2) / (obj2_width ** 2)
        y_dist = ((obj1_y - obj2_y) ** 2) / (obj2_height ** 2)
        return math.exp(-weight * (x_dist + y_dist))

__Developing method to construct similarity matrix of coefficients between tracks and objects__

The _compute_dissimilarity_matrix method constructs a matrix where cell [i,j] contains the dissimilarity (1 minus the similarity coefficient) of track i and detection j. To solve the assignment problem, the matrix should be square, that is, the number of tracks should equal the number of detections. If this is not the case, empty rows or columns with zero elements are added to the matrix.


__Developing method for solving the assignment problem__

The _solve_assignment_problem method searches for the best matches between the tracks and detections by solving the assignment problem. The input of this method is a list of tracks and a list of detections; the output is a set of matched pairs of track indices (rows of the matrix) and bounding box indices (columns of the matrix).


In [None]:
from scipy.optimize import linear_sum_assignment
import numpy as np
import math

class MatchingTracker:
    """
    Class that stores and processes tracks based on images
    """

    def __init__(self, dist_weight=0.02, shape_affinity_weight=0.5,
            place_affinity_weight=0.7):
        pass
    def add_new_track(self, detection):
        pass
    def filter_detections(self, detect_mat, threshold=0.5):
        pass
    def process_new_frame(self, frame, detections, timestamp):
        pass
    
    def _affinity(self, obj1, obj2):
        shp_aff = self._shape_affinity(obj1, obj2, self._shape_affinity_weight)
        mot_aff = self._place_affinity(obj1, obj2, self._place_affinity_weight)
        return shp_aff * mot_aff      
    
    def _shape_affinity(self, obj1, obj2, weight):
        obj1_width = obj1.x_right-obj1.x_left
        obj2_width = obj2.x_right-obj2.x_left
        obj1_height = obj1.y_top - obj1.y_bottom
        obj2_height = obj2.y_top - obj2.y_bottom
        w_dist = abs(obj1_width - obj2_width) / (obj1_width + obj2_width)
        h_dist = abs(obj1_height - obj2_height) / (obj1_height + obj2_height)
        return math.exp(-weight * (w_dist + h_dist))
    
    def _place_affinity(self, o1, o2, weight):
        obj1_x = (o1.x_left + o1.x_right) * 0.5
        obj1_y = (o1.y_top + o1.y_bottom) * 0.5
        obj2_x = (o2.x_left + o2.x_right) * 0.5
        obj2_y = (o2.y_top + o2.y_bottom) * 0.5
        obj2_width = o2.x_right-o2.x_left
        obj2_height = o2.y_top - o2.y_bottom
        x_dist = ((obj1_x - obj2_x) ** 2) / (obj2_width ** 2)
        y_dist = ((obj1_y - obj2_y) ** 2) / (obj2_height ** 2)
        return math.exp(-weight * (x_dist + y_dist))
   
    # Build the square (padded) dissimilarity matrix
    def _compute_dissimilarity_matrix(self, tracks, detections):
        size = max(len(tracks), len(detections))
        diss_mat = np.zeros(shape=(size, size), dtype=float)
        for i in range(len(tracks)):
            for j in range(len(detections)):
                diss_mat[i, j] = 1.0 - \
                    self._affinity(tracks[i][-1], detections[j])
        return diss_mat

    # Solve the assignment problem on the dissimilarity matrix
    def _solve_assignment_problem(self, tracks, detections):
        dissimilarity_mat = self._compute_dissimilarity_matrix(
            tracks, detections)
        row_ind, col_ind = linear_sum_assignment(dissimilarity_mat)
        return row_ind, col_ind

__Developing method for filtering detections before assignment__

The filter_detections method receives the result of object detection using a deep model (for example, the SSD300 model considered in the previous practice), and creates a list of DetectedObject objects that will be processed. Objects for which the confidence is less than a certain threshold (0.5 by default) are discarded and are not included in the list of detected objects.
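
To make the input layout concrete, the sketch below filters a hand-written detection output. It assumes the SSD-style output blob of shape [1, 1, N, 7], where each row is `[image_id, label, confidence, x_min, y_min, x_max, y_max]`. Nested lists are used instead of a NumPy array, and `filter_rows` is a hypothetical helper.

```python
# One SSD-style output blob of shape [1, 1, N, 7], written as nested lists.
# Each row: [image_id, label, confidence, x_min, y_min, x_max, y_max]
detect_mat = [[[
    [0, 1, 0.92, 0.10, 0.20, 0.30, 0.60],
    [0, 1, 0.35, 0.50, 0.50, 0.55, 0.70],   # below the threshold
    [0, 1, 0.71, 0.60, 0.25, 0.75, 0.65],
]]]

def filter_rows(detect_mat, threshold=0.5):
    """Return the rows whose confidence (column 2) exceeds the threshold."""
    rows = detect_mat[0][0]
    return [row for row in rows if row[2] > threshold]

kept = filter_rows(detect_mat)
print(len(kept))                 # 2
print([row[2] for row in kept])  # [0.92, 0.71]
```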

In [None]:
from scipy.optimize import linear_sum_assignment
import numpy as np
import math

class MatchingTracker:
    """
    Class that stores and processes tracks based on images
    """

    def __init__(self, dist_weight=0.02, shape_affinity_weight=0.5,
            place_affinity_weight=0.7):
        pass
    def add_new_track(self, detection):
        pass
    def process_new_frame(self, frame, detections, timestamp):
        pass
    
    def _affinity(self, obj1, obj2):
        shp_aff = self._shape_affinity(obj1, obj2, self._shape_affinity_weight)
        mot_aff = self._place_affinity(obj1, obj2, self._place_affinity_weight)
        return shp_aff * mot_aff      
    
    def _shape_affinity(self, obj1, obj2, weight):
        obj1_width = obj1.x_right-obj1.x_left
        obj2_width = obj2.x_right-obj2.x_left
        obj1_height = obj1.y_top - obj1.y_bottom
        obj2_height = obj2.y_top - obj2.y_bottom
        w_dist = abs(obj1_width - obj2_width) / (obj1_width + obj2_width)
        h_dist = abs(obj1_height - obj2_height) / (obj1_height + obj2_height)
        return math.exp(-weight * (w_dist + h_dist))
    
    def _place_affinity(self, o1, o2, weight):
        obj1_x = (o1.x_left + o1.x_right) * 0.5
        obj1_y = (o1.y_top + o1.y_bottom) * 0.5
        obj2_x = (o2.x_left + o2.x_right) * 0.5
        obj2_y = (o2.y_top + o2.y_bottom) * 0.5
        obj2_width = o2.x_right-o2.x_left
        obj2_height = o2.y_top - o2.y_bottom
        x_dist = ((obj1_x - obj2_x) ** 2) / (obj2_width ** 2)
        y_dist = ((obj1_y - obj2_y) ** 2) / (obj2_height ** 2)
        return math.exp(-weight * (x_dist + y_dist))
    def _compute_dissimilarity_matrix(self, tracks, detections):
        size = max(len(tracks), len(detections))
        diss_mat = np.zeros(shape=(size, size), dtype=float)
        for i in range(len(tracks)):
            for j in range(len(detections)):
                diss_mat[i, j] = 1.0 - \
                    self._affinity(tracks[i][-1], detections[j])
        return diss_mat
    def _solve_assignment_problem(self, tracks, detections):
        dissimilarity_mat = self._compute_dissimilarity_matrix(
            tracks, detections)
        row_ind, col_ind = linear_sum_assignment(dissimilarity_mat)
        return row_ind, col_ind

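    # Keep only detections whose confidence exceeds the threshold
    def filter_detections(self, detect_mat, threshold=0.5):
        # Output blob has shape [1, 1, N, 7]; take the N x 7 table of rows
        detect_mat = detect_mat[0, 0, :, :]
        detections = []
        for i in range(detect_mat.shape[0]):
            # Parse one row: [image_id, label, conf, x_min, y_min, x_max, y_max]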
            conf = detect_mat[i, 2]
            if conf > threshold:
                detection = DetectedObject(
                    detect_mat[i, 2],
                    -1,
                    -1,
                    -1,
                    detect_mat[i, 1],
                    detect_mat[i, 3],
                    detect_mat[i, 4],
                    detect_mat[i, 5],
                    detect_mat[i, 6])
                detections.append(detection)
        return detections    
    

__Developing method to create a new track__

This method receives a new detection as an input, creates a new track and adds it to the track list.


__Developing method for processing a video frame__

The method receives a list of bounding boxes detected on the new frame as input, constructs the dissimilarity matrix, and solves the assignment problem. For matches whose similarity coefficient is above the threshold, the corresponding tracks are updated. If the similarity coefficient is low, new tracks are created for such detections.

In [None]:
from scipy.optimize import linear_sum_assignment
import numpy as np
import math

class MatchingTracker:
    """
    Class that stores and processes tracks based on images
    """

    def __init__(self, dist_weight=0.02, shape_affinity_weight=0.5,
            place_affinity_weight=0.7):
        pass
    
    def _affinity(self, obj1, obj2):
        shp_aff = self._shape_affinity(obj1, obj2, self._shape_affinity_weight)
        mot_aff = self._place_affinity(obj1, obj2, self._place_affinity_weight)
        return shp_aff * mot_aff      
    
    def _shape_affinity(self, obj1, obj2, weight):
        obj1_width = obj1.x_right-obj1.x_left
        obj2_width = obj2.x_right-obj2.x_left
        obj1_height = obj1.y_top - obj1.y_bottom
        obj2_height = obj2.y_top - obj2.y_bottom
        w_dist = abs(obj1_width - obj2_width) / (obj1_width + obj2_width)
        h_dist = abs(obj1_height - obj2_height) / (obj1_height + obj2_height)
        return math.exp(-weight * (w_dist + h_dist))
    
    def _place_affinity(self, o1, o2, weight):
        obj1_x = (o1.x_left + o1.x_right) * 0.5
        obj1_y = (o1.y_top + o1.y_bottom) * 0.5
        obj2_x = (o2.x_left + o2.x_right) * 0.5
        obj2_y = (o2.y_top + o2.y_bottom) * 0.5
        obj2_width = o2.x_right-o2.x_left
        obj2_height = o2.y_top - o2.y_bottom
        x_dist = ((obj1_x - obj2_x) ** 2) / (obj2_width ** 2)
        y_dist = ((obj1_y - obj2_y) ** 2) / (obj2_height ** 2)
        return math.exp(-weight * (x_dist + y_dist))
    
    def _compute_dissimilarity_matrix(self, tracks, detections):
        size = max(len(tracks), len(detections))
        diss_mat = np.zeros(shape=(size, size), dtype=float)
        for i in range(len(tracks)):
            for j in range(len(detections)):
                diss_mat[i, j] = 1.0 - \
                    self._affinity(tracks[i][-1], detections[j])
        return diss_mat
    
    def _solve_assignment_problem(self, tracks, detections):
        dissimilarity_mat = self._compute_dissimilarity_matrix(
            tracks, detections)
        row_ind, col_ind = linear_sum_assignment(dissimilarity_mat)
        return row_ind, col_ind

    def filter_detections(self, detect_mat, threshold=0.5):
        detect_mat = detect_mat[0, 0, :, :]
        detections = []
        for i in range(detect_mat.shape[0]):
            # Parse one string in ie_detection_output
            conf = detect_mat[i, 2]
            if conf > threshold:
                detection = DetectedObject(
                    detect_mat[i, 2],
                    -1,
                    -1,
                    -1,
                    detect_mat[i, 1],
                    detect_mat[i, 3],
                    detect_mat[i, 4],
                    detect_mat[i, 5],
                    detect_mat[i, 6])
                detections.append(detection)
        return detections  

# Add new tracklet to tracker
    def add_new_track(self, detection):
        track = Tracklet(detection)
        self._tracks.append(track)
        
    # Process a new video frame
    def process_new_frame(self, frame, detections, timestamp):
        """
        First, build the dissimilarity matrix between tracklets and detections,
        where affinity = place_affinity * shape_affinity.
        Second, solve the assignment problem with linear_sum_assignment.
        A matched detection is appended to its track only if the pair is
        similar enough; every other detection starts a new track.
        """
        if self._tracks and detections:
            row_indexes, col_indexes = self._solve_assignment_problem(
                self._tracks, detections)
            # Tracks created below are not part of the matrix, so fix the count
            num_tracks = len(self._tracks)
            # For each assignment
            for row_id, col_id in zip(row_indexes, col_indexes):
                if col_id >= len(detections):
                    # Padding column: this track has no detection in the frame
                    continue
                matched = False
                # If both the track and the detection are real (not padding)
                if row_id < num_tracks:
                    dist = 1.0 - \
                        self._affinity(
                            self._tracks[row_id][-1], detections[col_id])
                    # Add detection to track if objects are close
                    if dist < self._dist_weight:
                        self._tracks[row_id].add_new_detection(
                            detections[col_id])
                        matched = True
                if not matched:
                    self.add_new_track(detections[col_id])
        else:
            # No tracks yet (or no detections): every detection starts a track
            for detection in detections:
                self.add_new_track(detection)
        return

__Developing method for displaying tracks__

The draw_active_tracks method draws tracks on the frame. Each track is drawn as follows: the centers of the bounding boxes in frames i and i+1 are calculated and a line is drawn between them.
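
The geometry behind the drawing can be checked without OpenCV. The sketch below converts normalized boxes to pixel centers and builds the list of segments that would be passed to a line-drawing call. `center_px`, `track_segments` and the 1280×720 frame size are assumptions made for illustration.

```python
def center_px(box, frame_width, frame_height):
    """box: (x_left, y_bottom, x_right, y_top) in [0, 1]; returns pixel center."""
    x_left, y_bottom, x_right, y_top = box
    cx = int((x_left + x_right) * frame_width) // 2
    cy = int((y_bottom + y_top) * frame_height) // 2
    return cx, cy

def track_segments(boxes, frame_width, frame_height):
    """Pairs of consecutive centers: one line segment per pair of frames."""
    centers = [center_px(b, frame_width, frame_height) for b in boxes]
    return list(zip(centers[:-1], centers[1:]))

boxes = [
    (0.25, 0.25, 0.50, 0.75),
    (0.50, 0.25, 0.75, 0.75),
    (0.50, 0.50, 0.75, 1.00),
]
segments = track_segments(boxes, 1280, 720)
print(segments)
# [((480, 360), (800, 360)), ((800, 360), (800, 540))]
```

Note that x coordinates are scaled by the frame width and y coordinates by the frame height, while `image.shape` in OpenCV lists the height first.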


__Creating entity for object tracking__

The constructor of the MatchingTracker class creates an empty list of tracks and sets parameters for calculating the similarity coefficients of objects. These parameters should be tuned separately for each video.

In [None]:
from scipy.optimize import linear_sum_assignment
import numpy as np
import math

class MatchingTracker:
    """
    Class that stores and processes tracks based on images
    """

    def _affinity(self, obj1, obj2):
        shp_aff = self._shape_affinity(obj1, obj2, self._shape_affinity_weight)
        mot_aff = self._place_affinity(obj1, obj2, self._place_affinity_weight)
        return shp_aff * mot_aff      
    
    def _shape_affinity(self, obj1, obj2, weight):
        obj1_width = obj1.x_right-obj1.x_left
        obj2_width = obj2.x_right-obj2.x_left
        obj1_height = obj1.y_top - obj1.y_bottom
        obj2_height = obj2.y_top - obj2.y_bottom
        w_dist = abs(obj1_width - obj2_width) / (obj1_width + obj2_width)
        h_dist = abs(obj1_height - obj2_height) / (obj1_height + obj2_height)
        return math.exp(-weight * (w_dist + h_dist))
    
    def _place_affinity(self, o1, o2, weight):
        obj1_x = (o1.x_left + o1.x_right) * 0.5
        obj1_y = (o1.y_top + o1.y_bottom) * 0.5
        obj2_x = (o2.x_left + o2.x_right) * 0.5
        obj2_y = (o2.y_top + o2.y_bottom) * 0.5
        obj2_width = o2.x_right-o2.x_left
        obj2_height = o2.y_top - o2.y_bottom
        x_dist = ((obj1_x - obj2_x) ** 2) / (obj2_width ** 2)
        y_dist = ((obj1_y - obj2_y) ** 2) / (obj2_height ** 2)
        return math.exp(-weight * (x_dist + y_dist))
    
    def _compute_dissimilarity_matrix(self, tracks, detections):
        size = max(len(tracks), len(detections))
        diss_mat = np.zeros(shape=(size, size), dtype=float)
        for i in range(len(tracks)):
            for j in range(len(detections)):
                diss_mat[i, j] = 1.0 - \
                    self._affinity(tracks[i][-1], detections[j])
        return diss_mat
    
    def _solve_assignment_problem(self, tracks, detections):
        dissimilarity_mat = self._compute_dissimilarity_matrix(
            tracks, detections)
        row_ind, col_ind = linear_sum_assignment(dissimilarity_mat)
        return row_ind, col_ind

    def filter_detections(self, detect_mat, threshold=0.5):
        detect_mat = detect_mat[0, 0, :, :]
        detections = []
        for i in range(detect_mat.shape[0]):
            # Parse one string in ie_detection_output
            conf = detect_mat[i, 2]
            if conf > threshold:
                detection = DetectedObject(
                    detect_mat[i, 2],
                    -1,
                    -1,
                    -1,
                    detect_mat[i, 1],
                    detect_mat[i, 3],
                    detect_mat[i, 4],
                    detect_mat[i, 5],
                    detect_mat[i, 6])
                detections.append(detection)
        return detections  

    def add_new_track(self, detection):
        track = Tracklet(detection)
        self._tracks.append(track)
        
    def process_new_frame(self, frame, detections, timestamp):
        if self._tracks and detections:
            row_indexes, col_indexes = self._solve_assignment_problem(
                self._tracks, detections)
            # Tracks created below are not part of the matrix, so fix the count
            num_tracks = len(self._tracks)
            # For each assignment
            for row_id, col_id in zip(row_indexes, col_indexes):
                if col_id >= len(detections):
                    # Padding column: this track has no detection in the frame
                    continue
                matched = False
                # If both the track and the detection are real (not padding)
                if row_id < num_tracks:
                    dist = 1.0 - \
                        self._affinity(
                            self._tracks[row_id][-1], detections[col_id])
                    # Add detection to track if objects are close
                    if dist < self._dist_weight:
                        self._tracks[row_id].add_new_detection(
                            detections[col_id])
                        matched = True
                if not matched:
                    self.add_new_track(detections[col_id])
        else:
            # No tracks yet (or no detections): every detection starts a track
            for detection in detections:
                self.add_new_track(detection)
        return

    # Constructor
    def __init__(self, dist_weight=0.5, shape_affinity_weight=0.5,
                 place_affinity_weight=0.7):
        self._tracks = []
        self._tracks_counter = 0
        self._dist_weight = dist_weight
        self._shape_affinity_weight = shape_affinity_weight
        self._place_affinity_weight = place_affinity_weight
    
    # Draw active tracklets
    def draw_active_tracks(self, image):
        # image.shape is (height, width, channels)
        h, w = image.shape[:2]
        for track in self._tracks:
            # Draw one track as line segments between consecutive box centers
            for j in range(len(track) - 1):
                cv2.line(img=image,
                    pt1=(int((track[j].x_left + track[j].x_right) * w) // 2,
                         int((track[j].y_bottom + track[j].y_top) * h) // 2),
                    pt2=(int((track[j+1].x_left + track[j+1].x_right) * w) // 2,
                         int((track[j+1].y_bottom + track[j+1].y_top) * h) // 2),
                    color=(0, 255, 0), thickness=3)
        return image
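
The index checks in `process_new_frame` (`row_id < len(self._tracks)` and `col_id < len(detections)`) make sense if the dissimilarity matrix is padded to a square, so that a detection assigned to a padding row starts a new track. The sketch below illustrates that idea under this assumption; it is not the notebook's `_solve_assignment_problem`. To stay dependency-free, it finds the minimum-cost assignment by exhaustive search with `itertools.permutations` rather than `scipy.optimize.linear_sum_assignment`, which is practical only for a handful of objects. The names `pad_to_square` and `best_assignment` and the padding cost `1.0` are hypothetical.

```python
from itertools import permutations


def pad_to_square(cost, pad_value=1.0):
    """Pad a tracks x detections cost matrix to a square one (assumed layout)."""
    n_rows = len(cost)
    n_cols = len(cost[0]) if cost else 0
    size = max(n_rows, n_cols)
    padded = [[pad_value] * size for _ in range(size)]
    for r in range(n_rows):
        for c in range(n_cols):
            padded[r][c] = cost[r][c]
    return padded


def best_assignment(square_cost):
    """Return (row_ind, col_ind) minimising the total cost by brute force."""
    size = len(square_cost)
    rows = list(range(size))
    best_cols = min(
        permutations(rows),
        key=lambda cols: sum(square_cost[r][c] for r, c in zip(rows, cols)))
    return rows, list(best_cols)


# Two existing tracks (rows) and three detections (columns);
# values are dissimilarities in [0, 1], i.e. 1 - affinity.
cost = [
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.9],
]
n_tracks, n_detections = len(cost), len(cost[0])

row_ind, col_ind = best_assignment(pad_to_square(cost))

matches, new_tracks = [], []
for row_id, col_id in zip(row_ind, col_ind):
    if row_id < n_tracks and col_id < n_detections:
        matches.append((row_id, col_id))   # detection continues a track
    elif col_id < n_detections:
        new_tracks.append(col_id)          # detection matched to padding

print(matches, new_tracks)
```

Here detections 0 and 1 continue tracks 0 and 1, while detection 2 is matched to the padding row and would start a new track. In `process_new_frame`, a matched pair is additionally accepted only if its distance is below `dist_weight`.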

To track objects, we need a ready-made detector. We will reuse the detector class developed in the previous practice, together with the ssd300 model that was downloaded and converted there.

In [None]:
import cv2
import numpy as np
from openvino.inference_engine import IECore

class InferenceEngineDetector:
    def __init__(self, configPath=None, weightsPath=None,
            device='CPU', extension=None, classesPath=None):
        self.ie = IECore()
        self.net = self.ie.read_network(model=configPath, weights=weightsPath)
        if extension:
            self.ie.add_extension(extension, 'CPU')
        self.exec_net = self.ie.load_network(network=self.net,
                                             device_name=device)
        # Class names are optional; draw_detection falls back to numeric ids
        self.classes = None
        if classesPath:
            with open(classesPath) as f:
                self.classes = [line.rstrip('\n') for line in f]

    def _prepare_image(self, image, h, w):
        if image.shape[:-1] != (h, w):
            image = cv2.resize(image, (w, h))
        image = image.transpose((2, 0, 1))
        return image

    def detect(self, image):
        input_blob = next(iter(self.net.input_info))
        out_blob = next(iter(self.net.outputs))
        n, c, h, w = self.net.input_info[input_blob].input_data.shape
        blob = self._prepare_image(image, h, w)
        output = self.exec_net.infer(inputs={input_blob: blob})
        detection = output[out_blob]
        return detection

    def draw_detection(self, detections, img, confidence=0.5, draw_text=True):
        (h, w) = img.shape[:2]
        for i in range(0, detections.shape[2]):
            conf = detections[0, 0, i, 2]
            if conf > confidence:
                box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
                (startX, startY, endX, endY) = box.astype("int")
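                try:
                    text = "Class {} {:.2f}%".format(
                        self.classes[int(detections[0, 0, i, 1])], conf * 100)
                except (AttributeError, IndexError, TypeError):
                    # No class names loaded or unknown id: show the numeric id
                    text = "Class {} {:.2f}%".format(
                        int(detections[0, 0, i, 1]), conf * 100)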
                y = startY - 10 if startY - 10 > 10 else startY + 10
                cv2.rectangle(img, (startX, startY), (endX, endY),
                              (0, 255, 0), 2)
                if draw_text:
                    cv2.putText(img, text, (startX, y),
                                cv2.FONT_HERSHEY_COMPLEX, 0.45, (0, 0, 255), 1)
        return img    
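
Both `draw_detection` and `filter_detections` rely on the layout of the SSD detection output: an array of shape `[1, 1, N, 7]` whose rows are `[image_id, label, conf, x_min, y_min, x_max, y_max]`, with box coordinates normalized to `[0, 1]`. A minimal sketch of converting such rows to pixel boxes follows, using plain Python lists in place of the real network output. The helper name `rows_to_pixel_boxes` and the sample values are made up for illustration.

```python
def rows_to_pixel_boxes(rows, frame_width, frame_height, threshold=0.5):
    """Keep confident rows and scale normalized boxes to pixel coordinates."""
    boxes = []
    for image_id, label, conf, x_min, y_min, x_max, y_max in rows:
        if conf <= threshold:
            continue
        boxes.append({
            "label": int(label),
            "conf": float(conf),
            "box": (int(x_min * frame_width), int(y_min * frame_height),
                    int(x_max * frame_width), int(y_max * frame_height)),
        })
    return boxes


# Hypothetical rows, as they would appear in detections[0, 0]
rows = [
    [0, 15, 0.92, 0.25, 0.50, 0.50, 1.00],
    [0, 7, 0.30, 0.10, 0.10, 0.20, 0.20],   # below the threshold, dropped
]

boxes = rows_to_pixel_boxes(rows, frame_width=640, frame_height=480)
print(boxes)
```

Note that x coordinates are scaled by the frame width and y coordinates by the frame height, the same convention as `np.array([w, h, w, h])` in `draw_detection`.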

### Implementing the sample

1. Create an object of the InferenceEngineDetector class.
2. Create an object of the MatchingTracker class with affinity weight parameters selected empirically for the input video.
3. Load the video.
4. Repeat the following actions for each video frame:

    4.1. Detect objects in the frame.

    4.2. Filter the detections using the tracker.filter_detections method.

    4.3. Track objects using the tracker.process_new_frame method.

    4.4. Draw the tracking results on the frame and write the frame to the output video.



In [None]:
model_path = 'public/ssd300/FP32/ssd300.xml'
weights_path = 'public/ssd300/FP32/ssd300.bin'
device = 'CPU' # 'CPU' or 'GPU'
input_video = 'camera.mp4'

ie_detector = InferenceEngineDetector(configPath=model_path, weightsPath=weights_path, device=device) 

tracker = MatchingTracker()

# Open input video
cap = cv2.VideoCapture(input_video)

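# Set up the output video with the same frame rate and size as the input
fps = cap.get(cv2.CAP_PROP_FPS)
target_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
target_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
video_output_size = (target_width, target_height)
# 'mp4v' is a FourCC commonly used with .mp4; 'XVID' is normally paired with .avi
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
output_video = cv2.VideoWriter('output.mp4', fourcc, fps, video_output_size)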

print(fps)

# Main loop
timestamp = 0
while cap.isOpened():
    timestamp += 1
    ret, frame = cap.read()

    # Stop when the video ends or a frame cannot be read
    if not ret:
        break
    
    detections_mat = ie_detector.detect(frame)
    detections = tracker.filter_detections(detections_mat, 0.5)

    tracker.process_new_frame(frame, detections, timestamp)

    tracks_image = tracker.draw_active_tracks(frame)

    result_image = ie_detector.draw_detection(detections_mat, 
        tracks_image)

    output_video.write(result_image)

cap.release()
output_video.release()

In [None]:
# Check the tracking result:
# play the output video with the drawn tracklets
output_video = 'output.mp4'
Video(output_video, embed=True) 

You can use the resulting MatchingTracker class paired with InferenceEngineDetector in your own deep learning projects.


### Additional tasks

The developed sample contains only the minimum required functionality. As additional tasks, we propose adding support for the following features:
1. Display the total number of detected objects in each frame and the number of objects per class. We suggest showing this information for the three most numerous classes.
2. Limit the set of object classes according to the video context. For example, if the detection model was trained on the PASCAL VOC dataset and the video comes from a crossroad camera, only traffic objects (cars, buses, pedestrians) should be detected and tracked.
3. Divide tracks into “active” and “inactive”. A track is “inactive” if its object has not been found for a certain period of time (for example, a few seconds). Draw inactive tracks in a different color during this period, and stop displaying them once it has elapsed.

We suggest solving these tasks on your own, using the documentation and examples included in the OpenVINO Toolkit.
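
As a possible starting point for tasks 1 and 2, the sketch below keeps only an allowed set of classes and counts the remaining detections per class. It assumes the detections are given as `(label, conf)` pairs already extracted from the detector output. The function names and the class ids in `TRAFFIC_CLASSES` are placeholders, not values taken from a particular model.

```python
from collections import Counter

# Placeholder class ids; look up the real ids in your model's label file.
TRAFFIC_CLASSES = {6, 7, 15}


def keep_allowed(detections, allowed, threshold=0.5):
    """Keep confident detections whose class id is in the allowed set."""
    return [(label, conf) for label, conf in detections
            if conf > threshold and label in allowed]


def top_classes(detections, n=3):
    """Return the total count and the n most frequent class ids with counts."""
    counts = Counter(label for label, _ in detections)
    return sum(counts.values()), counts.most_common(n)


detections = [(7, 0.9), (7, 0.8), (15, 0.7), (12, 0.95), (6, 0.4)]
kept = keep_allowed(detections, TRAFFIC_CLASSES)
total, top = top_classes(kept)
print(total, top)
```

The resulting numbers could then be drawn on the frame with `cv2.putText`, in the same way as the class labels in `draw_detection`.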