Correlate detected bounding boxes between two cameras

Hello guys,

I use change detection to find moving objects in a scene observed by two wall-mounted cameras. I want to match the same object between the two views so I can do further processing on the corresponding detections. As a first step I correctly find the bounding boxes around the objects, but now I need to associate them across the cameras. I've already tried histogram matching inside the boxes, but the results are not precise. Is there any possible solution?

the code is:

import cv2
import numpy as np

from change_detection_boxes import Detects


def change_det(substruction, frame, num):
    bb = []

    mask = substruction.apply(frame)
    blur = cv2.GaussianBlur(mask, (5, 5), 0)
    _, threshold = cv2.threshold(blur, 130, 255, cv2.THRESH_BINARY)

    # dilating to close holes and eliminate noise in the foreground mask
    kernel_dil = np.ones((9, 9), np.uint8)
    dilation = cv2.dilate(threshold, kernel_dil, iterations=1)

    contours, hier = cv2.findContours(dilation, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
    if not contours:
        return bb

    cntrs = sorted(contours, key=cv2.contourArea, reverse=True)
    cv2.namedWindow(f'Camera{num}', cv2.WINDOW_NORMAL)

    for i in cntrs:
        if cv2.contourArea(i) > 10_000:
            # making contours convex
            hull = cv2.convexHull(i, clockwise=True, returnPoints=True)
            # contour approximation
            epsilon = 0.00001 * cv2.arcLength(hull, False)
            approx = cv2.approxPolyDP(hull, epsilon, True)
            x, y, w, h = cv2.boundingRect(i)
            bb.append([x, y, w, h])
            # drawing and showing the convex approximated contours
            img2 = cv2.drawContours(frame, [approx], -1, (255, 0, 0), 3)
            # drawing bounding boxes from the contours
            cv2.rectangle(img2, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.imshow(f'Camera{num}', img2)
            key = cv2.waitKey(20)
            if key == ord('q'):
                break

    return bb

if __name__ == '__main__':
    substructionL = cv2.createBackgroundSubtractorMOG2()
    substructionR = cv2.createBackgroundSubtractorMOG2()

    video_right = cv2.VideoCapture('/home/karas/Desktop/ΔΙΠΛΩΜΑΤΙΚΗ_lastTry/change_detection/right_cut_video.mp4')
    video_left = cv2.VideoCapture('/home/karas/Desktop/ΔΙΠΛΩΜΑΤΙΚΗ_lastTry/change_detection/left_cut_video.mp4')

    while True:
        _, frameL = video_left.read()
        _, frameR = video_right.read()
        if frameL is None or frameR is None:
            break
        bbL = change_det(substructionL, frameL, num=4)
        bbR = change_det(substructionR, frameR, num=3)
        if not bbL or not bbR:
            continue
        detectionsL = [Detects(box) for box in bbL]
        detectionsR = [Detects(box) for box in bbR]


welcome. could you illustrate your situation with some pictures? also consider posting your code in a proper format; python code needs correct indentation to be understandable.

Hello. I know that Python needs correct indentation to run. Unfortunately I am new to posting questions, so I apologize for not knowing how to format them properly. As for the pictures, I am posting two screenshots of a person from the two cameras. I just want an efficient way to know that this person is the same in each of my loops.

I can only upload one image because I am a new user and am not allowed to post the other one.

My ultimate goal is to calculate the 3D coordinates of the box corners so I can build a 3D bounding box from the change-detection output. But if there is more than one person in the room, I first need to pick the corresponding bounding boxes.

thanks for your reply!

okay, so your situation is multiple view geometry and you need depth information.

book recommendation: Hartley and Zisserman, “Multiple View Geometry in Computer Vision”. several others are probably good too, but that's the one I know on the subject.

if the views/angles are sufficiently similar, you could solve this “cheaply” by computing features inside both boxes (SIFT/AKAZE) and then matching these features (e.g. with a FLANN matcher). if it’s the same object, you get many good matches.

or you could set your cameras up to be a stereo pair, and calculate depth information explicitly. that’s a somewhat expensive algorithm, which is why people usually run it in hardware, i.e. use “depth cameras”. then, given a 2d bounding box, you can determine the 3d points for that box, and reproject those 3d points into the other camera… if that is still what you actually need.

if your views are totally different, you would probably need neural networks to calculate stronger features you can use for matching/identification. I’ve heard of a “Re-ID” network that was trained for such a purpose.

overall, anything you do to solve your problem will be computationally more expensive than simple pixelwise change detection, so you might as well rethink the whole problem and approach.