How to find the similarity of 3-4 images against one image, captured with two cameras placed side by side

I reviewed many related posts, but did not find one that covers my specific case.

I have the following image crops saved from a detection model.

I would like to measure the similarity of the gray crop image (b) to some combined version of the four color crops above (a). The two cameras are located next to each other, but their images differ in field of view, brightness, etc. One camera (the source of the gray image) takes pictures with some arbitrary latency. Because of this latency, the color camera may record multiple vehicles. Therefore, I need to make sure that the captured gray image belongs to a specific vehicle that the color camera has seen.

  1. How can I combine the features of those 4 color crop images into one, so that comparing it with the gray crop image gives a higher similarity score?

  2. Or is it better to compare the gray crop with each color crop one-to-one?
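To make the two options concrete, this is roughly what I have in mind. It is only a sketch: it assumes each crop has already been turned into a fixed-length feature vector by some feature extractor (not shown here), and the function names and the random toy vectors are placeholders I made up for illustration. Option 1 averages the normalized color features into one vector before comparing; option 2 compares the gray feature against each color feature separately.

```python
import numpy as np


def l2_normalize(v, eps=1e-12):
    """Scale vectors along the last axis to (approximately) unit length."""
    v = np.asarray(v, dtype=np.float64)
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)


def combined_similarity(color_feats, gray_feat):
    """Option 1: mean-pool the normalized color features, then cosine similarity."""
    pooled = l2_normalize(l2_normalize(color_feats).mean(axis=0))
    return float(pooled @ l2_normalize(gray_feat))


def pairwise_similarities(color_feats, gray_feat):
    """Option 2: cosine similarity of each color crop against the gray crop."""
    return l2_normalize(color_feats) @ l2_normalize(gray_feat)


# Toy data standing in for real features (hypothetical, 128-dimensional):
# four noisy "color" views of the same underlying "gray" feature vector.
rng = np.random.default_rng(0)
gray_feat = rng.normal(size=128)
color_feats = gray_feat + 0.5 * rng.normal(size=(4, 128))

sims = pairwise_similarities(color_feats, gray_feat)
print("1 vs 1 scores:", sims)
print("best 1 vs 1 score:", sims.max())
print("combined score:", combined_similarity(color_feats, gray_feat))
```

With real crops, the feature vectors would come from whatever descriptor or embedding model is used, and the color crops would probably need to be converted to grayscale first so that both cameras are compared in the same domain.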

I hope to get as much information as possible.
Thanks.