How to locate differences between similar images in real time

I am building a real-time object detector that should also locate the differences between a database image and the live camera image, using SIFT with OpenCV.

#object detection based on feature match using knn, sift, opencv
import cv2
import numpy as np
import os

MIN_MATCH_COUNT = 30
# SIFT lives in the main module since OpenCV 4.4; older contrib builds use cv2.xfeatures2d.SIFT_create()
detector = cv2.SIFT_create()

# FLANN's KD-tree index is algorithm 1 (0 selects linear search), and the parameter name is "trees"
FLANN_INDEX_KDTREE = 1
flannParam = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
flann = cv2.FlannBasedMatcher(flannParam, {})

trainImg = cv2.imread("crop image0.jpeg", 0)
trainKP, trainDesc = detector.detectAndCompute(trainImg, None)

cam = cv2.VideoCapture(0)
while True:
    ret, QueryImgBGR = cam.read()
    if not ret:
        # Stop if the camera did not deliver a frame
        break
    height, width = QueryImgBGR.shape[:2]

    # Define ROI Box Dimensions
    top_left_x = int(width / 3)
    top_left_y = int((height / 2) + (height / 4))
    bottom_right_x = int((width / 3) * 2)
    bottom_right_y = int((height / 2) - (height / 4))
    cv2.rectangle(QueryImgBGR, (top_left_x, top_left_y), (bottom_right_x, bottom_right_y), 255, 3)
    cropped = QueryImgBGR[bottom_right_y:top_left_y, top_left_x:bottom_right_x]
    QueryImg = cv2.cvtColor(cropped, cv2.COLOR_BGR2GRAY)
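    queryKP, queryDesc = detector.detectAndCompute(QueryImg, None)
    # A featureless ROI yields queryDesc = None, which would crash knnMatch
    matches = []
    if queryDesc is not None:
        matches = flann.knnMatch(queryDesc, trainDesc, k=2)

    # Lowe's ratio test: keep a match only if it is clearly better than the runner-up
    goodMatch = []
    for pair in matches:
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            goodMatch.append(pair[0])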
    if (len(goodMatch) > MIN_MATCH_COUNT):
        tp = []
        qp = []
        for m in goodMatch:
            tp.append(trainKP[m.trainIdx].pt)
            qp.append(queryKP[m.queryIdx].pt)
        tp, qp = np.float32((tp, qp))
        H, status = cv2.findHomography(tp, qp, cv2.RANSAC, 3.0)
        h, w = trainImg.shape
        trainBorder = np.float32([[[0, 0], [0, h - 1], [w - 1, h - 1], [w - 1, 0]]])
        queryBorder = cv2.perspectiveTransform(trainBorder, H)
        cv2.rectangle(QueryImgBGR, (top_left_x, top_left_y), (bottom_right_x, bottom_right_y), (0, 255, 0), 3)
        cv2.polylines(cropped, [np.int32(queryBorder)], True, (0, 0, 255), 3)
        cv2.putText(QueryImgBGR, 'Object Found', (50, 50), cv2.FONT_HERSHEY_COMPLEX, 2, (0, 255, 0), 2)
    else:
        # Draw on the full frame: these are frame coordinates, not ROI coordinates
        cv2.rectangle(QueryImgBGR, (top_left_x, top_left_y), (bottom_right_x, bottom_right_y), (255, 0, 0), 3)
        cv2.putText(QueryImgBGR, 'Obj not Found', (50, 50), cv2.FONT_HERSHEY_COMPLEX, 2, (0, 0, 255), 2)
        print("Not Enough match found- %d/%d" % (len(goodMatch), MIN_MATCH_COUNT))
    cv2.imshow('result', QueryImgBGR)
    if cv2.waitKey(10) == ord('q'):
        break
cam.release()
cv2.destroyAllWindows()

in short – you don’t.

SIFT matching is to find a homography / pose, not for object detection (or image classification)

to detect objects, rather have a look at cnn’s like SSD or YOLO (which are nicely supported from opencv’s dnn module)

if you still want to do some classification based on 2d features, have a look at the BagOfWords idea

this is “Content Based Image Retrieval”. the general approach is the Bag of Words idea, as mentioned.

it is possible to use those “traditional” feature descriptors to compare images. they are somewhat low level however, they can’t capture complex structure or meaning. one would cluster the extracted descriptors and then use those “feature signatures” or “feature histograms” to compare against the database. this is not trivial, neither in theory nor implementation. you have to read scientific literature.
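to make the "feature histogram" step a bit more concrete, here is a minimal sketch. it assumes you already have a vocabulary of cluster centres (for example from k-means over many SIFT descriptors); the function names `bow_histogram` and `histogram_intersection` are made up for illustration and are not OpenCV API.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word and count the words.

    descriptors: (N, D) array, e.g. SIFT descriptors of one image.
    vocabulary:  (K, D) array of cluster centres (assumed to exist already).
    Returns a length-K histogram normalised to sum to 1.
    """
    descriptors = np.asarray(descriptors, dtype=np.float32)
    vocabulary = np.asarray(vocabulary, dtype=np.float32)
    # Squared Euclidean distance between every descriptor and every word
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(np.float32)
    return hist / max(float(hist.sum()), 1.0)

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two normalised histograms (1 means identical)."""
    return float(np.minimum(h1, h2).sum())
```

you would compute one histogram per database image once, then compare the histogram of the live image against each of them and pick the highest score.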

using Deep Learning might be “easier” because all the hard part has been “solved” by those that designed the network, its training set, and the training regime. the network would emit a vector of features that can span many levels of abstraction. you’d compare that vector against the database. there exist networks that do this for faces and whole people. you can probably use any object classification network for this. perhaps you’ll need to remove/ignore the output layer and use the feature layer before it.
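the "compare that vector against the database" step could look roughly like this. it is only a sketch: it assumes you already got one feature vector per image out of some network, and `nearest_by_cosine` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def nearest_by_cosine(query_vec, database_vecs):
    """Return (index, similarity) of the database vector closest to query_vec.

    query_vec:     (D,) feature vector of the live image.
    database_vecs: (M, D) feature vectors of the database images.
    """
    q = np.asarray(query_vec, dtype=np.float64)
    db = np.asarray(database_vecs, dtype=np.float64)
    # Normalise so the dot product equals the cosine of the angle
    q = q / np.linalg.norm(q)
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    sims = db @ q
    best = int(np.argmax(sims))
    return best, float(sims[best])
```

a similarity threshold for "same object" is something you would have to pick from your own data.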

I want to compare a training object to an object appearing in real time, and detect the damaged and missing parts of the live object. Deep learning is used for object detection and classification, and image classification needs many sample images, but I have only one real object and one fake object. In future, if some part of the object is missing, the system should indicate that the object is fake and locate the missing area. In this scenario, how can I classify real and fake objects? Any ideas for solving this would be welcome.

Hi @revathi

I believe you already saw this tutorial about object detection with SIFT in Python:
https://docs.opencv.org/master/d1/de0/tutorial_py_feature_homography.html

It assumes your object is flat.

The FLANN-based knnMatch is fast but approximate, so it doesn't give you all the matches. After you have detected the object's position and rotation, you can try a brute-force BFMatcher (you can speed it up by removing already known matches and constraining it to the already known ROI).

Then you can analyze the reference image keypoints that weren’t matched, and see if they tell you whether something is missing.
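One possible way to look at the unmatched keypoints is to bin them on a coarse grid over the reference image and see where the unmatched fraction is high. This is just a sketch under that assumption; `unmatched_fraction_grid` and the grid size are hypothetical choices, and a cell with few keypoints tells you very little.

```python
import numpy as np

def unmatched_fraction_grid(ref_points, matched_idx, image_shape, grid=(4, 4)):
    """Fraction of unmatched reference keypoints per grid cell.

    ref_points:  (N, 2) array of (x, y) positions, e.g. [kp.pt for kp in trainKP].
    matched_idx: indices of reference keypoints that found a match.
    image_shape: shape of the reference image (height, width, ...).
    Returns a (rows, cols) array; cells without keypoints are NaN.
    """
    h, w = image_shape[:2]
    rows, cols = grid
    pts = np.asarray(ref_points, dtype=np.float32).reshape(-1, 2)
    matched = np.zeros(len(pts), dtype=bool)
    matched[np.asarray(list(matched_idx), dtype=int)] = True
    # Grid cell of every keypoint
    col = np.clip((pts[:, 0] * cols / w).astype(int), 0, cols - 1)
    row = np.clip((pts[:, 1] * rows / h).astype(int), 0, rows - 1)
    total = np.zeros(grid, dtype=np.float64)
    missing = np.zeros(grid, dtype=np.float64)
    np.add.at(total, (row, col), 1.0)
    np.add.at(missing, (row, col), (~matched).astype(np.float64))
    return np.where(total > 0, missing / np.maximum(total, 1.0), np.nan)
```

A cell close to 1.0 that normally matches well would be a candidate for a missing or damaged part.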

Keep in mind yours is a highly contextual problem with no general solution:

  • Will homography suit your problem? Will you need epipolar geometry instead?
  • Are SIFT matches dense enough? Will you need to tune the detector or the matching distance threshold, or use another detector?
  • Are there outliers? Can you tell if a match is an outlier?
  • Do you know which parts can be lost? Can you detect them individually?

No theory answers these questions. They depend on your specific problem.

Thanks for your suggestion

maybe you can still try with SIFT & homography, even if it is only to get a transformation, so you can “map” the corresponding regions in your images onto each other, and go on with a simple absdiff(), then refine, what you found there

thanks for your reply