Object reconstruction from structure from motion with mask RCNN

I’m doing object reconstruction with structure from motion (SFM). Currently I capture multiple views of a car and apply Mask RCNN to each view to remove the background, because I want to reconstruct only that object and get a clean result.

My current issues are:

  1. The object is not fully reconstructed.
  2. The masks I get from Mask RCNN do not always have a fixed size, which SFM needs in order to work.
  3. Background noise is still present in the reconstructed object.
  4. The camera parameters are messed up when I use only the masks obtained from the different views. How can I fix that?

Here are some results:

Original image (there are, of course, multiple views of it): [image]

Mask RCNN results that I use for SFM: [images]

And here is the result from SFM: [image]

#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

using namespace cv;
using namespace std;

// classes, maskThreshold and confThreshold are globals defined elsewhere in the program.

// Extract the masked object from the frame (box drawing and mask coloring are commented out)
void drawBox(Mat& frame, int classId, float conf, Rect box, Mat& objectMask, std::vector<Mat> &contours_images)
{
    //Draw a rectangle displaying the bounding box
    //rectangle(frame, Point(box.x, box.y), Point(box.x + box.width, box.y + box.height), Scalar(255, 178, 50), 3);

    //Get the label for the class name and its confidence
    string label = format("%.2f", conf);
    if (!classes.empty())
    {
        CV_Assert(classId < (int)classes.size());
        label = classes[classId] + ":" + label;
    }

    //Display the label at the top of the bounding box
    int baseLine;
    Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
    box.y = max(box.y, labelSize.height);
    //rectangle(frame, Point(box.x, box.y - round(1.5 * labelSize.height)), Point(box.x + round(1.5 * labelSize.width), box.y + baseLine), Scalar(255, 255, 255), FILLED);
    //putText(frame, label, Point(box.x, box.y), FONT_HERSHEY_SIMPLEX, 0.75, Scalar(0, 0, 0), 1);

    // Scalar color = colors[classId % colors.size()];

    // Resize the mask to the box size and threshold it
    resize(objectMask, objectMask, Size(box.width, box.height));
    Mat mask = (objectMask > maskThreshold);
    //Mat coloredRoi = (0.3 * color + 0.7 * frame(box));
    // coloredRoi.convertTo(coloredRoi, CV_8UC3);
    Mat coloredRoi(frame(box));

    // Find the contours of the mask (drawing them is commented out)
    vector<Mat> contours;
    Mat hierarchy;
    mask.convertTo(mask, CV_8U);
    findContours(mask, contours, hierarchy, RETR_CCOMP, cv::CHAIN_APPROX_NONE);
    //drawContours(coloredRoi, contours, -1, color, 5, LINE_8, hierarchy, 100);
    // coloredRoi.copyTo(frame(box), mask);

    // Copy only the masked pixels of the box region, then resize the crop to 400x400
    Mat outframe;
    coloredRoi.copyTo(outframe, mask);
    cv::resize(outframe, outframe, cv::Size(400, 400));
    // imshow("outframe", outframe);
    // waitKey(0);
}

// For each frame, extract the bounding box and mask for each detected object
void postprocess(Mat& frame, const vector<Mat>& outs, vector<Mat> & maskes)
{
    Mat outDetections = outs[0];
    Mat outMasks = outs[1];
    // Output size of masks is NxCxHxW where
    // N - number of detected boxes
    // C - number of classes (excluding background)
    // HxW - segmentation shape
    const int numDetections = outDetections.size[2];
    const int numClasses = outMasks.size[1];
    outDetections = outDetections.reshape(1, outDetections.total() / 7);
    for (int i = 0; i < numDetections; ++i)
    {
        float score = outDetections.at<float>(i, 2);
        if (score > confThreshold)
        {
            // Extract the bounding box and clamp it to the frame
            int classId = static_cast<int>(outDetections.at<float>(i, 1));
            int left = static_cast<int>(frame.cols * outDetections.at<float>(i, 3));
            int top = static_cast<int>(frame.rows * outDetections.at<float>(i, 4));
            int right = static_cast<int>(frame.cols * outDetections.at<float>(i, 5));
            int bottom = static_cast<int>(frame.rows * outDetections.at<float>(i, 6));
            left = max(0, min(left, frame.cols - 1));
            top = max(0, min(top, frame.rows - 1));
            right = max(0, min(right, frame.cols - 1));
            bottom = max(0, min(bottom, frame.rows - 1));
            Rect box = Rect(left, top, right - left + 1, bottom - top + 1);
            // Extract the mask for the object
            Mat objectMask(outMasks.size[2], outMasks.size[3], CV_32F, outMasks.ptr<float>(i, classId));
            // Cut the masked object out of the frame
            drawBox(frame, classId, score, box, objectMask, maskes);
        }
    }
}
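
One possible explanation for the camera-parameter problem (issue 4): the code crops every view to its own detection box and then resizes that crop to 400x400, so each image handed to SFM effectively comes from a camera with different intrinsics. The sketch below shows how a crop followed by a resize changes pinhole intrinsics. It assumes an ideal pinhole camera without lens distortion and ignores half-pixel offsets; `Intrinsics` and `cropAndResize` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Pinhole intrinsics in pixels (hypothetical helper type, not an OpenCV class).
struct Intrinsics {
    double fx, fy;  // focal lengths
    double cx, cy;  // principal point
};

// Intrinsics of the image obtained by cropping the rectangle with top-left
// corner (x0, y0) and size (w, h), then resizing that crop to (outW, outH).
// Convention assumed here: u' = sx * (u - x0), v' = sy * (v - y0).
Intrinsics cropAndResize(const Intrinsics& k, double x0, double y0,
                         double w, double h, double outW, double outH)
{
    assert(w > 0 && h > 0 && outW > 0 && outH > 0);
    const double sx = outW / w;  // horizontal scale of the resize
    const double sy = outH / h;  // vertical scale of the resize
    return { k.fx * sx, k.fy * sy, (k.cx - x0) * sx, (k.cy - y0) * sy };
}
```

Because the box differs from view to view, `fx`, `fy`, `cx` and `cy` differ as well, and a non-square box resized to a square output also makes `fx` and `fy` unequal. Blacking out the background while keeping every image at its original size would avoid this change of geometry.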


The problems you are having seem to be caused by Mask RCNN errors. Did you try running SFM on the original video and then extracting the object of interest?

@kbarni Yes, I did that on the original videos and it seems fine. But how do I extract the object of interest after SFM? It’s a 3D point cloud…

SFM on the original images works fine… but I get background information, which I don’t want.
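
One way to extract the object from a point cloud reconstructed from the original images is to reuse the 2D masks: project every 3D point into each view with the camera poses estimated by SFM and keep the points that land on the object mask in enough views. The following is a minimal sketch of that idea, not a tested pipeline. It assumes pinhole cameras without distortion, full-size masks aligned with the original images, and ignores occlusion; `View`, `projectsOntoMask` and `keepObjectPoints` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point3 { double x, y, z; };

// One calibrated view (hypothetical layout): world-to-camera pose
// Xc = R * Xw + t, pinhole intrinsics, and a row-major binary mask
// (non-zero = object) with the same size as the original image.
struct View {
    double R[3][3];
    double t[3];
    double fx, fy, cx, cy;
    int width, height;
    std::vector<unsigned char> mask;
};

// True if p lies in front of the camera and projects onto a mask pixel.
bool projectsOntoMask(const View& v, const Point3& p)
{
    const double xc = v.R[0][0] * p.x + v.R[0][1] * p.y + v.R[0][2] * p.z + v.t[0];
    const double yc = v.R[1][0] * p.x + v.R[1][1] * p.y + v.R[1][2] * p.z + v.t[1];
    const double zc = v.R[2][0] * p.x + v.R[2][1] * p.y + v.R[2][2] * p.z + v.t[2];
    if (zc <= 0.0) return false;  // behind the camera
    const double uf = std::floor(v.fx * xc / zc + v.cx);
    const double vf = std::floor(v.fy * yc / zc + v.cy);
    if (uf < 0 || vf < 0 || uf >= v.width || vf >= v.height) return false;
    const std::size_t idx =
        static_cast<std::size_t>(vf) * v.width + static_cast<std::size_t>(uf);
    return v.mask[idx] != 0;
}

// Keep the points that land on the object mask in at least minVotes views.
std::vector<Point3> keepObjectPoints(const std::vector<Point3>& cloud,
                                     const std::vector<View>& views,
                                     int minVotes)
{
    std::vector<Point3> kept;
    for (const Point3& p : cloud) {
        int votes = 0;
        for (const View& v : views) {
            assert(v.mask.size() == static_cast<std::size_t>(v.width) * v.height);
            if (projectsOntoMask(v, p)) ++votes;
        }
        if (votes >= minVotes) kept.push_back(p);
    }
    return kept;
}
```

Requiring several votes rather than one makes the filter more tolerant of individual bad masks, at the cost of dropping object points that are seen in only a few views.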

The Point Cloud Library (PCL) is the solution for point cloud segmentation and cleanup.
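
As an illustration of what point cloud cleanup can look like, here is a brute-force sketch of statistical outlier removal in plain C++. It does not use the PCL API; it only mirrors the general idea of dropping points whose mean distance to their nearest neighbours is unusually large. `removeOutliers`, `k` and `stdMul` are hypothetical names, and the O(n²) neighbour search is only suitable for small clouds.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point3 { double x, y, z; };

static double distance(const Point3& a, const Point3& b)
{
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Drop points whose mean distance to their k nearest neighbours exceeds
// (global mean + stdMul * global standard deviation) of that quantity.
std::vector<Point3> removeOutliers(const std::vector<Point3>& pts,
                                   std::size_t k, double stdMul)
{
    assert(k > 0 && pts.size() > k);
    const std::size_t n = pts.size();
    std::vector<double> meanDist(n, 0.0);

    for (std::size_t i = 0; i < n; ++i) {
        std::vector<double> d;
        d.reserve(n - 1);
        for (std::size_t j = 0; j < n; ++j)
            if (j != i) d.push_back(distance(pts[i], pts[j]));
        // Bring the k smallest distances to the front.
        std::partial_sort(d.begin(), d.begin() + static_cast<std::ptrdiff_t>(k), d.end());
        double sum = 0.0;
        for (std::size_t m = 0; m < k; ++m) sum += d[m];
        meanDist[i] = sum / static_cast<double>(k);
    }

    double mu = 0.0;
    for (double m : meanDist) mu += m;
    mu /= static_cast<double>(n);
    double var = 0.0;
    for (double m : meanDist) var += (m - mu) * (m - mu);
    const double sigma = std::sqrt(var / static_cast<double>(n));

    std::vector<Point3> kept;
    for (std::size_t i = 0; i < n; ++i)
        if (meanDist[i] <= mu + stdMul * sigma) kept.push_back(pts[i]);
    return kept;
}
```

This removes isolated noise points but not a dense background such as the ground or a wall; for that, the points first have to be separated from the object, for example by a mask-based filter or by clustering.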

@kbarni I know, but do you know a method for segmenting the object from ANY SFM result?

@kbarni How can the depth map generated by the SFM pipeline be used for that purpose?
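
One possible use of a per-view depth map, sketched under assumptions: if the depth map is pixel-aligned with the image and its Mask RCNN mask, only the masked pixels need to be back-projected to 3D, which yields object-only points in that camera's coordinate frame. The sketch assumes a pinhole camera without distortion, depth stored as the z coordinate along the optical axis, and a value of 0 for invalid depth; `backProjectMasked` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point3 { double x, y, z; };

// Back-project the masked pixels of a row-major depth map into 3D points
// expressed in the camera frame: X = (u - cx) * z / fx, Y = (v - cy) * z / fy.
std::vector<Point3> backProjectMasked(const std::vector<float>& depth,
                                      const std::vector<unsigned char>& mask,
                                      int width, int height,
                                      double fx, double fy, double cx, double cy)
{
    assert(width > 0 && height > 0 && fx > 0 && fy > 0);
    const std::size_t total = static_cast<std::size_t>(width) * height;
    assert(depth.size() == total && mask.size() == total);

    std::vector<Point3> points;
    for (int v = 0; v < height; ++v) {
        for (int u = 0; u < width; ++u) {
            const std::size_t idx = static_cast<std::size_t>(v) * width + u;
            const double z = depth[idx];
            if (mask[idx] == 0 || !(z > 0.0)) continue;  // background or invalid depth
            points.push_back({ (u - cx) * z / fx, (v - cy) * z / fy, z });
        }
    }
    return points;
}
```

To merge the points from several views into one cloud, each set would still have to be transformed into the world frame with that view's camera pose.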

I never really worked with PCL… it’s not really the subject of this forum.