Corner identification with OpenCV.js

Hi, I hope JavaScript and Java fall into the same category.

Using OpenCV.js, I’m trying to identify the corners of a card mask in a picture. I asked 3 different LLMs to generate the code for that, but they all fail: every time, the result is the outer picture (640x640). The card mask is a quadrilateral, not a rectangle, as it may have a slight inclination.

Here is an example.

All the solutions were based on findContours and approxPolyDP, but none work correctly, so I have decided to do it manually.

I’m wondering what the best logic would be here. I looked at approxPolyDP, but I’m not sure it will work: the breaks in the contour cause issues, and they also seem to trip up findContours. I tried making the lines thicker to fill the gaps, but some gaps are so big that it takes huge lines, and then they start touching the sides.

Is there any recommended approach?

  1. Connect the lines
  2. Find the contours

Below is what I have so far, but it doesn’t work.

function approximateCardQuad(maskTensor, debugCanvas = null) {
    const height = maskTensor.shape[0];
    const width = maskTensor.shape[1];

    // Convert tensor to binary mask mat
    const maskData = tf.tidy(() => maskTensor.mul(255).toInt().dataSync());
    const maskUint8 = new Uint8Array(maskData);
    const maskMat = cv.matFromArray(height, width, cv.CV_8UC1, maskUint8);

    const thresholdMat = new cv.Mat();
    cv.threshold(maskMat, thresholdMat, 127, 255, cv.THRESH_BINARY);

    const contours = new cv.MatVector();
    const hierarchy = new cv.Mat();
    cv.findContours(thresholdMat, contours, hierarchy, cv.RETR_LIST, cv.CHAIN_APPROX_SIMPLE);

    console.log(`--- Logging Contours (${new Date().toLocaleString("en-CA", {timeZone: "America/Toronto"})}) ---`); // Laval is EST/EDT
    console.log(`Total contours found: ${contours.size()}`);

    // Iterate through each contour found
    for (let i = 0; i < contours.size(); ++i) {
        const contourMat = contours.get(i); // Get the cv.Mat for the i-th contour

        // Number of points in this contour (for contour Mats, rows equals the point count)
        const numPoints = contourMat.rows;

        console.log(`Contour ${i} has ${numPoints} points:`);

        if (numPoints > 0) {
            const points = [];
            // Contour points are stored as signed 32-bit integers in data32S,
            // laid out as [x1, y1, x2, y2, x3, y3, ...].
            const contourData = contourMat.data32S;

            for (let j = 0; j < numPoints; ++j) {
                const x = contourData[j * 2];     // x-coordinate
                const y = contourData[j * 2 + 1]; // y-coordinate
                points.push({ x: x, y: y });
            }

            // JSON.stringify keeps the console output compact; logging the array
            // directly gives an expandable object instead.
            console.log(JSON.stringify(points));
        } else {
            console.log("  (Contour is empty)");
        }

        // Do NOT delete contourMat here if you got it using contours.get(i).
        // The MatVector 'contours' owns these Mats; deleting the MatVector later
        // handles the cleanup. If you had *cloned* the contour, you would delete the clone.
    } // End loop through contours

    console.log("--- Finished Logging Contours ---");

    let bestQuad = null;
    let maxArea = 0;

    // Optional debug draw
    let debugMat;
    if (debugCanvas) {
        debugMat = cv.Mat.zeros(height, width, cv.CV_8UC3);
    }

    for (let i = 0; i < contours.size(); i++) {
        const cnt = contours.get(i);
        const area = cv.contourArea(cnt);
        if (area < 1000) { cnt.delete(); continue; } // Skip tiny noise

        // Approximate with a polygon
        const perimeter = cv.arcLength(cnt, true);
        const approx = new cv.Mat();
        cv.approxPolyDP(cnt, approx, 0.02 * perimeter, true);

        console.log(`Contour ${i}: area=${area}, approxPts=${approx.rows}`);

        // Optional draw for debug
        if (debugCanvas) {
            const color = new cv.Scalar(255, 0, 255); // Magenta
            cv.drawContours(debugMat, contours, i, color, 1, cv.LINE_8, hierarchy, 100);
        }

        // Prefer 4-point polygon if possible
        if (approx.rows === 4 && area > maxArea) {
            maxArea = area;
            bestQuad = approx.clone();
        }

        cnt.delete();
        approx.delete();
    }

    // Show debug canvas
    if (debugCanvas && debugMat) {
        cv.imshow(debugCanvas, debugMat);
        debugMat.delete();
    }

    thresholdMat.delete();
    maskMat.delete();
    contours.delete();
    hierarchy.delete();

    if (bestQuad) {
        const result = [];
        for (let i = 0; i < 4; i++) {
            result.push({
                x: bestQuad.intPtr(i, 0)[0],
                y: bestQuad.intPtr(i, 0)[1]
            });
        }
        bestQuad.delete();
        return result;
    }

    console.warn("No valid quadrilateral found.");
    return null;
}

Thanks,

JMS

nope. but you’re welcome, nonetheless :wink:

because findContours expects white foreground on black background, you got it in reverse.

we don’t have a category specifically for javascript (yet). “uncategorized” is fine, with the javascript tag.

the opencv.js project was a GSoC project, a kind of feasibility study. there is hardly any code using it for LLMs to learn from.

approxPolyDP was never useful for things with rounded corners. there’s a new approxPolyN function that you should take a look at; it makes sense for exactly those cases.

I have no idea if that’s already in the whitelist for transpilation with emscripten (for opencv.js). if it isn’t, then you can try that yourself. build opencv yourself, specifically the opencv.js part of it. when that works as is, find the whitelist, add your API of interest to it, and see if the build still succeeds.

yes: show the source image for discussion. you presented something that has no more life in it, no more information.

Thanks for the replies here. I will try to invert the colors and retry the contour function.
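
Something like this is what I plan to try for the inversion; just a sketch for now, reusing the maskMat, contours and hierarchy variables from my function above (untested):

// Inversion sketch (untested): findContours wants white foreground on a
// black background, so flip the mask before looking for contours.
const inverted = new cv.Mat();
cv.bitwise_not(maskMat, inverted);
// Alternative: threshold with THRESH_BINARY_INV instead of THRESH_BINARY:
// cv.threshold(maskMat, inverted, 127, 255, cv.THRESH_BINARY_INV);
cv.findContours(inverted, contours, hierarchy, cv.RETR_LIST, cv.CHAIN_APPROX_SIMPLE);
inverted.delete();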

There isn’t really a “source” image. This is the output of an ML model doing picture segmentation. It is the generated output mask, unmodified, straight out of Tensor.js processing a video stream from a webcam showing individual cards on different surfaces. (Example below) The goal is to see if those masks can be used for further processing. The model also generates the bounding box, but I want the corners. The model takes a 640x640 input, hence the square ratio of the masks.

I took a look and approxPolyN is not in OpenCV.js :-/
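
For future reference, based on the C++ signature, I’d expect the call to look roughly like this if it ever gets exposed (the JS binding shape is pure guesswork on my side):

// Hypothetical sketch only: approxPolyN is NOT in the stock opencv.js build.
// Guessing the JS binding would mirror the C++ signature
// approxPolyN(curve, approxCurve, nsides, epsilon_percentage = -1, ensure_convex = true):
const approx = new cv.Mat();
cv.approxPolyN(cnt, approx, 4, -1, true); // ask for exactly 4 sides
// 'approx' would then hold the 4 corners, same point layout as approxPolyDP output.
approx.delete();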

here is an updated input picture where I was able to remove the noise on the edges, and invert the colors. There are 38 contours. I will see what I can do with that… I will post my progress here.

I got this manual approach working where I search for the closest point to each corner. It’s not THAT bad of an approximation as long as the card is not tilted too much. But there are a few issues. For instance, when there is a “hole” in the corner, the detection is not very good. And when there is a noisy pixel somewhere, it can make things totally wrong. I will also have to assess the speed. I’m pretty sure there is a better solution.

/**
 * Estimate a quadrilateral by finding the closest white pixel to each corner of the mask.
 * @param {tf.Tensor} maskTensor - A binary mask tensor of shape [height, width] with values 0 or 1.
 * @returns {Array<{x: number, y: number}>|null} Array of four closest points, or null if any corner fails.
 */
function estimateQuadFromCorners(maskTensor) {
    const height = maskTensor.shape[0];
    const width = maskTensor.shape[1];
    const data = tf.tidy(() => maskTensor.toInt().dataSync()); // 0 or 1
    const getPixel = (x, y) => data[y * width + x];

    const whitePixels = [];

    // Collect all white pixels
    for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
            if (getPixel(x, y) > 0) {
                whitePixels.push({ x, y });
            }
        }
    }

    if (whitePixels.length === 0) {
        console.warn("No white pixels found in mask.");
        return null;
    }

    // Helper to find the closest pixel to a reference point
    function findClosest(refX, refY) {
        let best = null;
        let minDistSq = Infinity;
        for (const { x, y } of whitePixels) {
            const dx = x - refX;
            const dy = y - refY;
            const distSq = dx * dx + dy * dy;
            if (distSq < minDistSq) {
                minDistSq = distSq;
                best = { x, y };
            }
        }
        return best;
    }

    const topLeft     = findClosest(0, 0);
    const topRight    = findClosest(width - 1, 0);
    const bottomRight = findClosest(width - 1, height - 1);
    const bottomLeft  = findClosest(0, height - 1);

    if (topLeft && topRight && bottomRight && bottomLeft) {
        return [topLeft, topRight, bottomRight, bottomLeft];
    }

    console.warn("Failed to find closest white pixel for one or more corners.");
    return null;
}

many roads lead to… Rome?

object detection models might infer bounding boxes, or segmentation masks.

you could train one to predict the corner coordinates instead of just a bounding box. or you could have it infer an activation map with the corners being hot, but that could be tricky to group into detections if you have more than one card in view.

or train a model to infer the mask of the card, instead of its outline. that is a whole lot more robust.

I thought about training a model to predict the 4 corners. Way easier. But the lack of knowledge and time drove me away from this option for now :frowning: So I think I will stick with OpenCV to rework the picture (invert the pixels, clean the edges, etc.) and the manual prediction for now. Thanks again for the replies and ideas here.

I don’t know about javascript and I can only give you C-Code, but if your pic looks like that and you are able to remove all noisy pixels and javascript knows findNonZero, convexHull and approxPolyDP, you could do it like that.

 Mat img = imread("card.png",0);  //read the 38 black/white contours as 8Bit 1 channel image
 
 vector <Point> nonZeros,cHull,corners;  //output lists for points 

 findNonZero(img, nonZeros); //coordinates of all white pixels
 convexHull(nonZeros, cHull, false, true); //convex hull around that points
 approxPolyDP(cHull, corners, arcLength(cHull,true)*0.01, true); //reduce them 

 Mat out; //just for output
 cv::cvtColor(img, out, COLOR_GRAY2BGR); //color    
 cv::polylines(out, cHull, true, CV_RGB(0, 255, 255), 1,LINE_AA); //paint convex hull
 cv::polylines(out, corners, true, CV_RGB(255, 0, 0), 3, LINE_AA); //paint red rectangle

 imwrite("card_rect.png", out);

now you can warp them using warpPerspective to a real rectangle with fixed size. but single white pixels outside your card will kill this algorithm

and if you are looking for something within OpenCV to replace your findClosest function, check out the opencv functions norm and reduce (with parameters REDUCE_SUM2 or REDUCE_MIN), but maybe that’s a bit too much for javascript.

Thanks for looking at this!
OpenCV.js has norm, reduce, approxPolyDP and convexHull
But it’s missing warpPerspective and findNonZero

My end goal is definitely to get a fixed-size rectangle at the end… So I will try crackwitz’s suggestion and try to build my own version of OpenCV with those functions enabled. No idea if it’s doable but the only way to know is to try! And then I will be able to try your code. Do you have the warpPerspective call to be added at the end?
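
In the meantime, here is my rough OpenCV.js translation of your snippet; completely untested until the custom build actually exposes findNonZero (convexHull, approxPolyDP and arcLength are already there):

// Rough, untested translation of the C++ snippet to OpenCV.js.
// Assumes a custom build that exposes findNonZero.
function quadFromMask(maskMat) {
    const nonZeros = new cv.Mat();
    cv.findNonZero(maskMat, nonZeros);            // coordinates of all white pixels

    const hull = new cv.Mat();
    cv.convexHull(nonZeros, hull, false, true);   // convex hull around those points

    const corners = new cv.Mat();
    const eps = 0.01 * cv.arcLength(hull, true);
    cv.approxPolyDP(hull, corners, eps, true);    // reduce the hull to a few corners

    // Pull the points out as plain objects: data32S is [x1, y1, x2, y2, ...]
    const pts = [];
    for (let i = 0; i < corners.rows; i++) {
        pts.push({ x: corners.data32S[i * 2], y: corners.data32S[i * 2 + 1] });
    }

    nonZeros.delete(); hull.delete(); corners.delete();
    return pts; // ideally 4 points if the mask is clean
}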

yes, all roads lead to rome, and why use a pair of tweezers if you can use a sledgehammer too!

i think i found something using findContours working with your colored original picture. warpPerspective is just to make it look cool. if you want to implement it manually, it’s not simple math stuff.


   Mat BGR = imread("earth_elemental.jpg");  
   Mat I;
   cv::cvtColor(BGR, I, COLOR_BGR2GRAY);
   
   cv::equalizeHist(I, I); //if js doesn't know it, you will find something with playing around with thresholds and (HSV) color planes to create a workaround
   cv::threshold(I, I, 32, 255, THRESH_BINARY_INV); //separate black card border from bright background

   vector <vector <Point>> contours;
   cv::findContours(I, contours, RETR_EXTERNAL, CHAIN_APPROX_SIMPLE);
   
   bool found = false;
   if (contours.size())
       for(int i=0;i<contours.size()&&!found;i++)
           if (contourArea(contours[i])>100) //ignore small dots and only take contours with large areas (adjust the 100 if necessary)
           {
               vector <Point> cHull, corners;
               cv::convexHull(contours[i], cHull, true, true); //clockwise convex hull around those points
               approxPolyDP(cHull, corners, arcLength(cHull, true) * 0.01, true); //counterclockwise rectangle
               if (corners.size() == 4)
               {
                   found = true; //"corners" contains your rectangle, all other stuff is just output                    


                   //warp to card size
                   int width = 627, height = 873;
                   Mat out(height,width,CV_8UC3);//warped image                    

                   vector <Point2f> dst = {
                       Point(0,0),
                       Point(0,height - 1),                        
                       Point(width - 1,height-1),
                       Point(width - 1,0)
                   };
                   vector <Point2f> src; //Point -> Point2f
                   for (Point p : corners)
                       src.push_back(Point2f(p));

                   Mat M=getPerspectiveTransform(src, dst, DECOMP_SVD); //calc transformation matrix
                   warpPerspective(BGR, out, M, Size(width, height)); //apply 

                   drawContours(BGR, contours, i, CV_RGB(0, 255, 0), 3); //draw found card 

                   imwrite("warped.png", out);
                   imwrite("contour.png", BGR);
                   imshow("w", out);
                   imshow("c", BGR);
                   waitKey(0);
               }
           }


Oh, that looks very promising!!! The only challenge is that not all cards have a black border… Some have a white one. Some have a gold one. Some have a silver one and some… don’t have any :-/

Here is another example:

The background is different, the orientation slightly different, but there is no border at all :-/

Here is what I’m able to get with the model prediction, the mask and the brute-force JavaScript approach, on the same previous background:
(in next post, new users can upload just one)

I have been able to rebuild OpenCV.js and add all the functions that were missing above! getPerspectiveTransform is already there too…

Here is the resulting mask. Notice the little white pixel on the right that was luckily far enough away not to break the result. I might need to find a way to remove lonely pixels. Then I will need to apply the transformation.
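
For the transformation step, this is the sketch I plan to use with the rebuilt opencv.js, assuming the four corners come out of my estimateQuadFromCorners() in top-left, top-right, bottom-right, bottom-left order (untested):

// Untested sketch of the warp step. 'corners' is the array of four {x, y}
// points from estimateQuadFromCorners(), ordered TL, TR, BR, BL.
function warpCard(srcMat, corners, width = 627, height = 873) {
    const srcPts = cv.matFromArray(4, 1, cv.CV_32FC2, [
        corners[0].x, corners[0].y,
        corners[1].x, corners[1].y,
        corners[2].x, corners[2].y,
        corners[3].x, corners[3].y
    ]);
    const dstPts = cv.matFromArray(4, 1, cv.CV_32FC2, [
        0, 0,
        width - 1, 0,
        width - 1, height - 1,
        0, height - 1
    ]);

    const M = cv.getPerspectiveTransform(srcPts, dstPts);
    const out = new cv.Mat();
    cv.warpPerspective(srcMat, out, M, new cv.Size(width, height),
        cv.INTER_LINEAR, cv.BORDER_CONSTANT, new cv.Scalar());

    srcPts.delete(); dstPts.delete(); M.delete();
    return out; // caller must delete
}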

(in next post, new users can upload just one)

It’s not THAT bad, even if it’s not as good as your result. Do you think yours can work no matter what the border is, as long as the background is “plain”?

This is the mask

This is my other test:

You can remove single pixels using morphological filters: first erode the mask to remove single pixels and make the other lines thinner, then dilate it to make the remaining pixels bigger again.
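
In OpenCV.js that should map to something like this (just a sketch, I haven’t tested the JS side myself):

// Sketch (untested in JS): morphological opening = erode then dilate.
// 'mask' is the 0/255 single-channel cv.Mat.
const kernel = cv.getStructuringElement(cv.MORPH_RECT, new cv.Size(3, 3));
const anchor = new cv.Point(-1, -1);
const cleaned = new cv.Mat();
cv.erode(mask, cleaned, kernel, anchor, 1);     // single pixels disappear, lines get thinner
cv.dilate(cleaned, cleaned, kernel, anchor, 1); // grow the surviving pixels back
kernel.delete();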

But single pixels ain’t your problem. Check my code from above again:

if (contourArea(contours[i])>100) //ignore small dots and only take contours with large areas (adjust the 100 if necessary)           

this discards all found contours around single pixels. For example, this is how it looks if I paint ALL found contours in dark red:

if single pixels are inside your card: don’t worry, the convex hull will eat them
if single pixels are outside your card: just ignore them by checking the size of your contour (contourArea, arcLength)

the different orientation is also not a problem, as long as you don’t create extreme situations like putting your card at a 45° angle. this would make it hard to find which corner is the upper left one, and so on.

The ONLY problem you have is to separate the background from the foreground. Your first table was just gray, this one is wooden brown. I would try transforming the image to HSV color space using cvtColor, then use split and save or display the planes to analyze them. You can also do that with GIMP. If you want to train something, the colors near the image corners are most likely background colors. If it’s a webcam, you might also be able to initialize your search by removing any cards and filming the pure background first.
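
Again just a sketch of what that plane analysis could look like in OpenCV.js (untested by me; the canvas ids are placeholders):

// Untested sketch: split the webcam frame into HSV planes to see whether
// hue or saturation separates the card from the table.
// 'frame' is an RGBA cv.Mat read from the canvas; canvas ids are placeholders.
const rgb = new cv.Mat();
cv.cvtColor(frame, rgb, cv.COLOR_RGBA2RGB);
const hsv = new cv.Mat();
cv.cvtColor(rgb, hsv, cv.COLOR_RGB2HSV);

const planes = new cv.MatVector();
cv.split(hsv, planes);
cv.imshow('hueCanvas', planes.get(0)); // H plane
cv.imshow('satCanvas', planes.get(1)); // S plane
cv.imshow('valCanvas', planes.get(2)); // V plane

rgb.delete(); hsv.delete(); planes.delete();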

Sorry for asking, but I’ve lost the plot in the past years about DNNs and “models”, and OpenCV didn’t really make me smarter. I’m asking because I don’t have any idea.

  • how did you create that model?
  • what data did you use to train it
  • is there any comfortable way to create such a model or to let an AI create it?

people are talking about AI and models these days like it’s as easy as putting fuel in a car to make it run, and I feel a bit lost when they do

Ha, I was looking for something similar! But more around the idea of removing all “kernels” of pixels formed by fewer than 4 pixels. I’m wondering if I can do that with contours… Get the area, and if it’s less than 4, turn it black (based on your contourArea(contours[i])>100 idea).
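
Something like this is what I have in mind, based on your contourArea check; a sketch only, not tested yet:

// Untested sketch: find all contours and paint the tiny ones black directly
// in the mask, based on the contourArea(contours[i]) > 100 idea.
function removeSmallBlobs(mask, minArea = 4) {
    const contours = new cv.MatVector();
    const hierarchy = new cv.Mat();
    cv.findContours(mask, contours, hierarchy, cv.RETR_LIST, cv.CHAIN_APPROX_SIMPLE);

    const black = new cv.Scalar(0);
    for (let i = 0; i < contours.size(); i++) {
        const c = contours.get(i);
        if (cv.contourArea(c) < minArea) {
            // thickness -1 fills the contour, erasing the speck
            cv.drawContours(mask, contours, i, black, -1, cv.LINE_8, hierarchy, 0);
        }
        c.delete();
    }
    contours.delete();
    hierarchy.delete();
}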

That’s super cool! That’s what I want to do to “delete” those spots!

That’s fine too. I don’t need to work in extreme situations. Since this is on a video stream (webcam), people will see it’s not recognized and will just have to rotate it better.

And that’s where the challenge is. Background can be absolutely anything! A plain color sheet of paper, a table, a carpet, a black plastic sheet, a wood table, AND… a playing mat with a picture on it! I can probably enforce the requirement to have something plain, instead of a mat, but it still gives infinite options.

Ha! I will be very happy to say more about that!

  1. how did you create that model?

I’m training a segmentation YOLO model in Python. But this can’t be used on the browser side, so I convert it into ONNX format and then into Tensor.JS format; only then is it usable. The inference is pretty fast: under 100 ms on my laptop without a GPU.

  2. what data did you use to train it

I have generated my own dataset. I took pictures of cards on all the plain surfaces in my house: on the floor, on my desk, on the printer, on the kitchen counter, on the bathroom counter, on the washing machine, on my desk mat, on the bin top, etc. Every time a different card, and on each surface multiple times, with different orientations and lighting. I didn’t need THAT many; I think I did about 100 of them. Then, using online digital versions of other cards and pictures of backgrounds, I generated similar pictures. Using those 100+100 I trained the model. And it works pretty well! On the Python side I’m able to get those corners nicely! But the Tensor.js model doesn’t have all the outputs, so it’s more challenging.

  3. is there any comfortable way to create such a model or to let an AI create it?

Writing the code and training the model didn’t take me more than a day! Converting the model into Tensor.js took me 3 days. It was a bit of a pain: lots of Python dependency issues. Getting the inference working in the frontend and drawing the result on the page took me forever! Zero documentation; you are on your own, trying everything you can. It took me more than a week to get something nice!

Here is the picture of what I’m able to get so far. The red box is the model prediction of the bounding box. The blue one is the JavaScript brute force trying to estimate the real corners from the masks. The masks are not showing; I just have them as a popup for debug purposes.

Processing speed is important here because it’s a video stream, so I need to be able to capture that quickly. Once the quad is captured, I do a pHash calculation of the image, and I keep doing that non-stop. When there is a big change in the pHash, it means the card has been changed, so it can be captured, and then I wait for the next big change.
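
In rough JS, the loop looks like this; computePHash() and captureCard() are placeholders for my own helpers (hypothetical names), and the hashes are 64-bit BigInts compared by Hamming distance:

// Rough sketch of the change-detection loop. computePHash() and captureCard()
// are placeholders for my own helpers (hypothetical names); hashes are 64-bit BigInts.
const CHANGE_THRESHOLD = 12; // Hamming distance (in bits) that counts as "a new card"
let lastHash = null;

function hammingDistance(a, b) {
    let x = a ^ b;
    let bits = 0;
    while (x) { bits += Number(x & 1n); x >>= 1n; }
    return bits;
}

function onFrame(cardMat) {
    const hash = computePHash(cardMat); // placeholder: my pHash helper
    if (lastHash !== null && hammingDistance(hash, lastHash) > CHANGE_THRESHOLD) {
        captureCard(cardMat); // placeholder: the card changed, grab it
    }
    lastHash = hash;
}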

So, long story short: training the model was a piece of cake; using it was a nightmare. I used close to no AI to write the code to generate the model, but I had to use a lot of AI to use it.

Or a black-bordered card on a black table? Or your unbordered, fantasy-painted “Lumra” with a shiny blue sky on your posted playing mat, which contains fantasy-painted shiny blue stuff too? Absolutely anything also means a table with random colored tessellations, a reflective table, or even a playing mat that is a collage of magic cards. I don’t know what AI is able to do, but I could NOT do it.

I’m out if you put your Lumra on a black table, because it doesn’t have any border. It’s also difficult to separate on your wooden table because the waves around the headline have nearly the same color as your wooden table. Above the W of “Bellow” the wooden table is scratched and nearly black.

If it’s a video stream, then maybe there are ways to determine the position of a card by comparing two or several frames. I think there are some object tracking functions and examples within OpenCV for foreground/background segmentation using video streams, but I can’t help you here because I don’t know much about them.

What? The last time I had to search for random perspectively transformed rectangle-like objects in photos with random backgrounds, it took me nearly a year.

I had to think about a preprocessing algorithm to get good contours, and before starting my supervised training, I had to manually pick the desired object corners in all my photos to tell the neural network and other classifiers what a good contour is and what is not. This alone was a pain in the butt. Of course, I also had to transform the selected corners into equally sized feature vectors.

After having the feature vectors, I still had to try different classifiers and find good parameters for them, and in the end, training the models with different parameters also took several days each.

But thanks, I’ll have a look at that YOLO thing

That’s why I went the AI way. So far it has been working pretty well. Also, when a detection fails, I just label the picture and add it to the dataset, so the next iteration of the model becomes better. I have not tested on the fantasy mat. That will probably be VERY challenging. But all other surfaces have been detected pretty well.

Still, thanks for the pointer! I will try to find more about that approach and see if I can do it.

Ha, I also had to label all the pictures, which means picking the 4 corners, at least for the pictures I took with my phone. For the generated ones I was able to generate the quad too. But with the right tools, it takes only a few seconds per picture; this was a 10-minute job. I used label-studio on my local laptop for that. The output files are already vectors between 0 and 1 (% of the size), so nothing to be done, just ingesting them into the model. The fact that I’m using other models for other tasks helped me get this one working quickly. I had to build the dataset with the right folders, etc. Figuring out the expected structure can take a while when you don’t know it, but once you know, it takes no time. We are totally off topic :wink: But you can message me at any time if you need help getting a YOLO model working and I will be happy to help if I can.