Hey everyone! I want to fine-tune a transformer model to learn how to predict depth from line drawings only. I noticed that DPT (https://arxiv.org/pdf/2103.13413.pdf) does a pretty decent job at predicting depth of line drawings out of the box, particularly when there are a lot of lines conveying perspective.
I want to see how well this model can perform if it is tuned to work on only line drawings.
To do this, my thought is to take the same dataset they use in the paper
(the NYU Depth V2 dataset by Nathan Silberman)
and do something like canny edge detection on the raw images.
But there are a few problems. I don't want just normal lines: I would love to have dark lines on objects close to the camera and light lines on objects further away. This is typically how humans draw lines to convey closeness in a line drawing.
Since these images come paired with depth maps, I figure it should be possible to do this. I am just not sure how.
Does anyone know of an algorithm that could change edge detection line thickness based on a depth map?
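In case it helps clarify what I'm after, here is a rough sketch of the kind of thing I imagine (purely hypothetical; `depth_weighted_lines` is a made-up name, and `np.roll` is used as a crude dilation that wraps around at the image border):

```python
import numpy as np

def depth_weighted_lines(edges, depth, max_thickness=3):
    """Thicken a binary edge map based on depth: near edges get
    dilated more (thicker lines), far edges stay thin.

    edges: bool array (H, W), True on edge pixels
    depth: float array (H, W), larger = farther from the camera
    """
    # normalize depth to roughly [0, 1]
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    out = np.zeros(edges.shape, dtype=bool)
    # split the depth range into bands, one thickness per band;
    # the nearest band gets the largest thickness
    for t in range(1, max_thickness + 1):
        lo = (max_thickness - t) / max_thickness
        hi = (max_thickness - t + 1) / max_thickness
        band = edges & (d >= lo) & (d <= hi)
        # poor man's dilation: OR together shifted copies of the band
        for dy in range(-t + 1, t):
            for dx in range(-t + 1, t):
                out |= np.roll(np.roll(band, dy, axis=0), dx, axis=1)
    return out
```

That gives thickness variation; the dark-vs-light variation could presumably be done the same way by scaling intensity instead of dilating.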
Use the edge map as a mask onto the depth map.
A multiplication, if taken mathematically; or NumPy indexing using a mask.
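as a tiny NumPy sketch (toy arrays with made-up values; `edges` stands for a binary Canny output, `depth` for its paired depth map):

```python
import numpy as np

# toy stand-ins for a Canny edge map and its paired depth map
edges = np.array([[0, 1, 0],
                  [0, 1, 0],
                  [0, 1, 0]], dtype=bool)
depth = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.5, 0.0],
                  [0.0, 1.0, 0.0]])    # 0 = near, 1 = far

# multiplication: line intensity fades with depth
# (1.0 = dark line on near objects, 0.0 = no line on far ones)
intensity = edges * (1.0 - depth)

# equivalent mask indexing: read depth only at edge pixels
edge_depths = depth[edges]            # array([0. , 0.5, 1. ])
```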
that’s unlikely to work (esp. using canny…), given how cluttered these images are.
maybe take a look at the MATLAB script that generated the rightmost, “labels” image?
also, this network can do segmentation; the object borders from there would give much more consistent borders
Thanks everyone. So do you think I should apply canny to the raw image or the semantic segmentation output? I like the idea of using the depth map as a multiplier to bold the lines.
no (read again, please)
IF you need “object consistent” edges (idk !),
try to derive them from label information or segmentation
ELSE (if you only need “some lines”) at least use something else instead of canny
(which has some tendency to produce “inner and outer” edges)
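e.g. a plain gradient-magnitude map (sketch below, NumPy only, threshold value made up) gives one soft band per boundary, with strength that varies smoothly instead of Canny's thinned binary output:

```python
import numpy as np

def gradient_edges(img, thresh=0.2):
    """Single-response edge map from gradient magnitude.
    img: float array (H, W), values in [0, 1]."""
    gy, gx = np.gradient(img)          # per-axis finite differences
    mag = np.hypot(gx, gy)             # gradient magnitude
    # normalize, then keep only strong gradients; weak ones go to 0.
    # the surviving values vary smoothly, so lines look less binary
    mag = mag / (mag.max() + 1e-8)
    return np.where(mag > thresh, mag, 0.0)
```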
not sure if that is a good idea at all.
you need to process your data in the very same way for training & inference later
and how would you get that information then ?
(depth is what you’re trying to find out, right ?)
again, this is machine learning, not human perception
Do you know of any line-producing algorithms whose output looks more hand-drawn than Canny’s?
As for the data processing, I intend to just convert the colored raw-image dataset into a line-drawing dataset. So, rather than pairing a raw image with a depth map, it would pair a line drawing with the depth map. The depth map itself would be unchanged.
i don’t understand “pairing” here.
the depth map is the network output, the inference result, no ?
The output of the network should be a depth map, but the ground truth label is a depth map too. So every line image is paired with a ground truth depth map. The model needs to learn how to recreate the depth map using the line drawing.
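In other words, the pairing I mean could be sketched like this (hypothetical names; plain Python standing in for a `torch.utils.data.Dataset`):

```python
# PyTorch-style dataset sketch (plain Python so it runs standalone;
# in practice this would subclass torch.utils.data.Dataset)
class LineDrawingDepthDataset:
    def __init__(self, line_drawings, depth_maps):
        # line_drawings[i] was derived from raw RGB image i;
        # depth_maps[i] is the *unchanged* ground-truth depth for image i
        assert len(line_drawings) == len(depth_maps)
        self.line_drawings = line_drawings
        self.depth_maps = depth_maps

    def __len__(self):
        return len(self.line_drawings)

    def __getitem__(self, idx):
        # (network input, ground-truth target) for one training sample
        return self.line_drawings[idx], self.depth_maps[idx]
```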
And thanks! I’ll check those alternatives out.