How to distinguish exterior wall outline lines from hatch, dimension, and view-crop lines in architectural elevation PDFs?

Hi everyone,

I’m building a construction plan takeoff pipeline for architectural drawings. I’m using Python and OpenCV to analyze architectural elevation PDFs that may be rasterized or converted to images.

The specific problem:

I can detect many lines from an exterior elevation, but the detected geometry includes different types of lines mixed together:

  • exterior wall outline / facade boundary lines
  • hatch or siding pattern lines
  • dimension extension lines
  • opening lines around windows and doors
  • view crop / title / annotation lines
  • roof/gable/context lines

I need a generic way to distinguish the main exterior wall/facade boundary from these other line types. I do not want to hardcode this for one specific PDF, sheet name, page number, address, or drawing style. The goal is to work across many architectural/construction drawing sets.

My current idea is to score line candidates using features such as:

  • line length
  • stroke width or apparent line weight
  • proximity to text/dimensions
  • whether the line is part of a connected contour
  • whether it forms a closed or nearly closed polygon
  • relationship to openings/windows/doors
  • orientation: horizontal/vertical/roof slope
  • repeated short parallel lines that may indicate hatch or siding
  • distance from view titles or dimension strings
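To make the scoring idea above concrete, here is a minimal pure-Python sketch. It assumes line segments have already been extracted (for example with `cv2.HoughLinesP`) and that each one carries an estimated stroke width. The `Segment` class, the weights, the pixel thresholds, and the `text_boxes` input are all hypothetical placeholders that would need tuning on real drawings.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    x1: float
    y1: float
    x2: float
    y2: float
    width: float  # estimated stroke width in pixels (assumed to be measured upstream)

    @property
    def length(self):
        return math.hypot(self.x2 - self.x1, self.y2 - self.y1)

    @property
    def angle(self):
        # Undirected orientation in degrees, in [0, 180).
        return math.degrees(math.atan2(self.y2 - self.y1, self.x2 - self.x1)) % 180.0

    @property
    def midpoint(self):
        return ((self.x1 + self.x2) / 2.0, (self.y1 + self.y2) / 2.0)


def angle_diff(a, b):
    """Smallest difference between two undirected orientations, in degrees."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def parallel_neighbors(seg, segments, max_dist=30.0, angle_tol=3.0, len_tol=0.25):
    """Count nearby segments with similar angle and length (a hatch/siding cue)."""
    mx, my = seg.midpoint
    count = 0
    for other in segments:
        if other is seg:
            continue
        if angle_diff(seg.angle, other.angle) > angle_tol:
            continue
        if abs(other.length - seg.length) > len_tol * max(seg.length, 1e-9):
            continue
        ox, oy = other.midpoint
        if math.hypot(ox - mx, oy - my) <= max_dist:
            count += 1
    return count


def outline_score(seg, segments, text_boxes, page_diag, pad=10.0):
    """Heuristic score; higher means more likely a facade boundary.

    The weights below are illustrative guesses, not calibrated values.
    """
    length_term = seg.length / page_diag
    weight_term = seg.width / max(s.width for s in segments)
    hatch_penalty = min(parallel_neighbors(seg, segments), 5) / 5.0
    mx, my = seg.midpoint
    near_text = any(
        x0 - pad <= mx <= x1 + pad and y0 - pad <= my <= y1 + pad
        for (x0, y0, x1, y1) in text_boxes
    )
    return (2.0 * length_term + 1.5 * weight_term
            - 2.0 * hatch_penalty - (1.0 if near_text else 0.0))


# Synthetic example: a heavy rectangular outline, thin siding hatch,
# and one thin dimension line with a text box at its midpoint.
outline = [
    Segment(0, 0, 400, 0, 3.0),
    Segment(400, 0, 400, 300, 3.0),
    Segment(400, 300, 0, 300, 3.0),
    Segment(0, 300, 0, 0, 3.0),
]
hatch = [Segment(50, 50 + 8 * i, 150, 50 + 8 * i, 1.0) for i in range(10)]
dimension = Segment(0, -40, 400, -40, 1.0)
text_boxes = [(180, -50, 220, -30)]  # (x0, y0, x1, y1) of a dimension label
page_diag = 500.0

segments = outline + hatch + [dimension]
ranked = sorted(
    segments,
    key=lambda s: outline_score(s, segments, text_boxes, page_diag),
    reverse=True,
)
```

On this toy input the four heavy outline segments rank first and the hatch lines rank last. Real sheets will be far messier, so treat this as a starting point for feature engineering rather than a working classifier.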

Question:

For architectural elevation images, what OpenCV features or pipeline would you recommend to separate true exterior facade boundary lines from hatch lines, dimension lines, annotation lines, and opening lines?

Would you approach this with:

  1. morphological operations,
  2. connected components,
  3. contour hierarchy,
  4. Hough line detection,
  5. skeletonization,
  6. graph-based scoring,
  7. stroke width / line weight estimation,
  8. or a combination of these?
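As one illustration of the graph-based option in the list above, the sketch below snaps segment endpoints to a coarse grid, builds an adjacency graph, and keeps connected components in which every node has exactly two neighbours, i.e. simple closed loops. It assumes segments are already available as endpoint pairs; the function names and the `tol` value are hypothetical. It does not handle nearly closed outlines or T-junctions (for example where a wall meets a ground line), which real elevations will contain.

```python
from collections import defaultdict


def snap(pt, tol=3.0):
    """Quantize a point to a grid so nearly coincident endpoints share a node.

    Simplification: two close points can still land in different cells
    if they straddle a cell boundary.
    """
    return (round(pt[0] / tol), round(pt[1] / tol))


def build_graph(segments, tol=3.0):
    """Adjacency sets keyed by snapped endpoint."""
    graph = defaultdict(set)
    for p, q in segments:
        a, b = snap(p, tol), snap(q, tol)
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def components(graph):
    """Yield connected components as sets of nodes (iterative DFS)."""
    seen = set()
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        yield comp


def closed_loop_candidates(segments, tol=3.0):
    """Return (bounding_box_area, nodes) for each simple closed loop, largest first."""
    graph = build_graph(segments, tol)
    found = []
    for comp in components(graph):
        # A component where every node has degree 2 is a simple cycle.
        if len(comp) >= 3 and all(len(graph[n]) == 2 for n in comp):
            xs = [n[0] * tol for n in comp]
            ys = [n[1] * tol for n in comp]
            area = (max(xs) - min(xs)) * (max(ys) - min(ys))
            found.append((area, comp))
    found.sort(key=lambda item: item[0], reverse=True)
    return found


# Synthetic example: a facade rectangle with slightly jittered corners,
# a window rectangle, three open hatch lines, and one dimension line.
facade = [
    ((0, 0), (400, 0)),
    ((399, 1), (400, 300)),
    ((400, 301), (0, 300)),
    ((1, 299), (0, 1)),
]
window = [
    ((100, 100), (160, 100)),
    ((160, 100), (160, 180)),
    ((160, 180), (100, 180)),
    ((100, 180), (100, 100)),
]
hatch = [((50, 50 + 8 * i), (150, 50 + 8 * i)) for i in range(3)]
dimension = [((0, -40), (400, -40))]

candidates = closed_loop_candidates(facade + window + hatch + dimension)
```

Here the open hatch and dimension lines are discarded because their endpoints have only one neighbour, and the facade loop sorts ahead of the window loop by bounding-box area. In practice this kind of topological test would probably be combined with line-weight and hatch cues instead of being used alone.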

I’m especially interested in a robust, general approach rather than a one-off solution for a single drawing.

I can share a redacted cropped image and an overlay showing the detected lines if that helps.

Thank you!

How would I approach this? Not visually.

Ask whichever company makes the software that produced the PDF to embed the architectural source data in the PDF, or work with the architectural source file itself. You should not scrape the data visually.

CV both is and isn’t magic. It’s “magic” (a marvel) to those who don’t know how it works. It’s “magic” (a “magic wand”) to those who are given it without having to engineer it. It’s not “magic” because it takes effort/engineering to make it work, and for many purposes it simply doesn’t work satisfactorily, whereas direct solutions do.

Mostly, it’s hard. And it should be a measure of last resort. Exhaust all other options. People often forget there are simpler solutions.

And don’t expect to ask for a complete, robust solution and simply be handed one. That is the kind of request made by someone who wants to turn the answer into a product and make money from it.

If you’re set on “magic solutions”, you would need to hire someone who will train a neural network for the task.
