Anti-aliased downscaling in OpenCV for ML preprocessing, is it possible?

I built an inference pipeline in OpenCV for one of my computer vision models. After testing it I noticed big differences in scoring compared to PyTorch inference, and I tracked the difference down to the resize step. PIL and torchvision use anti-aliased downsampling algorithms by default, but anti-aliased downscaling doesn’t seem to be possible in OpenCV. This only seems to be a big problem for models with a low input resolution (mine uses 112x112); for larger resolutions the difference is a lot smaller.

I considered turning the anti-aliasing off in torchvision, but this article (The dangers behind image resizing) indicates that it gives much better quality when downscaling images. For now I’ll be turning anti-aliasing off anyway. But not being able to use it in OpenCV means that inference will give wrong results for a lot of pretrained models.

So the question is, is anti-aliased downsampling possible? If not, is there a place I could add a feature request to add this?


Also, I looked at the pytorch code doing the downsampling. Looks like it’s not quite as simple as just adding a filter before or after the resize: pytorch/aten/src/ATen/native/cpu/UpSampleKernel.cpp at f6ce94dca5732eb19e65e612c22b0b86aa1146b5 · pytorch/pytorch · GitHub

The anti-aliasing is used in multiple dependent steps of the algorithm.

I think you can post an issue and ask for a new attribute in Image2blobParams

workaround: cv::resize() with INTER_AREA

I agree that this is worth a bug report, because that Image2blobParams thing claims to be able to resize images, so it should do that properly.

INTER_AREA doesn’t actually do an anti-aliased resize either; none of the resizing algorithms in OpenCV do. In a lot of cases anti-aliasing seems to improve the quality of a downscale, so this is an issue for both blobFromImage and resize. Python PIL and torchvision do support it, and enable it by default.

what do you consider “anti-aliased”? technically, INTER_AREA does perform some form of anti-aliasing: its calculation causes all source pixels to contribute to the result. that means high frequency components are suppressed (to some extent).

if you want a fuzzy picture, apply a blur.

feel free to post a picture with desired size and then I’ll show you how INTER_AREA looks.

The link I posted earlier has some good examples regarding the differences: The dangers behind image resizing .

Not sure if anti-aliasing is the right word. Both torchvision and PIL apply some sort of kernel filter between the resizing steps, and for some downscaling examples that gives better results. The differences become large when you resize to a very small resolution. Because of this, a model trained with torchvision (very common) won’t be compatible with OpenCV’s blobFromImage or resize when the model’s input resolution is small; the results are very different.

For some of my bigger models this hasn’t been an issue, but this model is smaller and needs to run more efficiently, so a smaller resolution was chosen.

That’s Fourier theory and it’s described on Wikipedia

IMHO your link is not good. Why use Lanczos, bilinear, or nearest for downsampling?
Lanczos and bilinear are used for upsampling.

Thanks for the Wikipedia link; I didn’t realize the connection between Fourier theory and image resizing. It seems a bit obvious in hindsight.

Since the anti-aliasing step is supposedly an essential part of downsampling, I’m even more confused about what exactly is going on here:

  • The default resizing algorithms of Python Pillow and torchvision can’t be reproduced in OpenCV when downsampling to a small resolution; you can’t even get close
  • The difference seems to be the antialias parameter (Resize — Torchvision main documentation). If I set antialias to false I can get closer (though not pixel perfect).
  • To build an inference pipeline in OpenCV for a model, it’s often best to match the preprocessing done during training as closely as possible. Right now that’s not always possible.
  • It seems very unlikely that opencv skips such an essential (antialiasing) step in these algorithms.

So either:

  • These libraries just have different parameters for the anti-aliasing step
  • OpenCV skips anti-aliasing completely (unlikely? but the resizing results are a bit sharper/pixelated compared to pillow/torchvision’s resizing results)
  • OpenCV or torchvision/pillow is implementing something wrong

Going to dig into this a bit more.

the table in that article, specifically the verdicts, strikes me as somewhat arbitrary. the TF Lanczos result looks identical to PIL Lanczos, yet the verdicts differ.

the authors also appear to hate INTER_AREA, for no good reason, except the image isn’t sufficiently blurry for their tastes.

I find it interesting that there are differences even for a filter like nearest neighbor, which is quite straightforward to implement.

that also makes me question the thoroughness of their investigation.

the proper “trick” is to lowpass before decimation. according to the table they provide, PIL does that, or else the bilinear wouldn’t look that good.

you can get the same effect if you lowpass explicitly.

“anti-aliasing” is a goal or a phenomenon. one could consider it to be a class of operations to choose from. by itself, it is not an operation.

OpenCV’s resize() has no lowpassing in its oldest interpolation modes. some of these modes should NEVER lowpass, in principle, so that’s not even a fault in some of them (e.g. NEAREST).

resize() is an old API, with improvements over time, but those can’t just break with past behavior. any improvements likely show up as new(er) interpolation flags, so the old flags keep their behavior.

if you want to propose an issue to OpenCV’s github, suggest an OR-able entry to InterpolationFlags, something named like “lowpass before decimation”, which might cause behavior comparable to a gaussian lowpass before decimation. that only makes sense for downsampling.

I personally took the blog post seriously because, as you can see in the comments, the torchvision authors actually modified their library based on it. In hindsight, many of the comparisons are a bit dubious, although they’re not wrong on all points. Thank you both for the clarifications and suggestions.

I looked at the source code for some of the resize algorithms implemented by torchvision, and in some paths/interpolation types they apply the filter multiple times in between steps. So it’s not just one filter; otherwise I’d have done it myself as a preprocessing step.

It does matter to be able to reproduce the same preprocessing for models in OpenCV, if it wants to serve as a library used for ML. I can work around the issue for now, but it’d be nice to have the option in OpenCV. I think I’ll file a ticket.

Maybe it’s only a separable filter

downsampling in time


implementing in OpenCV, I would recommend repeated pyrDown() until close enough, then resize(). pyrDown does a lowpass, then plain nearest-neighbor sampling, because that’s what signal processing theory recommends. its only downside is that it’s optimized specifically for 2x decimation, nothing else; it’s made for image pyramids. some pyramid schemes need finer than an octave per step, so for those you use pyrDown, then calculate the finer steps from the octave steps.

IDK what steps they do in the code you investigated, or why their filtering would happen between steps.

I agree. I was surprised that those DNN calls just use nearest by default. that’s not a good default for them.
