Turning a flat sensor into a semi-spherical one

Hi, this is a bit of a weird question, but I have an image processing algorithm that works only with sensors in which each pixel has a uniform receptive field (i.e. not flat sensors, as the pixels at the edge of the sensor cover a larger area of light entering than at the center).

Diagram below:

I have two main questions:

  1. Would mathematically correcting for the flat sensor, by projecting the non-uniform pixels onto a new image in which the resolution at the edges is lower than at the center, be mostly equivalent to having a circular sensor?

  2. Would this be mostly equivalent to having multiple cameras mounted on a convex semi-sphere (like the compound eye of an insect)?

As far as I can tell, the answer to both questions is yes, but I’m wondering what other people think.

Additionally, does calibrating a sensor correct for a flat sensor, or only for distortions produced by the lens? Surely even perfect pinhole cameras have this kind of distortion.

yes and yes.

your sketches show a 2d situation where this is simple.

to keep this simplicity in 3D, perhaps consider a cylindrical projection. bending a flat square grid of pixels into a cylinder is easy.

if you wanted the sensor to have a spherical surface… it’s somewhat tricky to map a square grid onto a sphere. a flat projection surface (flat sensor you started with) is a perfectly valid type of projection.

the normal “lens distortion” model models lens radial and tangential distortions. fundamentally the optical axis plays a “central” role here. if your sensor happens to be shaped/warped symmetrically and it fits the polynomials that model lens distortion, the model won’t know the difference.

opencv’s stitching module comes with a bunch of projection types. they’ll make sense to you.

Hi, thanks for replying.

So I could project onto a cylinder, but this would only correct for distortion in one (e.g. horizontal) direction?

As for distortion, are you saying that the distortion produced by using a flat sensor is corrected for by calibration anyway as calibration seeks to remove all distortion? I see radial distortion is similar to the distortion introduced by flat sensors, but I thought it corrects for curved lines, whereas the distortion from flat sensors is a (i think) projective transformation.

I’ll check out the stitching module!

correct.

I can neither confirm nor deny that. because I don’t understand your statement. I don’t see where this is going. perhaps let’s talk about that instead, where this is supposed to go.

flat sensors don’t “distort” as such. they map just fine.

examples of lens distortion would be barrel-shaped and pillow-shaped distortion.

what’s the goal?

Ahh apologies, I think i’m abusing the term distortion.

I need a precise measure of optical flow around an agent, so I need to make sure that all angles of light that enter the camera are treated uniformly by the sensor. Quite rightly, this is not distortion but instead an unfavourable projection that means objects on the sensor would appear to ‘move faster’ towards the edges. So my goal is to map perfectly the light that enters the camera at all angles, like the eye of an insect does.

Currently, I’m actually dealing with a virtual camera, so there is no lens distortion, only the issues from projecting onto a plane.

“an agent”

have no fear of being very very very overly specific. in academic circles, abstraction is sold as a virtue when it’s really a vice, a sin, a crime, especially inflicted upon academics. value examples over definitions, always. examples are cheaper to understand than a definition.

I haven’t seen signs of understanding for some points so I’ll review the whole thing and swing the mallet harder. my goal is that this all makes sense to you and your questions aren’t merely answered but disappear entirely because your model of the world has changed to make them superfluous.

in a computer you want to deal with flat things on a square grid.

everything that isn’t flat has to be mapped/projected to a flat thing. cylinders aren’t flat but they’re trivial to map (flat sheets bend). spheres aren’t flat, and they are not trivial to map.

a map is not reality. it’s allowed to have downsides. you can compensate for these downsides. you use a map because it has upsides, a common one being simplicity (a flat square grid of pixels is very simple to handle).

assuming you really really need a complete sphere mapped, you’ll have to do some calculations to turn distances and velocities on the map into angles and rates of rotation on the sphere.

I’d suggest an equirectangular projection (see Wikipedia), if you get there. coordinates on it directly map to angles on the sphere by nothing but a factor.
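to illustrate "nothing but a factor", here is a minimal sketch. it assumes a full-sphere equirectangular image (360 degrees wide, 180 degrees tall) with azimuth 0 / elevation 0 at the image center. the function names are made up for this example and are not part of any library.

```python
def equirect_pixel_to_angles(x, y, width, height):
    """Pixel (x, y) of a full-sphere equirectangular image -> (azimuth, elevation) in degrees.

    Assumes the image spans 360 deg horizontally and 180 deg vertically,
    with (0, 0) degrees at the image center and elevation growing upward.
    """
    azimuth = (x - width / 2) * (360.0 / width)
    elevation = (height / 2 - y) * (180.0 / height)
    return azimuth, elevation


def equirect_flow_to_degrees(dx_px, dy_px, width, height):
    """Optical flow in pixels -> change in (azimuth, elevation) in degrees."""
    return dx_px * (360.0 / width), -dy_px * (180.0 / height)


print(equirect_pixel_to_angles(1536, 512, 2048, 1024))
print(equirect_flow_to_degrees(10, 0, 2048, 1024))
```

one caveat: a step in azimuth is a smaller arc on the sphere at high elevation (it shrinks with the cosine of the elevation), so near the poles you still have to account for that.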

as long as you have a single normal camera, you can do this:

(2) you calculate the optical flow (in pixels of difference) on the picture.

(1) you calculate the correction map to that, based on the angle between the optical axis and the ray going through the pixel, because that angle says how something moving near that ray is projected near that pixel. these coefficients are static and they’re factors, so you can do this once, before you do anything else. this correction map, for any pixel position, converts pixel distances into angles or rates of rotation.

(3) you correct the optical flow using that map. division in the following example, or calculate inverses to get to use multiplication.

the exact math involves some trigonometry and some derivatives. I’ll show you difference quotients first because they’re easier to visualize… “eps” shall represent something moving a little bit (the optical flow).

at the center of the map (angle zero), you’d have a factor of 1 because

tan(eps) ~ eps

>>> a = 0 * pi; eps = 1e-8; (tan(a+eps) - tan(a-eps)) / (2*eps)
1.0

further away from the center you’d get larger factors because there the same angle difference moves farther:

>>> a = 1/4 * pi; eps = 1e-8; (tan(a+eps) - tan(a-eps)) / (2*eps)
2.0000000156006337

this difference quotient represents a derivative:

d/dx tan(x) = 1 / cos(x)^2

the calculation becomes:

>>> a = 1/4 * pi; 1 / cos(a)**2
1.9999999999999996
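putting the three steps and this derivative together, a minimal sketch for the horizontal direction only. assumptions: a pinhole camera without lens distortion, focal length fx in pixels, optical center cx, and the ray through column x making the angle arctan((x - cx) / fx) with the optical axis. the function names are hypothetical, and the optical flow itself is assumed to come from whatever method you already use.

```python
import numpy as np


def horizontal_correction_map(width, fx, cx):
    """Pixels per radian at every column: d/da (fx * tan(a) + cx) = fx / cos(a)^2."""
    x = np.arange(width, dtype=np.float64)
    a = np.arctan((x - cx) / fx)   # angle between the optical axis and the ray through column x
    return fx / np.cos(a) ** 2


def correct_horizontal_flow(flow_x_px, correction):
    """Horizontal flow in pixels/frame (shape H x W) -> radians/frame, by division."""
    return flow_x_px / correction  # the (W,) map broadcasts over all rows


# camera with 90 degree horizontal FoV and 640 columns: fx = 320
correction = horizontal_correction_map(640, fx=320.0, cx=320.0)
flow_x = np.ones((480, 640))       # stand-in for a real flow field: 1 px/frame everywhere
angular = correct_horizontal_flow(flow_x, correction)
print(angular[0, 320], angular[0, 0])
```

the same pixel displacement counts for half as much rotation at the edge (45 degrees off axis) as at the center, which matches the factor of 2 from the difference quotient.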

now you just need to know for every pixel what angle a ray through it has to the optical axis. you know the field of view (FoV) of your camera because you calibrated it.

equation from the camera matrix for horizontal FoV: a ray on the right edge of the view (hfov/2) is mapped to the right edge of the picture (usually, cx = width/2)

tan(hfov / 2) * fx + cx = width
| cx = width/2
tan(hfov / 2) * fx = width/2

equations get simpler if you first subtract the optical center (cx,cy) from pixel coordinates.

tan(x_angle) * fx + cx = x
x_angle = arctan((x-cx) / fx)

feel free to investigate whether you can separate these calculations into x and y direction on the picture, or whether you have to do anything more complicated. since camera sensors have a square grid, usually fx = fy is a fair assumption, so that makes things simpler.
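as a starting point for that investigation, a small sketch that computes per-pixel ray angles from the camera matrix values. it assumes an ideal pinhole camera, and the names focal_length_px and ray_angles are invented for this example. treat it as a sketch under those assumptions, not a reference implementation.

```python
import math

import numpy as np


def focal_length_px(width, hfov_deg):
    """fx from tan(hfov / 2) * fx = width / 2 (assumes cx = width / 2)."""
    return (width / 2) / math.tan(math.radians(hfov_deg) / 2)


def ray_angles(width, height, fx, fy, cx, cy):
    """Per-pixel azimuth, elevation and off-axis angle in radians, each of shape (height, width).

    Image y grows downward, so positive elevation here points down.
    """
    x, y = np.meshgrid(np.arange(width) - cx, np.arange(height) - cy)
    xn, yn = x / fx, y / fy                            # the ray through a pixel is (xn, yn, 1)
    azimuth = np.arctan(xn)                            # depends on x only
    elevation = np.arctan(yn / np.sqrt(1 + xn ** 2))   # depends on both x and y
    off_axis = np.arctan(np.hypot(xn, yn))             # angle between the ray and the optical axis
    return azimuth, elevation, off_axis


fx = focal_length_px(640, 90)
az, el, off = ray_angles(640, 480, fx, fx, 320, 240)
print(math.degrees(az[240, 0]), math.degrees(el[0, 320]), math.degrees(el[0, 0]))
```

with azimuth and elevation defined this way, azimuth separates cleanly into the x direction, while elevation does not: the same row sits at a smaller elevation towards the left and right edges.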

This is great, thankyou so much.

Firstly, I realize the diagram in my post is incorrect: the receptive field of pixels actually gets smaller towards the sensor edges, while the objects/optical flow get larger.

Apologies, I’ll be a bit more specific. I’m currently studying this paper here about optical flow obstacle avoidance in insects. I’m building an autonomous drone and this is the system I hope to implement for lower level navigation. I have two infrared cameras that I want to imitate insect compound eyes.

I should specify, however, that I’ll only be dealing with semi-spheres, which might simplify things, as the cameras will all be much less than 180 deg in FoV, around 120 deg in fact.

I’ve spent the last day thinking about projections and this problem. I coded up a quick snippet that takes images and makes the pixels all have a uniform size in the horizontal direction. Here’s the result for that (tested in Minecraft as that’s super easy to play around with FoV):

You can see it really fixes some of the stretched image towards the edges, but the vertical mapping is unchanged causing bending of straight lines.

It works (crudely) by comparing the size of each pixel to the ‘base’ pixel size at the center, using inverse tan per input image pixel to calculate how many degrees it ‘sees’ compared to the base pixel. It then places them sequentially taking into account their new size, producing a new pixel position. This leaves the pixels in the center of the image in place but moves the outer ones inward.
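For comparison, here is a minimal sketch of the same idea done as an inverse mapping: instead of placing input pixels sequentially, it asks, for every output column, which input column sees that angle. This is not the snippet described above. It assumes an ideal pinhole image with a known horizontal FoV, uses nearest-neighbour sampling, and the function name is made up.

```python
import math

import numpy as np


def uniform_azimuth_columns(img, hfov_deg, out_width):
    """Resample columns so that every output column covers the same azimuth step."""
    width = img.shape[1]
    half = math.radians(hfov_deg) / 2
    fx = (width / 2) / math.tan(half)    # focal length in pixels
    cx = (width - 1) / 2                 # center of the pixel grid
    # Azimuth at the center of each output column, evenly spaced across the FoV
    az = (np.arange(out_width) + 0.5) / out_width * 2 * half - half
    src_x = np.clip(np.rint(fx * np.tan(az) + cx), 0, width - 1).astype(int)
    return img[:, src_x]


# Test image whose pixel value equals its column index
img = np.tile(np.arange(101), (3, 1))
out = uniform_azimuth_columns(img, hfov_deg=90, out_width=101)
print(out[0, :5], out[0, 48:53])
```

Like the sequential version, this only fixes the horizontal direction, so straight lines away from the horizon still bend.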

I think your method for optical flow may be similar? I’ve spent some time going through it, though I’m having trouble understanding it perfectly. From what I understand, does it calculate how much ‘movement’ (d(tan)/dx) on a flat sensor each degree produces, then use that to compensate the pixel size? Or does it calculate the angle for each pixel, giving circular/spherical coordinates (which is probably more flexible)? I’m just having a little trouble tying it in with “this correction map, for any pixel position, converts pixel distances into angles or rates of rotation.” and how this is applied to the final image.

One thought i’ve had is whether it would be better to calculate optical flow after reprojecting the image, as the methods i’ve used for optical flow tend to become inaccurate at high speeds near the edges of images where objects are moving faster across the sensor.

Also, looking around online, it appears like a Stereographic projection might be what i’m looking for, as its setup looks very similar to the problem. What do you think of this?

that paper, Fig 5, uses an equirectangular projection. axes are linear in degrees.

assuming I understand your question, that’s a yes, but I feel an urge to split hairs.

Feel free to split hairs, I’m new to CV and projections etc.

After looking further into this, this is what they mention:

The output of these filters formed the input to the photoreceptors that were equally spaced at 2° along the elevation and azimuth of the eye. The array of photoreceptors formed a rectangular grid in the cylindrical projection with 91 rows and 181 columns.

So I’m not clear if it is an equirectangular projection (correct me if I’m wrong), as doesn’t that project onto a cylinder? It seems the best thing would be a gnomonic projection. Here, I would do something like turn pixel coordinates into spherical coordinates (azimuth and elevation), from a point at the height of the focal length, then just map each degree to a new image. This would be rectangular, though this method, and the paper, don’t perfectly imitate insect eyes, as I think they evenly tessellate a sphere with their ommatidia, whereas the density of resolution increases greatly near the poles with the paper’s and my method.

There is a paper from Brian Stonelake available here explaining about this projection, and presents the formula:

With a bit of tweaking, i can easily just use this to sample from my input image. Precomputing this, the main bottleneck would just be accessing the memory.

their description and their graphics describe/show an equirectangular projection.

that’s not a cylindrical projection. they made a mistake in labeling it as such.

I won’t even try to understand those equations you presented, or why you’d even consider projecting onto polyhedra. I see no reason at all to complicate things to that degree.

you aren’t trying to emulate an actual physical projection. equirectangular is physically impossible but mathematically very simple.

you have regular pictures and you want to calculate things on them. that is the whole goal.

you know those 360 degree panoramic pictures, as seen on google maps and elsewhere? they’re almost always made from equirectangular data. the devices that produce these pictures consist of multiple cameras, and software maps each view into an equirectangular whole.

Hmm, okay i’m probably more confused at that paper than anything haha.

I now realize I was actually thinking of an equirectangular projection the whole time. I even wrote some code thinking “This gnomonic projection works really well”, not realizing it was actually equirectangular.

The Python code below takes an image, calculates the angle in azimuth and elevation from the focal axis, then simply distributes the angles evenly into a grid of pixels. The result is shown below:

And the python code:

import cv2
import numpy as np
import math

## PARAMETERS ##

pixelDegDensity = 6 # How many pixels you get in the output image per degree
fov = math.radians(115) # Horizontal FOV
img = cv2.imread('Screenshot_16.png')

## WORKINGS ##

cv2.imshow("Main", img)

height, width, _ = img.shape
outWidth = round(math.degrees(fov) * pixelDegDensity)
outHeight = round(outWidth * 9/16)
newImg = np.zeros((outHeight, outWidth, 3), dtype=np.uint8)
focalLength = (width/2) / math.tan(fov/2) #in pixel units

mpx = round(width/2)
mpy = round(height/2)
outMpx = round(outWidth/2)
outMpy = round(outHeight/2)

## FOR EVERY DEG INCREMENT, FIND THE X AND Y IT INTERSECTS IN THE IMAGE, THEN PLACE THAT IN A NEW IMAGE ##

for outX in range(outWidth):
    az = (outX - outMpx) / pixelDegDensity # azimuth in degrees
    x = round(focalLength * math.tan(math.radians(az)))
    for outY in range(outHeight):
        el = (outY - outMpy) / pixelDegDensity # elevation in degrees
        y = round(math.sqrt(focalLength ** 2 + x ** 2) * math.tan(math.radians(el)))
        # Copy the whole pixel (all channels) if the ray lands inside the source image
        if 0 <= x + mpx < width and 0 <= y + mpy < height:
            newImg[outY, outX] = img[y + mpy, x + mpx]

cv2.imshow("image Output", newImg)
cv2.waitKey(0) # Keep the windows open until a key is pressed
cv2.destroyAllWindows()

Slightly messy, and the mapping could be precalculated.
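A rough sketch of what precalculating the mapping could look like, using NumPy only. It follows the same geometry as the loop above but is not a drop-in replacement: the names build_equirect_lookup and apply_lookup are made up, it rounds once at the end rather than per axis, and it assumes an ideal pinhole image with a known horizontal FoV.

```python
import math

import numpy as np


def build_equirect_lookup(width, height, hfov_deg, px_per_deg):
    """For each output pixel, precompute which source pixel to sample.

    Returns (src_y, src_x, valid): integer index arrays of the output shape and
    a mask marking output pixels whose ray lands inside the source image.
    """
    out_w = round(hfov_deg * px_per_deg)
    out_h = round(out_w * 9 / 16)
    f = (width / 2) / math.tan(math.radians(hfov_deg) / 2)    # focal length in pixels

    az = np.radians((np.arange(out_w) - out_w / 2) / px_per_deg)
    el = np.radians((np.arange(out_h) - out_h / 2) / px_per_deg)
    x = f * np.tan(az)                                         # shape (out_w,)
    y = np.sqrt(f ** 2 + x ** 2)[None, :] * np.tan(el)[:, None]  # shape (out_h, out_w)

    src_x = np.rint(np.broadcast_to(x, y.shape) + width / 2).astype(int)
    src_y = np.rint(y + height / 2).astype(int)
    valid = (src_x >= 0) & (src_x < width) & (src_y >= 0) & (src_y < height)
    return src_y, src_x, valid


def apply_lookup(img, src_y, src_x, valid):
    """Remap one frame with a precomputed lookup; unmapped pixels stay black."""
    out = np.zeros(src_y.shape + img.shape[2:], dtype=img.dtype)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out


# Build the lookup once, then reuse it for every frame of the same size
frame = (np.arange(90 * 160 * 3) % 256).astype(np.uint8).reshape(90, 160, 3)
lookup = build_equirect_lookup(160, 90, hfov_deg=115, px_per_deg=2)
remapped = apply_lookup(frame, *lookup)
print(remapped.shape)
```

If interpolation is wanted, the same coordinates could be kept as float32 maps and passed to cv2.remap instead of being rounded, though I have not done that here.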

I realize in the end you can’t have your cake and eat it. Either you perfectly replicate the eyes of an animal but don’t have a rectangular image that works with classical algorithms, or you make a compromise, where each pixel doesn’t correspond to an equal receptive area.

Thank you so much for helping and your patience haha, I think this is solved for now.

EDIT: After even further thinking, I’ve realized what I’m doing is two steps:

  1. Inverse gnomonic projection to get the angle at which each ray crosses the focal axis
  2. Equirectangular projection (which is trivial as it is just simply placing each pixel value for each degree in an image).