Construct a 360-degree field-of-view panorama to determine obstructions in the sky

Background

I’m working on a mobile app to scan for obstructions in the sky. It is very similar to what Starlink has built for their mobile app - see video.

It does so by presenting a live camera feed that instructs the user to scan a 360-degree field of view facing the sky. The camera feed will be augmented with green spherical markers to denote the full range of the field of view to be scanned. During the scan, the user attempts to “collect” all the markers so that the desired field of view is scanned in its entirety. Once the scan is complete, the app will present a 3D hemisphere denoting the field of view, with the detected obstructions overlaid as a texture map on the 3D mesh.

This is my high level understanding of the implementation strategy:

  1. Use RealityKit (for iOS) or ARCore (for Android) to build a live camera feed augmented with virtual spherical markers anchored to the desired field of view in the sky.
  2. When the user orients the camera toward the field of view, there will be logic to detect whether the markers lie within the mobile device's viewport (see the viewport-test sketch after this list). If so, the markers vanish and, at the same time, images of the current camera view are captured.
  3. Once all markers have been scanned, perform image stitching on the captured images using the OpenCV library and complex computer vision math (homography, rotation matrix, etc.) to generate a 360-degree panorama.
  4. The panorama will be processed further by OpenCV or image-segmentation ML libraries to detect non-sky objects (i.e. obstructions). The sky will be colored blue and obstructions red (see the thresholding sketch after this list).
  5. Use the output from Step 4 as a texture map to overlay on a 3D mesh. The 3D mesh will be shaped like a hemisphere depicting the actual field of view that was scanned.
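
For Step 2, here is a minimal sketch of the marker-visibility test I have in mind (the names are hypothetical): treat each marker as a unit direction in world coordinates, rotate it into camera coordinates with the frame's world-to-camera rotation, and check that the pinhole projection lands inside the viewport. This assumes OpenCV's camera convention (+Z looks forward); ARKit's camera looks down -Z, so the matrices would need converting first.

    import numpy as np

    def marker_in_viewport(marker_dir, R, K, width, height):
        """Check whether a sky marker is visible in the current frame.

        marker_dir: unit direction to the marker in world coordinates
                    (markers are effectively at infinity, so translation
                    can be ignored and only the direction matters).
        R: 3x3 world-to-camera rotation for the current frame.
        K: 3x3 camera intrinsics.
        """
        d = R @ marker_dir
        if d[2] <= 0:           # marker is behind the camera
            return False
        u, v, w = K @ d         # pinhole projection
        return 0 <= u / w < width and 0 <= v / w < height

And for Step 4, a crude baseline that needs no ML, assuming reasonably clear weather (the HSV bounds below are guesses and would need tuning per device and sky condition):

    import cv2 as cv
    import numpy as np

    def classify_obstructions(panorama_bgr):
        """Paint sky pixels blue and everything else red."""
        hsv = cv.cvtColor(panorama_bgr, cv.COLOR_BGR2HSV)
        # Blue-ish hues with some brightness count as sky.
        sky = cv.inRange(hsv, (90, 30, 80), (130, 255, 255))
        out = np.zeros_like(panorama_bgr)
        out[sky > 0] = (255, 0, 0)   # BGR blue = sky
        out[sky == 0] = (0, 0, 255)  # BGR red = obstruction
        return out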

I am stuck at Step 3 - generating a 360-degree panorama from the captured images - and partially stuck at Step 2, because the image-capture algorithm influences the quality of the images and therefore the ease of generating the panorama in Step 3.

The problem

The difficulty in Step 3 comes from the fact that when scanning for obstructions in the sky, a true positive outcome is a clear view of the sky with nothing in it at all (i.e. a monochrome blue background). The captured images are therefore featureless, so traditional OpenCV feature-matching algorithms like ORB cannot be used to align them. You have to write custom code involving complex computer vision concepts to construct the panorama by other means:

Attempt 1: Featureless image stitching using relative rotation

For each captured image, save its camera parameters (i.e. Euler angles and camera intrinsics), use them to calculate its relative rotation matrix w.r.t. the previous image, and perform a rotation warp (fisheye/spherical).

Result: https://i.stack.imgur.com/m0MCP.png

Code:

    # "Estimate" camera params with the given metadata for all images.
    def estimate(self, metadata):
        cameras = []
        for i in range(len(metadata)):
            # For the first image, use the last image in the list as the reference point
            metadata1 = metadata[i - 1]
            metadata2 = metadata[i]

            # Camera intrinsics for image 2 (transposed because ARKit uses column-first matrices)
            K2 = np.array(metadata1['intrinsics'][0], dtype=np.float32).transpose()

            # Convert Euler angles to rotation matrix for image 1 and 2
            R1 = self.euler_to_rotation_matrix(metadata1['eulerAngleX'], metadata1['eulerAngleY'], metadata1['eulerAngleZ'])
            R2 = self.euler_to_rotation_matrix(metadata2['eulerAngleX'], metadata2['eulerAngleY'], metadata2['eulerAngleZ'])

            # Relative rotation matrix calculation using rotation matrices for image 1 and 2
            R = R1 @ np.linalg.inv(R2)

            camera = CameraParams(R, K2)
            cameras.append(camera)

        for cam in cameras:
            cam.R = cam.R.astype(np.float32)
        return cameras

    # Warp image with the camera params. I use warper_type = 'fisheye' here
    def warp_image(self, img, camera, aspect=1):
        warper = cv.PyRotationWarper(self.warper_type, self.scale * aspect)
        _, warped_image = warper.warp(
            img,
            Warper.get_K(camera, aspect),
            camera.R,
            cv.INTER_LINEAR,
            cv.BORDER_REFLECT,
        )
        return warped_image

As you can see, the shape of the panorama varies too much depending on the captured images. There are also so many gaps in the panorama that I don't think it's feasible to use it as a basis for detecting obstructions.

Attempt 2: Adding pseudo features to enable feature matching

Use the spherical markers in the field of view as features. Color them differently by region of the field of view so that overlapping images that share the same colored markers can be stitched together easily.

That was the theory, but I never made significant headway in this direction because, for some reason, I wasn't able to align even two images using stitching_detailed.py and OpenStitching. I dropped this idea in the end.
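
For the record, here is roughly what I had in mind (a sketch only, never battle-tested; `marker_hsv_ranges` and the function names are hypothetical). It assumes the captured frames include the rendered markers and that each marker has a unique solid color, so centroids can be matched by marker ID across overlapping images:

    import cv2 as cv
    import numpy as np

    def marker_centroids(img_bgr, marker_hsv_ranges):
        """Locate each uniquely colored marker in one image.
        marker_hsv_ranges: {marker_id: (hsv_lower, hsv_upper)}."""
        hsv = cv.cvtColor(img_bgr, cv.COLOR_BGR2HSV)
        centroids = {}
        for marker_id, (lo, hi) in marker_hsv_ranges.items():
            m = cv.moments(cv.inRange(hsv, lo, hi))
            if m['m00'] > 0:  # marker is present in this image
                centroids[marker_id] = (m['m10'] / m['m00'], m['m01'] / m['m00'])
        return centroids

    def homography_from_markers(img1, img2, marker_hsv_ranges):
        """Fit a homography from marker correspondences shared by two images."""
        c1 = marker_centroids(img1, marker_hsv_ranges)
        c2 = marker_centroids(img2, marker_hsv_ranges)
        shared = sorted(set(c1) & set(c2))
        if len(shared) < 4:  # findHomography needs at least 4 correspondences
            return None
        pts1 = np.float32([c1[k] for k in shared])
        pts2 = np.float32([c2[k] for k in shared])
        H, _ = cv.findHomography(pts2, pts1, cv.RANSAC)
        return H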

My questions

  1. Am I on the right track with my high level implementation strategy?
  2. Are there any obvious (or non-obvious) simplifiers that I'm missing when it comes to stitching the images together? In other words, are there any dead simple solutions that don't require complicated computer vision concepts and still do a decent job of detecting obstructions in the sky?
  3. To all the CV/AR experts out there, how would you tackle this image stitching problem?

surely those offer ways to capture a panorama, right? they already do all the work required for it.

crosspost:

That was my intuition as well.

I did quite a bit of research and prototyping with RealityKit before opting to use OpenCV to generate panoramas. However, it seems that there are no obvious ways (to me at least) to generate panoramas out of the box.

From what I know, AR libraries (RealityKit/ARCore) give you Euler angles and camera intrinsics for each AR frame. It is up to the AR library's user to construct the panorama by hand from the known camera parameters.
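
For concreteness, the per-frame metadata that my `estimate` method above consumes looks roughly like this (values illustrative; the Euler angles come from ARKit in radians, and the intrinsics are stored column-major, which is why the code transposes them):

    frame_metadata = {
        'eulerAngleX': 0.12,
        'eulerAngleY': -1.57,
        'eulerAngleZ': 0.03,
        # One 3x3 intrinsics matrix, column-major as ARKit stores it:
        # columns are (fx, 0, 0), (0, fy, 0), (cx, cy, 1).
        'intrinsics': [[[1450.0, 0.0, 0.0],
                        [0.0, 1450.0, 0.0],
                        [960.0, 540.0, 1.0]]],
    }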

Maybe my Google/ChatGPT-fu is not that strong 🙂 If you think there’s some dead simple solution that I’ve missed in AR land, please let me know.

or you could make the system just run whatever panorama app is on there, take the picture with an interface the user might already know, and then give you the picture…

I’m afraid we can’t redirect the user to any third-party panorama app; everything has to be done within our app for the best customer experience.

surely those offer ways to capture a panorama, right? they already do all the work required for it.

After thinking about this suggestion a bit more, I realized you can obtain camera extrinsics (rotation, translation) and intrinsics (focal length, principal point) for each captured image via ARKit.

I think that simplifies the problem a bit. My understanding is that I have to do the following:

  1. For each image, warp it to cylindrical coordinates
  2. Compute the homography between the current image and the base image using the known camera extrinsics and intrinsics (see the sketch after this list)
  3. Warp the cylindrical image to the base image using the computed homography
  4. Stitch the warped cylindrical image with the base image
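
The core of step 2 is that under pure rotation (a fair approximation when the camera pivots in place to scan the sky), the mapping between two views is the planar homography H = K1 · R1 · R2ᵀ · K2⁻¹, with R1 and R2 the world-to-camera rotations. A minimal sketch (axis conventions differ between ARKit and OpenCV, so the matrices may need massaging):

    import cv2 as cv
    import numpy as np

    def rotation_homography(K1, R1, K2, R2):
        """Homography mapping pixels of image 2 into image 1 for a purely
        rotating camera. R1, R2 are 3x3 world-to-camera rotations; if the
        AR framework hands you camera-to-world transforms, transpose first."""
        return K1 @ R1 @ R2.T @ np.linalg.inv(K2)

    # Usage: warp the current frame into the base frame's pixel space.
    # H = rotation_homography(K_base, R_base, K_cur, R_cur)
    # aligned = cv.warpPerspective(cur_img, H, (base_w, base_h))

Note that this relates the original (planar) images; once frames are warped to cylindrical coordinates as in step 1, a pure pan about the vertical axis becomes approximately a horizontal translation rather than a homography.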

I have detailed my solution here: ios - Compute homography given two ARKit camera poses/transforms - Stack Overflow

However, I don’t think the resulting image looks anywhere close to a panorama. Am I doing something entirely wrong? Would appreciate any advice here.

Hey uohzxela!
I am trying to achieve something similar with my app (balkonsonne.app) and thought of this problem, too.
What I would do is drop the featureless clear-sky images from the stitching process.
In the end you only want to know whether there is an obstruction or not, right? So if an image cannot be stitched, it should contain no obstructions, and you can paint its field of view blue in the postprocessing. What do you think?
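
A minimal sketch of that heuristic (the detector choice and threshold are placeholders): count keypoints per captured frame, skip stitching for frames below the threshold, and paint their field of view blue in postprocessing.

    import cv2 as cv

    def is_featureless(img_bgr, min_keypoints=10):
        """Frames with (almost) no detectable features are assumed to be
        clear sky and can skip stitching entirely."""
        gray = cv.cvtColor(img_bgr, cv.COLOR_BGR2GRAY)
        keypoints = cv.ORB_create(nfeatures=500).detect(gray, None)
        return len(keypoints) < min_keypoints

    # stitchable = [f for f in frames if not is_featureless(f)]
    # Stitch `stitchable` only; paint the skipped frames' FOV blue afterwards.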