Dynamic narrow-FOV cutout in a wide-FOV image

Hi.

I have two different cameras that I am getting a continuous video feed from. The first camera gives me a 320x320 pixel image and has a very narrow FOV (it sees about 50x50 cm at a couple of meters). The other camera is an Intel RealSense (1280x720) with a regular FOV. The RealSense is mounted on top of the narrow-FOV camera so that they observe the same scene.

To help me aim the narrow-FOV camera, I want to overlay its field of view as a cutout on the Intel RealSense stream. This would be simple enough at a fixed distance looking at a flat surface. But since the system will need to move, the cutout needs to change size dynamically depending on how far away the narrow-FOV camera is looking.
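To make the geometry concrete, here is a minimal sketch of where the cutout would land for a given depth. It assumes ideal pinhole cameras with parallel optical axes, a flat target facing the cameras, and the RealSense sitting a small vertical distance above the narrow-FOV camera. The function name `cutout_rect`, the 900 px focal length, the 5 cm baseline, and reading "a couple of meters" as 2 m are all placeholder assumptions, not measured values; the real intrinsics would have to come from calibration.

```python
import numpy as np

def cutout_rect(depth_m, f_wide_px, cx, cy, narrow_fov_deg, baseline_m):
    """Return (left, top, width, height) of the narrow-FOV footprint in wide-image pixels.

    Assumes ideal pinhole cameras with parallel optical axes, the wide camera
    mounted baseline_m above the narrow one, and a flat target at depth_m.
    """
    # Half-size of the area the narrow camera sees at this depth, in metres.
    half = depth_m * np.tan(np.radians(narrow_fov_deg) / 2.0)
    # Footprint edges in the wide camera frame (x right, y down, z forward).
    # The narrow camera sits below the wide one, i.e. at y = +baseline_m.
    xs = np.array([-half, half])
    ys = np.array([-half, half]) + baseline_m
    # Pinhole projection into the wide image.
    u = f_wide_px * xs / depth_m + cx
    v = f_wide_px * ys / depth_m + cy
    return u[0], v[0], u[1] - u[0], v[1] - v[0]

# Placeholder numbers: 50 cm seen at an assumed 2 m gives the narrow FOV angle.
narrow_fov_deg = np.degrees(2.0 * np.arctan(0.25 / 2.0))
for depth in (1.0, 2.0, 4.0):
    print(depth, cutout_rect(depth, 900.0, 640.0, 360.0, narrow_fov_deg, 0.05))
```

Note that under these assumptions the rectangle's size comes out independent of depth and only its position shifts (parallax between the two cameras). Any size change would have to come from effects the sketch ignores, such as a forward offset between the two lenses or a tilted target.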

I was thinking I could take the two image streams, do some feature matching to find the transformation between the images, and also use the depth information from the Intel RealSense to estimate how far away the narrow-FOV camera is looking, then use this to adjust the size and angle of the cutout.
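A rough sketch of the transformation step as I imagine it: given pairs of matching points between the narrow image and the RealSense image, estimate a homography with the direct linear transform (DLT) and use it to map the corners of the 320x320 image into the wide image. The matched points are assumed to come from some feature detector and matcher (ORB in OpenCV, for example), which is not shown here. The helper names `estimate_homography` and `project` and the synthetic point pairs are made up for illustration. There is no outlier rejection, so real matches would probably need something like RANSAC on top.

```python
import numpy as np

def estimate_homography(src_pts, dst_pts):
    """Direct linear transform: find H with dst ~ H @ src from >= 4 point pairs."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear constraints on the 9 entries of H per point pair.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]

def project(h, pts):
    """Apply homography h to an (N, 2) array of pixel coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]

# Synthetic stand-in for matched features: narrow-image points and where
# they would appear in the wide image under a made-up scale and shift.
true_h = np.array([[0.703125, 0.0, 527.5],
                   [0.0, 0.703125, 270.0],
                   [0.0, 0.0, 1.0]])
narrow_pts = np.array([[10, 20], [300, 30], [280, 310], [15, 290], [160, 150]])
wide_pts = project(true_h, narrow_pts)

h_est = estimate_homography(narrow_pts, wide_pts)
narrow_corners = np.array([[0, 0], [319, 0], [319, 319], [0, 319]])
print(project(h_est, narrow_corners))  # outline of the cutout in the wide image
```

The four projected corners could then be drawn as a polygon on the RealSense frame, which would handle the size and the angle of the cutout in one step.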

I am unsure where to begin, and I can't find any examples of anyone doing something similar.