Is PnP on a moving object possible?

I am working on a vision-based control project in robotics.
I have a 6-DOF robot and a calibrated camera mounted on its TCP (both the intrinsic and extrinsic calibrations are correct), so I can get the camera pose in the world frame (robot frame) at all times.
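Concretely, I get that pose by composing the TCP pose from forward kinematics with the hand-eye extrinsics, roughly like this (a minimal sketch, placeholder names only):

```python
import numpy as np

# Placeholder inputs (they come from the robot and the hand-eye calibration):
T_w_tcp = np.eye(4)     # TCP pose in the world frame, from forward kinematics
T_tcp_cam = np.eye(4)   # camera pose in the TCP frame, from extrinsic calibration

# Camera pose in the world (robot) frame is just the composition:
T_w_cam = T_w_tcp @ T_tcp_cam
```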
I am trying to maintain the perspective between my camera and the object my robot is inspecting (not in real time),
and I am trying to do it using only a 2D camera and no ArUco markers, as I used to in the past.
So I figured I would do the following:
In scenario 1:

  • I use two camera perspectives to take two distinct images
  • I extract keypoints or features using Xfeat or another method
  • I match them
  • And I triangulate (using cv.triangulatePoints) to get some 3D points, expressed in the camera frame (see the sketch after this list)
    This way I have generated some metric data of my object. I have achieved this in a simulated environment.
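Roughly, the triangulation step looks like this in my setup (a minimal sketch; K, T_w_c1, T_w_c2, pts1, and pts2 are placeholder names for the intrinsics, the two camera-in-world poses from the robot, and the Nx2 matched pixel coordinates):

```python
import numpy as np
import cv2

def triangulate_in_cam1(K, T_w_c1, T_w_c2, pts1, pts2):
    """Triangulate matched pixels from two known camera poses.
    K: 3x3 intrinsics; T_w_c1/T_w_c2: 4x4 camera-in-world poses from the
    robot; pts1/pts2: Nx2 matched pixel coordinates (e.g. from Xfeat).
    Returns Nx3 metric points expressed in the camera-1 frame."""
    # Relative pose taking camera-1 coordinates into camera-2 coordinates
    T_c2_c1 = np.linalg.inv(T_w_c2) @ T_w_c1

    # Projection matrices, with camera 1 as the reference frame
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ T_c2_c1[:3, :]

    # cv2.triangulatePoints takes 2xN arrays and returns 4xN homogeneous points
    pts4d = cv2.triangulatePoints(P1, P2,
                                  pts1.T.astype(np.float64),
                                  pts2.T.astype(np.float64))
    return (pts4d[:3] / pts4d[3]).T
```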

And in scenario 2:

  • I move my object
  • I put my camera back at the perspectives of scenario 1
  • I capture an image, extract features, and match them
  • I use PnP to obtain the transformation of the camera in the object frame (I haven't simulated this yet; see the sketch after this list)
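For the PnP step I have something like this in mind (again just a sketch with placeholder names; pts3d are the scenario-1 points, pts2d their matches in the new image, and dist the distortion coefficients from calibration):

```python
import numpy as np
import cv2

def camera_from_object_pnp(pts3d, pts2d, K, dist):
    """pts3d: Nx3 scenario-1 points (in the scenario-1 camera frame);
    pts2d: Nx2 pixel matches in the new image; K: intrinsics; dist:
    distortion coefficients. Returns the 4x4 transform T_c_o that maps
    the object-anchored frame into the current camera frame."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64), K, dist)
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)      # OpenCV's rvec/tvec go object -> camera
    T_c_o = np.eye(4)
    T_c_o[:3, :3] = R
    T_c_o[:3, 3] = tvec.ravel()
    return T_c_o
```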

And here is my question:

Since my camera hasn't moved: can I use the inverse of the result of PnP and apply it to my camera pose to find a new camera pose that preserves the perspective of scenario 1?
My intuition tells me that I can, but I'm not sure. The results are somewhat good, considering that my Xfeat is not very good at doing its job.
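Concretely, what I picture is the composition below (a sketch; note that with OpenCV's convention, the PnP rvec/tvec already map the object-anchored frame into the camera frame, so that direction is what gets composed here):

```python
import numpy as np

# Placeholder inputs: the unchanged camera pose and the PnP result from above
T_w_c = np.eye(4)   # camera pose in the world frame (from the robot)
T_c_o = np.eye(4)   # PnP result: object-anchored frame -> current camera frame

# The scenario-1 points were expressed in the scenario-1 camera frame, so PnP
# returns the identity if nothing has moved; composing the old camera pose
# with the PnP result should "chase" the moved object:
T_w_c_new = T_w_c @ T_c_o
```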

And an even further question, assuming my first intuition is true:

What if, in scenario 2, I need to move my camera with respect to scenario 1, meaning both my camera and my object have moved simultaneously? Since I have the robot, I know the camera translation between scenarios.
Can I still calculate my new camera pose to preserve the perspective with respect to the object? If so, how?
I know that there is some ambiguity here, since the PnP algorithm can't tell whether the change of perspective has been generated by the camera movement or by the object movement. But maybe this ambiguity can be resolved, since I know the camera movement.
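If my camera pose from the robot already encodes that known motion, I picture the resolution roughly like this (same placeholder names as above):

```python
import numpy as np

# Placeholder inputs: the current camera pose already encodes my known motion
T_w_c_now = np.eye(4)   # current camera pose, from the robot
T_c_o = np.eye(4)       # PnP result at that pose (object frame -> camera)

# Knowing the camera's world pose pins down the object's world pose, which
# I think removes the camera-vs-object ambiguity:
T_w_o = T_w_c_now @ T_c_o

# To reproduce the scenario-1 perspective, the camera would have to coincide
# with the object-anchored frame (i.e. the scenario-1 camera frame):
T_w_c_target = T_w_o
```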
I know that I could just repeat what I did in the first scenario to obtain another set of 3D points and try to match the two sets using a point-cloud registration algorithm like ICP. But I would like to avoid that if possible and do what I want using only PnP.

This is my first topic, so I don't know if I went too far.

Anyway. Thank you all in advance.

I can’t address your questions off the cuff. Let me just drop the term “visual servoing”.