if you have a depth camera, you ALREADY have the 3d coordinates of (almost) every pixel. consult your camera's documentation for what the values in the depth image mean and how to calculate cartesian coordinates from them.
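a minimal sketch of the depth-to-3d conversion, assuming a pinhole model with known intrinsics and depth in meters (the intrinsics and depth values below are made up; check your camera's docs, many report millimeters):

```python
import numpy as np

fx, fy = 600.0, 600.0   # focal lengths in pixels (hypothetical values)
cx, cy = 320.0, 240.0   # principal point (hypothetical values)

# stand-in depth map; replace with the depth image from your camera
depth = np.random.uniform(0.5, 3.0, (480, 640)).astype(np.float32)

v, u = np.indices(depth.shape)      # per-pixel row/column coordinates
z = depth
x = (u - cx) * z / fx               # back-project with the pinhole model
y = (v - cy) * z / fy
points = np.dstack((x, y, z))       # (H, W, 3) camera-frame coordinates
```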
if and only if you have a disparity map (that's not a depth map!), look at reprojectImageTo3D, which takes a disparity map and produces coordinates for every pixel.
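a rough sketch of that path, assuming you already calibrated/rectified the stereo pair and have the Q matrix from stereoRectify (the image data and Q below are just placeholders):

```python
import numpy as np
import cv2 as cv

# stand-in rectified stereo pair; use your real rectified images here
left = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
right = np.random.randint(0, 256, (480, 640), dtype=np.uint8)

matcher = cv.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # SGBM returns fixed-point values

# Q normally comes from cv.stereoRectify; identity is only a placeholder
Q = np.eye(4, dtype=np.float32)
points_3d = cv.reprojectImageTo3D(disparity, Q)   # (H, W, 3) coordinates per pixel
valid = disparity > 0                             # mask pixels without a disparity
```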
if you wanted to do SLAM, you’d fuse point clouds from multiple frames over time. one algorithm for that is called “iterative closest point”. it aligns two clouds.
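one way to try ICP without writing it yourself is Open3D's implementation; this sketch aligns a cloud against a shifted copy of itself so there is a known transform to recover (the point data and threshold are made up):

```python
import numpy as np
import open3d as o3d

src_pts = np.random.rand(1000, 3)                # stand-in for the cloud from frame t
dst_pts = src_pts + np.array([0.05, 0.0, 0.0])   # same cloud shifted, so ICP has something to find

source = o3d.geometry.PointCloud()
source.points = o3d.utility.Vector3dVector(src_pts)
target = o3d.geometry.PointCloud()
target.points = o3d.utility.Vector3dVector(dst_pts)

result = o3d.pipelines.registration.registration_icp(
    source, target,
    max_correspondence_distance=0.1,             # search radius for matching points
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
print(result.transformation)                      # 4x4 rigid transform aligning source to target
```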
SLAM is a lot more complex than I care to explain here; you'll need to read up on it. I can tell you that solvePnPRansac is the wrong tool for this: it doesn't apply here at all and will not solve your problem.