The OpenCV tutorials say:

“The distortion-free projective transformation given by a pinhole camera model is shown below. s p = A [R|t] P_w, where … and s is the projective transformation’s arbitrary scaling and not part of the camera model.”

I am confused about which step s is used in. That is, at which stage does the projective transformation happen?

it is *discarded*, not used.

if you have a 2D point (x,y), then in the corresponding projective space (3D), this is represented by (x,y,1), but also by (x s, y s, s) for any nonzero s.

you encounter this after the matrix multiplication: the result comes out scaled by some s. you normalize (“dehomogenize”) the result by dividing by the last coordinate, which is s.

(x,y,1) = (x s, y s, s) / s
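in case a concrete run helps, here is a minimal NumPy sketch of that division. the intrinsics `A`, the pose `R`, `t` and the point `P_w` are made-up values for illustration only, not taken from the tutorial.

```python
import numpy as np

# Hypothetical intrinsics (focal length 800 px, principal point (320, 240)).
A = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Hypothetical pose: camera at the world origin, no rotation.
R = np.eye(3)
t = np.zeros((3, 1))
Rt = np.hstack([R, t])  # the 3x4 matrix [R|t]

# Homogeneous world point (X, Y, Z, 1), also made up.
P_w = np.array([0.5, -0.25, 2.0, 1.0])

# The matrix product yields the scaled pixel (x*s, y*s, s).
sp = A @ Rt @ P_w

# s is simply the last coordinate of the result.
s = sp[2]

# Dehomogenize: divide by s to get (x, y, 1).
p = sp / s

# Scaling the homogeneous input by any nonzero factor changes s
# but not the normalized pixel.
sp_scaled = A @ Rt @ (3.0 * P_w)
p_scaled = sp_scaled / sp_scaled[2]

print(sp, s, p, p_scaled)
```

with these numbers s comes out equal to the point's depth along the camera's z axis, because the last row of `A` is (0, 0, 1). either way it is divided out and never stored.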

I get it. Thank you very much.