Get 3D world coordinates given 2D pixel coordinates and a reference depth

Hi there,
I am new to computer vision and am working on a project where I need to estimate the depth of a bus user from a camera. I have been able to detect the users' faces and obtain their pixel coordinates. I now want to get the 3D position, and in particular the depth, of such an image point in the world frame.

I have measurements of the bus environment, and they are constant across all images. As shown in the attached image, if I have a reference length L and the walls are normal to the floor, can I estimate the depth at a given pixel coordinate? If so, what method would you suggest?
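To make the question concrete, here is one possible way to frame it, as a rough sketch rather than a verified solution. It assumes an undistorted pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and a known pose (`R`, `C`) relative to the bus, and it assumes the point of interest lies on a known plane such as the floor or a wall. None of these quantities are given above; they would have to come from a calibration step, for which the reference length L could provide the scale. The function name and all parameter names are hypothetical.

```python
def pixel_to_world_on_plane(u, v, fx, fy, cx, cy, R, C, n, d):
    """Intersect the viewing ray of pixel (u, v) with the world plane n . X = d.

    Assumptions (not from the original post):
      - pinhole camera without lens distortion, focal lengths fx, fy and
        principal point (cx, cy) in pixels;
      - R is the 3x3 world-to-camera rotation as nested lists;
      - C is the camera centre in world coordinates;
      - n is the plane normal and d the plane offset, both in the world frame.

    Returns the 3D world point as a list, or None if the ray is parallel to
    the plane or the intersection lies behind the camera.
    """
    # Ray direction in the camera frame, normalised so that its z component is 1.
    d_cam = [(u - cx) / fx, (v - cy) / fy, 1.0]

    # Rotate the ray into the world frame: d_world = R^T * d_cam.
    d_world = [sum(R[k][i] * d_cam[k] for k in range(3)) for i in range(3)]

    # Solve n . (C + s * d_world) = d for the ray parameter s.
    denom = sum(n[i] * d_world[i] for i in range(3))
    if abs(denom) < 1e-12:
        return None  # ray is parallel to the plane
    s = (d - sum(n[i] * C[i] for i in range(3))) / denom
    if s <= 0:
        return None  # intersection is behind the camera

    # Because d_cam has z = 1, s is also the depth along the optical axis.
    return [C[i] + s * d_world[i] for i in range(3)]
```

For example, with the camera at the world origin, `R` equal to the identity, and a wall at `z = 4` (`n = [0, 0, 1]`, `d = 4.0`), the pixel `(720, 240)` with `fx = fy = 800`, `cx = 320`, `cy = 240` maps to the point `[2.0, 0.0, 4.0]`. A single pixel alone only fixes a ray, so some constraint of this kind (a known plane, a known object size, or a second view) seems necessary to recover depth.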

Thank you for your help!