I am working on a master’s thesis project: an augmented-reality training system for eight-ball pool. The goal is to detect the real pool-table state from an overhead phone camera, map balls and pockets into table-plane coordinates, and send that state to a Meta Quest 3 Unity application for 3D visualization.
I am currently stuck on live pool-pocket detection and stable table registration.
I would like to reliably compute the six pocket positions in table-plane coordinates; a maximum error of about 4 mm would be acceptable. I do not need perfect billiards physics on the computer vision side of the project; I first need a stable 2D-to-table-plane mapping.
The Quest app uses a JSON configuration to model the table/environment. The PC-side Python/OpenCV pipeline is the source of truth. Quest only receives and visualizes the computed state.
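For context, the state I ultimately send is just ball and pocket positions in table-plane millimetres. The sketch below shows the rough shape of that payload; the field names and example values are illustrative placeholders, not my actual schema.

```python
# Rough shape of the state payload the PC pipeline sends to the Quest app.
# Field names and example values are illustrative placeholders, not my real schema.
import json

state = {
    "table_mm": {"length": 2450, "width": 1225},     # playfield size in mm
    "pockets_mm": [                                   # six pockets in table-plane mm
        [0, 0], [1225, 0], [2450, 0],
        [0, 1225], [1225, 1225], [2450, 1225],
    ],
    "balls_mm": [
        {"id": "cue", "x": 612.5, "y": 306.25},       # example positions only
        {"id": "8", "x": 1837.5, "y": 918.75},
    ],
}
payload = json.dumps(state)  # what goes over the wire to the Quest receiver
```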
Setup:
Camera: iPhone 16 Pro Max main camera
Sensor: reported Sony IMX903
iOS: 18.7.7
Capture software: DroidCam
Capture mode: configured for no/low compression, as far as I can tell
Camera mounting: as close as possible to directly above the table center, but it is an improvised solo setup, so it is not perfectly centered or perfectly perpendicular
Camera height: 2735 mm from the floor
Table playfield height: 810 mm
Table playfield size: 2450 x 1225 mm
Ball diameter: 57.15 mm
Pipeline: Python 3.12, OpenCV, NumPy, PyTorch, YOLOv5
GPU: mobile RTX 4070, 8 GB VRAM, CUDA acceleration
Runtime speed: usually around 15–20 FPS
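For a rough sense of whether the 4 mm target is attainable with these numbers, this is the back-of-envelope I keep in mind. The 1920 px stream width is an assumption about what DroidCam actually delivers, and the ~73° horizontal field of view assumes the nominal 24 mm-equivalent main lens.

```python
import math

# Rough ground-resolution estimate; stream width and lens FOV are assumptions.
camera_to_cloth_mm = 2735 - 810           # camera ~1925 mm above the playfield
table_length_mm = 2450.0
stream_width_px = 1920                    # assumed DroidCam stream width

# Horizontal FOV needed to cover the full table length from above the centre
needed_hfov = 2 * math.degrees(math.atan((table_length_mm / 2) / camera_to_cloth_mm))
print(f"needed horizontal FOV ~ {needed_hfov:.0f} deg")   # ~65 deg; a 24 mm-equiv lens (~73 deg) just covers it

# Ground resolution if the table length fills most of the frame width
mm_per_px = table_length_mm / stream_width_px
print(f"~{mm_per_px:.2f} mm per pixel")   # ~1.3 mm/px, so 4 mm is roughly a 3-pixel error budget
```

So 4 mm seems feasible in principle, but only if the corner/pocket estimates are stable to about 2–3 px.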
I am using pix2pockets as the base detector/reference project.
The accuracy reported there is around 4 mm, excluding erroneous measurements.
Current status of my pipeline:
Ball detection and classification mostly work. There are some duplicate detections and some misclassifications, but I plan to handle those in Unity/Quest with a user correction or conflict-resolution interface. My current blocker is not YOLO ball detection.
The blockers are:
- Detecting the table reliably.
- Computing a valid homography.
- Detecting or placing the six pockets correctly.
- Mapping all detections into stable table-plane coordinates.
On static images I get the result I want, and recorded videos also work. The real problem is live phone capture: slight disturbances such as shadows, small lighting changes, people walking near the table, or small frame-to-frame changes can break the homography or pocket calculation.
Sometimes a valid homography is found for a moment, but pocket detection does not lock/stabilize reliably.
Important detail:
My --debug-phone mode uses the same main live-detection pipeline as the normal phone mode, except that it runs offline without requiring the Quest receiver. The static-image debug mode is easier and more stable, but it does not expose the live capture instability.
The relevant table configuration is here:
The values I currently have are approximately:
The files associated with last_environment.json use a green table cloth.
Repository/code:
Current project state:
The relevant directory is:
PoolSimulatorComponents/CameraAnalysis
Testing files are here:
Here are the commands I use for development and debugging:
Static ball detection:
python detection.py --debug-detection --debug --debug-static --debug-offline
This runs static-image ball detection. The --debug-offline flag means no Quest 3 receiver is required. The image or folder can be passed with --debug-image, for example:
python detection.py --debug-detection --debug --debug-static --debug-offline --debug-image "/path/to/image-or-folder"
Static pocket visualization:
python detection.py --debug-pocket-display --debug --debug-static --debug-offline
This uses the static-image input path and visualizes detected/calculated pocket locations.
Static cue-stick visualization:
python detection.py --debug-cue --debug --debug-static --debug-offline
This visualizes the cue-stick detection, but this is not my current priority because I still do not have stable table-plane coordinates.
Recorded video input:
python detection.py --debug --debug-recorded --debug-video "/path/to/video-or-folder" --debug-offline
This replaces static-image input with recorded-video input.
Live phone input:
python detection.py --debug-pocket-display --debug --debug-phone --debug-offline
This connects to DroidCam and uses the live phone stream. This is where the instability appears.
Things I am considering:
- Use normalized relative table coordinates instead of absolute image coordinates.
- Add a QR marker as a stable origin, probably near one corner or outside the playfield.
- Use a QR marker as (0, 0) and use its rotation to straighten the table coordinate system (marker sketch after this list).
- Use known camera height and measured table dimensions to constrain the homography.
- Stop trying to detect pocket holes directly and instead detect the table boundary/rails, then place pockets from known geometry (sketched after this list).
- Detect pocket candidates visually, but accept them only if their distances match expected table geometry.
- Use temporal median filtering instead of frame-to-frame pocket locking (sketched after this list).
- Save every attempted pocket detection frame with overlays showing detected pockets, expected pockets, homography determinant, and pixel/mm error.
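For the marker idea, I am leaning towards an ArUco tag rather than a QR code, because OpenCV's ArUco module returns the four tag corners directly, which gives both a stable origin and an in-plane rotation. A minimal sketch, assuming OpenCV >= 4.7 (the newer ArucoDetector API) and a printed 4x4 tag taped flat next to the playfield:

```python
import cv2

# Assumes OpenCV >= 4.7 and a printed 4x4 ArUco tag fixed flat near the playfield.
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

def marker_origin(frame_bgr, marker_id=0):
    """Return the marker's four corner points (4x2, px) or None if not visible.
    The corners provide a stable (0, 0) anchor and the table's in-plane rotation."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    corners, ids, _rejected = detector.detectMarkers(gray)
    if ids is None:
        return None
    for quad, i in zip(corners, ids.flatten()):
        if i == marker_id:
            return quad.reshape(4, 2)
    return None
```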
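For the rail/geometry option, the placement I am considering looks like the sketch below: find the four playfield corners in the image, fit a homography to the known 2450 x 1225 mm rectangle, and place the six pockets from the model instead of detecting the holes. Corner detection and consistent corner ordering are the open parts; pocket centres are put at the corners and long-rail midpoints here, ignoring the small offset of the real pocket mouths.

```python
import cv2
import numpy as np

# Table-plane model in mm: origin at one playfield corner,
# x along the 2450 mm rail, y along the 1225 mm rail.
TABLE_L, TABLE_W = 2450.0, 1225.0
MODEL_CORNERS = np.float32([[0, 0], [TABLE_L, 0], [TABLE_L, TABLE_W], [0, TABLE_W]])
MODEL_POCKETS = np.float32([
    [0, 0], [TABLE_L / 2, 0], [TABLE_L, 0],
    [0, TABLE_W], [TABLE_L / 2, TABLE_W], [TABLE_L, TABLE_W],
])

def pockets_from_corners(image_corners_px):
    """image_corners_px: 4x2 playfield corners in the image, ordered like MODEL_CORNERS.
    Returns (H, pocket_px): H maps image px -> table mm, and pocket_px are the six
    model pockets projected back into the image for overlays/debugging."""
    H, _ = cv2.findHomography(np.float32(image_corners_px), MODEL_CORNERS)
    if H is None:
        return None, None
    H_inv = np.linalg.inv(H)
    pocket_px = cv2.perspectiveTransform(MODEL_POCKETS.reshape(-1, 1, 2), H_inv).reshape(-1, 2)
    return H, pocket_px

def image_to_table(points_px, H):
    """Map detected ball centres (N x 2, px) into table-plane mm with the same H."""
    pts = np.float32(points_px).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)
```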
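And on the temporal side, instead of locking pockets from a single good frame, something like a rolling median over recent per-frame estimates; the window size and minimum frame count below are guesses I would tune.

```python
from collections import deque
import numpy as np

class PocketStabilizer:
    """Temporal median over the last N per-frame pocket estimates,
    rather than locking onto whichever single frame looked valid."""
    def __init__(self, window=30):
        self.history = deque(maxlen=window)        # each entry: (6, 2) array in table mm

    def update(self, pockets_mm):
        if pockets_mm is not None:
            self.history.append(np.asarray(pockets_mm, dtype=np.float32))

    def stable_pockets(self, min_frames=10):
        if len(self.history) < min_frames:
            return None                             # not enough evidence yet
        return np.median(np.stack(self.history), axis=0)   # per-pocket, per-axis median
```

A single disturbed frame then barely moves the estimate, and frames whose pocket distances do not match the expected table geometry can simply be skipped before update().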
Main questions:
What would be the correct robust computer-vision approach here?
Should I:
- detect the playfield rectangle and derive the pocket positions from known geometry,
- detect actual pocket openings visually,
- use fiducial markers,
- use QR calibration around the table,
- or combine expected table geometry with local visual refinement?
How would you make this robust enough for live phone capture where the camera is fixed but not perfectly centered or perpendicular, and where lighting/shadows are not fully controlled?
Any advice about missing dependencies or assumptions would also help, especially around:
- HSV masking,
- homography validation (sketch after this list),
- temporal filtering,
- table-edge detection,
- pocket geometry validation,
- marker placement,
- whether 4 mm accuracy is realistic in this setup.
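On homography validation specifically, the per-frame sanity check I have in mind looks like the sketch below: compare the current corner detections against the (possibly smoothed) homography and reject anything degenerate or badly stretched. The thresholds are placeholders I would tune, not validated values.

```python
import cv2
import numpy as np

def homography_is_sane(H, image_corners_px, model_corners_mm,
                       max_reproj_mm=4.0, max_scale_ratio=4.0):
    """Heuristic acceptance test for a px -> mm homography; thresholds are placeholders."""
    if H is None or not np.isfinite(H).all():
        return False
    if abs(np.linalg.det(H)) < 1e-9:               # near-singular -> degenerate mapping
        return False
    # Reprojection check: current corner detections should land near the model corners (mm)
    src = np.float32(image_corners_px).reshape(-1, 1, 2)
    mapped = cv2.perspectiveTransform(src, H).reshape(-1, 2)
    err = np.linalg.norm(mapped - np.float32(model_corners_mm), axis=1)
    if err.max() > max_reproj_mm:
        return False
    # Anisotropy heuristic: the linear part should not stretch one axis wildly more than the other
    sv = np.linalg.svd(H[:2, :2], compute_uv=False)
    return (sv.max() / max(sv.min(), 1e-12)) < max_scale_ratio
```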
Below you’ll find the detections I already get on static images. (Please ignore the "not responding" window state in the screenshot.)
Thanks.
