Try detecting the basketball: you will get a bounding box, from which you can compute the box center. Then compute the distance between the center of your image and the center of the bounding box. If it exceeds a defined threshold, move the camera toward the ball's position, but:
This may not be efficient since a basketball bounces; with football it's easier because the ball mostly just rolls along two axes.
The ball may not be detected because of its small size.
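As a rough sketch of the ball idea, assuming your detector returns a box as `(x_min, y_min, x_max, y_max)` in pixels: the function names and the 80 px threshold below are made up for illustration and are not taken from any particular library.

```python
import math

def box_center(box):
    """Center (x, y) of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)

def offset_from_image_center(box, image_size):
    """Signed (dx, dy) from the image center to the box center, in pixels."""
    width, height = image_size
    cx, cy = box_center(box)
    return (cx - width / 2.0, cy - height / 2.0)

def should_move_camera(box, image_size, max_distance_px=80.0):
    """True when the box center is farther than the threshold from the image center."""
    dx, dy = offset_from_image_center(box, image_size)
    return math.hypot(dx, dy) > max_distance_px

# Hypothetical detection in a 1280x720 frame.
ball_box = (1000, 300, 1040, 340)
print(offset_from_image_center(ball_box, (1280, 720)))  # (380.0, -40.0)
print(should_move_camera(ball_box, (1280, 720)))        # True
```

The sign of `dx` tells you which way to pan; how a pixel offset maps to a camera movement depends on your hardware, so that part is left out.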
Try detecting all the players, then check whether the share of the 10 players found in one area is above a defined threshold. For example, if 7 or 8 players are in one area, the game is probably taking place there. Then, from all the players' boxes, you should be able to compute a global center of all their positions and move the camera to that point, but:
Again, the players may not all be detected since they sometimes overlap.
If you choose this approach, I advise you to train the detection model only on basketball players (their outfits); otherwise the audience may be detected too.
The camera will need a wide view, or at least must not be zoomed in on the players.
But your camera only pans, so it should be easier: the position calculation only needs to be done on the horizontal axis.
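Here is a minimal sketch of the player idea on the horizontal axis only. It is one possible reading of the approach: it assumes boxes come as `(x_min, y_min, x_max, y_max)` in pixels, and it treats "enough players" as at least `min_players` detections in the frame (7 of 10 here). The names and defaults are hypothetical.

```python
def players_center_x(player_boxes, min_players=7):
    """Mean horizontal center of the player boxes, or None if too few players were detected."""
    if len(player_boxes) < min_players:
        return None
    centers = [(x_min + x_max) / 2.0 for x_min, _, x_max, _ in player_boxes]
    return sum(centers) / len(centers)

def pan_offset(player_boxes, image_width, min_players=7):
    """Signed horizontal offset (pixels) from the image center to the players' center.

    Returns 0.0 when too few players are detected, so the camera stays where it is.
    """
    target_x = players_center_x(player_boxes, min_players)
    if target_x is None:
        return 0.0
    return target_x - image_width / 2.0

# Hypothetical detections: seven players spread over the left half of a 1280 px wide frame.
boxes = [(100 * i, 200, 100 * i + 40, 300) for i in range(1, 8)]
print(pan_offset(boxes, 1280))  # -220.0, i.e. pan left
```

A plain mean is sensitive to one or two players far from the action; a median of the centers would be a simple alternative if that turns out to be a problem.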
You can also combine the two ideas, but that may require a lot of computing power.
I also know that some datasets provide action models, like “hitting a ball” or “playing something”, so that may be an alternative to player detection.
I am puzzled as to why you posed this as an either-or question.
It’s not a choice between two options.
The first is a fact.
The second is something you can do, if you need it.
The Cartesian and polar representations of a vector represent the same vector. You decide what representation you need, and then you convert or you don’t.
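To make that concrete, here is a small sketch using only Python's standard `math` module (the function names are mine, and angles are in radians). Converting one way and back gives you the same vector, up to floating-point rounding.

```python
import math

def to_polar(x, y):
    """Cartesian (x, y) -> polar (r, theta), theta in radians in (-pi, pi]."""
    return math.hypot(x, y), math.atan2(y, x)

def to_cartesian(r, theta):
    """Polar (r, theta) -> Cartesian (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

r, theta = to_polar(3.0, 4.0)
x, y = to_cartesian(r, theta)
print(r)                                               # 5.0
print(math.isclose(x, 3.0), math.isclose(y, 4.0))      # True True
```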