Optimizing pose coordinate extraction time from video using OpenCV and MediaPipe

I am using MediaPipe and OpenCV in a Python backend server to extract human pose coordinates from a video sent in an API payload. The extraction takes considerable time: around 2 to 4 seconds for 1 second of 1080p video at 30 or 50 fps, 30 to 45 seconds for a 3 to 5 second video, and around 2 minutes for a 15-second video.
I have tried reducing the frame rate and splitting the video into multiple parts, but the overall time required for coordinate extraction did not improve. Any suggestions on how to reduce the coordinate extraction time? Also, is there a faster alternative to MediaPipe within OpenCV for coordinate extraction?
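
For reference, here is a minimal sketch of the kind of loop described above, not the actual server code. It assumes the legacy `mp.solutions.pose` API. The helper names (`frame_step`, `keep_frame`, `scaled_size`, `extract_pose`) and the defaults `target_fps=15`, `max_side=640`, and `model_complexity=0` are illustrative choices only. It times decoding and pose inference separately, which may help show where the time actually goes.

```python
import time


def frame_step(src_fps, target_fps):
    """Source frames per kept frame; never below 1 (no upsampling)."""
    return max(1.0, src_fps / target_fps)


def keep_frame(index, step):
    """True when frame `index` starts a new sampling slot of width `step`."""
    return index == 0 or int(index // step) != int((index - 1) // step)


def scaled_size(width, height, max_side):
    """Downscaled (width, height) keeping aspect ratio; never upscales."""
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


def extract_pose(path, target_fps=15, max_side=640):
    # Imported lazily so the helpers above work without these packages.
    import cv2
    import mediapipe as mp

    cap = cv2.VideoCapture(path)
    # Fall back to 30 fps if the container does not report a frame rate.
    step = frame_step(cap.get(cv2.CAP_PROP_FPS) or 30.0, target_fps)
    timings = {"decode": 0.0, "pose": 0.0}
    landmarks = []

    # One Pose instance is reused for the whole video (assumed settings).
    with mp.solutions.pose.Pose(static_image_mode=False,
                                model_complexity=0) as pose:
        index = 0
        while True:
            t0 = time.perf_counter()
            # grab() advances to the next frame; retrieve() is only
            # called for the frames that are kept.
            if not cap.grab():
                break
            frame = None
            if keep_frame(index, step):
                ok, frame = cap.retrieve()
                if not ok:
                    frame = None
            t1 = time.perf_counter()
            timings["decode"] += t1 - t0

            if frame is not None:
                h, w = frame.shape[:2]
                small = cv2.resize(frame, scaled_size(w, h, max_side))
                rgb = cv2.cvtColor(small, cv2.COLOR_BGR2RGB)
                result = pose.process(rgb)
                if result.pose_landmarks:
                    landmarks.append((index, [
                        (p.x, p.y, p.z, p.visibility)
                        for p in result.pose_landmarks.landmark
                    ]))
                timings["pose"] += time.perf_counter() - t1
            index += 1

    cap.release()
    return landmarks, timings
```

Calling `extract_pose("clip.mp4")` (a hypothetical file name) returns the per-frame landmarks and a `timings` dict. MediaPipe pose landmark `x` and `y` values are normalized to the image size, so downscaling the frame before inference does not change their scale.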
