Measure pan, rotate and resize between frames

I need to determine the above values between consecutive frames.
I know about identifying reference points,
and also about optical flow (which doesn’t account for rotation and zooming).
I think this should be a common task, but I couldn’t find anything about it.
(Maybe I used the wrong keywords?)
Any hint is welcome!
Simplifying assumption: the content itself is constant; the manipulations
affect it as a whole.
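If pan, rotation and resize are the only changes, the mapping between two frames is a similarity transform. OpenCV represents it as a 2×3 matrix [[s·cos a, −s·sin a, tx], [s·sin a, s·cos a, ty]], which is the form cv2.estimateAffinePartial2D returns. A minimal sketch, assuming you already have such a matrix from some estimator, of reading the three values back out with NumPy; the function name decompose_similarity is hypothetical:

```python
import math

import numpy as np


def decompose_similarity(m: np.ndarray) -> tuple[float, float, float, float]:
    """Split a 2x3 similarity matrix into (tx, ty, angle_deg, scale).

    Assumes m = [[s*cos(a), -s*sin(a), tx],
                 [s*sin(a),  s*cos(a), ty]].
    """
    a, c = m[0, 0], m[1, 0]              # s*cos(angle), s*sin(angle)
    scale = math.hypot(a, c)             # s
    angle = math.degrees(math.atan2(c, a))
    return float(m[0, 2]), float(m[1, 2]), angle, scale


# Test matrix: 10 degrees, 80 % scale, shifted by (12, -5)
theta = math.radians(10.0)
s = 0.8
m = np.array([[s * math.cos(theta), -s * math.sin(theta), 12.0],
              [s * math.sin(theta),  s * math.cos(theta), -5.0]])
print(decompose_similarity(m))  # approximately (12.0, -5.0, 10.0, 0.8)
```

Note that cv2.getRotationMatrix2D uses the opposite sign convention (positive angle means counter-clockwise in image coordinates), so an angle built with it comes back negated here.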

Follow this example of video stabilization based on optical flow:

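```python
import numpy as np
import cv2
# affineTransformTools is the poster's own helper module (not part of OpenCV)
from affineTransformTools import getTranslationX, getTranslationY, getRotation, getAffineTransform

video = cv2.VideoCapture('…/record2.avi')
fps = video.get(cv2.CAP_PROP_FPS)
hasFrame, frame = video.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

out = cv2.VideoWriter('…/record3.avi', cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), fps,
                      (frame.shape[1], frame.shape[0]))

# Accumulated translation (x, y) and rotation angle (f) since the first frame
x = 0.0
y = 0.0
f = 0.0

while True:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Corners in the previous frame, tracked into the current one (pyramidal Lucas-Kanade)
    prev_points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01,
                                          minDistance=30, blockSize=3)
    points, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, prev_points, None)

    # Keep tracked points only; robust fit of translation + rotation + uniform scale
    indices = np.where(status == 1)[0]
    warp_matrix, _ = cv2.estimateAffinePartial2D(prev_points[indices], points[indices],
                                                 method=cv2.LMEDS)

    dx = getTranslationX(warp_matrix)
    dy = getTranslationY(warp_matrix)
    df = getRotation(warp_matrix)

    x += dx
    y += dy
    f += df

    # Compensating transform; arguments presumably (scale_x, scale_y, angle, tx, ty)
    final_warp_matrix = getAffineTransform(1.0, 1.0, f, -x, -y)

    stabilized = cv2.warpAffine(frame, final_warp_matrix, (frame.shape[1], frame.shape[0]))
    out.write(stabilized)
    cv2.imshow('stabilized', stabilized)

    key = cv2.waitKey(1)
    if key == 27:  # Esc
        break

    hasFrame, frame = video.read()
    if not hasFrame:
        break

    prev_gray = gray

video.release()
out.release()
cv2.destroyAllWindows()
```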
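The helpers getTranslationX, getTranslationY, getRotation and getAffineTransform come from the poster's own module and are not shown. Adding up dx, dy and the angle is exact for pure translation but only an approximation once rotation is involved. A sketch of an alternative, assuming each warp_matrix maps points of the previous frame onto the current one (as estimateAffinePartial2D(prev_points, points) does): chain the matrices in homogeneous 3×3 form and invert the product. The function names are hypothetical:

```python
import numpy as np


def to_3x3(m: np.ndarray) -> np.ndarray:
    """Append the row [0, 0, 1] to a 2x3 affine matrix."""
    return np.vstack([m, [0.0, 0.0, 1.0]])


def accumulate(cumulative: np.ndarray, step: np.ndarray) -> np.ndarray:
    """Chain a new frame-to-frame 2x3 matrix onto the 3x3 running product.

    The result maps coordinates of the first frame to the newest frame.
    """
    return to_3x3(step) @ cumulative


def stabilizing_matrix(cumulative: np.ndarray) -> np.ndarray:
    """2x3 matrix for cv2.warpAffine that maps the newest frame back onto the first."""
    return np.linalg.inv(cumulative)[:2]


# Inside the loop this would replace the x/y/f bookkeeping (sketch):
#   cumulative = np.eye(3)                       # before the loop
#   cumulative = accumulate(cumulative, warp_matrix)
#   stabilized = cv2.warpAffine(frame, stabilizing_matrix(cumulative), (w, h))
cumulative = np.eye(3)
cumulative = accumulate(cumulative, np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 2.0]]))
cumulative = accumulate(cumulative, np.array([[1.0, 0.0, 3.0], [0.0, 1.0, 4.0]]))
print(stabilizing_matrix(cumulative))  # translation part is (-4, -6)
```

Because cv2.warpAffine inverts the matrix it is given (unless WARP_INVERSE_MAP is set), passing the inverse of the chain pulls each pixel of the first frame's grid from the matching position in the current frame.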



Question on generalization: different scaling for each dimension.

I can’t use estimateAffinePartial2D then (only one scale factor).
So would the approach be to compute dense optical flow and remap by the flow?
I wonder whether that will produce good quality, since the mapping may be arbitrary,
while I know that there are only uniform translation, rotation and resizing.

It could be circumvented by ensuring identical scaling in both dimensions.
But then I would suffer data loss.

What might be the better solution?
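A full affine matrix has six parameters, one more than translation + rotation + per-axis scale, so the extra degree of freedom shows up as shear. A minimal NumPy sketch of splitting a 2×3 affine matrix into those parts; the convention chosen here (linear part = R(angle) @ [[sx, shear], [0, sy]], i.e. scale/shear along the source axes, then rotate) is one of several possible, and the function name is hypothetical. A shear close to zero suggests the "translate, rotate, scale x and y separately" model is enough:

```python
import math

import numpy as np


def decompose_affine(m: np.ndarray) -> dict:
    """Split a 2x3 affine matrix into translation, rotation, axis scales and shear.

    Convention: m[:, :2] == R(angle) @ [[sx, shear], [0, sy]].
    """
    a, b = m[0, 0], m[0, 1]
    c, d = m[1, 0], m[1, 1]
    sx = math.hypot(a, c)                 # length of the first column
    angle = math.atan2(c, a)
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    shear = cos_t * b + sin_t * d         # first row of R^T @ [b, d]
    sy = -sin_t * b + cos_t * d           # second row of R^T @ [b, d]
    return {"tx": float(m[0, 2]), "ty": float(m[1, 2]),
            "angle_deg": math.degrees(angle), "sx": sx, "sy": sy,
            "shear": shear}


# Example: 10 degrees, sx = 0.8, sy = 0.9, no shear, shift (5, 7)
t = math.radians(10.0)
rot = np.array([[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]])
lin = rot @ np.array([[0.8, 0.0], [0.0, 0.9]])
m = np.hstack([lin, [[5.0], [7.0]]])
print(decompose_affine(m))
```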

This is the code for a video taken by a camera. In that case, the aspect ratio is constant, and it works fine for suppressing hand shake. If you have different scales in x and y (the video source must be strange), try cv2.estimateAffine2D instead of cv2.estimateAffinePartial2D. Show your data for more advice.

Yes indeed - IT IS a strange video camera the frames are from.
I got it cheaper, but the sensor changes its dimensions constantly.

:slight_smile: :slight_smile:
Just joking, I hope you don’t mind?

No, please excuse me, I used the wrong word: “frames” instead of “scans”.

These scans are of the same content, but on different paper and made with different scanners.
They differ in quality, crop area, resolution and scaling (sometimes even in aspect ratio).
I’m trying to match them to get the “best of both (several) worlds”.

I had already looked at cv2.estimateAffine2D, but I’m not sure
whether my differences are affine transformations at all…
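One way to check whether an affine model explains the differences is to fit it to matched points and look at the residual: if it stays well above the point-localisation noise (roughly a pixel), something non-affine (perspective, local distortion) is present. A NumPy-only sketch with plain least squares; the data and the mild perspective term are synthetic assumptions, and the function names are hypothetical:

```python
import numpy as np


def fit_affine_lstsq(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 2x3 affine matrix mapping src (N x 2) onto dst (N x 2)."""
    design = np.hstack([src, np.ones((len(src), 1))])      # N x 3
    sol, *_ = np.linalg.lstsq(design, dst, rcond=None)     # 3 x 2
    return sol.T                                           # 2 x 3


def rms_residual(m: np.ndarray, src: np.ndarray, dst: np.ndarray) -> float:
    """Root-mean-square distance in pixels between m(src) and dst."""
    pred = src @ m[:, :2].T + m[:, 2]
    return float(np.sqrt(np.mean(np.sum((pred - dst) ** 2, axis=1))))


rng = np.random.default_rng(1)
src = rng.uniform(0, 1000, size=(200, 2))
# Hypothetical second scan: purely affine, and the same plus a mild perspective effect
affine_dst = src @ np.array([[1.02, 0.01], [-0.01, 0.97]]).T + [3.0, -2.0]
w = 1.0 + 2e-5 * src[:, 0]                                 # perspective denominator
persp_dst = affine_dst / w[:, None]

for name, dst in [("affine", affine_dst), ("perspective", persp_dst)]:
    m = fit_affine_lstsq(src, dst)
    print(name, round(rms_residual(m, src, dst), 3), "px")
```

With real scans, this should be run on RANSAC inliers only, since a few mismatches alone can inflate the residual.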

I implemented all 3 variants:

  • sparse flow and affine transformation (general, not partial)
  • dense flow and affine transformation (general, not partial)
  • dense flow and direct mapping

I tested them with

  • translation (small, about 10 pixels, and moderate, about 30 pixels)
  • rotation (10 degrees)
  • scaling (content shrunk to 80%, with a border added).

The result:
“It depends.”

Which is about the worst result obtainable.
To my mind, the main point of an “algorithm” is that it produces
usable results largely independent of the input.

In detail:
For some of my images (especially the ones I care about), all variants
are practically unusable (the transformation is not identified).
For other images (e.g. “Lena”) they do quite well
(except, e.g., scaling with a generated border in the sparse variant,
where the scaling is not identified).

Remapping directly by the dense flow doesn’t produce usable results at the pixel level.
The mapping goes in the right direction, but the images come out badly “warped”.

So I’m wondering about the dense flow, which I compute with:
calcOpticalFlowFarneback (matGrayAdjusted, matGrayB, flow, 0.5, 3, 15, 3, 5, 1.2, 0);

Can this be significantly improved by tuning the parameters?
Or are there better methods for dense flow?

As I’m dealing with single pairs of images rather than video frames,
run time doesn’t matter so much.