How to determine average x + y motion of keypoints in video?

Good day :slight_smile:

I’m upgrading a testbench for 3D-printing with a high-speed camera, and I want to use the camera to determine motion of the filament (x-axis). I would also like to know the twist of the filament (y-axis). I want to associate these two values with a range of timestamps, thereby calculating velocity.

Since we’re using a macro lens, the camera sees only filament throughout its area.

I’ve tried using dense optical flow, but the frames are so noisy that the visualized colors skip all over the place.
I’m currently trying to use sparse optical flow, storing the vector coordinates in an array and taking their average over a set of frames. Problem is I’ve never touched python before, and I’ve got no idea how to achieve this …

Can you give me pointers?

The example code shows that sparse optical flow works alright for this scenario.

#!/usr/bin/env python

Lucas-Kanade tracker
Lucas-Kanade sparse optical flow demo. Uses goodFeaturesToTrack
for track initialization and back-tracking for match verification
between frames.
----- [<video_source>]
ESC - exit
# Python 2/3 compatibility
from __future__ import print_function

import numpy as np
import cv2 as cv

import video
from common import anorm2, draw_str

lk_params = dict( winSize  = (25, 25),
                  maxLevel = 3,
                  criteria = (cv.TERM_CRITERIA_EPS | cv.TERM_CRITERIA_COUNT, 10, 0.03))

feature_params = dict( maxCorners = 500,
                       qualityLevel = 0.3,
                       minDistance = 7,
                       blockSize = 7 )

class App:
    def __init__(self, video_src):
        self.track_len = 10
        self.detect_interval = 5
        self.tracks = [] = video.create_capture(video_src)
        self.frame_idx = 0
        self.movement_interval = 20

    def run(self):
        while True:
            _ret, frame =
            frame_gray = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)
            vis = frame.copy()

            if len(self.tracks) > 0:
                img0, img1 = self.prev_gray, frame_gray
                p0 = np.float32([tr[-1] for tr in self.tracks]).reshape(-1, 1, 2)
                p1, _st, _err = cv.calcOpticalFlowPyrLK(img0, img1, p0, None, **lk_params)
                p0r, _st, _err = cv.calcOpticalFlowPyrLK(img1, img0, p1, None, **lk_params)
                d = abs(p0-p0r).reshape(-1, 2).max(-1)
                good = d < 1
                new_tracks = []
                for tr, (x, y), good_flag in zip(self.tracks, p1.reshape(-1, 2), good):
                    if not good_flag:
                    tr.append((x, y))
                    if len(tr) > self.track_len:
                        del tr[0]
          , (int(x), int(y)), 2, (0, 255, 0), -1)
                self.tracks = new_tracks
                cv.polylines(vis, [np.int64(tr) for tr in self.tracks], False, (0, 255, 0))
                draw_str(vis, (20, 20), 'track count: %d' % len(self.tracks))
		# Compute the magnitude and angle of the 2D vector
                #magnitude, angle = cv2.cartToPolar(p1[…, 0], p0r[…, 1])

            if self.frame_idx % self.detect_interval == 0:
                mask = np.zeros_like(frame_gray)
                mask[:] = 255
                for x, y in [np.int32(tr[-1]) for tr in self.tracks]:
          , (x, y), 5, 0, -1)
                p = cv.goodFeaturesToTrack(frame_gray, mask = mask, **feature_params)
                if p is not None:
                    for x, y in np.float32(p).reshape(-1, 2):
                        self.tracks.append([(x, y)])

            #code block added by Jacob, calculates the mean directional movement between each frame, sums them up, and after 20 frames it continuously displays the results
#            prww = p1
#            standard_deviation = np.std(prww)
#            draw_str(vis, (200, 40), 'movement: %d' % (standard_deviation))
#            if self.frame_idx % self.movement_interval == 0:


            self.frame_idx += 1
            self.prev_gray = frame_gray
            cv.imshow('lk_track', vis)

            ch = cv.waitKey(1)
            if ch == 27:

def main():
    import sys
        video_src = sys.argv[1]
        video_src = 0


if __name__ == '__main__':

this looks like an ideal application for optical flow… enough lowpass/blurring might take care of the noise. optical flow is too sophisticated for this situation. try simple autocorrelation (convolution with itself), because I see just translation.

your application sounds like you might wanna use an optical mouse sensor.

edit: I just tried something using filter2D and minMaxLoc applied to adjacent frames. that kinda works. the issue is numerics. the procedure gives you integer values. integrating those up obviously also cumulates the rounding error. you could either upscale the frames, which gives you a few bits more of precision. or you could keep a “keyframe” to calculate against (instead of the previous one), and only update the keyframe if the shift has gotten large enough for the pictures to become ambiguous. I’d recommend less than half a tooth mark because at that point the shift becomes ambiguous (half tooth forward? half backward?). one downside to “keyframes” is that during fast-moving section, you’ll get frames with motion blur, and those are poor keyframes, so how you determine what’s a keyframe gets a little tricky.

filter2D is also fairly wasteful because it calculates the correlation for ALL POSSIBLE shifts. you’re only ever observing a small subset of those (very little in Y direction). it might pay to reach for numpy/scipy or even roll your own correlation loop (I’d recommend sum of squared differences) and accelerate with numba.

from the adjacent-frames approach with 4x upsampling, I get this: [486.75 3.25] the truth is closer to [490, -3] (manually picked) so there you see the issue.

a plot of the cumulative shift (in X):

edit: I played around with keyframes and assessing motion blur by what relative shift each got. resulting shift ([490.5 -2.5]) gets close to what I would pick in an image editor but eh… depending on the tweakables, it’s still varying by a pixel or so in X, and Y… is moving all over the place.

one side effect of working around motion blur is that it also works around rolling shutter, which is very evident during the quickly moving segments.

X position (positive = going left):

Y position (positive = going up):

have fun: ocv-forum-8983.ipynb · GitHub


Thank you so much @crackwitz, I appreciate your comprehensive answer! I’ll play around with the notebook and try to wrap my head around it :slight_smile:

edit: Got your script working by editing all imshow(frame[ ]) to cv.imshow(’’, frame[ ]), and adding imports for cv2 and numpy.

I’ve looked into the various functions, and can understand what’s going on somewhat.
For our experiments we need 25 seconds of motion analyzed, and my computer quickly devours 50GiB memory running the current script. The script exits before memory_profiler can identify the culprit. I’ll look around for a way to “unload” images/arrays when they’re no longer relevant.

@crackwitz I made a little thing, it uses your code pretty much unchanged atm. Would you be okay with me licencing it as BSD-3-CLAUSE or another compatible licence, so it can be included in the raspiraw repo?

Do you want to be credited?

fine with me, as long as nobody comes after me for using my own code however I see fit. my view of IP law is that everyone who caused that should be hanged.

that notebook was really just a proof of concept. since you seem to need that on a running video, you’ll have to adapt it accordingly. there’s no need to keep all the history. I used the available history to full effect. that “keyframe” stuff you could also just do such that it keeps the most recent and good frame as a potential reference to fall back on, while using the currently designated “key” frame until it’s too far away or whatever. one additional frame in memory won’t cost much.

1 Like

For the video I posted, your method works fine, but it gave pretty unstable results on some other videos. After trying and failing to reduce noise sufficiently, I changed to recording with raspivid and using findTransformECC(). It gives very stable results (when it actually converges), but it consistently estimates total motion as 1/2 to 1/3 of actual real-world motion.
A few questions I’m wondering about:

  • Is findTransformECC() unsuited to this task?

  • Does findTransformECC ignore edge pixels in image 1 that disappear completely in image 2, or is this discrepancy baked into the ECC-confidence?

  • How can I make findTransformECC converge (on an accurate solution) more often? Currently, if the transform fails to converge or gives <1 pixel motion, I skip current frame and try the next one instead.

Could you give it a quick look? My script lives here now, since it’s no longer related to raspiraw.

Cheers, Solid

ECC refinement requires that you’re already in the valley of the global minimum. if your terrain is “finer” than the “distance” yet to be traveled, it has nothing to go by.

you’ll have to run it with larger sigmas to capture coarse features. that helps find the global minimum but it also makes fine tweaks harder. practically, you might want to apply a few passes, going from large to small sigma.

or get a first approximation by other means. ECC refinement is really just made for tweaking.

no clue how it deals with border effects. check docs, and if they don’t say, you might be left with digging into the source code. I’d hope that someone already considered that.

1 Like