Corrupted video files (AWS Lambda, Docker, react js): moov atom not found

I have a Python library that I made to estimate pose data from a video file. It uses OpenCV to read the video from the specified path. When I try to open a video I get the following error. I have tested this with multiple videos.

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x39f2180] moov atom not found
[ERROR] ValueError: Error opening video stream or file
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 35, in handler
    video_processing_strategy.process_task()
  File "/var/task/src/strategy/video_processing.py", line 13, in process_task
    counter, output_file_path = pose_estimator.estimate_pose(
  File "/var/task/pose_estimation_lib/estimator.py", line 30, in estimate_pose
    raise ValueError("Error opening video stream or file")

I have the following code:

import pathlib
import typing

import cv2


class PoseEstimator:
    def __new__(cls, *args, **kwargs):
        if not hasattr(cls, "instance"):
            cls.instance = super(PoseEstimator, cls).__new__(cls)
        return cls.instance

    def estimate_pose(
        self,
        exercise: str,
        video_name: str,
        min_detection_confidence: typing.Optional[float] = 0.5,
        min_tracking_confidence: typing.Optional[float] = 0.5,
        fps: typing.Optional[int] = 30,
    ):
        counter, status = 0, True
        absolute_file_path = pathlib.Path(video_name)

        input_stream = cv2.VideoCapture(str(absolute_file_path))
        if not input_stream.isOpened():
            raise ValueError("Error opening video stream or file")
        frame_dimensions = (int(input_stream.get(3)), int(input_stream.get(4)))
        ...

I am running this code on AWS Lambda as a Docker image built from the Dockerfile below on an x86_64 arch:

FROM public.ecr.aws/lambda/python:3.11

ARG CODEARTIFACT_AUTH_TOKEN

WORKDIR ${LAMBDA_TASK_ROOT}

RUN yum update -y \
        && yum install -y libglvnd-glx mesa-libGL
# RUN yum install -y gcc openssl-devel wget tar

COPY requirements.txt /tmp/requirements.txt

RUN pip config set global.extra-index-url https://aws:$CODEARTIFACT_AUTH_TOKEN@wecoach-976750617193.d.codeartifact.ap-south-1.amazonaws.com/pypi/python-packages/simple/

RUN pip install --no-cache-dir --requirement /tmp/requirements.txt --target .

COPY . .

CMD [ "lambda_function.handler" ]

I am using opencv-contrib-python==4.8.1.78

That’s an error coming straight from FFmpeg.

The file is corrupted or not there. Verify with plain ffmpeg/ffprobe on the CLI.

Use named constants: cv2.CAP_PROP_FRAME_WIDTH and so on.

The video I am using is downloaded from S3 and is stored as an object with Content-Type: video/mp4.
Since I can’t access a CLI, I can’t take the ffmpeg route.
I observed the same issue when I used Content-Type: binary/octet-stream.

Use these to get some information:

os.path.isfile(...)

os.stat(...)

open(..., "rb").read(256)

I think this is an issue with AWS Lambda and/or Docker: the file is not an actual local file but some network resource that must be accessed in specific ways, different from how local files are accessed.

Bypass all that AWS/Docker nonsense. Download the blob to your local computer and inspect it.

I got the following info, which shows that my file is present and readable:

self.file_path.is_file()=True
self.file_path.stat()=os.stat_result(st_mode=33204, st_ino=11, st_dev=65072, st_nlink=1, st_uid=993, st_gid=990, st_size=5027566, st_atime=1714471763, st_mtime=1714471763, st_ctime=1714471763)
self.file_path.open('rb').read(256)=b'\x00\x00\x00\x18ftypmp42\x00\x00\x00\x00mp42isom\x00\x00\x00\x18beam\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x06\x00\x00\x00\x00\x00\x14\xef\xbf\xbdmoov\x00\x00\x00lmvhd\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xef\xbf\xbdD\x00\x08\xef\xbf\xbd\x00\x00\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x08\xef\xbf\xbdtrak\x00\x00\x00\\tkhd\x00\x00\x00\x07\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x08\xef\xbf\xbd\xef\xbf\xbd\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\xef\xbf\xbd\xef\xbf\xbd\x00\x00\x00\x00\x00\x00'

I had the same issue the other day when I tried to download a media file directly from GitHub using wget. The file I downloaded was not a media file, so it couldn’t be played.

According to that, the file is corrupted. The tree structure no longer matches up.

The beam box claims to be 24 bytes long, but after skipping those 24 bytes, the moov atom’s size field is broken: 00 00 14 EF BF BD is the issue. There are two extra bytes.

moov atoms are fairly small, so I am inclined to think the highest two bytes being 00 is right.
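To make the mismatch concrete, here is a sketch that walks the top-level boxes of that dump (`walk_boxes` and the `head` bytes are mine, reconstructed from the hexdump above). Because the moov size field carries two extra bytes, the third box the walker reads has a bogus size and a garbage type instead of `moov`:

```python
import struct

def walk_boxes(data: bytes):
    """Yield (offset, size, type) for each top-level MP4 box."""
    pos = 0
    while pos + 8 <= len(data):
        size, = struct.unpack(">I", data[pos:pos + 4])
        box_type = data[pos + 4:pos + 8]
        yield pos, size, box_type
        if size < 8:  # malformed or unsupported size field; stop
            break
        pos += size

# First 58 bytes of the dump: ftyp (24 bytes), beam (24 bytes),
# then the mangled moov size field.
head = (b"\x00\x00\x00\x18ftypmp42\x00\x00\x00\x00mp42isom"
        b"\x00\x00\x00\x18beam"
        b"\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x06\x00\x00\x00"
        b"\x00\x00\x14\xef\xbf\xbdmoov")
for off, size, typ in walk_boxes(head):
    print(off, size, typ)
```

The first two boxes parse cleanly, but the third comes out as size 5359 with type `b'\xbf\xbdmo'` — the replacement bytes have shifted `moov` out of its slot, which is exactly what makes the demuxer give up.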

I think the size field of the moov atom triggered some kind of mojibake. The fourth byte must have been something that violates UTF-8 encoding rules, so the decoder inserted the replacement character, U+FFFD, which UTF-8 encodes as EF BF BD. That would explain how two extra bytes showed up.
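That round trip is easy to reproduce. In this sketch, 0x89 stands in for the unknown original fourth byte; any lone byte that is invalid in UTF-8 behaves the same way:

```python
# Hypothetical original size field: the real fourth byte is unknown,
# but anything that violates UTF-8 rules (here 0x89) becomes U+FFFD.
raw = b"\x00\x00\x14\x89"
mangled = raw.decode("utf-8", errors="replace").encode("utf-8")
print(mangled.hex())            # 000014efbfbd
print(len(mangled) - len(raw))  # 2 extra bytes
```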

Did the binary data go through any kind of pipes (subprocess/shell pipes, especially on Windows) or “string” conversion at any point? That could introduce junk.

The file is definitely corrupted if that’s the structure it starts with.

The issue turned out to be neither with S3 nor with OpenCV; it was in the JS code that uploads the media file to S3.

Basically, the video was getting corrupted because my React client was using

new FormData()

to upload to S3.

Reference