gray (or green) frames happen when there hasn’t been a keyframe yet.
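if you make it decode the stream right away, it’s “starting” somewhere other than the beginning of the stream, so it’s getting predicted frames (P/B-frames) first, not a keyframe, and those only describe changes against a reference picture the decoder doesn’t have yet.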
the picture will snap to what it should be as soon as there’s a keyframe.
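
if you'd rather not show the gray frames at all, one option is to just drop everything until the first keyframe shows up. rough sketch of the idea below; `Frame` and `skip_until_keyframe` are made-up names for illustration, not any real decoder's api, and a real player would read the keyframe flag from whatever its demuxer gives it.

```python
from dataclasses import dataclass
from itertools import dropwhile
from typing import Iterable, Iterator


@dataclass
class Frame:
    # hypothetical stand-in for a demuxed video packet; not a real library type
    index: int
    is_keyframe: bool


def skip_until_keyframe(frames: Iterable[Frame]) -> Iterator[Frame]:
    """Drop leading predicted frames; pass through everything from the first keyframe on."""
    return dropwhile(lambda f: not f.is_keyframe, frames)


# simulate joining mid-stream: frames 0-2 are predicted, frame 3 is the first keyframe
stream = [Frame(i, is_keyframe=(i == 3)) for i in range(6)]
shown = [f.index for f in skip_until_keyframe(stream)]
print(shown)  # [3, 4, 5]
```

the tradeoff is you get a blank/frozen picture until that keyframe arrives instead of gray garbage, so how long you wait depends on the stream's keyframe interval.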