Fetching RGBA format from camera with VideoCapture

Can VideoCapture fetch a stream from a USB camera (UVC) with RGBA output?
When I use the GUID for RGBA (I also tried RGB32), cap >> img leaves img empty, even though the FOURCC and the size reported by cap are both correct.
Changing the GUID to YUY2 makes it work. However, cam.set(CAP_PROP_CONVERT_RGB, false); does not seem to change the data; I do not get the raw data. I am using OpenCV 4.6.0 under Windows 11, with Visual Studio 2022 (C++).
I checked the Microsoft UVC camera documentation, and according to that, RGB32 should be no problem; DirectShow should also be capable of handling the format.



why do you want this ? what is it for ?
video cameras do not produce a valid alpha, and it’s actually ‘in the way’ if you want to do computer-vision …

When I use the GUID for RGBA

what ? a GUID ? please show, what you’re trying !

I checked the Microsoft UVC camera description and according that RGB32 should be no problem and also DirectShow should be capable of handeling the format.

sure, use DirectShow directly, not opencv’s VideoCapture …

I have a camera that produces 4 bands (RGBI). From those 4 bands I want to calculate vegetation indices.
With the GUID I tell the UVC driver what kind of camera I have (this GUID is programmed into the camera (Cypress FX3)).
GUID for YUY2:
0x59, 0x55, 0x59, 0x32,
0x00, 0x00, 0x10, 0x00,
0x80, 0x00, 0x00, 0xAA,
0x00, 0x38, 0x9B, 0x71
GUID for RGBA:
0x52, 0x47, 0x42, 0x41,
0x00, 0x00, 0x10, 0x00,
0x80, 0x00, 0x00, 0xAA,
0x00, 0x38, 0x9B, 0x71
But why should I not use VideoCapture? Is it not possible to get raw data from it?



it is possible.

there’s a “convert RGB” flag (CAP_PROP_CONVERT_RGB) on the VideoCapture. set it to 0. the name is a misnomer: it modulates whether opencv converts whatever raw format it gets into its usual BGR, or not.

but then you’ll have to make sure you understand the format you’re getting. and this feature is somewhat new, so there may still be issues with it.

Thanks for the reply. I will replace the data from the camera with a counter value to see if I can find out how the data can be reconstructed.
One thing I am still a bit confused about: when I use a different GUID (the one for RGBA or RGB32), I can get the size and read the FOURCC, but when I try to fetch the stream with cap >> img, it results in an empty img. Should this work as well, or, in other words, am I doing something wrong?

you should set CAP_PROP_FOURCC (pixel format) to something sensible like VideoWriter::fourcc('R','G','B','A') (whatever your camera reports in its capabilities), then hope that your device gives you that.

cap >> frame is equivalent to cap.read(frame)

if you don’t have any other means of enumerating your device capabilities and choices of fourcc/pixel format (to check that your device reports them correctly…), ffmpeg can help you
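For the ffmpeg route, the dshow input device can enumerate capture devices and the formats they report (the device name in the second command is a placeholder; use whatever the first command prints):

```shell
# list DirectShow capture devices by name
ffmpeg -hide_banner -f dshow -list_devices true -i dummy

# list the pixel formats / resolutions / frame rates a specific device reports
ffmpeg -hide_banner -f dshow -list_options true -i video="My UVC Camera"
```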


and you may need to pin apiPreference to CAP_DSHOW or CAP_MSMF, whichever works for you.

again, since this CONVERT_RGB=0 is a new feature, I have no idea if that’ll give you the entire frame, or do something stupid with the length, or even report some kind of width and height that may or may not be sensible.

if you encounter problems, make sure to run on the latest release (4.9), browse open and closed bug reports that could be related, and dig around in the source of the videoio module.

I have just been running some tests, giving the RGBI data fixed values. I get a nice cyan image (not very important), but I wanted to check whether I saw any difference when I used cap.set(CAP_PROP_CONVERT_RGB, false);. Can I retrieve the raw data in any way?
After splitting img into 3 channels I looked at the value of each pixel. However, with or without the line cap.set(CAP_PROP_CONVERT_RGB, false); there is no difference in the pixel values.
Also, I could not relate the channel pixel values to the values I gave the RGBI data. So I assume there is still some kind of conversion (input values RGBI → 0x2A, 0x55, 0xB7, 0xCA), and in the 3 channels I see a repeating pattern:
channel[0] : 0xbe, 0xff etc
channel[1]: 0x80, 0xff, etc
channel[2]: 0x00, 0x4f, etc

I will follow your suggestion to see if I get further with ffmpeg.
Thanks for your input.

this might not be implemented for this backend. I remember something about backend differences for some of the newer flags.

you can browse cap_dshow.cpp and cap_msmf.cpp for mentions of “convert” to investigate.

I have been playing around a bit. When I use CAP_DSHOW, cap.set(CAP_PROP_CONVERT_RGB, false); does not have any impact: when I split img into 3 channels it always gives me the same values. When I use CAP_ANY or CAP_MSMF (video from the webcam inside my laptop, which sends 640 x 480 NV12), my img is now only 1 row and 460800 columns. This matches 640 x 480 x 1.5 bytes. When I split img now, 2 channels are empty and only 1 channel has data. I covered my webcam with a black cloth and then searched through the data for values > 0x60 (with a black cloth over the camera I expect U and V to be around 0x80), but I did not find anything above 0x13.
So this is not raw data either (I think), even though the number of bytes matches the raw data format.
I will follow your advice to see if I can find something.
BTW, to be sure, I updated my revision to 4.9.
Thanks for your help.

I am a couple of steps further. One thing I found is that several GUIDs are going around for these video formats. Thanks to the remark of Crackwitz I looked through cap_dshow.cpp and cap_msmf.cpp. When I changed the GUID to the one mentioned in there for RGB32, I was able to read 3 (of the 4) bytes of raw data:
VideoCapture cam(1, CAP_DSHOW);
if (!cam.isOpened())    // if the camera did not open, return
	return 1;
cam >> img;
if (img.empty())        // if the first frame is empty, return
	return 1;
ushort rows = img.rows;
ushort cols = img.cols;
Mat channel[4];
uchar pixval1[20], pixval2[20], pixval3[20];
while (1)
{
	cam >> img;
	//imshow("cameraOrg", img);
	split(img, channel);
	for (ushort n = 0; n < rows; n++)
	{
		for (ushort m = 0; m < 20; m++)
		{
			pixval1[m] = channel[0].at<uchar>(Point(m, n));
			pixval2[m] = channel[1].at<uchar>(Point(m, n));
			pixval3[m] = channel[2].at<uchar>(Point(m, n));
		}
	}
	if (waitKey(1) == 27) break;	// exit when ESC is pressed
}

The values in the 3 arrays were nicely my RGB planes. So I missed the 4th one; channel[3] is empty.
Next I changed the GUID to RGB24 and did the same. I only had to make the vertical resolution different (since I have 33% more bytes due to the 4th plane). Now I see all the bytes back in the 3 channels. I only have to get them out of the 3 channels and put them into their own Mat matrix.
So I will see if I can find a way to get the 4th plane as well with RGB32; that would be the most ideal solution.
In case I find a solution I will update my post here in the hope it is useful for others.

still curious, what did you change & where ?

Inside the camera (USB3 UVC) you have a GUID that tells what type of video format is generated by the camera.
I tried several GUIDs that I found on the web (from several sites, incl. Microsoft). In cap_dshow.cpp I found for RGB32: E436EB7E-524F-11CE-9F53-0020AF0BA770, and for RGB24: E436EB7D-524F-11CE-9F53-0020AF0BA770. In VideoCapture I had to add CAP_DSHOW (VideoCapture(camID, CAP_DSHOW)).


Inside the camera (USB3 UVC) you have a GUID

can you give us a link to the model / sw?
i doubt that there’s sw running inside your cam using guids;
(guessing:) those are likely ‘preferred media subtypes’ to be used in a dshow filtergraph later

you can also find the guids here:

unfortunately, opencv’s VideoCapture implementation builds its own filtergraph,
not respecting your settings. have a look:

looks like it only supports gray8 or rgb24 here, whatever you try elsewhere.

if you’re able to hack it / rebuild the libs, you could try to add MEDIASUBTYPE_RGB32 here
and update buffer sizes here

OP has accidentally confessed to a piece of information.

that is a chip that interfaces USB3 to embedded electronics. it’s commonly combined with FPGAs.

it’s likely a custom camera and they can control what they’re reading from the sensor and what they’re sending over USB3.


That is correct. We made a camera for agriculture applications. We indeed use an IC from Cypress (the FX3) to get the video from the camera to the PC/processor.
I tried to use RGB32 with VideoCapture and the CAP_DSHOW flag; that works (partly). I only lose 1 band (I get only 3 bands of the 4). I have a kind of workaround for vegetation indices: for the calculation of an index I hardly ever use all 4 bands, most use only 2 or 3 bands.
I can change the order of the bands inside the FPGA, based on a selection made on the host and sent to the camera.
The other option I have is to use RGB24, increase the number of horizontal pixels, and reshuffle the data after it is received.
Both are not ideal, but for now I can make an image and run my tests.

you should prototype your camera interface with something other than OpenCV.

something like OBS Studio perhaps.

when the interface works, then you can figure out what OpenCV needs to get all the data without mangling it.