I don't think it makes much sense to convert your code to CUDA.
The only operation worth optimizing would be the normalization step,
and uploading the data to the GPU and downloading the result again would eat up any speedup.
Also, if this is supposed to run on Colab, you'd have to build your own cv2 from source first;
the built-in version does not support CUDA.
However, you can use cv2.imread(..., cv2.IMREAD_GRAYSCALE)
and skip the gray conversion.