I appreciate the help, thanks.
My mistake was setting winSize = (256, 256). As a result, for each keypoint it tried to calculate 31x31x2x2x9 histogram values (and returned zeros because of the stride and padding issue).
Once I set winSize = (16, 16), it calculated the correct number of histogram values for each keypoint, 1x1x2x2x9, in the neighborhood around the keypoint.
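For anyone checking the arithmetic, here is a small sketch of how those counts come about. It assumes blockSize = (16, 16), blockStride = (8, 8), cellSize = (8, 8) and nbins = 9, which I inferred from the 31x31x2x2x9 and 1x1x2x2x9 figures above; the function name hog_layout is just something I made up for illustration, not part of OpenCV.

```python
def hog_layout(win_size, block_size=(16, 16), block_stride=(8, 8),
               cell_size=(8, 8), nbins=9):
    """Return (blocks_x, blocks_y, cells_x, cells_y, nbins) and the total length.

    The default parameter values are assumptions inferred from the post,
    not values read back from an actual HOGDescriptor instance.
    """
    # Number of block positions that fit in the window along each axis
    blocks_x = (win_size[0] - block_size[0]) // block_stride[0] + 1
    blocks_y = (win_size[1] - block_size[1]) // block_stride[1] + 1
    # Number of cells inside one block along each axis
    cells_x = block_size[0] // cell_size[0]
    cells_y = block_size[1] // cell_size[1]
    layout = (blocks_x, blocks_y, cells_x, cells_y, nbins)
    total = blocks_x * blocks_y * cells_x * cells_y * nbins
    return layout, total


if __name__ == "__main__":
    print(hog_layout((256, 256)))  # window that was too large
    print(hog_layout((16, 16)))    # window equal to one block
```

With winSize = (256, 256) this gives a 31x31 grid of blocks per keypoint, while winSize = (16, 16) gives exactly one block, i.e. 2x2 cells of 9 bins each. If you have OpenCV at hand, the descriptor's getDescriptorSize() should report the same totals, though I have not included that call here.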
Thanks