that picture does surprise me. if it works for you, keep using it.
the scale invariance comes from SIFT estimating the scale of the feature (via a difference-of-gaussians pyramid, which approximates the laplacian), and that scale should correlate with the scaled appearance of a physical feature (of a certain size) in a camera picture.
SIFT then considers a local neighborhood around the point whose extent is proportional to that scale (derived from the octave in the pyramid, plus some intermediate levels between octaves), and as such it is invariant/robust to scaling of the feature's appearance.
the 100 brightest have no reason not to be distributed randomly, so that’ll be good enough. you can go to extra trouble to collect features in the corners of the view. I wouldn’t worry about that unless there’s evidence of a homography “wiggling” or not sufficiently aligning two views.
you can always run a pass of ECC refinement. that’ll use both pictures in their entirety to refine the homography.
keypoint “sizes” are mostly implementation-defined, i.e. specific to the feature detection/description algorithm you’re using (SIFT, AKAZE, …). you can expect to find details in the papers for the respective algorithms, but opencv’s implementation might differ. hard to say. the opencv docs might also not go into that much detail.