Hi Berak and crackwitz,
Thanks so much for thinking along. With your help I finally found the solution to my problem; I'll show it below.
The Thin Plate Spline Shape Transformer turned out to be useful. I found out how to use applyTransformation to build a remap map. Now I only have to call cv2.remap in my frame loop, which is really fast compared to warpImage, which performs both the transformation and the remapping on every frame.
To answer your questions about how I get my points: for my project Globe4D I am projecting with a fish-eye projector inside a 1-meter globe/dome. A fish-eye camera with an infrared pass filter is also on the inside. Since it does not see visible light, it cannot be calibrated automatically using a chessboard pattern. The globe is illuminated from the inside with infrared LEDs, so hands on the globe reflect the IR light, which the camera can capture.
The fish-eye camera is off-center, which leads to quite some distortion. To calibrate touch input I project a grid of dots on the globe. By touching the dots (by hand or with an IR flashlight) and knowing which dot is currently active, I can map screen points to camera points. That is the input for the mapping function.
I think you're right that I might just need bilinear interpolation. I'm used to working with the Processing environment and OpenGL, which is why I was initially talking about vertices and texture coordinates / UV mapping, all very linear and 2D. Here's an example of my quads in Processing: Quads (github.com)
I wouldn't know how to calculate the mapping array for the remap function myself using bilinear interpolation. The fact that the quads don't have right angles (so they aren't rectangles) makes it even more complex, at least for my level of expertise. I would love to see some code from someone!
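For readers who land here later: one way to build the same remap arrays without TPS is scattered linear interpolation, e.g. with SciPy's griddata. It triangulates the control points and interpolates linearly inside each triangle, which handles non-rectangular quads and behaves much like per-quad bilinear interpolation once the grid is dense. A sketch with made-up control points (the point arrays below are illustrative, not real calibration data):

```python
import numpy as np
from scipy.interpolate import griddata

h, w = 480, 640
# Illustrative correspondences: where each screen point lands in the camera image.
screen_pts = np.array([[0, 0], [w, 0], [0, h], [w, h], [w // 2, h // 2]], np.float32)
cam_pts = np.array([[12, 8], [628, 15], [6, 470], [633, 468], [322, 247]], np.float32)

# Evaluate the interpolant at every destination pixel.
grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
map_x = griddata(screen_pts, cam_pts[:, 0], (grid_x, grid_y), method="linear")
map_y = griddata(screen_pts, cam_pts[:, 1], (grid_x, grid_y), method="linear")
map_x = map_x.astype(np.float32)
map_y = map_y.astype(np.float32)
# map_x / map_y can now be passed straight to cv2.remap.
```

Note that griddata returns NaN for pixels outside the convex hull of the control points, so the calibration grid needs to cover the whole projection area (or the gaps need filling with method="nearest").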
So here’s my final code that runs at full speed. Thanks again for your time!
import cv2
import numpy as np
w,h = 640,480
cam = cv2.VideoCapture(0)
cam.set(cv2.CAP_PROP_EXPOSURE,-6)
cv2.namedWindow("src")
cv2.namedWindow("dst")
cv2.moveWindow("dst",w,0)
# estimate TPS transformation
cam_points = np.loadtxt("data/cam_points.txt", dtype=int).reshape(1,-1,2)
screen_points = np.loadtxt("data/screen_points.txt", dtype=int).reshape(1,-1,2)
screen_points = (screen_points * (h/2400, w/3200) + (-64,0)).astype(int) # 3200 = 640*2400/480 to maintain aspect ratio, -64 to restore the center
matches = [cv2.DMatch(i, i, 0) for i in range(len(screen_points[0]))]
tps = cv2.createThinPlateSplineShapeTransformer()
tps.estimateTransformation(screen_points, cam_points, matches)
# apply transformation to remap map (this part can still be improved I think but for now it's fine since it only runs once)
map_x = np.zeros((h,w), dtype=np.float32)
map_y = np.zeros((h,w), dtype=np.float32)
for y in range(h):
    for x in range(w):
        p = np.array([x, y], dtype=np.float32).reshape(1, 1, 2)
        u, v = tps.applyTransformation(p)[1][0][0]
        map_x[y, x] = u
        map_y[y, x] = v
# draw loop that runs very fast since it only uses remap for transformation
while cv2.waitKey(1) != 27:
    ret, src = cam.read()
    dst = cv2.remap(src, map_x, map_y, cv2.INTER_LINEAR)
    for c, s in zip(cam_points[0], screen_points[0]):
        cv2.circle(src, c, 5, (0, 0, 255), -1)
        cv2.circle(dst, s, 5, (0, 255, 0), -1)
    cv2.imshow("src", src)
    cv2.imshow("dst", dst)
cv2.destroyAllWindows()