Processing image stack with CUDA function

crackwitz · May 13, 2021, 3:49pm

some material on cuda streams, which is not necessarily beginner-friendly, so feel free to ignore it, or browse out of curiosity:

according to this, a Quadro T2000 has 3.6 Tflop/s of FP32, so that’s still orders of magnitude better than a CPU.

( @cudawarped please don’t compare to the latest and greatest. newbies lack the context to understand such judgments. this hardware is plenty powerful. it’s absolutely not the issue here. )

your whole problem is not realizing that transfer of data and commands takes time.

getting a GPU to perform is all about dealing with latency.

those are the basics.
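to put rough numbers on that, here is a back-of-envelope sketch, not a benchmark. only the 3.6 Tflop/s figure comes from above. the PCIe bandwidth, the per-call overhead, the 512x512 8-bit image, and the 10 flops per pixel are all assumptions picked for illustration, and `per_image_times` is a hypothetical helper, not part of any library.

```python
# Rough cost model for: upload one image, run one kernel, download the result.
# Everything except PEAK_FLOPS is an illustrative assumption, not a measurement.
PEAK_FLOPS = 3.6e12        # FP32 peak quoted above for the Quadro T2000
PCIE_BYTES_PER_S = 12e9    # assumed effective host<->device bandwidth
CALL_OVERHEAD_S = 10e-6    # assumed fixed cost per transfer or kernel launch


def per_image_times(width, height, flops_per_pixel, bytes_per_pixel=1):
    """Return (transfer_s, compute_s, overhead_s) for a single image."""
    n_pixels = width * height
    n_bytes = n_pixels * bytes_per_pixel
    transfer = 2 * n_bytes / PCIE_BYTES_PER_S         # upload + download
    compute = n_pixels * flops_per_pixel / PEAK_FLOPS  # best case, at peak rate
    overhead = 3 * CALL_OVERHEAD_S                     # upload, kernel, download
    return transfer, compute, overhead


transfer, compute, overhead = per_image_times(512, 512, flops_per_pixel=10)
print(f"transfer: {transfer * 1e6:.1f} us")
print(f"compute:  {compute * 1e6:.1f} us")
print(f"overhead: {overhead * 1e6:.1f} us")
```

under these assumptions the arithmetic is a small fraction of the total, and moving data and issuing commands makes up the rest. real numbers will differ, so measure on your own hardware.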

please don’t jump straight into GPU programming. the issues you run into that way are tiresome because they wouldn’t come up at all with a proper approach.

start with the basics of GPU programming. that means learning what’s special about GPUs and what to pay attention to.

you do not start by coming up with your own questions and poking around for answers. you look for structured teaching. get experts to tell you what you need to know. as a newbie you can’t know what you need to know, and you aren’t expected to.

hell, nvidia has tons of documentation and tutorials and introductions. they wouldn’t make any money if their educational content sucked. please go and look for good learning material.
