What are the advantages of CUDA vs. graphics-based GPGPU?
CUDA is designed from the ground-up for efficient general purpose computation on GPUs. Developers can compile C for CUDA to avoid the tedious work of remapping their algorithms to graphics concepts. CUDA exposes several hardware features that are not available via graphics APIs. The most significant of these is shared memory, which is a small (currently 16KB per multiprocessor) area of on-chip memory which can be accessed in parallel by blocks of threads. This allows caching of frequently used data and can provide large speedups over using textures to access data. Combined with a thread synchronization primitive, this allows cooperative parallel processing of on-chip data, greatly reducing the expensive off-chip bandwidth requirements of many parallel algorithms. This benefits a number of common applications such as linear algebra, Fast Fourier Transforms, and image processing filters. Whereas fragment programs in graphics APIs are limited to outputting 32 floats (RGBA * 8 render targe