Definition
CUDA (Compute Unified Device Architecture) is NVIDIA's parallel-computing platform and programming model. It gives developers direct, general-purpose access to NVIDIA GPUs from languages such as C, C++ and Fortran, turning a graphics card into a massively parallel compute engine for deep learning, scientific computing and high-performance computing (HPC).
How it works
In the CUDA model, the CPU (the "host") launches functions called kernels that execute on the GPU (the "device"). Each kernel runs across a grid of thread blocks, and every block holds many threads that all run the same kernel code on different data. This single-instruction, multiple-threads (SIMT) style is what lets a GPU work through the matrix math behind neural networks far faster than a CPU, often by an order of magnitude or more on highly parallel workloads.
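The host/device split and the grid-of-blocks layout can be illustrated with a minimal vector-addition sketch. The kernel name vector_add, the helper run_vector_add and the block size of 256 are illustrative choices, not anything CUDA prescribes; the runtime calls (cudaMalloc, cudaMemcpy, cudaFree, cudaGetLastError) are standard CUDA runtime API. It assumes an NVIDIA GPU and the CUDA toolkit are installed.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Kernel: runs on the device. Each thread handles one element.
__global__ void vector_add(const float* a, const float* b, float* out, int n) {
    // Global index = block offset + position inside the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end of the data
        out[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: adds two n-element vectors on the GPU and
// returns the largest absolute error versus the CPU result, or -1 on a CUDA error.
float run_vector_add(int n) {
    std::vector<float> h_a(n), h_b(n), h_out(n);
    for (int i = 0; i < n; ++i) {
        h_a[i] = static_cast<float>(i);
        h_b[i] = 2.0f * static_cast<float>(i);
    }

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;

    // Allocate device memory and copy the inputs from host to device.
    bool ok = cudaMalloc(&d_a, bytes) == cudaSuccess
           && cudaMalloc(&d_b, bytes) == cudaSuccess
           && cudaMalloc(&d_out, bytes) == cudaSuccess
           && cudaMemcpy(d_a, h_a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_b, h_b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        int threads = 256;                         // threads per block (illustrative)
        int blocks = (n + threads - 1) / threads;  // enough blocks to cover n elements
        vector_add<<<blocks, threads>>>(d_a, d_b, d_out, n);
        // The device-to-host copy waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree on a null pointer is a no-op, so this is safe after a failure.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    if (!ok) return -1.0f;

    float max_err = 0.0f;
    for (int i = 0; i < n; ++i) {
        max_err = std::fmax(max_err, std::fabs(h_out[i] - (h_a[i] + h_b[i])));
    }
    return max_err;
}

int main() {
    float err = run_vector_add(1 << 20);
    if (err < 0.0f) {
        std::printf("CUDA error (is an NVIDIA GPU available?)\n");
        return 1;
    }
    std::printf("max error: %f\n", err);
    return 0;
}
```

Saved under a name such as vector_add.cu (hypothetical), this should build with nvcc vector_add.cu -o vector_add. The bounds check in the kernel matters because the grid is rounded up to a whole number of blocks, so some threads in the final block can fall outside the array.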
Why it dominates — and why that matters
Because CUDA arrived early (NVIDIA released it in 2007) and is deeply integrated into frameworks like PyTorch and TensorFlow, it has become the de facto standard for AI development. That dominance is also a centralization risk: most AI software assumes NVIDIA hardware, giving one vendor enormous leverage over the entire stack. Open alternatives such as AMD's ROCm exist precisely to loosen that grip.
For self-hosting builders, knowing whether your tooling is CUDA-locked matters when sourcing hardware for a local LLM or a wider sovereign AI deployment.
