The following content introduces GPU architecture in detail. Generally, the structure of a GPU is, from largest to smallest: processor clusters (PC) > streaming multiprocessors (SM) > layer-1 instruction cache & associated cores. You can learn about the structures of more computer components on the MiniTool partition editor website.
About GPU (Graphics Processing Unit)
According to Wikipedia, a GPU is a specialized electronic circuit. The term is often used loosely for the graphics card or video card that carries it. A GPU is designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device, such as a computer monitor or TV screen.
Modern GPUs are very efficient at manipulating computer graphics and processing images. Their highly parallel structure makes them more efficient than general-purpose CPUs (Central Processing Units) for algorithms that process large blocks of data in parallel.
Within a PC, a GPU can be embedded into an expansion card (video card), preinstalled on the motherboard (dedicated GPU), or integrated into the CPU die (integrated GPU).
GPU Architecture
Discussions of video card architecture almost always involve, or compare it with, CPU architecture.
GPU vs CPU Architecture
A GPU is designed to maximize data throughput. It pushes as many tasks as possible through its internals at once, far more than a CPU can handle simultaneously. This is mainly because a GPU generally has many more cores than a CPU.
In an NVIDIA GPU, each of these cores is called a CUDA (Compute Unified Device Architecture) core. Each one contains a fully pipelined integer ALU (Arithmetic Logic Unit) and an FPU (Floating Point Unit). Starting with NVIDIA's Fermi architecture, the integer ALU supports full 32-bit precision for all instructions. It is also optimized to efficiently support 64-bit and extended-precision operations, as well as instructions such as Boolean, compare, convert, move, shift, bit-reverse, bit-field insert, bit-field extract, and population count.
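Some of these integer instructions are less familiar than add or compare. The following minimal Python sketch shows what population count, bit-reverse, bit-field extract, and bit-field insert compute on a 32-bit value. The function names (`popc`, `brev`, `bfe`, `bfi`) are hypothetical. The code only emulates the results in software and does not describe NVIDIA's hardware design.

```python
MASK32 = 0xFFFFFFFF  # keep every result to 32 bits, like a 32-bit integer ALU


def popc(x: int) -> int:
    """Population count: the number of set bits in a 32-bit value."""
    return bin(x & MASK32).count("1")


def brev(x: int) -> int:
    """Bit-reverse: bit 0 becomes bit 31, bit 1 becomes bit 30, and so on."""
    x &= MASK32
    result = 0
    for _ in range(32):
        result = (result << 1) | (x & 1)  # move the lowest bit of x into result
        x >>= 1
    return result


def bfe(x: int, pos: int, length: int) -> int:
    """Bit-field extract (unsigned): take `length` bits starting at bit `pos`."""
    return ((x & MASK32) >> pos) & ((1 << length) - 1)


def bfi(base: int, field: int, pos: int, length: int) -> int:
    """Bit-field insert: overwrite `length` bits of `base` at `pos` with `field`."""
    mask = ((1 << length) - 1) << pos
    return ((base & ~mask) | ((field << pos) & mask)) & MASK32
```

For example, `bfe(0xABCD1234, 8, 8)` pulls out the second-lowest byte, `0x12`. `bfi(0, 0xF, 4, 4)` writes four set bits into bits 4 to 7, giving `0xF0`. A GPU executes each of these as a single instruction rather than a loop.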
At a high level, the architecture of a GPU is similar to that of a CPU: both use a memory structure of cache layers, global memory, and a memory controller.
However, GPU architecture is centered on data-parallel, throughput-oriented computation and on keeping as many cores busy as possible. A CPU, by contrast, focuses on low-latency access to cache memory.
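Data-parallel work is usually split so that each thread computes one element of the output. The following minimal Python sketch emulates this for SAXPY (`out[i] = a * x[i] + y[i]`). It uses the CUDA-style global index `blockIdx.x * blockDim.x + threadIdx.x`. The helper names `saxpy_thread` and `launch_saxpy` are hypothetical. On a real GPU the (block, thread) pairs run concurrently across SMs and cores, while this sketch simply loops over them.

```python
from typing import List


def saxpy_thread(block_idx: int, block_dim: int, thread_idx: int,
                 a: float, x: List[float], y: List[float], out: List[float]) -> None:
    """Work done by ONE GPU thread: compute a single output element."""
    i = block_idx * block_dim + thread_idx  # global index, CUDA-style
    if i < len(x):                          # guard: the last block may be partly idle
        out[i] = a * x[i] + y[i]


def launch_saxpy(a: float, x: List[float], y: List[float],
                 block_dim: int = 4) -> List[float]:
    """Emulate a kernel launch with enough blocks to cover every element."""
    n = len(x)
    grid_dim = (n + block_dim - 1) // block_dim  # ceiling division
    out = [0.0] * n
    # On a GPU these iterations would run in parallel; here they run in order.
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            saxpy_thread(block_idx, block_dim, thread_idx, a, x, y, out)
    return out
```

For example, `launch_saxpy(2.0, [1.0, 2.0, 3.0, 4.0, 5.0], [10.0] * 5)` returns `[12.0, 14.0, 16.0, 18.0, 20.0]`. No thread depends on another thread's result, and that independence is what lets a GPU put thousands of cores to work at once.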
GPU Architecture Basics
A GPU contains multiple processor clusters (PCs), each of which contains multiple streaming multiprocessors (SMs). Every SM has a layer-1 instruction cache together with its associated cores. Typically, an SM checks its dedicated layer-1 cache and then a layer-2 cache shared by all SMs before pulling data from global memory, such as GDDR5. GPU architecture is also designed to tolerate memory latency: while some threads wait for data, the SM switches to other threads that are ready to run, so its cores stay busy.
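The hierarchy and the data lookup order described above can be modeled in a few lines of Python. This is a toy sketch. The class names and the latency numbers (`L1_LATENCY`, `L2_LATENCY`, `GLOBAL_LATENCY`) are hypothetical values chosen only to show the ordering, not figures for any real GPU. Caches are modeled as plain sets of addresses with no size limit or eviction.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical latencies in clock cycles, chosen only to show the ordering
L1_LATENCY, L2_LATENCY, GLOBAL_LATENCY = 30, 200, 600


@dataclass
class SM:
    cores: int
    l1: Set[int] = field(default_factory=set)  # addresses in this SM's private L1


@dataclass
class ProcessorCluster:
    sms: List[SM]


@dataclass
class GPU:
    clusters: List[ProcessorCluster]
    l2: Set[int] = field(default_factory=set)  # L2 shared by every SM

    def total_cores(self) -> int:
        """Count cores across all clusters and SMs (PC > SM > cores)."""
        return sum(sm.cores for pc in self.clusters for sm in pc.sms)

    def load(self, sm: SM, addr: int) -> int:
        """Return the cycles spent fetching `addr` for a thread running on `sm`."""
        if addr in sm.l1:              # 1. dedicated L1 of this SM
            return L1_LATENCY
        if addr in self.l2:            # 2. L2 shared by all SMs
            sm.l1.add(addr)
            return L2_LATENCY
        self.l2.add(addr)              # 3. miss in both: fetch from global memory
        sm.l1.add(addr)                #    and fill both cache levels
        return GLOBAL_LATENCY
```

Usage: `GPU([ProcessorCluster([SM(32) for _ in range(4)]) for _ in range(2)])` describes 2 clusters of 4 SMs with 32 cores each, so `total_cores()` is 256. In this model, the first load of an address costs the global-memory latency. A second load on the same SM hits its L1. A load from an SM in another cluster hits the shared L2.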
GCA (Graphics Compute Array)
A GCA, also known as the 3D engine, usually consists of pixel and vertex shaders (or unified shaders), stream processors (CUDA cores), texture mapping units (TMUs), render output units (ROPs), L2 cache, geometry processors, and so on.
GMC (Graphics Memory Controller)
The GMC, also known as a memory chip controller (MCC) or memory controller unit (MCU), is a digital circuit that controls the data flowing to and from the computer's graphics memory. It can be a separate chip, or it can be integrated into another chip, for example placed on the same die as a microprocessor or built in as an integral part of it. A GMC that is an integral part of another chip is called an IMC (Integrated Memory Controller).
The types of memory a GMC controls include VRAM, WRAM, MDRAM, DDR, GDDR, and HBM.
VGA BIOS (Video Graphics Array Basic Input/Output System)
VGA BIOS, also known as video BIOS, is the BIOS of a graphics card in a computer. It is stored on a separate chip on the graphics card, not in the GPU itself.
BIF (Bus Interface)
The bus interface (BIF) is the interface through which the graphics card communicates with the processor and the rest of the computer. Common bus interfaces for graphics cards include ISA, VLB, PCI, AGP, and PCIe.
PMU (Power Management Unit)
The PMU is a microcontroller (microchip) that controls the power functions of digital platforms. Like an ordinary computer, it contains a CPU, memory, firmware, software, and so on. The PMU is one of the few components that remain active even when the computer is completely shut down, running on a backup battery.
In a portable computer, the PMU coordinates the following functions:
- Monitor power connections and battery charges.
- Shut down unnecessary system parts when they are left idle.
- Control sleep and power functions (on or off).
- Control power to other integrated circuits.
- Manage the interfaces of the built-in keyboard and trackpad.
- Charge batteries when necessary.
- Regulate the real-time clock (RTC).
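The idle-shutdown duty in the list above can be illustrated with a toy policy: power a component down once it has been idle longer than a timeout, and power it back on when it is used again. This is a hypothetical Python sketch. The `PowerManager` class, its methods, and the single fixed timeout are assumptions made for illustration, and real PMU firmware is vendor-specific and far more involved.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Component:
    name: str
    powered: bool = True
    idle_seconds: float = 0.0


class PowerManager:
    """Toy PMU policy: power down any component idle longer than a timeout."""

    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout  # hypothetical threshold in seconds
        self.components: Dict[str, Component] = {}

    def register(self, name: str) -> None:
        self.components[name] = Component(name)

    def activity(self, name: str) -> None:
        """Any use of a component resets its idle timer and powers it on."""
        comp = self.components[name]
        comp.idle_seconds = 0.0
        comp.powered = True

    def tick(self, elapsed: float) -> None:
        """Advance time and shut down components that have been idle too long."""
        for comp in self.components.values():
            if comp.powered:
                comp.idle_seconds += elapsed
                if comp.idle_seconds >= self.idle_timeout:
                    comp.powered = False
```

Usage: with `PowerManager(idle_timeout=60.0)`, a display left untouched for 70 seconds is switched off. A trackpad that was used 40 seconds ago stays on.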
VPU (Video Processing Unit)
A VPU is a specialized processor that takes video streams as input and can perform very complex processing on them. The same abbreviation is also used for vision processing units, which are common in machine learning applications and devices, where they serve as auxiliary accelerators.
On a graphics card, the VPU acts as a video codec responsible for video encoding and decoding, so it is also called a video encoder/decoder. VPUs compress or decompress formats such as MPEG-2, Theora, VP8, H.264, H.265, VP9, and VC-1.
DIF (Display Interface)
The display interface, also called the display controller, defines a serial bus and a communication protocol between the host (the source of the image data) and the destination device. It includes RAMDACs, HDMI audio, DisplayPort audio, video underlay, video outputs (VGA, DVI, HDMI, DisplayPort, S-Video, composite video, component video), PHYs (LVDS, TMDS), and EDID.
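EDID is the data block a monitor sends so the host knows what the display supports. A base EDID block is 128 bytes long and starts with the fixed header `00 FF FF FF FF FF FF 00`. Its last byte is a checksum chosen so that all 128 bytes sum to 0 modulo 256, and bytes 8 and 9 encode a three-letter manufacturer ID. The following minimal Python sketch checks those fields. The function names are hypothetical, and the test block built here is synthetic rather than taken from a real monitor.

```python
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])
BLOCK_SIZE = 128


def is_valid_edid_block(block: bytes) -> bool:
    """Check the fixed 8-byte header and the checksum of a base EDID block."""
    if len(block) != BLOCK_SIZE or block[:8] != EDID_HEADER:
        return False
    # The last byte is chosen so that all 128 bytes sum to 0 modulo 256
    return sum(block) % 256 == 0


def manufacturer_id(block: bytes) -> str:
    """Decode the three-letter manufacturer ID stored in bytes 8-9."""
    word = (block[8] << 8) | block[9]                            # big-endian 16-bit value
    letters = [(word >> shift) & 0x1F for shift in (10, 5, 0)]  # 5 bits per letter
    return "".join(chr(ord("A") + v - 1) for v in letters)      # 1 means 'A'


def make_test_block(vendor_bytes: bytes) -> bytes:
    """Build a synthetic, otherwise empty EDID block with a correct checksum."""
    body = bytearray(BLOCK_SIZE)
    body[:8] = EDID_HEADER
    body[8:10] = vendor_bytes
    body[127] = (-sum(body[:127])) % 256  # make the total a multiple of 256
    return bytes(body)
```

For example, the ID bytes `0x10 0xAC` decode to `"DEL"`. Flipping any single byte of a valid block breaks the checksum, which is how the host detects a corrupted EDID.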
In a nutshell, each GPU core is simpler than a CPU core, but a GPU has many more cores than a CPU. This lets it process data in parallel while tolerating higher memory latency. A GPU used this way to accelerate computational workloads, as in modern high-performance computing (HPC), is known as a general-purpose GPU (GPGPU).