What is a Tensor Core? The Nvidia GPU technology explained

Nvidia A100 Tensor Core GPU
(Image credit: Nvidia)

If you've ever wondered what a Tensor Core is, then you're not alone. Whether you're in the market for a new graphics card or simply want to understand your Nvidia graphics card better, this is the technology powering many of today's most demanding games.

Tensor Cores can be found inside all of the best graphics cards and the best 4K graphics cards. They are the technology behind what's known as mixed-precision computing: a combination of AI-powered algorithms and dedicated hardware that produces results far exceeding standard rasterization, allowing for higher framerates at higher resolutions.

In short, Tensor Cores are the hardware that power Nvidia's AI upscaling tech known as DLSS (Deep Learning Super Sampling). But how do they work? Furthermore, how important are they for PC gaming and intensive productivity tasks? That's what TechRadar is here to answer.

What is a Tensor Core?

Nvidia GeForce RTX 5080 during our review process

(Image credit: Nvidia)

Tensor Cores are specialized hardware units inside Nvidia graphics cards that enable mixed-precision computing, accelerating the deep-learning AI workloads behind features such as DLSS, whose models are trained on a vast neural network.

They are far more power efficient than the CUDA cores in your graphics card, and are able to handle mixed-precision operations that would otherwise take considerably longer to complete.
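To see why mixed precision matters, here is a hypothetical NumPy sketch (a software analogy, not Nvidia's actual hardware behavior): summing many small half-precision (FP16) values loses accuracy unless the running total is kept at a higher precision, which is exactly the trade-off mixed precision addresses.

```python
import numpy as np

# 10,000 small values stored in half precision (FP16)
values = np.full(10_000, 0.1, dtype=np.float16)

# Naive FP16 accumulation: once the running total grows large,
# adding another 0.1 rounds away to nothing and the sum stalls.
fp16_sum = np.float16(0.0)
for v in values:
    fp16_sum = np.float16(fp16_sum + v)

# Mixed-precision accumulation: FP16 inputs, FP32 running total.
fp32_sum = values.astype(np.float32).sum()

print(f"FP16 accumulate: {float(fp16_sum):.1f}")  # stalls far below 1000
print(f"FP32 accumulate: {float(fp32_sum):.1f}")  # close to 1000
```

The FP16 inputs stay small and cheap to move around; only the accumulator needs the extra bits. That is the essence of what Tensor Cores do in hardware at far greater speed.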

History of Tensor Cores

Nvidia logo on a dark background

(Image credit: Konstantin Savusia / Shutterstock)

Nvidia first introduced Tensor Cores in its Volta data center GPUs in 2017, before bringing the technology to graphics cards. The first GPUs to use Tensor Cores were the Nvidia Titan V, Nvidia Quadro GV100, and the Nvidia Titan V CEO Edition, all of which used 640 first-generation Tensor Cores to accelerate AI workloads, limited to FP16 (16-bit floating point) precision.

Second-generation Tensor Cores launched inside Turing (RTX 20 series) graphics cards, which expanded the supported precisions to INT1, INT4, and INT8 as well as FP16. Turing GPUs were also the first to feature ray tracing cores (RT cores), which made real-time lighting techniques possible in video games (the combination of the two technologies would later come to be known as having "RTX on").

For productivity use, these faster and more versatile Tensor Cores were found inside Quadro RTX video cards, primarily used in workstations for CAD, content creation, scientific computing, and machine learning.

Third-generation Tensor Cores arrived with Ampere (RTX 30 series) graphics cards. As well as INT1, INT4, INT8, and FP16, the computational capabilities were expanded to include TF32, FP64, and bfloat16 precisions. These new formats greatly broadened the types of machine learning (and deep learning) workloads possible, with Nvidia claiming the TF32 format could be up to 20x faster than the previous generation.

These Tensor Cores were also essential in driving Nvidia DLSS 2.0, whose AI-accelerated temporal upscaling became far cleaner and more widely adopted. While the first version of DLSS, launched in February 2019, could be blurry and imprecise, the wider range of precisions possible with this generation of Tensor Cores meant far higher framerates in hardware-intensive software, such as ray-traced games, particularly at higher resolutions like 1440p and 4K.

As Nvidia transitioned from being primarily a computing and hardware company to an AI software developer as well, a similar trend could be seen with the fourth-generation Tensor Cores. Primary uses of these accelerated AI technologies include generative AI, large language models, chatbots, and Natural Language Processing (NLP).

In gaming terms, however, the fourth-generation Tensor Cores also made DLSS 3's Frame Generation possible, a feature exclusive to Ada (RTX 40 series) graphics cards and not supported on previous RTX GPU generations. Provided developers work with Nvidia to tune the feature for their games, the tech analyzes rendered frames and generates additional ones alongside traditional rasterization. The result is higher framerates, even if it remains hotly debated whether these are "real frames" or not.

Tensor Core utilization for gaming

Cyberpunk 2077 RTX On

(Image credit: Nvidia/CD Projekt Red)

That leads us to 2025 and the recent introduction of Blackwell architecture graphics cards featuring fifth-generation Tensor Cores. Nvidia states its latest architecture "delivers 30X speedup" compared to the previous generation, thanks to newly supported precision formats.

For gaming, this advancement makes Multi Frame Generation a possibility, which is a feature exclusive to RTX 50 series graphics cards (such as the RTX 5090 and RTX 5080). We've been consistently impressed by what MFG can do for 8K gaming.

Tensor Cores work alongside CUDA cores in gaming (AI software in tandem with rendering hardware) to render video games at a lower internal resolution and then upscale the image to a higher one. This is how RTX graphics cards can produce playable framerates of 60fps and beyond at resolutions like 4K and 8K while using hardware-intensive rendering techniques such as ray tracing.
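The arithmetic behind this is straightforward. The sketch below (an illustration, not Nvidia's code) uses the per-axis render-scale ratios Nvidia has published for DLSS's quality modes to show how few pixels actually get rasterized before the AI fills in the rest:

```python
# Per-axis render-scale ratios for DLSS quality modes, as published
# by Nvidia (Quality ~67%, Balanced ~58%, Performance 50%, Ultra
# Performance ~33% of the target resolution on each axis).
DLSS_MODES = {
    "Quality":           2 / 3,
    "Balanced":          0.58,
    "Performance":       1 / 2,
    "Ultra Performance": 1 / 3,
}

def internal_resolution(target_w: int, target_h: int, mode: str) -> tuple[int, int]:
    """Resolution the GPU actually rasterizes before AI upscaling."""
    scale = DLSS_MODES[mode]
    return round(target_w * scale), round(target_h * scale)

# 4K output in Performance mode: only a quarter of the pixels are
# rendered natively; the rest are reconstructed by the Tensor Cores.
w, h = internal_resolution(3840, 2160, "Performance")
print(f"Internal render: {w}x{h}")                                # 1920x1080
print(f"Pixels rendered: {w * h / (3840 * 2160):.0%} of native")  # 25%
```

Because pixel count scales with the square of the per-axis ratio, even the modest-sounding Performance mode cuts the rasterization workload to a quarter, which is where the large framerate gains come from.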

As DLSS (and other RTX technologies) has been continually improved over the years, upscaled output now looks remarkably close to native resolution, something that has won over many gamers.

How do Tensor Cores work?

Press shot of an Nvidia chip

(Image credit: Nvidia)

Tensor Cores are specially designed hardware units found inside Nvidia graphics cards, built primarily to accelerate the AI workloads, such as matrix multiplication, that make machine and deep learning possible. As the technology has grown more sophisticated from 2017 through to 2025, across five generations, Tensor Cores now deliver significantly faster performance than what's possible with regular GPU cores.

The main feature of Tensor Cores is mixed-precision computing: performing calculations on lower-precision data (such as FP16 inputs) while accumulating the results at higher precision, gaining speed without sacrificing much accuracy. As the technology has matured, the range of supported precision formats has expanded significantly.

Additionally, Tensor Cores are optimized for Matrix Multiply-Accumulate (MMA) operations, multiplying small blocks of matrices and summing the results, through tile-based processing. These tiles are processed in parallel (instead of sequentially) per core, resulting in faster overall workloads.

The operation of a Tensor Core can be broken down into three distinct steps: loading the data, performing mixed-precision calculations, and accumulating the output.
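The three steps above can be sketched in NumPy as a software analogy. This is not the actual hardware data path, and the 4x4 tile size is an assumption for illustration; real Tensor Cores operate on small fixed matrix shapes in silicon.

```python
import numpy as np

TILE = 4  # hypothetical tile size for illustration only

def tensor_core_style_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Tile-based, mixed-precision multiply-accumulate (software analogy)."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2 and m % TILE == 0 and n % TILE == 0 and k % TILE == 0
    # Accumulator kept at FP32, even though the inputs are FP16.
    c = np.zeros((m, n), dtype=np.float32)
    for i in range(0, m, TILE):
        for j in range(0, n, TILE):
            for p in range(0, k, TILE):
                # Step 1: load small FP16 tiles of A and B.
                a_tile = a[i:i+TILE, p:p+TILE].astype(np.float16)
                b_tile = b[p:p+TILE, j:j+TILE].astype(np.float16)
                # Step 2: mixed-precision calculation; the FP16
                # products are widened to FP32.
                partial = a_tile.astype(np.float32) @ b_tile.astype(np.float32)
                # Step 3: accumulate into the FP32 output tile.
                c[i:i+TILE, j:j+TILE] += partial
    return c

a = np.random.rand(8, 8).astype(np.float16)
b = np.random.rand(8, 8).astype(np.float16)
result = tensor_core_style_matmul(a, b)
```

The point of the tiling is that each tile's multiply-accumulate is independent of its neighbors, so the hardware can run many of them in parallel rather than working through the matrix one element at a time.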

All of this comes together to deliver what is known as AI acceleration, which has a myriad of uses, from gaming (DLSS, Frame Generation, MFG, and Ray Reconstruction) to the training of neural networks, machine learning, deep learning, and large language models.

Aleksha McLoughlin
Contributor

Aleksha McLoughlin is an experienced hardware writer. She was previously the Hardware Editor for TechRadar Gaming until September 2023. During this time, she looked after buying guides and wrote hardware reviews, news, and features. She has also contributed hardware content to the likes of PC Gamer, Trusted Reviews, Dexerto, Expert Reviews, and Android Central. When she isn't working, you'll often find her in mosh pits at metal gigs and festivals or listening to whatever new black and death metal has debuted that week.
