
The AD102 chip of the Geforce RTX 4000 series has been rumored for some time to have 18,432 FP32 shaders. The Geforce RTX 4090 is assumed to be based on it and, according to the latest rumors, should reach a bit more than 100 TFLOPS of FP32 compute performance. That would put it slightly ahead of the fastest Radeon RX 7000, which is said to reach 92 TFLOPS. It would also be a factor of around 2.8 compared to the Geforce RTX 3090, which doesn't seem entirely unrealistic at first glance. So far, the rumors have made it clear that Nvidia will significantly increase the shader count on the flagship, but that the gap to the next smaller model, the Geforce RTX 4080, will be all the bigger. According to the figures in circulation, the RTX 4080 should land at roughly the level of an RTX 3090.
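The factor of 2.8 can be checked with a quick back-of-the-envelope calculation. The sketch below assumes the usual rule of thumb that each FP32 shader performs one fused multiply-add (2 FLOPs) per clock. The RTX 3090 figures (10,496 shaders, about 1.695 GHz boost) are not from the rumors but taken from Nvidia's public specifications, and the function names are purely illustrative.

```python
# Rule of thumb (assumption): one fused multiply-add = 2 FLOPs per shader per clock.
FLOPS_PER_SHADER_PER_CLOCK = 2


def fp32_tflops(shaders: int, clock_ghz: float) -> float:
    """Theoretical FP32 throughput in TFLOPS for a given shader count and clock."""
    return shaders * FLOPS_PER_SHADER_PER_CLOCK * clock_ghz / 1000.0


def implied_clock_ghz(shaders: int, tflops: float) -> float:
    """Clock (GHz) a chip would need to reach the given theoretical TFLOPS."""
    return tflops * 1000.0 / (shaders * FLOPS_PER_SHADER_PER_CLOCK)


# Assumption: RTX 3090 public specs, not part of the rumor data.
rtx3090_tflops = fp32_tflops(10_496, 1.695)

# Rumored figures from the text: 18,432 shaders, about 100 TFLOPS.
ratio = 100.0 / rtx3090_tflops
ad102_clock = implied_clock_ghz(18_432, 100.0)

print(f"RTX 3090: {rtx3090_tflops:.2f} TFLOPS")
print(f"Rumored AD102 vs. RTX 3090: factor {ratio:.1f}")
print(f"Clock implied by 100 TFLOPS: {ad102_clock:.2f} GHz")
```

Under these assumptions, 100 TFLOPS with 18,432 shaders implies a clock of roughly 2.7 GHz, so most of the claimed jump would have to come from clock speed as well as from the larger shader count.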

That would mean Nvidia is shifting both performance and price point up a notch. In the Ada generation, the Geforce RTX 4080 would give you about the same performance as the Geforce RTX 3090. If you want more, you have to go for the (temporary) flagship, which offers a lot more on paper. However, it remains to be seen how much of this can be converted into real-world performance and how efficiently. In any case, a TGP of 600 watts seems to be set at the moment. OC models and a possible Ti could go beyond that; a PCB designed for 900 watts has already been discussed.

Chip    SM    GPC   TPC   FP32 shaders   Cache    Memory bus   Memory
AD102   144   12    72    18,432         96 MiB   384 bits     24 GiB
AD103   84    7     48    10,752         64 MiB   256 bits     16 GiB
AD104   60    5     30    7,680          48 MiB   192 bits     12 GiB
AD106   36    3     18    4,608          32 MiB   128 bits     8 GiB
AD107   24    3     12    3,072          32 MiB   128 bits     8 GiB
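Rumored spec tables like this one can be sanity-checked against their own internal ratios. The following sketch assumes 128 FP32 shaders per SM and 2 SMs per TPC. Both ratios are inferred from the AD102 row, not stated in the sources, and the helper name is hypothetical.

```python
# Rows copied from the table: chip -> (SM, GPC, TPC, FP32 shaders)
CHIPS = {
    "AD102": (144, 12, 72, 18_432),
    "AD103": (84, 7, 48, 10_752),
    "AD104": (60, 5, 30, 7_680),
    "AD106": (36, 3, 18, 4_608),
    "AD107": (24, 3, 12, 3_072),
}

# Assumed ratios, inferred from the AD102 row rather than stated in the article.
SHADERS_PER_SM = 18_432 // 144  # 128
SM_PER_TPC = 144 // 72          # 2


def inconsistent_rows(chips):
    """Return {chip: [issue, ...]} for rows that break the assumed ratios."""
    problems = {}
    for chip, (sm, _gpc, tpc, shaders) in chips.items():
        issues = []
        if sm * SHADERS_PER_SM != shaders:
            issues.append(f"{sm} SM x {SHADERS_PER_SM} != {shaders} shaders")
        if tpc * SM_PER_TPC != sm:
            issues.append(f"{tpc} TPC x {SM_PER_TPC} != {sm} SM")
        if issues:
            problems[chip] = issues
    return problems


for chip, issues in inconsistent_rows(CHIPS).items():
    print(chip, "->", "; ".join(issues))
```

With the figures as listed, every shader count matches 128 per SM, and only the AD103 row is flagged: 48 TPC does not fit 84 SM at two SMs per TPC. That may be a typo in the circulating data or a sign that the assumed ratio does not hold for that chip.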

What has been left out of the whole race for the fastest graphics card is performance beyond rasterization. Ray tracing in particular should be exciting, because the goal must be to enable good results even on lower-end cards. The flagship battle is ultimately nice to watch, but unaffordable for most players and also a bit unreasonable. It is also assumed that Nvidia will not ship AD102 in its full configuration, which seems obvious. On the one hand, that leaves room for a Ti or Titan. On the other hand, it puts Nvidia on the safer side in terms of yield with its monolithic design, while AMD will probably use a multi-chip design and is therefore positioned a little differently.

Geforce RTX 4000 vs. Radeon RX 7000: Nvidia with a manufacturing advantage?

The next generation will be particularly exciting because the designs differ a little more than usual. The question is also whether AMD really wants to offer an enthusiast card, or whether it will leave the throne to Nvidia for the time being and position itself a step below.

Spec                 AD102                                   Navi 31
Process node         TSMC N4P                                TSMC N5 & N6
Architecture         Ada (Lovelace)                          AMD RDNA3
GPU package          monolithic                              multi-chip module (MCM)
GPU size             ~600 mm²                                ~800 mm²
Dies                 1                                       2 GCD + 4 MCD + 1 IOD
GPU mega clusters    12 graphics processing clusters (GPC)   2×3 shader engines
GPU super clusters   72 texture processing clusters (TPC)    2×30 RDNA workgroups (WGP)
GPU clusters         144 streaming multiprocessors (SM)      120 compute units (CU)
FP32 cores           18,432 CUDA cores                       15,360 stream processors
GPU clock            ~2.7 GHz                                ~3.0 GHz
FP32 performance     ~100 TFLOPS                             ~92 TFLOPS
Memory size          24 GiB                                  ?
Memory speed         21 Gb/s                                 ~18 Gb/s
Memory type          GDDR6X                                  GDDR6
Memory bus           21 Gb/s, 384 bits                       ~18 Gb/s, 256 bits
Cache                96 MiB (L2 cache)                       256/512 MiB Infinity Cache
Power consumption    600 watts                               ?
Announcement         ~Q3/2022                                ~Q3/2022
Sales launch         ~Q4/2022                                ~Q4/2022
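The FP32 and memory rows of the comparison can be cross-checked from the other rows. The sketch below is only a plausibility check on the rumored numbers. It assumes 2 FLOPs per FP32 core per clock and computes peak memory bandwidth as per-pin data rate times bus width divided by 8. The function and variable names are illustrative.

```python
def fp32_tflops(cores: int, clock_ghz: float) -> float:
    """Theoretical FP32 TFLOPS, assuming 2 FLOPs (one FMA) per core per clock."""
    return cores * 2 * clock_ghz / 1000.0


def memory_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin rate (Gb/s) x bus width / 8 bits."""
    return data_rate_gbps * bus_width_bits / 8


# Rumored figures from the comparison table:
# (FP32 cores, clock in GHz, memory speed in Gb/s, bus width in bits)
GPUS = {
    "AD102": (18_432, 2.7, 21, 384),
    "Navi 31": (15_360, 3.0, 18, 256),
}

for name, (cores, clock, rate, bus) in GPUS.items():
    print(
        f"{name}: {fp32_tflops(cores, clock):.1f} TFLOPS, "
        f"{memory_bandwidth_gb_s(rate, bus):.0f} GB/s"
    )
```

On these assumptions, the listed core counts and clocks reproduce the roughly 100 and 92 TFLOPS in the table, and the memory rows work out to about 1,008 GB/s for AD102 against 576 GB/s for Navi 31. The much larger Infinity Cache on the AMD side would presumably be meant to offset part of that difference.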

And even if Intel is keeping a low profile at the moment, Battlemage, the successor to Alchemist, is expected soon. It should then arrive fairly quickly and also compete a little further up than just the mid-range that is expected for the Arc A770/750 and A580/380. In any case, it is not unwise for Intel to keep a low profile at first in order to iron out possible driver issues and teething problems. That avoids a fiasco like that of the Volari.

Sources: Twitter (@kopite7kimi, @Greymon55), Wccftech, Videocardz

The post Allegedly with 100 TFLOPS FP32 performance appeared first on Gamingsym.