AD102, the top chip of the Geforce RTX 4000 series, has been rumored for some time to carry 18,432 FP32 shaders. The Geforce RTX 4090 is expected to be based on it and, according to the latest rumors, should reach a bit more than 100 TFLOPS of FP32 performance, slightly ahead of the fastest Radeon RX 7000, which is said to reach 92 TFLOPS. That would be roughly a factor of 2.8 over the Geforce RTX 3090, which does not seem entirely unrealistic at first glance. The rumors so far indicate that Nvidia will significantly increase the shader count on the flagship, but that the gap to the next smaller model, the Geforce RTX 4080, will be all the larger. According to the figures in circulation, the RTX 4080 should land at roughly the level of an RTX 3090.
In effect, Nvidia would be moving both performance and price point up a notch. In the Ada generation, the Geforce RTX 4080 would deliver about the same performance as the Geforce RTX 3090. Anyone who wants more has to go for the (temporary) flagship, which offers a lot more on paper. It remains to be seen how much of that translates into real-world performance and how efficiently. In any case, a TGP of 600 watts appears to be settled for now; OC models and a possible Ti could go beyond that, and a PCB designed for 900 watts has already been discussed.
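The TFLOPS figures in these rumors follow from simple arithmetic: shader count times two FP32 operations per clock (one fused multiply-add) times clock rate. The following Python sketch reproduces the numbers under stated assumptions: the clocks of roughly 2.7 GHz and 3.0 GHz and the 15,360 stream processors are the rumored values from the comparison table further down, the RTX 3090 figures (10,496 shaders, 1.695 GHz boost) are Nvidia's published specifications, and the function name `fp32_tflops` is purely illustrative.

```python
def fp32_tflops(shaders: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumes every shader retires one fused multiply-add (2 FLOPs) per clock,
    which is the usual convention behind vendor TFLOPS figures.
    """
    # shaders * FLOPs/clock * GHz gives GFLOPS; divide by 1000 for TFLOPS
    return shaders * flops_per_clock * clock_ghz / 1000.0


# Rumored values (assumptions, not confirmed specifications)
ad102 = fp32_tflops(18_432, 2.7)
navi31 = fp32_tflops(15_360, 3.0)

# Official RTX 3090 specification: 10,496 shaders at 1.695 GHz boost
rtx3090 = fp32_tflops(10_496, 1.695)

print(f"AD102:    {ad102:.1f} TFLOPS")
print(f"Navi 31:  {navi31:.1f} TFLOPS")
print(f"RTX 3090: {rtx3090:.1f} TFLOPS")
print(f"AD102 vs. RTX 3090: factor {ad102 / rtx3090:.2f}")
```

With these inputs AD102 lands just under 100 TFLOPS, so "a bit more than 100" would require a clock slightly above 2.7 GHz.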
chip | SM | GPC | TPC | shaders | cache | memory bus | memory |
---|---|---|---|---|---|---|---|
AD102 | 144 | 12 | 72 | 18,432 | 96MiB | 384 bits | 24 GiB |
AD103 | 84 | 7 | 42 | 10,752 | 64MiB | 256 bits | 16 GiB |
AD104 | 60 | 5 | 30 | 7,680 | 48MiB | 192 bits | 12 GiB |
AD106 | 36 | 3 | 18 | 4,608 | 32MiB | 128 bits | 8 GiB |
AD107 | 24 | 3 | 12 | 3,072 | 32MiB | 128 bits | 8 GiB |
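The shader counts in this table can be sanity-checked against the SM counts, and the size of the step between the chips becomes clearer as a share of AD102. A minimal sketch, using only the SM and shader columns above; the dictionary layout is an illustrative choice, and all figures remain rumors.

```python
# (SM count, FP32 shaders) per chip, taken from the rumored table above
chips = {
    "AD102": (144, 18_432),
    "AD103": (84, 10_752),
    "AD104": (60, 7_680),
    "AD106": (36, 4_608),
    "AD107": (24, 3_072),
}

top_shaders = chips["AD102"][1]

for name, (sm, shaders) in chips.items():
    per_sm = shaders / sm            # FP32 shaders per SM implied by the table
    share = shaders / top_shaders    # size relative to the full AD102
    print(f"{name}: {per_sm:.0f} shaders per SM, {share:.1%} of AD102")
```

Every row works out to 128 shaders per SM, and AD103 comes in at well under two thirds of AD102, which illustrates the large gap between the flagship and the next smaller chip.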
What gets overlooked in the race for the fastest graphics card is performance beyond rasterization. Ray tracing in particular should be interesting, because the goal has to be good results even on lower-end cards. The flagship battle is nice to watch, but ultimately unaffordable for most gamers and also somewhat unreasonable. It is also assumed that Nvidia will not launch AD102 in its full configuration, which seems obvious. On the one hand, this leaves room for a Ti or Titan. On the other hand, it keeps Nvidia on the safer side in terms of yield with the monolithic design, whereas AMD is likely to use a multi-chip design and is therefore positioned somewhat differently.
Geforce RTX 4000 vs. Radeon RX 7000: Nvidia with advantage in manufacturing?
The next generation will be particularly exciting because the designs differ more than usual. The question is also whether AMD really wants to offer an enthusiast card, or whether it will leave the throne to Nvidia for the time being and position itself one step below.
specification | AD102 | Navi 31 |
---|---|---|
process node | TSMC N4P | TSMC N5 & N6 |
architecture | Ada (Lovelace) | AMD RDNA3 |
GPU package | monolithic | Multi-Chip Modules (MCM) |
GPU size | ~ 600mm² | ~ 800mm² |
dies | 1 | 2 GCD + 4 MCD + 1 IOD |
GPU mega clusters | 12 graphics processing clusters (GPC) | 2×3 shader engines |
GPU super clusters | 72 Texture Processing Clusters (TPC) | 2×30 RDNA Workgroups (WGP) |
GPU clusters | 144 streaming multiprocessors (SM) | 120 Compute Units (CU) |
FP32 cores | 18,432 CUDA cores | 15,360 stream processors |
GPU clock | ~ 2.7GHz | ~ 3.0GHz |
FP32 performance | ~ 100 TFLOPs | ~ 92 TFLOPs |
memory size | 24 GiB | ? |
memory speed | 21 Gbps | ~ 18 Gbps |
memory type | GDDR6X | GDDR6 |
memory bus | 21 Gbps, 384 bits | ~ 18 Gbps, 256 bits |
caches | 96MiB (L2 cache) | 256/512MiB Infinity Cache |
power consumption | 600 watts | ? |
announcement | ~ Q3/2022 | ~ Q3/2022 |
sales launch | ~Q4/2022 | ~Q4/2022 |
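Bus width and per-pin data rate from the table translate into raw memory bandwidth as data rate times bus width divided by eight. A short sketch, assuming the rumored figures hold; the helper name `bandwidth_gb_s` is hypothetical, and the result ignores the large caches (L2 and Infinity Cache), which reduce how often the memory has to be accessed at all.

```python
def bandwidth_gb_s(rate_gbps: float, bus_bits: int) -> float:
    """Raw memory bandwidth in GB/s from per-pin data rate and bus width."""
    # Gbit/s per pin * number of pins = Gbit/s total; divide by 8 for GB/s
    return rate_gbps * bus_bits / 8


# Rumored values from the table above (assumptions, not confirmed specs)
ad102_bw = bandwidth_gb_s(21, 384)   # GDDR6X at 21 Gbps on a 384-bit bus
navi31_bw = bandwidth_gb_s(18, 256)  # GDDR6 at ~18 Gbps on a 256-bit bus

print(f"AD102:   {ad102_bw:.0f} GB/s")
print(f"Navi 31: {navi31_bw:.0f} GB/s")
```

On raw bandwidth alone, AD102 would be well ahead, which fits AMD's rumored reliance on a much larger Infinity Cache.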
And even though Intel is keeping a low profile at the moment, Battlemage, the successor to Alchemist, is expected to follow quickly. It should then compete fairly soon, and somewhat higher up than just the mid-range that is expected for the Arc A770/750 and A580/380. In any case, it is not unwise for Intel to start out inconspicuously in order to iron out possible driver issues and teething problems. That avoids a fiasco like that of the Volari.
Sources: Twitter (@kopite7kimi, @Greymon55), Wccftech, Videocardz
The post Allegedly with 100 TFLOPS FP32 performance appeared first on Gamingsym.