Allegedly with 100 TFLOPS FP32 performance

For some time now, the Geforce RTX 4000 series has been rumored to top out at 18,432 FP32 shaders with the AD102 chip. The Geforce RTX 4090 is expected to use it and, according to the latest rumors, should reach slightly more than 100 TFLOPS of FP32 performance, a little ahead of the fastest Radeon RX 7000, which is said to reach 92 TFLOPS. That would be roughly 2.8 times the Geforce RTX 3090, which does not seem entirely unrealistic at first glance. So far, the rumors suggest that Nvidia will significantly increase the shader count on the flagship, but that the gap to the next smaller model, the Geforce RTX 4080, will be all the larger. According to the figures in circulation, the RTX 4080 should then land at roughly the level of an RTX 3090.
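The factor of 2.8 can be reproduced with the usual formula for theoretical throughput: shaders × 2 FLOPs per clock (one fused multiply-add) × clock rate. The following is a minimal sketch in Python. The RTX 3090 figures (10,496 shaders, 1.695 GHz boost clock) are Nvidia's published specifications rather than part of the rumors, and the function names are chosen purely for illustration.

```python
def fp32_tflops(shaders: int, clock_ghz: float) -> float:
    """Theoretical FP32 throughput, assuming one FMA (2 FLOPs) per shader per clock."""
    return shaders * 2 * clock_ghz / 1000.0


def clock_needed_ghz(shaders: int, target_tflops: float) -> float:
    """Clock rate required to reach a given theoretical throughput."""
    return target_tflops * 1000.0 / (shaders * 2)


# RTX 3090: official specs (assumption: boost clock is used as the reference)
rtx_3090 = fp32_tflops(10_496, 1.695)

# Rumored target of 100 TFLOPS for the AD102 with 18,432 shaders
ratio = 100.0 / rtx_3090
needed = clock_needed_ghz(18_432, 100.0)

print(f"RTX 3090: {rtx_3090:.1f} TFLOPS")
print(f"100 TFLOPS is {ratio:.1f}x the RTX 3090")
print(f"AD102 clock needed for 100 TFLOPS: {needed:.2f} GHz")
```

Under these assumptions, the rumored 100 TFLOPS would require a clock of a little over 2.7 GHz. Theoretical throughput says nothing about how much of it arrives in games.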

That would mean Nvidia shifting both performance and price points up a notch. In the Ada generation, the Geforce RTX 4080 would deliver about the same performance as the Geforce RTX 3090. Anyone who wants more has to go for the (temporary) flagship, which offers a lot more on paper. It remains to be seen, however, how much of this translates into real-world performance and how efficiently. In any case, a TGP of 600 watts seems to be set at the moment; OC models and a possible Ti could go beyond that, and a PCB designed for 900 watts has already been discussed.

chip	SM	GPC	TPC	shaders	cache	memory bus	memory size
AD102	144	12	72	18,432	96MiB	384 bits	24 GiB
AD103	84	7	42	10,752	64MiB	256 bits	16 GiB
AD104	60	5	30	7,680	48MiB	192 bits	12 GiB
AD106	36	3	18	4,608	32MiB	128 bits	8 GiB
AD107	24	3	12	3,072	32MiB	128 bits	8 GiB
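The shader counts in the table follow directly from the SM counts if each SM carries 128 FP32 shaders. That per-SM figure is an assumption inferred from the table itself (it matches the Ampere generation), not something the rumors state explicitly. A small Python sketch to check the rows for consistency:

```python
SHADERS_PER_SM = 128  # assumption: inferred from the table, same as Ampere

# Rumored configurations from the table: chip -> (SM count, shader count)
rumored = {
    "AD102": (144, 18_432),
    "AD103": (84, 10_752),
    "AD104": (60, 7_680),
    "AD106": (36, 4_608),
    "AD107": (24, 3_072),
}


def shaders_from_sm(sm_count: int) -> int:
    """Shader count implied by the SM count."""
    return sm_count * SHADERS_PER_SM


for chip, (sm, shaders) in rumored.items():
    status = "ok" if shaders_from_sm(sm) == shaders else "mismatch"
    print(f"{chip}: {sm} SM x {SHADERS_PER_SM} = {shaders_from_sm(sm)} ({status})")
```

All five rows are consistent with this assumption.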

What has been left out of the race for the fastest graphics card is performance beyond rasterization. Ray tracing in particular should be exciting, because the goal must be to deliver good results on lower-end cards too. The flagship battle is nice to watch, but the cards are unaffordable for most players and also a bit unreasonable. It is also assumed that AD102 will not be used in its full configuration, which seems obvious. On the one hand, this leaves room for a Ti or Titan. On the other hand, it is the safer choice in terms of yield for a monolithic design, while AMD will probably use a multi-chip design and is therefore positioned a little differently.

Geforce RTX 4000 vs. Radeon RX 7000: Nvidia with a manufacturing advantage?

The next generation will be particularly exciting because the designs differ more than usual. The question is also whether AMD really wants to offer an enthusiast card, or whether it will leave the throne to Nvidia for the time being and position itself one step below.

	AD102	Navi 31
process node	TSMC N4P	TSMC N5 & N6
architecture	Ada (Lovelace)	AMD RDNA3
GPU package	monolithic	Multi-Chip Module (MCM)
GPU size	~600 mm²	~800 mm²
dies	1	2 GCD + 4 MCD + 1 IOD
GPU mega clusters	12 Graphics Processing Clusters (GPC)	2×3 shader engines
GPU super clusters	72 Texture Processing Clusters (TPC)	2×30 RDNA Workgroups (WGP)
GPU clusters	144 Streaming Multiprocessors (SM)	120 Compute Units (CU)
FP32 cores	18,432 CUDA cores	15,360 stream processors
GPU clock	~2.7 GHz	~3.0 GHz
FP32 performance	~100 TFLOPS	~92 TFLOPS
memory size	24 GiB	?
memory speed	21 Gb/s	~18 Gb/s
memory type	GDDR6X	GDDR6
memory bus	384 bits	256 bits
caches	96 MiB (L2 cache)	256/512 MiB Infinity Cache
power consumption	600 watts	?
announcement	~Q3/2022	~Q3/2022
sales launch	~Q4/2022	~Q4/2022
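The FP32 and memory rows of the comparison can be cross-checked with two simple formulas: FP32 throughput is cores × 2 FLOPs × clock, and memory bandwidth is data rate per pin × bus width / 8. The Python sketch below uses only the rumored values from the table; the dictionary layout and function names are illustrative, and the resulting bandwidth figures are derived here, not part of the rumors.

```python
def fp32_tflops(cores: int, clock_ghz: float) -> float:
    """Theoretical FP32 throughput, assuming one FMA (2 FLOPs) per core per clock."""
    return cores * 2 * clock_ghz / 1000.0


def bandwidth_gb_s(rate_gbit_s: float, bus_bits: int) -> float:
    """Theoretical memory bandwidth in GB/s from per-pin data rate and bus width."""
    return rate_gbit_s * bus_bits / 8


# Rumored values taken from the comparison table (all approximate)
gpus = {
    "AD102": {"cores": 18_432, "clock_ghz": 2.7, "rate_gbit_s": 21, "bus_bits": 384},
    "Navi 31": {"cores": 15_360, "clock_ghz": 3.0, "rate_gbit_s": 18, "bus_bits": 256},
}

for name, g in gpus.items():
    tflops = fp32_tflops(g["cores"], g["clock_ghz"])
    bandwidth = bandwidth_gb_s(g["rate_gbit_s"], g["bus_bits"])
    print(f"{name}: {tflops:.1f} TFLOPS, {bandwidth:.0f} GB/s")
```

With the rounded clocks from the table, AD102 lands just under 100 TFLOPS and Navi 31 at about 92 TFLOPS, so the "a bit more than 100 TFLOPS" rumor implies a clock slightly above 2.7 GHz. The large on-die caches on both sides are not captured by the raw bandwidth figure.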

And even though Intel is keeping a low profile at the moment, Battlemage, the successor to Alchemist, is expected to follow soon. It should then enter the market fairly quickly and also a little higher up than just the midrange, which is where the Arc A770/750 and A580/380 are expected to land. In any case, it is not unwise for Intel to start out inconspicuously in order to iron out possible driver issues and teething problems. That avoids a fiasco like that of the Volari.

Sources: Twitter (@kopite7kimi, @Greymon55), Wccftech, Videocardz

The post Allegedly with 100 TFLOPS FP32 performance appeared first on Gamingsym.
