TSMC emphasized the extensive use of EUV with this process. It’s worth pointing out that this is really TSMC’s first ‘main’ EUV-based process: TSMC’s N7 and N7P nodes are DUV-based. TSMC’s first production EUV process is N7+, but that node is really an orphan. It is not compatible with the prior nodes and has no clear migration path forward other than moving to this node.
On the other hand, N5 is designed as the main migration path from N7 for most customers. TSMC says that more than 10 EUV layers replace at least four times as many immersion layers at the cut, contact, via and metal line steps. That comparison is between the EUV-based N5 node and a hypothetical N5 node built with multi-patterning.
N7P delivers a 7% performance increase or a 10% reduction in power consumption compared to N7.
AI-specific hardware has been a catalyst for this tremendous growth, but there are always bottlenecks that must be addressed. A poll of the audience found that memory bandwidth was their number-one area in need of focus. Steve and Bill agreed and explored how HBM2E and GDDR6 memory could help take AI/ML to the next level.
Steve discussed how HBM2E provides unsurpassed bandwidth and capacity in a very compact footprint, making it a great fit for AI/ML training in heat- and space-constrained data centers. At the same time, its excellent performance, built on time-tested manufacturing processes, makes it an ideal choice for AI/ML inference, which is increasingly implemented in powerful “IoT” devices such as ADAS in cars and trucks.
Question: As it pertains to the PC space, what GPU do you think it most closely aligns with?
It’s definitely within the realm of [Nvidia’s 10-series cards], ranging from the 1060 to 1080. It’s somewhere in that range, looking at the comparisons that we’ve done, but it’s hard to make an apples-to-apples comparison of exactly where it’s at as it really depends on the games and the CPU and things like that…it’s like [the] current generation of Nvidia [hardware].
From the World of Tanks developer, on the Xbox One X’s CPU and GPU balance:
He replied: “We’ve actually found the CPU and GPU improvements to complement each other quite well. Increasing the resolution from 1080p to 4K uses much of the additional power of the GPU but has basically no effect on the CPU.”
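The reason resolution loads the GPU but not the CPU comes down to pixel count. A rough sketch, assuming "4K" means 4K UHD (3840×2160) and that per-pixel GPU work scales with pixel count while per-frame CPU work (game logic, draw submission) does not:

```python
# Pixel counts for the two output resolutions mentioned in the quote.
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "4K": (3840, 2160),  # assumption: 4K UHD, i.e. 3840x2160
}


def pixel_count(name: str) -> int:
    """Return the number of pixels per frame for a named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


# Per-pixel GPU work grows with this ratio; the CPU's per-frame work does not.
ratio = pixel_count("4K") / pixel_count("1080p")
print(f"4K renders {ratio:.0f}x as many pixels per frame as 1080p")
```

Four times the pixels per frame means roughly four times the shading and fill work for the GPU, while the CPU still simulates and submits the same frame.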
“We’re looking for all of our studios to add a level of support for Xbox One X. We Tweeted out last night that we’re working right now to get Skyrim SE and Fallout 4 supported on the X,”
Hines said in a recent interview with Geoff Keighley. “We’re working right now to get both of those titles supported with higher resolution, True 4K, higher frame rates, etc. The games will take advantage of the hardware and Microsoft’s been grateful and Phil Spencer came out last year to tell us what they’re doing and walk us through a tech demo to let all of our guys get up to speed on what Xbox One X is capable of doing and how we want to embrace it and incorporate it into our games.”
On the subject of the Xbox One X’s horsepower, Stieglitz said Ark can run at the equivalent of “Medium” or “High” settings on PC. It can run at 1080p/60fps (Medium) or 1440p/30fps (High), and it sounds like developer Studio Wildcard may offer an option to switch between them.
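The two Ark modes trade frame rate for resolution. A quick throughput comparison, assuming 1440p means 2560×1440 and ignoring the extra per-pixel cost of High settings, shows the two modes push a similar number of pixels per second:

```python
def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Pixels the GPU must output each second at a given resolution and frame rate."""
    return width * height * fps


medium = pixels_per_second(1920, 1080, 60)  # 1080p/60fps ("Medium")
high = pixels_per_second(2560, 1440, 30)    # 1440p/30fps ("High"), assuming 2560x1440

print(f"Medium: {medium:,} px/s")
print(f"High:   {high:,} px/s")
print(f"High/Medium ratio: {high / medium:.2f}")
```

Since High outputs slightly fewer pixels per second while doubling the frame time (about 33 ms versus 17 ms), the freed budget can go toward the heavier High settings.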
As for the comparisons between the PC and Xbox One X, he said: “If you think about it, it’s kind of equivalent to a GTX 1070 maybe and the Xbox One X actually has 12GB of GDDR5 memory. It’s kind of like having a pretty high-end PC minus a lot of overhead due to the operating system on PC. So I would say it’s equivalent to a 16GB 1070 PC, and that’s a pretty good deal for $499.”
The old GCN’s Render Back End (RBE) cache. Page 13 of 18.
Once the pixel fragments in a tile have been shaded, they flow to the Render Back-Ends (RBEs). The RBEs apply depth, stencil and alpha tests to determine whether pixel fragments are visible in the final frame. The visible pixel fragments are then sampled for coverage and color to construct the final output pixels. The RBEs in GCN can access up to 8 color samples (i.e. 8x MSAA) from the 16KB color caches and 16 coverage samples (i.e. for up to 16x EQAA) from the 4KB depth caches per pixel. The color samples are blended using weights determined by the coverage samples to generate a final anti-aliased pixel color. The results are written out to the frame buffer through the memory controllers.
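The coverage-weighted blend described above can be sketched as follows. This is a simplified software model, not AMD's hardware algorithm: it assumes each coverage sample simply records which stored color sample it maps to, and the function and variable names are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple

Color = Tuple[float, float, float]  # linear RGB, each channel in [0, 1]

MAX_COLOR_SAMPLES = 8      # up to 8 color samples per pixel (8x MSAA)
MAX_COVERAGE_SAMPLES = 16  # up to 16 coverage samples per pixel (16x EQAA)


def resolve_pixel(colors: Sequence[Color], coverage: Sequence[int]) -> Color:
    """Blend color samples, weighting each by how many coverage samples map to it.

    colors:   the distinct color samples stored for this pixel (at most 8).
    coverage: for each coverage sample (at most 16), the index of the color
              sample it maps to (a hypothetical, simplified encoding).
    """
    if not 0 < len(colors) <= MAX_COLOR_SAMPLES:
        raise ValueError("expected 1..8 color samples")
    if not 0 < len(coverage) <= MAX_COVERAGE_SAMPLES:
        raise ValueError("expected 1..16 coverage samples")

    counts = Counter(coverage)  # number of coverage samples per color sample
    total = len(coverage)
    resolved = [0.0, 0.0, 0.0]
    for index, count in counts.items():
        weight = count / total  # fraction of the pixel this color covers
        for channel in range(3):
            resolved[channel] += colors[index][channel] * weight
    return (resolved[0], resolved[1], resolved[2])


# Hypothetical edge pixel: a red triangle covers 12 of 16 coverage samples,
# and the black background covers the remaining 4.
red, black = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)
print(resolve_pixel([red, black], [0] * 12 + [1] * 4))
```

Only two colors are stored here, but the 16 coverage samples let the blend weight them in sixteenths, which is the point of EQAA: finer edge gradations without storing 16 full colors.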
Each GCN 1.0 RBE has just 20 KB of cache (the 16 KB color cache plus the 4 KB depth cache), so the Radeon HD 7970, with 8 RBEs, has 160 KB of render cache in total.
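The render cache total follows directly from the per-RBE sizes given above (16 KB color, 4 KB depth). A trivial sketch of that arithmetic, with hypothetical constant and function names:

```python
# Per-RBE cache sizes for GCN 1.0, as given in the text.
COLOR_CACHE_KB = 16  # color cache per RBE
DEPTH_CACHE_KB = 4   # depth cache per RBE


def render_cache_kb(num_rbes: int) -> int:
    """Total RBE (render back-end) cache across a GCN 1.0 GPU, in KB."""
    per_rbe = COLOR_CACHE_KB + DEPTH_CACHE_KB
    return num_rbes * per_rbe


print(render_cache_kb(8))  # Radeon HD 7970: 8 RBEs
```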
The Xbox One X’s chip has 7 billion transistors, which suggests its GPU design is not simply an RX-480/RX-580.
The R9-290X/R9-390X’s Compute Unit (CU) TMU path has 1 MB of L2 cache before spilling over to external memory, at which point it becomes memory-bandwidth bound. The RBEs have only a tiny 384 KB of cache before spilling over to external memory, a known bottleneck and the reason for AMD’s push toward compute shader optimization. This is one of many reasons the older AMD GPUs can’t convert their GPGPU performance into graphics performance.
The RX-480’s CU TMU path has 2 MB of L2 cache before spilling over to external memory, at which point it becomes memory-bandwidth bound. Its RBEs again have only a tiny cache before spilling over to external memory, the same known bottleneck behind AMD’s compute shader optimization push, and another reason these AMD GPUs can’t convert their GPGPU performance into graphics performance.
Comparisons with Xbox One X’s GPU
The Xbox One X GPU’s CU TMU path has 2 MB of L2 cache, while its RBE path has 2 MB of render cache before spilling over to external memory. This advantage could have contributed to the Xbox One X’s strong performance on ForzaTech’s wet track, which makes heavy use of alpha effects; that performance rivaled NVIDIA’s GeForce GTX 1070 (1).
A larger cache means the GPU has to access larger, slower memory pools less often. This primarily reduces the load on the VRAM subsystem (freeing memory bandwidth for other tasks) while also speeding up rendering.
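To illustrate why cache size matters so much, here is a toy model that counts how many accesses miss a cache and must go to VRAM. It assumes a plain LRU replacement policy and a synthetic workload of repeated passes over the same data (loosely like layering several alpha effects over the same screen tiles); real GPU caches use their own policies, and all names here are hypothetical.

```python
from collections import OrderedDict
from typing import Iterable


def external_fetches(accesses: Iterable[int], cache_lines: int) -> int:
    """Count accesses that miss an LRU cache and must go to external memory (VRAM).

    accesses:    sequence of cache-line addresses touched by the GPU.
    cache_lines: cache capacity in lines (a toy stand-in for L2/render cache size).
    """
    cache: OrderedDict[int, None] = OrderedDict()
    misses = 0
    for line in accesses:
        if line in cache:
            cache.move_to_end(line)  # hit: mark as most recently used
            continue
        misses += 1                  # miss: fetch from external memory
        cache[line] = None
        if len(cache) > cache_lines:
            cache.popitem(last=False)  # evict the least recently used line
    return misses


# Hypothetical workload: 4 passes over a 1,000-line working set.
workload = [line for _ in range(4) for line in range(1000)]
for size in (500, 1000, 2000):
    print(size, external_fetches(workload, size))
```

With 500 lines, every access misses (LRU thrashes on a cyclic working set larger than the cache); once the cache holds the whole working set, only the first pass touches VRAM. That cliff is why spilling past the cache hurts so abruptly.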
Comparisons with other GPUs
The GTX 1060’s SM/TMU and RBE/ROP paths have 1.5 MB of L2 cache before spilling over to external memory, at which point they become memory-bandwidth bound. Both the TMU and RBE/ROP read/write paths have similar performance.
The GTX 1070’s SM/TMU and RBE/ROP paths have 2 MB of L2 cache before spilling over to external memory, at which point they become memory-bandwidth bound. Both the TMU and RBE/ROP read/write paths have similar performance.
Streaming Multiprocessor (SM) is NVIDIA’s term for the unit that corresponds to AMD’s Compute Unit (CU).