GDC 2014: "Approaching Zero Driver Overhead in OpenGL (Presented by NVIDIA)" by Cass Everitt, Tim Foley, John McDonald, Graham Sellers of NVIDIA, Intel, NVIDIA, AMD https://gdcvault.com/play/1020791/Approaching-Zero-Driver-Overhead-in
-
GDC 2014: "Approaching Zero Driver Overhead in OpenGL (Presented by NVIDIA)" by Cass Everitt, Tim Foley, John McDonald, Graham Sellers of NVIDIA, Intel, NVIDIA, AMD https://gdcvault.com/play/1020791/Approaching-Zero-Driver-Overhead-in
This is the famous AZDO talk!!! It was cool because it was given by speakers from all of the major GPU vendors.
I remember watching this back in 2014 and thinking to myself "wow, OpenGL can do so much!!!", but rewatching it in 2025, my main reaction is horror at extension hell.
1/5
These were the extensions presented:
- GL_ARB_buffer_storage
- GL_ARB_map_buffer_range
- GL_ARB_draw_indirect
- GL_ARB_multi_draw_indirect
- GL_ARB_indirect_parameters
- GL_ARB_shader_draw_parameters
- GL_ARB_bindless_texture
- GL_ARB_sparse_texture
- GL_ARB_shader_atomic_counters
...and any subset of these may be present on any of your customers' machines. How the hell is any developer supposed to do anything sensible in this situation???
2/5
-
So they talked about how, if you have persistently mapped buffers, you can write data into them without making any API calls (provided you synchronize correctly); if you have bindless textures, you don't have to rebind anything between draw calls; and if you have sparse textures, you don't even need multiple textures in the first place and can instead just have a single humongous texture with only part of it backed by physical memory...
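To make the persistent-mapping bit concrete, here's a minimal sketch of the idea using GL_ARB_buffer_storage (assuming a GL 4.4-class context with function pointers already loaded; the ring-buffer names are made up, not from the talk):

```c
/* Minimal sketch, assuming GL_ARB_buffer_storage (core in GL 4.4) and an
 * already-initialized loader. Names like `ring` are illustrative only. */
#include <glad/glad.h>  /* or any GL loader; assumption, not from the talk */
#include <stddef.h>
#include <string.h>

#define RING_BYTES (4 * 1024 * 1024)

static GLuint ring;
static void *ring_ptr;

static void ring_init(void)
{
    const GLbitfield flags =
        GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;

    glGenBuffers(1, &ring);
    glBindBuffer(GL_ARRAY_BUFFER, ring);
    /* Immutable storage that can stay mapped for the buffer's whole lifetime. */
    glBufferStorage(GL_ARRAY_BUFFER, RING_BYTES, NULL, flags);
    ring_ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, RING_BYTES, flags);
}

/* Per frame: memcpy straight into the mapping -- no glBufferSubData, no
 * map/unmap calls. You must still fence (glFenceSync / glClientWaitSync)
 * so you never overwrite a region the GPU is still reading. */
static void ring_write(size_t offset, const void *src, size_t bytes)
{
    memcpy((char *)ring_ptr + offset, src, bytes);
}
```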
So like that's cool and stuff, but like...
3/5
-
... what if *some* of those are present but not others? Are you expected to totally rewrite your rendering algorithm? How are you supposed to know which extensions most of your potential customers actually have? Can you rely on *anything*???
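And the only answer the API gives you is: probe the extension list at run time and branch your whole renderer on the result. Roughly something like this (the has_ext helper is mine; the extension names are from the list above):

```c
/* Run-time extension probing -- the only mechanism core GL offers for this.
 * has_ext() is an illustrative helper, not an official API. */
#include <glad/glad.h>  /* or any GL loader; assumption */
#include <string.h>

static int has_ext(const char *name)
{
    GLint i, n = 0;
    glGetIntegerv(GL_NUM_EXTENSIONS, &n);
    for (i = 0; i < n; ++i)
        if (strcmp((const char *)glGetStringi(GL_EXTENSIONS, (GLuint)i), name) == 0)
            return 1;
    return 0;
}

static void pick_render_paths(void)
{
    int persistent = has_ext("GL_ARB_buffer_storage");
    int bindless   = has_ext("GL_ARB_bindless_texture");
    int sparse     = has_ext("GL_ARB_sparse_texture");
    int mdi        = has_ext("GL_ARB_multi_draw_indirect");

    /* ...and now your streaming, texturing, and submission code each need a
     * fast path plus a fallback for whichever combination the user has. */
    (void)persistent; (void)bindless; (void)sparse; (void)mdi;
}
```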
They presented some microbenchmarks showing that things go faster if you have these extensions, which is cool, but my head was basically spinning at what the presenters seemed to be implicitly asking every developer to do in their engine.
4/5
-
Review: 5/10. I just can't rate it higher. 11-years-ago me would have rated this way higher, but 2025 me just thinks this is kind of a clusterfuck.
-
@GDCPresoReviews ...and everybody agreed. Which is why I think a year later (?) we all assembled in a room at Valve and said "who's got some bright ideas?" and AMD said "well Mantle is a nice start" and then somebody search-and-replaced the name "Mantle" to "Vulkan" (see? Coz everything is a stupid pun) and shipped it!
You think I am kidding. But I am kidding a lot less than you think I am.
-
I don’t think you are kidding
-
I do think it is interesting, though, that Vulkan is in the same general situation now… the OpenGL registry lists 337 ARB or KHR or EXT extensions across the past 33 years, and the Vulkan registry lists 273 KHR or EXT extensions across the past 9 years.
I don’t think it’s a coincidence; I think it’s actually a result of the workflow that Khronos uses.
-
@GDCPresoReviews @TomF I think it's also just an issue of diverging hardware capabilities, combined with the desire for low-level access. It's a lot less bad if you filter for the ones appropriate for a given device class. At least in CPU land, Intel and AMD mostly straight up copy each other's ISA extensions.
-
@GDCPresoReviews @TomF like extension bloat is certainly annoying, but mostly you never even consider using most of them. Otoh things like push constants combined with buffer device address, dynamic rendering, etc are *hugely* simplifying (and core vulkan!) but if you need to support mobile it's not happening. Imo extensions have the scary numbers, but the variety of ways to implement even the most basic things like "how to send data to shader" is the true pain of learning and using the api.
-
Right, I think that’s why I had such a strong reaction to this AZDO talk in particular. The techniques they were describing weren’t iterative improvements on top of what you already have. They were describing fundamentally changing how your algorithms are structured. And they described a few different ways to do it, and indicated that you should use whichever is supported by the user’s device at runtime. Which just seems totally untenable to me.
-
@dotstdy @GDCPresoReviews Heh - eventually. And usually the second one to get there does it better because they have a bit of hindsight. AMD has the best implementation of AVX512 so far
-
I’ve heard others make the point I’m trying to make like this: if you use this dialect of the technology, would it recognize itself in the mirror?
Meaning, like, does the AZDO dialect of OpenGL look like OpenGL? Does it smell like OpenGL?
For comparison: you can write C++ style code but spell it in Lisp, and you can write Lisp style code but spell it in C++, and both of those are poor uses of programming languages
-
@TomF @dotstdy @GDCPresoReviews Can’t intel have hindsight now and redo it? Spec is the same I guess. Sunk cost stopping them?
-
@GDCPresoReviews @dotstdy Totally true, but even by the time of AZDO, it's a totally legit question to ask "what does OpenGL look like anyway?" It was an old old API, and had gone through so many revisions. It doesn't help that most extensions, even the official ones, were written as deltas on previous docs!
-
@breakin @dotstdy @GDCPresoReviews They did! That's what AVX10 is. And I mostly think it's a good idea. It sucks that you can't rely on 512-bit support, but that's a physical limit, and I'll just have to accept the designers' words that they can't make it work. But given that, AVX10 is very acceptable.
-
@TomF @GDCPresoReviews deltas are the most annoying part of khronos extensions, but at least these days for vulkan extensions they're publishing a "wtf is this" document as well, which goes a surprisingly long way towards improving things. e.g. https://github.com/KhronosGroup/Vulkan-Docs/blob/main/proposals/VK_KHR_shader_quad_control.adoc
-
@TomF @dotstdy @GDCPresoReviews Interesting! Almost seems as if one might have a different program running on the e-cores with this!
Edit: seems like this is no different from avx512.
-
@breakin @dotstdy @GDCPresoReviews AVX10 just means a core can support "AVX256" (i.e. AVX512 features but half the width) without supporting the full 512 bits, which is difficult for the small cores. So that's a good thing.
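For anyone wondering what "AVX512 features but half the width" means in practice: roughly, the opmask machinery becomes usable on 256-bit registers. A small sketch (this is today's AVX-512VL intrinsic; under AVX10/256 the same operation is the baseline without any 512-bit requirement, and the build flags are an assumption):

```c
/* Sketch: masked 256-bit add, an AVX-512-style feature at half width.
 * Needs AVX512F + AVX512VL today (e.g. compile with -mavx512f -mavx512vl);
 * plain AVX2 has no single-instruction equivalent with an opmask. */
#include <immintrin.h>

__m256 masked_add(__m256 a, __m256 b, __mmask8 keep)
{
    /* Lanes where `keep` is 0 come out as zero -- no separate blend needed. */
    return _mm256_maskz_add_ps(keep, a, b);
}
```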
-
@TomF @breakin @dotstdy @GDCPresoReviews IMO intel should just support avx512 even in e-cores, even if it means being really, really slow. It's more important that a feature works than that it's fast. But that's just my take.
-
@sol_hsa @TomF @breakin @GDCPresoReviews this is what they're doing in 10.2, it's just maximally confusing because why not.