GDC 2014: "Approaching Zero Driver Overhead in OpenGL (Presented by NVIDIA)" by Cass Everitt, Tim Foley, John McDonald, Graham Sellers of NVIDIA, Intel, NVIDIA, AMD https://gdcvault.com/play/1020791/Approaching-Zero-Driver-Overhead-in
-
GDC 2014: "Approaching Zero Driver Overhead in OpenGL (Presented by NVIDIA)" by Cass Everitt, Tim Foley, John McDonald, Graham Sellers of NVIDIA, Intel, NVIDIA, AMD https://gdcvault.com/play/1020791/Approaching-Zero-Driver-Overhead-in
This is the famous AZDO talk!!! It was cool because it was given by speakers from all of the major GPU vendors.
I remember watching this back in 2014 and thinking to myself "wow, OpenGL can do so much!!!", but rewatching it in 2025, my main reaction is horror at extension hell.
1/5
These were the extensions presented:
- GL_ARB_buffer_storage
- GL_ARB_map_buffer_range
- GL_ARB_draw_indirect
- GL_ARB_multi_draw_indirect
- GL_ARB_indirect_parameters
- GL_ARB_shader_draw_parameters
- GL_ARB_bindless_texture
- GL_ARB_sparse_texture
- GL_ARB_shader_atomic_counters
...and any subset of these may be present on any of your customers' machines. How the hell is any developer supposed to do anything sensible in this situation???
2/5
-
So they talked about how, if you have persistently mapped buffers, you can write data into them without making any API calls (provided you synchronize correctly); if you have bindless textures, you don't have to rebind anything between draw calls; and if you have sparse textures, you don't even need multiple textures in the first place and can instead just have a single humongous texture with only part of it backed by physical memory...
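To make the persistent-mapping bit concrete, here's a minimal sketch of the idea using GL_ARB_buffer_storage (assuming a GL 4.4-class context with function pointers already loaded; the ring-buffer names are made up, not from the talk):

```c
/* Minimal sketch, assuming GL_ARB_buffer_storage (core in GL 4.4) and an
 * already-initialized loader. Names like `ring` are illustrative only. */
#include <glad/glad.h>  /* or any GL loader; assumption, not from the talk */
#include <stddef.h>
#include <string.h>

#define RING_BYTES (4 * 1024 * 1024)

static GLuint ring;
static void *ring_ptr;

static void ring_init(void)
{
    const GLbitfield flags =
        GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;

    glGenBuffers(1, &ring);
    glBindBuffer(GL_ARRAY_BUFFER, ring);
    /* Immutable storage that can stay mapped for the buffer's whole lifetime. */
    glBufferStorage(GL_ARRAY_BUFFER, RING_BYTES, NULL, flags);
    ring_ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, RING_BYTES, flags);
}

/* Per frame: memcpy straight into the mapping -- no glBufferSubData, no
 * map/unmap calls. You must still fence (glFenceSync / glClientWaitSync)
 * so you never overwrite a region the GPU is still reading. */
static void ring_write(size_t offset, const void *src, size_t bytes)
{
    memcpy((char *)ring_ptr + offset, src, bytes);
}
```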
So like that's cool and stuff, but like...
3/5
-
... what if *some* of those are present but not others? Are you expected to totally rewrite your rendering algorithm? How are you supposed to know which extensions most of your potential customers actually have? Can you rely on *anything*???
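And the only answer the API gives you is: probe the extension list at run time and branch your whole renderer on the result. Roughly something like this (the has_ext helper is mine; the extension names are from the list above):

```c
/* Run-time extension probing -- the only mechanism core GL offers for this.
 * has_ext() is an illustrative helper, not an official API. */
#include <glad/glad.h>  /* or any GL loader; assumption */
#include <string.h>

static int has_ext(const char *name)
{
    GLint i, n = 0;
    glGetIntegerv(GL_NUM_EXTENSIONS, &n);
    for (i = 0; i < n; ++i)
        if (strcmp((const char *)glGetStringi(GL_EXTENSIONS, (GLuint)i), name) == 0)
            return 1;
    return 0;
}

static void pick_render_paths(void)
{
    int persistent = has_ext("GL_ARB_buffer_storage");
    int bindless   = has_ext("GL_ARB_bindless_texture");
    int sparse     = has_ext("GL_ARB_sparse_texture");
    int mdi        = has_ext("GL_ARB_multi_draw_indirect");

    /* ...and now your streaming, texturing, and submission code each need a
     * fast path plus a fallback for whichever combination the user has. */
    (void)persistent; (void)bindless; (void)sparse; (void)mdi;
}
```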
They presented some microbenchmarks showing that things go faster if you have these extensions, which is cool, but my head was basically spinning at what the presenters seemed to be implicitly asking every developer to do in their engine.
4/5
-
Review: 5/10. I just can't rate it higher. 11-years-ago me would have rated this way higher, but 2025 me just thinks this is kind of a clusterfuck.
-
@GDCPresoReviews ...and everybody agreed. Which is why I think a year later (?) we all assembled in a room at Valve and said "who's got some bright ideas?" and AMD said "well Mantle is a nice start" and then somebody search-and-replaced the name "Mantle" to "Vulkan" (see? Coz everything is a stupid pun) and shipped it!
You think I am kidding. But I am kidding a lot less than you think I am.
-
I don’t think you are kidding
-
I do think it is interesting, though, that Vulkan is in the same general situation now… the OpenGL registry lists 337 ARB or KHR or EXT extensions across the past 33 years, and the Vulkan registry lists 273 KHR or EXT extensions across the past 9 years.
I don’t think it’s a coincidence; I think it’s actually a result of the workflow that Khronos uses.
-
@GDCPresoReviews @TomF I think it's also just an issue of diverging hardware capabilities, combined with the desire for low-level access. It's a lot less bad if you filter for the ones appropriate for a given device class. At least in CPU land, Intel and AMD mostly straight up copy each other's ISA extensions.
-
@GDCPresoReviews @TomF like extension bloat is certainly annoying, but mostly you never even consider using most of them. Otoh things like push constants combined with buffer device address, dynamic rendering, etc are *hugely* simplifying (and core vulkan!) but if you need to support mobile it's not happening. Imo extensions have the scary numbers, but the variety of ways to implement even the most basic things like "how to send data to shader" is the true pain of learning and using the api.
-
Right, I think that’s why I had such a strong reaction to this AZDO talk in particular. The techniques they were describing weren’t iterative improvements on top of what you already have. They were describing fundamentally changing how your algorithms are structured. And they described a few different ways to do it, and indicated that you should use whichever is supported by the user’s device at runtime. Which just seems totally untenable to me.
-
@dotstdy @GDCPresoReviews Heh - eventually. And usually the second one to get there does it better because they have a bit of hindsight. AMD has the best implementation of AVX512 so far
-
I’ve heard others make the point I’m trying to make like this: if you use this dialect of the technology, would it recognize itself in the mirror?
Meaning, like, does the AZDO dialect of OpenGL look like OpenGL? Does it smell like OpenGL?
For comparison: you can write C++ style code but spell it in Lisp, and you can write Lisp style code but spell it in C++, and both of those are poor uses of programming languages
-
@TomF @dotstdy @GDCPresoReviews Can’t intel have hindsight now and redo it? Spec is the same I guess. Sunk cost stopping them?
-
@GDCPresoReviews @dotstdy Totally true, but even by the time of AZDO, it's a totally legit question to ask "what does OpenGL look like anyway?" It was an old old API, and had gone through so many revisions. It doesn't help that most extensions, even the official ones, were written as deltas on previous docs!
-
@breakin @dotstdy @GDCPresoReviews They did! That's what AVX10 is. And I mostly think it's a good idea. It sucks that you can't rely on 512-bit support, but that's a physical limit, and I'll just have to accept the designers' words that they can't make it work. But given that, AVX10 is very acceptable.
-
@TomF @GDCPresoReviews deltas are the most annoying part of khronos extensions, but at least these days for vulkan extensions they're publishing a "wtf is this" document as well, which goes a surprisingly long way towards improving things. e.g. https://github.com/KhronosGroup/Vulkan-Docs/blob/main/proposals/VK_KHR_shader_quad_control.adoc
-
@TomF @dotstdy @GDCPresoReviews Interesting! Almost seems as if one might have a different program running on the e-cores with this!
Edit: seems like this is no different from avx512.
-
@breakin @dotstdy @GDCPresoReviews AVX10 just means a core can support "AVX256" (i.e. AVX512 features but half the width) without supporting the full 512 bits, which is difficult for the small cores. So that's a good thing.
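For anyone wondering what "AVX512 features but half the width" means in practice: roughly, the opmask machinery becomes usable on 256-bit registers. A small sketch (this is today's AVX-512VL intrinsic; under AVX10/256 the same operation is the baseline without any 512-bit requirement, and the build flags are an assumption):

```c
/* Sketch: masked 256-bit add, an AVX-512-style feature at half width.
 * Needs AVX512F + AVX512VL today (e.g. compile with -mavx512f -mavx512vl);
 * plain AVX2 has no single-instruction equivalent with an opmask. */
#include <immintrin.h>

__m256 masked_add(__m256 a, __m256 b, __mmask8 keep)
{
    /* Lanes where `keep` is 0 come out as zero -- no separate blend needed. */
    return _mm256_maskz_add_ps(keep, a, b);
}
```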
-
@TomF @breakin @dotstdy @GDCPresoReviews IMO intel should just support avx512 even in e-cores, even if it means being really, really slow. It's more important that a feature works than that it's fast. But that's just my take.
-
@sol_hsa @TomF @breakin @GDCPresoReviews this is what they're doing in 10.2, it's just maximally confusing because why not.