NV Command List Sample
Description
This sample demonstrates the use of the NV_command_list extension, which is used here to render a basic scene. Texturing is performed via ARB_bindless_texture.
APIs Used
- OpenGL 4.4
- GL_NV_command_list
- GL_ARB_bindless_texture
- GL_NV_uniform_buffer_unified_memory
- GL_ARB_shading_language_include
- GL_NV_shadow_samplers_cube
Shared User Interface
The Graphics samples all share a common app framework and certain user interface elements, centered around the "Tweakbar" panel on the left side of the screen which lets you interactively control certain variables in each sample.
To show and hide the Tweakbar, simply click or touch the triangular button positioned in the top-left of the view.
Technical Details
The NV_command_list extension is built around bindless GPU pointers/handles, which allow rendering scenes with hundreds of thousands of draw calls with extremely low CPU time:
- Tokenized Rendering :
- Commands are encoded into binary data (tokens) instead of issuing classic GL calls. This allows the driver of the GPU to efficiently iterate over a stream of many commands in single or multiple sequences:
glDrawCommandsStatesNV(tokenBuffer, offsets[], sizes[], states[], fbos[], count)
- The tokens are stored in regular OpenGL buffers and can be re-used across frames or manipulated by the GPU itself
- In addition to draw calls, the tokens cover the most frequent state changes (VBO/IBO/UBO) and a few basic scalar changes (blend color, polygon offset, stencil ref, etc.)
- As tokens only reference data (for example a UBO), the content of that data is free to change. You can change vertex positions or matrices freely
typedef struct
{
GLuint header; // glGetCommandHeaderNV(GL_UNIFORM_ADDRESS_COMMAND_NV, sizeof(UniformAddressCommandNV))
GLushort index; // in glsl: layout(binding=INDEX, commandBindableNV) uniform...
GLushort stage; // glGetStageIndexNV(GL_VERTEX_SHADER)
GLuint64 address; // glGetNamedBufferParameterui64vNV(buffer, GL_BUFFER_GPU_ADDRESS_NV,
// &address);
} UniformAddressCommandNV;
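To make the idea of a token stream concrete, the following is a minimal CPU-side sketch of packing tokens back to back into a byte buffer. It does not call OpenGL. The header constants are placeholders (real header values have to be queried from the driver), and the names `UniformAddressToken`, `DrawElementsToken`, `appendToken` and `encodeScene` are hypothetical and do not come from the sample.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Placeholder header values. Real headers must be queried from the driver;
// these constants are assumptions made for illustration only.
constexpr uint32_t kFakeUniformHeader = 0x1001;
constexpr uint32_t kFakeDrawHeader    = 0x1002;

// Mirrors the shape of the uniform-address token shown above.
struct UniformAddressToken {
    uint32_t header;
    uint16_t index;    // UBO binding index used in the shader
    uint16_t stage;    // shader stage index
    uint64_t address;  // GPU address of the per-object data
};
static_assert(sizeof(UniformAddressToken) == 16, "token must be tightly packed");

// Hypothetical indexed-draw token for this sketch.
struct DrawElementsToken {
    uint32_t header;
    uint32_t count;       // number of indices to draw
    uint32_t firstIndex;
    uint32_t baseVertex;
};
static_assert(sizeof(DrawElementsToken) == 16, "token must be tightly packed");

// Appends the raw bytes of one token and returns the offset it was written at.
template <typename Token>
std::size_t appendToken(std::vector<uint8_t>& stream, const Token& token) {
    static_assert(std::is_trivially_copyable<Token>::value, "tokens are raw data");
    const std::size_t offset = stream.size();
    stream.resize(offset + sizeof(Token));
    std::memcpy(stream.data() + offset, &token, sizeof(Token));
    return offset;
}

// Emits one "bind UBO range" token followed by one draw token per object.
std::vector<uint8_t> encodeScene(uint64_t uboBase, uint32_t objectCount,
                                 uint64_t perObjectStride, uint32_t indexCount) {
    assert(perObjectStride > 0);
    std::vector<uint8_t> stream;
    for (uint32_t i = 0; i < objectCount; ++i) {
        const UniformAddressToken ubo{kFakeUniformHeader, 1, 0,
                                      uboBase + i * perObjectStride};
        appendToken(stream, ubo);
        const DrawElementsToken draw{kFakeDrawHeader, indexCount, 0, 0};
        appendToken(stream, draw);
    }
    return stream;
}
```

In a real application the resulting bytes would be uploaded into a regular OpenGL buffer, and the offsets and sizes of the sequences inside it would be passed to the draw call.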
- State Objects :
- Costly validation in the driver can often happen as late as at draw call time or at other unexpected times, potentially causing unstable framerates. Monolithic state-objects (common in other new graphics APIs) allow us to pre-validate the core rendering state (FBO, program, blending states, etc.) and reuse it
- Full control over when validation happens via
glStateCaptureNV(stateObject, primitiveBaseMode)
which captures the current GL state setup
- Very efficient state switching between different State Objects
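As a purely conceptual model of why captured state objects help, the sketch below validates a small state struct once at capture time and later computes only the difference between two captured objects. This is an assumption about the general idea and not a description of how the driver works; all names (`GlStateModel`, `StateObjectModel`, `captureState`, `changesBetween`) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Tiny stand-in for the core rendering state (hypothetical fields).
struct GlStateModel {
    uint32_t fbo = 0;
    uint32_t program = 0;
    bool blendEnabled = false;
    bool depthTestEnabled = true;
};

// A captured, pre-validated snapshot of the state.
struct StateObjectModel {
    GlStateModel snapshot;
    bool captured = false;
};

// Stand-in for costly validation: here only "a program must be bound".
bool validate(const GlStateModel& state) {
    return state.program != 0;
}

// Validation happens exactly once, at a time the application chooses.
bool captureState(StateObjectModel& object, const GlStateModel& current) {
    if (!validate(current)) {
        return false;
    }
    object.snapshot = current;
    object.captured = true;
    return true;
}

enum ChangeBits : uint32_t {
    CHANGE_FBO     = 1u << 0,
    CHANGE_PROGRAM = 1u << 1,
    CHANGE_BLEND   = 1u << 2,
    CHANGE_DEPTH   = 1u << 3
};

// Switching between two captured objects only needs the fields that differ.
uint32_t changesBetween(const StateObjectModel& from, const StateObjectModel& to) {
    assert(from.captured && to.captured);
    uint32_t changes = 0;
    if (from.snapshot.fbo != to.snapshot.fbo) changes |= CHANGE_FBO;
    if (from.snapshot.program != to.snapshot.program) changes |= CHANGE_PROGRAM;
    if (from.snapshot.blendEnabled != to.snapshot.blendEnabled) changes |= CHANGE_BLEND;
    if (from.snapshot.depthTestEnabled != to.snapshot.depthTestEnabled) changes |= CHANGE_DEPTH;
    return changes;
}
```

Because both snapshots are already known to be valid, nothing needs to be re-validated at draw time in this model; only the changed bits would have to be applied.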
- Pre-compiled Command List Object :
- State Objects and client-side tokens can be pre-compiled into a special object
- Allows further driver optimization (faster State Object transitions) at the cost of flexibility (changing State Objects requires rebuilding the command list object)
Sample Highlights
Depending on the availability of the extension, the sample allows switching between standard OpenGL, token-buffer and command-list-object modes to render the scene. Inside basic-nvcommandlist.cpp you will find the functions:
- Sample::drawStandard() : The standard OpenGL approach renders the scene by calling glDrawElements for each object in the scene
- Sample::drawTokenBuffer() : The token buffer approach renders the scene from a list of tokens (binary data) via glDrawCommandsStatesNV
- Sample::drawTokenList() : The token list approach renders the scene using a pre-compiled command list via glCallCommandListNV
- Sample::drawTokenEmulation() : The emulation layer gives a rough idea of how glDrawCommands* and glStateCaptureNV work internally. Emulation may also be useful as a permanent compatibility layer for driver/hardware combinations which do not run the extension natively
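The core of an emulation layer is a loop that walks the token stream, reads each header and dispatches to the matching classic call. The sketch below shows only that loop, recording what it would do instead of calling OpenGL. The header constants, token layouts and the `replayTokens` function are assumptions for illustration and are not taken from the sample's emulation code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Placeholder header values (assumptions; real ones come from the driver).
constexpr uint32_t kUniformHeader = 0x1001;
constexpr uint32_t kDrawHeader    = 0x1002;

struct UniformAddressToken {
    uint32_t header;
    uint16_t index;
    uint16_t stage;
    uint64_t address;
};

struct DrawElementsToken {
    uint32_t header;
    uint32_t count;
    uint32_t firstIndex;
    uint32_t baseVertex;
};

// Records what a real emulation layer would have issued as GL calls.
struct EmulationStats {
    uint32_t uboBinds = 0;
    uint32_t draws = 0;
    uint32_t indicesDrawn = 0;
    uint64_t lastAddress = 0;
};

// Walks the stream token by token. Returns false on an unknown header or a
// truncated token, true if the whole stream was consumed.
bool replayTokens(const uint8_t* data, std::size_t size, EmulationStats& stats) {
    assert(data != nullptr || size == 0);
    std::size_t offset = 0;
    while (offset < size) {
        if (size - offset < sizeof(uint32_t)) return false;
        uint32_t header = 0;
        std::memcpy(&header, data + offset, sizeof(header));

        if (header == kUniformHeader) {
            if (size - offset < sizeof(UniformAddressToken)) return false;
            UniformAddressToken token;
            std::memcpy(&token, data + offset, sizeof(token));
            // A real layer would bind the UBO range for token.index here.
            stats.uboBinds += 1;
            stats.lastAddress = token.address;
            offset += sizeof(token);
        } else if (header == kDrawHeader) {
            if (size - offset < sizeof(DrawElementsToken)) return false;
            DrawElementsToken token;
            std::memcpy(&token, data + offset, sizeof(token));
            // A real layer would issue an indexed draw call here.
            stats.draws += 1;
            stats.indicesDrawn += token.count;
            offset += sizeof(token);
        } else {
            return false;  // unknown token type
        }
    }
    return true;
}
```

The per-token branching in this loop is exactly the CPU work that native execution of the token stream avoids.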
Performance
The sample renders 1024 objects. Each object has a sphere or box VBO/IBO pair and references a range within a big UBO that stores per-object data like matrix, color and texture. Half of the objects in the scene use the geometry shader to transform primitives. Here are some preliminary example results for Timer Draw on a Windows 7 64-bit, i7-860, Quadro K5000 system:

Draw mode | GPU time | CPU time (microseconds) |
---|---|---|
standard | 850 | 1750 |
nvcmdlist emulated | 830 | 1500 |
nvcmdlist buffer | 775 | 30 |
nvcmdlist list | 775 | <1 |
One can see that with classic API usage the scene is CPU bound, as more time is spent on the CPU than on the graphics card.
The performance gained in the emulation approach comes from using bindless VBOs and UBOs.
The token-buffer technique is slightly slower on the CPU than the pre-compiled list because the 500 State Objects (each half of the scene's objects) still need to be checked every frame. The nvcmdlist techniques essentially only require a single dispatch.
The closest other way to approach this performance would be to use MultiDrawIndirect and vertex divisor indexing, but that makes shaders more complex by adding an indirection parameter.
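For comparison, here is a minimal CPU-side sketch of preparing the command array for a MultiDrawIndirect approach. The `DrawElementsIndirectCommand` layout follows the OpenGL specification; `MeshRange`, `buildIndirectCommands` and the alternating sphere/box assignment are assumptions made for this example. Each command stores the object index in `baseInstance`, so that a per-instance vertex attribute (divisor 1) can deliver that index to the shader, which then has to look up its per-object data indirectly.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Command layout consumed by indirect indexed draws (per the OpenGL spec).
struct DrawElementsIndirectCommand {
    uint32_t count;          // number of indices
    uint32_t instanceCount;  // 1 for a single object
    uint32_t firstIndex;     // offset into the index buffer
    int32_t  baseVertex;     // offset added to each index
    uint32_t baseInstance;   // used here to carry the object index
};

// Hypothetical description of where a mesh lives in shared VBO/IBO storage.
struct MeshRange {
    uint32_t indexCount;
    uint32_t firstIndex;
    int32_t  baseVertex;
};

// Builds one indirect command per object, alternating sphere and box meshes
// (the alternation is an assumption made for this example).
std::vector<DrawElementsIndirectCommand> buildIndirectCommands(
        uint32_t objectCount, const MeshRange& sphere, const MeshRange& box) {
    assert(sphere.indexCount > 0 && box.indexCount > 0);
    std::vector<DrawElementsIndirectCommand> commands;
    commands.reserve(objectCount);
    for (uint32_t i = 0; i < objectCount; ++i) {
        const MeshRange& mesh = (i % 2 == 0) ? sphere : box;
        commands.push_back({mesh.indexCount, 1u, mesh.firstIndex,
                            mesh.baseVertex, i});
    }
    return commands;
}
```

The array would then be uploaded to a buffer bound to GL_DRAW_INDIRECT_BUFFER and drawn with a single glMultiDrawElementsIndirect call. Unlike the token stream, it cannot switch UBO bindings between draws, which is why the shader needs the extra index.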