NV Command List Sample

Category: Performance

Min PC GPU: GeForce Fermi-class

Min Tegra Device: Tegra K1

@ GitHub: NV Command List Sample Source Code

Description

This sample demonstrates the NV_command_list extension by using it to render a basic scene. Texturing is performed via ARB_bindless_texture.

APIs Used

OpenGL 4.4
GL_NV_command_list
GL_ARB_bindless_texture
GL_NV_uniform_buffer_unified_memory
GL_ARB_shading_language_include
GL_NV_shadow_samplers_cube

Shared User Interface

The Graphics samples all share a common app framework and certain user interface elements, centered around the "Tweakbar" panel on the left side of the screen which lets you interactively control certain variables in each sample.

To show and hide the Tweakbar, simply click or touch the triangular button positioned in the top-left of the view.

Other controls are listed below.

Device	Input	Result
touch	1-Finger Drag	Orbit-rotate the camera
	2-Finger Drag	Move up/down/left/right
	2-Finger Pinch	Scale the view
mouse	Left-Button Drag	Orbit-rotate the camera
	Right-Button Drag	Move up/down/left/right
	Middle-Click Drag	Scale the view (up:out, down:in)
keyboard	Escape	Quit the application
	Tab	Toggle TweakBar visibility
gamepad	Start	Toggle TweakBar visibility
	Right ThumbStick	Orbit-rotate the camera
	Left ThumbStick	Move forward/backward, Slide left/right
	Left/Right Triggers	Move up/down
	A	Show TweakBar, Toggle Focused Item
	B	Close Focused UI, Hide TweakBar
	DPAD Up/Down	Move Focus to Prev/Next Item
	DPAD Left/Right	Decrease/Increase Focused Item

Technical Details

The NV_command_list extension is built around bindless GPU pointers/handles, which allow rendering scenes with hundreds of thousands of draw calls at extremely low CPU cost:

Tokenized Rendering
- Commands are encoded into binary data (tokens) instead of being issued as classic GL calls. This allows the driver to iterate efficiently over a stream of many commands, in a single sequence or in multiple sequences: glDrawCommandsStatesNV(tokenBuffer, offsets[], sizes[], states[], fbos[], count)
- The tokens are stored in regular OpenGL buffers and can be reused across frames or manipulated by the GPU itself
- In addition to draw calls, the tokens cover the most frequent state changes (VBO/IBO/UBO) and a few basic scalar changes (blend color, polygon offset, stencil ref, etc.)
- As tokens only reference data (for example a UBO), the content of that data is free to change. You can change vertex positions or matrices freely without touching the tokens

The tokens are tightly-packed structs and most common tokens are 16 bytes each. Below you will find the token definition to update a UBO binding.


typedef struct
{
    GLuint   header;  // glGetCommandHeaderNV(GL_UNIFORM_ADDRESS_COMMAND_NV, sizeof(UniformAddressCommandNV))
    GLushort index;   // in GLSL: layout(binding=INDEX, commandBindableNV) uniform ...
    GLushort stage;   // glGetStageIndexNV(GL_VERTEX_SHADER)
    GLuint64 address; // glGetNamedBufferParameterui64vNV(buffer, GL_BUFFER_GPU_ADDRESS_NV, &address)
} UniformAddressCommandNV;
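
To make the packing concrete, here is a minimal C++ sketch that appends a UBO-address token followed by a draw token for each object into a byte stream. It does not call OpenGL: the GL scalar types are redeclared locally, and the header, stage and address values that a real application would query with glGetCommandHeaderNV, glGetStageIndexNV and glGetNamedBufferParameterui64vNV are passed in as plain parameters. The DrawElementsCommandNV layout follows the extension specification. The helpers appendToken and buildTokenStream, the binding index 0 and the fixed per-object stride are assumptions made for illustration, not code from the sample.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Local stand-ins for the GL scalar types so the sketch builds without GL headers.
using GLuint   = std::uint32_t;
using GLushort = std::uint16_t;
using GLuint64 = std::uint64_t;

struct UniformAddressCommandNV {
    GLuint   header;   // real code: value returned by glGetCommandHeaderNV
    GLushort index;    // UBO binding index used in the shader
    GLushort stage;    // real code: value returned by glGetStageIndexNV
    GLuint64 address;  // GPU address of the uniform data
};

struct DrawElementsCommandNV {
    GLuint header;     // real code: value returned by glGetCommandHeaderNV
    GLuint count;      // number of indices to draw
    GLuint firstIndex;
    GLuint baseVertex;
};

static_assert(sizeof(UniformAddressCommandNV) == 16, "token must be tightly packed");
static_assert(sizeof(DrawElementsCommandNV) == 16, "token must be tightly packed");

// Hypothetical helper: appends the raw bytes of one token to the stream.
template <typename Token>
void appendToken(std::vector<unsigned char>& stream, const Token& token) {
    static_assert(std::is_trivially_copyable<Token>::value, "tokens are plain data");
    const std::size_t offset = stream.size();
    stream.resize(offset + sizeof(Token));
    std::memcpy(stream.data() + offset, &token, sizeof(Token));
}

// Hypothetical helper: emits "point binding 0 at this object's UBO range, then
// draw" for every object. All objects share one index range in this sketch.
std::vector<unsigned char> buildTokenStream(GLuint uboHeader, GLuint drawHeader,
                                            GLushort vertexStage,
                                            GLuint64 uboBaseAddress,
                                            GLuint64 perObjectStride,
                                            GLuint objectCount, GLuint indexCount) {
    std::vector<unsigned char> stream;
    stream.reserve(static_cast<std::size_t>(objectCount) * 32);
    for (GLuint i = 0; i < objectCount; ++i) {
        UniformAddressCommandNV ubo{};
        ubo.header  = uboHeader;
        ubo.index   = 0;  // assumed binding index
        ubo.stage   = vertexStage;
        ubo.address = uboBaseAddress + perObjectStride * i;
        appendToken(stream, ubo);

        DrawElementsCommandNV draw{};
        draw.header     = drawHeader;
        draw.count      = indexCount;
        draw.firstIndex = 0;
        draw.baseVertex = 0;
        appendToken(stream, draw);
    }
    assert(stream.size() % 16 == 0);  // every token used here is 16 bytes
    return stream;
}
```

In an application, the resulting bytes would be uploaded into a regular buffer object, and that buffer would then be passed as the token buffer to glDrawCommandsStatesNV.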

State Objects
- Costly validation in the driver can happen as late as draw-call time or at other unexpected times, potentially causing unstable frame rates. Monolithic state objects (common in other new graphics APIs) allow us to pre-validate the core rendering state (FBO, program, blending states, etc.) and reuse it
- Full control over when validation happens: glStateCaptureNV(stateObject, primitiveBaseMode) captures the GL state as it is currently set up
- Very efficient state switching between different State Objects

Pre-compiled Command List Object
- State Objects and client-side tokens can be pre-compiled into a special object
- Allows further driver optimization (faster State Object transitions) at the cost of flexibility (changing State Objects requires rebuilding the command list object)

Sample Highlights

Depending on the availability of the extension, the sample allows switching between standard OpenGL, token-buffer, and command-list-object modes to render the scene. Inside basic-nvcommandlist.cpp you will find the functions:

Sample::drawStandard()
- The standard OpenGL approach renders the scene by calling the standard glDrawElements function for each object in the scene
Sample::drawTokenBuffer()
- The token buffer approach renders the scene from a list of tokens (binary data) via the glDrawCommandsStatesNV function
Sample::drawTokenList()
- The token list approach renders the scene from a pre-compiled command list via glCallCommandListNV
Sample::drawTokenEmulation()
- The emulation layer gives a rough idea of how the glDrawCommands*NV and glStateCaptureNV functions work internally. Emulation may also be useful as a permanent compatibility layer for driver/hardware combinations that do not run the extension natively
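
As a rough illustration of the emulation idea, the following C++ sketch walks a token stream on the CPU, reads each token's header and dispatches on it. It is not the sample's emulation code. The header constants are hypothetical placeholders (a real layer would query the driver's values once with glGetCommandHeaderNV), only two 16-byte token types are handled, and where this sketch updates counters an emulation layer would issue the equivalent classic GL calls such as a UBO bind or glDrawElements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical header values, chosen only for this sketch.
constexpr std::uint32_t kUniformAddressHeader = 1;
constexpr std::uint32_t kDrawElementsHeader   = 2;
constexpr std::size_t   kTokenSize            = 16;  // both token types handled here

// What the replay observed; stands in for the GL calls an emulation would make.
struct ReplayStats {
    std::size_t   uboBinds       = 0;
    std::size_t   draws          = 0;
    std::uint64_t lastUboAddress = 0;
    std::uint64_t indicesDrawn   = 0;
};

// Walks the stream token by token. Returns false if the stream is truncated
// or contains a header this sketch does not know.
bool replayTokens(const unsigned char* data, std::size_t size, ReplayStats& stats) {
    assert(data != nullptr || size == 0);
    std::size_t offset = 0;
    while (offset < size) {
        if (size - offset < kTokenSize) {
            return false;  // truncated token
        }
        std::uint32_t header = 0;
        std::memcpy(&header, data + offset, sizeof header);
        switch (header) {
        case kUniformAddressHeader: {
            // Layout: header (4), index (2), stage (2), address (8).
            std::uint64_t address = 0;
            std::memcpy(&address, data + offset + 8, sizeof address);
            stats.lastUboAddress = address;  // emulation: rebind the UBO here
            ++stats.uboBinds;
            break;
        }
        case kDrawElementsHeader: {
            // Layout: header (4), count (4), firstIndex (4), baseVertex (4).
            std::uint32_t count = 0;
            std::memcpy(&count, data + offset + 4, sizeof count);
            stats.indicesDrawn += count;     // emulation: issue the draw here
            ++stats.draws;
            break;
        }
        default:
            return false;  // unknown token
        }
        offset += kTokenSize;
    }
    return true;
}
```

The per-token switch is exactly the CPU work that the native extension moves out of the application, which is consistent with the emulated mode staying close to the standard path in CPU time.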

Performance

The sample renders 1024 objects. Each object has a sphere or box VBO/IBO pair and references a range within a big UBO that stores per-object data such as matrix, color and texture. Half of the objects in the scene use the geometry shader to transform primitives. Here are some preliminary example results for Timer Draw on a Windows 7 64-bit, i7-860, Quadro K5000 system:

Draw mode	GPU time (microseconds)	CPU time (microseconds)
standard	850	1750
nvcmdlist emulated	830	1500
nvcmdlist buffer	775	30
nvcmdlist list	775	<1

With classic API usage the scene is CPU bound: more time is spent on the CPU (1750 microseconds) than on the graphics card (850 microseconds).

The performance gained in the emulation approach comes from using bindless VBOs and UBOs.

The token-buffer technique is slightly slower on the CPU than the pre-compiled list because the 500 State Objects (each half of the scene's objects) still need to be checked every frame. The nvcmdlist techniques essentially only require a single dispatch.

The closest alternative for approaching this result would be MultiDrawIndirect combined with vertex divisor indexing, but that makes shaders more complex by adding an indirection parameter.
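
For comparison, the CPU side of that alternative can be sketched as filling one indirect draw record per object. This is a minimal C++ sketch under stated assumptions, not part of the sample: the record layout is the standard one consumed by glMultiDrawElementsIndirect, while MeshRange, buildIndirectCommands and the round-robin assignment of meshes to objects are hypothetical. Each record's baseInstance carries the object index; with an integer vertex attribute that uses a divisor of 1 over a buffer holding 0..N-1, the shader receives that index and must use it to look up its per-object data, which is the extra indirection mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One record of the indirect buffer read by glMultiDrawElementsIndirect.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices
    std::uint32_t instanceCount;  // 1: each record draws a single object
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    std::uint32_t baseInstance;   // used here to smuggle the object index
};

static_assert(sizeof(DrawElementsIndirectCommand) == 20, "records are tightly packed");

// Hypothetical description of where one mesh lives in the shared VBO/IBO.
struct MeshRange {
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
};

// Hypothetical helper: builds one record per object. Meshes are assigned
// round-robin purely for illustration.
std::vector<DrawElementsIndirectCommand> buildIndirectCommands(
        std::uint32_t objectCount, const std::vector<MeshRange>& meshes) {
    assert(!meshes.empty());
    std::vector<DrawElementsIndirectCommand> commands;
    commands.reserve(objectCount);
    for (std::uint32_t i = 0; i < objectCount; ++i) {
        const MeshRange& mesh = meshes[i % meshes.size()];
        commands.push_back({mesh.indexCount, 1u, mesh.firstIndex, mesh.baseVertex, i});
    }
    return commands;
}
```

Unlike the token stream, these records cannot rebind a UBO range between draws, so the per-object lookup has to happen inside the shader.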