Feedback Particles Sample
Description
The Feedback Particles sample shows how ordinary vertex shaders can be used to animate particles and write the results back into vertex buffer objects via Transform Feedback, for use in subsequent frames. This is another way of implementing GPU-only particle animation. The sample also uses Geometry Shaders to generate custom particles from single points and to remove expired ones.
APIs Used
- GL_EXT_transform_feedback
- glBindTransformFeedback
- glDrawTransformFeedback
- GL_EXT_geometry_shader4
Shared User Interface
The Graphics samples all share a common app framework and certain user interface elements, centered around the "Tweakbar" panel on the left side of the screen which lets you interactively control certain variables in each sample.
To show and hide the Tweakbar, simply click or touch the triangular button positioned in the top-left of the view.
Technical Details
Naive Implementation
The first idea that comes to mind for a GPU particle system built on transform feedback is a geometry shader that handles the logic and lifecycle of an entire particle system, or even of several particle systems. With a single GPU program, a single draw call can emit, move, and delete particles. The shader logic for this approach could be as simple as the following:
Single Pass (input from the previous frame transform feedback)
- Process TTL (Time to Live): if expired -- exit the shader;
- Read particle type;
- If it is an emitter:
  - If it is time to emit, emit new particles to the output and reset the time-to-emit counter;
  - Process emitter data and push it to the output;
- If it is an ordinary particle:
  - Process particle data and push it to the output.
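The single-pass logic above can be modelled on the CPU to make the control flow concrete. The following is only a sketch: the `Particle` layout, the constants, and the function names are hypothetical and are not taken from the sample, and a `std::vector` stands in for the transform feedback buffer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical particle layout; the sample's real vertex format is not shown in the text.
enum class ParticleType { Emitter, Ordinary };

struct Particle {
    ParticleType type;
    float ttl;         // remaining time to live, in seconds
    float timeToEmit;  // emitters only: countdown until the next emission
    float position;    // 1-D position keeps the sketch short
    float velocity;
};

// Assumed tuning values, chosen only for illustration.
constexpr float kEmitPeriod = 1.0f;
constexpr float kParticleTtl = 2.0f;
constexpr int kParticlesPerEmit = 2;

// CPU stand-in for one geometry shader invocation on one input point.
// `out` plays the role of the transform feedback buffer.
void processPoint(const Particle& in, float dt, std::vector<Particle>& out) {
    Particle p = in;
    p.ttl -= dt;
    if (p.ttl <= 0.0f) return;  // expired: nothing is emitted, so the particle disappears

    if (p.type == ParticleType::Emitter) {
        p.timeToEmit -= dt;
        if (p.timeToEmit <= 0.0f) {
            for (int i = 0; i < kParticlesPerEmit; ++i) {
                out.push_back({ParticleType::Ordinary, kParticleTtl, 0.0f,
                               p.position, 1.0f + static_cast<float>(i)});
            }
            p.timeToEmit = kEmitPeriod;  // reset the time-to-emit counter
        }
        out.push_back(p);  // the emitter itself survives to the next frame
    } else {
        p.position += p.velocity * dt;
        out.push_back(p);
    }
}

// One simulation step: the previous frame's output is this frame's input.
std::vector<Particle> simulateSinglePass(const std::vector<Particle>& previous, float dt) {
    assert(dt >= 0.0f);
    std::vector<Particle> current;
    for (const Particle& p : previous) processPoint(p, dt, current);
    return current;
}
```

Note that a single invocation may append anywhere from zero to `kParticlesPerEmit + 1` elements, which is the variable-size output that makes a geometry shader necessary in this design.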
Running a simulation is straightforward:
    //Turn rendering OFF
    glEnable(GL_RASTERIZER_DISCARD_EXT);

    glUseProgram(m_simulationProgram);
    {
        //Output particles to Current feedback object
        glBindTransformFeedback(GL_TRANSFORM_FEEDBACK, m_Current);
        glBeginTransformFeedback(GL_POINTS);

        //If this is not the first frame, run GPU pass with input from feedback object Previous
        //If this is the first frame, run GPU pass with input from emitter VBO
        if (!m_isFirstHit)
            glDrawTransformFeedback(GL_POINTS, m_Previous);
        else
        {
            glDrawArrays(GL_POINTS, 0, m_emitterCount);
            m_isFirstHit = false;
        }

        glEndTransformFeedback();
    }

    //Turn rendering ON
    glDisable(GL_RASTERIZER_DISCARD_EXT);
    ...
    //Render particles from feedback object Current
    glDrawTransformFeedback(GL_POINTS, m_Current);

    //Swap feedback object IDs
    swap(m_Current, m_Previous);
However, although this sounds appealing and makes for an interesting programming challenge, the approach does not perform well. The output of the geometry shader can be placed in fast on-chip memory, which is usually quite limited in size. Because the GPU runs threads in parallel, this limit can reduce the number of geometry shader threads that run simultaneously, and so reduce the overall performance of the particle system.
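A rough back-of-the-envelope model shows why the output size matters. The sketch below assumes, as a simplification, that every geometry shader thread reserves on-chip space for its worst-case output; the function name and all figures used with it are hypothetical and do not describe any particular GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Upper bound on geometry shader threads that can be resident at once when each
// thread reserves on-chip space for its worst-case output (simplified model).
std::size_t maxResidentThreads(std::size_t onChipBytes,
                               std::size_t maxOutputVertices,
                               std::size_t bytesPerVertex,
                               std::size_t hardwareThreadLimit) {
    assert(maxOutputVertices > 0 && bytesPerVertex > 0);
    const std::size_t bytesPerThread = maxOutputVertices * bytesPerVertex;
    return std::min(hardwareThreadLimit, onChipBytes / bytesPerThread);
}
```

With a made-up budget of 48 KiB, a 32-byte particle, and a hardware limit of 1024 threads, a shader that may emit up to 32 particles per invocation leaves room for `maxResidentThreads(48 * 1024, 32, 32, 1024)` = 48 threads, while a shader that emits at most one particle is bounded only by the hardware limit of 1024.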
An Optimized Approach
The basic idea is to work around the possible GPU under-utilization caused by the limited fast memory. To do this, the particle system GPU program from the naive approach is split into two passes: a particle emission pass and a particle processing and deletion pass. Both passes stream data out to the same transform feedback buffer, which is later used for rendering and, on the next frame, as input for the processing pass.
However, there is an obstacle: the GPU program cannot be changed while transform feedback is active. To work around it, we have to introduce a third pass that copies the newly generated particles. The particle emission pass is still subject to the on-chip memory limitation, but it is issued only on emitters, which are usually few in number. The second pass GPU program works on one particle at a time and is free from that limitation, unless the particle system uses a very large per-particle structure. It is also best to keep emitters on the CPU side, where they are easier to control. The simplest shader logic for this approach would then be:
First pass (input from CPU memory):
- [GS] If it is time to emit - emit new particles to the output;
Second pass (input from the previous PASS transform feedback):
- [VS] Process particle;
- [GS] If TTL is not expired, push particle to the output;
Third pass (input from the previous FRAME transform feedback):
- [VS] Process particle;
- [GS] If TTL is not expired, push particle to the output;
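The three passes listed above can be mirrored on the CPU to show how the data flows between them. This is a minimal sketch rather than the sample's implementation: the `Emitter` and `Particle` structs, the emitted particle's initial values, and the function names are all hypothetical, `std::vector` stands in for the transform feedback buffers, and the emitter countdown is assumed to be updated on the CPU.

```cpp
#include <cassert>
#include <vector>

// Hypothetical layouts; emitters are kept on the CPU side.
struct Emitter  { float timeToEmit; float period; float position; };
struct Particle { float ttl; float position; float velocity; };

// First pass: emitters in, freshly emitted particles out (the EmitterFeedback role).
std::vector<Particle> emitPass(std::vector<Emitter>& emitters, float dt) {
    std::vector<Particle> emitted;
    for (Emitter& e : emitters) {
        e.timeToEmit -= dt;  // assumption: countdown is maintained on the CPU
        if (e.timeToEmit <= 0.0f) {
            emitted.push_back({2.0f, e.position, 1.0f});  // assumed initial TTL and velocity
            e.timeToEmit += e.period;
        }
    }
    return emitted;
}

// Second and third passes share one program: process each particle,
// then append it to `current` only if its TTL has not expired.
void processPass(const std::vector<Particle>& input, float dt,
                 std::vector<Particle>& current) {
    for (Particle p : input) {
        p.ttl -= dt;                             // [VS] process particle
        p.position += p.velocity * dt;
        if (p.ttl > 0.0f) current.push_back(p);  // [GS] drop expired particles
    }
}

// One frame: both process passes append to the same output buffer.
std::vector<Particle> simulateFrame(std::vector<Emitter>& emitters,
                                    const std::vector<Particle>& previous, float dt) {
    assert(dt >= 0.0f);
    std::vector<Particle> current;
    processPass(emitPass(emitters, dt), dt, current);  // second pass: previous PASS output
    processPass(previous, dt, current);                // third pass: previous FRAME output
    return current;
}
```

Because `processPass` emits at most one particle per input, it corresponds to the pass that is free from the output-size limitation; only `emitPass` has variable-size output, and it runs over the small emitter list.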
So, pulling it all together, the new algorithm would be:
    //Turn rendering OFF
    glEnable(GL_RASTERIZER_DISCARD_EXT);

    //Run first GPU pass
    glUseProgram(m_emitProgram);
    {
        //Emit particles to EmitterFeedback transform feedback object
        glBindTransformFeedback(GL_TRANSFORM_FEEDBACK, m_EmitterFeedback);
        glBeginTransformFeedback(GL_POINTS);
        glDrawArrays(GL_POINTS, 0, m_EmitterCount);
        glEndTransformFeedback();
    }

    //Run second and third GPU passes
    glUseProgram(m_processProgram);
    {
        //Output particles to Current
        glBindTransformFeedback(GL_TRANSFORM_FEEDBACK, m_Current);
        //Start capturing; pairs with glEndTransformFeedback below
        glBeginTransformFeedback(GL_POINTS);

        //Run second GPU pass with input from feedback object EmitterFeedback
        glDrawTransformFeedback(GL_POINTS, m_EmitterFeedback);

        //If not the first frame, then run third GPU pass with
        //the input from feedback object Previous
        if (!m_isFirstHit)
            glDrawTransformFeedback(GL_POINTS, m_Previous);

        glEndTransformFeedback();
        m_isFirstHit = false;
    }

    //Turn rendering ON
    glDisable(GL_RASTERIZER_DISCARD_EXT);
    ...
    //Render particles from feedback object Current
    glDrawTransformFeedback(GL_POINTS, m_Current);

    //Swap feedback object IDs
    swap(m_Current, m_Previous);