Managing Meshes and Vertex Attributes
In the old days of immediate-mode rendering, this was not an issue. You would simply design your rendering loops to provide one normal per face instead of one per vertex. With modern OpenGL, however, this is a no-go if you want decent performance.
The way to go with modern OpenGL is to cache as much data as possible in objects
residing in GPU memory. Typically, vertex buffer objects (VBOs) are used
to keep all kinds of vertex attributes, such as coordinates, normals and colors, in GPU
memory. Index buffer objects (IBOs) do the same for index data,
which refers to entries in the arrays of attribute vectors.
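As a rough illustration of this layout, the following sketch builds the CPU-side arrays for a single quad before they would be uploaded. The `IndexedMesh` struct and the helper names are hypothetical and not part of tinySG; the comments only indicate which buffer target each array would typically be bound to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side container (illustrative only, not a tinySG type).
struct IndexedMesh {
    std::vector<float>         positions; // 3 floats per vertex -> GL_ARRAY_BUFFER
    std::vector<float>         normals;   // 3 floats per vertex -> GL_ARRAY_BUFFER
    std::vector<std::uint32_t> indices;   // 3 per triangle      -> GL_ELEMENT_ARRAY_BUFFER
};

// A unit quad in the z = 0 plane: two triangles sharing vertices 0 and 2.
IndexedMesh makeQuad() {
    IndexedMesh m;
    m.positions = {0.f, 0.f, 0.f,  1.f, 0.f, 0.f,  1.f, 1.f, 0.f,  0.f, 1.f, 0.f};
    m.normals   = {0.f, 0.f, 1.f,  0.f, 0.f, 1.f,  0.f, 0.f, 1.f,  0.f, 0.f, 1.f};
    m.indices   = {0, 1, 2,  0, 2, 3};
    return m;
}

// Byte size of all attribute data, as it would be passed to glBufferData().
std::size_t attributeBytes(const IndexedMesh& m) {
    // A single index addresses all attributes, so the arrays must match in length.
    assert(m.positions.size() == m.normals.size());
    return (m.positions.size() + m.normals.size()) * sizeof(float);
}

// Byte size of the index data.
std::size_t indexBytes(const IndexedMesh& m) {
    return m.indices.size() * sizeof(std::uint32_t);
}
```

Note that one index selects the position and the normal of a vertex at the same time, which is why both attribute arrays have four entries here.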
Handling multiple indices
Calculating Per-Vertex Normals - Crease Angle
Meshes exported from CAD applications often span several "surfaces".
If vertex normals are calculated by simply averaging the face normals of all faces
a vertex is part of, the result is poor. The images on the right show the Bismarck
battleship dataset with just one averaged normal per vertex (top) and with multiple
normals wherever the angle between two adjacent polygons is larger than 30 degrees
(bottom).
Thus, most packages let you define
a crease angle: the maximum angle between two face normals
for which they are still interpolated at a shared vertex. If the angle between the face
normals is larger, a crease is assumed to lie between the two faces and
multiple normals are created for the shared vertices.
foreach face F of mesh:
    foreach vertex V of F:
        V.normal = (0,0,0)
        foreach adjacent face Fa of V:
            if angle(Fa.normal, F.normal) < creaseAngle:
                V.normal += Fa.normal * Fa.area
            else
                ; // ignore face normal
        normalise V.normal
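The pseudocode above could be turned into C++ roughly as follows. This is a minimal sketch for triangle meshes, not the tinySG implementation: `Vec3`, `Triangle` and `creaseNormals` are made-up names, the crease angle is assumed to be given in radians within [0, pi], and the angle test is done by comparing cosines instead of calling an `angle()` function.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };
using Triangle = std::array<std::uint32_t, 3>; // three vertex indices

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static float length(Vec3 a) { return std::sqrt(dot(a, a)); }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Returns one normal per triangle corner; corner c of face f is at 3 * f + c.
std::vector<Vec3> creaseNormals(const std::vector<Vec3>& pos,
                                const std::vector<Triangle>& tris,
                                float creaseAngleRad)
{
    // Unnormalised face normals: the length of the cross product is twice the
    // triangle area, so summing them weights each face by its area.
    std::vector<Vec3> faceN(tris.size());
    std::vector<std::vector<std::uint32_t>> facesOfVertex(pos.size());
    for (std::size_t f = 0; f < tris.size(); ++f) {
        const Triangle& t = tris[f];
        for (std::uint32_t v : t) {
            assert(v < pos.size());
            facesOfVertex[v].push_back(static_cast<std::uint32_t>(f));
        }
        faceN[f] = cross(sub(pos[t[1]], pos[t[0]]), sub(pos[t[2]], pos[t[0]]));
    }

    const float cosCrease = std::cos(creaseAngleRad);
    std::vector<Vec3> out;
    out.reserve(3 * tris.size());
    for (std::size_t f = 0; f < tris.size(); ++f) {
        const float lenF = length(faceN[f]);
        for (std::uint32_t v : tris[f]) {
            Vec3 n{0.f, 0.f, 0.f};
            for (std::uint32_t fa : facesOfVertex[v]) {
                const float lenFa = length(faceN[fa]);
                if (lenF == 0.f || lenFa == 0.f) continue; // skip degenerate faces
                // angle < creaseAngle  <=>  cos(angle) > cos(creaseAngle)
                if (dot(faceN[f], faceN[fa]) > cosCrease * lenF * lenFa) {
                    n.x += faceN[fa].x; n.y += faceN[fa].y; n.z += faceN[fa].z;
                }
            }
            const float len = length(n);
            out.push_back(len > 0.f ? Vec3{n.x / len, n.y / len, n.z / len}
                                    : Vec3{0.f, 0.f, 0.f});
        }
    }
    return out;
}
```

For two triangles folded by 90 degrees along a shared edge, a crease angle of 30 degrees keeps the two face normals separate at the shared vertices, while a crease angle of 120 degrees blends them.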
The vertex normals are kept in an array, and a normal index for each vertex
references its normal. The above loops produce as many normals for a vertex as
there are faces the vertex is part of, and usually many of them are identical.
A further pass therefore reworks the normal index and eliminates duplicate normals to
reduce the memory footprint of any csgIndexedShape node.
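One possible way to eliminate the duplicates is sketched below. It is an assumption about how such a pass might look, not the code used by csgIndexedShape: normals are snapped to a small grid (the `quantum` parameter, a made-up name) and looked up in a map, so that equal normals end up sharing one array entry.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical result type: unique normals plus one index per input normal.
struct NormalTable {
    std::vector<Vec3>          normals;
    std::vector<std::uint32_t> index;
};

// Merges normals whose components round to the same multiple of `quantum`.
NormalTable dedupNormals(const std::vector<Vec3>& in, float quantum = 1e-4f)
{
    assert(quantum > 0.0f);
    using Key = std::tuple<long, long, long>;
    std::map<Key, std::uint32_t> seen; // quantised normal -> slot in out.normals

    NormalTable out;
    out.index.reserve(in.size());
    for (const Vec3& n : in) {
        const Key key(std::lround(n.x / quantum),
                      std::lround(n.y / quantum),
                      std::lround(n.z / quantum));
        auto it = seen.find(key);
        if (it == seen.end()) {
            // First occurrence: store the normal and remember its slot.
            it = seen.emplace(key, static_cast<std::uint32_t>(out.normals.size())).first;
            out.normals.push_back(n);
        }
        out.index.push_back(it->second);
    }
    return out;
}
```

The grid size trades accuracy for compression: a coarser `quantum` merges more normals, but two nearly equal normals that fall on opposite sides of a grid boundary are still kept apart.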
Performance
Indexing vertex data may deliver great performance, because graphics drivers or even the hardware have a transformation cache that holds several already transformed vertices. However, these caches are small (somewhere around 10-20 vertices). Thus, the application needs to be careful about when to send which index in order to take advantage of a vertex being found in the cache.
With the transformation cache in mind, the following recent benchmark result with tinySG came as a surprise: unrolling the geometry of the Bismarck dataset increased the render performance by up to 4x for synthetic benchmark datasets, compared to indexed rendering using VBOs/IBOs. The calls changed from glMultiDrawElements() to glMultiDrawArrays(). Both AMD/ATI and NVIDIA hardware benefit from the change.
The table above shows a performance comparison done on a 2.8GHz Core i7-860 with an AMD FirePro W8000 running Windows 7. For large meshes (terrain, dragon), the performance gains from unrolling indices are huge. Real datasets (axle, mountaineer) still show a significant improvement, although other factors influence performance here as well, such as 500-900 material state changes, traversal of 1500-4000 scenegraph nodes and binding of well over 1000 VBOs each frame.
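"Unrolling" here means replacing the indexed attribute arrays by flat arrays that repeat each vertex wherever an index referred to it. A minimal CPU-side sketch of that step, assuming float attributes with a fixed number of components per vertex (the function name `unrollAttribute` is made up for illustration), might look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expands an indexed attribute array into a flat one: output vertex i is a
// copy of input vertex indices[i]. `components` is the float count per vertex.
std::vector<float> unrollAttribute(const std::vector<float>& attrib,
                                   const std::vector<std::uint32_t>& indices,
                                   std::size_t components)
{
    assert(components > 0 && attrib.size() % components == 0);
    std::vector<float> flat;
    flat.reserve(indices.size() * components);
    for (std::uint32_t i : indices) {
        const std::size_t first = static_cast<std::size_t>(i) * components;
        assert(first + components <= attrib.size()); // index must be in range
        flat.insert(flat.end(), attrib.begin() + first,
                    attrib.begin() + first + components);
    }
    // The flat array is drawn without an index buffer, e.g. via
    // glDrawArrays()/glMultiDrawArrays() instead of the *Elements() calls.
    return flat;
}
```

The flat arrays are larger than the indexed ones whenever vertices are shared (a quad grows from four to six vertices), so the speedup is paid for with GPU memory.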
Keep rendering,
Copyright by Christian Marten, 2013. Last change: 15.11.2013