Optimizations
Although the geometry's creation is very fast and we are rendering the mesh using only a small number of long triangle strips (usually about some hundred strips per frame), there are quite a lot of optimizations we can do to increase the performance on the side of the processor as well as the graphics card.
As described in the section "Materials" we use a multi-pass rendering approach to apply more than one material to the ground. In the common case most materials will be used only in small parts of the landscape and be invisible in most others. The alpha-channel of the material's lightmap defines where which material is visible. Of course it's a waste of GPU bandwidth to render materials on patches which don't use that material all together (where the material's alpha channel is completely zero in the corresponding patch's part).
It's easy to see, that if the part of a material's alpha-channel which covers one distinct patch is completely set to zero, then this patch does not need to be rendered with that material. Assuming that the materials' alpha channels won't change during runtime we can calculate for each patch which materials will be visible and which won't in a pre-processing step. Later at runtime, only those passes are rendered, which really contribute to the final image.
Another important optimization is to reduce the number of patches which need to be rendered at all. This is done in three steps. First a rectangle which covers the projection of the viewing frustum onto the ground plane is calculated. All patches outside that rectangle will surely not be visible. All remaining patches are culled against the viewing frustum. To do this we clip the patches' bounding boxes against all six sides of the viewing frustum. All remaining patches are guaranteed to lie at least partially inside the camera's visible area. Nevertheless, not all of these remaining patches will necessarily be visible, because some of them will probably be hidden from other patches (e.g. a mountain). To optimize this case we can finally use a PVS (Potentially Visible Sets) algorithm to further reduce the number of patches needed to be rendered.
PVS [Air91, Tel91] is used to determine, at runtime, which patches can be seen from a given position and which are hidden by other objects (in our case also patches). Depending on the type of landscape and the viewers position a lot of patches can be removed this way. In Figure 7 the camera is placed in a valley and looks at a hill.
![](http://www.gamedev.net/columns/hardcore/geomorph/vis_01.jpg) |
|
![](http://www.gamedev.net/columns/hardcore/geomorph/vis_01b.gif) |
|
![](http://www.gamedev.net/columns/hardcore/geomorph/vis_01a.gif) |
Figure 7a: Final Image |
|
Figure 7b: Without PVS |
|
Figure 7c: With PVS |
Figure 7b shows that a lot of triangles are rendered which do not contribute to the final image, because they are hidden by the front triangles forming the hill. Figure 7c shows how PVS can successfully remove most of those triangles. The nice thing about PVS is that the cost of processing power is almost zero at runtime because most calculations are done offline when the terrain is designed.
In order to (pre-) calculate a PVS the area of interest is divided into smaller parts. In our case it is obvious to use the patches for those parts. For example a landscape consisting of 16x16 patches requires 16x16 cells on the ground plane (z=0). To allow the camera to move up and down, it is necessary to have several layers of such cells. Tests have shown that 32 layers in a range of three times the height of the landscape are enough for fine graded PVS usage.
One problem with PVS is the large amount of memory needed to store all the visibility data. In a landscape with 16x16 patches and 32 layers of PVS data we get 8192 PVS cells. For each cell we have to store which of all the 16x16 patches are visible from that cell. This means that we have to store more than two million values. Fortunately we only need to store one bit values (visible/not visible) and can save the PVS as a bit field which results in a 256Kbyte data file in this example case.
![](http://www.gamedev.net/columns/hardcore/geomorph/visfromtop.gif) |
Figure 8: PVS from Top View. Camera sits in the valley in the middle of the green dots |
Figure 8 shows an example image from the PVS calculation application where the camera is located in the center of the valley (black part in the middle of the green dots). All red dots resemble those patches, which are not visible from that location. Determining whether a patch is visible from a location is done by using an LOS (line of sight) algorithm which tracks a line from the viewer's position to the patch's position. If the line does not hit the landscape on its way to the patch then this patch is visible from that location.
To optimize memory requirements the renderer distinguishes between patches which are active (currently visible) and those which aren't. Only those patches which are currently active are fully resident in memory. The memory footprint of inactive patches is rather low (about 200 bytes per patch).
Geomorphing in Hardware
Doing geomorphing for a single patch basically means doing vertex tweening between the current tessellation level and the next finer one. The tessellation level calculation returns a tessellation factor in form of a floating point value where the integer part means the current level and the fractional part denotes the tweening factor. E.g. a factor of 2.46 means that tweening is done between levels 2 and 3 and the tweening factor is 0.46. Tweening between two mesh representations is a well known technique in computer graphics and easily allows an implementation of morphing for one single patch (vertices that should not move simply have the same position in both representations).
The problem becomes more difficult if a patch's neighbors are considered. Problems start with the shared border vertices which can only follow one of the two patches but not both (unless we accept gaps). As a consequence one patch has to adapt its border vertices to those of its neighbor. In order to do correct geomorphing it is necessary that the finer patch allows the coarser one to dictate the border vertices' position. This means that we do not only have to care about one tweening factor as in the single patch case but have to add four more factors for the four shared neighbor vertices. Since the vertex shader can not distinguish between interior and border vertices these five factors have to be applied to all vertices of a patch. So we are doing a tweening between five meshes.
As if this wasn't already enough, we also have to take special care of the inner neighbor vertices of the border vertices. Unfortunately these vertices also need their own tweening factor in order to allow correct vertex insertion (when switching to a finer tessellation level). To point out this quite complicated situation more clearly we go back to the example of figure 6b. For example we state that the patch's left border follows its coarser left neighbor. Then the tweening factor of vertex '1' is depends on the left neighbor, whereas the tweening factor of all interior vertices (such as vertex '2') depend on the patch itself. When the patch reaches its next finer tessellation level (figure 6a), the new vertex 'A' is inserted. Figure 9 shows the range in which the vertices '1' and '2' can move and the - in this area lying - range in which vertex 'A' has to be inserted. (Recall that a newly inserted vertex must always lie in the middle of its preexisting neighbors). To make it clear why vertex 'A' needs its own tweening factor suppose that the vertices '1' and '2' are both at their bottom position when 'A' is inserted (tweeningL and tweeningI are both 0.0). Later on when 'A' is removed the vertices '1' and '2' might lie somewhere else and 'A' would now probably not lie in the middle between those two if it had the same tweening factor as vertex '1' or vertex '2'. The consequence is that vertex 'A' must have a tweening factor (tweeningA) which depends on both the factor of vertex '1' (tweeningL - the factor from the left neighboring patch) and on that of vertex '2' (tweeningI - the factor by that all interior vertices are tweened).
![](http://www.gamedev.net/columns/hardcore/geomorph/morphing.gif) |
Figure 9: Vertex insertion/removal range |
What we want is the following:
Vertex 'A' should
- be inserted/removed in the middle between the positions of Vertex '1' and Vertex '2'
- not pop when the patch switches to another tessellation level
- not pop when the left neighbor switches to another tessellation level.
The simple formula tweeningA = (1.0-tweeningL) * tweeningI does the job. Each side of a patch has such a 'tweeningA' which results in four additional tessellation levels.
Summing this up we result in having have 9 tessellation levels which must all be combined every frame for each vertex. What we actually do in order to calculate the final position of a vertex is the following:
PosFinal = PosBase + tweeningI*dI + tweeningL*dL + tweeningR*dR + tweeningT*dT + ?/P>
Since we only morph in one direction (as there is no reason to morph other than up/down in a heightmap generated terrain) this results in nine multiplications and nine additions just for the geomorphing task (not taking into account any matrix multiplications for transformation). This would be quite slow in terms of performance doing it on the CPU. Fortunately the GPU provides us with an ideal operation for our problem. The vertex shader command dp4 can multiply four values with four other values and sum the products up in just one instruction. This allows us to do all these calculations in just five instructions which is only slightly more than a single 4x4 matrix multiplication takes.
The following code snippet shows the vertex data and constants layout that is pushed onto the graphics card.
; Constants specified by the app
;
; c0 = (factorSelf, 0.0f, 0.5f, 1.0f)
; c2 = (factorLeft, factorLeft2, factorRight, factorRight2),
; c3 = (factorBottom, factorBottom2, factorTop, factorTop2)
;
; c4-c7 = WorldViewProjection Matrix
; c8-c11 = Pass 0 Texture Matrix
;
;
; Vertex components (as specified in the vertex DECLARATION)
;
; v0 = (posX, posZ, texX, texY)
; v1 = (posY, yMoveSelf, 0.0, 1.0)
; v2 = (yMoveLeft, yMoveLeft2, yMoveRight, yMoveRight2)
; v3 = (yMoveBottom, yMoveBottom2, yMoveTop, yMoveTop2)
We see that only four vectors are needed to describe each vertex including all tweening. Note that those vectors v0-v3 do not change as long as the patch is not retessellated and are therefore good candidates for static vertex buffers.
The following code shows how vertices are tweened and transformed by the view/projection matrix.
;-------------------------------------------------------------------------
; Vertex transformation
;-------------------------------------------------------------------------
mov r0, v0.xzyy ; build the base vertex
mov r0.w, c0.w ; set w-component to 1.0
dp4 r1.x, v2, c2 ; calc all left and right neighbor tweening
dp4 r1.y, v3, c3 ; calc all bottom and top neighbor tweening
mad r0.y, v1.y, c0.x, v1.x ; add factorSelf*yMoveSelf
add r0.y, r0.y, r1.x ; add left & right factors
add r0.y, r0.y, r1.y ; add bottom & top factors
m4x4 r3, r0, c4 ; matrix transformation
mov oPos, r3
While this code could surely be further optimized there is no real reason to do so, since it is already very short for a typical vertex shader.
Finally there is only texture coordinate transformation.
;-------------------------------------------------------------------------
; Texture coordinates
;-------------------------------------------------------------------------
; Create tex coords for pass 0 - material (use texture matrix)
dp4 oT0.x, v0.z, c8
dp4 oT0.y, v0.w, c9
; Create tex coords for pass 1 - lightmap (simple copy, no transformation)
mov oT1.xy, v0.zw
oT0 is multiplied by the texture matrix to allow scaling, rotation and movement of materials and cloud shadows. oT1 is not transformed since the texture coordinates for the lightmap do not change and always span (0,0)-(1,1). |