"Dino 2" Technical Demonstration
Programming: James Russell
Modelling/Animation/Textures: Mark Prettyman
Additional Animation: Barry Scott
Sound: Jason Page
Tools: Mark Breugelmans
Background
The original "Dino" demo, which showed an animated T-Rex walking on a blank
background, was one of the first PlayStation programs to show off the power
of the PlayStation platform. Years down the track, it was decided to update
the demo to showcase newer technologies such as the Dual Shock vibrating
controller and the HMD file format.
Specification
This demonstration was designed primarily to give new users a Dual Shock
experience. In addition, the source would be provided as sample code to
help developers use the Dual Shock and give concrete examples of the variety
of ways the HMD format can be used.
A rough storyboard of the demo would start off with the camera facing
down into the pond, which is reflecting the sky. A 'thud' would be heard,
and the vibration of the thud would cause the water in the pond to ripple and
the Dual Shock to vibrate accordingly. More 'thuds' would be heard
as the dinosaur approached, until the picture shows its horns reflected
in the water. The camera dollies back and pans up to show the dinosaur,
which roars and moves backwards. From this point on, the user has control
of both the camera and the dinosaur. The dinosaur's environment is a forest
clearing, and the dinosaur can walk around the pond. The user can make
the dinosaur move backwards or forwards, move its head, wag its tail and
stomp its foot. When the dinosaur stomps its foot, it causes the ground
to shake. The camera can be moved around the clearing, and can be dollied
in or out. Periodically, the nearby volcano erupts, which also causes the
ground to shake.
Originally, the pond was to show a true reflection of the environment
and the dinosaur in such a way that the reflection itself could be rippled
realistically. Unfortunately, CPU constraints meant that this was not feasible
at a full frame rate. Instead, the ripple is shown only in the initial
sequence, where the user does not have control; as the camera pans out, the
scene is changed so that the reflection looks correct, but without the
ripple. For more detail, see below.
After running trial versions of the demo through the DTL-2700 Performance
Analyzer, many optimisations were made to increase the speed and resolution
of the demo. The demo runs at 50 frames per second at a resolution of 512x256.
It processes about 1800 polygons per frame, and draws up to 1300 polygons
per frame.
HMD
The HMD (Hierarchical Modelling Format) file format is a general purpose,
flexible and versatile format that includes multiple co-ordinate systems,
images, skinning (also known as shared polygons) and animation, among other
categories. HMD allows you to quickly and easily build programs, and extend
the functionality of the format where needed. HMD is used for most of the
models in this demonstration in a variety of ways.
Demo coding overview
The general structure of the demo is very simple. There are two 'modes'
of operation. The first mode is the initial sequence where the user does
not have control, and the application is performing operations like the
ripple and moving the dinosaur. This is called the Intro mode. The second
mode is where the camera pans back, and the user has control. This is called
the Main mode. After the Intro mode has finished, the application switches
to the Main mode and stays in that mode until the demo exits. The way drawing
is performed depends on which mode is currently active.
Notes:
-
All textures and sounds are loaded once into VRAM and SRAM respectively
at the beginning of the demo. Once there, the main RAM that they occupy
can be reclaimed.
-
If a Dual Shock controller is inserted, the program will change it into
analog mode and lock it there.
Intro mode
In Intro mode, the requirement is to draw the reflection of the scene in
the pond, rippling it if necessary (the rippling is caused by the dinosaur's
footfall), and integrate the rippled pond with the rest of the environment.
Because the original specification called for the reflected environment
in the pond to be rippled at any time, it was necessary to first render
this environment to an off screen buffer, then use this off screen buffer
as a texture for the pond, distorting the U/V co-ordinates in such a way
as to create the illusion of the water rippling. Initial experiments were
promising, but further down the track it was found to be too computationally
expensive to get an accurate result from an arbitrary camera position.
With this in mind, the specification was altered so that the ripple only
occurs in the period of time where the camera is not under the user's control.
The basic loop for the Intro mode is to:
-
Send the last frame's Ordering Tables to the GPU.
-
Process the sound.
-
Process the controller.
-
Prepare for drawing by clearing the Ordering Tables.
-
Move the objects in the scene.
-
Draw the sky to the off screen buffer.
-
Render the reflection to the off screen buffer.
-
Iterate the ripple effect.
-
Render the environment, then the pond and finally the dinosaur.
This is very similar to the Main mode's loop, so we just use a flag to
change behaviour where the loops differ.
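The shared loop can be sketched roughly as follows. The step names, FrameLog and frame() are hypothetical stand-ins, not the demo's source. Each stage just records its name, so the order of operations in each mode is easy to see and check. The only branch is on the mode flag, which is where the Intro and Main loops differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers, one per stage of the frame loop. */
typedef enum {
    STEP_SEND_OT, STEP_SOUND, STEP_PAD, STEP_CLEAR_OT, STEP_MOVE,
    STEP_SKY_OFFSCREEN,
    STEP_REFLECT_OFFSCREEN, STEP_RIPPLE,   /* Intro mode only */
    STEP_COPY_SKY, STEP_REFLECT_DIRECT,    /* Main mode only */
    STEP_ENVIRONMENT, STEP_POND, STEP_DINOSAUR
} Step;

typedef struct {
    Step   log[16];
    size_t count;
} FrameLog;

/* Stand-in for the real work: only records the order of operations. */
static void run_step(FrameLog *f, Step s)
{
    if (f->count < sizeof f->log / sizeof f->log[0])
        f->log[f->count++] = s;
}

/* One pass of the shared loop; intro_mode selects the stages that differ. */
void frame(FrameLog *f, bool intro_mode)
{
    run_step(f, STEP_SEND_OT);        /* last frame's OTs to the GPU */
    run_step(f, STEP_SOUND);
    run_step(f, STEP_PAD);
    run_step(f, STEP_CLEAR_OT);
    run_step(f, STEP_MOVE);
    run_step(f, STEP_SKY_OFFSCREEN);
    if (intro_mode) {
        run_step(f, STEP_REFLECT_OFFSCREEN);
        run_step(f, STEP_RIPPLE);
        run_step(f, STEP_ENVIRONMENT);
        run_step(f, STEP_POND);       /* off screen buffer used as texture */
        run_step(f, STEP_DINOSAUR);
    } else {
        run_step(f, STEP_COPY_SKY);   /* off screen sky to main buffer */
        run_step(f, STEP_REFLECT_DIRECT);
        run_step(f, STEP_ENVIRONMENT);
        run_step(f, STEP_DINOSAUR);
    }
}
```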
Notes:
-
The sky texture should cover the entire area of the pond, and when the
camera moves it gives the illusion of looking a long way off into the distance.
To perform this effect, the off screen buffer is filled with a sky texture
that is tiled across the whole area using tiled textures and a SPRT. By
adjusting the U/Vs with respect to the camera position, this gives the
illusion of distance. This method is better than using a simple TMD for
the sky, because it guarantees that the entire off screen area will
be covered. With a TMD, it is difficult to move the model appropriately
to fill the area accurately. In addition, using a SPRT instead of a POLY_FT4
is faster. We also take advantage of the texture cache by making the sky
tile less than 64x64 pixels in size.
-
The water rippling algorithm is the 'standard' ripple effect. That is,
a rectangular array of water heights is kept, and two separate rectangular
arrays hold the velocity of the flow of water between adjacent water heights.
The water heights affect the velocities, and in turn the velocities affect
the water heights. A very realistic simulation of water flow can be achieved
this way. Originally, the rippled texture was rendered on top of the
ground using a rectangular array of quadrangles, and the U/V co-ordinates
of the quadrangles related to the velocity of the water at that point.
This was simple to code, but looked bad where the quadrangles overlapped
the boundary of the pond. The final solution was to create a "spider's
web" of polygons which completely filled the area of the pond, but did
not extend over the edge. However, since the U/Vs of the polygons in this
arrangement do not map easily to a velocity that will perturb them, the
program uses the element of the velocity arrays which is closest to each
polygon vertex.
-
The Gs libraries are used to initialise the double buffering structures.
However, because the way we are drawing is rather non-standard (using an
off screen buffer), it is more efficient to take over the responsibility
of setting the drawing and display areas ourselves.
-
Three OTs are used in Intro mode:
-
The ReflectionOT contains any polygons that should be reflected in the
pond: the trees and the dinosaur. The forest clearing is
not reflected. The ReflectionOT is used for a different purpose in Main
mode (see below).
-
The GroundOT always contains the environment (without the innermost tree).
-
The MainOT contains the innermost tree and the dinosaur. Because of the
intrinsic nature of the models and camera constraints, we know that the
reflection is drawn first (to the off screen buffer). The ground is drawn
next to the normal screen buffer, since it is always behind everything
else. The water (using the reflection as a texture) is drawn next, on top
of the ground, and finally the MainOT is drawn on top of that. The innermost
tree and dinosaur are sorted into the same OT because it is not certain
without some calculation whether the dinosaur is in front of the tree or
vice versa.
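Below is a minimal sketch of the standard ripple update and the nearest-sample U/V lookup. It uses portable C with floats rather than the fixed-point arithmetic a PlayStation program would normally use. The grid size, FLOW_GAIN, DAMPING and the split of the two velocity arrays into horizontal and vertical neighbours are all assumptions. Every function name is hypothetical. Water only moves between neighbouring cells, so the total height is conserved.

```c
#include <string.h>

#define GRID_W 16
#define GRID_H 16
#define FLOW_GAIN 0.25f   /* assumed: how strongly height differences drive flow */
#define DAMPING   0.98f   /* assumed: lets the ripples die away */

typedef struct {
    float h[GRID_H][GRID_W];        /* water height at each grid point */
    float vx[GRID_H][GRID_W - 1];   /* flow from (x,y) to (x+1,y) */
    float vy[GRID_H - 1][GRID_W];   /* flow from (x,y) to (x,y+1) */
} Ripple;

void ripple_init(Ripple *r) { memset(r, 0, sizeof *r); }

/* A footfall: push the water down at one grid point. */
void ripple_drop(Ripple *r, int x, int y, float amount) { r->h[y][x] -= amount; }

void ripple_step(Ripple *r)
{
    int x, y;
    /* 1. Height differences accelerate the flow between neighbours. */
    for (y = 0; y < GRID_H; y++)
        for (x = 0; x < GRID_W - 1; x++)
            r->vx[y][x] = (r->vx[y][x] + FLOW_GAIN * (r->h[y][x] - r->h[y][x + 1])) * DAMPING;
    for (y = 0; y < GRID_H - 1; y++)
        for (x = 0; x < GRID_W; x++)
            r->vy[y][x] = (r->vy[y][x] + FLOW_GAIN * (r->h[y][x] - r->h[y + 1][x])) * DAMPING;
    /* 2. The flow moves water from one cell to its neighbour. */
    for (y = 0; y < GRID_H; y++)
        for (x = 0; x < GRID_W - 1; x++) {
            r->h[y][x]     -= r->vx[y][x];
            r->h[y][x + 1] += r->vx[y][x];
        }
    for (y = 0; y < GRID_H - 1; y++)
        for (x = 0; x < GRID_W; x++) {
            r->h[y][x]     -= r->vy[y][x];
            r->h[y + 1][x] += r->vy[y][x];
        }
}

static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* Perturb a pond vertex's U/V with the velocity samples nearest to it.
   px, py are the vertex position in grid units (an assumption). */
void ripple_perturb_uv(const Ripple *r, float px, float py, float scale,
                       float *u, float *v)
{
    int ix = (int)(px + 0.5f), iy = (int)(py + 0.5f);   /* nearest grid point */
    *u += scale * r->vx[clampi(iy, 0, GRID_H - 1)][clampi(ix, 0, GRID_W - 2)];
    *v += scale * r->vy[clampi(iy, 0, GRID_H - 2)][clampi(ix, 0, GRID_W - 1)];
}
```

A spider's-web vertex at any position inside the pond can call ripple_perturb_uv() without aligning to the grid. This is the point of the nearest-sample lookup described above.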
Optimisations:
-
This demo had a lot of main memory to spare, so space efficiency is not
an issue. There are some optimisations that have not been performed which
would have saved a lot of memory (like single buffered primitive heaps,
etc.), but since there was so much memory available, we decided to make
the code look simpler instead.
-
The VSync() function waits for the vertical blank to occur (when the TV
scanline is at the bottom of the screen), but then, instead of returning
immediately, waits until the TV scanline has started at the top of the screen.
The reason is a GPU hardware bug that manifests itself only in interlaced
mode (the GPU will always draw to even scanlines in the period between the
VBlank and the first scanline). Because we are not using interlaced mode,
this wait is unnecessary. To reclaim this time, we use an alternative method.
The application's VSyncCallback function is set to clear a flag when the
VBlank occurs and the application is ready (it will not clear the flag if
the application is not ready). To wait for the VBlank, the application sets
the flag, and when the flag changes state (because the VSyncCallback has
cleared it), the application continues. This method can regain between 5%
and 7% of CPU frame time.
-
The Ordering Tables initially contain a linked list of null GPU packets.
These packets are necessary in the construction of the final OT, but not
necessary for the GPU. However, since these packets still take time for
the DMA to process, they can be removed from the OTs after they have been
constructed. The function CrunchOT() will process an Ordering Table and
re-adjust the pointers of the primitives so that they skip over the null
entries. For long Ordering Tables this can significantly reduce the time
taken by the GPU/DMA to process the OT. Performing this optimisation gained
about 7% GPU time.
-
Where possible, it is faster to construct and link GPU primitives once and
then send the same list to the GPU, rather than constructing them from scratch
each frame. The demo does this wherever it can, particularly for changing
drawing areas and GPU drawing attributes.
-
The demo contains two very similar environment models. One is used for
the Intro mode, and contains polygons underneath the pond's surface. The
pond's surface (using the off screen buffer as a texture) is blended in
using semi-transparent polygons. The other environment model is used in
Main mode and is almost identical, except that the bed of the pond is
designated as semi-transparent. This is because in Main mode the pond's
surface is drawn before the ground, whereas in Intro mode it is drawn
after the ground.
-
When using the Performance Analyzer, it seemed that there was a large amount
of cache thrash between the functions VSync and DrawSync right up until
the end of the frame. After some investigation, it was found that the DrawSync
function was thrashing with another function that it uses in another library.
By changing the order of the libraries in the link file, this thrash was
removed. This doesn't affect CPU speed much, since all DrawSync does is
loop until the GPU has finished, but removing the cause of the cache thrash
made it easier to see performance problems with the Performance Analyzer
software.
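The relinking that CrunchOT() performs can be sketched in portable C. This is not the demo's code. On the PlayStation, each OT entry and primitive starts with a 32-bit DMA tag: a 24-bit address of the next packet plus an 8-bit word count, with an end-of-list terminator. In this sketch the hypothetical Packet struct stands in for that tag and a NULL link stands in for the terminator. A packet with size 0 is an empty OT entry.

```c
#include <stddef.h>

typedef struct Packet {
    struct Packet *next;   /* DMA link to the next packet */
    int size;              /* words of GPU data; 0 = empty OT entry */
} Packet;

/* Relinks the chain so that every packet points past any run of empty
   entries. The head is kept so the DMA start address does not change.
   Returns the number of empty entries that were skipped. */
size_t crunch_ot(Packet *head)
{
    size_t skipped = 0;
    Packet *p = head;
    while (p != NULL) {
        Packet *n = p->next;
        while (n != NULL && n->size == 0) {   /* skip null packets */
            n = n->next;
            skipped++;
        }
        p->next = n;
        p = n;
    }
    return skipped;
}

/* Number of packets the DMA would walk, starting from head. */
size_t chain_length(const Packet *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

The DMA cost scales with the number of links walked, so a long, sparsely filled OT gains the most from this pass.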
Main mode
The Main mode of the program differs from the Intro mode: the drawing is
performed quite differently, and even a different Environment HMD is used.
The basic loop for the Main mode is to:
-
Send the last frame's Ordering Tables to the GPU.
-
Process the sound.
-
Process the controller.
-
Prepare for drawing by clearing the Ordering Tables.
-
Move the objects in the scene.
-
Draw the sky to the off screen buffer.
-
Render the off screen buffer to the main screen buffer (this contains only
the sky).
-
Render the reflected environment (trees and dinosaur) to the main screen
buffer.
-
Render the environment and finally the dinosaur to the main screen buffer.
The Main mode differs from the Intro mode in the way the reflection is
performed. The Intro mode renders the reflected polygons to the off screen
buffer, but the Main mode renders them straight to the main screen buffer
using a different camera angle and position to give the illusion of reflection.
In addition, the semi-trans polygons of the pond in the Intro mode cause
the reflection to be darker than the 'real' image. To simulate this in
Main mode, the drivers will change the RGB of any reflected polygon to
be (32,32,32) instead of (128,128,128).
-
Animation. The animation of the dinosaur is fairly simple. There are a
number of different 'states' the dinosaur can be in, and although the user
may want it to be in one state ("Walking Forwards"), the dino will have
to travel through more than one state to smoothly reach the desired state.
With a little thought and very little programming, this could have been
achieved with the current implementation of HMD animation, but at the time
the tools were not available for an HMD animation export. Effectively the
HMD format can perform a function similar to the AnimNextState() function
(defined within this demo), where the current state changes to a user specified
state using a jumptable embedded in the animation data. See below for a
diagram of these states.
-
The texture animation of the volcano erupting in the background is performed
by overwriting data in the HMD file. Specifically, when we want to change
animation frames, the polygon information concerning that particular quadrangle
is overwritten to change between textures. With a TMD file, finding the
address of this polygon information is difficult and error-prone if the
artist were to change the TMD often. With HMD, however, it is very simple,
and we take advantage of the HMD assembler to do so. After the HMD Header
Section in the environment LAB file, we put a reference to the label VOLCANO.
Using HMDTool, we can work out the polygon number of the quadrangle we
want to animate. We find this particular quadrangle in the human-readable
LAB file and put the label VOLCANO: next to it. When we assemble the
file, we can then find the address of VOLCANO by using the reference created
at the end of the HMD Header Section. In this way, the assembler does the
calculation for us. This technique is also used to adjust the level of
semi-transparency of the dinosaur's shadow.
-
To create the illusion of reflection, the models that have to be reflected
are simply drawn upside down. This is achieved by rotating the camera 180
degrees and mirroring its position with respect to the ground. However,
if the view is simply rotated, then the reflected objects will need to
be flipped about the X axis. This is simple to do by adjusting the World->Screen
matrix, but changes the winding order of the polygons so that visible polygons
return negative outer products instead of positive ones. To get around
this, the HMD drivers use a flag to determine whether they should negate
the outer product result so that the visible reflected polygons have a
positive outer product. This flag is set only when we are processing 'reflected'
polygons. So the process of creating a reflected image is to create a translated,
rotated and flipped Local->Screen Matrix, and set a flag so that the driver
both negates the outer product and resets the RGB value to (32,32,32).
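The driver-side half of the reflection trick can be sketched as below. DriverFlags, driver_accept() and the triangle-only interface are hypothetical. The real drivers are MIPS assembler and work on the GTE's outer product result. The sketch shows only the logic. When the reflecting flag is set, the outer product is negated. A positive result is treated as visible (the sign convention here is illustrative, since screen Y grows downwards on the PlayStation). Visible reflected polygons get the colour (32,32,32).

```c
#include <stdbool.h>

typedef struct { int x, y; } ScreenXY;
typedef struct { unsigned char r, g, b; } RGB;

/* Hypothetical flag the application sets while sorting reflected models. */
typedef struct {
    bool reflecting;
} DriverFlags;

/* 2D outer (cross) product of the triangle's screen edges; its sign
   encodes the winding order. */
long outer_product(ScreenXY a, ScreenXY b, ScreenXY c)
{
    return (long)(b.x - a.x) * (c.y - a.y) - (long)(c.x - a.x) * (b.y - a.y);
}

/* Returns true if the triangle should be drawn, adjusting its colour. */
bool driver_accept(const DriverFlags *f, ScreenXY a, ScreenXY b, ScreenXY c,
                   RGB *colour)
{
    long op = outer_product(a, b, c);
    if (f->reflecting)
        op = -op;                 /* mirroring X reversed the winding order */
    if (op <= 0)
        return false;             /* back-facing: not drawn */
    if (f->reflecting)
        colour->r = colour->g = colour->b = 32;   /* darker reflection */
    return true;
}
```

Mirroring a visible triangle's X co-ordinates turns it "back-facing". With the flag set, the negation makes it visible again.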
Animation format
The animation format used was not HMD, because the tools were not available
at the time to create the necessary data. Instead, a very simple format
that was created by Sony's 1st party publishers was used. This animation
format contains a number of keyframes, and for each keyframe it holds the
translation of the root node of the model and the rotations of each co-ordinate
system in the model's hierarchy. This format suffices for most purposes,
but as a downside it is not possible to 'stretch' a bone (alter the
translation at any node other than root). However it is a very simple format,
and easy to integrate into the demo.
To set the model to a particular animation frame, all that is required
is to change the root translation and create the 3x3 matrix (using RotMatrix)
for each co-ordinate system based on the 3 Euler angles for that frame/coordinate
system (stored in the animation file).
There are 3 animation files in use here. The first is the 'back off'
animation, where the dinosaur moves away from the pond and into the starting
position. The second and third files are a collection of the animation
'states' that the dinosaur can be in when walking forwards and backwards
respectively.
For both forwards and backwards movement, the dinosaur's animation has
3 'state change' positions, and there are animations for moving between
these frames. The 3 keyframes could be described as 'stopped' (where the
dinosaur is not moving its feet), 'walk1' (the end of the first part of
the stride), and 'walk2' (the end of the second part of the stride). There
are 6 animations to move between these state change positions, and 2 extra
animations: one for when the dinosaur stomps its foot, and one for when it
shifts its weight (which it does after a certain length of inactivity).
In this diagram, the circles represent the foot positions of the dino.
They are called Stop, Walk 1, Walk 2, Backwalk 1 and Backwalk 2. The arrows
indicate animations between these states. The function AnimNextState() will
return the next animation to play given a current state and a desired state.
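A table-driven version of the idea behind AnimNextState() might look like the sketch below. The transitions in the table are illustrative and not taken from the demo's animation data. anim_next_state(), FootPos and Wish are hypothetical names. Each entry gives the foot position at the end of the next animation to play. Repeatedly calling the function therefore walks the dinosaur smoothly through intermediate states until it reaches the desired motion.

```c
/* Foot positions from the state diagram. */
typedef enum { POS_STOP, POS_WALK1, POS_WALK2, POS_BACK1, POS_BACK2, POS_COUNT } FootPos;

/* What the player is asking for. */
typedef enum { WANT_STOP, WANT_FORWARD, WANT_BACKWARD, WANT_COUNT } Wish;

/* Hypothetical transitions: next_pos[current][wish]. */
static const FootPos next_pos[POS_COUNT][WANT_COUNT] = {
    /*              STOP       FORWARD    BACKWARD  */
    /* STOP  */ { POS_STOP,  POS_WALK1, POS_BACK1 },
    /* WALK1 */ { POS_STOP,  POS_WALK2, POS_STOP  },
    /* WALK2 */ { POS_WALK1, POS_WALK1, POS_WALK1 },   /* finish the stride */
    /* BACK1 */ { POS_STOP,  POS_STOP,  POS_BACK2 },
    /* BACK2 */ { POS_BACK1, POS_BACK1, POS_BACK1 },
};

/* The animation to play next runs from current to the returned position. */
FootPos anim_next_state(FootPos current, Wish wish)
{
    return next_pos[current][wish];
}
```

For example, a dinosaur at Walk 2 that is asked to walk backwards passes through Walk 1 and Stop before starting Backwalk 1.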
HMD coding
There are 6 HMD files used by the demo:
-
Dinosaur HMD. Contains 22 co-ordinate systems, and includes skinning.
-
Shadow HMD. Contains 1 semi-transparent polygon which is the dinosaur's
shadow. This is not integrated into the dinosaur's model because the
co-ordinate system it attaches to depends on which animation file is
being used. The 'backoff' animation file requires the shadow to be attached
to the dinosaur's root node, while the other animation files require the
shadow to be attached to the handler node (which is one above the root node).
-
Environment HMD. Contains the outer cylinder (which is textured with trees),
the ground, and the 2 outermost 'real' trees. The bed of the pond consists
of normal polygons, as the water will be added on top using semi-trans
polygons. This model is used only in Intro Mode.
-
Environment HMD NP. Identical to the Environment HMD model, except the
bed of the pond consists of semi-transparent polygons. This is because
in Main mode, the reflection is drawn to the main screen before the environment.
-
Innermost tree HMD. This is the tree that stands next to the pond. It is
separated from the environment HMD because it needs to be sorted into a
different OT. We can be sure that the dinosaur will always be drawn after
the two outermost trees, but not the innermost tree, so we sort this innermost
tree into the same OT as the dinosaur.
-
Trees HMD. This contains the 3 trees in the environment. They are used
in the reflection only. The difference between this model's trees and the
trees in the Environment HMD files above is that these trees have tops
(with leaves and branches). This is because they will be seen in the reflection.
The camera constraints in the Environment HMD model mean that the tops
of the trees will never be visible, so they have been removed from those
models.
These files were created in Alias|Wavefront, and exported using
the CHR exporter and CHR2LAB (available from the SCEE website). Some of
the LAB files were edited by hand.
With the exception of the dinosaur and its shadow, standard Gs co-ordinate
systems were used for all HMDs.
-
Standard GS co-ordinate systems were used for the dinosaur in the initial
stages of development. However, it was inefficient for the CPU to calculate
the Local->World rotation matrices for both the reflection and the normal
dinosaur, when both were identical (even though their Local->Screen matrices
were not). To this end, the Local->World matrices are calculated just once,
and used twice. This minimises the number of matrix multiplications required.
-
The World->Screen matrix calculated by the program is stored in GsWSMATRIX
because all of the HMDs (except the dinosaur) use the standard GS functions
to calculate their Local->Screen matrix. The Gs functions will use the
coordinate structures internal to the HMD to calculate the Local to World
coordinate system, then use the matrix in GsWSMATRIX to calculate the Local->World->Screen
matrix.
-
The standard HMD primitive drivers were used for most of the development.
Performance analysis on the program revealed that there was a lot of unnecessary
GPU activity due to primitives being drawn completely off screen. To get
around this, the original drivers were rewritten to include trivial rejection
clipping routines. These clipping routines were optimised for the constraints
of the demo by taking advantage of the fact that the origin was in the
center of the screen.
-
When the dinosaur is drawn for the reflection, the X is flipped (by negating
the m[0][0] term in the GsWSMATRIX) after a 180 degree rotation to achieve
the mirror effect. However, this also inverts the winding order. To flip
the winding order back to what it should be, the drivers have also been
updated to examine a flag which the program sets when it is processing
the reflected triceratops.
-
The SN linker psylink will include any functions from a library
in the final executable if those functions are referenced. However, the
definition of "referenced" encompasses 'extern' references, even if there
are no other references. Thus, if you 'extern' the function ABC()
which resides inside a library, the linker will include it even if there
are no other references to ABC() in the source file. To regain
the memory taken up by unused functions, HMD initialisation code should
not include any references, including 'extern's, to unused functions. The
standard functions which map primitive codes to drivers need to be adjusted
so that they do not include any references to unused drivers. You can use
the HMDCodes utility (available from the SCEE website) to view which primitive
codes you're using.
-
The makefile will automatically convert a new CHR file into a LAB file,
then assemble it into an HMD file, then convert that into a form suitable
for inclusion into the program. In the case of the Environment, however,
the VOLCANO labels have to be added to the LAB file by hand, so whenever
the Environment CHR changed, it was necessary to re-adjust the LAB file
and then re-make.
-
The drivers used for initial development were the standard HMD drivers.
When the special requirements of the demo meant that these were unsuitable,
they were adjusted to include trivial rejection clipping, colour replacement
(to make the reflection polygons darker), and winding order reversal (so
the reflected polygons were judged to be visible in a correct fashion).
This required some register saving (in a slightly non-standard manner,
since these functions are still leaf functions). Additional optimisations
that could have been made here include:
-
The usage of the D-Cache. The address of a scratch area, usually the D-Cache,
is passed to GsSortUnit, which uses it to create the structure passed to
the driver. This structure (which includes the primitive header) by no
means takes up the full 1024 bytes of the D-Cache, so it would be possible
to store registers here instead of on the stack, saving a few cycles.
-
The usage of the HI/LO registers. Temporary saves can be made to these
registers, but this was not done because the code would have been even
more unreadable.
-
Synthetic instructions, which expand to more than one MIPS instruction
and usually involve the AT (assembler temp) register, could have been completely
removed. This would have meant one less register to save.
-
There are 5 separate drivers used in this demo. However, if you are using
HMD you may only want to maintain one or two, even though your model may
use more. Or alternatively, you may want to pass in certain parameters
for a particular special case. In either of these cases, a simple technique
is to create a generic driver that handles many cases, and has multiple
function entry points. Each entry point would set the appropriate parameters
and then continue on to the main body of the function. This keeps the
number of drivers you have to maintain to a minimum.
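A trivial rejection test of the kind added to the drivers might look like the following sketch. The text does not say exactly how the demo's drivers exploit the centred origin. The outcode approach, the constants for a 512x256 display and the function names are therefore assumptions. With the origin at the centre, the left/right and top/bottom bounds are the same constant with opposite signs. A polygon is rejected only when all its vertices lie beyond the same screen edge. This test is cheap but conservative.

```c
typedef struct { short x, y; } Vert2;

#define HALF_W 256   /* 512x256 display with the origin at the centre */
#define HALF_H 128

/* Outcode bits: which screen edges a vertex lies beyond. */
static unsigned outcode(Vert2 v)
{
    unsigned c = 0;
    if (v.x < -HALF_W) c |= 1;   /* left   */
    if (v.x >=  HALF_W) c |= 2;  /* right  */
    if (v.y < -HALF_H) c |= 4;   /* top    */
    if (v.y >=  HALF_H) c |= 8;  /* bottom */
    return c;
}

/* Non-zero if every vertex is off the same side, so the polygon cannot
   touch the screen and need not be sent to the GPU. */
int trivially_rejected(const Vert2 *v, int n)
{
    unsigned all = 0xF;
    for (int i = 0; i < n; i++)
        all &= outcode(v[i]);
    return all != 0;
}
```

A polygon that straddles a screen corner can still be accepted while being fully off screen. That is acceptable here, because the goal is only to remove the bulk of the wasted GPU work cheaply.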
Controller Pad coding
The demo was designed as an application to show off the Dual Shock controller
pad's features. The demo assumes that a Dual Shock is plugged in, but will
still function (without the ability to move the head) if a standard controller
is plugged in instead. Because of this assumption, the program will lock the
Dual Shock into analog mode if it is available. If analog is not available,
the digital pad will mimic the action of the left analog stick.
The function processController() is run every frame, and handles
the pad state changes that can occur when a controller is removed or inserted.
To make the interface simpler for the program, the function maintains
two global flags. One states whether there is a controller inserted and
the communication with it is stable, and the other indicates whether this
controller is in analog mode.
Firstly, the function checks the current state. If the state is stable,
but the validController flag indicates that there was no valid controller
previously, then a new controller must have been inserted and communication
must have been established. Now we know that there is a valid controller,
but if it is a Dual Shock then we have to set the alignment so we can send
vibration data back to the controller, and lock the controller into analog mode.
If the controller is not a Dual Shock, then we can lock neither the mode
nor any digital/analog mode switches.
When a Dual Shock is first inserted and communication becomes stable,
it is necessary to set the alignment with PadSetActAlign so that actuator
information can be sent to the controller. When this function is called,
however, the state changes to "Communicating with Controller", and it is
necessary to wait until communication is stable again before attempting
to lock the controller into a different mode.
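The flag handling in processController() can be sketched as a small state machine. This is not libpad code. PadState, PadType, the Pad struct, set_act_align(), lock_analog() and the analogMode flag are hypothetical stand-ins, and only validController is named in the text. The stand-in for the alignment call deliberately drops the state back to "communicating". This mirrors the behaviour of PadSetActAlign described above, so the lock is attempted only on a later frame, once communication is stable again.

```c
#include <stdbool.h>

/* Simplified, hypothetical stand-ins for the pad library's state. */
typedef enum { PAD_NONE, PAD_COMMUNICATING, PAD_STABLE } PadState;
typedef enum { PAD_DIGITAL, PAD_DUALSHOCK } PadType;

typedef struct {
    PadState state;    /* what the pad library last reported */
    PadType  type;
    bool     aligned;  /* actuator alignment already sent */
    bool     locked;   /* analog mode locked */
} Pad;

bool validController = false;   /* a controller is present and stable */
bool analogMode = false;        /* that controller is in analog mode */

/* Stand-in for the alignment call: communication restarts afterwards. */
static void set_act_align(Pad *p) { p->aligned = true; p->state = PAD_COMMUNICATING; }
/* Stand-in for locking the controller into analog mode. */
static void lock_analog(Pad *p)   { p->locked = true; }

void process_controller(Pad *p)
{
    if (p->state == PAD_NONE) {          /* removed: forget everything */
        validController = analogMode = false;
        p->aligned = p->locked = false;
        return;
    }
    if (p->state != PAD_STABLE)          /* still communicating: next frame */
        return;
    if (!validController) {              /* stable now but not before: new pad */
        if (p->type == PAD_DUALSHOCK) {
            if (!p->aligned) {
                set_act_align(p);        /* state becomes "communicating", */
                return;                  /* so wait until it is stable again */
            }
            lock_analog(p);
        }
        validController = true;
        analogMode = p->locked;
    }
}
```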
The analog sticks control the movement of the camera and the movement
of the dinosaur's head. If the sticks are left at rest, it is possible
for the analog values to jitter between one or two values. To stop this
from happening, the analog values returned by getAnalogValues()
are only copied verbatim if they lie beyond a certain threshold from the
standard rest value of 0x80.
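The threshold filter can be sketched as below. filter_axis() and the threshold of 8 are assumptions, because the demo's actual threshold is not given. Readings within the threshold of the 0x80 rest value are snapped to exactly 0x80. Readings at or beyond it are passed through verbatim.

```c
#define ANALOG_REST      0x80
#define ANALOG_THRESHOLD 8     /* assumed; the demo's value is not given */

/* Returns the value to use for one stick axis: readings close to rest
   snap to exactly 0x80, so a stick left alone does not jitter. */
unsigned char filter_axis(unsigned char raw)
{
    int delta = (int)raw - ANALOG_REST;
    if (delta > -ANALOG_THRESHOLD && delta < ANALOG_THRESHOLD)
        return ANALOG_REST;
    return raw;   /* far enough from rest: copy verbatim */
}
```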