A key to allowing effective platform independence is to use logical
descriptions so that viewers can fill in the details according to their own
rendering capabilities. As an example, you could describe a room in terms of
the polygon defining the floor plan, the height of the walls, and categories
for the textures of the floor, walls and ceiling. A hierarchical description
of a wall texture could include a raw color together with a link to a tiling
pattern for an explicit wallpaper design. Low-power systems would use plain
walls, saving the cost of retrieving and patterning them. Fractal techniques
offer interesting possibilities too.
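
To make this concrete, here is a minimal sketch in Python of such a
logical room description; the field names, colour values and pattern URLs
are hypothetical illustrations, not a proposed format:

    # A logical room: floor-plan polygon, wall height, and per-surface
    # texture categories with a plain-colour fallback.
    room = {
        "floor_plan": [(0, 0), (6, 0), (6, 4), (0, 4)],  # vertices in metres
        "wall_height": 2.5,
        "textures": {
            # Each surface has a raw colour plus an optional link to a
            # tiling pattern; low-power viewers can ignore the link and
            # render the flat colour instead.
            "floor":   {"color": (0.6, 0.5, 0.4),
                        "pattern": "http://example.org/textures/parquet"},
            "walls":   {"color": (0.9, 0.9, 0.8),
                        "pattern": "http://example.org/textures/wallpaper-17"},
            "ceiling": {"color": (1.0, 1.0, 1.0), "pattern": None},
        },
    }

    def surface_fill(surface, low_power=False):
        """Pick a rendering for a surface: flat colour on low-power
        systems, the full tiling pattern (fetched over the net) otherwise."""
        tex = room["textures"][surface]
        if low_power or tex["pattern"] is None:
            return ("flat", tex["color"])
        return ("tiled", tex["pattern"])

    print(surface_fill("walls", low_power=True))   # ('flat', (0.9, 0.9, 0.8))
    print(surface_fill("walls", low_power=False))  # ('tiled', 'http://...')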
Shared models would avoid the need to download detailed models, e.g. for
wallpaper, window and door fittings, chairs, tables, carpets, etc. These
models, by using well-known names, can be retrieved over the net and cached
for subsequent use. The models would include hierarchical levels of detail.
This is important for "distancing" and reducing the load on lower-power
clients. In addition to appearance, models could include behaviours
defined by scripts, e.g. the sound of a clock ticking, the way a door opens,
and functional calculators, radios and televisions.
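
As a sketch of how a viewer might use such shared models, assuming they
are published under well-known URLs (the URL scheme and the numbering of
detail levels below are invented for illustration):

    import urllib.request

    _cache = {}

    def fetch_model(name, detail=0):
        """Retrieve a named model at a given level of detail, caching it
        for subsequent use. detail=0 is the coarsest level, for distant
        objects or low-power clients."""
        key = (name, detail)
        if key not in _cache:
            url = "http://models.example.org/%s/lod%d" % (name, detail)
            with urllib.request.urlopen(url) as resp:
                _cache[key] = resp.read()
        return _cache[key]

    def level_for_distance(distance):
        """Map viewer distance to a level of detail: far objects get the
        coarse model, near objects the full one."""
        if distance > 20:
            return 0
        if distance > 5:
            return 1
        return 2

    # chair = fetch_model("chair", detail=level_for_distance(12.0))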
Full VR needs expensive I/O devices, but we could get by with sideways
movement of the mouse (or cursor keys) to turn left or right, and up-down
movement of the mouse to move forwards and backwards in the scene. I believe
that allowing a progression from simple to sophisticated I/O devices with the
same VR interchange formats will be critical to broad take-up of VR.
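
A sketch of that mouse mapping, with illustrative rate constants:

    import math

    TURN_RATE = 0.01   # radians per pixel of sideways motion
    MOVE_RATE = 0.05   # metres per pixel of vertical motion

    def update_viewer(x, y, heading, dx, dy):
        """Advance the viewer position (x, y) and heading from per-frame
        mouse deltas: sideways motion turns, vertical motion moves."""
        heading += dx * TURN_RATE      # left/right turns the viewer
        step = -dy * MOVE_RATE         # up moves forwards, down backwards
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        return x, y, heading

    # x, y, heading = update_viewer(0.0, 0.0, 0.0, dx=15, dy=-40)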
So far I have outlined a way in which you could click on an HTML link and
appear in a VR museum and wander around at will. Pushing on a door would
correspond to clicking on a hypertext link in HTML, as in the sketch below.
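
A minimal sketch of that correspondence, assuming each door simply
carries the URL of the scene behind it (the class and function names are
hypothetical):

    class Door:
        def __init__(self, label, href):
            self.label = label
            self.href = href   # URL of the VR scene behind the door

    def push_door(door, fetch):
        """'Pushing' a door is just following its link: fetch the target
        scene over the net and hand it to the viewer."""
        print("Entering %s ..." % door.label)
        return fetch(door.href)

    # gallery = Door("the sculpture gallery",
    #                "http://museum.example.org/sculpture.vr")
    # scene = push_door(gallery, fetch=my_http_get)  # my_http_get assumed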
The next step is to get to meet other people in these VR environments. The
trick here is to wrap real-time video images of people's faces onto 3D models
of their heads.
This has already been done by a research group at ATR in Japan. Our library
couldn't find any relevant patents, so it looks like there are no problems
in defining non-proprietary protocols/interchange formats for this approach.
The bandwidth needed is minimised by taking advantage of the 3D models
to compress movements. By wrapping the video image of a face onto a 3D model,
you get excellent treatment of facial details, as needed for good non-verbal
communication, while minimising the number of polygons needed.
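
As a rough sketch of the idea, assuming both ends share the same textured
head model: a sender transmits a few pose and expression parameters per
update instead of whole frames, and the receiver re-renders the head
locally. The parameter set below is an illustrative guess:

    import json

    def encode_update(yaw, pitch, mouth_open, timestamp):
        """Pack one head update; a handful of numbers replaces a frame."""
        return json.dumps({
            "t": timestamp,
            "yaw": round(yaw, 3),           # head rotation, radians
            "pitch": round(pitch, 3),
            "mouth": round(mouth_open, 2),  # 0 = closed, 1 = fully open
        })

    update = encode_update(yaw=0.12, pitch=-0.05, mouth_open=0.3,
                           timestamp=41.7)
    print("%s (%d bytes, vs. kilobytes for a video frame)"
          % (update, len(update)))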
The effectiveness of this approach has been demonstrated by Disney, who
project video images onto a rubber sheet deformed by a mask pushing out
of the plane. Needless to say, there remain some research issues here ...
The first step in achieving this vision is to start work on a lightweight
interchange format for VR environments and to experiment with viewers
and HTTP. A starting point is to pool info on available software tools
we could use to get off the ground.
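
For example, a viewer experiment could start with nothing more than
fetching scene descriptions over HTTP, the same way a browser fetches an
HTML page; the media type and URL below are placeholders:

    import urllib.request

    def load_scene(url):
        """Fetch a scene description over HTTP; parsing is left as a
        stub, since the interchange format is yet to be defined."""
        req = urllib.request.Request(url,
                                     headers={"Accept": "x-world/x-vr"})
        with urllib.request.urlopen(req) as resp:
            return resp.read()  # a real viewer would build a scene graph

    # scene_text = load_scene("http://museum.example.org/entrance.vr")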
Regards,
Dave Raggett (looking forward to the Web's VR version of the Vatican Exhibit).
-----------------------------------------------------------------------------
Hewlett Packard Laboratories, +44 272 228046
Bristol, England dsr@hplb.hpl.hp.com