Using the Oculus Rift’s head tracking to aid immersive audio.

This blog post has been drafted for a while now, but with the recent news that Facebook has bought Oculus for $2bn, it seemed like the perfect time to finish it off and publish it.

For those who are unfamiliar with the Oculus Rift, it is a headset which uses a single screen to display two images; each image is directed to one eye through its own lens, and so a 3D effect is created.  It uses head tracking, so that if you look left while wearing the headset, the movement can be replicated in the interactive media; the tilt and pan of the camera within a game or piece of media are typically mapped to the orientation of your head, which then dictates the view.  At this point, the Oculus Rift hardware is solely visual.  However, if this sort of technology can be used to create an immersive visual experience for the user, then surely similar principles can be applied to provide fully immersive audio?
The Oculus Rift

Head tracking for audio works in a very similar way to the visual head tracking outlined above.  When you listen to something using conventional headphones, no matter how you move your head, the sound remains the same; this is because the soundfield is fixed relative to your head.  In other words, if you look to the left, the auditory field follows.  However, when head tracking is introduced, the sound origins are fixed within a 3D space, so turning your head changes where each sound appears to come from, just as it would in the real world.
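As a rough illustration of the difference, the sketch below works out where a sound fixed in the world would appear relative to the listener once the head turns. The function names and the angle convention (degrees, positive to the right) are assumptions made for this example and are not taken from the Oculus SDK.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def head_relative_azimuth(source_azimuth, head_yaw):
    """Direction of a world-fixed source as heard by the listener.

    Both angles are in degrees, positive to the right. head_yaw is the
    pan angle reported by the head tracker (hypothetical input).
    """
    return wrap_degrees(source_azimuth - head_yaw)


# A source straight ahead in the world, listener facing forward.
print(head_relative_azimuth(0.0, 0.0))

# The listener turns 30 degrees to the left (yaw = -30): the same source
# should now be heard 30 degrees to the right.
print(head_relative_azimuth(0.0, -30.0))
```

Without head tracking the renderer effectively ignores `head_yaw`, which is why the soundfield appears to follow the head.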

an example of head-tracking

In order to understand how this effect might work, it’s important to understand how human hearing localises sound sources.  There are two main cues we use to localise a sound’s origin: Inter-aural Time Difference (ITD) and Inter-aural Level Difference (ILD).  ITD is a measure of how much earlier a sound arrives at one ear than the other; for example, if a sound arrives at the left ear first, we assume the sound originates from the left.  ILD is similar, but identifies the difference in amplitude from one ear to the other; if a sound is louder in the right ear, it is assumed that the sound originates from the right.  The brain uses these differences in time and amplitude to pinpoint the origin of a sound, and the processes occur so quickly that they don’t even need to be thought about.  When deciding where a sound is coming from, we naturally move our heads to narrow down the sound’s source.  If the sound hits the right ear first, and is louder there, then its source should be to the right; moving the head allows the listener to determine the precise location, because it changes the level and timing information at each ear.

Currently, when experiencing audio in interactive media, the soundfield is static and not linked to our head movements as it would be in a natural environment; it is instead mapped to the input of a controller.  This pulls us out of the experience rather than immersing us further, because the soundfield responds to an unnatural stimulus (the controller inputs) rather than the natural stimulus such changes are normally attributed to (the movements of our heads).  If the movements of our heads are represented by changes in the media’s soundfield, it will be perceived as more realistic, therefore aiding immersion.
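The two localisation cues described earlier, ITD and ILD, can be roughed out numerically. The sketch below is only an approximation under stated assumptions: the ITD uses Woodworth’s spherical-head formula with an assumed head radius of 8.75 cm, and the level difference is stood in for by an ordinary constant-power pan law, which is not a measured ILD. All names are hypothetical.

```python
import math

HEAD_RADIUS_M = 0.0875    # assumed average head radius, in metres
SPEED_OF_SOUND = 343.0    # metres per second, air at roughly 20 degrees C


def itd_seconds(azimuth_deg):
    """Inter-aural time difference from Woodworth's spherical-head model.

    azimuth_deg is in [-90, 90], positive to the right. A positive result
    means the sound reaches the right ear first.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))


def pan_gains(azimuth_deg):
    """Constant-power pan gains (left, right) as a crude stand-in for ILD."""
    p = (azimuth_deg + 90.0) / 180.0          # 0 = hard left, 1 = hard right
    return math.cos(p * math.pi / 2), math.sin(p * math.pi / 2)


for az in (0, 30, 90):
    left, right = pan_gains(az)
    print(f"{az:>3} deg: ITD {itd_seconds(az) * 1e6:.0f} us, "
          f"left gain {left:.2f}, right gain {right:.2f}")
```

With these assumed constants the largest time difference, for a source directly to one side, comes out at roughly two thirds of a millisecond, which gives a sense of how fine the timing cues are.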

By using this theory, soundscapes could have their orientation parameters mapped to the tilt and pan of the character’s view, which in turn is controlled by the tilt and pan of the user’s head, thus allowing for a truly immersive experience for the user.  There are, however, limitations to the development of audio tracking, the main one being that the consumer could be listening to the audio through any number of means: headphones, stereo speakers and 5.1, to name just a few.  When using headphones, modelling the Head Related Transfer Function (HRTF) and emulating binaural hearing needs to be considered, and would probably aid the immersion.  The downside is the number of parameters involved when using HRTF models: everyone’s HRTF is different, and choosing a model for each individual would not be efficient, or even possible, for the consumer.  This suggests settling on a universal model for the consumer, which wouldn’t work perfectly for everyone.
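To make the mapping concrete, here is a minimal sketch of how a head tracker’s pan and tilt might be used to turn a world-space source direction into the azimuth and elevation that a binaural (HRTF-based) renderer would typically look up. The axis convention (x right, y up, z forward), the rotation order and all function names are assumptions for illustration, not part of any particular SDK or audio engine.

```python
import math


def world_to_head(direction, yaw_deg, pitch_deg):
    """Express a world-space direction in the listener's head frame.

    direction is (x, y, z) with x right, y up, z forward.
    yaw_deg is pan (positive = head turned right),
    pitch_deg is tilt (positive = head tilted up).
    """
    x, y, z = direction
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    # Undo the pan: rotate about the vertical axis by -yaw.
    x1 = x * math.cos(yaw) - z * math.sin(yaw)
    z1 = x * math.sin(yaw) + z * math.cos(yaw)
    # Undo the tilt: rotate about the left-right axis by -pitch.
    y2 = y * math.cos(pitch) - z1 * math.sin(pitch)
    z2 = y * math.sin(pitch) + z1 * math.cos(pitch)
    return (x1, y2, z2)


def azimuth_elevation(direction):
    """Azimuth (positive right) and elevation (positive up), in degrees."""
    x, y, z = direction
    azimuth = math.degrees(math.atan2(x, z))
    elevation = math.degrees(math.atan2(y, math.hypot(x, z)))
    return azimuth, elevation


# A source straight ahead in the world; the listener looks up by 45 degrees,
# so the source should be heard below the line of sight.
print(azimuth_elevation(world_to_head((0.0, 0.0, 1.0), 0.0, 45.0)))
```

The resulting angles are what would be fed to whichever HRTF model is chosen; the choice of model is the hard part noted above, and this sketch says nothing about it.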

Using the Oculus Rift’s head tracking for audio is essential to the future success of immersive media systems.  Spatial audio needs to be taken into account for the next iteration of the Rift, not least by the developers producing content for it, and is truly the next step in building upon the immersion that the Oculus Rift currently offers.  Hopefully Facebook’s acquisition of Oculus will lead to a new iteration of the hardware that takes spatial audio into account.

For further information, this YouTube video showing a spatial audio engine in Unity 3D should be the first port of call; it demonstrates the benefits spatial audio could provide to the Oculus Rift:
