GDMonline Programming the PC Speaker, part 2
Phil Inch, Game Developers Magazine

DOWNLOAD ... The sound player mentioned in this article is contained in the file VOC-IT.ZIP (17,174 bytes) which can be downloaded by clicking this disk icon.

DOWNLOAD ... The waveform viewer mentioned in this article is contained in the file WAVEFORM.ZIP (8,526 bytes) which can be downloaded by clicking this disk icon.

Introduction

I'm sure you're curious to know what we're going to learn this issue, and I'm not going to keep you in suspense any longer ...

As promised, we're going to make some digital sound effects! We're going to do this using the program attached to this article, "VOC-IT" which is capable of playing Soundblaster .VOC files. You can download this program and the source code by clicking on the above icon!

But before you run it, some warnings - read these or (maybe) suffer the consequences ...

* If you're running from within Windows, or any other "shell", it's probably best that you exit now and run VOC-IT from DOS. The way Windows shares processor time between applications is not compatible with VOC-IT.

* On some machines, particularly portables which always have crap little speakers, this sound may sound really awful, or may be completely inaudible. There's not a lot I or anyone else can do about this, I'm afraid.

* The real-time clock is STOPPED during playback. This is not normally a problem but if you play a lot of sound files your time-of-day will steadily become more and more wrong. I've yet to work out a truly satisfactory way around this.

* Finally, if you're at work, this sound effect might be loud so be sure no-one's around ... you can't stop the playback once it's started.

Notes regarding the source to VOC-IT

As usual, the source code is included, and you'll find it in the VOC-IT zip file. It's a mixture of C and assembler but everything that it's doing is explained in this article, so this shouldn't be a problem. I've abandoned the pseudo-code I talked about last issue as it really didn't come off.

(Note that it can only play VOC files up to the limit of available memory, so some of the really large VOC files are beyond its capacity ... if I work out a way around this, I'll publish an update.)

I've also just discovered it can play .WAV files, and in fact it may be able to play lots of other file types also - you'll just have to experiment. You can even play .EXE files and listen for satanic messages in your copy of Word for Windows (grin).

Controlling the speaker directly

You'll recall that last issue we used timer 2 to make the speaker oscillate at a given frequency, thus producing a sound. The SOUNDS program showed how, by changing the frequencies, we could create "sound effects".

This issue, we're going to use timer 2 again, but in a way which allows us to play back "digital" sound effects.

To quickly refresh your memory, we set timer 2 up using a countdown value which dictated the frequency at which timer 2 oscillated. We then "connected" timer 2 to the speaker, meaning that whenever timer 2 oscillated, the speaker would "click", thus producing the tone.

Understanding Sound waveforms

Let's quickly digress and examine the waveforms for two sounds - a "tone", like we played last issue, and a "digital" sound like we're going to play this issue.

A waveform diagram is like a graph. It consists of two axes. The Y axis (up and down) represents the amplitude of the wave, and for our purposes it's the current position of the speaker cone. That is, a point low down on the graph represents the speaker cone at or near the rest position, and a point high on the graph represents the speaker cone at or near its maximum possible extension.

The X axis (left to right) is "time", so if you trace the waveform with your finger from left to right, you're roughly tracing the path of the speaker cone as the voltage across the speaker is modified.

The waveform for tones like we generated last issue looks something like this:

FULL EXTENSION    *        *        *        *
            |    * *      * *      * *      * *
            |    * *      * *      * *      * *
            |    * *      * *      * *      * *
            |    * *      * *      * *      * *
            +----*-*------*-*------*-*------*-*------
            |    * *      * *      * *      * *
            |    * *      * *      * *      * *
            |    * *      * *      * *      * *
            |    * *      * *      * *      * *
REST POSITION ***   ******   ******   ******   ******
You should be able to clearly see the "pulses" produced by timer 2 oscillating. But what's physically happening to the speaker?

When a voltage is applied to the speaker, the cone starts to move outwards which is represented on this diagram by the wave (the line of *'s) moving upwards.

When the cone reaches full extension (or the amount of extension caused by the applied voltage, as you'll see), the wave stops at the top.

When the voltage is removed, the cone starts to move back to the rest position, which is shown as the wave moving downwards until it reaches the bottom.

The important point from this diagram is, the speaker cone can only exist for any length of time in two positions - rest, when there is no applied voltage, and full extension, when a certain voltage is being applied.

All positions between these are just "passed through" as the speaker moves from rest to full extension and back again. This is the kind of wave we created last issue using timer 2. The PC speaker can be connected to 0V (rest) or 5V (full extension) by timer 2. We can't apply 2.5V or any other voltage in between.

Now, let's look at a possible wave form for digital sound:

FULL EXTENSION       **                     **           
            |   *   *  *                   *  *          
            |  * * *   *                  *   *     *   *
            | *   *    *                 *    *    * * * 
            |*          *     *          *     *   *  *  
            +------------*---*-*--------*-------*-*------
            |             *  *  *  *    *        *       
            |             *  *  * * *  *                 
            |             * *    *   * *                 
            |              *          *                  
REST POSITION
As you can see, the waveform is very rough and does not follow any regular pattern. In addition, the speaker cone can be made to rest in any position by applying a fraction of the voltage required for full extension.

This relationship is not linear, that is, applying half the voltage does not necessarily mean that we end up with half extension, but for our purposes now we'll assume it does.

(If you have trouble understanding this, watch the bass speaker on your stereo and you'll see the cone moving in and out (from rest to fully extended). You'll notice that the distance the cone travels out from rest varies with the music, and this movement corresponds to the voltage being applied.)

But how can we make the PC speaker move that way?

The key to playing digital sound is being able to control, to a very fine resolution, the actual position of the speaker cone. Unfortunately, the PC only allows us to apply two voltages - 0V (speaker is at rest), and 5V (speaker is fully extended).

When playing digital sound, we want to be able to move the speaker to positions between rest and full extension, which means we need to apply a fraction of 5V, which the PC does not allow us to do directly.

However, we can achieve the same result by applying the normal 5V to the speaker, which starts the cone moving towards full extension, and then removing the voltage when the cone has reached the point we want. The cone then returns to its rest position.

But how do we know when the cone has reached the position we want? We rely on the fact that the time taken for the cone to travel from rest to full extension is approximately 60 millionths of a second (!!!), so we just have to apply the voltage, wait for a fraction of 60 millionths of a second, then turn the voltage off.

For our purposes, we'll assume that the fraction of 60us (us=microseconds, or millionths of a second) required is the fraction of full extension we want. That is, we'll assume that the speaker moves at a constant rate from rest to full extension.

For example, after half the time required for full extension (30us), we'll assume that the speaker has travelled half the distance to full extension.

Easy, huh!

Generating a "square wave" with timer 2

Normally, when timer 2 oscillates, it sends a brief 5V pulse to the speaker, causing it to "click", as you saw from the waveform diagram above.

In order to make digital sound, we need to reprogram timer 2 so that it keeps the voltage applied to the speaker for a set amount of time and then turns it off. We dictate that time with a countdown value, just as we did last issue when we were playing notes.

The timer will do this once only as opposed to doing it repeatedly as it does when playing a continuous note (like last issue). In this way we will control exactly the amount of extension (movement) the speaker undergoes.

Repeatedly doing this process produces what's known as a "square wave". Can you see where it got its name?

EXTENSION   |    **********  ********        *********
            |    *        *  *      *        *        
            |    *        *  *      *        *        
            |    *        *  *      *        *        
            |    *        *  *      *        *        
            +----*--------*--*------*--------*--------
            |    *        *  *      *        *        
            |    *        *  *      *        *        
            |    *        *  *      *        *        
            |    *        *  *      *        *        
REST POSITION*****        ****      **********        
As I said, we're going to reprogram timer 2 to keep the speaker on until the countdown value we give it reaches 0.

We know that it takes about 60 millionths of a second for the speaker cone to move from rest to full extension, and we're going to assume that we can vary the amount of extension by varying the time the voltage is applied.

We also know that timer 2 is driven by an external oscillator which runs at 1,193,180Hz (remember the theory from last issue?). This means that it reduces the countdown value 1,193,180 times every second, or once every 0.83us.

Therefore, for the speaker cone to reach full extension (60us), we would need to count down ( 60 / 0.83 ) times = 72 times. In other words, if we program the countdown with a value of 72 or higher the speaker will have time to reach full extension.

If we program the timer with countdown values of less than 72, the speaker cone will NOT have time to reach full extension before the voltage is turned off (countdown reaches 0), and we will therefore effectively control how much extension we actually desire!
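
To put numbers on this, here is a minimal C sketch of the arithmetic. It is an illustration only, not code from VOC-IT: the names ticks_for_us and countdown_for_extension are made up for this example, and it assumes the 60us travel time and the constant-rate movement described above.

```c
#define PIT_HZ          1193180L  /* timer input clock, from the article */
#define FULL_TRAVEL_US  60L       /* assumed rest-to-full-extension time */

/* Number of timer ticks that fit into a given number of microseconds,
   rounded to the nearest tick. */
long ticks_for_us(long us)
{
    return (us * PIT_HZ + 500000L) / 1000000L;
}

/* Countdown value for a desired extension given in percent (0-100),
   assuming the cone moves at a constant rate (a simplification). */
long countdown_for_extension(long percent)
{
    return (ticks_for_us(FULL_TRAVEL_US) * percent + 50) / 100;
}
```

With these figures, ticks_for_us(60) gives 72 and countdown_for_extension(50) gives 36, which matches the half-extension example earlier.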

See, I told you it was easy! To play a whole sound, we just keep re-programming timer 2 with the countdown value required for the amount of extension required for the current bit of the waveform, as you'll see in the next section. The speaker then appears to move exactly along the line specified by the waveform!

Just as an aside, you should now see why the piezo-electric speakers you get in laptop computers won't replay digitised sound very well. A PE speaker is just two flat metal plates (if you've ever opened up a digital watch, a PE speaker is that flat round gold plate thingy).

When a current is passed between them they "vibrate" at a specific rate which creates the sound. It's not possible in any way to control the extension because nothing actually extends, and therefore we can't properly reproduce a digital waveform.

Digital sound files

Your average digital sound file, VOC, WAV or whatever, usually consists of numeric values which represent the amplitude (extension) of the speaker at any given moment.

If you plotted these values on a graph, you would be able to produce the waveform diagram for that sound file. In the WAVEFORM zip file you'll find a very basic version of this, WAVEFORM.EXE (C source included, as always).

Just run WAVEFORM {filename} (eg WAVEFORM TADA.WAV) and the program will display one screenful of the waveform at a time, and you can press space to move to the next screen or ESCAPE to stop. This program only works with VGA mode 13h, and it's a useful application of the graphics code we learnt last issue!

WAVEFORM.EXE is just a "quick and dirty" program which I encourage you to experiment with. For now, just refer back to the example digital waveform I drew a little earlier.

The other important piece of information about a digital sound file is, what frequency was it recorded at? In other words, how often was the position (extension) of the speaker (microphone) recorded? Most digital sound files are recorded at around 16,000Hz which means that the position (extension) of the speaker (microphone) is noted 16,000 times a second, and each time a corresponding number is written to the file.

(Therefore, a file recorded at 16,000Hz with one byte per sample theoretically takes 16,000 bytes for every second of sound, but this can be reduced, as you'll see in later issues.)

If we have a file recorded at 16,000Hz, we need to adjust the position of the PC speaker 16,000 times a second. This means we need to work out a new countdown value and load it into timer 2 16,000 times a second!

How can we accurately know when it's time to update timer 2? Well, this is where timer 0 comes in (the timer that drives the time-of-day clock). As you know from previous issues, timer 0 usually runs at 18.2Hz.

If we beef up timer 0 to run at 16,000Hz, and update the countdown value for timer 2 each time timer 0 oscillates, then we can accurately replay the digitised sound!

Recall that timer 0 also controls the generation of interrupt 8, which among other things, updates the real time clock. If we run timer 0 at 16,000Hz (879 times faster than normal), then the clock will update 879 times faster than normal, which is clearly unacceptable.
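
As a rough sketch of the numbers involved, the countdown for a given interrupt rate is the 1,193,180Hz input clock divided by that rate. The helper names below are invented for illustration and are not taken from VOC-IT. Because the countdown must be a whole number, the rate actually achieved differs slightly from the one requested. The sketch assumes a rate between 19Hz and 1,193,180Hz, so that the result is non-zero and fits in 16 bits.

```c
#define PIT_HZ 1193180L   /* timer input clock, from the article */

/* Countdown (divisor) that makes a timer fire roughly `hz` times a second.
   Assumes 19 <= hz <= 1193180 so the result fits in 16 bits. */
unsigned int divisor_for_rate(long hz)
{
    return (unsigned int)(PIT_HZ / hz);
}

/* Rate actually produced by a given whole-number countdown. */
long actual_rate(unsigned int divisor)
{
    return PIT_HZ / divisor;
}

/* The timer takes a two-byte countdown as low byte first, then high byte. */
unsigned char low_byte(unsigned int d)  { return (unsigned char)(d & 0xFFu); }
unsigned char high_byte(unsigned int d) { return (unsigned char)((d >> 8) & 0xFFu); }
```

For 16,000Hz this gives a countdown of 74 (low byte 74, high byte 0), which really runs at about 16,124Hz.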

There are two solutions to this problem. The first solution is, stop timer 0 from generating interrupt 8 - that is, stop the clock while the sound is playing. While this can't be done directly, we can replace the computer's interrupt 8 routine with our own (which does nothing), and this is the approach I have taken with VOC-IT.

The other possibility is to write our own interrupt 8 routine which calls the original interrupt 8 routine at the correct rate of 18.2 times a second, but my experiments with doing this were not very successful.
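
For reference, one common way to arrange this second solution is sketched below. It is not what VOC-IT does, and it is offered only as an assumption about how such a handler could be organised: on every fast interrupt we add the timer 0 countdown value to a running total, and each time the total passes 65,536 (the countdown for the normal 18.2Hz rate) the original handler is due. The function names are invented, and the second function is purely a simulation so the idea can be checked without touching real interrupts.

```c
/* Running total of timer ticks since the original handler was last due */
static unsigned long tick_accumulator = 0;

/* Call once per fast interrupt. Returns 1 when the original interrupt 8
   handler is due (once every 65536 timer ticks), otherwise 0. */
int old_handler_due(unsigned int divisor)
{
    tick_accumulator += divisor;
    if (tick_accumulator >= 65536UL) {
        tick_accumulator -= 65536UL;
        return 1;
    }
    return 0;
}

/* Simulation only: count how often the original handler would be due
   over `interrupts` fast interrupts. */
long count_old_handler_calls(unsigned int divisor, long interrupts)
{
    long calls = 0, i;
    tick_accumulator = 0;
    for (i = 0; i < interrupts; i++)
        calls += old_handler_due(divisor);
    return calls;
}
```

With a countdown of 74, one second's worth of interrupts (about 16,124) makes the original handler due 18 times, which is the normal clock rate.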

The disadvantage of totally disabling the computer's default interrupt 8 handler is that the real-time clock appears to "stop", that is, the time is not updated! This can be solved by calculating how long we were running our own handler and then manually updating the time of day when we've finished playing the sound, but for simplicity I have left this out of VOC-IT. In a later issue we'll add it in.
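
The correction itself is simple arithmetic. Here is a minimal sketch, with an invented function name: each of our interrupts represents `divisor` ticks of the 1,193,180Hz clock, and the normal clock advances once every 65,536 of those ticks. How the result is then applied to the time of day is left out, since VOC-IT does not do this yet.

```c
/* Number of normal 18.2Hz clock ticks that were missed while our own
   handler ran `interrupts` times with timer 0 programmed to `divisor`. */
unsigned long missed_clock_ticks(unsigned long interrupts, unsigned int divisor)
{
    return (interrupts * divisor) / 65536UL;
}
```

For example, ten seconds of playback at a countdown of 74 (about 161,240 interrupts) works out to 182 missed clock ticks.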

Rolling our own interrupt 8 "handler"

One of the more versatile abilities of the IBM is that it allows us to replace almost all the interrupt services with our own routines, or "handlers". I won't go into the details here, but I have explained how this is done in this issue's "interrupts" section.

The interrupt 8 handler in VOC-IT does nothing except tell the interrupt generator that it is finished, which of course means it takes very little time. With the clock running at such a high rate (and hence the handler being called at the corresponding rate), there's no time to be messing around!

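What makes this work is that the processor can be told to "wait" (do nothing) until the next hardware interrupt arrives. This is done using the assembler instruction HLT (halt). HLT is not specific to interrupt 8, since any hardware interrupt will wake the processor, but with timer 0 running this fast, interrupt 8 is by far the most frequent one.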

This is useful for us, because we use it to wait for the next interrupt 8, which is our signal that it's time to modify the countdown value for timer 2. In this way we can precisely follow the specified playback frequency, regardless of the speed of the PC we are using!

(This is a variation of the trick used to make games appear to play at the same speed regardless of the machine they're running on, and next issue I'll show you how this is done!)

Playing the digitised sound

In step form, then, here's how we play a digitised sound. Further explanation is given where required, and I recommend you also print out and look at the source code (VOC-IT.C and INT08.ASM), even if you're not familiar with C and assembler.

1) Load the sound data into memory

Loading the sound data simply means opening the file and reading each byte into memory. At this stage we apply the muting factor, which simply means we divide each byte by a certain number to reduce the range of values.

Recall we found that only values in the range of 0 to 72 produce useful countdown values. (Anything above 72 always produces full extension of the speaker.) The maximum value of a byte is 255, and 255/72 is about 3.5, so we need to divide by at least 4. The muting factor is expressed as a power of 2 - ie mute 1 = /2 (2 to the power 1), mute 2 = /4 (2 squared), mute 3 = /8 (2 cubed). Therefore the default muting factor is 2, which divides each byte by 4 and gives values from 0 to 63.

You can experiment with changing the muting factor. In VOC-IT, this is done by using the /M command line switch. Larger mute values will make the sound quieter, and I'm sure you will now understand why (let me know if you don't).

Smaller values may appear to amplify the sound if the values in the file were small to begin with, but it may also stop the sound completely because you're exceeding the maximum useful value of 72 and therefore leaving the speaker at full extension permanently.

Some sound files you will encounter will not need a muting factor at all - they have been stored on disk already scaled to the correct range. You will only discover which files these are by experimentation. For these files, use mute factor 0 which will not alter the values at all.

Finally, not all digital sound files are stored in "raw" format. VOC files in particular have a quite complicated structure which among other things allows us to compress and repeat blocks of data. Next issue we'll examine the structure of VOC files and write a "proper" loading routine. For now, just playing it as a raw sound file is acceptable.
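
The muting arithmetic from this step can be sketched as follows. This is an illustration with invented names (apply_mute, max_after_mute), not the loader from VOC-IT.C. It treats the muting factor as a right shift, which is the same as dividing by a power of 2.

```c
#include <stddef.h>

/* Divide every sample by 2 to the power `mute`.
   A mute of 0 leaves the data untouched. */
void apply_mute(unsigned char *samples, size_t count, unsigned int mute)
{
    size_t i;
    for (i = 0; i < count; i++)
        samples[i] = (unsigned char)(samples[i] >> mute);
}

/* Largest value a byte can have after muting. */
unsigned int max_after_mute(unsigned int mute)
{
    return 255u >> mute;
}
```

max_after_mute(2) is 63, safely under 72, while max_after_mute(1) is 127, which is why a mute of 1 can pin the speaker at full extension on loud samples.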

2) Replace the computer's interrupt 8 handler with our own (do-nothing) handler.

If you look at the file 'INT08.ASM' in the SOURCE\VOC-IT directory, you'll see the new interrupt 8 handler. All it does is tell the interrupt controller that the interrupt is complete. It's beyond the scope of this article to explain all the details of this process, but look at this issue's interrupts article for an explanation.

In 'C', we use the getvect() and setvect() functions to retrieve and set the addresses of the interrupt functions. I have no idea how to do it in other languages, although I do know DOS itself provides a service for this (interrupt 21h, functions 35h and 25h, get and set an interrupt vector). Please let me know if you know how to do this in other languages.

3) Re-program timer 2 to generate a single pulse and to "attach" to the speaker.

I will cover timer programming in much more detail in a future issue, but for now, a brief explanation of what to do. Port 43h is the "mode control register" for the timers, and when we want to do something with a timer, we output a control byte to this port. The control byte is composed of 8 bits, as you know, and each bit has some significance to the timer controller.

Firstly, we will output 90h which is 10010000 in binary. We are sending four pieces of information here. This is what we are sending:

10 = select counter 2
01 = countdown values are 1 byte only (not 2)
000 = generate a single pulse for the whole countdown
0 = countdown in binary

So we are telling the controller:

* we're programming timer 2.

* we will send single byte countdown values, which limits our range to 0-255. Strictly speaking this is not necessary, but it avoids us having to send two bytes every time we want to update the countdown value. (If we're only sending values between 0 and 255, the high byte would always be zero!) In this mode we're telling the timer to assume the high byte is zero.

* we want the pulse applied (voltage applied to the speaker) for the whole countdown, and then removed when the counter reaches 0. Mode 0 also only happens once, and it's triggered by us loading the countdown value.
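
To make the bit layout concrete, here is a small sketch that assembles a control byte from its four fields. The function name and parameter names are invented for this example; the field positions follow the breakdown given above. On a real machine the result would be output to port 43h.

```c
/* Build a timer mode control byte from its four fields. */
unsigned char pit_control_byte(unsigned int counter, /* which timer: 0, 1 or 2      */
                               unsigned int access,  /* 1 = low byte only,
                                                        3 = low byte then high byte */
                               unsigned int mode,    /* 0 to 5                      */
                               unsigned int bcd)     /* 0 = count in binary         */
{
    return (unsigned char)(((counter & 3u) << 6) |
                           ((access  & 3u) << 4) |
                           ((mode    & 7u) << 1) |
                            (bcd     & 1u));
}
```

pit_control_byte(2, 1, 0, 0) gives 90h, the value used in this step.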

4) Set the frequency of timer 0 to the desired playback frequency.

To do this, we calculate the required high byte and low byte values for the playback frequency we want, and then program timer 0 with this countdown value in exactly the same way we programmed timer 2 last issue.

5) Get the first byte of sound data, which dictates the amplitude of the wave (ie: amount of extension of the speaker cone).

In VOC-IT, we use the DS:SI registers to point to the location in memory which holds the next byte of the sound sample.

6) Re-program timer 2 with this byte as the countdown value. This automatically starts timer 2.

This is simply done by OUTing the byte value to port 42h, which is the port for timer 2.

7) Wait for the next interrupt 8 to occur, by issuing a HLT instruction.

8) If we just used the last byte in the sound file, go to step 10

In VOC-IT, when the program loads the sound file, it adjusts the range of values depending on the muting factor specified, and at the end of the sample data it puts the value FFh. The playback routine checks to see if the byte it just read was FFh, and if so, it stops playback.

An alternative method is to check the number of bytes played against the number of bytes read when the sound was loaded, but this is not as fast, and we need to write the tightest possible code for this routine!

9) Return to step 6 with the next byte in the sound data.

If it's not FFh, assume it's another byte of sound data, and continue. The "oversampling" factor allows us to play each byte of sound data more than once if we want to, and the reasons for doing so are explained below.

10) Turn the speaker off. (sound is finished)

Best done by disconnecting timer 2 from the speaker, as explained in the last issue.

11) Set the timer 0 frequency back to 18.2Hz (countdown=0)

Exactly as for step 4.

12) Restore the computer's interrupt 8 handler

See step 2.
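
Steps 5 to 9 can be sketched as a loop. This is a simplified illustration in C rather than the assembler routine in VOC-IT, and it ignores oversampling. The two callbacks are hypothetical stand-ins so that the loop can be tried without real hardware: on a DOS machine the first would output the byte to port 42h and the second would execute HLT. The recording functions at the bottom exist only for experimenting.

```c
#include <stddef.h>

#define END_MARKER 0xFF   /* value placed after the last sample when loading */

/* Hypothetical hooks standing in for the hardware access. */
typedef void (*write_timer2_fn)(unsigned char countdown);
typedef void (*wait_for_tick_fn)(void);

/* Feed each sample to timer 2, then wait for the next interrupt 8.
   Stops at the end marker and returns the number of samples played. */
size_t play_samples(const unsigned char *data,
                    write_timer2_fn write_timer2,
                    wait_for_tick_fn wait_for_tick)
{
    size_t played = 0;
    while (data[played] != END_MARKER) {
        write_timer2(data[played]);  /* step 6: load the countdown value */
        wait_for_tick();             /* step 7: wait for interrupt 8     */
        played++;                    /* steps 8 and 9: move to next byte */
    }
    return played;
}

/* Stand-ins for experimenting: record what would have been sent. */
static unsigned char sent[16];
static size_t sent_count = 0;
static size_t ticks_waited = 0;

static void record_write(unsigned char countdown)
{
    if (sent_count < sizeof sent)
        sent[sent_count++] = countdown;
}

static void count_tick(void)
{
    ticks_waited++;
}
```

Calling play_samples(data, record_write, count_tick) on a buffer that ends in FFh leaves the played bytes in `sent`, one per simulated interrupt.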

Oversampling

You'll see that in step 9 I mentioned oversampling, which is a term you may have come across on CD players. In CD players it refers to running the digital-to-analogue conversion at a multiple of the disc's sample rate, using extra samples calculated by a digital filter. What we do when replaying digitised sound here is a much cruder version of the same idea.

Here's an experiment. Try playing a digitised sound at 8,000Hz. I warn you, this will sound bad so spare a thought for where and when you do this.

You can do it with VOC-IT with the following command. Remember NOT to do this under Windows. This example uses the PING.WAV sound that comes with Windows:

        VOC-IT /F:8000 PING.WAV
Ugh! Totally awful, as you'll have noticed. Why? One of the side effects of the way we are playing sounds is that we produce a constant tone at the playback frequency. This is called the "carrier" tone.

In other words, if we replay a file at 8,000Hz we also create a constant 8,000Hz tone which is what you just heard, and as you probably noticed it drowns out the sound.

The obvious solution to this is to replay the file at a frequency where the carrier tone is either out of human hearing range or so close to it that the resulting tone only has a negligible effect on the sound quality.

So, for example, to replay the same sound at 16,000Hz:

        VOC-IT /F:16000 PING.WAV
Well, the awful "whine" has gone, but now our sound is playing twice as fast, which should come as no surprise.

The solution to this is to play every byte twice. This will make the sound appear half the speed, and half of 16,000Hz is 8,000Hz which is the frequency we're trying to replay at, but since we are actually playing at 16,000Hz we don't get the 8,000Hz carrier whine.

The act of playing each byte of sound data more than once is called oversampling.

There are two ways to play every byte twice. The first way is, when we load the sound file into memory, to store every byte in memory twice. This certainly works and it makes the sound playing code a lot simpler, but it's wasteful in memory terms and reduces the maximum size of the sound we can load into memory.

In addition, what if we want to replay a sample recorded at 5,000Hz? We would probably oversample (play each byte) three times to achieve 15,000Hz, which would triple our memory requirement.

The best solution, (and for this I send my thanks to my colleague Dave Harvey for helping me work out some very tight code), is simply to keep a count of how many times we play each byte.

Therefore if we want an oversample factor of 2, we count each byte twice, or in other words we only fetch a new byte of sound data every other time we play a byte.

For an oversample of 3, we only get a new byte of sound data every 3 interrupts, and we use each byte 3 times in a row to reprogram timer 2.
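
The counting approach can be sketched as follows. This is not the routine from VOC-IT, just a minimal C illustration with invented names. It assumes an oversample factor of at least 1 and leaves the end-of-data check to the caller. The last function shows one way to pick a factor for a target playback rate; the target itself is your choice (the 15,000Hz and 16,000Hz figures used in this section are examples, not hard limits), and the recorded rate is assumed to be non-zero.

```c
typedef struct {
    const unsigned char *next;   /* next sample to fetch              */
    unsigned char current;       /* sample currently being repeated   */
    unsigned int  oversample;    /* times each sample is played (>=1) */
    unsigned int  remaining;     /* repeats left for current sample   */
} Oversampler;

void oversampler_init(Oversampler *o, const unsigned char *data,
                      unsigned int oversample)
{
    o->next = data;
    o->current = 0;
    o->oversample = oversample;
    o->remaining = 0;            /* forces a fetch on the first call */
}

/* Call once per interrupt: returns the countdown value for timer 2.
   A new byte is fetched only when the previous one has been used
   `oversample` times. */
unsigned char oversampler_next(Oversampler *o)
{
    if (o->remaining == 0) {
        o->current = *o->next++;
        o->remaining = o->oversample;
    }
    o->remaining--;
    return o->current;
}

/* Smallest factor that lifts the playback rate to at least target_hz.
   Returns 1 when the recorded rate is already high enough. */
unsigned int oversample_factor(unsigned long recorded_hz, unsigned long target_hz)
{
    unsigned long f = (target_hz + recorded_hz - 1) / recorded_hz;
    return (unsigned int)(f ? f : 1);
}
```

With an oversample of 2, the bytes 5, 9, 7 come out as 5, 5, 9, 9, 7, 7, and oversample_factor(5000, 15000) gives 3.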

Just to show you it does work, here's how to play the same sound again, played at 16,000Hz but this time with an oversample of 2, meaning each byte is played twice.

        VOC-IT /F:16000 /O:2 PING.WAV
Congratulations! You've made it all the way through the theory, and once you understand all this you'll be well on the way to creating your own digitised sound routines!

Advantages and disadvantages of playing digitised sound this way

Clearly the biggest disadvantage of this method of playing digital sound is that the processor is so busy there is no time to do anything else, and you will notice that packages that use digitised sound on the PC speaker (such as Links 386 Pro) "stop" everything else while the sound is playing.

It is possible to play digitised sound "in the background", that is, while other things are going on, but the method we've just presented is not ideal for this. I'm currently researching a new possible method I've encountered and if I can make it work I'll bring it to you in a later issue.

This limitation means that you can't really use a lot of digitised sound effects in your games, and bearing in mind that on some machines the sound is terrible this may be a good thing.

However, a judiciously used sound effect on your title screen, or between stages, can add a lot of atmosphere. Remember, digitised sounds can range from a simple "congratulations" to "game over" to even short bursts of digitised music. The choice is yours.

You will find that VOC-IT will achieve results almost as good as the digital voice on a sound card, depending on the quality of your PC speaker. It is at least as good as the sound engine in two excellent products for the PC speaker, LINKS 386 PRO and PC-STUDIO.

Those of you who have seen the MOD players on the market, such as MODPLAY, and more recently the excellent IPLAY, will know that it's possible to play digitised sound and do a lot of other things besides, such as displaying the waveform. Right now I don't know exactly how this is achieved but I have an inkling.

I will endeavour to find out more if I can and I'll bring the results in future issues.

In conclusion

This has been an extremely technical article and you shouldn't feel bad if you don't understand it all. Several re-readings may be necessary to really take it all in.

It actually took me six weeks to write this article, of which five were pure research and development of VOC-IT and another week writing and revising this text.

I hope that you find the result worthwhile. I've certainly enjoyed doing the research and I'm quite pleased with VOC-IT. I'll be interested to hear your feedback over the next few months (little sound joke there, did anyone notice?)

Next time...

Now that you know how to play music and digitised sound effects, you really know all there is to know about using the PC speaker, so for the time being we'll end our discussion of it.

Next issue I'm hoping to develop an article explaining the basics of programming Adlib and Sound Blaster cards, but until I sit down to work it out I can't be sure what form this will take.

Anyway, for now, have fun making noise with the speaker and remember that your local BBS offers hundreds or even thousands of digital sound files for you to play using VOC-IT. Many commercial games also leave their sound files in raw format, for example Ultima Underworld I & II come with a wealth of VOC files which VOC-IT happily plays.