Wave-machine

Creating music the hard way

Writing a WAVE file

In the previous article, we were able to output audio data, but this data inconveniently cannot be played in an audio player. This is because the raw samples carry no information about what they represent. We know that these were 16-bit signed samples taken at 44.1 kHz, but an audio player would have no idea. A true audio file carries headers that tell an audio player exactly what the format of the audio data is. This is why we needed to use sox to generate a WAV file.

As I add to this library, I don't want to have to keep running sox to test my output; let's update our code so it outputs a WAV file instead of raw data. I'll start in my Main.hs file with how I would ideally want my code to look:

import Data.ByteString.Builder -- older bytestring releases expose this module as Data.ByteString.Lazy.Builder
import System.IO
import WaveMachine.Audio.Tones
import WaveMachine.Audio.Pitch
import WaveMachine.Builders
import WaveMachine.Sampling

audioFn :: Double -> Double
audioFn = applyPitch middleC sineWave

channels = 1
sampleRate = 44100
bitDepth = 16
samples = sampleInt16 audioFn sampleRate 5
waveFile = WaveFile channels sampleRate bitDepth samples

main :: IO ()
main = hPutBuilder stdout $ waveFilePcm16Builder waveFile

I didn't cover this in the previous article, but as you can see, I have split the code into separate modules to make it more manageable to get my head around. The main function now invokes waveFilePcm16Builder, which we will write, and which takes a waveFile value. This waveFile value is defined with some weird WaveFile function call. What is that? WaveFile is a data constructor, which we will also be creating.

You may also wonder what that $ character is doing in there. Since Haskell is a functional language, you'll find yourself using a lot of parentheses to keep operations in the intended order. Parentheses are fine, but sometimes the sheer number of them can get out of hand. $ is a convenient operator that basically means "treat the rest as though it's parenthesized." In other words, the above is exactly equivalent to hPutBuilder stdout (waveFilePcm16Builder waveFile). As you get used to it, you start to think of it as a convenient way to chain together functions.
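To make the equivalence concrete, here is a small sketch that is unrelated to the audio code; the two names are made up for illustration. Both definitions compute the same value, and $ only saves us the nested parentheses:

```haskell
-- Written with parentheses: double each element, sum, then negate.
doubledSumParens :: Int
doubledSumParens = negate (sum (map (* 2) [1, 2, 3]))

-- The same expression with ($): everything to the right of each $
-- is treated as one argument to the function on its left.
doubledSumDollar :: Int
doubledSumDollar = negate $ sum $ map (* 2) [1, 2, 3]
```

Both evaluate to -12.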

OK, let's get to the meat of things. I've placed the new WaveFile type and waveFilePcm16Builder in WaveMachine/Builders.hs (hence the import WaveMachine.Builders above). Let's look at the contents of this file a piece at a time, starting with WaveFile:

data WaveFile a = WaveFile Int Int Int [a] -- channels, sampleRate, bitDepth, samples

When I first learned Haskell, I found code like this very confusing; let me explain. data defines a new data type. The name of our new data type is WaveFile a; what is this a nonsense? This is a type parameter. If you've used C++, C#, or Java, this is similar to a template parameter, or "generics." We're adding a parameter here because a wave file can contain samples made up of a variety of data types. When we use a in our declaration, we're saying that any type can be substituted for it.

The part after the = is also confusing. We're defining WaveFile, right? So why are we using that symbol again? Haskell has this concept of an algebraic type, which is sort of a combination of an enum and a struct from other languages. The reason it's called algebraic is more technical than I'll get into. The way you create a value of an algebraic type is through a constructor, which is what this second WaveFile symbol is. Our WaveFile constructor takes three Int values, representing the number of channels, sample rate, and bit depth, and then a list of values of type a. When we build a WaveFile, the type of the samples we pass decides what a will be.
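Here is a minimal sketch of both halves of that idea: building values with the constructor, and taking them apart again with pattern matching. The WaveFile declaration is the one from above; the other names (silence16, silenceDouble, sampleCount, sampleRateOf) are made up for this example and aren't part of wave-machine:

```haskell
import Data.Int (Int16)

-- Same declaration as above: channels, sampleRate, bitDepth, samples.
data WaveFile a = WaveFile Int Int Int [a]

-- The type parameter is fixed by the kind of samples we pass in.
silence16 :: WaveFile Int16
silence16 = WaveFile 1 44100 16 [0, 0, 0, 0]

silenceDouble :: WaveFile Double
silenceDouble = WaveFile 1 44100 64 [0.0, 0.0]

-- Pattern matching on the constructor gets the fields back out.
-- These work for any sample type, hence the lowercase a.
sampleCount :: WaveFile a -> Int
sampleCount (WaveFile _ _ _ samples) = length samples

sampleRateOf :: WaveFile a -> Int
sampleRateOf (WaveFile _ rate _ _) = rate
```

Notice that the positions of the three Int arguments are all that tell them apart, which is exactly the kind of thing that gets error-prone as the list of parameters grows.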

One thing that really irks me about Haskell is that it has a really nice alternative for defining types, called records, that helps you avoid long lists of parameters like the one we're using; however, this alternative has some inconvenient limitations. I won't go into details at this time, but if you're interested, there's an extension to Haskell, close to becoming a standard, which will fix this. Here's a good article explaining the problem and solution.
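For the curious, this is roughly what a record version could look like. It's only a sketch: the type name WaveFileRec and the wf-prefixed field names are hypothetical and don't appear anywhere in wave-machine.

```haskell
import Data.Int (Int16)

-- Hypothetical record form of WaveFile: every field gets a name.
data WaveFileRec a = WaveFileRec
    { wfChannels   :: Int
    , wfSampleRate :: Int
    , wfBitDepth   :: Int
    , wfSamples    :: [a]
    }

-- Fields are set by name, so their order no longer matters.
monoExample :: WaveFileRec Int16
monoExample = WaveFileRec
    { wfSampleRate = 44100
    , wfChannels   = 1
    , wfBitDepth   = 16
    , wfSamples    = [0, 1, -1]
    }

-- Record update syntax: a copy with one field changed.
stereoExample :: WaveFileRec Int16
stereoExample = monoExample { wfChannels = 2 }
```

Each field name doubles as an accessor function (wfChannels stereoExample is 2), which is why I've prefixed them: in standard Haskell, two record types in the same module can't share a field name.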

So now we have a way to describe a wave file through the WaveFile type, but how do we write one? To understand this, we first have to understand RIFF files.

RIFF (Resource Interchange File Format) is a general purpose container file format composed of "chunks":

data RiffFile = RiffFile [RiffChunk]

A chunk is made up of a header containing a 4 character tag, the length of the chunk's content, then the content itself. There are several defined chunk types, but we're only going to be concerned with the top-level "form" chunk, the wave format chunk, and the chunk with the sound samples:

data RiffHeader = RiffHeader String Int

data RiffChunk =
    RiffChunk RiffHeader [Word8]
    | RiffFormChunk String [RiffChunk]
    | WaveFormatChunk Int Int Int Int [Word8] -- format, channels, sampleRate, bitDepth, extra
    | WaveInt16SamplesChunk [Int16]

What's going on with this RiffChunk definition? RiffChunk is an algebraic type that can take multiple forms, and thus has multiple constructors. This will allow us to write some polymorphic functions later on.

The basic content of a 16-bit PCM WAV file, which uses this RIFF container, is as follows:

pcmFormat = 1

riffFileForWavePcm16 :: WaveFile Int16 -> RiffFile
riffFileForWavePcm16 (WaveFile channels sampleRate bitDepth samples) =
    RiffFile [ 
        RiffFormChunk "WAVE" [ 
            WaveFormatChunk pcmFormat channels sampleRate bitDepth [],
            WaveInt16SamplesChunk samples ] ]

What do I mean by "PCM?" The samples we have been creating are in a PCM format, which stands for pulse-code modulation. Generally speaking, this term is used when referring to uncompressed data, especially audio data like we've been producing.

As you can see, the structure of a WAV file is a top-level "form" chunk which contains a format descriptor of WAVE; this is how the consumer of a RIFF file knows this is a WAVE file. The form chunk then contains sub-chunks. We're only going to write two of these sub-chunks: one that describes the format of the data, and another that contains the data itself.

Now let's write the Builder that will output the bytes of a WAVE file:

waveFilePcm16Builder :: WaveFile Int16 -> Builder
waveFilePcm16Builder waveFile = riffFileBuilder $ riffFileForWavePcm16 waveFile

This just delegates to a builder for RIFF files, which looks like this:

riffFileBuilder :: RiffFile -> Builder
riffFileBuilder (RiffFile chunks) = mconcat $ map riffChunkBuilder chunks

Since a RIFF file is simply a collection of chunks, we'll just build every chunk. Here's the builder for a generic chunk:

riffChunkBuilder :: RiffChunk -> Builder
riffChunkBuilder (RiffChunk header bytes) = mconcat ( 
    (riffHeaderBuilder header)
    : (map word8 bytes))

As you can see, we build the header for the chunk, then use the word8 builder to output all of the bytes in the chunk (Word8 in Haskell terms is what we generally call a byte in other contexts; it is a more accurate term, since in some obscure contexts a byte doesn't always mean 8 bits).
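If mconcat is new to you: Builders can be glued end to end, and mconcat does that for a whole list of them, in list order. Here's a tiny sketch with made-up data (the names tagThenBytes and outputBytes are mine), using the same "one builder consed onto a list of word8 builders" shape as above. toLazyByteString runs a Builder so we can look at the result:

```haskell
import Data.ByteString.Builder (Builder, string8, toLazyByteString, word8)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- A made-up stand-in for a chunk: a two-character tag, then three raw bytes.
tagThenBytes :: Builder
tagThenBytes = mconcat (string8 "ab" : map word8 [1, 2, 3])

-- Run the Builder and list the bytes it produced, in order.
-- 'a' is 97 and 'b' is 98, so this is [97, 98, 1, 2, 3].
outputBytes :: [Word8]
outputBytes = BL.unpack (toLazyByteString tagThenBytes)
```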

Writing the header for a chunk is simple enough:

riffHeaderBuilder :: RiffHeader -> Builder
riffHeaderBuilder (RiffHeader tag size) = mconcat [ 
    string8 tag, 
    word32LE $ fromIntegral size ]

We write the tag with string8, which writes each character as a single byte (fine for our ASCII tags), then the size of the chunk's content with word32LE (the LE suffix throughout our builders stands for little-endian, which we discussed in the last article). So what is this fromIntegral? Because Haskell is strict about types and never converts between numeric types implicitly, and because we're going to be outputting numbers in a variety of formats, there are going to be a lot of type conversions going on. fromIntegral is a really convenient function that converts a value of any integral type (such as Int) into whatever numeric type the function being called requires. In this example, size is an Int, and we're converting it to a Word32, which is required by the word32LE builder.
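To see what actually lands in the file, here's a sketch that runs a header builder and lists the resulting bytes. headerBuilder is a stand-in for riffHeaderBuilder that takes the tag and size directly so the example is self-contained, and builderBytes is a hypothetical helper; neither is part of wave-machine. The size 441000 is what five seconds of mono 16-bit samples at 44.1 kHz takes up (5 * 44100 * 2).

```haskell
import Data.ByteString.Builder (Builder, string8, toLazyByteString, word32LE)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- Stand-in for riffHeaderBuilder: a 4-character tag, then a 32-bit size.
headerBuilder :: String -> Int -> Builder
headerBuilder tag size = mconcat [string8 tag, word32LE (fromIntegral size)]

-- Hypothetical helper: run a Builder and list the bytes it produced.
builderBytes :: Builder -> [Word8]
builderBytes = BL.unpack . toLazyByteString

-- "data" is 100, 97, 116, 97 in ASCII. 441000 is 0x0006BAA8, and
-- little-endian puts the low byte first: 0xA8, 0xBA, 0x06, 0x00.
dataHeaderBytes :: [Word8]
dataHeaderBytes = builderBytes (headerBuilder "data" 441000)
-- [100, 97, 116, 97, 168, 186, 6, 0]
```

That's 8 bytes in total: 4 for the tag and 4 for the size.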

In order to write these headers, we need to know the size of the chunks:

sizeOfRiffChunks :: [RiffChunk] -> Int
sizeOfRiffChunks chunks = sum $ map sizeOfRiffChunk chunks

sizeOfRiffChunk :: RiffChunk -> Int
sizeOfRiffChunk chunk = sizeOfRiffHeader + (sizeOfRiffContent chunk)

sizeOfRiffHeader :: Int
sizeOfRiffHeader = 8

sizeOfRiffContent :: RiffChunk -> Int
sizeOfRiffContent (RiffChunk _ bytes)             = length bytes
sizeOfRiffContent (RiffFormChunk _ chunks)        = 4 + (sizeOfRiffChunks chunks) 
sizeOfRiffContent (WaveFormatChunk _ _ _ _ extra) = 18 + (length extra)
sizeOfRiffContent (WaveInt16SamplesChunk samples) = 2 * (length samples)

Notice how we have several definitions of sizeOfRiffContent? This is pattern matching: each equation handles one constructor of RiffChunk, which is how we get different results depending on which variation we're given. So what are all those _s? _ is a wildcard pattern: it matches anything without binding a name, so if your function doesn't care about a parameter, you use _ as a placeholder and it doesn't clutter up your function.
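To check the arithmetic, here's a condensed, self-contained copy of the size functions (I've left out the generic RiffChunk constructor to keep it short) applied to the file from Main.hs: five seconds of mono 16-bit audio at 44.1 kHz. The names fiveSecondFile, fmtChunkSize, dataChunkSize, and fileSize are made up for this sketch.

```haskell
import Data.Int (Int16)
import Data.Word (Word8)

data RiffChunk
    = RiffFormChunk String [RiffChunk]
    | WaveFormatChunk Int Int Int Int [Word8]
    | WaveInt16SamplesChunk [Int16]

sizeOfRiffHeader :: Int
sizeOfRiffHeader = 8

sizeOfRiffChunk :: RiffChunk -> Int
sizeOfRiffChunk chunk = sizeOfRiffHeader + sizeOfRiffContent chunk

-- One equation per constructor; _ ignores the fields we don't need.
sizeOfRiffContent :: RiffChunk -> Int
sizeOfRiffContent (RiffFormChunk _ chunks)        = 4 + sum (map sizeOfRiffChunk chunks)
sizeOfRiffContent (WaveFormatChunk _ _ _ _ extra) = 18 + length extra
sizeOfRiffContent (WaveInt16SamplesChunk samples) = 2 * length samples

-- 5 seconds * 44100 samples per second = 220500 silent samples.
silentSamples :: [Int16]
silentSamples = replicate (5 * 44100) 0

fiveSecondFile :: RiffChunk
fiveSecondFile = RiffFormChunk "WAVE"
    [ WaveFormatChunk 1 1 44100 16 []
    , WaveInt16SamplesChunk silentSamples
    ]

fmtChunkSize, dataChunkSize, fileSize :: Int
fmtChunkSize  = sizeOfRiffChunk (WaveFormatChunk 1 1 44100 16 [])    -- 8 + 18 = 26
dataChunkSize = sizeOfRiffChunk (WaveInt16SamplesChunk silentSamples) -- 8 + 441000 = 441008
fileSize      = sizeOfRiffChunk fiveSecondFile                        -- 8 + 4 + 26 + 441008 = 441046
```

So the finished file should come out at 441046 bytes, only 46 of which are not sample data.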

Now that we know the chunk sizes, let's write the builders for our chunks. We already wrote a version of riffChunkBuilder as a reference for general-purpose chunks, but what we really need is specific builder logic for each kind:

riffChunkBuilder (RiffFormChunk form chunks) = mconcat ( 
    [
        riffHeaderBuilder (RiffHeader "RIFF" (4 + (sizeOfRiffChunks chunks))),
        string8 form -- the form tag, "WAVE" in our case
    ] ++ (map riffChunkBuilder chunks))


riffChunkBuilder (WaveFormatChunk format channels sampleRate bitDepth extraBytes) = 
    mconcat ([ 
        riffHeaderBuilder $ RiffHeader "fmt " (fromIntegral $ 18 + (length extraBytes)), 
        word16LE $ fromIntegral format,
        word16LE $ fromIntegral channels,
        word32LE $ fromIntegral sampleRate,
        word32LE $ fromIntegral $ sampleRate * channelsSampleSize,
        word16LE $ fromIntegral channelsSampleSize,
        word16LE $ fromIntegral bitDepth,
        word16LE $ fromIntegral $ length extraBytes
    ] ++ (map word8 extraBytes))
    where channelsSampleSize = channels * (bitDepth `div` 8)


riffChunkBuilder (WaveInt16SamplesChunk samples) = mconcat (
    riffHeaderBuilder (RiffHeader "data" (2 * (fromIntegral $ length samples)))
    : (map int16LE samples))

The RiffFormChunk chunk is the top-level chunk for a RIFF file; as you can see, we write the header with a RIFF tag, and a size of 4 + the size of all of the sub-chunks. The 4 bytes are for the form tag, which in this case we write as WAVE. Note that we are recursively calling riffChunkBuilder since sub-chunks for a RiffFormChunk have the same data type.

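The builder for WaveFormatChunk uses a tag of fmt followed by a space, since tags are always 4 characters. The size of this chunk's content is 18 bytes, plus any optional extra bytes added to the end. The content contains the details about the format (PCM), number of channels, sample rate, and bit depth, presented in an oddly redundant format: the two fields built from channelsSampleSize can be computed entirely from the other three.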
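Here are those two derived fields pulled out into functions of their own, as a sketch. The function names are mine; they follow the names these fields usually go by, "block align" (bytes per sample across all channels) and "byte rate" (bytes of audio per second):

```haskell
-- Bytes for one sample on every channel; this is channelsSampleSize above.
blockAlign :: Int -> Int -> Int
blockAlign channels bitDepth = channels * (bitDepth `div` 8)

-- Bytes of audio per second of playback.
byteRate :: Int -> Int -> Int -> Int
byteRate channels sampleRate bitDepth = sampleRate * blockAlign channels bitDepth

-- Our file: mono, 44.1 kHz, 16-bit.
monoBlockAlign, monoByteRate :: Int
monoBlockAlign = blockAlign 1 16      -- 2
monoByteRate   = byteRate 1 44100 16  -- 88200

-- For comparison, stereo at the same rate and depth.
stereoBlockAlign, stereoByteRate :: Int
stereoBlockAlign = blockAlign 2 16      -- 4
stereoByteRate   = byteRate 2 44100 16  -- 176400
```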

The builder for WaveInt16SamplesChunk writes a data tag, then the size of the chunk's content, which in this case is two bytes per sample, then the samples themselves.

After all that code, we're finally done! To see it all together, you can browse the article-2 branch or check out the article-2 branch. If you check out this branch and build, you can now write a WAV file directly, without sox:

sh build.sh
./wave-machine > sine.wav

Or, if you have a command-line application that can play waves, you can create the wave and play it all in one shot:

./wave-machine | mplayer -
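If you'd rather sanity-check the output from Haskell, here's a minimal sketch that looks for the RIFF and WAVE tags at the start of the data. Both functions are hypothetical helpers, not part of wave-machine, and this only checks the first 12 bytes, not the rest of the file:

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC

-- True when the bytes start with "RIFF", then 4 size bytes, then "WAVE".
looksLikeWave :: BL.ByteString -> Bool
looksLikeWave bytes =
    BL.take 4 bytes == BLC.pack "RIFF"
        && BL.take 4 (BL.drop 8 bytes) == BLC.pack "WAVE"

-- Hypothetical helper: run the check against a file on disk.
checkWaveFile :: FilePath -> IO Bool
checkWaveFile path = fmap looksLikeWave (BL.readFile path)
```

In GHCi, checkWaveFile "sine.wav" would run this against the file we wrote above.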

To see the code for this article, check out the article-2 branch.

Alright, now that we've made wave-machine more convenient to use, we can more easily experiment with more audio goodness. In the next article, I'm going to explore how to play two tones simultaneously.