Writing a WAVE file
In the previous article, we were able to output audio data, but this data inconveniently cannot be played in an audio player.
This is because the raw samples carry with them no data about what they represent. We know that these were 16-bit signed samples taken at 44.1 KHz, but an audio player would have no idea. A true audio file carries headers that notify an audio player exactly
what the format of our audio data is. This is why we needed to use sox
to generate a WAV file.
As I add to this library, I don't want to have to keep running sox
to test my output; let's update our code so it outputs a WAV file instead of raw data. I'll start in my Main.hs
file with how I would ideally want my code to look:
import Data.ByteString.Lazy.Builder
import System.IO
import WaveMachine.Audio.Tones
import WaveMachine.Audio.Pitch
import WaveMachine.Builders
import WaveMachine.Sampling
audioFn :: Double -> Double
audioFn = applyPitch middleC sineWave
channels = 1
sampleRate = 44100
bitDepth = 16
samples = sampleInt16 audioFn sampleRate 5
waveFile = WaveFile channels sampleRate bitDepth samples
main :: IO ()
main = hPutBuilder stdout $ waveFilePcm16Builder waveFile
I didn't cover this in my article, but as you can see, I have divided out the code into separate modules to make getting my head
around things more managable. The main
function now invokes a waveFilePcm16Builder
, which we will write, and which takes a
waveFile
variable. This waveFile
value is defined with some weird WaveFile
function call. What is that? WaveFile
is a
data constructor which we will also be creating.
You may also wonder what that $
character is doing in there. Since Haskell is a functional language, you'll find yourself using a lot of parentheses to keep operations in the intended order. Parentheses are fine, but sometimes the sheer number of them can get
out of hand. $
is a convenient operator that basically means "treat the rest as though it's parenthesized." In other words, the
above is exactly equivalent to hPutBuilder stdout (waveFile16Builder waveFile)
. As you get used to it, you start to think of it
as a convenient way to chain together functions.
OK, let's go to the meat of things. I've placed this new WaveFile
and waveFile16Builder
in WaveMachine/Builders.hs
(hence the import WaveMachine.Builders
above). Let's look at the contents of this file a piece at a time, starting with WaveFile
:
data WaveFile a = WaveFile Int Int Int [a] -- channels, sampleRate, bitDepth, samples
When I first learned Haskell, I found code like this very confusing; let me explain. data
defines a new data type. The name of
our new data type is WaveFile a
; what is this a
nonsense? This is a type parameter. If you've used C++, C#, or Java, this is
similar to a template parameter, or "generics." The reason why we're adding a parameter here is because a wave file can contain
samples made up of a variety of data types. When we use a
in our declaration, we're saying you can substitute it for any type.
The part after the =
is also confusing. We're defining WaveFile
right? So why are we using that symbol again? Haskell has this
concept of an algebraic type, which is sort of a combination of an enum
and struct
from other languages. The reason it's
called algebraic is more technical than I'll get into. The way you create a value for an algebraic type is through a constructor, which is what this second WaveFile
symbol is. Our WaveFile
constructor takes three Int
values, representing the number of channels, sample rate, and bit depth, and then an array of type a
. When we build a WaveFile
, we decide what that type will be based on the value we pass.
One thing that really irks me about Haskell is that they have a really nice alternative to creating types called records that help you avoid long lists of parameters like ours we're using; however, this alternative has some inconvenient limitations. I won't go into details at this time, but if you're interested, there's an extension to Haskell which is close to becoming a standard which will fix this. Here's an good article explaining the problem and solution.
So now we have a way to describe a wave file through the WaveFile
type, but how do we write one? To understand this, we first
have to understand RIFF files.
RIFF (Resource Interchange File Format) is a general purpose container file format composed of "chunks":
data RiffFile = RiffFile [RiffChunk]
A chunk is made up of a header containing a 4 character tag, the length of the chunk's content, then the content itself. There are several defined chunk types, but we're only going to be concerned with the top-level "form" chunk, the wave format chunk, and the chunk with the sound samples:
data RiffHeader = RiffHeader String Int
data RiffChunk =
RiffChunk RiffHeader [Word8]
| RiffFormChunk String [RiffChunk]
| WaveFormatChunk Int Int Int Int [Word8] -- format, channels, sampleRate, bitDepth, extra
| WaveInt16SamplesChunk [Int16]
What's going on with this RiffChunk
definition? RiffChunk
is an algebraic type that can take multiple forms, and thus has
multiple constructors. This will allow us to write some polymorphic functions later on.
The basic content of a 16-bit PCM WAV file, which uses this RIFF container, is as follows:
pcmFormat = 1
riffFileForWavePcm16 :: WaveFile Int16 -> RiffFile
riffFileForWavePcm16 (WaveFile channels sampleRate bitDepth samples) =
RiffFile [
RiffFormChunk "WAVE" [
WaveFormatChunk pcmFormat channels sampleRate bitDepth [],
WaveInt16SamplesChunk samples ] ]
What do I mean by "PCM?" The samples we have been creating are in a PCM format, which stands for pulse-code modulation. Generally speaking, this term is used when referring to uncompressed data, especially audio data like we've been producing.
As you can see, the structure of a WAV file is a top-level "form" chunk which contains a format descriptor of WAVE
; this is how
the consumer of a RIFF file knows this is a WAVE file. The form chunk then contains sub-chunks. We're only going to write two of
these sub-chunks: one that describes the format of the data, and another that contains the data itself.
Now let's write the Builder
that will output the bytes of a WAVE file:
waveFilePcm16Builder :: WaveFile Int16 -> Builder
waveFilePcm16Builder waveFile = riffFileBuilder $ riffFileForWavePcm16 waveFile
This just delegates to a builder for RIFF files, which looks like this:
riffFileBuilder :: RiffFile -> Builder
riffFileBuilder (RiffFile chunks) = mconcat $ map riffChunkBuilder chunks
Since a RIFF file is simply a collection of chunks, we'll just build every chunk. Here's the builder for a generic chunk:
riffChunkBuilder :: RiffChunk -> Builder
riffChunkBuilder (RiffChunk header bytes) = mconcat (
(riffHeaderBuilder header)
: (map word8 bytes))
As you can see, we build the header for the chunk, then use this word8
builder to output all of the bytes in the chunk
(Word8
in Haskell terms is what we generally call bytes in other contexts; it is a much more accurate term, since byte doesn't always meant "8 bits" in some obscure contexts).
Writing the header for a chunk is simple enough:
riffHeaderBuilder :: RiffHeader -> Builder
riffHeaderBuilder (RiffHeader tag size) = mconcat [
string8 tag,
word32LE $ fromIntegral size ]
We write the tag with string8
, which is basically ASCII characters (an 8-bit encoding), then the size of the chunk with word32LE
(the LE
suffix throughout our builders stands for little-endian, which we discussed in the last article). So what is this fromIntegral
? Because Haskell is very type-safe, and because we're going to be outputting numbers in a variety of formats, there's going to be a lot of type conversions going on. fromIntegral
is a really convenient function that will convert any integral (like
integers) into the required type for the function being called. In this example, size
is an Int
, and we're converting it to a Word32
, which is required by the word32LE
builder.
In order to write these headers, we need to know the size of the chunks:
sizeOfRiffChunks :: [RiffChunk] -> Int
sizeOfRiffChunks chunks = sum $ map sizeOfRiffChunk chunks
sizeOfRiffChunk :: RiffChunk -> Int
sizeOfRiffChunk chunk = sizeOfRiffHeader + (sizeOfRiffContent chunk)
sizeOfRiffHeader :: Int
sizeOfRiffHeader = 8
sizeOfRiffContent :: RiffChunk -> Int
sizeOfRiffContent (RiffChunk _ bytes) = length bytes
sizeOfRiffContent (RiffFormChunk _ chunks) = 4 + (sizeOfRiffChunks chunks)
sizeOfRiffContent (WaveFormatChunk _ _ _ _ extra) = 18 + (length extra)
sizeOfRiffContent (WaveInt16SamplesChunk samples) = 2 * (length samples)
Notice how we have several definitions of sizeOfRiffContent
? This is how we can polymorphically have different results
depending on which variation of RiffChunk
we're using. So what are all those _
s? If your function doesn't care about a
parameter, a common convention is to use _
as a placeholder so that it doesn't take up unnecessary space in your function.
Now that we know the chunk sizes, let's write the builders for our chunks. We already wrote a version of riffChunkBuilder
as a
reference for general-purpose chunks, but what we really need is specific builder logic for each kind:
riffChunkBuilder (RiffFormChunk form chunks) = mconcat (
[
riffHeaderBuilder (RiffHeader "RIFF" (4 + (sizeOfRiffChunks chunks))),
string8 "WAVE"
] ++ (map riffChunkBuilder chunks))
riffChunkBuilder (WaveFormatChunk format channels sampleRate bitDepth extraBytes) =
mconcat ([
riffHeaderBuilder $ RiffHeader "fmt " (fromIntegral $ 18 + (length extraBytes)),
word16LE $ fromIntegral format,
word16LE $ fromIntegral channels,
word32LE $ fromIntegral sampleRate,
word32LE $ fromIntegral $ sampleRate * channelsSampleSize,
word16LE $ fromIntegral channelsSampleSize,
word16LE $ fromIntegral bitDepth,
word16LE $ fromIntegral $ length extraBytes
] ++ (map word8 extraBytes))
where channelsSampleSize = channels * (bitDepth `div` 8)
riffChunkBuilder (WaveInt16SamplesChunk samples) = mconcat (
riffHeaderBuilder (RiffHeader "data" (2 * (fromIntegral $ length samples)))
: (map int16LE samples))
The RiffFormChunk
chunk is the top-level chunk for a RIFF file; as you can see, we write the header with a RIFF
tag, and a size of 4 + the size of all of the sub-chunks. The 4 bytes are for the form tag, which in this case we write as WAVE
. Note that we are recursively calling riffChunkBuilder
since sub-chunks for a RiffFormChunk
have the same data type.
The builder for WaveFormatChunk
uses a tag of fmt
. The size of this chunk is 18, with optional extra bytes added to the end.
The content contains the details about the format (PCM), number of channels, sample rate, and bit depth, presented in an oddly
redundant format.
The builder for WaveInt16SamplesChunk
write a data
tag then the size of the chunk, which in this case is two bytes per sample.
After all that code, we're finally done! To see it all together, you can browse the article-2 branch or checkout the article-2
branch. If you check out this branch and build, you can now write a WAV file directly, without sox
:
sh build.sh
./wave-machine > sine.wav
Or, if you have a command-line application that can play waves, you can create the wave and play it all in one shot:
./wave-machine | mplayer -
You can play the file here:
To see the code for this article, check out the article-2 branch.
Alright, now that we've made wave-machine
more convenient to use, we can more-easily experiment with more audio goodness. In the next article, I'm going to explore how to play two tones simultaneously.