One of the great things about writing software on the .NET platform is the ability to play videos and music files and to display images. With such a rich multimedia platform, it's easy to get caught up in media players and MP3s. It's also possible, however, to build some very, very custom sound generators of your own.
Why Would I Want Custom Sound, Though?
There are many reasons why you might want to create your own sounds in program code. Just to give you a few ideas, here are some situations:
- Sound effects for a game
- Specialist tone generation (a hearing test)
- Ham radio
The limit really is just your imagination.
So how might we go about playing our sounds? If you've spent any time at all around Windows, you'll be familiar with the "WAV" file format. WAV, or wave files as they're known, are more than just a file type for storing audio; the same layout is used to hold uncompressed sound data in memory so that Windows can play sounds easily.
The MSDN page at https://msdn.microsoft.com/en-us/library/Microsoft.VisualBasic.Devices.Audio(v=vs.110).aspx details the use of the "Audio" class, which is a managed wrapper around the Win32 media APIs. Reading the page, you might be forgiven for thinking it's for Visual Basic only, but it's possible to use it from C# too, by including the Visual Basic bindings in your application.
There is also a C#-specific version of the API in the form of the “SoundPlayer” class. The C# version, however, does not allow you to easily build a byte-based buffer of your wave directly in memory, then simply play that, which is what we’ll be doing in this post.
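For comparison, SoundPlayer plays from a file, a URL, or a Stream rather than a raw byte array, so an in-memory buffer has to be wrapped first. Here's a minimal sketch of that route (inside a Main method, with the usings shown), assuming you already have a complete WAV file image on disk or in memory:

```csharp
using System.IO;
using System.Media;

// Sketch only: waveBytes must hold a complete, valid WAV file image
byte[] waveBytes = File.ReadAllBytes("m:\\crooner2.wav");

using (MemoryStream stream = new MemoryStream(waveBytes))
using (SoundPlayer player = new SoundPlayer(stream))
{
    player.PlaySync(); // Blocks until playback completes
}
```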
Let’s Write Some Code
As with most of my posts, start yourself a simple command line project in Visual Studio. Then, once the project is created, add a reference to "Microsoft.VisualBasic" by using the Add References menu on your project.
Once the reference is added, find a wave file kicking around on your hard drive (I used one called "crooner2.wav" that I found) and add the following code to your console application:
```csharp
using Microsoft.VisualBasic;
using Microsoft.VisualBasic.Devices;

namespace myApp
{
    class Program
    {
        static Audio myAudio = new Audio();

        static void Main()
        {
            // Play the wave file straight from disk, blocking until it finishes
            myAudio.Play("m:\\crooner2.wav", AudioPlayMode.WaitToComplete);
        }
    }
}
```
When you press F5 to run your program, you should hear your sound start to play. Playing a wave in this manner is all well and good, but what happens if you want to repeat it, or play it multiple times?
Just using the standard file play approach like this means the file is opened from your hard disk each time, with the overhead of that operation incurred before the sound plays.
If, however, we load the file into a byte array, we can use a variation on the Play method that allows us to play straight from memory, as follows:
```csharp
using System;
using System.IO;
using Microsoft.VisualBasic;
using Microsoft.VisualBasic.Devices;

namespace myApp
{
    class Program
    {
        static Audio myAudio = new Audio();
        private static byte[] myWaveData;

        static void Main()
        {
            // Load the whole file into memory once...
            myWaveData = File.ReadAllBytes("m:\\crooner2.wav");

            // ...then play it from the byte array as many times as we like
            myAudio.Play(myWaveData, AudioPlayMode.WaitToComplete);
            Console.WriteLine("Repeating");
            myAudio.Play(myWaveData, AudioPlayMode.WaitToComplete);
        }
    }
}
```
As you can see from the previous listing, we play the same wave twice, straight from memory. There are other AudioPlayMode flags, too. As well as waiting for the current play to complete, you can also use 'Background' and 'BackgroundLoop', allowing your application to get on with other tasks while the audio continues to play.
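As a quick sketch of the background pattern, something like the following should let the console keep working while the sound loops; the Audio class also provides a Stop method for exactly this situation:

```csharp
// Loop the sound in the background while the program does other work
myAudio.Play(myWaveData, AudioPlayMode.BackgroundLoop);

Console.WriteLine("Looping... press Enter to stop.");
Console.ReadLine();

myAudio.Stop(); // Halts whatever the Audio instance is currently playing
```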
But What About Custom Sounds?
The key to making your own custom sounds using nothing more than program code is learning what an uncompressed WAV file looks like. Once you understand the layout of these files, you actually can build one directly in memory without loading anything from disk, and, if your code is fast enough, play them in real time, too.
So, what exactly does a WAV file look like?
Like any binary format, a wave file has a strictly defined layout of bytes. It starts with a file header that looks like the following:
| Field       | Offset | Number of Bytes | Sample Data            |
| ----------- | ------ | --------------- | ---------------------- |
| FileTypeId  | 0      | 4               | "RIFF" (String)        |
| FileLength  | 4      | 4               | 12345 (Unsigned Int32) |
| MediaTypeId | 8      | 4               | "WAVE" (String)        |
Following on from that is a section known as the "Format Chunk". This defines the sample frequency, byte size, and other information Windows needs in order to play the sound. It looks as follows (note: offsets are from the beginning of this chunk, not the start of the file):
| Field              | Offset | Number of Bytes | Sample Data            |
| ------------------ | ------ | --------------- | ---------------------- |
| ChunkId            | 0      | 4               | "fmt " (String)        |
| ChunkSize          | 4      | 4               | 12345 (Unsigned Int32) |
| FormatTag          | 8      | 2               | 1 (Unsigned Int16)     |
| Channels           | 10     | 2               | 2 (Unsigned Int16)     |
| Frequency          | 12     | 4               | 44100 (Unsigned Int32) |
| AverageBytesPerSec | 16     | 4               | 12345 (Unsigned Int32) |
| BlockAlign         | 20     | 2               | 1234 (Unsigned Int16)  |
| BitsPerSample      | 22     | 2               | 16 (Unsigned Int16)    |
The final part of the puzzle is the audio data itself, which looks as follows:
| Field     | Offset | Number of Bytes               | Sample Data            |
| --------- | ------ | ----------------------------- | ---------------------- |
| ChunkId   | 0      | 4                             | "data" (String)        |
| ChunkSize | 4      | 4                             | 12345 (Unsigned Int32) |
| AudioData | 8      | NumSamples * bytes per sample | Array of byte/short    |
Initially, this might all look quite complicated, but in reality much of the data is either static and doesn't change, or is calculated automatically from other values.
The good thing about this is that we can easily turn it into a useful set of C# classes. First, the wave header: add a class to your application called WaveHeader.cs and add the following code:
```csharp
using System;
using System.Collections.Generic;
using System.Text;

namespace myApp
{
    public class WaveHeader
    {
        private const string FILE_TYPE_ID = "RIFF";
        private const string MEDIA_TYPE_ID = "WAVE";

        public string FileTypeId { get; private set; }
        public UInt32 FileLength { get; set; }
        public string MediaTypeId { get; private set; }

        public WaveHeader()
        {
            FileTypeId = FILE_TYPE_ID;
            MediaTypeId = MEDIA_TYPE_ID;
            // Minimum size is always 4 bytes
            FileLength = 4;
        }

        public byte[] GetBytes()
        {
            List<Byte> chunkData = new List<byte>();
            chunkData.AddRange(Encoding.ASCII.GetBytes(FileTypeId));
            chunkData.AddRange(BitConverter.GetBytes(FileLength));
            chunkData.AddRange(Encoding.ASCII.GetBytes(MediaTypeId));
            return chunkData.ToArray();
        }
    }
}
```
You can see that the FileType and MediaType IDs can’t be changed by any application using the class. Only the ‘FileLength’ can, and we’ll come back to that in a moment.
You’ll also see that I’ve added a method called ‘GetBytes’, allowing us to easily get a byte array of the contents of this chunk to add to our play buffer. Continuing on, here’s the code to define a similar class for the “Format Chunk”:
```csharp
using System;
using System.Collections.Generic;
using System.Text;

namespace myApp
{
    public class FormatChunk
    {
        private ushort _bitsPerSample;
        private ushort _channels;
        private uint _frequency;
        private const string CHUNK_ID = "fmt ";

        public string ChunkId { get; private set; }
        public UInt32 ChunkSize { get; private set; }
        public UInt16 FormatTag { get; private set; }

        public UInt16 Channels
        {
            get { return _channels; }
            set { _channels = value; RecalcBlockSizes(); }
        }

        public UInt32 Frequency
        {
            get { return _frequency; }
            set { _frequency = value; RecalcBlockSizes(); }
        }

        public UInt32 AverageBytesPerSec { get; private set; }
        public UInt16 BlockAlign { get; private set; }

        public UInt16 BitsPerSample
        {
            get { return _bitsPerSample; }
            set { _bitsPerSample = value; RecalcBlockSizes(); }
        }

        public FormatChunk()
        {
            ChunkId = CHUNK_ID;
            ChunkSize = 16;
            FormatTag = 1;        // MS PCM (Uncompressed wave file)
            Channels = 2;         // Default to stereo
            Frequency = 44100;    // Default to 44100hz
            BitsPerSample = 16;   // Default to 16bits
            RecalcBlockSizes();
        }

        private void RecalcBlockSizes()
        {
            BlockAlign = (UInt16)(_channels * (_bitsPerSample / 8));
            AverageBytesPerSec = _frequency * BlockAlign;
        }

        public byte[] GetBytes()
        {
            List<Byte> chunkBytes = new List<byte>();
            chunkBytes.AddRange(Encoding.ASCII.GetBytes(ChunkId));
            chunkBytes.AddRange(BitConverter.GetBytes(ChunkSize));
            chunkBytes.AddRange(BitConverter.GetBytes(FormatTag));
            chunkBytes.AddRange(BitConverter.GetBytes(Channels));
            chunkBytes.AddRange(BitConverter.GetBytes(Frequency));
            chunkBytes.AddRange(BitConverter.GetBytes(AverageBytesPerSec));
            chunkBytes.AddRange(BitConverter.GetBytes(BlockAlign));
            chunkBytes.AddRange(BitConverter.GetBytes(BitsPerSample));
            return chunkBytes.ToArray();
        }

        public UInt32 Length()
        {
            return (UInt32)GetBytes().Length;
        }
    }
}
```
You can see straight away that there’s a lot more to this class than the header. For instance, when you change things like the sample frequency or number of channels, other items like the block size are recalculated on the fly, keeping them in line with the values chosen.
Again, as with the wave header, there's a method called GetBytes that returns a byte array representation of the chunk, and a method called Length() that returns the length of that data in bytes.
The Length() method is used to get the total size of the chunk, so that the file size in the header can be calculated correctly. Calculating the file sizes for a wave can be a little confusing when you first set out, because it's not quite as straightforward as you might think.
If you look at the format chunk, you'll see a field in there called ChunkSize, and you would be correct in assuming that it holds the size of the format chunk. However, it's not the full size of the chunk.
Chunk sizes are calculated minus the bytes for the ChunkId and ChunkSize fields themselves, 4 bytes each, making a total of 8 bytes. This means the value stored in this field is (in the case of a format chunk) 16: the full size (24 bytes) that the Length() method returns, minus the 8 bytes for those header fields.
The complication for most people, however, sets in when calculating the file size in the wave header.
The file size, like the chunk size, ignores the first two fields in the header (the FileTypeId and the FileLength). However, it does NOT ignore those same two fields in each chunk.
This means that, when calculating the value to go in the header, you add up the full values returned by each chunk's Length() method plus the 12-byte header itself, then subtract 8 bytes from that overall total (which is why the header class starts FileLength at 4). When calculating the ChunkSize inside an individual chunk, you set it to the value returned by that chunk's Length() method minus 8.
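To put some numbers on that: for one second of 16-bit stereo at 44,100Hz, the data chunk holds 44,100 × 2 channels × 2 bytes = 176,400 bytes of samples, so its ChunkSize is 176,400 while its Length() returns 176,408. The format chunk's Length() is 24, with a ChunkSize of 16. The header's FileLength therefore works out as 4 + 24 + 176,408 = 176,436, even though the complete file on disk is 176,444 bytes.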
Moving on with the rest of the fields in the chunk, the next one on our list is FormatTag. For everything we're doing in this post, this needs to stay at 1, which means "MS PCM": uncompressed, unencoded raw wave data with samples in either 8, 16, 24, or 32 bit resolution.
Next we have Channels and Frequency. Channels should be 2 for stereo (left and right) and 1 for mono. Frequency is the sample rate, which defaults to 44.1kHz, or 44,100 samples per second.
This is the number of data points (or individual sample values) that are captured or replayed in one second of time. In this post, I've defaulted to standard CD quality.
The higher the value, the more fidelity and accuracy the sound will have, but the more data it will contain; the lower the value, the lower the quality. You can also change this on the fly to alter the pitch and speed of the data once you've defined it, but that's a bit more complicated than I have space for in this post.
AverageBytesPerSec and BlockAlign are calculated automatically for you, so they can't be set from outside the class. If you want to know the maths behind them, you can find it in the "RecalcBlockSizes" method.
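As a quick worked example using the defaults: BlockAlign = 2 channels × (16 ÷ 8) bytes = 4 bytes per frame, and AverageBytesPerSec = 44,100 × 4 = 176,400.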
Finally we have BitsPerSample. This value represents how many bits are in one sample data point, typically either 8 or 16.
If you use 8 bits, each sample in a WAV file is a single unsigned byte, with a range of 0 to 255 and 128 as the silent midpoint. (Remember that sound is a wave, so samples swing either side of a centre line.) If you use 16 bits, samples are signed values from -32,768 to 32,767, giving a much larger range and allowing for more quality. For this post, I'm defaulting to CD quality, which is 44.1kHz stereo at 16 bits. If you want to change that, be aware you'll have to make changes to other parts of the code, too.
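If you're generating samples from normalized floating-point values (a common approach, though not the one the generator later in this post uses), the conversion to 16-bit might be sketched like this; ToPcm16 is a hypothetical helper of my own, not part of the classes in this post:

```csharp
using System;

// Hypothetical helper: scale a normalized sample (-1.0 to 1.0) to 16-bit PCM
static short ToPcm16(double normalizedSample)
{
    // Clamp first so rounding can't overflow the short range
    if (normalizedSample > 1.0) normalizedSample = 1.0;
    if (normalizedSample < -1.0) normalizedSample = -1.0;
    return (short)Math.Round(normalizedSample * short.MaxValue);
}
```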
The final chunk we need to turn into code is the data chunk, and after the format chunk, this one is extremely simple. Create a class in your app called DataChunk.cs and add the following code:
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace myApp
{
    public class DataChunk
    {
        private const string CHUNK_ID = "data";

        public string ChunkId { get; private set; }
        public UInt32 ChunkSize { get; set; }
        public short[] WaveData { get; private set; }

        public DataChunk()
        {
            ChunkId = CHUNK_ID;
            ChunkSize = 0; // Until we add some data
        }

        public UInt32 Length()
        {
            return (UInt32)GetBytes().Length;
        }

        public byte[] GetBytes()
        {
            List<Byte> chunkBytes = new List<Byte>();
            chunkBytes.AddRange(Encoding.ASCII.GetBytes(ChunkId));
            chunkBytes.AddRange(BitConverter.GetBytes(ChunkSize));

            // Each 16-bit sample occupies 2 bytes in the output buffer
            byte[] bufferBytes = new byte[WaveData.Length * 2];
            Buffer.BlockCopy(WaveData, 0, bufferBytes, 0, bufferBytes.Length);
            chunkBytes.AddRange(bufferBytes.ToList());

            return chunkBytes.ToArray();
        }

        public void AddSampleData(short[] leftBuffer, short[] rightBuffer)
        {
            // Interleave the two channels: left sample, then right sample
            WaveData = new short[leftBuffer.Length + rightBuffer.Length];
            int bufferOffset = 0;
            for (int index = 0; index < WaveData.Length; index += 2)
            {
                WaveData[index] = leftBuffer[bufferOffset];
                WaveData[index + 1] = rightBuffer[bufferOffset];
                bufferOffset++;
            }
            ChunkSize = (UInt32)WaveData.Length * 2;
        }
    }
}
```
As with the other two classes, we have a ‘GetBytes’ and ‘Length’ method that are used to get the byte[] array and chunk length respectively. You’ll also note that this chunk is all data apart from the obligatory 8 byte header on it.
One thing to pay particular attention to, however, is the WaveData property. You'll note that this is defined as an array of "short".
This is because in the format chunk, we’re declaring a 16 bit sample size. If your sample size was 8 or even 24/32, you’d need to make sure the elements in this array were of the appropriate size. If you don’t, the size calculations will be wrong, and nothing will work correctly.
This does, unfortunately, make things a little messy: to put the data into a memory buffer, we need to pull it out as standard 8-bit bytes, and, as you can see in the code, we also need to do a bit of marshalling to make sure we add the sample data correctly, in an interleaved fashion, as the following illustration shows:
Figure 1: Showing bits per channel
1 frame is equal to 1 sample on both channels, so in this case, 1 frame = 16 bits * 2 channels, or 32 bits.
When you're adding the sample data to the single short buffer, you therefore have to take 16 bits from channel 1 and add that, then 16 bits from channel 2 and add that immediately after, advancing your source index by one but your destination index by two.
Everything played back on the left channel then occupies the first slot of each frame, and everything on the right the second.
Once you study it a few times, and then get your head around it, it does get easier. I promise 🙂
If you only had one channel (for example, a mono file), you could just transfer the values from one array to the other without doing anything special. Likewise, if you were handling something like 5.1 surround sound, you’d need to adjust the code to handle an interleave of 6 (5.1 surround uses 6 channels) which again is way beyond the content of this post.
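As a sketch, a mono variant of AddSampleData (my own addition, not part of the class above) would be a straight copy, with no interleaving needed; remember you'd also need to set Channels to 1 on the format chunk to match:

```csharp
// Hypothetical mono variant: a single channel needs no interleaving
public void AddSampleData(short[] monoBuffer)
{
    WaveData = new short[monoBuffer.Length];
    Array.Copy(monoBuffer, WaveData, monoBuffer.Length);
    ChunkSize = (UInt32)WaveData.Length * 2; // Still 2 bytes per 16-bit sample
}
```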
We're Nearly There…
At this point, we have everything we need to create the data for a wave file. We could even use these classes to produce a sequence of bytes to write out to disk and save as a playable wave file.
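Saving is just a matter of concatenating the three GetBytes() results and writing them out. Here's a minimal sketch, assuming header, format, and data objects built exactly as in the final listing below (the output file name is arbitrary):

```csharp
// Assemble the file image in memory, then write it out as a playable .wav
List<byte> fileBytes = new List<byte>();
fileBytes.AddRange(header.GetBytes());
fileBytes.AddRange(format.GetBytes());
fileBytes.AddRange(data.GetBytes());
File.WriteAllBytes("output.wav", fileBytes.ToArray());
```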
The last thing we need now is a method of producing some audio data that we can hear. An easy way to do this is to produce sine waves.
I’m not going to go into the theory of sound waves and how they work; there’s plenty of info on Wikipedia if you want to research that. Instead, I’m just going to give you one final class that will make it easy for you to produce sine waves of any given frequency and duration, for use in your projects:
```csharp
using System;

namespace myApp
{
    public class SineGenerator
    {
        private readonly double _frequency;
        private readonly UInt32 _sampleRate;
        private readonly UInt16 _secondsInLength;
        private short[] _dataBuffer;

        public short[] Data
        {
            get { return _dataBuffer; }
        }

        public SineGenerator(double frequency, UInt32 sampleRate, UInt16 secondsInLength)
        {
            _frequency = frequency;
            _sampleRate = sampleRate;
            _secondsInLength = secondsInLength;
            GenerateData();
        }

        private void GenerateData()
        {
            uint bufferSize = _sampleRate * _secondsInLength;
            _dataBuffer = new short[bufferSize];

            int amplitude = 32760;
            double timePeriod = (Math.PI * 2 * _frequency) / _sampleRate;

            // Fill the whole buffer; the loop runs to bufferSize so the
            // final sample isn't left at zero
            for (uint index = 0; index < bufferSize; index++)
            {
                _dataBuffer[index] = Convert.ToInt16(amplitude * Math.Sin(timePeriod * index));
            }
        }
    }
}
```
By using this, you can specify the frequency and duration of your tone, along with the sample rate (which should match the format chunk), allowing you to create pure sine waves of any frequency that you can then add to your data chunks.
If we wrap this all together in our console program, we should find that we can now do the following:
```csharp
using System;
using System.Collections.Generic;
using Microsoft.VisualBasic;
using Microsoft.VisualBasic.Devices;

namespace myApp
{
    class Program
    {
        static Audio myAudio = new Audio();
        private static byte[] myWaveData;

        // Sample rate (or number of samples in one second)
        private const int SAMPLE_FREQUENCY = 44100;

        // 1 second of audio
        private const int AUDIO_LENGTH_IN_SECONDS = 1;

        static void Main()
        {
            List<Byte> tempBytes = new List<byte>();

            WaveHeader header = new WaveHeader();
            FormatChunk format = new FormatChunk();
            DataChunk data = new DataChunk();

            // Create 1 second of tone at 697Hz
            SineGenerator leftData = new SineGenerator(697.0f,
                SAMPLE_FREQUENCY, AUDIO_LENGTH_IN_SECONDS);

            // Create 1 second of tone at 1209Hz
            SineGenerator rightData = new SineGenerator(1209.0f,
                SAMPLE_FREQUENCY, AUDIO_LENGTH_IN_SECONDS);

            data.AddSampleData(leftData.Data, rightData.Data);

            // FileLength starts at 4, so adding the two chunk lengths
            // gives the correct RIFF file length
            header.FileLength += format.Length() + data.Length();

            tempBytes.AddRange(header.GetBytes());
            tempBytes.AddRange(format.GetBytes());
            tempBytes.AddRange(data.GetBytes());

            myWaveData = tempBytes.ToArray();

            myAudio.Play(myWaveData, AudioPlayMode.WaitToComplete);
        }
    }
}
```
When you hit F5, you should hear your two tones play, one on each side of your headphones.
You could easily combine both of these buffers together so that the same data was played on both channels, but I'll leave that as something for you to experiment with.
For those who are curious, the tone you can hear (or the mixture of the two, at least) is the DTMF tone for the number 1 on a telephone keypad. You can easily find the frequencies needed on the Internet, and you could then create a series of samples in memory that when played back down a telephone line would dial the requested number.
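As a sketch of that idea, the standard DTMF layout pairs each key with one low-group and one high-group frequency. The dictionary below is my own construction, but the frequency values themselves are the standard DTMF assignments:

```csharp
using System.Collections.Generic;

// Standard DTMF pairs: each key mixes a low-group and a high-group tone (Hz)
Dictionary<char, double[]> dtmfTones = new Dictionary<char, double[]>
{
    { '1', new[] { 697.0, 1209.0 } }, { '2', new[] { 697.0, 1336.0 } }, { '3', new[] { 697.0, 1477.0 } },
    { '4', new[] { 770.0, 1209.0 } }, { '5', new[] { 770.0, 1336.0 } }, { '6', new[] { 770.0, 1477.0 } },
    { '7', new[] { 852.0, 1209.0 } }, { '8', new[] { 852.0, 1336.0 } }, { '9', new[] { 852.0, 1477.0 } },
    { '*', new[] { 941.0, 1209.0 } }, { '0', new[] { 941.0, 1336.0 } }, { '#', new[] { 941.0, 1477.0 } }
};

// One SineGenerator per element of the pair reproduces a key's tone, e.g.
// new SineGenerator(dtmfTones['1'][0], 44100, 1) for the low tone of "1"
```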
If there’s anything you’d like to learn about .NET, or are wondering if there’s a method/API/assembly for performing a common task, please come find me on the Internet by searching for “shawty_ds”, or simply just add a comment below and I’ll do what I can to put together an article on the subject.