3D Spatial Audio Tutorial

Estimated reading: 7 minutes

What you’ll learn

In this tutorial you’ll learn how to set up the Immersitech SDK. You’ll create a room, add participants to that room, and apply 3D mixing. You’ll also learn how to simulate participant audio with files to speed up development.

If you’d rather just see the finished application, you can find it in our GitHub examples repository.

Boilerplate

Let’s set up some headers and constants for our project. We’ll include the standard C libraries and the Immersitech header file, then define constants that describe the output format we want from the library.

The number of channels MUST be set to stereo (2) to hear 3D audio; mono output cannot carry a 3D listening experience.

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

#include "immersitech.h"

#define OUTPUT_SAMPLING_RATE 48000
#define OUTPUT_NUM_FRAMES 960
#define OUTPUT_NUM_CHANNELS 2
#define INTERLEAVED true
#define SPATIAL_QUALITY 5

Initialize the library

Next, let’s create our main function, enable logging, and initialize the Immersitech library.

int main(int argc, char **argv) {

  imm_enable_logging(true);

  imm_error_code error_code;
  imm_library_configuration output_config = {
    OUTPUT_SAMPLING_RATE,
    OUTPUT_NUM_FRAMES,
    OUTPUT_NUM_CHANNELS,
    INTERLEAVED,
    SPATIAL_QUALITY
  };

  imm_handle imm = imm_initialize_library("Immersitech_Engineering_sound_manager_license_key.dat", NULL, NULL, output_config, &error_code);

  printf("\nUsing Immersitech version: %s", imm_get_version());
  printf("\nLicense Key Info: %s", imm_get_license_info(imm));

}

Go ahead and build the project to make sure that the library accepts your license key. You should see something like the following in your console output:

  Using Immersitech version: v1.0.9
  License Key Info: { "valid": true, "name": "Immersitech_Engineering_sound_manager_license_key.dat", "department": "Engineering", "minimum_version": "v0.8.0", "maximum_version": "v1.9.999", "creation_date": "3/9/2022", "expiration_date": "8/21/2022" }

Assuming that your license key is valid, we can move forward with using the library.

Simulating participant audio

The main use case of the Immersitech SDK is to apply audio effects and mixing to real-time conferencing. During development it may not be practical for you to gather your coworkers together each time you need to test changes to your code. That’s why we’ve provided a tutorial that uses audio files to simulate participants. The input file will simulate the user who is speaking and the output file will simulate what is being heard in the room.

Setting up files and buffers for processing the audio

Let’s start by creating a struct to store the header information for the .wav file and a pointer that we can use for reading the file.

typedef struct header {
  unsigned char chunk_id[4];
  unsigned int chunk_size;
  unsigned char format[4];
  unsigned char subchunk1_id[4];
  unsigned int subchunk1_size;
  unsigned short audio_format;
  unsigned short num_channels;
  unsigned int sample_rate;
  unsigned int byte_rate;
  unsigned short block_align;
  unsigned short bits_per_sample;
  unsigned char subchunk2_id[4];
  unsigned int subchunk2_size;
} header;

typedef struct header* header_p;

We will need to read in the header for the input file so that we can calculate the size of the buffers and determine the format for the output file. Then we’ll write the WAV header to the output file.

For the purposes of this tutorial we’ll just process the input for one participant. If you’d like to simulate audio input for all of the participants in the room then you’ll need to repeat this step for each participant.

// Input files will simulate input audio from each individual participant
FILE* input_file = fopen("input.wav", "rb");

// Output files let you review what each participant hears
FILE* output_file = fopen("output.wav", "wb");

header_p meta = (header_p)malloc(sizeof(header));

fread(meta, 1, sizeof(header), input_file);
int participant_1_sampling_rate = meta->sample_rate;
int participant_1_num_channels = meta->num_channels;
int participant_1_num_input_frames = (OUTPUT_NUM_FRAMES * participant_1_sampling_rate) / OUTPUT_SAMPLING_RATE;

// Doubling the data size assumes a mono input at the output sample rate
meta->subchunk2_size *= 2;
meta->chunk_size = 36 + meta->subchunk2_size;
meta->num_channels = OUTPUT_NUM_CHANNELS;
meta->sample_rate = OUTPUT_SAMPLING_RATE;
meta->byte_rate = OUTPUT_SAMPLING_RATE * OUTPUT_NUM_CHANNELS * (meta->bits_per_sample / 8);
meta->block_align = OUTPUT_NUM_CHANNELS * (meta->bits_per_sample / 8);
fwrite(meta, 1, sizeof(header), output_file);

Next, we’ll set up our input and output buffers for processing the first participant’s input audio.

// Initialize buffers to read input files and write to output files
// The input buffer is sized for the input file's own rate and channel count
short* input_buffer = (short*)malloc(participant_1_num_input_frames * participant_1_num_channels * sizeof(short));
short* output_buffer = (short*)malloc(OUTPUT_NUM_FRAMES * OUTPUT_NUM_CHANNELS * sizeof(short));

Great! Now we’re set up to read from an input audio file and write to an output audio file. The final step before processing is to set up the library itself.

Creating a room

Let’s create a room and add two participants to it. In this scenario the first participant will be the speaker, and the second participant will listen to the first.

// Create and initialize a room to put participants into
int room_id = 0;
imm_create_room(imm, room_id);

// Add Participants into our room
int ID_1 = 1;
int ID_2 = 2;
imm_participant_configuration input_config = { participant_1_sampling_rate, participant_1_num_channels, IMM_PARTICIPANT_REGULAR };
imm_add_participant(imm, room_id, ID_1, "participant_1", input_config);
imm_add_participant(imm, room_id, ID_2, "participant_2", input_config);

Next, we’ll assign a position and heading to the first participant in the room, and enable 3D mixing for all participants.

imm_position position = { -60,0,10 };
imm_heading heading = { 0,0 };

imm_set_participant_position(imm, room_id, ID_1, position, heading);

imm_set_all_participants_state(imm, room_id, IMM_CONTROL_MIXING_3D_ENABLE, 1);
imm_set_all_participants_state(imm, room_id, IMM_CONTROL_DEVICE, IMM_DEVICE_HEADPHONE);

Processing the Audio

At this point we have everything that we need set up for audio processing. We just need to actually process the audio files to simulate a real-time audio application. The code below is an example of reading and writing audio to a single output file for a single participant. To process audio for all participants you’ll need to adjust this code to write to separate output files for each participant.

// Loop through the file one buffer at a time, as a real-time application would.
// Looping on the fread return value (rather than feof) avoids processing the
// final, partially filled buffer twice.
size_t input_bytes = participant_1_num_input_frames * participant_1_num_channels * sizeof(short);
while (fread(input_buffer, 1, input_bytes, input_file) == input_bytes) {

  // Feed one buffer of audio from the input file into the room
  imm_input_audio_short(imm, room_id, ID_1, input_buffer, participant_1_num_input_frames);

  // Now that all the audio is entered into the room for this cycle, generate the output
  imm_output_audio_short(imm, room_id, ID_2, output_buffer);
  fwrite(output_buffer, 1, OUTPUT_NUM_FRAMES * OUTPUT_NUM_CHANNELS * sizeof(short), output_file);
}

Cleanup

Finally, let’s clean up: remove the participants, destroy the room and the library, close the input and output files, and free the buffers.

// Remove all participants from the room
imm_remove_participant(imm, room_id, ID_1);
imm_remove_participant(imm, room_id, ID_2);

// Destroy the room
imm_destroy_room(imm, room_id);

// Close and free the library
imm_destroy_library(imm);

// Close input and output files and free input / output buffers
fclose(input_file);
fclose(output_file);
free(input_buffer);
free(output_buffer);

We should now be able to run the application we’ve written. Just make sure that your input.wav file is included with the project. After running the application you should have a 3D-spatialized output.wav file!
