Listeners and Sound Sources

Overview

Device Settings

Devices

To achieve optimal audio quality, you will need to specify the physical audio equipment setup for each listener.

The Immersitech Engage SDK currently supports 2 categories of listening devices: headphones and speakers. The device type is set to headphones by default.

Please see the following documentation to change the device type:

Half-Span Angle

When the device type for a listener is set to speakers you will also need to set the half-span angle property.

Half-span angle refers to the relative angle between the listener and the physical speakers they are listening to. The angle can range from 1 to 90 degrees (integer values). For quick reference, the following devices roughly correspond to the half-span angles listed below:

Speaker Type                  Half-Span Angle (degrees)
Portable speakers             7
Laptop speakers               15
Large television sound bars   25
Studio speakers               30 – 45

To find the half-span angle for your setup, you can measure the distance from your head to your speakers and the distance between your speakers, then calculate the angle from those two measurements:

Please see the following documentation to change the half-span angle:

3D Spatial Audio

Position

The Immersitech Engage SDK allows you to move participants around in a 3D virtual space and process the sound as 3D spatial audio. It is important to know what the x, y, and z coordinates in this space refer to.

The center point of a room is located at the origin, (x, y, z) = (0, 0, 0).

Moving along the x axis will move a participant left or right.

Moving along the y axis will move a participant up or down.

Moving along the z axis will move a participant forward or backward.

Also note that the x, y, z coordinates are specified in centimeters. Therefore, a participant at (15, -10, 50) is 15 centimeters to the right of the center, 10 centimeters below it, and 50 centimeters in front of it.

Heading

To perform an accurate 3D rendering you’ll need to provide the direction a participant is facing. We refer to this measurement as the heading. The heading is composed of an azimuth angle and an elevation angle. The azimuth angle is measured in the x-z plane; the elevation angle is measured in the y-z plane.

Please see the following documentation to set position and heading:

3D Mixing

The Immersitech library also has more advanced controls if you would like to fine tune the 3D rendering experience. Note that the following parameters only apply if 3D mixing is enabled.

Attenuation

As a source moves further from a listener, its volume will decrease. This parameter allows you to control precisely the drop in decibels that a source will undergo for each additional meter it moves away from the listener.

Max Distance

This parameter allows you to specify the distance beyond which no further attenuation occurs. For example, you can prevent participants from becoming inaudible if they move very far away from the listener.

Reverb

By default, reverb is applied to more accurately localize the source in the room. However, you can use this setting to disable reverb if needed. Note that the source will still attenuate even without the reverb enabled.

Please see the following documentation to set these controls:

Understanding Audio Buffer Sizes

When input and output audio have different sampling rates, it can be confusing to determine the required buffer sizes. The goal of this section is to explain and provide examples to clear up this confusion.

When you initialize the Immersitech Library, the library configuration includes the sampling rate and number of frames as parameters. These values dictate the format of the audio on the OUTPUT side.

When you initialize any participant using some variant of the add participant function, the participant configuration includes the sampling rate and number of channels as parameters. These values dictate the format of the audio on the INPUT side.

You will submit audio into the input function with the input format you specified and the Immersitech Library will convert the output audio to the output format specified.

The table below works through some examples for 10 millisecond buffers:

Who                   Sampling Rate   Number of Frames   Number of Channels   Number of Samples
Library Output        48 kHz          480                2                    960
Library Output        32 kHz          320                1                    320
Library Output        24 kHz          240                2                    480
Library Output        16 kHz          160                2                    320
Library Output        8 kHz           80                 1                    80
Participant 1 Input   48 kHz          480                2                    960
Participant 2 Input   48 kHz          480                1                    480
Participant 3 Input   16 kHz          160                2                    320
Participant 4 Input   16 kHz          160                1                    160
Participant 5 Input   8 kHz           80                 2                    160
Participant 6 Input   8 kHz           80                 1                    80

It is also important to establish here that the Immersitech library currently supports output buffer sizes of either 10 or 20 milliseconds worth of data. For an output sampling rate of 48 kHz, this is either 480 or 960 frames, while at an output sampling rate of 8 kHz it is 80 or 160 frames. We also support buffer sizes of 512 or 1024 frames at a 48 kHz output sampling rate, for audio systems that only use power-of-two buffers.

Glossary

Audio Parameter Notation

In the Immersitech Libraries, we use the notation where one sample is a single value, one frame contains one sample per channel, and one buffer contains one sample period worth of frames. To learn more about this notation, visit this web resource.

Room

A room is a space where participants can hear and speak to all other participants in the room.
Each room can have unique attributes that can change the audio experience for the participants in that room.
A room will have an associated list of seats that participants can occupy, changing their 3D perspective with respect to the other participants.

Center Point

A room will also have a center point, towards which all participants will turn to face automatically if they are placed into a seat.

Seats

A seat is a pre-defined (x,y,z) position that a single participant can occupy.
The seat will also have an automatically generated heading that turns the participant in that seat towards the center point of the room.

Stacking

If you want to move a participant to a seat that is already occupied, the behavior will be defined by the allowSeatStacking property of the room.
If seat stacking is allowed, then both participants will occupy the same position in (x,y,z) space, but have different seat IDs.
If seat stacking is not allowed, then the second participant will occupy the same position shifted along the z-axis by the stackDelta property.

Open Room

If you would like to have complete manual control for the participant’s position and heading, use a room that is an Open Room.
In an Open Room, participants may be moved anywhere in the (x,y,z) coordinate system and face any direction.
If a participant is moved manually, they are no longer considered to be in a seat.

Implementation Models

Client Side Implementation

  • The SDK is deployed as part of your compiled application.
  • Noise Cancellation and Voice Clarity can be deployed quickly without external dependency.
  • 3D Spatial Audio can be enabled using real-time coordinate information to render sounds in specific locations.

Hybrid Implementation

  • The SDK installed on the client-side handles Noise Cancellation and Voice Clarity.
  • A central server-based solution handles 3D Spatial Audio.
  • The processing power is split between the server and the application, ensuring less strain on your participants’ machines.

Server/Cloud Implementation

  • The SDK is connected to the server application via a platform-specific adapter.
  • Engage Core will receive audio from all connected clients and send processed streams back through the server application.
  • With coordinate information, 3D Spatial Audio can render sounds in real-time in specific locations.

Quick Start

Overview

The Immersitech SDK is a C/C++ library that functions as an audio mixer and audio processor featuring 3D spatial audio processing, noise cancellation, and speech enhancement.
The Immersitech SDK is currently made for developers who have direct access to raw audio data. If you have such access, the SDK can collect your raw audio input buffers and return a raw audio output buffer that has been processed. Additionally, the Immersitech SDK allows you to change the audio settings in real time for any participant.

Installation

Prerequisites

Obtain a License Key

If you haven’t done so already, please reach out to us about starting a trial of the Immersitech Engage™ Core SDK now. You will need to obtain a license key before you can use the audio processing features that the SDK offers.

Download the binaries

You’ll need to download the pre-compiled binaries from the Releases section of the immersitech-engage GitHub repo. Please select the tar file that is specific to the platform where you will be developing with the Immersitech SDK.

Windows

The tar file includes libimmersitech.dll and libimmersitech.lib. You will need to link these files to your project. If you need additional help with Windows development please see the Visual Studio projects in our code examples.

Mac

The tar file includes libimmersitech.dylib. You will need to link this file to your project. If you need additional help with Mac development please see the Makefiles in our code examples.

Linux

The tar file includes libimmersitech.so. You will need to link this file to your project. If you need additional help with Linux development please see the Makefiles in our code examples.

Make sure that your C/C++ runtime libraries are at least the following versions:

Ubuntu:

GLIBC_2.27

Debian:

GLIBC_2.29

If you plan on using the websocket server feature of the library, you will need to install and link the following libraries to your program:

-lcrypto -lssl

Generic Instructions

To use the Immersitech Library, include immersitech.h in your project and call its functions from your code. You will also need to link the dynamic library to your project and ensure it is present at the location you linked it from. When you initialize the Immersitech library you will need to provide the correct path to your license file.

The following files are also included in the tar file and are optional for more advanced feature usage:

– immersitech_logger.h (optional, if you want to implement a custom logger)
– immersitech_event_manager.h (optional, if you want to implement a custom event manager)
– room_layout.json (optional, if you want to use custom room layouts)
– websocket_config.json (optional, if you want to use the websocket server)

The Basics

Following the quick start is the best way to dive right into the code, but you may have some questions about the main concepts of Immersitech Engage. That’s why we created The Basics to help you get up to speed with the terminology that is used.

Reference Materials

Noise Cancellation Tutorial

What you’ll learn

In this tutorial you’ll learn how to set up the Immersitech SDK. You’ll create a room, add participants to that room, and apply the noise cancellation effect. You’ll also learn how to simulate the audio for participants to speed up development.

If you’d rather just see the finished application you can find it in our GitHub examples repository.

Boilerplate

Let’s start by setting up some headers and constants for our project. We’ll add the standard C libraries, include the Immersitech header file, and define some constants that we can use for initializing the Immersitech library.

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <windows.h>

#include "immersitech.h"

#define OUTPUT_SAMPLING_RATE 48000
#define OUTPUT_NUM_FRAMES 480
#define OUTPUT_NUM_CHANNELS 1
#define INTERLEAVED true
#define SPATIAL_QUALITY 1

Initialize the library

Next, let’s create our main function, enable logging, and initialize the Immersitech library.

int main(int argc, char **argv) {

  imm_enable_logging(true);

  imm_error_code error_code;
  imm_library_configuration output_config = {
    OUTPUT_SAMPLING_RATE,
    OUTPUT_NUM_FRAMES,
    OUTPUT_NUM_CHANNELS,
    INTERLEAVED,
    SPATIAL_QUALITY
  };

  imm_handle imm = imm_initialize_library("Immersitech_Engineering_sound_manager_license_key.dat", NULL, NULL, output_config, &error_code);

  printf("\nUsing Immersitech version: %s", imm_get_version());
  printf("\nLicense Key Info: %s", imm_get_license_info(imm));

}

Go ahead and build the project to make sure that the library accepts your license key. You should see something like the following in your console output:

Using Immersitech version: v1.0.9

License Key Info: { "valid": true, "name": "Immersitech_Engineering_sound_manager_license_key.dat", "department": "Engineering", "minimum_version": "v0.8.0", "maximum_version": "v1.9.999", "creation_date": "3/9/2022", "expiration_date": "8/21/2022" }

Assuming that your license key is valid, we can move forward with using the library.

Simulating participant audio

The main use case of the Immersitech SDK is to apply audio effects and mixing to real-time conferencing. During development it may not be practical for you to gather your coworkers together each time you need to test changes to your code. That’s why we’ve provided a tutorial that uses audio files to simulate participants. The input file will simulate the user who is speaking and the output file will simulate what is being heard in the room.

Setting up the input and output buffers

Let’s start by creating a struct to store the header information for the .wav file. We will need to read in the header for the input file so that we can calculate the size of the buffers and determine the format for the output file. Then we’ll write the WAV header to the output file and create our input and output buffers.

If you need some audio files to test with you can download them from our examples repository.

typedef struct wav_header {
  unsigned char chunk_id[4];
  unsigned int chunk_size;
  unsigned char format[4];
  unsigned char subchunk1_id[4];
  unsigned int subchunk1_size;
  unsigned short audio_format;
  unsigned short num_channels;
  unsigned int sample_rate;
  unsigned int byte_rate;
  unsigned short block_align;
  unsigned short bits_per_sample;
  unsigned char subchunk2_id[4];
  unsigned int subchunk2_size;
} wav_header;

// Input files will simulate input audio from each individual participant
FILE* input_file = fopen("input.wav", "rb");

// Output files let you review what each participant hears
FILE* output_file = fopen("output.wav", "wb");

// Read in data about input file
struct wav_header wav_meta_data;

fread(&wav_meta_data, 1, sizeof(wav_header), input_file);

int input_rate = wav_meta_data.sample_rate;
int input_channels = wav_meta_data.num_channels;

printf("Input Configuration: %s || %i Hz || %i Channel(s)\n\n", "input.wav", input_rate, input_channels);

// Write WAV Header data to the output file
if (wav_meta_data.sample_rate != OUTPUT_SAMPLING_RATE) {
  // Note: the integer division assumes the output rate is a whole multiple of the input rate
  wav_meta_data.subchunk2_size *= (OUTPUT_SAMPLING_RATE / wav_meta_data.sample_rate);
  wav_meta_data.sample_rate = OUTPUT_SAMPLING_RATE;
  wav_meta_data.chunk_size = 36 + wav_meta_data.subchunk2_size;
}

fwrite(&wav_meta_data, 1, sizeof(wav_header), output_file);

// Calculate the number of frames we must receive from each input
int input_num_frames = (OUTPUT_NUM_FRAMES * input_rate) / OUTPUT_SAMPLING_RATE;

// Initialize buffers to read input files and write to output files
short* input_buffer = (short*)malloc(input_num_frames * input_channels * sizeof(short));
short* output_buffer = (short*)malloc(OUTPUT_NUM_FRAMES * OUTPUT_NUM_CHANNELS * sizeof(short));

Great! Now we’re set up to read from an audio file and write to an audio output file. One final step is to set up the library for processing our audio.

Creating a room

Let’s create a room and add two participants into the room. Then, we’ll enable noise cancellation for each of the participants.

// Create and initialize a room to put participants into
int room_id = 0;
imm_create_room(imm, room_id);

// Add Participants into our room
int ID_1 = 1;
int ID_2 = 2;

imm_participant_configuration input_config = { input_rate, input_channels, IMM_PARTICIPANT_REGULAR };

imm_add_participant(imm, room_id, ID_1, "participant_1", input_config);
imm_add_participant(imm, room_id, ID_2, "participant_2", input_config);

// Turn on the noise cancellation for all the participants
imm_set_all_participants_state(imm, room_id, IMM_CONTROL_ANC_ENABLE, 1);

Processing the Audio

At this point we have everything that we need set up for audio processing. We just need to actually process the audio files to simulate a real-time audio stream.

// Loop through the file and buffer it as you would see in a real-time application
// Stop when a full buffer can no longer be read (end of file)
size_t input_bytes = input_num_frames * input_channels * sizeof(short);
while (fread(input_buffer, 1, input_bytes, input_file) == input_bytes) {

  // Input the buffer into its respective Immersitech Participant within the room
  imm_input_audio_short(imm, room_id, ID_1, input_buffer, input_num_frames);

  // Now that all the audio is entered into the room for this cycle, we are ready to generate the outputs

  // Generate the output for each participant and save it to their output file
  imm_output_audio_short(imm, room_id, ID_2, output_buffer);
  fwrite(output_buffer, 1, OUTPUT_NUM_FRAMES * OUTPUT_NUM_CHANNELS * sizeof(short), output_file);

}

Cleanup

Finally, let’s free up memory by calling the destructors for the library, closing our input and output files, and freeing our buffers.

// Remove all participants from the room
imm_remove_participant( imm, room_id, ID_1 );
imm_remove_participant( imm, room_id, ID_2 );

// Destroy the room
imm_destroy_room(imm, room_id);

// Close and free the library
imm_destroy_library(imm);

// Close input and output files and free input / output buffers
fclose(input_file);
fclose(output_file);
free(input_buffer);
free(output_buffer);

We should now be able to run the application that we’ve written. Just make sure that your input.wav file is included with the project. After running the application you should have a clean, noise-reduced output.wav file!

3D Spatial Audio Tutorial

What you’ll learn

In this tutorial you’ll learn how to set up the Immersitech SDK. You’ll create a room, add participants to that room, and apply 3D mixing. You’ll also learn how to simulate the audio for participants to speed up development.

If you’d rather just see the finished application you can find it in our GitHub examples repository.

Boilerplate

Let’s set up some headers and constants for our project. We’ll add the standard C libraries, include the Immersitech header file, and define some constants that specify our desired output from the library.

The number of channels MUST be set to stereo (2) to hear 3D Audio. Mono output will not allow for 3D listening experiences.

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

#include "immersitech.h"

#define OUTPUT_SAMPLING_RATE 48000
#define OUTPUT_NUM_FRAMES 960
#define OUTPUT_NUM_CHANNELS 2
#define INTERLEAVED true
#define SPATIAL_QUALITY 5

Initialize the library

Next, let’s create our main function, enable logging, and initialize the Immersitech library.

int main(int argc, char **argv) {

  imm_enable_logging(true);

  imm_error_code error_code;
  imm_library_configuration output_config = {
    OUTPUT_SAMPLING_RATE,
    OUTPUT_NUM_FRAMES,
    OUTPUT_NUM_CHANNELS,
    INTERLEAVED,
    SPATIAL_QUALITY
  };

  imm_handle imm = imm_initialize_library("Immersitech_Engineering_sound_manager_license_key.dat", NULL, NULL, output_config, &error_code);

  printf("\nUsing Immersitech version: %s", imm_get_version());
  printf("\nLicense Key Info: %s", imm_get_license_info(imm));

}

Go ahead and build the project to make sure that the library accepts your license key. You should see something like the following in your console output:

  Using Immersitech version: v1.0.9
  License Key Info: { "valid": true, "name": "Immersitech_Engineering_sound_manager_license_key.dat", "department": "Engineering", "minimum_version": "v0.8.0", "maximum_version": "v1.9.999", "creation_date": "3/9/2022", "expiration_date": "8/21/2022" }

Assuming that your license key is valid we can move forward with using the library.

Simulating participant audio

The main use case of the Immersitech SDK is to apply audio effects and mixing to real-time conferencing. During development it may not be practical for you to gather your coworkers together each time you need to test changes to your code. That’s why we’ve provided a tutorial that uses audio files to simulate participants. The input file will simulate the user who is speaking and the output file will simulate what is being heard in the room.

Setting up files and buffers for processing the audio

Let’s start by creating a struct to store the header information for the .wav file and a pointer that we can use for reading the file.

typedef struct header {
  unsigned char chunk_id[4];
  unsigned int chunk_size;
  unsigned char format[4];
  unsigned char subchunk1_id[4];
  unsigned int subchunk1_size;
  unsigned short audio_format;
  unsigned short num_channels;
  unsigned int sample_rate;
  unsigned int byte_rate;
  unsigned short block_align;
  unsigned short bits_per_sample;
  unsigned char subchunk2_id[4];
  unsigned int subchunk2_size;
} header;

typedef struct header* header_p;

We will need to read in the header for the input file so that we can calculate the size of the buffers and determine the format for the output file. Then we’ll write the WAV header to the output file.

For the purposes of this tutorial we’ll just process the input for one participant. If you’d like to simulate audio input for all of the participants in the room then you’ll need to repeat this step for each participant.

// Input files will simulate input audio from each individual participant
FILE* input_file = fopen("input.wav", "rb");

// Output files let you review what each participant hears
FILE* output_file = fopen("output.wav", "wb");

header_p meta = (header_p)malloc(sizeof(header));

fread(meta, 1, sizeof(header), input_file);
int participant_1_sampling_rate = meta->sample_rate;
int participant_1_num_channels = meta->num_channels;
int participant_1_num_input_frames = (OUTPUT_NUM_FRAMES * participant_1_sampling_rate) / OUTPUT_SAMPLING_RATE;

// Update the header for the output format; doubling subchunk2_size accounts
// for the mono-to-stereo change (this assumes the input is already at 48 kHz)
meta->subchunk2_size *= 2;
meta->chunk_size = 36 + meta->subchunk2_size;
meta->num_channels = OUTPUT_NUM_CHANNELS;
meta->sample_rate = OUTPUT_SAMPLING_RATE;
meta->block_align = OUTPUT_NUM_CHANNELS * (meta->bits_per_sample / 8);
meta->byte_rate = OUTPUT_SAMPLING_RATE * meta->block_align;
fwrite(meta, 1, sizeof(header), output_file);

Next, we’ll set up our input and output buffers for processing the input audio from the first participant.

// Initialize buffers to read input files and write to output files
short* input_buffer = (short*)malloc(participant_1_num_input_frames * participant_1_num_channels * sizeof(short));
short* output_buffer = (short*)malloc(OUTPUT_NUM_FRAMES * OUTPUT_NUM_CHANNELS * sizeof(short));

Great! Now we’re set up to read from an audio file and write to an audio output file. One final step is to set up the library for processing our audio.

Creating a room

Let’s create a room and add two participants into the room. The first participant will be the speaker in this scenario and the second participant will be hearing the first participant.

// Create and initialize a room to put participants into
int room_id = 0;
imm_create_room(imm, room_id);

// Add Participants into our room
int ID_1 = 1;
int ID_2 = 2;
imm_participant_configuration input_config = { participant_1_sampling_rate, participant_1_num_channels, IMM_PARTICIPANT_REGULAR };
imm_add_participant(imm, room_id, ID_1, "participant_1", input_config);
imm_add_participant(imm, room_id, ID_2, "participant_2", input_config);

Next, we’ll assign a position and heading to the first participant in the room, and enable 3D mixing for all participants.


imm_position position = { -60, 0, 10 };
imm_heading heading = { 0, 0 };

imm_set_participant_position(imm, room_id, ID_1, position, heading);

imm_set_all_participants_state(imm, room_id, IMM_CONTROL_MIXING_3D_ENABLE, 1);
imm_set_all_participants_state(imm, room_id, IMM_CONTROL_DEVICE, IMM_DEVICE_HEADPHONE);

Processing the Audio

At this point we have everything that we need set up for audio processing. We just need to actually process the audio files to simulate a real-time audio application. The code below is an example of reading and writing audio to a single output file for a single participant. To process audio for all participants you’ll need to adjust this code to write to separate output files for each participant.

// Loop through the file and buffer it as you would see in a real-time application
// Stop when a full buffer can no longer be read (end of file)
size_t input_bytes = participant_1_num_input_frames * participant_1_num_channels * sizeof(short);
while (fread(input_buffer, 1, input_bytes, input_file) == input_bytes) {

  // Input the buffer into the speaking participant
  imm_input_audio_short(imm, room_id, ID_1, input_buffer, participant_1_num_input_frames);

  // Now that all the audio is entered into the room for this cycle, we are ready to generate the outputs
  imm_output_audio_short(imm, room_id, ID_2, output_buffer);
  fwrite(output_buffer, 1, OUTPUT_NUM_FRAMES * OUTPUT_NUM_CHANNELS * sizeof(short), output_file);
}

Cleanup

Finally, let’s free up memory by calling the destructors for the library, closing our input and output files, and freeing our buffers.

// Remove all participants from the room
imm_remove_participant(imm, room_id, ID_1);
imm_remove_participant(imm, room_id, ID_2);

// Destroy the room
imm_destroy_room(imm, room_id);

// Close and free the library
imm_destroy_library(imm);

// Close input and output files and free input / output buffers
fclose(input_file);
fclose(output_file);
free(input_buffer);
free(output_buffer);

We should now be able to run the application that we’ve written. Just make sure that your input.wav file is included with the project. After running the application you should have a 3D spatialized output.wav file!

Noise Cancellation Tutorial

What you’ll learn

In this tutorial you’ll learn how to setup the Immersitech SDK. You’ll create a room, add participants to that room, and apply the noise cancellation effect. You’ll also learn how to simulate the audio for participants to speed up development.

If you’d rather just see the finished application you can find it in our Github examples repository.

Boilerplate

Let’s start by setting up some headers and constants for our project. We’ll add the standard C libraries, include the Immersitech header file, and define some constants that we can use for initializing the Immersitech library.

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <windows.h>

#include "immersitech.h"

#define OUTPUT_SAMPLING_RATE 48000
#define OUTPUT_NUM_FRAMES 480
#define OUTPUT_NUM_CHANNELS 1
#define INTERLEAVED true
#define SPATIAL_QUALITY 1

Initialize the library

Next, let’s create our main function, enable logging, and initialize the Immersitech library.

int main(int argc, char **argv) {

  imm_enable_logging(true);

  imm_error_code error_cod
  imm_library_configuration output_config = {
    OUTPUT_SAMPLING_RATE,
    OUTPUT_NUM_FRAM
    OUTPUT_NUM_CHANNELS,
    INTERLEAVED,
    SPATIAL_QUALITY
  };

  imm_handle imm = imm_initialize_library("Immersitech_Engineering_sound_manager_license_key.dat", NULL, NULL, output_config, &error_code);

  printf("\nUsing Immersitech version: %s", imm_get_version());
  printf("\nLicense Key Info: %s", imm_get_license_info(imm));

}

Go ahead and build the project to make sure that the library accepts your license key. You should see something like the following in your console output:

Using Immersitech version: v1.0.9

License Key Info: { "valid": true, "name": "Immersitech_Engineering_sound_manager_license_key.dat", "department": "Engineering", "minimum_version": "v0.8.0", "maximum_version": "v1.9.999", "creation_date": "3/9/2022", "expiration_date": "8/21/2022" }
Assuming that your license key is valid we can move forward with using the library.

Simulating participant audio

The main use case of the Immersitech SDK is to apply audio effects and mixing to real-time conferencing. During development it may not be practical for you to gather your coworkers together each time you need to test changes to your code. That’s why we’ve provided a tutorial that uses audio files to simulate participants. The input file will simulate the user who is speaking and the output file will simulate what is being heard in the room.

Setting up the input and output buffers

Let’s start by creating a struct to store the header information for the .wav file. We will need to read in the header for the input file so that we can calculate the size of the buffers and determine the format for the output file. Then we’ll write the WAV header to the output file and create our input and output buffers.

If you need some audio files to test with you can download them from our examples repository.

typedef struct wav_header {
  unsigned char chunk_id[4];
  unsigned int chunk_size;
  unsigned char format[4];
  unsigned char subchunk1_id[4];
  unsigned int subchunk1_size;
  unsigned short audio_format;
  unsigned short num_channels;
  unsigned int sample_rate;
  unsigned int byte_rate;
  unsigned short block_align;
  unsigned short bits_per_sample;
  unsigned char subchunk2_id[4];
  unsigned int subchunk2_size;
} wav_header;

// Input files will simulate input audio from each individual participant
FILE* input_file = fopen("input.wav", "rb");

// Output files let you review what each participant hears
FILE* output_file = fopen("output.wav", "wb");

// Read in data about input file
struct wav_header wav_meta_data;

fread(&wav_meta_data, 1, sizeof(wav_header), input_file);

int input_rate = wav_meta_data.sample_rate;
int input_channels = wav_meta_data.num_channels;

printf("Input Configuration: input.wav || %i Hz || %i Channel(s)\n\n", input_rate, input_channels);

// Update the WAV header to describe the output format, then write it out.
// Note: the size scaling below uses integer division, so it assumes the
// output sampling rate is an integer multiple of the input sampling rate.
if (wav_meta_data.sample_rate != OUTPUT_SAMPLING_RATE) {
  wav_meta_data.subchunk2_size *= (OUTPUT_SAMPLING_RATE / wav_meta_data.sample_rate);
  wav_meta_data.sample_rate = OUTPUT_SAMPLING_RATE;
  wav_meta_data.byte_rate = OUTPUT_SAMPLING_RATE * wav_meta_data.num_channels * sizeof(short);
  wav_meta_data.block_align = wav_meta_data.num_channels * sizeof(short);
  wav_meta_data.chunk_size = 36 + wav_meta_data.subchunk2_size;
}

fwrite(&wav_meta_data, 1, sizeof(wav_header), output_file);

// Calculate the number of frames we must receive from each input
int input_num_frames = (OUTPUT_NUM_FRAMES * input_rate) / OUTPUT_SAMPLING_RATE;

// Initialize buffers to read input files and write to output files
short* input_buffer = (short*)malloc(input_num_frames * input_channels * sizeof(short));
short* output_buffer = (short*)malloc(OUTPUT_NUM_FRAMES * OUTPUT_NUM_CHANNELS * sizeof(short));

Great! Now we’re set up to read from an audio file and write to an audio output file. One final step is to set up the library to process our audio.

Creating a room

Let’s create a room and add two participants into the room. Then, we’ll enable noise cancellation for each of the participants.

// Create and initialize a room to put participants into
int room_id = 0;
imm_create_room(imm, room_id);

// Add Participants into our room
int ID_1 = 1;
int ID_2 = 2;

imm_participant_configuration input_config = { input_rate, input_channels, IMM_PARTICIPANT_REGULAR };

imm_add_participant(imm, room_id, ID_1, "participant_1", input_config);
imm_add_participant(imm, room_id, ID_2, "participant_2", input_config);

// Turn on the noise cancellation for all the participants
imm_set_all_participants_state(imm, room_id, IMM_CONTROL_ANC_ENABLE, 1);

Processing the Audio

At this point we have everything that we need set up for audio processing. We just need to actually process the audio files to simulate a real-time audio stream.

// Loop through the file, buffering it as you would in a real-time application.
// Checking fread's return value (rather than feof) avoids feeding a stale,
// partially filled buffer into the room after the final read.
size_t input_bytes = input_num_frames * input_channels * sizeof(short);
while ( fread(input_buffer, 1, input_bytes, input_file) == input_bytes ) {

  // Input the buffer into its respective Immersitech Participant within the room
  imm_input_audio_short(imm, room_id, ID_1, input_buffer, input_num_frames);

  // Now that all the audio is entered into the room for this cycle, we are ready to generate the outputs

  // Generate the output for each participant and save it to their output file
  imm_output_audio_short(imm, room_id, ID_2, output_buffer);
  fwrite(output_buffer, 1, OUTPUT_NUM_FRAMES * OUTPUT_NUM_CHANNELS * sizeof(short), output_file);

}

Cleanup

Finally, let’s free up memory by calling the destructors for the library, closing our input and output files, and freeing our buffers.

// Remove all participants from the room
imm_remove_participant( imm, room_id, ID_1 );
imm_remove_participant( imm, room_id, ID_2 );

// Destroy the room
imm_destroy_room(imm, room_id);

// Close and free the library
imm_destroy_library(imm);

// Close input and output files and free input / output buffers
fclose(input_file);
fclose(output_file);
free(input_buffer);
free(output_buffer);

We should now be able to run the application that we’ve written. Just make sure that your input.wav file is included with the project. After running the application you should have a clean, noise-reduced output.wav file!

Quick Start

Overview

The Immersitech SDK is a C/C++ library that functions as an audio mixer and audio processor featuring 3D spatial audio processing, noise cancellation, and speech enhancement.
The Immersitech SDK is currently made for developers who have direct access to raw audio data. If you do, the SDK can accept your raw audio buffers and return a processed raw audio output buffer. Additionally, the Immersitech SDK allows you to change the audio settings in real time for any participant.

Installation

Prerequisites

Obtain a License Key

If you haven’t done so already, please reach out to us about starting a trial of the Immersitech Engage™ Core SDK. You will need to obtain a license key before you can use the audio processing features that the SDK offers.

Download the binaries

You’ll need to download the pre-compiled binaries from the Releases section of the immersitech-engage Github repo. Please select the tar file that is specific to the platform where you will be developing with the Immersitech SDK.

Windows

The tar file includes libimmersitech.dll and libimmersitech.lib. You will need to link these files to your project. If you need additional help with Windows development please see the Visual Studio projects in our code examples.

Mac

The tar file includes libimmersitech.dylib. You will need to link this file to your project. If you need additional help with Mac development please see the Makefiles in our code examples.

Linux

The tar file includes libimmersitech.so. You will need to link this file to your project. If you need additional help with Linux development please see the Makefiles in our code examples.

Make sure that your C/C++ standard libraries are at least the following versions:

Ubuntu:

GLIBC_2.27

Debian:

GLIBC_2.29

If you plan on using the websocket server feature of the library, you will need to install and link the following libraries to your program:

-lcrypto -lssl

Generic Instructions

To use the Immersitech library, include immersitech.h in your project and call its functions from your code. You will also need to link the dynamic library to your project and make sure the library file is present at the location you linked against. When you initialize the Immersitech library you will need to provide the correct path to your license file.

The following files are also included in the tar file and are optional for more advanced feature usage:

– immersitech_logger.h (optional if you want to implement a custom logger)
– immersitech_event_manager.h (optional if you want to implement a custom event manager)
– room_layout.json (optional if you want to use custom room layouts)
– websocket_config.json (optional if you want to use the websocket server)

The Basics

Following the quick start is the best way to dive right into the code, but you may still have questions about the main concepts of Immersitech Engage. That’s why we created The Basics to help you get up to speed with the terminology that is used.

Reference Materials

SDK Documentation

Click the button below to explore the SDK Documentation referenced and linked throughout the Developer Pages.

Understanding Audio Buffer Sizes

When input and output audio have different sampling rates, it can be confusing to work out the required buffer sizes. This section explains the calculation and provides examples.

When you initialize the Immersitech library, the library configuration includes the sampling rate and the number of frames. These values dictate the format of the audio on the OUTPUT side.

When you initialize a participant using any variant of the add-participant function, the participant configuration includes the sampling rate and the number of channels. These values dictate the format of the audio on the INPUT side.

You will submit audio into the input function with the input format you specified and the Immersitech Library will convert the output audio to the output format specified.

The table below works through some examples for 10 millisecond buffers:

Who                 | Sampling Rate | Number of Frames | Number of Channels | Number of Samples
--------------------|---------------|------------------|--------------------|------------------
Library Output      | 48 kHz        | 480              | 2                  | 960
Library Output      | 32 kHz        | 320              | 1                  | 320
Library Output      | 24 kHz        | 240              | 2                  | 480
Library Output      | 16 kHz        | 160              | 2                  | 320
Library Output      | 8 kHz         | 80               | 1                  | 80
Participant 1 Input | 48 kHz        | 480              | 2                  | 960
Participant 2 Input | 48 kHz        | 480              | 1                  | 480
Participant 3 Input | 16 kHz        | 160              | 2                  | 320
Participant 4 Input | 16 kHz        | 160              | 1                  | 160
Participant 5 Input | 8 kHz         | 80               | 2                  | 160
Participant 6 Input | 8 kHz         | 80               | 1                  | 80

Note also that the Immersitech library currently supports output buffer sizes of either 10 or 20 milliseconds’ worth of data. For an output sampling rate of 48 kHz this is 480 or 960 frames, while at an output sampling rate of 8 kHz it is 80 or 160 frames. We also support buffer sizes of 512 or 1024 frames when your output sampling rate is 48 kHz, for audio systems that only use power-of-two buffers.

3D Spatial Audio Tutorial

What you’ll learn

In this tutorial you’ll learn how to set up the Immersitech SDK. You’ll create a room, add participants to that room, and apply 3D mixing. You’ll also learn how to simulate participant audio to speed up development.

If you’d rather just see the finished application you can find it in our Github examples repository.

Boilerplate

Let’s set up some headers and constants for our project. We’ll add the standard C libraries, include the Immersitech header file, and define some constants that describe our desired output from the library.

The number of channels MUST be set to stereo (2) to hear 3D Audio. Mono output will not allow for 3D listening experiences.

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

#include "immersitech.h"

#define OUTPUT_SAMPLING_RATE 48000
#define OUTPUT_NUM_FRAMES 960
#define OUTPUT_NUM_CHANNELS 2
#define INTERLEAVED true
#define SPATIAL_QUALITY 5

Initialize the library

Next, let’s create our main function, enable logging, and initialize the Immersitech library.

int main(int argc, char **argv) {

  imm_enable_logging(true);

  imm_error_code error_code;
  imm_library_configuration output_config = {
    OUTPUT_SAMPLING_RATE,
    OUTPUT_NUM_FRAMES,
    OUTPUT_NUM_CHANNELS,
    INTERLEAVED,
    SPATIAL_QUALITY
  };

  imm_handle imm = imm_initialize_library("Immersitech_Engineering_sound_manager_license_key.dat", NULL, NULL, output_config, &error_code);

  printf("\nUsing Immersitech version: %s", imm_get_version());
  printf("\nLicense Key Info: %s", imm_get_license_info(imm));

}

Go ahead and build the project to make sure that the library accepts your license key. You should see something like the following in your console output:

  Using Immersitech version: v1.0.9
  License Key Info: { "valid": true, "name": "Immersitech_Engineering_sound_manager_license_key.dat", "department": "Engineering", "minimum_version": "v0.8.0", "maximum_version": "v1.9.999", "creation_date": "3/9/2022", "expiration_date": "8/21/2022" }

Assuming that your license key is valid we can move forward with using the library.

Simulating participant audio

The main use case of the Immersitech SDK is to apply audio effects and mixing to real-time conferencing. During development it may not be practical for you to gather your coworkers together each time you need to test changes to your code. That’s why we’ve provided a tutorial that uses audio files to simulate participants. The input file will simulate the user who is speaking and the output file will simulate what is being heard in the room.

Setting up files and buffers for processing the audio

Let’s start by creating a struct to store the header information for the .wav file and a pointer that we can use for reading the file.

typedef struct header {
  unsigned char chunk_id[4];
  unsigned int chunk_size;
  unsigned char format[4];
  unsigned char subchunk1_id[4];
  unsigned int subchunk1_size;
  unsigned short audio_format;
  unsigned short num_channels;
  unsigned int sample_rate;
  unsigned int byte_rate;
  unsigned short block_align;
  unsigned short bits_per_sample;
  unsigned char subchunk2_id[4];
  unsigned int subchunk2_size;
} header;

typedef struct header* header_p;

We will need to read in the header for the input file so that we can calculate the size of the buffers and determine the format for the output file. Then we’ll write the WAV header to the output file.

For the purposes of this tutorial we’ll just process the input for one participant. If you’d like to simulate audio input for all of the participants in the room then you’ll need to repeat this step for each participant.

// Input files will simulate input audio from each individual participant
FILE* input_file = fopen("input.wav", "rb");

// Output files let you review what each participant hears
FILE* output_file = fopen("output.wav", "wb");

header_p meta = (header_p)malloc(sizeof(header));

fread(meta, 1, sizeof(header), input_file);
int participant_1_sampling_rate = meta->sample_rate;
int participant_1_num_channels = meta->num_channels;
int participant_1_num_input_frames = (OUTPUT_NUM_FRAMES * participant_1_sampling_rate) / OUTPUT_SAMPLING_RATE;

// Update the header to describe the output format before writing it.
// Doubling subchunk2_size assumes the input is mono at the output sampling
// rate, so only the channel count changes the amount of audio data.
meta->subchunk2_size *= 2;
meta->chunk_size = 36 + meta->subchunk2_size;
meta->num_channels = OUTPUT_NUM_CHANNELS;
meta->sample_rate = OUTPUT_SAMPLING_RATE;
meta->block_align = OUTPUT_NUM_CHANNELS * sizeof(short);
meta->byte_rate = OUTPUT_SAMPLING_RATE * OUTPUT_NUM_CHANNELS * sizeof(short);
fwrite(meta, 1, sizeof(header), output_file);

Next, we’ll set up our input and output buffers for processing the input audio from the first participant.

// Initialize buffers to read input files and write to output files
short* input_buffer = (short*)malloc(OUTPUT_NUM_FRAMES * OUTPUT_NUM_CHANNELS * sizeof(short));
short* output_buffer = (short*)malloc(OUTPUT_NUM_FRAMES * OUTPUT_NUM_CHANNELS * sizeof(short));

Great! Now we’re set up to read from an audio file and write to an audio output file. One final step is to set up the library to process our audio.

Creating a room

Let’s create a room and add two participants into the room. The first participant will be the speaker in this scenario and the second participant will be hearing the first participant.

// Create and initialize a room to put participants into
int room_id = 0;
imm_create_room(imm, room_id);

// Add Participants into our room
int ID_1 = 1;
int ID_2 = 2;
imm_participant_configuration input_config = { participant_1_sampling_rate, participant_1_num_channels, IMM_PARTICIPANT_REGULAR };
imm_add_participant(imm, room_id, ID_1, "participant_1", input_config);
imm_add_participant(imm, room_id, ID_2, "participant_2", input_config);

Next, we’ll assign a position and heading to the first participant in the room, and enable 3D mixing for all participants.


imm_position position = { -60, 0, 10 };
imm_heading heading = { 0, 0 };

imm_set_participant_position(imm, room_id, ID_1, position, heading);

imm_set_all_participants_state(imm, room_id, IMM_CONTROL_MIXING_3D_ENABLE, 1);
imm_set_all_participants_state(imm, room_id, IMM_CONTROL_DEVICE, IMM_DEVICE_HEADPHONE);

Processing the Audio

At this point we have everything that we need set up for audio processing. We just need to actually process the audio files to simulate a real-time audio application. The code below is an example of reading and writing audio to a single output file for a single participant. To process audio for all participants you’ll need to adjust this code to write to separate output files for each participant.

// Loop through the file, buffering it as you would in a real-time application.
// Checking fread's return value (rather than feof) avoids feeding a stale,
// partially filled buffer into the room after the final read.
size_t input_bytes = participant_1_num_input_frames * participant_1_num_channels * sizeof(short);
while (fread(input_buffer, 1, input_bytes, input_file) == input_bytes) {

  // Input the buffer into its Immersitech Participant within the room
  imm_input_audio_short(imm, room_id, ID_1, input_buffer, participant_1_num_input_frames);

  // Now that all the audio is entered into the room for this cycle, we are ready to generate the outputs
  imm_output_audio_short(imm, room_id, ID_2, output_buffer);
  fwrite(output_buffer, 1, OUTPUT_NUM_FRAMES * OUTPUT_NUM_CHANNELS * sizeof(short), output_file);
}

Cleanup

Finally, let’s free up memory by calling the destructors for the library, closing our input and output files, and freeing the buffers for each participant.

// Remove all participants from the room
imm_remove_participant(imm, room_id, ID_1);
imm_remove_participant(imm, room_id, ID_2);

// Destroy the room
imm_destroy_room(imm, room_id);

// Close and free the library
imm_destroy_library(imm);

// Close input and output files and free input / output buffers
fclose(input_file);
fclose(output_file);
free(input_buffer);
free(output_buffer);

We should now be able to run the application that we’ve written. Just make sure that your input.wav file is included with the project. After running the application you should have a 3D spatialized output.wav file!

Glossary

Audio Parameter Notation

In the Immersitech libraries, we use the notation where one sample is a single value, one frame contains one sample per channel, and one buffer contains one sample period’s worth of frames. To learn more about this notation, visit this web resource.

Room

A room is a space where participants can hear and speak to all other participants in the room.
Each room can have unique attributes that can change the audio experience for the participants in that room.
A room will have an associated list of seats that participants can occupy, changing their 3D perspective with respect to the other participants.

Center Point

A room will also have a center point, towards which all participants will turn to face automatically if they are placed into a seat.

Seats

A seat is a pre-defined (x,y,z) position that a single participant can occupy.
The seat will also have an automatically generated heading that turns the participant in that seat towards the center point of the room.

Stacking

If you want to move a participant to a seat that is already occupied, the behavior will be defined by the allowSeatStacking property of the room.
If seat stacking is allowed, then both participants will occupy the same position in (x,y,z) space, but have different seat IDs.
If seat stacking is not allowed, then the second participant will occupy that position shifted along the z axis by the stackDelta property.

Open Room

If you would like to have complete manual control for the participant’s position and heading, use a room that is an Open Room.
In an Open Room, participants may be moved anywhere in the (x,y,z) coordinate system and may face any direction.
If a participant is moved manually, they are no longer considered to be in a seat.

Implementation Models

Client Side Implementation

  • The SDK is deployed as part of your compiled application.
  • Noise Cancellation and Voice Clarity can be deployed quickly without external dependency.
  • 3D Spatial Audio can be enabled using real-time coordinate information to render sounds in specific locations.

Hybrid Implementation

  • The SDK installed on the client-side handles Noise Cancellation and Voice Clarity.
  • A central server-based solution handles 3D Spatial Audio.
  • The processing power is split between the server and the application, ensuring less strain on your participants’ machines.

Server/Cloud Implementation

  • The SDK is connected to a server application via a platform-specific adapter.
  • Engage Core will receive audio from all connected clients and send processed streams back through the server application.
  • With coordinate information, 3D Spatial Audio can render sounds in real-time in specific locations.