Configuration
This module defines the format and interface that will provide the RAW data from the microphones. Here are the elements to define:
| Parameter | Type | Description |
|---|---|---|
| fS | uint | Sample rate in samples/sec |
| hopSize | uint | Number of samples acquired on each channel at each frame |
| nBits | uint | Number of bits per sample (must be either 8, 16, 24 or 32) |
| nChannels | uint | Number of audio channels |
Here is an example with a sample rate of 44100 samples/sec, a hop size of 512 samples, 16-bit signed samples and 8 channels:
raw:
{
fS = 44100;
hopSize = 512;
nBits = 16;
nChannels = 8;
interface: {
type = "file";
path = "mics.raw";
};
};
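To make the frame layout concrete, the sketch below computes how many bytes one frame occupies for the values above and splits a frame into per-channel sample lists. It assumes the samples are interleaved by channel and stored as little-endian signed 16-bit integers, which this page does not state. The names `frame_bytes` and `deinterleave` are illustrative and are not part of ODAS.

```python
import struct


def frame_bytes(hop_size, n_bits, n_channels):
    """Size in bytes of one frame: hopSize samples on each of nChannels channels."""
    if n_bits not in (8, 16, 24, 32):
        raise ValueError("nBits must be 8, 16, 24 or 32")
    return hop_size * n_channels * (n_bits // 8)


def deinterleave(frame, n_channels):
    """Split one frame of 16-bit samples into one list of samples per channel.

    Assumption: interleaved, little-endian, signed 16-bit samples.
    """
    count = len(frame) // 2
    samples = struct.unpack("<%dh" % count, frame)
    return [list(samples[c::n_channels]) for c in range(n_channels)]


# Values from the example above: hopSize = 512, nBits = 16, nChannels = 8
size = frame_bytes(512, 16, 8)  # 512 * 8 * 2 bytes
```

With these values, a reader would consume `size` bytes from `mics.raw` per frame and pass them to `deinterleave(frame, 8)`.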
Mapping selects which channels of the RAW signal ODAS uses as microphones. This lets you ignore any channels you do not need.
| Parameter | Type | Description |
|---|---|---|
| map | uint | List of the channels used, with index starting at 1 |
Here is an example where the RAW signal contains 8 channels and you wish to use only channels 1, 2, 5, 6 and 7. Once mapping is done, these channels are referred to as microphones 1, 2, 3, 4 and 5.
mapping:
{
map: (1,2,5,6,7);
};
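The renumbering described above can be sketched in a few lines. This is only an illustration of the 1-based indexing, not ODAS code; `apply_mapping` and the `ch1` to `ch8` labels are hypothetical names.

```python
def apply_mapping(channels, channel_map):
    """Select RAW channels with a 1-based map and return them in map order."""
    for index in channel_map:
        if not 1 <= index <= len(channels):
            raise ValueError(
                "channel %d is outside 1..%d" % (index, len(channels))
            )
    # Position i in the result is microphone i + 1 after mapping.
    return [channels[index - 1] for index in channel_map]


# Eight RAW channels, labelled for illustration only
raw_channels = ["ch%d" % i for i in range(1, 9)]
mics = apply_mapping(raw_channels, (1, 2, 5, 6, 7))
```

Here `mics[2]` holds RAW channel 5, which the rest of the configuration refers to as microphone 3.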
This module defines some general parameters that are used by most modules. Here are the elements to define:
| Parameter | Type | Description |
|---|---|---|
| epsilon | float | You should leave this parameter at 1E-20 |
| size.hopSize | uint | You should leave this parameter at 128 |
| size.frameSize | uint | You should leave this parameter at 256 |
| samplerate.mu | uint | You should leave this parameter at 16000 |
| samplerate.sigma2 | float | You should leave this parameter at 0.01 |
| mics.[m].mu.[0] | float | Position mean in x of microphone m |
| mics.[m].mu.[1] | float | Position mean in y of microphone m |
| mics.[m].mu.[2] | float | Position mean in z of microphone m |
| mics.[m].sigma2.[0] | float | Position variance in xx of microphone m |
| mics.[m].sigma2.[1] | float | Position variance in xy of microphone m |
| mics.[m].sigma2.[2] | float | Position variance in xz of microphone m |
| mics.[m].sigma2.[3] | float | Position variance in yx of microphone m |
| mics.[m].sigma2.[4] | float | Position variance in yy of microphone m |
| mics.[m].sigma2.[5] | float | Position variance in yz of microphone m |
| mics.[m].sigma2.[6] | float | Position variance in zx of microphone m |
| mics.[m].sigma2.[7] | float | Position variance in zy of microphone m |
| mics.[m].sigma2.[8] | float | Position variance in zz of microphone m |
| mics.[m].direction.[0] | float | Direction in x of microphone m |
| mics.[m].direction.[1] | float | Direction in y of microphone m |
| mics.[m].direction.[2] | float | Direction in z of microphone m |
| mics.[m].angle.[0] | float | Maximum angle at which gain is 1 for microphone m |
| mics.[m].angle.[1] | float | Minimum angle at which gain is 0 for microphone m |
| spatialfilter.[s].direction.[0] | float | Direction in x for spatial filter s |
| spatialfilter.[s].direction.[1] | float | Direction in y for spatial filter s |
| spatialfilter.[s].direction.[2] | float | Direction in z for spatial filter s |
| spatialfilter.[s].angle.[0] | float | Maximum angle at which gain is 1 for spatial filter s |
| spatialfilter.[s].angle.[1] | float | Minimum angle at which gain is 0 for spatial filter s |
| nThetas | uint | You should leave this parameter at 181 |
| gainMin | float | You should leave this parameter at 0.25 |
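The nine `sigma2` entries in the table follow the row-by-row order of a 3x3 covariance matrix (xx, xy, xz, yx, yy, yz, zx, zy, zz). The sketch below flattens a matrix into that order; the helper names are hypothetical, and the variance value is only an illustration.

```python
def diagonal_covariance(var_x, var_y, var_z):
    """Build a 3x3 covariance matrix with no correlation between axes."""
    return [
        [var_x, 0.0, 0.0],
        [0.0, var_y, 0.0],
        [0.0, 0.0, var_z],
    ]


def flatten_covariance(cov):
    """Flatten a 3x3 matrix row by row: xx, xy, xz, yx, yy, yz, zx, zy, zz."""
    return [cov[row][col] for row in range(3) for col in range(3)]


# A microphone whose position is uncertain along y and z only
sigma2 = flatten_covariance(diagonal_covariance(0.0, 1e-6, 1e-6))
```

For this microphone, only entries 4 (yy) and 8 (zz) of `sigma2` are non-zero.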
Here is an example with a 16-microphone array with a cubic shape. Microphone directivity is used as the array is closed, and the microphone variance is diagonal and non-zero for axes that span the surface plane for each microphone:
general:
{
epsilon = 1E-20;
size:
{
hopSize = 128;
frameSize = 256;
};
samplerate:
{
mu = 16000;
sigma2 = 0.01;
};
speedofsound:
{
mu = 343.0;
sigma2 = 25.0;
};
mics = (
# Microphone 1
{
mu = ( +0.1250, -0.0725, +0.0725 );
sigma2 = ( 0.0, 0.0, 0.0, 0.0, +1E-6, 0.0, 0.0, 0.0, +1E-6 );
direction = ( +1.000, +0.000, +0.000 );
angle = ( 80.0, 100.0 );
},
# Microphone 2
{
mu = ( +0.1250, +0.0725, +0.0725 );
sigma2 = ( 0.0, 0.0, 0.0, 0.0, +1E-6, 0.0, 0.0, 0.0, +1E-6 );
direction = ( +1.000, +0.000, +0.000 );
angle = ( 80.0, 100.0 );
},
# Microphone 3
{
mu = ( +0.1250, -0.0725, -0.0725 );
sigma2 = ( 0.0, 0.0, 0.0, 0.0, +1E-6, 0.0, 0.0, 0.0, +1E-6 );
direction = ( +1.000, +0.000, +0.000 );
angle = ( 80.0, 100.0 );
},
# Microphone 4
{
mu = ( +0.1250, +0.0725, -0.0725 );
sigma2 = ( 0.0, 0.0, 0.0, 0.0, +1E-6, 0.0, 0.0, 0.0, +1E-6 );
direction = ( +1.000, +0.000, +0.000 );
angle = ( 80.0, 100.0 );
},
# Microphone 5
{
mu = ( +0.0725, +0.1250, +0.0725 );
sigma2 = ( +1E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, +1E-6 );
direction = ( +0.000, +1.000, +0.000 );
angle = ( 80.0, 100.0 );
},
# Microphone 6
{
mu = ( -0.0725, +0.1250, +0.0725 );
sigma2 = ( +1E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, +1E-6 );
direction = ( +0.000, +1.000, +0.000 );
angle = ( 80.0, 100.0 );
},
# Microphone 7
{
mu = ( +0.0725, +0.1250, -0.0725 );
sigma2 = ( +1E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, +1E-6 );
direction = ( +0.000, +1.000, +0.000 );
angle = ( 80.0, 100.0 );
},
# Microphone 8
{
mu = ( -0.0725, +0.1250, -0.0725 );
sigma2 = ( +1E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, +1E-6 );
direction = ( +0.000, +1.000, +0.000 );
angle = ( 80.0, 100.0 );
},
# Microphone 9
{
mu = ( -0.1250, +0.0725, +0.0725 );
sigma2 = ( 0.0, 0.0, 0.0, 0.0, +1E-6, 0.0, 0.0, 0.0, +1E-6 );
direction = ( -1.000, +0.000, +0.000 );
angle = ( 80.0, 100.0 );
},
# Microphone 10
{
mu = ( -0.1250, -0.0725, +0.0725 );
sigma2 = ( 0.0, 0.0, 0.0, 0.0, +1E-6, 0.0, 0.0, 0.0, +1E-6 );
direction = ( -1.000, +0.000, +0.000 );
angle = ( 80.0, 100.0 );
},
# Microphone 11
{
mu = ( -0.1250, +0.0725, -0.0725 );
sigma2 = ( 0.0, 0.0, 0.0, 0.0, +1E-6, 0.0, 0.0, 0.0, +1E-6 );
direction = ( -1.000, +0.000, +0.000 );
angle = ( 80.0, 100.0 );
},
# Microphone 12
{
mu = ( -0.1250, -0.0725, -0.0725 );
sigma2 = ( 0.0, 0.0, 0.0, 0.0, +1E-6, 0.0, 0.0, 0.0, +1E-6 );
direction = ( -1.000, +0.000, +0.000 );
angle = ( 80.0, 100.0 );
},
# Microphone 13
{
mu = ( -0.0725, -0.1250, +0.0725 );
sigma2 = ( +1E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, +1E-6 );
direction = ( +0.000, -1.000, +0.000 );
angle = ( 80.0, 100.0 );
},
# Microphone 14
{
mu = ( +0.0725, -0.1250, +0.0725 );
sigma2 = ( +1E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, +1E-6 );
direction = ( +0.000, -1.000, +0.000 );
angle = ( 80.0, 100.0 );
},
# Microphone 15
{
mu = ( -0.0725, -0.1250, -0.0725 );
sigma2 = ( +1E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, +1E-6 );
direction = ( +0.000, -1.000, +0.000 );
angle = ( 80.0, 100.0 );
},
# Microphone 16
{
mu = ( +0.0725, -0.1250, -0.0725 );
sigma2 = ( +1E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, +1E-6 );
direction = ( +0.000, -1.000, +0.000 );
angle = ( 80.0, 100.0 );
}
);
# Spatial filter to include only a range of direction if required
# (may be useful to remove false detections from the floor)
spatialfilter: (
{
direction = ( +0.000, +0.000, +1.000 );
angle = (80.0, 90.0);
}
);
nThetas = 181;
gainMin = 0.25;
};
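The `direction` and `angle` pairs can be read as a simple gain profile: full gain up to the first angle, zero gain from the second angle onward. The sketch below illustrates that reading. It assumes angles are in degrees and that the gain falls linearly between the two angles; this page does not say how ODAS interpolates, so treat the transition shape as an assumption. The function names are hypothetical.

```python
import math


def angle_between(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cosine))


def directivity_gain(direction, source, angle):
    """Gain applied to a source seen from a microphone or spatial filter.

    angle = (largest angle with gain 1, smallest angle with gain 0).
    Assumption: linear transition between the two angles.
    """
    full, zero = angle
    theta = angle_between(direction, source)
    if theta <= full:
        return 1.0
    if theta >= zero:
        return 0.0
    return (zero - theta) / (zero - full)


# Microphone 1 from the example faces +x with angle = (80.0, 100.0)
front = directivity_gain((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (80.0, 100.0))
back = directivity_gain((1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (80.0, 100.0))
```

Under these assumptions, a source directly in front of microphone 1 keeps full gain, while a source behind the closed array is ignored by that microphone.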
This module performs stationary noise estimation using Minima Controlled Recursive Averaging (MCRA):
| Parameter | Type | Description |
|---|---|---|
| b | uint | You should leave this parameter at 3 |
| alphaS | float | You should leave this parameter at 0.1 |
| L | uint | You should leave this parameter at 150 |
| delta | float | You should leave this parameter at 3.0 |
| alphaD | float | You should leave this parameter at 0.1 |
Here is an example of what the stationary noise estimation module should look like in the configuration:
# Stationary noise estimation
sne:
{
b = 3;
alphaS = 0.1;
L = 150;
delta = 3.0;
alphaD = 0.1;
};
This module generates sources with potential directions of arrival for sound:
| Parameter | Type | Description |
|---|---|---|
| nPots | uint | You should leave this parameter at 4 |
| nMatches | uint | You should leave this parameter at 10 |
| probMin | float | You should leave this parameter at 0.3 |
| nRefinedLevels | uint | You should leave this parameter at 1 |
| interpRate | uint | You should leave this parameter at 1 |
| scans.[0].level | uint | You should leave this parameter at 2 |
| scans.[0].delta | int | You should leave this parameter at -1 |
| scans.[1].level | uint | You should leave this parameter at 4 |
| scans.[1].delta | int | You should leave this parameter at -1 |
Here is an example of the Sound Source Localization configuration. In this case, the potential sources are displayed in the terminal:
# Sound Source Localization
ssl:
{
nPots = 4;
nMatches = 10;
probMin = 0.3;
nRefinedLevels = 1;
interpRate = 1;
# Number of scans: level is the resolution of the sphere
# and delta is the size of the maximum sliding window
# (delta = -1 means the size is automatically computed)
scans = (
{ level = 2; delta = -1; },
{ level = 4; delta = -1; }
);
# Output to export potential sources
potential: {
format = "json";
interface: {
type = "terminal";
};
};
};
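To give a feel for what `level` means for the two scans, the sketch below counts the points on the search sphere. It assumes the sphere is an icosahedron refined by repeated subdivision, which this page does not state; under that assumption the number of points is 10 * 4^level + 2. The function name is hypothetical.

```python
def icosphere_points(level):
    """Number of vertices of an icosahedron subdivided `level` times.

    Assumption: the search grid is a refined icosahedron.
    """
    if level < 0:
        raise ValueError("level must be non-negative")
    return 10 * 4 ** level + 2


# Levels used by the two scans in the example above
coarse = icosphere_points(2)
fine = icosphere_points(4)
```

If the assumption holds, the first scan searches a coarse grid and the second scan refines the result on a grid roughly sixteen times denser.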
Sound source tracking can be performed with particle filters or Kalman filters. Kalman filters are recommended as they provide improved accuracy and reduce the computational load significantly.
| Parameter | Type | Description |
|---|---|---|
| mode | string | Either "particle" or "kalman", according to the method chosen |
| add | string | You should leave this parameter at "dynamic" |
| active.[0].weight | float | You should leave this parameter at 1.0 |
| active.[0].mu | float | You should leave this parameter at 0.3 |
| active.[0].sigma2 | float | You should leave this parameter at 0.0025 |
| inactive.[0].weight | float | You should leave this parameter at 1.0 |
| inactive.[0].mu | float | You should leave this parameter at 0.15 |
| inactive.[0].sigma2 | float | You should leave this parameter at 0.0025 |
| sigmaR2_prob | float | You should leave this parameter at 0.0025 |
| sigmaR2_active | float | You should leave this parameter at 0.0225 |
| sigmaR2_target | float | You should leave this parameter at 0.0025 |
| Pfalse | float | You should leave this parameter at 0.1 |
| Pnew | float | You should leave this parameter at 0.1 |
| Ptrack | float | You should leave this parameter at 0.8 |
| theta_new | float | You should leave this parameter at 0.9 |
| N_prob | uint | You should leave this parameter at 5 |
| theta_prob | float | You should leave this parameter at 0.8 |
| N_inactive.[0] | uint | You should leave this parameter at 150 |
| N_inactive.[1] | uint | You should leave this parameter at 200 |
| N_inactive.[2] | uint | You should leave this parameter at 250 |
| N_inactive.[3] | uint | You should leave this parameter at 250 |
| theta_inactive | float | You should leave this parameter at 0.9 |
| kalman.sigmaQ | float | You should leave this parameter at 0.001 |
| particle.nParticles | uint | You should leave this parameter at 1000 |
| particle.st_alpha | float | You should leave this parameter at 2.0 |
| particle.st_beta | float | You should leave this parameter at 0.04 |
| particle.st_ratio | float | You should leave this parameter at 0.5 |
| particle.ve_alpha | float | You should leave this parameter at 0.05 |
| particle.ve_beta | float | You should leave this parameter at 0.2 |
| particle.ve_ratio | float | You should leave this parameter at 0.3 |
| particle.ac_alpha | float | You should leave this parameter at 0.5 |
| particle.ac_beta | float | You should leave this parameter at 0.2 |
| particle.ac_ratio | float | You should leave this parameter at 0.2 |
| particle.Nmin | float | You should leave this parameter at 0.7 |
Here is an example of the Sound Source Tracking configuration. In this case, the system uses Kalman filters and returns up to four tracked sources in the terminal:
sst:
{
# Mode is either "kalman" or "particle"
mode = "kalman";
# Add is either "static" or "dynamic"
add = "dynamic";
# Parameters used by both the Kalman and particle filter
active = (
{ weight = 1.0; mu = 0.3; sigma2 = 0.0025 }
);
inactive = (
{ weight = 1.0; mu = 0.15; sigma2 = 0.0025 }
);
sigmaR2_prob = 0.0025;
sigmaR2_active = 0.0225;
sigmaR2_target = 0.0025;
Pfalse = 0.1;
Pnew = 0.1;
Ptrack = 0.8;
theta_new = 0.9;
N_prob = 5;
theta_prob = 0.8;
N_inactive = ( 150, 200, 250, 250 );
theta_inactive = 0.9;
# Parameters used by the Kalman filter only
kalman: {
sigmaQ = 0.001;
};
# Parameters used by the particle filter only
particle: {
nParticles = 1000;
st_alpha = 2.0;
st_beta = 0.04;
st_ratio = 0.5;
ve_alpha = 0.05;
ve_beta = 0.2;
ve_ratio = 0.3;
ac_alpha = 0.5;
ac_beta = 0.2;
ac_ratio = 0.2;
Nmin = 0.7;
};
target: ();
# Output to export tracked sources
tracked: {
format = "json";
interface: {
type = "terminal";
};
};
};
Sound source separation enhances the sound source of interest:
| Parameter | Type | Description |
|---|---|---|
| mode_sep | string | You should leave this parameter at "dds" |
| mode_pf | string | You should leave this parameter at "ss" |
| gain_sep | float | Gain to change the volume of the separated stream |
| gain_pf | float | Gain to change the volume of the post-filtered stream |
| dgss.mu | float | You should leave this parameter at 0.01 |
| dgss.lambda | float | You should leave this parameter at 0.5 |
| ms.alphaPmin | float | You should leave this parameter at 0.07 |
| ms.eta | float | You should leave this parameter at 0.5 |
| ms.alphaZ | float | You should leave this parameter at 0.8 |
| ms.thetaWin | float | You should leave this parameter at 0.3 |
| ms.alphaWin | float | You should leave this parameter at 0.3 |
| ms.maxAbsenceProb | float | You should leave this parameter at 0.9 |
| ms.Gmin | float | You should leave this parameter at 0.01 |
| ms.winSizeLocal | uint | You should leave this parameter at 3 |
| ms.winSizeGlobal | uint | You should leave this parameter at 23 |
| ms.winSizeFrame | uint | You should leave this parameter at 256 |
| ss.Gmin | float | You should leave this parameter at 0.01 |
| ss.Gmid | float | You should leave this parameter at 0.9 |
| ss.Gslope | float | You should leave this parameter at 10.0 |
Here is an example where the system outputs the separated and post-filtered signals to the files separated.raw and postfiltered.raw. The number of channels corresponds to the maximum number of simultaneously tracked sources.
sss:
{
# Separation mode is either "dds" or "dgss"
# Post-filtering mode is either "ms" or "ss"
mode_sep = "dds";
mode_pf = "ss";
gain_sep = 1.0;
gain_pf = 10.0;
dds: {
};
dgss: {
mu = 0.01;
lambda = 0.5;
};
dmvdr: {
};
ms: {
alphaPmin = 0.07;
eta = 0.5;
alphaZ = 0.8;
thetaWin = 0.3;
alphaWin = 0.3;
maxAbsenceProb = 0.9;
Gmin = 0.01;
winSizeLocal = 3;
winSizeGlobal = 23;
winSizeFrame = 256;
};
ss: {
Gmin = 0.01;
Gmid = 0.9;
Gslope = 10.0;
};
separated: {
fS = 16000;
hopSize = 128;
nBits = 16;
interface: {
type = "file";
path = "separated.raw";
};
};
postfiltered: {
fS = 16000;
hopSize = 128;
nBits = 16;
gain = 10.0;
interface: {
type = "file";
path = "postfiltered.raw";
};
};
};
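Because the output files are headerless, most audio tools cannot open them directly. The sketch below wraps such a file in a WAV container using Python's standard `wave` module. It assumes interleaved, little-endian, signed 16-bit samples, and the channel count of 4 in the usage comment is taken from the tracking example (up to four tracked sources); neither detail is confirmed by this page. `raw_to_wav` is a hypothetical helper.

```python
import wave


def raw_to_wav(raw_path, wav_path, sample_rate, n_bits, n_channels):
    """Copy headerless PCM into a WAV file and return its duration in seconds.

    Assumption: interleaved, little-endian, signed 16-bit samples.
    """
    if n_bits != 16:
        raise ValueError("this sketch handles 16-bit samples only")
    with open(raw_path, "rb") as raw:
        data = raw.read()
    bytes_per_frame = n_channels * (n_bits // 8)
    # Drop a trailing partial frame, if any
    data = data[: len(data) - len(data) % bytes_per_frame]
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(n_channels)
        wav.setsampwidth(n_bits // 8)
        wav.setframerate(sample_rate)
        wav.writeframes(data)
    return len(data) / (bytes_per_frame * sample_rate)


# Possible usage with the values from the example above (fS = 16000, nBits = 16):
# raw_to_wav("separated.raw", "separated.wav", 16000, 16, 4)
```

Each channel of the resulting WAV file would then hold one separated source.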
Sound classification still needs to be improved. For now, leave the following configuration unchanged:
classify:
{
frameSize = 4096;
winSize = 3;
tauMin = 88;
tauMax = 551;
deltaTauMax = 20;
alpha = 0.3;
gamma = 0.05;
phiMin = 0.5;
r0 = 0.2;
category: {
format = "undefined";
interface: {
type = "blackhole";
};
};
};
Provided by IntRoLab, Université de Sherbrooke, Québec, Canada.