4. The SPro library

This chapter describes the main functions of the SPro library and should be sufficient for most implementations using the library. For more details, the reader is invited to read the source code which is, and will probably always be, the most detailed and up-to-date description of what a function does. In particular, the library header `spro.h' gives many details about function arguments. The SPro tools(11) are good examples of how to use the library functions.

Basic type definitions are deliberately not given in the manual. Wherever necessary, accessors are provided for the most crucial members of structured types and, unless there is no other way, direct access to structure members should be avoided in order to ensure better compatibility with future versions of the library. For the sake of speed, these accessors are mostly macros rather than functions. They are described in the relevant sections.

4.1 Waveform streams  Functions related to waveforms
4.2 Feature description flags  Describing feature vector contents
4.3 Feature streams  Reading and writing features
4.4 Storing features without streams  I/O with feature buffers
4.5 Feature conversion  Adding delta features, CMS, etc...
4.6 FFT-based functions  FFT analysis functions
4.7 LPC-based functions  LPC analysis functions
4.8 Miscellaneous functions  Whatever could not go anywhere else


4.1 Waveform streams

This section describes functions related to waveforms, or equivalently signals. From now on, the term signal will be used as a synonym for waveform unless otherwise specified. Functions related to signals are usually prefixed with sig_ and located in `sig.c' and `misc.c'.

4.1.1 Memory allocation  Memory allocation for waveforms
4.1.2 Opening streams  Opening waveform streams for reading
4.1.3 Reading frames  Reading frames from a waveform stream
4.1.4 Computing frame energy  


4.1.1 Memory allocation

Waveforms, or signals, are stored in a variable whose type is spsig_t. This type is not intended for storing waveform streams, i.e. the entire waveform for a document, but rather the frame samples. Therefore, no I/O functions are provided for this data type. Every signal processing function which operates on a frame takes as input a variable of the type spsig_t. Memory allocation for a signal is performed using sig_alloc and released using sig_free.

Function: spsig_t * sig_alloc (unsigned long n)
Allocate memory for a signal containing n samples. Return a pointer to the allocated structure or NULL in case of error.

Function: void sig_free (spsig_t *p)
Free memory allocated for a signal using sig_alloc.


4.1.2 Opening streams

Signals are usually read from a stream, i.e. a collection of samples, from which the frames are made. As the SPro library has been designed to process signals into feature vectors, signal streams are solely input streams and no output function is provided. Therefore, a signal stream is always opened in read mode. The following two functions are used to open a stream for reading and to close the stream when all is done. Reading frames from a stream is explained in the next section.

Function: sigstream_t * sig_stream_open (const char *fn, int fmt, float Fs, size_t nbytes, int swap)
Open the stream in file fn in read mode, where the file format is fmt. If fn is NULL, input is read from stdin. Valid file formats are SPRO_SIG_PCM16_FORMAT, SPRO_SIG_WAVE_FORMAT and, if the library has been compiled with SPHERE support, SPRO_SIG_SPHERE_FORMAT. If fmt is SPRO_SIG_PCM16_FORMAT, the sample rate Fs (in Hz) must be specified. Otherwise, the sample rate is read from the header and Fs is ignored. The input buffer size is specified by nbytes, i.e. nbytes bytes will be allocated for input. If swap is non-zero, byte swapping is performed on the samples after reading them. Return a pointer to the opened signal stream or NULL in case of error.

Function: void sig_stream_close (sigstream_t *f)
Close a signal stream opened with sig_stream_open, releasing allocated memory.


4.1.3 Reading frames

Though possible, directly accessing the samples in a stream is not the purpose of signal streams in SPro. Indeed, speech processing is based on the processing of successive overlapping frames. The library therefore provides functions to access frames directly, such as get_next_sig_frame, which returns the frame samples; these can then be weighted using sig_weight. Weighting vectors for standard signal processing windows are created using set_sig_win.

Function: int get_next_sig_frame (sigstream_t *f, int ch, int l, int d, float k, sample_t *buf)
Read the next frame from channel ch in stream f. Frames are l samples long with a shift of d samples between successive frames. Frame samples are returned in the buffer buf, which must have been previously allocated to contain at least l samples. The content of buf must be left untouched between two successive calls since some of the samples are reused due to the overlap. Argument k sets the pre-emphasis factor. Return 1 in case of success and 0 otherwise.

Function: float * set_sig_win (unsigned long N, int type)
Allocate and initialize a weighting vector of length N for the specified window type, where type is one of SPRO_HAMMING_WINDOW, SPRO_HANNING_WINDOW and SPRO_BLACKMAN_WINDOW. The window type SPRO_NULL_WINDOW is defined for the purpose of argument processing but is not a valid argument for this function. Return a pointer to the allocated vector or NULL in case of error.

Function: spsig_t * sig_weight (spsig_t *s, sample_t *buf, float *w)
Weight the samples in buf according to the weights in w. The result is returned in the previously allocated signal s, whose size must correspond to the buffer length. Return s.

The following is a typical piece of code used to open a signal stream and loop over all the input frames of N samples taken every D samples(12).
 
sigstream_t *f = sig_stream_open("foo.wav", 
                                 SPRO_SIG_WAVE_FORMAT, 0, 10000, 0);
spsig_t *frame = sig_alloc(N);
float *w = set_sig_win(N, SPRO_HAMMING_WINDOW);
sample_t *buf = (sample_t *)malloc(N * sizeof(sample_t));

while (get_next_sig_frame(f, 1, N, D, 0.95, buf)) {
  sig_weight(frame, buf, w); /* weight signal */

  /* ... */

}

sig_stream_close(f);
sig_free(frame);
free(w);
free(buf);
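To make the frame arithmetic of the loop above concrete, here is a standalone sketch, independent of SPro, of how overlapping frames of l samples with a shift of d samples are laid over a signal, with the usual first-order pre-emphasis y[n] = x[n] - k x[n-1] applied. The names frame_count and get_frame are hypothetical, and taking the sample before the first one as 0 is an assumption of this sketch; get_next_sig_frame may handle the start and end of the stream differently.

```c
#include <stddef.h>

/* Number of complete frames of l samples, shifted by d samples, that fit
   in a signal of t samples (l >= 1 and d >= 1 assumed). */
size_t frame_count(size_t t, size_t l, size_t d)
{
  return t < l ? 0 : 1 + (t - l) / d;
}

/* Copy frame j (first sample at index j * d) of signal x into out[0..l-1],
   applying first-order pre-emphasis y[n] = x[n] - k * x[n-1].
   The sample before x[0] is taken as 0 (assumption of this sketch).
   The caller must ensure j < frame_count(t, l, d). */
void get_frame(const float *x, size_t j, size_t l, size_t d, float k, float *out)
{
  size_t start = j * d;
  for (size_t i = 0; i < l; i++) {
    size_t n = start + i;
    float prev = n > 0 ? x[n - 1] : 0.0f;
    out[i] = x[n] - k * prev;
  }
}
```

With t = 10, l = 4 and d = 2, frames start at samples 0, 2, 4 and 6, so frame_count returns 4. Consecutive frames share l - d = 2 samples, which is why the buffer passed to get_next_sig_frame must be left untouched between calls.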


4.1.4 Computing frame energy

Assuming the frame signal is centered (i.e. has zero mean), sig_normalize computes the frame energy and can also normalize the energy to unity.

Function: double sig_normalize (spsig_t *s, int norm)
Return the square root of the sum of the squared samples in s. If norm is non-zero, normalize the signal variance to unity.


4.2 Feature description flags

Feature description flags are used to describe the content of feature vectors, indicating information about mean and variance normalization, delta features, etc. See section 4.3 Feature streams, for details. In the library, such flags are represented as bit fields coded as long integers. To avoid incomprehensible code, symbolic constants are defined for each piece of information possibly encoded in the feature description flag. Bit mask constants are of the form WITHX, where X is one of the letters E, Z, R, D, A or N. The constant SPRO_EMPTY_FLAG, equal to 0, can also be used to denote an empty flag.

The two macros set_flag_bits and get_flag_bits can be used to raise, or check for the presence of, elements (bits) in the flags. Alternatively, bitwise operators can be used directly on the flag value. For example, the instruction
 
flag = flag | WITHZ;
will raise the bit corresponding to mean subtraction, while flag & WITHZ will be true if the bit corresponding to Z is raised and false otherwise. However, we recommend using the two macros for compatibility purposes. Another way to set flags is via the function sp_str_to_flag, which converts a character string to a flag. The inverse operation is implemented in sp_flag_to_str.

Macro: long set_flag_bits (long flag, long mask)
Set to one the bits specified by mask in the feature description flag flag. Return the resulting feature description flag. For example, the following line
 
  flag = set_flag_bits(flag, WITHZ | WITHR);
will raise the bits WITHZ and WITHR in flag, corresponding to mean and variance normalization respectively. Bits already raised in flag will be left untouched.

Macro: long get_flag_bits (long flag, long mask)
Return a flag containing the bits which are raised both in flag and in mask. The macro can be used as a boolean expression. However, this can be tricky, particularly if mask is itself a combination of several bits. In this case, get_flag_bits will be true if at least one bit is raised in both flag and mask. For example, if mask has the value (WITHZ | WITHR), get_flag_bits will return true if flag has either the WITHZ or the WITHR bit raised, or, obviously, both. To check that both bits are raised, use the following test
 
  if (get_flag_bits(flag, WITHZ | WITHR) == (WITHZ | WITHR)) {
    /* ... */
  }
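The difference between the two tests can be checked with the following standalone sketch. The DEMO_* bit values are hypothetical placeholders, since the real WITH* constants are defined in `spro.h', and demo_get_flag_bits simply applies the bitwise AND described above.

```c
/* Hypothetical bit values; the real WITH* constants live in spro.h. */
#define DEMO_WITHZ 0x02L
#define DEMO_WITHR 0x04L

/* Same semantics as get_flag_bits: bits raised in both flag and mask. */
#define demo_get_flag_bits(flag, mask) ((flag) & (mask))

/* True if at least one bit of mask is raised in flag. */
int has_any_bit(long flag, long mask)
{
  return demo_get_flag_bits(flag, mask) != 0;
}

/* True only if every bit of mask is raised in flag. */
int has_all_bits(long flag, long mask)
{
  return demo_get_flag_bits(flag, mask) == mask;
}
```

A flag with only the Z bit raised passes the first test for the mask Z|R but fails the second.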

Function: long sp_str_to_flag (const char *str)
Convert str into a feature description flag, where str is a string of description letters among E, Z, R, D, A or N. Return a flag where the bits corresponding to the letters in str are raised.

Function: char * sp_flag_to_str (long flag, char str[7])
Convert flag into a string containing the corresponding feature description letters. This function is mainly for tracing. Return a pointer to str.
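The letter/bit correspondence can be sketched as follows. This is not the library implementation: the table assigning bit i to the i-th letter of "EZRDAN" is an assumption made only for the example, and unknown letters are silently ignored, whereas sp_str_to_flag may behave differently. The six letters plus the terminating null character explain the str[7] argument.

```c
#include <string.h>

/* Hypothetical table: the i-th letter is mapped to bit (1L << i).
   The actual bit values are defined in spro.h. */
static const char demo_letters[] = "EZRDAN";

/* Raise the bit of each known letter in str; unknown letters are ignored. */
long demo_str_to_flag(const char *str)
{
  long flag = 0;
  for (; *str != '\0'; str++) {
    const char *p = strchr(demo_letters, *str);
    if (p != NULL)
      flag |= 1L << (p - demo_letters);
  }
  return flag;
}

/* Write the letters of the raised bits, in table order, into str
   (6 letters at most plus the terminating null character). */
char *demo_flag_to_str(long flag, char str[7])
{
  char *q = str;
  for (int i = 0; demo_letters[i] != '\0'; i++)
    if (flag & (1L << i))
      *q++ = demo_letters[i];
  *q = '\0';
  return str;
}
```

Since a flag is a set of bits, "DZ" and "ZD" give the same flag, which this sketch prints back as "ZD".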


4.3 Feature streams

This section describes the functions related to input and output of feature vectors. The functions are divided into three categories, namely opening a feature stream, reading and writing features from or to a stream and seeking to a particular position in the stream. Feature stream functions are usually prefixed by spf_stream_ and are located in `spf.c', `misc.c' and `header.c'.

4.3.1 Opening feature streams  Opening feature streams for I/O
4.3.2 Reading and writing feature vectors  Reading features from and writing features to streams
4.3.3 Seeking into a stream  Access a particular frame in a stream


4.3.1 Opening feature streams

This section describes in detail the mechanism for opening and closing feature streams. It also explains how to access stream attributes, such as fields in the variable length header or the frame rate for streams in read mode.

Conversion flags  Dynamically converting features at I/O time
Opening for I/O  Open a feature stream
Accessing stream attributes  What's the stream dimension, frame rate, etc...


Conversion flags

In SPro, conversions such as adding dynamic features, normalization or energy scaling are associated with streams since these are typically global operations which cannot be carried out at the frame level. Such conversions are indicated by a conversion flag which specifies how the input data should be converted before output. In read mode, input refers to the file content and output to what is returned by the read function while, in write mode, input refers to the input of the write function and output to the file content. The conversion flag is actually a feature description flag containing the bits that should be raised in the output feature description flag in addition to those already present in the input description flag. For example, if the conversion flag takes the value (WITHZ|WITHA) and the input feature description flag, e.g. as specified in the header of an input file, is (WITHZ|WITHD), the resulting feature description for the stream output will be (WITHZ|WITHD|WITHA).
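The rule in the example above boils down to a bitwise OR of the input description flag and the conversion flag. A tiny sketch, with hypothetical DEMO_* bit values (the real ones are in `spro.h'):

```c
/* Hypothetical bit values; the real WITH* constants are defined in spro.h. */
#define DEMO_WITHZ 0x02L
#define DEMO_WITHD 0x08L
#define DEMO_WITHA 0x10L

/* Description of the stream output: input bits plus conversion bits. */
long demo_output_flag(long iflag, long cflag)
{
  return iflag | cflag;
}
```

A conversion bit already present in the input flag, such as WITHZ in the example, leaves the output description unchanged for that bit.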

Though not coded as a flag, conversion in feature streams may include energy scaling. As this is not coded in the stream header, one must be careful not to specify scaling twice. Energy scaling conversion is turned on using set_stream_energy_scale. In a very similar way, the function set_stream_seg_length can be used to specify segmental normalization or scaling. Both functions should be called between the call to open and the first call to read or write, depending on the stream mode, in order to be effective.

Macro: float set_stream_energy_scale (spfstream_t *f, float s)
Turn on energy scaling for stream f with a scale factor s. A zero value of s disables energy scaling; this is the default when the stream is opened. The function must be called after opening the stream and before any I/O operation on the stream. Return s.

Macro: long set_stream_seg_length (spfstream_t *f, long length)
Turn on segmental normalization and scaling for stream f with a segment length of length frames. A zero value of length disables segmental normalization and scaling; this is the default when the stream is opened. The function must be called after opening the stream and before any I/O operation on the stream. Return length.


Opening for I/O

Unlike signal streams, feature streams can be either in read or write mode. Since the arguments are quite different in the two cases, two different functions are provided, namely spf_input_stream_open and spf_output_stream_open. The function spf_stream_close is common to input and output streams.

Feature streams have very important attributes, such as the dimension, the feature description flag, the frame rate or the variable length header, for which accessors are provided. Macros to access the most important attributes are documented below.

Function: spfstream_t * spf_input_stream_open (const char *name, long flag, size_t nbytes)
Open a feature stream associated with file name for reading, with an associated buffer of nbytes bytes. Features read from name are converted according to flag. See above for details on conversion flags. Return a pointer to the feature stream.

Function: spfstream_t * spf_output_stream_open (const char *name, unsigned short dim, long iflag, long cflag, float Fs, const spfield_t *hd, size_t nbytes)
Open a feature stream associated with file name for writing, with a buffer of nbytes bytes. The input features, i.e. the features passed to the stream via spf_stream_write, have dimension dim, feature description flag iflag and a frame rate of Fs Hz. Conversion between the input features and the actual features written to file is specified by cflag. See above for details on conversion flags. Fields in the variable length header can be added via a possibly NULL array of fields hd, where hd is a NULL terminated array of {char *name; char *value;} elements. See the spf_header_add example below. Return a pointer to the feature stream.

Function: void spf_stream_close (spfstream_t *f)
Close feature stream f opened with one of the spf_*_stream_open functions, releasing allocated memory.


Accessing stream attributes

Stream attributes, such as the dimension, the frame rate or the fields in the variable length header, can be accessed using the following accessors.

Macro: char * spf_stream_name (spfstream_t *f)
Return a pointer to the filename associated with stream f. If the stream has no associated filename, i.e. I/O via stdin and stdout, return NULL.

Macro: float spf_stream_rate (spfstream_t *f)
Return the frame rate in Hz for stream f.

Macro: unsigned short spf_stream_dim (spfstream_t *f)
Return the feature vector dimension for stream f. The dimension corresponds to the dimension of the feature vectors, possibly after conversion if the stream has a conversion flag set. For input streams, the dimension is therefore that of the feature vectors returned by get_next_spf_vec while, for output streams, it is the dimension written in the output header.

Macro: long spf_stream_flag (spfstream_t *f)
Return the feature description flag for stream f. The returned flag is taken after conversion, if any. For input streams, the flag describes the feature vectors returned by get_next_spf_vec while, for output streams, the flag is the output header's flag.

Macro: spfheader_t * spf_stream_header (spfstream_t *f)
Return a pointer to the (possibly empty) variable length header for stream f.

Function: char * spf_header_get (spfheader_t *header, const char *name)
Return a pointer to the value of the attribute name in header. Return NULL if there is no attribute name.

Function: int spf_header_add (spfheader_t *header, const spfield_t *tab)
Add fields in tab to header, where tab is a NULL terminated array of {char *name; char *value;} elements. For example, the following code
 
spfheader_t *header = spf_header_init(NULL);
spfield_t tab[] = {
  {"snr", "20 dB"},
  {"date", "July 29, 2003"},
  {NULL , NULL}
};
spf_header_add (header, tab);
would create an empty header (using the undocumented function spf_header_init) and add the two fields `snr' and `date' to the header along with the corresponding values. No check is performed for duplicate field names. If several fields with the same name are added, the first one will always be returned by spf_header_get and the remaining ones ignored. Return the number of fields added to the header.
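The first-match lookup rule can be sketched on a plain NULL terminated field array as follows. The type demo_field_t and the function demo_header_get are hypothetical stand-ins for spfield_t and spf_header_get; the real header structure is not documented here and should only be accessed through the library functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for spfield_t: arrays end with {NULL, NULL}. */
typedef struct {
  const char *name;
  const char *value;
} demo_field_t;

/* Return the value of the first field called name, or NULL if none. */
const char *demo_header_get(const demo_field_t *tab, const char *name)
{
  for (; tab->name != NULL; tab++)
    if (strcmp(tab->name, name) == 0)
      return tab->value;  /* first match wins; later duplicates are never seen */
  return NULL;
}
```

If `snr' is added twice, only its first value is ever returned by the lookup.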


4.3.2 Reading and writing feature vectors

The functions documented in this section are provided to read from or write to feature streams. Reading can be done in one of two ways: either vector by vector using get_next_spf_vec, or all at once by filling the stream buffer using spf_stream_read. Writing can only be done vector by vector using spf_stream_write, unless the stream buffer is accessed directly. See section 4.4 Storing features without streams, for details on this strongly discouraged practice. In write mode, the features are actually written to the output file when the buffer is full or when the stream is closed. However, the function spf_stream_flush can be used to force output to the file by flushing the buffer.

Note that spf_stream_read and spf_stream_write are not actually dual functions. The first one fills the stream buffer with as much data as possible, while the second one writes a given number of feature vectors into the stream buffer.

Function: unsigned long spf_stream_read (spfstream_t *f)
Fill the buffer of stream f, reading until the buffer is full or the end of the stream is reached. Return the number of frames read.

Function: spf_t * get_next_spf_vec (spfstream_t *f)
Return a pointer to the next feature vector in stream f or NULL at the end of stream. See section 4.3.3 Seeking into a stream, for details on how to get a particular vector in the stream.

Function: unsigned long spf_stream_write (spfstream_t *f, spf_t *buf, unsigned long n)
Write n feature vectors concatenated in buf to stream f. The feature vector dimension in buf is the dimension specified when the stream was opened. Return the number of frames written.

Function: unsigned long spf_stream_flush (spfstream_t *f)
Flush the buffer of stream f, forcing the feature vectors to be actually written to the output file. Flushing has no effect on input streams. Return the number of frames written.


4.3.3 Seeking into a stream

The I/O functions described above are mainly intended for linear input and output, i.e. for reading or writing feature vectors in a sequential way. Though this is the most common case in speech processing, accessing a particular feature vector directly is also very useful. Functions to seek to a specified feature vector in a stream are provided. Feature vectors are indexed starting from 0. In read mode, seeking to a particular frame n using spf_stream_seek means that a pointer to frame n is returned by the next call to get_next_spf_vec. In write mode, the next call to spf_stream_write will start writing at frame n, thus overwriting frame n, and possibly the following ones, if those frames have already been written.

Function: int spf_stream_seek (spfstream_t *f, long offset, int whence)
Seek offset frames according to whence in stream f. The whence argument is similar to the last argument of the C function fseek and specifies the reference point for offset. If whence is equal to SEEK_SET (0), offset is relative to the first frame. If whence is equal to SEEK_CUR (1), offset is relative to the current frame in the stream. Positioning relative to the end of the stream is not possible since the stream length is not known. The offset can be positive to seek forward in time or negative to seek backward. Seeking is only possible if the file associated with f is seekable, which is not the case for stdin or stdout. Return 0 if the seek was successful or an error code (SPRO_STREAM_SEEK_ERR) otherwise.

Macro: unsigned long spf_stream_tell (spfstream_t *f)
Return the current position, i.e. frame index, in f.

Macro: int spf_stream_rewind (spfstream_t *f)
Seek to the beginning of the stream. This is equivalent to spf_stream_seek(f, 0, SEEK_SET). Return 0 upon success.
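The position arithmetic of spf_stream_seek can be summarized with a small standalone sketch. demo_seek_target is a hypothetical helper that only computes the target frame index from the documented rules. It returns -1 for SEEK_END, which the library does not support, and, as an assumption of this sketch, for targets before frame 0.

```c
#include <stdio.h>  /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Target frame index for a seek of offset frames from whence, given the
   current frame index; -1 if the request cannot be honored. */
long demo_seek_target(unsigned long current, long offset, int whence)
{
  long base;
  switch (whence) {
  case SEEK_SET: base = 0; break;              /* relative to frame 0 */
  case SEEK_CUR: base = (long)current; break;  /* relative to current frame */
  default: return -1;                          /* SEEK_END: length unknown */
  }
  if (base + offset < 0)
    return -1;                                 /* before the first frame */
  return base + offset;
}
```

Seeking -3 frames from frame 10 with SEEK_CUR lands on frame 7, which the next get_next_spf_vec would then return.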


4.4 Storing features without streams

In some programs, it may be useful to compute and keep feature vectors in memory without accessing the disk. This is for example the case if you want to embed feature extraction into your own program. Feature streams are not suited to such operations, which should rely instead on feature buffers to store the feature vectors. Feature buffers contain a collection of feature vectors of the same dimension. Almost no accessors are available for the buffer structure spfbuf_t, whose attributes can be referenced directly. The structure definition is as follows:
 
typedef struct {
  unsigned short adim;          /* allocated vector dimension     */
  unsigned short dim;           /* actual vector dimension        */
  unsigned long n;              /* number of vectors              */
  unsigned long m;              /* maximum number of vectors      */
  spf_t *s;                     /* pointer to features            */
} spfbuf_t;
Note that the allocated dimension may differ from the actual dimension of the features stored in the buffer. In particular, this is useful for feature conversions. See section 4.5 Feature conversion. The attribute m is the maximum number of vectors of dimension adim that can be stored in the buffer. Feature vectors are stored concatenated in the feature array s. Scanning the buffer vectors using adim is illustrated in an example below.
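The layout can be illustrated with a standalone sketch mirroring the structure above. demo_spfbuf_t and demo_buf_vec are hypothetical names, and spf_t is assumed to be float here; the point is only the index arithmetic: vector i starts i * adim elements into s, whatever the actual dimension dim is.

```c
#include <stddef.h>

typedef float demo_spf_t;  /* assumption: spf_t is float */

/* Hypothetical mirror of spfbuf_t, same members as above. */
typedef struct {
  unsigned short adim;  /* allocated vector dimension */
  unsigned short dim;   /* actual vector dimension    */
  unsigned long n;      /* number of vectors          */
  unsigned long m;      /* maximum number of vectors  */
  demo_spf_t *s;        /* concatenated vectors       */
} demo_spfbuf_t;

/* Vector i starts i * adim elements into s, whatever dim is. */
demo_spf_t *demo_buf_vec(const demo_spfbuf_t *buf, unsigned long i)
{
  return i < buf->n ? buf->s + (size_t)i * buf->adim : NULL;
}
```

With adim = 4 and dim = 3, the fourth element of each vector is simply unused room, available for instance to conversions that increase the dimension.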

4.4.1 Buffer allocation  Allocating memory for a buffer
4.4.2 Accessing buffer elements  Accessing vectors in a buffer
4.4.3 Buffer I/O  Reading and writing buffers to disk
4.4.4 Buffers and streams  Direct access to stream buffers (not recommended)


4.4.1 Buffer allocation

Functions are provided to allocate a buffer of a given size in bytes, resize for a given number of feature vectors and free a buffer.

Function: spfbuf_t * spf_buf_alloc (unsigned short dim, size_t size)
Allocate memory for a buffer of size bytes. The maximum dimension of the elements in the buffer is dim, the maximum number of vectors in the buffer being determined from dim and size. If size is zero, an empty buffer is allocated with the buffer array (buf->s) set to NULL. Return a pointer to the allocated buffer.

Function: spf_t * spf_buf_resize (spfbuf_t *buf, unsigned long n)
Resize buffer buf to contain exactly n vectors. The buffer array is extended (resp. reduced) if n is more (resp. less) than the current buffer size. In both cases, the current content of the buffer is left unchanged. If the current buffer is empty (size is 0 and array is NULL), the buffer array is allocated. This function can therefore be used to allocate a buffer for a given number of vectors rather than for a given size in bytes as in spf_buf_alloc. The following code is an example for allocating a buffer of 1000 feature vectors of dimension 33 using spf_buf_resize.
 
spfbuf_t *buf = spf_buf_alloc(33, 0); /* alloc. empty buffer  */
spf_buf_resize(buf, 1000);            /* resize for 1000 vectors */
Return the address of the first element of the buffer array. Note that the attribute buf->s may be changed in spf_buf_resize.

Function: void spf_buf_free (spfbuf_t *buf)
Free memory allocated to buf.
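The byte-size and vector-count views of allocation can be related with the following sketch, which is not the SPro code. demo_capacity assumes that m is simply the number of adim-sized vectors fitting in size bytes, and demo_buf_resize relies on realloc, which keeps the existing content and allocates when the array is NULL, matching the behavior documented for spf_buf_resize. All names are hypothetical and spf_t is assumed to be float.

```c
#include <stddef.h>
#include <stdlib.h>

typedef float demo_spf_t;  /* assumption: spf_t is float */

typedef struct {
  unsigned short adim, dim;
  unsigned long n, m;
  demo_spf_t *s;
} demo_spfbuf_t;

/* Assumed rule: number of adim-sized vectors that fit in size bytes. */
unsigned long demo_capacity(unsigned short adim, size_t size)
{
  return (unsigned long)(size / (adim * sizeof(demo_spf_t)));
}

/* Resize to exactly n vectors; existing content is kept (truncated when
   shrinking). realloc also covers the empty case where s is NULL. */
demo_spf_t *demo_buf_resize(demo_spfbuf_t *buf, unsigned long n)
{
  if (n == 0) {
    free(buf->s);
    buf->s = NULL;
    buf->n = buf->m = 0;
    return NULL;
  }
  demo_spf_t *p = realloc(buf->s, (size_t)n * buf->adim * sizeof *p);
  if (p == NULL)
    return NULL;  /* old array left untouched on failure */
  buf->s = p;
  buf->m = n;
  if (buf->n > n)
    buf->n = n;
  return p;
}
```

As with spf_buf_resize, the returned pointer must be used instead of any previously saved copy of buf->s, since realloc may move the array.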


4.4.2 Accessing buffer elements

The best way to reach a particular vector in a buffer is to grab a pointer to the vector using get_spf_buf_vec. In addition, the function spf_buf_append can be used to append feature vectors to a buffer, possibly extending the buffer size if necessary.

Function: spf_t * get_spf_buf_vec (spfbuf_t *buf, unsigned long index)
Return a pointer to vector index in buf. Unlike positions in feature streams, the frame index index here is relative to the buffer, starting at 0. Return NULL if index is out of bounds.

Function: spf_t * spf_buf_append (spfbuf_t *buf, spf_t *v, unsigned short dim, unsigned long nmore)
Append feature vector v of dimension dim to buffer buf. If the buffer is full and nmore is non-zero, the buffer maximum size is extended by nmore vectors. Otherwise, if nmore is zero, the buffer is left unchanged and NULL is returned. If the buffer is not empty, the input vector dimension dim is checked against the buffer dimension; otherwise, dim is used to initialize the buffer dimension. In any case, dim must be less than or equal to the maximum dimension (buf->adim) for which the buffer has been allocated. Return a pointer to the appended vector in the buffer or NULL in case of error.
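The append rules above can be sketched on a hypothetical mirror of the buffer structure (demo_* names, spf_t assumed to be float). Growth by nmore vectors uses realloc; the exact error handling of spf_buf_append may differ.

```c
#include <stdlib.h>
#include <string.h>

typedef float demo_spf_t;  /* assumption: spf_t is float */

typedef struct {
  unsigned short adim, dim;
  unsigned long n, m;
  demo_spf_t *s;
} demo_spfbuf_t;

/* Append v (dim coefficients) following the rules documented above. */
demo_spf_t *demo_buf_append(demo_spfbuf_t *buf, const demo_spf_t *v,
                            unsigned short dim, unsigned long nmore)
{
  if (dim > buf->adim)
    return NULL;                 /* never exceed the allocated dimension */
  if (buf->n > 0 && dim != buf->dim)
    return NULL;                 /* non-empty: dimensions must match */
  if (buf->n == buf->m) {        /* full: grow by nmore or give up */
    if (nmore == 0)
      return NULL;
    demo_spf_t *p = realloc(buf->s, (size_t)(buf->m + nmore) * buf->adim * sizeof(demo_spf_t));
    if (p == NULL)
      return NULL;
    buf->s = p;
    buf->m += nmore;
  }
  if (buf->n == 0)
    buf->dim = dim;              /* empty: the first vector sets the dimension */
  demo_spf_t *dst = buf->s + (size_t)buf->n * buf->adim;
  memcpy(dst, v, dim * sizeof(demo_spf_t));
  buf->n++;
  return dst;
}
```

Passing a non-zero nmore amortizes reallocation: the buffer grows by several vectors at once instead of one at a time.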

Accessing the buffer elements via get_spf_buf_vec implies a multiplication. Scanning all the vectors in the buffer may be faster using a pointer to the buffer array which is incremented at each step. The following example illustrates this method and prints the feature vectors to stdout in text format.
 
unsigned long i;
unsigned short j;
spf_t *p;

p = buf->s;

for (i = 0; i < spf_buf_length(buf); i++) {

  /* print vector at index i */
  fprintf(stdout, "index %lu", i); 
  for (j = 0; j < spf_buf_dim(buf); j++)
    fprintf(stdout, " %8.4f", *(p+j));
  fprintf(stdout, "\n");

  /* move to next vector */
  p += buf->adim;
}
Note that the pointer increment is the allocated dimension adim, not the actual dimension dim. This example also illustrates the use of the two accessor macros spf_buf_length and spf_buf_dim, which return the actual number of elements in the buffer and the actual feature vector dimension respectively.


4.4.3 Buffer I/O

If you need the following functions to read or write the content of a buffer to disk, you should be wondering why you are not using feature streams for I/O! Feature buffers are provided to store features in memory, not for I/O, for which feature streams, dedicated to this purpose, should always be preferred. Still want to use buffers for I/O?

Ok, but don't say you have not been warned! In case you insist on buffer I/O, the two functions spf_buf_read and spf_buf_write are provided respectively to read the buffer content from disk or to write the buffer content to disk.

Function: unsigned long spf_buf_read (spfbuf_t *buf, FILE *f)
Read data from file f into the buffer, until the buffer maximum size is reached or until the end of file, whichever occurs first. The vector dimension is taken from the buffer actual dimension given by buf->dim. Return the number of vectors read into the buffer.

Function: unsigned long spf_buf_write (spfbuf_t *buf, FILE *f)
Write the content of buf to file f. Return the number of vectors actually written to file.


4.4.4 Buffers and streams

In feature streams, I/O functions clearly make use of a feature buffer. Directly accessing the elements of the stream buffer using the buffer functions described above is therefore possible. A pointer to the stream buffer can be obtained using spf_stream_buf.

Macro: spfbuf_t * spf_stream_buf (spfstream_t *f)
Return a pointer to the buffer of stream f.

Unless you are quite familiar with SPro programming, direct access to stream buffers is strongly discouraged since direct buffer I/O may corrupt the stream position information. The main consequence of corrupted stream position information is that spf_stream_seek and spf_stream_tell will not work properly. Rather than directly accessing the stream buffer, using spf_stream_seek and get_next_spf_vec to access a particular vector should always be preferred.


4.5 Feature conversion

Feature conversion is the process of modifying the feature description, for example by normalizing the feature mean and variance or by adding dynamic features. In other words, converting features consists in modifying the input features to match a specified target feature description. See section 4.2 Feature description flags.

Changing the feature type, e.g. converting filter-bank features to cepstral coefficients, is not considered a feature conversion and is outside the scope of the functions described in this section. See section 4.7 LPC-based functions, for details about changing the feature type between various LPC representations. See section 4.6 FFT-based functions, for details about changing the filter-bank representation.

Feature conversions are global operations in the sense that the conversion applies to a collection of feature vectors rather than to isolated feature vectors. Therefore, the conversion function, spf_buf_convert, operates on a feature buffer, modifying all the buffer vectors at once and returning a buffer (possibly the same -- see below) containing the new features. The conversion itself proceeds as follows:

  1. copy static features into the output buffer, possibly excluding energy if required.

  2. normalize mean and variance of the static features in the output buffer (energy, if present, is not normalized) if required

  3. compute delta features for the output buffer if required

  4. compute acceleration features for the output buffer if required
Since conversion principally aims at normalizing the features and adding dynamic features, the latter are always recomputed from the static features, even if the input feature vectors already contain dynamic features. This means that, for example, when converting features with a description flag value of WITHE|WITHD to WITHE|WITHD|WITHN, delta features will be recomputed, even though this is not strictly necessary(13)!
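For readers unfamiliar with dynamic features, the following sketch computes delta coefficients with one common definition: a linear regression over plus or minus K frames, with frame indices clamped at the buffer edges. This manual does not state the formula or window used by SPro, so demo_delta is a hypothetical illustration, not the library's method.

```c
#include <stddef.h>

/* Delta coefficients by linear regression over +/-K frames (K >= 1):
   d_t = sum_k k (x_{t+k} - x_{t-k}) / (2 sum_k k^2), frame indices clamped
   to [0, n-1]. x and d hold n frames of dim coefficients, stride dim. */
void demo_delta(const float *x, float *d, size_t n, size_t dim, size_t K)
{
  double norm = 0.0;
  for (size_t k = 1; k <= K; k++)
    norm += 2.0 * (double)(k * k);
  for (size_t t = 0; t < n; t++)
    for (size_t j = 0; j < dim; j++) {
      double acc = 0.0;
      for (size_t k = 1; k <= K; k++) {
        size_t fwd = t + k < n ? t + k : n - 1;  /* clamp at the end   */
        size_t bwd = t >= k ? t - k : 0;         /* clamp at the start */
        acc += (double)k * ((double)x[fwd * dim + j] - (double)x[bwd * dim + j]);
      }
      d[t * dim + j] = (float)(acc / norm);
    }
}
```

On a linear ramp, interior deltas equal the slope; edge frames give smaller values because of the clamping. Applying the same function to the delta coefficients is one common way to obtain acceleration features, again under this sketch's assumptions.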

Conversion can operate in three different modes, namely duplicate, replace and update. In duplicate mode, spf_buf_convert allocates the output buffer and leaves the input buffer unchanged. This mode can be used to duplicate a buffer, hence the name. In replace mode, spf_buf_convert allocates the output buffer and releases the memory allocated for the input buffer, effectively replacing the input buffer with the output one. Note that due to reallocation, the buffer address may have changed after the call to spf_buf_convert. In replace mode, calls to the conversion function should therefore always look like
 
buf = spf_buf_convert(buf, SPRO_EMPTY_FLAG, WITHD, 0, 
                      SPRO_CONV_REPLACE);
for the caller to take into account the new address of buf. Finally, in update mode, the output buffer is the same as the input one and the conversion is done in place. For this, the buffer's maximum dimension must be at least equal to the maximum of the input and output dimensions; otherwise, update conversion is impossible and an error is returned. In all three modes, spf_buf_convert returns a pointer to the output buffer.
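
The practical difference between the three modes comes down to pointer ownership. The sketch below mimics it with a hypothetical toy buffer type (demo_buf_t) and a placeholder conversion that only changes the vector dimension; none of these names belong to SPro, whose buffer layout is deliberately not documented here. It illustrates why replace-mode calls must reassign buf and why update mode fails when the maximum dimension is too small.

```c
#include <stdlib.h>

/* Hypothetical stand-in for spfbuf_t: n frames stored with a stride of
   maxdim, of which only the first dim values are in use. */
typedef struct {
  long n;
  unsigned short dim, maxdim;
  float *data;
} demo_buf_t;

enum { DEMO_DUPLICATE, DEMO_REPLACE, DEMO_UPDATE };

demo_buf_t *demo_buf_alloc(long n, unsigned short maxdim)
{
  demo_buf_t *b = malloc(sizeof *b);
  if (!b) return NULL;
  b->data = calloc((size_t)n * maxdim, sizeof *b->data);
  if (!b->data) { free(b); return NULL; }
  b->n = n;
  b->dim = b->maxdim = maxdim;
  return b;
}

void demo_buf_free(demo_buf_t *b)
{
  if (b) { free(b->data); free(b); }
}

/* Placeholder "conversion" to odim features: existing values are kept and
   new bins are zero-filled. Only the memory handling mirrors the modes. */
demo_buf_t *demo_convert(demo_buf_t *in, unsigned short odim, int mode)
{
  if (mode == DEMO_UPDATE) {
    unsigned short need = in->dim > odim ? in->dim : odim;
    if (in->maxdim < need) return NULL;          /* in-place impossible */
    for (long t = 0; t < in->n; t++)
      for (unsigned short j = in->dim; j < odim; j++)
        in->data[t * in->maxdim + j] = 0.0f;
    in->dim = odim;
    return in;                                   /* same address as input */
  }
  demo_buf_t *out = demo_buf_alloc(in->n, odim);
  if (!out) return NULL;
  for (long t = 0; t < in->n; t++)
    for (unsigned short j = 0; j < odim; j++)
      out->data[t * odim + j] = j < in->dim ? in->data[t * in->maxdim + j] : 0.0f;
  if (mode == DEMO_REPLACE)
    demo_buf_free(in);                           /* old pointer now dangles */
  return out;
}
```

As with the replace-mode call of spf_buf_convert, forgetting to reassign the result of a DEMO_REPLACE call would leave the caller holding a freed pointer.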

Function: spfbuf_t * spf_buf_convert (spfbuf_t *buf, long iflag, long oflag, unsigned long wl, int mode)
Convert feature vectors in buf from the iflag description to oflag. The normalization window length wl specifies the length for segmental normalization. If zero, global normalization is performed; otherwise, a sliding window of wl frames centered on the current frame is used. The mode is one of SPRO_CONV_DUPLICATE, SPRO_CONV_REPLACE or SPRO_CONV_UPDATE. Return a pointer to the buffer containing the converted data.

In addition to spf_buf_convert, the function spf_buf_normalize can be used to normalize the mean and variance of the features in a buffer. Similarly, the function spf_add_delta can be used to compute the derivatives of some features in a buffer. Both are generic functions which should be used solely for non-standard operations. For example, normalizing the dynamic features or the energy variance is not possible with spf_buf_convert but is possible with spf_buf_normalize. Though not strictly a conversion function, scale_energy is a generic function used to scale the energy coefficients in a buffer.

Function: int spf_buf_normalize (spfbuf_t *buf, unsigned short s, unsigned short e, unsigned long wl, int vnorm)
Normalize features s to e (inclusive) in buf, where s and e are indices in the feature vectors, starting at 0. If vnorm is non-zero, variance normalization is performed in addition to mean subtraction. The normalization window length wl specifies the length for segmental normalization. If zero, global normalization is performed; otherwise, a sliding window of wl frames centered on the current frame is used. Return 0 upon success or an error code otherwise.

Function: int spf_add_delta (spfbuf_t *buf, unsigned short s, unsigned short e, unsigned short d)
Compute the derivatives of features s to e (inclusive) in buf, writing the derivatives starting at index d in the feature vector. Indices s, e and d refer to bins in the feature vectors, starting at 0. The buffer must have been allocated with sufficient memory to store the derivatives; otherwise, an error is returned. Return 0 upon success.
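
A common way to compute such derivatives is the regression formula over K neighboring frames on each side, d(t) = sum_{k=1..K} k (c(t+k) - c(t-k)) / (2 sum_{k=1..K} k^2). The sketch below applies it with K = 2 and replicated edge frames. Both choices, the row-major array layout and the name demo_add_delta are assumptions, since this manual does not state the formula SPro uses.

```c
#define DEMO_DELTA_K 2   /* hypothetical regression half-window */

/* Hypothetical sketch: derivatives of bins s..e of a row-major nframes x dim
   array, written to bins d..d+(e-s). Returns 0 on success, 1 if the
   destination overlaps the source or does not fit in dim. */
int demo_add_delta(float *buf, long nframes, unsigned short dim,
                   unsigned short s, unsigned short e, unsigned short d)
{
  if (e < s || d <= e || (long)d + (e - s) >= dim) return 1;
  double norm = 0.0;
  for (int k = 1; k <= DEMO_DELTA_K; k++) norm += 2.0 * k * k;
  for (long t = 0; t < nframes; t++)
    for (int j = s; j <= e; j++) {
      double acc = 0.0;
      for (int k = 1; k <= DEMO_DELTA_K; k++) {
        long tp = t + k < nframes ? t + k : nframes - 1;   /* replicate edges */
        long tm = t - k >= 0 ? t - k : 0;
        acc += k * (buf[tp * dim + j] - buf[tm * dim + j]);
      }
      buf[t * dim + d + (j - s)] = (float)(acc / norm);
    }
  return 0;
}
```

On a linear ramp with slope 1, this formula returns exactly 1 for every frame far enough from the edges, which is a convenient sanity check.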

Function: int scale_energy (spfbuf_t *buf, unsigned short j, float s, unsigned long wl)
Scale the feature at bin j in buf by the factor s. This function is intended for log-energy scaling and scales with respect to the maximum value. If wl is non-zero, segmental scaling using a sliding window of wl frames is performed. Return 0 upon success.



4.6 FFT-based functions

This section documents all the functions related to Fourier analysis of speech signals.

4.6.1 Fourier transform  Fast Fourier transform of a signal
4.6.2 Filter-bank  Filter-bank integration
4.6.3 Cosine transform  Discrete Cosine Transform



4.6.1 Fourier transform

SPro implements a fast Fourier transform (FFT) algorithm as described in P. Duhamel and M. Vetterli, Improved Fourier and Hartley Transform Algorithms: Application to Cyclic Convolution of Real Data, IEEE Trans. on ASSP, 35(6), June 1987. For speed, the implementation relies on a pre-computed FFT kernel, which is initialized by fft_init. The FFT kernel must be initialized for a given FFT size before the first invocation of fft. In particular, this implies that the kernel must be reinitialized whenever the FFT size changes. Memory allocated to the kernel is released using fft_reset.

Function: int fft_init (unsigned long n)
Initialize the FFT kernel for length n. If n is zero, reset the kernel. Otherwise, (re)allocate a kernel for the specified length: if the kernel had previously been allocated with a different size and not reset, it is reallocated. Return 0 upon success.

Function: int fft (spsig_t *s, float *mod, float *phi)
Fourier transform of signal s using the current kernel. If the length of s is less than the kernel size, s is padded with zeros. Conversely, if the length of s is greater than the kernel size, s is truncated; note that no warning is issued in this case. Return the modulus in mod and the phase in phi. Both mod and phi must have been allocated to contain at least N/2 elements, where N is the kernel size. Either one can be NULL, in which case the corresponding values are not returned. Return 0 upon success.

Macro: int fft_reset ()
Release the memory allocated to the FFT kernel. This is a macro for fft_init(0), which always returns 0.



4.6.2 Filter-bank

Filter-bank analysis is a two-step process. The first step consists of defining the filter-bank geometry, either with set_mel_idx or with set_alpha_idx. Both functions compute, according to the specified frequency warping, the indices in the FFT magnitude vector that correspond to the filters' cutoff frequencies. The second step, embedded in the function log_filter_bank, consists of the Fourier transform and the filter-bank integration. Using log_filter_bank requires that the FFT kernel has been initialized beforehand.

Function: unsigned short * set_mel_idx (unsigned short n, float fmin, float fmax, float Fs)
Set the cutoff frequency indices for n filters in the band from fmin to fmax, according to the MEL frequency warping. The lower and upper frequency bounds, fmin and fmax, are normalized frequencies between 0 and 0.5. If fmax is lower than or equal to fmin, the upper bound is taken to be the Nyquist frequency (1/2). The signal sample rate Fs is given in Hz. Return a vector of n+2 indices, or NULL in case of error.

Function: unsigned short * set_alpha_idx (unsigned short n, float a, float fmin, float fmax)
Set the cutoff frequency indices for n filters in the band from fmin to fmax, according to the bilinear frequency warping specified by a. If a is zero, no frequency warping is used. The lower and upper frequency bounds, fmin and fmax, are normalized frequencies between 0 and 0.5. If fmax is lower than or equal to fmin, the upper bound is taken to be the Nyquist frequency (1/2). Return a vector of n+2 indices, or NULL in case of error.

Function: int log_filter_bank (spsig_t *s, unsigned short n, unsigned short *idx, spf_t *e)
Apply an n-channel triangular filter-bank to signal s. The indices in the FFT magnitude vector of the channels' cutoff frequencies are given in idx, which should have been initialized with one of the set_xxx_idx functions above. Channel log-magnitudes are returned in vector e, which must have been previously allocated to contain at least n elements. Return 0 upon success.



4.6.3 Cosine transform

As with the Fourier transform, the discrete cosine transform (DCT) is a kernel-based transformation. A DCT kernel for a given size is initialized using dct_init, while the transformation itself is carried out by dct. The macro dct_reset resets the kernel.

Function: int dct_init (unsigned short n, unsigned short m)
Initialize the DCT kernel for a transformation from dimension n to dimension m. If either n or m is zero, reset the kernel. Otherwise, (re)initialize a kernel for the specified transformation. Return 0 upon success.

Function: int dct (spf_t *x, spf_t *y)
Apply the transformation to x, storing the result in y. Assuming the kernel was initialized with dimensions n and m, x should contain at least n elements and y must have been previously allocated to contain at least m elements. Return 0 upon success.

Macro: int dct_reset ()
Release the memory allocated to the DCT kernel. This is a macro for dct_init(0, 0), which always returns 0.



4.7 LPC-based functions

This section documents functions related to LPC analysis of speech signals. The first part documents how to solve the LPC equations while the second one deals with transforming the LPC or PARCOR representation into a different one.

4.7.1 Linear prediction  Computing linear prediction coefficients
4.7.2 LPC conversion  Conversions between LPC, LAR, PARCOR and LSF



4.7.1 Linear prediction

Linear prediction is a two-step process: the first step computes the generalized correlation sequence (sig_correl), and the second solves the normal equations with lpc to obtain the prediction and reflection coefficients.

Function: int sig_correl (spsig_t *s, float a, float *r, unsigned short p)
Compute the generalized correlation sequence of s according to the warping specified by a. If a is zero, the standard autocorrelation is used. The correlation sequence, of length p+1, is returned in the previously allocated vector r. Return 0 upon success.

Function: void lpc (float *r, unsigned short p, spf_t *a, spf_t *k, float *e)
Compute p prediction and reflection coefficients given the correlation sequence r(0) to r(p). Return the prediction coefficients in a, the reflection coefficients in k and the LPC filter gain in e. Both a and k must have been previously allocated to contain at least p elements while e is a pointer to a float scalar.
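
For the unwarped case (a equal to zero), the two steps reduce to the usual autocorrelation followed by the Levinson-Durbin recursion, sketched below. The sketch assumes the predictor convention x(t) ~ sum_{i=1..p} a_i x(t-i), stores a_1..a_p in a[0..p-1], and returns the final prediction error energy in e. SPro's sign conventions and its exact definition of the filter gain may differ, and the demo_ names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical sketch: autocorrelation r[0..p] of x[0..n-1]
   (the a == 0 case of sig_correl). */
void demo_autocorr(const float *x, size_t n, float *r, unsigned short p)
{
  for (unsigned int k = 0; k <= p; k++) {
    double acc = 0.0;
    for (size_t i = 0; i + k < n; i++)
      acc += (double)x[i] * x[i + k];
    r[k] = (float)acc;
  }
}

/* Levinson-Durbin recursion: predictor a[0..p-1], reflection k[0..p-1],
   final prediction error energy *e. If r[0] <= 0, coefficients stay zero. */
void demo_lpc(const float *r, unsigned short p, float *a, float *k, float *e)
{
  double err = r[0];
  for (unsigned int i = 0; i < p; i++) a[i] = k[i] = 0.0f;
  for (unsigned int i = 1; i <= p && err > 0.0; i++) {
    double acc = r[i];
    for (unsigned int j = 1; j < i; j++)
      acc -= (double)a[j - 1] * r[i - j];
    double ki = acc / err;
    /* update a_j and a_{i-j} in pairs so order-(i-1) values are read first */
    for (unsigned int j = 1; j <= i / 2; j++) {
      double aj = a[j - 1], aij = a[i - j - 1];
      a[j - 1] = (float)(aj - ki * aij);
      a[i - j - 1] = (float)(aij - ki * aj);
    }
    a[i - 1] = (float)ki;
    k[i - 1] = (float)ki;
    err *= 1.0 - ki * ki;
  }
  *e = (float)err;
}
```

For the correlation sequence of a first-order process with coefficient 0.5, i.e. r = (1, 0.5, 0.25), the recursion yields a = (0.5, 0) and an error energy of 0.75.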



4.7.2 LPC conversion

Linear prediction coefficients can be converted into line spectrum frequencies (lpc_to_lsf) and into LP-derived cepstral coefficients (lpc_to_cep). Reflection coefficients are converted into log-area ratios using refc_to_lar.

Function: int lpc_to_lsf (spf_t *a, unsigned short p, spf_t *lsf)
Convert p linear prediction coefficients a into line spectrum frequencies. lsf must have been previously allocated to contain at least p elements. Return 0 upon success.

Function: void lpc_to_cep (spf_t *a, unsigned short p, unsigned short n, spf_t *c)
Convert p linear prediction coefficients a into n cepstral coefficients c. c must have been previously allocated to contain at least n elements.
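
The conversion is usually done with the recursion c_m = a_m + sum_{k=1..m-1} (k/m) c_k a_{m-k}, where a_m is taken as zero for m > p. The sketch below implements it for the predictor convention x(t) ~ sum_i a_i x(t-i) (an assumption about SPro's sign convention) and returns c_1..c_n without c_0; demo_lpc_to_cep is a hypothetical name. For a single coefficient a_1 = r, the result is c_m = r^m / m, which makes a quick sanity check.

```c
/* Hypothetical sketch: c[0..n-1] receives c_1..c_n from a[0..p-1] = a_1..a_p. */
void demo_lpc_to_cep(const float *a, unsigned short p, unsigned short n, float *c)
{
  for (int m = 1; m <= n; m++) {
    double acc = (m <= p) ? a[m - 1] : 0.0;     /* a_m is zero beyond order p */
    int kmin = (m > p) ? m - p : 1;             /* keep a_{m-k} within 1..p */
    for (int k = kmin; k < m; k++)
      acc += (double)k / m * c[k - 1] * a[m - k - 1];
    c[m - 1] = (float)acc;
  }
}
```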

Function: void refc_to_lar (spf_t *k, unsigned short p, spf_t *g)
Convert p reflection coefficients k into p log area ratios g. g must have been previously allocated to contain at least p elements.



4.8 Miscellaneous functions

This section documents several useful utility functions. The two functions spf_indexes and spf_tot_dim are dedicated to manipulating the content of a feature vector. A feature vector contains various elements, characterized by the description flag. Given the description flag, spf_indexes finds the indices of the various elements in a feature vector, while spf_tot_dim computes the total dimension of a feature vector from the dimension of the static coefficients and the description flag.

The function set_lifter is a utility function that allocates memory for a lifter vector and initializes the vector according to the lifter parameter.

Function: void spf_indexes (unsigned short idx[9], unsigned short dim, long flag)
Set in idx the indices of each element of a feature vector of total dimension dim with description flag flag. idx is a nine-element vector containing indices in the feature vector, organized as follows:
 
<    static     ><E><      delta    ><dE>< delta delta  ><ddE>
|   |  ...  |   |   |   |  ...  |   |   |   |  ...  |   |   |
  ^           ^   ^   ^           ^   ^   ^           ^   ^
  |           |   |   |           |   |   |           |   |
idx[0]      idx[1]| idx[3]      idx[4]| idx[6]      idx[7]|
                  |                   |                   |
                idx[2]              idx[5]              idx[8]
For example, the index of the energy feature in the feature vector is idx[2], while the index of the first delta feature is given by idx[3]. With the exception of idx[0], which is always equal to 0, an index value of 0 means that the element is not present in the feature vector. For example, a call to
 
spf_indexes(idx, 25, WITHE | WITHD | WITHN)
would return the following index vector
 
idx = { 0, 11, 0, 12, 23, 24, 0, 0, 0 }
Assuming p is a pointer to a feature vector, the 12 static features range from p[0] to p[11], no static log-energy is present (WITHN), the delta features range from p[12] to p[23] and the delta log-energy can be accessed at p[24].
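
The layout above follows from simple arithmetic on the flag: each block (static, delta, acceleration) holds the static coefficients plus one energy term, and WITHN removes the static energy only. The sketch below reconstructs the index vector from that reasoning. The flag bit values and the name demo_spf_indexes are hypothetical (the real WITHE, WITHD, WITHN and acceleration constants come from spro.h), and it assumes WITHN is only meaningful together with WITHE.

```c
/* Hypothetical flag bits for illustration only; SPro defines its own values. */
#define DEMO_WITHE 0x01L   /* log-energy present */
#define DEMO_WITHD 0x02L   /* delta features */
#define DEMO_WITHA 0x04L   /* acceleration (delta delta) features */
#define DEMO_WITHN 0x08L   /* static log-energy suppressed */

void demo_spf_indexes(unsigned short idx[9], unsigned short dim, long flag)
{
  int E = (flag & DEMO_WITHE) != 0;
  int N = E && (flag & DEMO_WITHN);      /* WITHN only makes sense with WITHE */
  int D = (flag & DEMO_WITHD) != 0;
  int A = (flag & DEMO_WITHA) != 0;
  int w = (dim + N) / (1 + D + A);       /* block width: statics + energy */
  int sdim = w - E;                      /* number of static coefficients */
  int pos = w - N;                       /* first index after the static block */

  for (int i = 0; i < 9; i++) idx[i] = 0;
  idx[1] = (unsigned short)(sdim - 1);
  if (E && !N) idx[2] = (unsigned short)sdim;
  if (D) {
    idx[3] = (unsigned short)pos;
    idx[4] = (unsigned short)(pos + sdim - 1);
    if (E) idx[5] = (unsigned short)(pos + sdim);
    pos += w;
  }
  if (A) {
    idx[6] = (unsigned short)pos;
    idx[7] = (unsigned short)(pos + sdim - 1);
    if (E) idx[8] = (unsigned short)(pos + sdim);
  }
}
```

Called with dim equal to 25 and WITHE|WITHD|WITHN, this sketch reproduces the index vector of the example above.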

Function: unsigned short spf_tot_dim (unsigned short sdim, long flag)
Return the feature vector total dimension given the dimension of the static coefficients sdim (excluding energy) and the feature description flag.
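
Under the same layout, the total dimension is (sdim + E)(1 + D + A) - N, where E, D, A and N are 1 when the energy, delta, acceleration and no-static-energy flags are set and 0 otherwise. The sketch below uses hypothetical flag values and the hypothetical name demo_spf_tot_dim; it gives the dimension 25 of the spf_indexes example above for 12 static coefficients with WITHE|WITHD|WITHN.

```c
/* Hypothetical flag bits for illustration only; SPro defines its own values. */
#define DEMO_WITHE 0x01L
#define DEMO_WITHD 0x02L
#define DEMO_WITHA 0x04L
#define DEMO_WITHN 0x08L

unsigned short demo_spf_tot_dim(unsigned short sdim, long flag)
{
  int E = (flag & DEMO_WITHE) != 0;
  int N = E && (flag & DEMO_WITHN);      /* drops the static energy only */
  int D = (flag & DEMO_WITHD) != 0;
  int A = (flag & DEMO_WITHA) != 0;
  return (unsigned short)((sdim + E) * (1 + D + A) - N);
}
```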

Function: float * set_lifter (int l, unsigned short n)
Return a pointer to a newly allocated vector containing n coefficients for a lifter of parameter l.



This document was generated by Guillaume Gravier on March 5, 2004 using texi2html