NAME

vkey - voice key for detecting speech

SYNOPSIS

#include <vkey_u.h>

vkset(par)
struct vkpar *par;

vkbeg()

vkstat(stat)
struct vkinfo *stat;

DESCRIPTION

The voice key monitors an audio signal for the onset and offset of speech. For each 10 millisecond window of the audio signal, the voice key hardware computes measures of low frequency energy and high frequency energy, which are input to an analog-to-digital converter (A/D). The hardware also generates an interrupt that signals the software to read the energy measures from the A/D. These measures are input to an algorithm for detecting the onset and offset of speech.

The voice key is controlled using the routines vkset to initialize parameters, vkbeg to start the operation, and vkstat to obtain statistics after the operation has completed.

vkset
vkset must be called once before calling either of the other voice key routines. Once it has been called it is not necessary to call it again unless some of the parameters it sets must be changed. vkset is called with the argument par, a pointer to a structure defined in vkey_u.h:

struct vkpar {
  unsigned   stoptime;
  unsigned   checktime;
  int        channel;
  int        l_thresh;
  int        h_thresh;
  int        l_rundef;
  int        h_rundef;
  unsigned   *begspeech;
  unsigned   *endspeech;
  int        *l_savbuf;
  int        *h_savbuf;
  int        sbufsize;
};
par->stoptime specifies the clock time at which to stop monitoring the audio signal if the beginning of speech has not yet been detected. If the beginning of speech has been detected, the voice key will continue operating for a specified time after the last speech sample is found. That time is given in par->checktime. The voice key uses the programmable clock to time these intervals.
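
The stopping rule described above can be sketched as a single decision function. This is only one reading of the text, not the driver's actual code: the function name vk_should_stop and its arguments now, onset_seen and last_hit are hypothetical, and all times are assumed to be in the same clock units as par->stoptime and par->checktime.

```c
#include <assert.h>

/* Hypothetical sketch of the stopping rule; not part of the vkey library.
 * now        current clock time
 * stoptime   absolute time limit used while no speech onset has been seen
 * checktime  interval to keep running after the last speech sample
 * onset_seen nonzero once the beginning of speech has been detected
 * last_hit   clock time of the most recent speech sample (used only if onset_seen)
 * Returns 1 if monitoring should stop, 0 if it should continue. */
int vk_should_stop(unsigned now, unsigned stoptime, unsigned checktime,
                   int onset_seen, unsigned last_hit)
{
    if (!onset_seen)
        return now >= stoptime;          /* give up: no speech before stoptime */

    assert(now >= last_hit);             /* the clock is assumed not to wrap */
    return now - last_hit >= checktime;  /* silence has lasted long enough */
}
```

With stoptime 1000 and checktime 100, a call with no onset returns 0 at time 999 and 1 at time 1000; once speech has been seen, the function returns 1 only when 100 clock units have passed since the last speech sample.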

par->channel is the A/D channel to which the low energy measure from the voice key is connected. The high energy measure is input to par->channel+1.

The speech detection algorithm operates on the low and high energy measures independently. For the low energy measure, a sample that is greater than or equal to par->l_thresh is classified as a hit. The onset of speech occurs at the first sample in a consecutive run of par->l_rundef or more hits. The same algorithm is used for high energy using the parameters par->h_thresh and par->h_rundef.
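
The onset rule can be illustrated with a short routine that scans a buffer of energy samples. This is a minimal sketch under the assumption that one sample corresponds to one window; the function find_onset is hypothetical and is not part of the vkey library, and it returns -1 (rather than VK_NOBEG) when no qualifying run is found.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper; not part of the vkey library.
 * Returns the index of the first sample of the first run of at least
 * `rundef` consecutive hits (sample >= thresh), or -1 if there is none. */
long find_onset(const int *samples, size_t n, int thresh, int rundef)
{
    size_t i;
    int run = 0;                 /* length of the current run of hits */

    assert(rundef > 0);
    for (i = 0; i < n; i++) {
        if (samples[i] >= thresh) {
            run++;
            if (run >= rundef)   /* run is long enough: report its first sample */
                return (long)(i + 1 - (size_t)rundef);
        } else {
            run = 0;             /* a non-hit breaks the run */
        }
    }
    return -1;
}
```

An isolated hit that is followed by a non-hit does not count as an onset when rundef is greater than 1; only the first sample of a sufficiently long run is reported.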

If speech is detected in either or both energy measures, the earlier onset time and the later offset time are returned in *par->begspeech and *par->endspeech. If speech is not detected in either measure, *par->begspeech is set to VK_NOBEG.
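
The way the two measures are combined can be sketched as follows. The struct span, the function merge_spans and the sentinel NO_TIME are hypothetical names introduced for illustration; the real sentinels VK_NOBEG and VK_NOEND are defined in vkey_u.h and their values are not assumed here.

```c
#include <assert.h>

/* Hypothetical stand-in for "no speech found"; the real sentinel
 * values are defined in vkey_u.h and may differ. */
#define NO_TIME (~0u)

struct span {
    unsigned beg;   /* onset time, or NO_TIME if no speech was detected */
    unsigned end;   /* offset time */
};

/* Combine the results for the low and high energy measures:
 * take the earlier onset and the later offset. */
struct span merge_spans(struct span lo, struct span hi)
{
    struct span out;

    if (lo.beg == NO_TIME)
        return hi;              /* only the high measure (or neither) found speech */
    if (hi.beg == NO_TIME)
        return lo;              /* only the low measure found speech */

    assert(lo.beg <= lo.end && hi.beg <= hi.end);
    out.beg = lo.beg < hi.beg ? lo.beg : hi.beg;
    out.end = lo.end > hi.end ? lo.end : hi.end;
    return out;
}
```

If neither measure detected speech, the result still carries NO_TIME in beg, which corresponds to the VK_NOBEG case described above.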

Low and high energy samples (hits and non-hits) are saved in buffers pointed to by par->l_savbuf and par->h_savbuf. Each buffer must have room for par->sbufsize samples. If par->l_savbuf and par->h_savbuf are NULL, samples are not saved.

vkbeg
After vkset has been called and the clock started (if it is not already running), the voice key is started by invoking vkbeg. It will continue to run either until the end of speech, or until par->stoptime is reached if no speech is found. The calling program can determine when the voice key has finished by checking *par->endspeech. If the value is equal to VK_NOEND, the voice key is still operating.

vkstat
When the voice key has stopped, measures other than the beginning and ending speech times or the raw high and low energy samples are obtained by calling vkstat. The argument stat points to another structure defined in vkey_u.h:

struct vkinfo {
  int l_bwind;
  int h_bwind;
  int l_ewind;
  int h_ewind;
  int l_peak;
  int h_peak;
  int l_avg;
  int h_avg;
};
The window numbers of speech onsets for low and high energies are given by stat->l_bwind and stat->h_bwind (windows are numbered beginning with 0). The window numbers of speech offsets are given by stat->l_ewind and stat->h_ewind. stat->l_peak and stat->h_peak are the maximum low and high energies during speech; stat->l_avg and stat->h_avg are the average low and high energies during speech. Before relying on the temporal precision of the windows, it is suggested that the 100 Hz voice key clock be calibrated, as it has been known to drift.
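
Window numbers can be related to time and to the saved samples as in the sketch below. It assumes that window n starts n * 10 milliseconds after the voice key was started, that sample n in a save buffer belongs to window n, and that the offset window is included in the speech interval; none of these details is confirmed by this page. The names WINDOW_MS, window_start_ms and speech_stats are hypothetical.

```c
#include <assert.h>

#define WINDOW_MS 10    /* window length stated in DESCRIPTION */

/* Nominal start time, in milliseconds, of a window (windows count from 0). */
long window_start_ms(int window)
{
    return (long)window * WINDOW_MS;
}

/* Recompute peak and average energy over windows bwind..ewind (inclusive)
 * from a buffer of saved samples, assuming buf[n] belongs to window n. */
void speech_stats(const int *buf, int bwind, int ewind, int *peak, int *avg)
{
    long sum = 0;
    int i, max;

    assert(bwind >= 0 && ewind >= bwind);
    max = buf[bwind];
    for (i = bwind; i <= ewind; i++) {
        if (buf[i] > max)
            max = buf[i];
        sum += buf[i];
    }
    *peak = max;
    *avg = (int)(sum / (ewind - bwind + 1));   /* integer average */
}
```

Values computed this way from par->l_savbuf or par->h_savbuf could be compared with those returned by vkstat as a consistency check, bearing in mind that the driver's own averaging may differ.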

EXAMPLE

/* test voice key */

#include <vkey_u.h>
#include <clock_u.h>
#include <stdio_p.h>

#define LRUNDEF	4
#define HRUNDEF	4
#define LTHRESH	17
#define HTHRESH	8
#define CHAN	1
#define CHECK	100
#define STOP	1000
#define BUFSZ	1000

static unsigned vbeg, vend;
int lbuf[BUFSZ];
int hbuf[BUFSZ];

main()
{
  struct vkpar v;
  struct vkinfo s;

  v.stoptime = STOP;
  v.checktime = CHECK;
  v.channel = CHAN;
  v.l_thresh = LTHRESH;
  v.h_thresh = HTHRESH;
  v.l_rundef = LRUNDEF;
  v.h_rundef = HRUNDEF;
  v.begspeech = &vbeg;
  v.endspeech = &vend;
  v.l_savbuf = lbuf;
  v.h_savbuf = hbuf;
  v.sbufsize = BUFSZ;

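  /* set voice key parameters */
  if (vkset(&v) == -1)
    errexit(1, "vkset error\n");

  /* start the clock, then the voice key */
  ckreset(CKR_1MS);
  ckstart();
  if (vkbeg() == -1)
    errexit(2, "vkbeg error\n");

  /* wait until the voice key has finished */
  while (vend == VK_NOEND)
    ;

  if (vkstat(&s) == -1)
    errexit(3, "vkstat error\n");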

  printf("begtime = %u\tendtime = %u\n", vbeg, vend);
  printf("lbeg = %d\tlend = %d\tlpeak = %d\tlavg = %d\n",
         s.l_bwind,
         s.l_ewind,
         s.l_peak,
         s.l_avg);
  printf("hbeg = %d\thend = %d\thpeak = %d\thavg = %d\n",
         s.h_bwind,
         s.h_ewind,
         s.h_peak,
         s.h_avg);
}

FILES

/dev/vk0
/dev/ck0
/dev/ad0

SEE ALSO

clock(3U), vkey(4P)