Audio::MFCC - Perl module for computing mel-frequency cepstral coefficients

SYNOPSIS

    use Audio::MFCC;
    my @ceps = $fe->process_utt($rawdata, $nsamps);

DESCRIPTION

This module provides an interface to the Sphinx feature extraction library, which can be used to extract mel-frequency cepstral coefficients from audio data. These coefficients can then be passed to the Speech::Recognizer::SPX::uttproc_cepdata function.

You might find this useful if, for example, you wish to do the actual recognition on a different machine from the audio capture, and don't have the bandwidth to send a full stream of audio data over the network.

Currently, Sphinx also uses delta and double-delta cepstral vectors as input to its vector quantization module, but these values are calculated inside the recognizer's utterance processing module. In the future it may be possible to move the extraction of these features into the feature extraction library, or to use entirely different features as input (for example, LPC coefficients, though currently mel-scale cepstra give the best recognition performance).

INITIALIZATION

    my $fe = Audio::MFCC->init(\%params);

Initializes parameters for feature extraction and returns an object which encapsulates the state of the extraction process. The parameters are passed as a reference to a hash that maps parameter names to parameter values. Available parameters include:

sampling_rate
    Sampling rate at which the audio data to be processed was captured, specified in samples per second.

    Number of frames of data to be processed per second of sampled audio.

    Size of the FFT window, in number of samples.

    Number of cepstral coefficients to compute.

    Number of filters to use for creating the mel-scale.

fft_size
    Frame size for FFT analysis (must be a power of 2).

    Scaling factor for pre-emphasis of input audio data.

    This is documented for completeness, but you should never use it. It specifies the type of filter band to use in extraction - the options are (exportable constants) MEL_SCALE and LOG_LINEAR, but only MEL_SCALE is supported.

OBJECT METHODS

start_utt

    $fe->start_utt or die "start_utt failed";

Prepares the $fe object for cepstral extraction. If it fails (though I don't know why it would), it will return undef.

process_utt

    my @ceps = $fe->process_utt($rawdata, $nsamps);

Performs cepstral extraction on $nsamps samples of audio data from $rawdata. If any data is left over (under one frame), it will be carried over to the next call to process_utt, or analyzed and returned by end_utt.

Note that the audio data is currently always represented as a vector of 16-bit signed integers in native byte order.
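The process_utt documentation says that audio data is a vector of 16-bit signed integers in native byte order. The following is a minimal sketch of how such a buffer might be built in plain Perl. It assumes that $rawdata is expected as a packed byte string, which the documentation does not state explicitly. The 16 kHz rate, the 440 Hz test tone, and the variable names @float and @pcm are illustrative only, and the final call is left as a comment because it needs an initialized $fe object.

```perl
use strict;
use warnings;

# Hypothetical input: 10 ms of a 440 Hz tone at 16 kHz, held as
# floating-point samples in the range [-1, 1].
my $rate  = 16000;
my @float = map { sin(2 * 3.141592653589793 * 440 * $_ / $rate) } 0 .. 159;

# Scale to the 16-bit signed range and clip so that no value wraps around.
my @pcm = map {
    my $v = int($_ * 32767);
    $v >  32767 ?  32767
  : $v < -32768 ? -32768
  :               $v;
} @float;

# 's' is pack's template for a signed 16-bit integer in native byte order.
my $rawdata = pack('s*', @pcm);
my $nsamps  = length($rawdata) / 2;    # two bytes per sample

# Assumption: with an initialized Audio::MFCC object in $fe, the buffer
# would then be handed over like this:
#   $fe->process_utt($rawdata, $nsamps);
```

Deriving $nsamps from the packed length, rather than from the size of the source array, keeps the sample count consistent with the bytes that are actually handed over.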