This paper reports the results obtained by an Automatic Speech Recognition (ASR) system when Mel-frequency cepstral coefficients (MFCCs), J-RASTA Perceptual Linear Prediction coefficients (J-RASTA PLP), and energies from a Multi Resolution Analysis (MRA) tree of filters are used as input features. The recognizer is a hybrid system in which a Neural Network (NN) provides the observation probabilities for a network of Hidden Markov Models (HMMs). The paper also compares the performance of the system for various combinations of these features: combining J-RASTA PLP coefficients with the energies computed at the outputs of the leaves of an MRA filter tree reduces the word error rate (WER) by 20% with respect to J-RASTA PLP coefficients alone. Such a combination is practically feasible thanks to an NN architecture designed to integrate multiple features, exploiting the ability of an NN to mix several input parameters without any assumption about their statistical independence. Recognition is performed on a very large test set that includes many speakers uttering proper names from different locations of the Italian public telephone network.
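The idea of feeding several feature streams into one network whose outputs serve as HMM observation scores can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the class name `MultiSourceNet`, the one-hidden-block-per-stream topology, the feature dimensions (13 and 8), the number of HMM states, the random untrained weights, and the uniform state priors. Dividing posteriors by state priors to obtain scaled likelihoods is the conventional practice in hybrid NN/HMM systems; the abstract does not say whether the authors do this.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def init_layer(n_in, n_out):
    # Small random weights and zero biases (untrained, for illustration only).
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


class MultiSourceNet:
    """Hypothetical two-stream network: one hidden block per feature source,
    merged before a softmax layer over HMM states."""

    def __init__(self, dim_a, dim_b, hidden, n_states):
        self.wa, self.ba = init_layer(dim_a, hidden)
        self.wb, self.bb = init_layer(dim_b, hidden)
        self.wo, self.bo = init_layer(2 * hidden, n_states)

    def posteriors(self, feats_a, feats_b):
        # Each stream gets its own hidden representation ...
        ha = np.tanh(feats_a @ self.wa + self.ba)
        hb = np.tanh(feats_b @ self.wb + self.bb)
        # ... and the merged layer mixes them without assuming the
        # two sources are statistically independent.
        merged = np.concatenate([ha, hb], axis=-1)
        return softmax(merged @ self.wo + self.bo)


def scaled_likelihoods(post, priors):
    # Conventional hybrid NN/HMM step: p(x | state) is proportional to
    # P(state | x) / P(state).
    return post / priors


# Assumed sizes: 5 frames, 13 PLP-like coefficients, 8 MRA leaf energies.
n_frames, n_states = 5, 10
plp = rng.normal(size=(n_frames, 13))
mra = rng.normal(size=(n_frames, 8))

net = MultiSourceNet(dim_a=13, dim_b=8, hidden=16, n_states=n_states)
post = net.posteriors(plp, mra)
priors = np.full(n_states, 1.0 / n_states)  # uniform priors, an assumption
lik = scaled_likelihoods(post, priors)
```

In a real recognizer the weights would be trained on labelled frames and the priors estimated from state frequencies in the training data; `lik` would then be passed to the HMM decoder as per-frame observation scores.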
|Title:||Multi source neural networks based on fixed and multiple resolution analysis for speech recognition|
|Publication date:||2001|
|Type:||4.1 Contribution in conference proceedings|