This paper reports the results obtained by an Automatic Speech Recognition (ASR) system when Mel-Frequency Cepstral Coefficients (MFCCs), J-RASTA Perceptual Linear Prediction (J-RASTA PLP) coefficients, and energies from a Multi-Resolution Analysis (MRA) tree of filters are used as input features to a hybrid system in which a Neural Network (NN) provides the observation probabilities for a network of Hidden Markov Models (HMMs). The paper also compares the performance of the system across various combinations of these features. Combining J-RASTA PLP coefficients with the energies computed at the outputs of the leaves of an MRA filter tree reduces the word error rate (WER) by 20% with respect to J-RASTA PLP coefficients alone. Such a combination is practically feasible thanks to an NN architecture designed to integrate multiple features, exploiting the ability of NNs to mix several input parameters without any assumption about their statistical independence. Recognition is performed on a very large test set that includes many speakers uttering proper names from different locations of the Italian public telephone network.
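As a rough illustration of the hybrid idea described above, the following minimal sketch joins two per-frame feature streams into one input vector and passes it through a stand-in network that outputs one posterior per HMM state. It is not the architecture used in the paper: the single softmax layer, the function names, the toy weights, and the feature values are all hypothetical. Dividing the posteriors by state priors to obtain scaled likelihoods is a common convention in hybrid NN/HMM systems and is assumed here, not stated in the abstract.

```python
import math


def concat_streams(plp_frames, mra_frames):
    """Join two per-frame feature streams into one input vector per frame."""
    if len(plp_frames) != len(mra_frames):
        raise ValueError("streams must have the same number of frames")
    return [list(p) + list(m) for p, m in zip(plp_frames, mra_frames)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def state_posteriors(frame, weights, biases):
    """Hypothetical single-layer stand-in for the NN: one posterior per state."""
    scores = [
        sum(w * x for w, x in zip(row, frame)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


def scaled_likelihoods(posteriors, priors):
    """Assumed convention: divide state posteriors by state priors."""
    return [p / q for p, q in zip(posteriors, priors)]


# Toy data: 2 frames, 2 PLP-like values and 1 MRA-like energy per frame.
plp = [[0.1, 0.2], [0.3, 0.4]]
mra = [[1.0], [2.0]]
frames = concat_streams(plp, mra)

# Toy parameters for 2 HMM states and 3 inputs (illustrative values only).
weights = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]
biases = [0.0, 0.1]
priors = [0.5, 0.5]

for frame in frames:
    post = state_posteriors(frame, weights, biases)
    print(scaled_likelihoods(post, priors))
```

The network sees the joined vector as a whole, so no independence assumption between the two streams has to be made when they are combined, which is the property the abstract relies on.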