MelCepstrumGUI is an experimental tool for visualizing mel-scaled cepstral coefficients, the standard feature-extraction front end used in, for example, automatic speech recognition. The tool runs on any 32-bit Windows system (XP, Vista) and takes live input from the microphone. Using standard endpoint detection, detected speech is transformed into vectors, each of which corresponds to 10 ms of speech. Each vector has 16 components by default; this number can be changed through the settings menu. Delta and delta-delta coefficients can also be added. The vectors are visualized as a matrix in shades of gray.
MelCepstrumGUI visualizing speech vectors (lower windows) encoded from a waveform (upper window) for frames of length 10 ms (x-axis). Users can zoom in and out with the left and right mouse buttons.
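To make the delta coefficients and the gray-scale display more concrete, here is a minimal Python sketch. It is not taken from MelCepstrumGUI: the function names (`delta`, `add_deltas`, `to_gray`), the regression window of two frames, and the linear mapping to 0..255 are all assumptions chosen for illustration. The regression formula is one common convention for delta features, and the tool may well use a different one.

```python
def delta(frames, window=2):
    """Regression-style delta over +/- `window` frames (an assumed convention).

    `frames` is a list of equal-length coefficient vectors, one per 10 ms.
    Frames beyond either end are replaced by the nearest edge frame.
    """
    n = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(dim):
            num = 0.0
            for k in range(1, window + 1):
                nxt = frames[min(t + k, n - 1)][d]
                prv = frames[max(t - k, 0)][d]
                num += k * (nxt - prv)
            row.append(num / denom)
        out.append(row)
    return out


def add_deltas(frames, window=2):
    """Append delta and delta-delta coefficients to every vector."""
    d1 = delta(frames, window)
    d2 = delta(d1, window)  # delta-delta is the delta of the deltas
    return [c + a + b for c, a, b in zip(frames, d1, d2)]


def to_gray(frames):
    """Map all coefficient values linearly to gray levels 0..255."""
    lo = min(min(row) for row in frames)
    hi = max(max(row) for row in frames)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input
    return [[round(255 * (v - lo) / span) for v in row] for row in frames]


if __name__ == "__main__":
    # Ten synthetic frames (100 ms) of 16 coefficients that rise by 1 per frame.
    frames = [[float(t)] * 16 for t in range(10)]
    full = add_deltas(frames)
    print(len(full), "frames of length", len(full[0]))
    gray = to_gray(full)
    print("gray range:", min(map(min, gray)), "to", max(map(max, gray)))
```

With the default of 16 coefficients, adding both kinds of deltas in this way triples the vector length to 48. Scaling over the whole matrix rather than per row keeps gray levels comparable across frames, but per-coefficient scaling would be an equally plausible choice.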
The purpose of the tool is to give insight into standard feature extraction. According to theory, the vectors describe what are assumed to be the acoustically "relevant" features of the speech signal and leave out the "irrelevant" ones. The "irrelevant" features are properties such as pitch (male-female differences), other individual properties of the speaker's voice ("you-me" differences), the transmission channel (characteristics of the microphone), background noise, and speaking rate. The "relevant" features are the phonetic-phonological properties of speech sounds, e.g. the properties that distinguish an [a:] from an [u:].