|Semester|Semester 7 – Fall|
The course covers basic concepts in speech and audio processing. Its main focus is human speech: its production, perception, representation, coding, synthesis, and recognition. The processing of audio signals, in particular music signals, is also covered. In summary, the course covers the following topics:
- Introduction to digital speech processing.
- A brief review of fundamentals of digital signal processing.
- Fundamentals of human speech production and sound propagation in the human vocal tract.
- Hearing, auditory models, and speech perception.
- Time-domain methods for speech processing.
- Frequency-domain representation.
- Homomorphic speech processing and cepstrum.
- Linear predictive analysis of speech signals.
- Algorithms for estimating speech parameters.
- Digital coding of speech signals.
- Frequency-domain coding of speech and audio.
- Text-to-speech synthesis.
- Automatic speech recognition using hidden Markov models and natural language understanding.
- Feature extraction and recognition of music signals.
- Basic computational tools in Matlab corresponding to the above (including the MIR toolbox).
- Brief introduction to the hidden Markov model toolkit (HTK).
This course introduces students to the basic concepts and algorithms in speech and audio processing. Its main focus is human speech, but it also covers more general audio signals, in particular music. The course provides numerous examples to familiarize students with these concepts and algorithms, along with practical computational tools in the Matlab and HTK software frameworks that demonstrate them.
As a continuation of the digital signal processing and pattern recognition courses, the course offers students further specialization, allowing them to study these specific signals (speech and audio) in greater depth.
Students successfully completing this class will have mastered the main concepts, algorithms, and tools in the processing and recognition of speech and more general audio signals. For example, they will be able to:
- Understand the process of human speech production and perception.
- Extract appropriate features from speech signals in various domains and select the most suitable among them for the particular problem at hand.
- Perform speech recognition and speech synthesis with basic algorithms.
- Extract a variety of features from music signals.
- Implement programs in Matlab and HTK to perform the aforementioned tasks.