Terminology

https://sites.google.com/site/kishoreprahallad/presentations

waveform: air pressure as a function of time

https://github.com/MTG/sms-tools

http://www.dsprelated.com/showarticle/909.php
https://github.com/AllenDowney/ThinkDSP/blob/master/code/saxophone.ipynb
http://greenteapress.com/thinkdsp/html/thinkdsp003.html
https://www.informatik.uni-augsburg.de/de/lehrstuehle/hcm/projects/tools/emovoice/#overview
http://www.fon.hum.uva.nl/praat/manual/Source-filter_synthesis.html
http://www.fon.hum.uva.nl/paul/papers/AcousticAnalysis8.pdf
http://www.speech.cs.cmu.edu/15-492/slides/03_mfcc.pdf

periodicity

intensity

spectral qualities http://www.fon.hum.uva.nl/paul/papers/AcousticAnalysis8.pdf

T0 = vocal fold vibration period. For example, T0 = 0.3002 − 0.2931 = 0.0071 s, so 1/0.0071 ≈ 141 Hz (the number of vibrations per second)
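The period-to-frequency arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation; the function name `f0_from_period` is made up for illustration, and the two time stamps are the ones from the note.

```python
# Times of two consecutive glottal pulses, in seconds (values from the note).
t_start = 0.2931
t_end = 0.3002

def f0_from_period(t0_seconds: float) -> float:
    """Return the fundamental frequency in Hz for a period given in seconds."""
    if t0_seconds <= 0:
        raise ValueError("period must be positive")
    return 1.0 / t0_seconds

period = t_end - t_start          # T0, about 0.0071 s
f0 = f0_from_period(period)       # F0, about 141 Hz
print(f"T0 = {period:.4f} s, F0 = {f0:.0f} Hz")
```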

slow resonance frequency = 1/(s2 - s1), where s2 - s1 is the time between two consecutive peaks; this is the first formant

rapid resonance frequency = second formant, at 3200 Hz

interference of sine waves: the fast and slow resonances superimpose
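To get a feel for how a slow and a fast resonance combine, the sketch below adds two sine waves sample by sample. Only the 3200 Hz value comes from the notes; the 800 Hz slow resonance, the 44100 Hz sample rate, and the 0.5 relative amplitude are assumptions chosen for illustration, and real formants are damped rather than steady sines.

```python
import math

SAMPLE_RATE = 44100   # Hz, assumed
SLOW_HZ = 800.0       # assumed slow (first formant) resonance, not from the notes
FAST_HZ = 3200.0      # fast (second formant) resonance, from the notes

def two_resonances(n_samples, slow_hz=SLOW_HZ, fast_hz=FAST_HZ, sr=SAMPLE_RATE):
    """Return the sample-wise sum of a slow and a fast sine wave."""
    out = []
    for n in range(n_samples):
        t = n / sr
        slow = math.sin(2 * math.pi * slow_hz * t)
        fast = 0.5 * math.sin(2 * math.pi * fast_hz * t)  # assumed weaker component
        out.append(slow + fast)
    return out

wave = two_resonances(441)  # 441 samples = 10 ms at 44100 Hz
print(len(wave), max(wave), min(wave))
```

Plotting `wave` would show the fast ripple riding on top of the slow oscillation.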

air pressure, measured in Pa, between -0.4 and 0.4
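A pressure amplitude in Pa can be turned into a sound pressure level in decibels with the usual 20·log10(p / p_ref) formula, where p_ref = 20 µPa is the standard reference for sound in air. The sketch below assumes, purely for illustration, that the ±0.4 Pa range belongs to a pure sine, so that the RMS value is the peak divided by √2.

```python
import math

P_REF = 20e-6  # Pa, standard reference pressure for sound in air

def spl_db(p_rms: float) -> float:
    """Return the sound pressure level in dB SPL for an RMS pressure in Pa."""
    if p_rms <= 0:
        raise ValueError("RMS pressure must be positive")
    return 20.0 * math.log10(p_rms / P_REF)

peak = 0.4                      # Pa, peak value from the note
p_rms = peak / math.sqrt(2)     # assumes a pure sine wave
level = spl_db(p_rms)
print(f"{level:.1f} dB SPL")
```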

duration is super important

at a given point in time, the glottal period is x, therefore the pitch is 1/x (found using cross-correlation or self-similarity)
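One simple way to find the period by self-similarity is to compare the signal with a delayed copy of itself and keep the delay (lag) where the two match best. The sketch below uses plain autocorrelation over a 100 to 500 Hz search range; it is not necessarily the method used by Praat or any other tool mentioned here. The function name, the 8000 Hz sample rate, and the 125 Hz test tone are assumptions for illustration.

```python
import math

def estimate_pitch(samples, sample_rate, f_min=100.0, f_max=500.0):
    """Estimate pitch in Hz as sample_rate / lag, where lag maximizes the
    mean product of the signal with itself shifted by lag samples."""
    min_lag = int(sample_rate / f_max)   # shortest period to try
    max_lag = int(sample_rate / f_min)   # longest period to try
    if len(samples) <= max_lag:
        raise ValueError("need more samples than the longest lag")
    best_lag, best_score = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag
        score = sum(samples[i] * samples[i + lag] for i in range(n)) / n
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

sr = 8000        # Hz, assumed sample rate
f_true = 125.0   # Hz, assumed test tone (period of exactly 64 samples)
signal = [math.sin(2 * math.pi * f_true * n / sr) for n in range(800)]  # 100 ms
pitch = estimate_pitch(signal, sr)
print(pitch)
```

Real speech is noisier than a sine, so practical pitch trackers add windowing, normalization, and checks against octave errors on top of this idea.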

If you want to see what happens between 100 Hz and 500 Hz, the periods involved range from 1/500 = 2.0 ms to 1/100 = 10 ms, so you have to look at least 2.0 ms and at most 10 ms into the past/future

analysis window of 10ms
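The relation between a pitch range and the time span that has to be inspected is just a reciprocal. The sketch below computes the shortest and longest period for the 100 to 500 Hz range; the longest period, 1/100 = 10 ms, may be where the 10 ms analysis window comes from. The function name and the 16000 Hz sample rate are assumptions for illustration.

```python
def period_bounds_ms(f_min_hz: float, f_max_hz: float):
    """Return (shortest, longest) period in milliseconds for a pitch range."""
    if not 0 < f_min_hz <= f_max_hz:
        raise ValueError("expected 0 < f_min_hz <= f_max_hz")
    return 1000.0 / f_max_hz, 1000.0 / f_min_hz

shortest_ms, longest_ms = period_bounds_ms(100.0, 500.0)

SAMPLE_RATE = 16000                                    # Hz, assumed
window_samples = round(SAMPLE_RATE * longest_ms / 1000.0)
print(shortest_ms, longest_ms, window_samples)
```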

pitch curve

derived from pitch, energy, MFCCs, duration, voice quality and spectral information

Resources:

Mel Frequency Cepstral Coefficient (MFCC) tutorial

MFCC Speech Features

Phonetics on a computer

very good (the best I’ve seen on the topic) http://www.speech.cs.cmu.edu/15-492/slides/03_mfcc.pdf

(see lecture/course slides) https://sites.google.com/site/kishoreprahallad/presentations

//emotions in speech http://www.cs.cmu.edu/~awb/papers/jhuw11_npess_final_report.pdf

http://www.ee.columbia.edu/ln/LabROSA/doc/HTKBook21/HTKBook.html

prosody, F0, F0 contour, pitch emo-db

time domain (power, zero crossings)
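Both time-domain features can be computed directly from the samples of one analysis frame. The sketch below is a minimal version under common definitions: power as the mean of the squared samples, and zero crossings as the number of sign changes between neighbouring samples. The function names and the toy frame are made up for illustration.

```python
def short_time_power(frame):
    """Return the mean squared amplitude of a frame of samples."""
    if not frame:
        raise ValueError("frame must not be empty")
    return sum(x * x for x in frame) / len(frame)

def zero_crossings(frame):
    """Count sign changes between consecutive samples (zero counts as positive)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

frame = [0.4, -0.4, 0.4, -0.4]   # toy frame, alternating sign on every sample
power = short_time_power(frame)
crossings = zero_crossings(frame)
print(power, crossings)
```

Voiced speech tends to have high power and few zero crossings, while unvoiced sounds such as fricatives tend to show the opposite pattern, which is why the two features are often used together.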

http://ravi.iiit.ac.in/~speech/publications/presentations/Features_in_Time_Domain.pdf

librosa

lots of music projects: http://www.ee.columbia.edu/ln/LabROSA/

good courses on musical signal processing
http://www.ee.columbia.edu/~dpwe/e4896/outline.html
http://www.ee.columbia.edu/ln/LabROSA/doc/HTKBook21/node54.html

prosody:

the pitch of the voice (varying between low and high)
length of sounds (varying between short and long)
loudness, or prominence (varying between soft and loud)
timbre (quality of sound)

in acoustic terms, these correspond reasonably closely to:

fundamental frequency (measured in hertz, or cycles per second)
duration (measured in time units such as milliseconds or seconds)
intensity, or sound pressure level (measured in decibels)
spectral characteristics (distribution of energy at different parts of the audible frequency range)