Audio Processing and the Discrete Fourier Transform (DFT) via the Fast Fourier Transform (FFT)
Terminology
https://sites.google.com/site/kishoreprahallad/presentations
waveform: air pressure as a function of time
https://github.com/MTG/sms-tools
http://www.dsprelated.com/showarticle/909.php
https://github.com/AllenDowney/ThinkDSP/blob/master/code/saxophone.ipynb
http://greenteapress.com/thinkdsp/html/thinkdsp003.html
https://www.informatik.uni-augsburg.de/de/lehrstuehle/hcm/projects/tools/emovoice/#overview
http://www.fon.hum.uva.nl/praat/manual/Source-filter_synthesis.html
http://www.fon.hum.uva.nl/paul/papers/AcousticAnalysis8.pdf
http://www.speech.cs.cmu.edu/15-492/slides/03_mfcc.pdf
periodicity
intensity
spectral qualities (see http://www.fon.hum.uva.nl/paul/papers/AcousticAnalysis8.pdf)
T0 = vocal fold vibration period; for example, T0 = 0.3002 s − 0.2931 s = 0.0071 s, so F0 = 1/T0 = 1/0.0071 s ≈ 141 Hz, i.e. the vocal folds vibrate about 141 times per second
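The T0/F0 arithmetic above, as a small Python sketch. The function name and the idea of passing the times (in seconds) of two consecutive glottal pulses are illustrative assumptions, not part of any tool.

```python
def f0_from_pulses(t1: float, t2: float) -> float:
    """Return F0 in Hz from the times (in seconds) of two consecutive glottal pulses."""
    t0 = t2 - t1  # T0: one vocal fold vibration period, in seconds
    if t0 <= 0:
        raise ValueError("t2 must come after t1")
    return 1.0 / t0  # F0 = 1/T0: vibrations per second (Hz)


# The example from the notes: pulses at 0.2931 s and 0.3002 s.
print(round(f0_from_pulses(0.2931, 0.3002)))  # → 141
```

The same 1/(time difference) step applies to any periodic feature, e.g. two consecutive peak times s1 and s2 of a resonance.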
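slow resonance frequency = 1/(s2 − s1), where s2 − s1 is the time between two consecutive peaks of the slow oscillation; this is the first formant (F1)
rapid resonance frequency = the second formant (F2), here at 3200 Hz
the waveform is the interference (superposition) of these fast and slow resonances, each a sine wave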
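Because the waveform is a sum of sine waves, the DFT (which the FFT computes efficiently) separates it into peaks at the component frequencies. Below is a naive O(N²) DFT sketch in pure Python. The test signal is a made-up example, not measured speech: an assumed 500 Hz "slow" component plus the 3200 Hz "fast" one, sampled at an assumed 16 kHz over one 10 ms frame. Real formant resonances are damped, so their spectral peaks are broader than these.

```python
import cmath
import math


def dft(x: list[float]) -> list[complex]:
    """Naive discrete Fourier transform: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples) for n in range(n_samples))
        for k in range(n_samples)
    ]


def strongest_frequencies(x: list[float], fs: float, count: int) -> list[float]:
    """Return the `count` frequencies (Hz) with the largest DFT magnitude, ascending."""
    spectrum = dft(x)
    half = len(x) // 2  # for real input, bins above N/2 mirror the ones below
    bins = sorted(range(1, half), key=lambda k: abs(spectrum[k]), reverse=True)[:count]
    return sorted(k * fs / len(x) for k in bins)  # bin k sits at k * fs / N Hz


fs = 16000.0  # sampling rate in Hz (assumed)
frame = [
    0.3 * math.sin(2 * math.pi * 500 * n / fs)      # slow resonance (assumed 500 Hz)
    + 0.1 * math.sin(2 * math.pi * 3200 * n / fs)   # fast resonance (3200 Hz, as in the notes)
    for n in range(160)  # 160 samples = 10 ms at 16 kHz, so bins are 100 Hz apart
]
print(strongest_frequencies(frame, fs, 2))  # → [500.0, 3200.0]
```

In practice one would call an FFT routine such as `numpy.fft.rfft`, which returns the same bins (up to N/2) in O(N log N).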
air pressure, measured in Pa (pascals); here it varies between -0.4 and 0.4 Pa
duration is super important
at a given moment, if the glottal period is x seconds, then the pitch is 1/x Hz (found via cross-correlation, i.e. the self-similarity of the waveform)
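To detect pitches between 100 Hz and 500 Hz, you have to compare the signal with its own past/future over lags from at least 1/500 s = 2.0 ms (the shortest period) up to at most 1/100 s = 10 ms (the longest period)
analysis window of at least 10 ms (one period at the 100 Hz floor)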
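A minimal sketch of pitch estimation by self-similarity: compare the frame with a copy of itself shifted by every lag between 1/500 s and 1/100 s, and take the best-matching lag as the period. It uses raw (unnormalized) autocorrelation, a deliberate simplification; real pitch trackers such as Praat's add refinements like windowing, normalization, and interpolation. The function name, the 16 kHz rate, and the 30 ms frame are assumptions; the frame must be longer than the longest lag (10 ms here) so that every lag still has overlapping samples.

```python
import math


def estimate_pitch(frame: list[float], fs: float,
                   f_min: float = 100.0, f_max: float = 500.0) -> float:
    """Estimate F0 (Hz) of a voiced frame via raw autocorrelation (self-similarity)."""
    min_lag = int(fs / f_max)  # shortest period: 1/500 s = 2 ms
    max_lag = int(fs / f_min)  # longest period: 1/100 s = 10 ms
    if max_lag >= len(frame):
        raise ValueError("frame must be longer than the longest period")
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # similarity between the frame and itself shifted `lag` samples into the future
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag  # period in samples -> pitch in Hz


fs = 16000.0
frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(480)]  # 30 ms, 200 Hz tone
print(estimate_pitch(frame, fs))  # → 200.0
```

Raw autocorrelation favours shorter lags (they have more overlapping samples), which here helps pick the true period over its double, but can also bias noisy speech toward slightly short periods.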
pitch curve
features are derived from pitch, energy, MFCCs, duration, voice quality and spectral information
Resources:
Mel Frequency Cepstral Coefficient (MFCC) tutorial
very good (the best I’ve seen on the topic) http://www.speech.cs.cmu.edu/15-492/slides/03_mfcc.pdf
(see lecture/course slides) https://sites.google.com/site/kishoreprahallad/presentations
emotions in speech: http://www.cs.cmu.edu/~awb/papers/jhuw11_npess_final_report.pdf
http://www.ee.columbia.edu/ln/LabROSA/doc/HTKBook21/HTKBook.html
prosody, F0, F0 contour, pitch; emo-db (Berlin Database of Emotional Speech)
time-domain features (power, zero crossings)
http://ravi.iiit.ac.in/~speech/publications/presentations/Features_in_Time_Domain.pdf
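Minimal sketches of the two time-domain features. The frame is a made-up list of samples, and the definitions used (power as the mean squared sample, zero-crossing rate as the fraction of adjacent sample pairs that change sign, with 0 counted as positive) are assumed conventions; sources differ on normalization.

```python
def short_time_power(frame: list[float]) -> float:
    """Mean squared amplitude of the frame (power, in squared signal units)."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: list[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)


frame = [0.5, -0.5, 0.5, -0.5, 0.25, 0.25]
print(short_time_power(frame))    # → 0.1875
print(zero_crossing_rate(frame))  # → 0.8
```

Multiplying the zero-crossing rate by the sampling rate gives crossings per second.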
librosa (Python library for audio and music analysis)
lots of music projects: http://www.ee.columbia.edu/ln/LabROSA/
good courses on musical signal processing http://www.ee.columbia.edu/~dpwe/e4896/outline.html http://www.ee.columbia.edu/ln/LabROSA/doc/HTKBook21/node54.html
prosody:
- the pitch of the voice (varying between low and high)
- length of sounds (varying between short and long)
- loudness, or prominence (varying between soft and loud)
- timbre (quality of sound)
in acoustic terms, these correspond reasonably closely to:
- fundamental frequency (measured in hertz, or cycles per second)
- duration (measured in time units such as milliseconds or seconds)
- intensity, or sound pressure level (measured in decibels)
- spectral characteristics (distribution of energy at different parts of the audible frequency range)
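A sketch linking air pressure in Pa to sound pressure level in dB: SPL = 20 · log10(p_rms / p_ref). It assumes the standard reference pressure for air, p_ref = 20 µPa; the 100 Hz tone with ±0.4 Pa amplitude is a made-up example chosen to match the pressure range noted earlier, and the function name is hypothetical.

```python
import math

P_REF = 20e-6  # reference pressure for sound pressure level in air: 20 µPa


def spl_db(pressure_pa: list[float]) -> float:
    """Sound pressure level in dB SPL from air-pressure samples in pascals."""
    rms = math.sqrt(sum(p * p for p in pressure_pa) / len(pressure_pa))
    return 20 * math.log10(rms / P_REF)


fs = 16000.0
tone = [0.4 * math.sin(2 * math.pi * 100 * n / fs) for n in range(1600)]  # 0.1 s, ±0.4 Pa
print(round(spl_db(tone), 1))  # → 83.0
```

Each doubling of the RMS pressure adds about 6 dB, since 20 · log10(2) ≈ 6.02.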