Today, digital audio applications are part of our everyday lives. Popular examples include audio CDs, MP3 audio players, radio broadcasts, TV or video DVDs, video games, digital cameras with sound track, digital camcorders, telephones, telephone answering machines and telephone enquiries using speech or word recognition.
Various new and advanced audiovisual applications and services become possible based on audio content analysis and description. Search engines or specific filters can use the extracted description to help users navigate or browse through large collections of data. Digital analysis may discriminate whether an audio file contains speech, music or other audio entities, how many speakers are contained in a speech segment, what gender they are and even which persons are speaking. Spoken content may be identified and converted to text.
Music may be classified into categories, such as jazz, rock, classical, etc. Often it is possible to identify a piece of music even when it is performed by different artists, or to recognize an identical audio track even when it is distorted by coding artefacts. Finally, it may be possible to identify particular sounds, such as explosions, gunshots, etc.
We use the term audio to indicate all kinds of audio signals, such as speech, music as well as more general sound signals and their combinations. Our primary goal is to understand how meaningful information can be extracted from digital audio waveforms in order to compare and classify the data efficiently. When such information is extracted it can also often be stored as content description in a compact way.