Abstract
Classified by their mode of excitation, speech sounds fall into three distinct classes of phonemes, where a phoneme is defined as the smallest unit of speech that distinguishes one utterance from another. The three classes are voiced, unvoiced, and plosive phonemes. Voiced phonemes are considered deterministic in nature. They are produced by forcing air through the glottis with the tension of the vocal cords adjusted so that they vibrate in a relaxation oscillation. This produces quasi-periodic pulses of air that excite the vocal tract.
Content
Human Acoustics and the Telephone Network
Classified by their mode of excitation, speech sounds fall into three distinct classes of phonemes, where a phoneme is defined as the smallest unit of speech that distinguishes one utterance from another. The three classes are voiced, unvoiced, and plosive phonemes. Voiced phonemes are considered deterministic in nature. They are produced by forcing air through the glottis with the tension of the vocal cords adjusted so that they vibrate in a relaxation oscillation. This produces quasi-periodic pulses of air that excite the vocal tract.
Examples of voiced phonemes are the vowels, the nasal consonants /m/ and /n/, the fricatives /v/ and /z/, and the stop consonants /b/, /d/, and /g/.
Unvoiced phonemes are generated by forming a constriction at some point in the vocal tract and forcing air through the constriction at a high enough velocity to produce turbulence. As a result, unvoiced phonemes are considered random in nature. Examples of unvoiced phonemes are the fricatives /f/ and /s/ and the stop consonants /p/, /t/, and /k/.
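The contrast between deterministic and random excitation can be made concrete with a minimal sketch. It rests on simplifying assumptions that are not in the text: the voiced source is idealized as a strictly periodic unit pulse train (real glottal pulses are only quasi-periodic), the unvoiced source as seeded Gaussian noise, and the pitch period of 80 samples is an arbitrary illustrative value. All function names are hypothetical.

```python
import random

def voiced_excitation(n_samples, pitch_period):
    """Idealized glottal source: one unit pulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]

def unvoiced_excitation(n_samples, seed=0):
    """Idealized turbulence: zero-mean Gaussian noise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def normalized_autocorr(x, lag):
    """Autocorrelation at the given lag, divided by the lag-0 energy."""
    energy = sum(v * v for v in x)
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag)) / energy

PITCH_PERIOD = 80  # samples; assumed value, chosen only for illustration

voiced = voiced_excitation(800, PITCH_PERIOD)
unvoiced = unvoiced_excitation(800)

# A periodic source correlates strongly with itself one pitch period later;
# a noise source does not.
voiced_corr = normalized_autocorr(voiced, PITCH_PERIOD)
unvoiced_corr = normalized_autocorr(unvoiced, PITCH_PERIOD)
print(voiced_corr, unvoiced_corr)
```

The pulse train is predictable one pitch period ahead, so its normalized autocorrelation at that lag is close to 1, while the noise source stays near 0. This is one simple way to see why voiced sounds are treated as deterministic and unvoiced sounds as random.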
Similar in nature to unvoiced sounds, plosive sounds result from making a complete closure of the vocal tract, building up pressure behind the closure, and abruptly releasing it, as in the /ch/ phoneme.
Naturally occurring speech signals are composed of combinations of voiced, unvoiced, and plosive phonemes. For example, Figure 1 shows the speech signal ‘goat’, which contains two voiced phonemes, /g/ and /oa/, followed by a partial closure of the vocal tract and then an unvoiced phoneme, /t/. The /g/, /oa/, and /t/ occur at approximately samples 3400-3900, 3900-5400, and 6300-6900, respectively.
Each phoneme class places its own demands on the telephone system. In general, the peak-to-peak amplitude of voiced phonemes is approximately ten times that of unvoiced and plosive phonemes, as illustrated in Figure 1. As a result, the telephone system must accommodate a large range of signal amplitudes.
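To see what a tenfold amplitude ratio means for a digital representation, the following sketch passes a loud and a quiet test tone through the same uniform quantizer. Everything here is an assumption made for illustration: the 8-bit word length, the full-scale value of 1.0, the sine-wave test signals standing in for speech, and all function names.

```python
import math

def uniform_quantize(x, n_bits, full_scale=1.0):
    """Mid-rise uniform quantizer over [-full_scale, full_scale)."""
    levels = 2 ** n_bits
    step = 2.0 * full_scale / levels
    index = math.floor(x / step)
    # Clamp to the available codes so out-of-range inputs saturate.
    index = max(-(levels // 2), min(levels // 2 - 1, index))
    return (index + 0.5) * step

def snr_db(signal, n_bits):
    """Signal-to-quantization-noise ratio in dB."""
    p_signal = sum(s * s for s in signal)
    p_noise = sum((s - uniform_quantize(s, n_bits)) ** 2 for s in signal)
    return 10.0 * math.log10(p_signal / p_noise)

def tone(amplitude, n_samples=4000, cycles=37.3):
    """Sine test tone; a non-integer cycle count avoids repeating samples."""
    return [amplitude * math.sin(2 * math.pi * cycles * n / n_samples)
            for n in range(n_samples)]

loud = tone(0.9)    # stands in for a voiced phoneme
quiet = tone(0.09)  # ten times smaller, as for unvoiced or plosive phonemes

loud_snr = snr_db(loud, 8)
quiet_snr = snr_db(quiet, 8)
print(f"loud: {loud_snr:.1f} dB, quiet: {quiet_snr:.1f} dB")
```

Because the step size is fixed, the quantization noise power is roughly the same for both tones, so the quiet tone loses about 20 dB of signal-to-noise ratio, the decibel equivalent of a factor of ten in amplitude.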
Although lower in amplitude, unvoiced and plosive phonemes contain more information, and thus higher entropy, than voiced phonemes. The telephone system must therefore provide higher resolution for lower-amplitude signals.
In addition to the demands imposed by the speech signal, the telephone network is also subject to bandwidth restrictions relative to the human speech and auditory ranges.
The speech bandwidth for most adults is approximately 10 kHz. In contrast, the maximum auditory range of humans is 20 kHz. This maximum is usually reached only by young children; the typical hearing bandwidth for most adults is 15 kHz. Of the speech and auditory bandwidths, the telephone network restricts transmission to a 3 kHz portion, from 0.3 to 3.3 kHz.
This frequency range is believed to coincide with the region of greatest speech intelligibility, retaining only the first three formant frequencies of the sampled speech signal. This reduced bandwidth is then surrounded by unused space from 0 to 0.3 kHz and from 3.3 to 4 kHz. This unused space, known as the guard band, provides a buffer against interference between conversations. Summing the transmission and guard bands, the telephone network has a total bandwidth of 4 kHz.
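The band arithmetic above can be checked in a few lines. The sketch uses only the frequencies quoted in the text; the constant and function names are illustrative and do not come from any telephony API.

```python
# Band edges in Hz, taken from the figures quoted in the text.
CHANNEL_LOW_HZ = 0
PASSBAND_LOW_HZ = 300
PASSBAND_HIGH_HZ = 3300
CHANNEL_HIGH_HZ = 4000

lower_guard_hz = PASSBAND_LOW_HZ - CHANNEL_LOW_HZ    # unused space below the passband
passband_hz = PASSBAND_HIGH_HZ - PASSBAND_LOW_HZ     # transmitted portion
upper_guard_hz = CHANNEL_HIGH_HZ - PASSBAND_HIGH_HZ  # unused space above the passband
total_hz = lower_guard_hz + passband_hz + upper_guard_hz

def in_passband(freq_hz):
    """True if a frequency falls inside the transmitted portion of the channel."""
    return PASSBAND_LOW_HZ <= freq_hz <= PASSBAND_HIGH_HZ

print(lower_guard_hz, passband_hz, upper_guard_hz, total_hz)
```

The two guard bands are not symmetric: 300 Hz lies below the passband and 700 Hz above it, and together with the 3 kHz passband they account for the full 4 kHz.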
In summary, the telephone system must provide adequate quality for small-amplitude signals consisting of unvoiced phonemes. Concurrently, the telephone system must provide for transmission of a wide range of signal amplitudes, due to the occasional occurrence of high-energy voiced phonemes. The accomplishment of these concurrent tasks, within a limited bandwidth, may be achieved via Pulse Code Modulation and companding, as discussed in the following section…
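As a preview of the companding idea, here is a minimal sketch of the continuous mu-law characteristic. The text above does not specify a companding law or parameter, so both the choice of mu-law and the value mu = 255 are assumptions; the function names are hypothetical, and the segmented 8-bit encoding used in practice is not reproduced here.

```python
import math

MU = 255.0  # assumed compression parameter, not taken from the text

def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1], stretching small amplitudes."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

for x in (0.01, 0.1, 1.0):
    y = mu_law_compress(x)
    print(f"x = {x:5.2f} -> compressed {y:.3f} -> expanded {mu_law_expand(y):.4f}")
```

Small inputs are spread over a much larger share of the output range than large ones, so a uniform quantizer applied after compression effectively spends more of its levels on low-amplitude signals while still covering the full amplitude range.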
http://www.voip-sip.org/wp-content/uploads/2011/08/A-law_vs_u-law-voip-codec-comparison.pdf