Human Acoustics and the Telephone Network

Abstract

Classified by their mode of excitation, speech sounds fall into three distinct classes of phonemes, where a phoneme is defined as the smallest unit of speech that distinguishes one utterance from another. The three classes are voiced, unvoiced, and plosive. Voiced phonemes are considered deterministic in nature. They are produced by forcing air through the glottis with the tension of the vocal cords adjusted so that they vibrate in a relaxation oscillation. This produces quasi-periodic pulses of air which excite the vocal tract.

Examples of voiced phonemes are the vowels, the fricatives /v/ and /z/, and the stop consonants /b/, /d/, and /g/.

Unvoiced phonemes are generated by forming a constriction at some point in the vocal tract and forcing air through the constriction at a high enough velocity to produce turbulence. As a result, unvoiced phonemes are considered random in nature. Examples of unvoiced phonemes are the fricatives /f/ and /s/ and the stop consonants /p/, /t/, and /k/. The nasal consonants /m/ and /n/, by contrast, are voiced.

Similar in nature to unvoiced sounds, plosive sounds result from making a complete closure of the vocal tract, building up pressure behind the closure, and abruptly releasing it, as in the /ch/ phoneme.

Naturally occurring speech signals are composed of combinations of voiced, unvoiced, and plosive phonemes. For example, Figure 1 shows the speech signal ‘goat’, which contains two voiced phonemes, /g/ and /oa/, followed by a partial closure of the vocal tract and then an unvoiced phoneme, /t/. The /g/, /oa/, and /t/ occur approximately at samples 3400-3900, 3900-5400, and 6300-6900, respectively.
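The contrast between quasi-periodic voiced sounds and noise-like unvoiced sounds can be illustrated with two simple frame measurements: zero-crossing rate and short-time energy. The sketch below is not taken from the source; the 8 kHz sample rate, the 30 ms frame length, and the synthetic test frames (a 120 Hz sinusoid standing in for a voiced sound, low-amplitude white noise standing in for an unvoiced one) are all assumptions chosen for illustration.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz (assumed for this sketch)
FRAME_LEN = 240     # 30 ms at 8 kHz (assumed)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


# Stand-in for a voiced frame: a large-amplitude 120 Hz sinusoid.
voiced = [
    math.sin(2 * math.pi * 120 * n / SAMPLE_RATE) for n in range(FRAME_LEN)
]

# Stand-in for an unvoiced frame: white noise at one tenth the amplitude.
rng = random.Random(0)  # fixed seed so the example is repeatable
unvoiced = [0.1 * rng.uniform(-1.0, 1.0) for _ in range(FRAME_LEN)]

for name, frame in (("voiced", voiced), ("unvoiced", unvoiced)):
    print(name, zero_crossing_rate(frame), short_time_energy(frame))
```

The periodic frame crosses zero only a few times and carries most of the energy, while the noise frame crosses zero on roughly half of its samples at much lower energy. Real speech is less clean than these synthetic frames, so practical detectors typically combine several such features.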

Each phoneme class places its own demands on the telephone system. In general, the peak-to-peak amplitude of voiced phonemes is approximately ten times that of unvoiced and plosive phonemes, as illustrated in Figure 1. As a result, the telephone system must accommodate a large range of signal amplitudes.

Although lower in amplitude, unvoiced and plosive phonemes contain more information, and thus higher entropy, than voiced phonemes. The telephone system must therefore provide higher resolution for lower-amplitude signals.

In addition to the tasks presented by the speech signal, the telephone network is also subject to bandwidth restrictions with respect to the human speech and auditory ranges.

The speech bandwidth for most adults is approximately 10 kHz. In contrast, the maximum auditory range of humans is 20 kHz. This maximum is usually found only in young children; the typical hearing bandwidth for most adults is 15 kHz.

Of the speech and auditory bandwidths, the telephone network restricts transmission to a 3 kHz portion, from 0.3 to 3.3 kHz.

This frequency range is believed to coincide with the region of greatest speech intelligibility, retaining only the first three formant frequencies of the sampled speech signal. This reduced bandwidth is surrounded by unused space from 0 to 0.3 kHz and from 3.3 to 4 kHz. This unused space, known as the guard band, provides a buffer against interference between conversations. Summing the transmission and guard bands, the telephone network has a total bandwidth of 4 kHz.
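The band arithmetic above, and what it implies for digitizing a telephone channel, can be worked through in a few lines. The band edges come from the text; the 8 bits per sample is an assumption not stated in the source (it matches standard companded telephone PCM), and the sampling rate follows from the Nyquist criterion of at least twice the channel bandwidth.

```python
# Band edges from the text, in Hz.
PASSBAND_LOW_HZ = 300
PASSBAND_HIGH_HZ = 3300
CHANNEL_BANDWIDTH_HZ = 4000

# Assumption (not from the source): 8 bits per sample.
BITS_PER_SAMPLE = 8

transmission_band_hz = PASSBAND_HIGH_HZ - PASSBAND_LOW_HZ
guard_band_hz = PASSBAND_LOW_HZ + (CHANNEL_BANDWIDTH_HZ - PASSBAND_HIGH_HZ)

# Nyquist criterion: sample at no less than twice the channel bandwidth.
sampling_rate_hz = 2 * CHANNEL_BANDWIDTH_HZ
bit_rate_bps = sampling_rate_hz * BITS_PER_SAMPLE

print("transmission band:", transmission_band_hz, "Hz")
print("guard bands total:", guard_band_hz, "Hz")
print("sampling rate:    ", sampling_rate_hz, "Hz")
print("bit rate:         ", bit_rate_bps, "bit/s")
```

Under these assumptions a single voice channel needs 8,000 samples per second and 64 kbit/s.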

In summary, the telephone system must provide adequate quality for small-amplitude signals consisting of unvoiced phonemes. Concurrently, it must transmit a wide range of signal amplitudes, due to the occasional occurrence of high-energy voiced phonemes. These concurrent tasks may be accomplished, within a limited bandwidth, via Pulse Code Modulation and companding, as discussed in the following section…
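To make the idea of companding concrete, here is a minimal sketch of μ-law compression followed by uniform quantization. It uses the continuous μ-law formula with μ = 255, not the piecewise-linear segment encoding of a real telephone codec, and the helper names (`mu_law_compress`, `mu_law_expand`, `quantize`, `companded_roundtrip`) are hypothetical names chosen for this illustration.

```python
import math

MU = 255  # mu-law parameter commonly used in telephony


def mu_law_compress(x, mu=MU):
    """Logarithmically compress x in [-1, 1] onto [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def quantize(v, bits=8):
    """Uniform quantizer on [-1, 1] (illustrative, not a codec format)."""
    levels = 2 ** (bits - 1) - 1
    return round(v * levels) / levels


def companded_roundtrip(x, bits=8):
    """Compress, quantize uniformly, then expand."""
    return mu_law_expand(quantize(mu_law_compress(x), bits))


for x in (0.001, 0.01, 0.1, 1.0):
    uniform_error = abs(quantize(x) - x)
    companded_error = abs(companded_roundtrip(x) - x)
    print(x, uniform_error, companded_error)
```

With the same 8 bits, the companded path reproduces small amplitudes far more accurately than plain uniform quantization, at the cost of coarser steps near full scale. That trade matches the requirement stated above: finer resolution for low-amplitude signals while still covering a wide amplitude range.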

http://www.voip-sip.org/wp-content/uploads/2011/08/A-law_vs_u-law-voip-codec-comparison.pdf

Human acoustics and the telephone network are deeply intertwined, as the development and optimization of telephone technology have been significantly influenced by the understanding of how humans perceive sound. The evolution of the telephone network from its inception in the late 19th century to today’s digital systems reflects continuous advancements in accommodating and enhancing acoustic signals to suit human hearing capabilities. This synergy between human acoustics and telephony involves aspects of sound transmission, voice quality, and the challenges of creating a communication system that feels natural and efficient for human users.

Basic Principles of Human Acoustics

Human acoustics revolves around how sound waves are produced, transmitted, and received by the human ear, including how these sounds are interpreted by the brain. Key points include:

Frequency Range: The average human ear can detect sounds ranging from 20 Hz to 20,000 Hz, with the most sensitive frequencies for understanding speech lying between 300 Hz and 3,400 Hz.
Sound Perception: Humans perceive loudness roughly logarithmically rather than linearly, which is why sound intensity is measured on the logarithmic decibel scale.
Speech Intelligibility: Certain frequencies are more critical for understanding speech. Vowels, which carry more energy, are concentrated at lower frequencies, whereas consonants, essential for speech clarity, are concentrated at higher frequencies but carry less energy.
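As a small worked example of the decibel scale mentioned above, the sketch below converts intensity (power) ratios and amplitude ratios to decibels using the standard definitions, 10·log10 for power and 20·log10 for amplitude. The function names are hypothetical and chosen only for this illustration.

```python
import math


def intensity_ratio_to_db(ratio):
    """Decibels for a ratio of intensities (powers)."""
    return 10 * math.log10(ratio)


def amplitude_ratio_to_db(ratio):
    """Decibels for a ratio of amplitudes (pressure or voltage)."""
    return 20 * math.log10(ratio)


# A tenfold difference in amplitude is a 20 dB difference in level.
print(amplitude_ratio_to_db(10))

# Doubling the intensity adds about 3 dB.
print(intensity_ratio_to_db(2))
```

Because the scale is logarithmic, equal ratios map to equal steps in decibels, which is closer to how changes in loudness are perceived than a linear scale would be.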

Adapting the Telephone Network for Human Acoustics

The development of the telephone network has taken these acoustic principles into account to optimize speech transmission:

Bandwidth Limitation: Early telephone systems were designed to transmit frequencies mainly between 300 Hz and 3,400 Hz, aligning with the most important frequencies for speech intelligibility. This limitation reduces the channel capacity each call requires while keeping speech understandable.
Electrical to Acoustic Conversion: Telephones convert electrical signals back into acoustic signals through a speaker or earpiece. The design of these components has evolved to better match the acoustic properties that are most favorable to human hearing.
Signal Processing: Various signal processing techniques, such as compression and noise reduction, have been developed to improve the clarity and quality of voice transmission over the telephone network. These techniques are designed to enhance the intelligibility of speech, even in noisy environments or when the signal is weak.
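The effect of the 300 Hz to 3,400 Hz band limit can be sketched with an idealized frequency-domain filter. This is an illustration only: real telephone equipment uses analog or digital filters with gradual roll-off, not the brick-wall DFT masking shown here. The 8 kHz sample rate, the 80-sample frame, and the test tones (100 Hz hum plus a 1,000 Hz tone) are assumptions chosen so that each tone falls exactly on a DFT bin.

```python
import cmath
import math

SAMPLE_RATE = 8000           # Hz (assumed)
N = 80                       # frame length; bin spacing is 8000 / 80 = 100 Hz
LOW_HZ, HIGH_HZ = 300, 3400  # telephone band from the text


def dft(x):
    """Naive discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)).real / n
        for t in range(n)
    ]


def band_limit(x, sample_rate=SAMPLE_RATE, low=LOW_HZ, high=HIGH_HZ):
    """Zero every DFT bin whose frequency lies outside [low, high]."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        # Bins above n/2 represent negative frequencies; fold them back.
        freq = min(k, n - k) * sample_rate / n
        if not (low <= freq <= high):
            spectrum[k] = 0
    return idft(spectrum)


hum = [math.sin(2 * math.pi * 100 * t / SAMPLE_RATE) for t in range(N)]
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(N)]
mixed = [h + s for h, s in zip(hum, tone)]

filtered = band_limit(mixed)
residual = max(abs(f - s) for f, s in zip(filtered, tone))
print("largest deviation from the 1 kHz tone:", residual)
```

The 100 Hz component lies below the band and is removed, while the 1,000 Hz component passes through essentially unchanged.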

Challenges and Innovations

Noise and Interference: Background noise and interference can significantly affect the quality of voice transmission. Solutions have included noise-canceling microphones and digital signal processing algorithms to filter out unwanted sounds.
Voice over Internet Protocol (VoIP): The transition from analog to digital telephony, and particularly the advent of VoIP technology, has allowed for greater flexibility in how voice signals are processed and transmitted, leading to improvements in sound quality and the potential for higher bandwidth transmission.
Hearing Impairment Considerations: Telephone technology has also adapted to accommodate users with hearing impairments, incorporating features like compatibility with hearing aids and adjustable volume controls.

Conclusion

The intersection of human acoustics and the telephone network highlights the importance of understanding human hearing in the development of communication technologies. By tailoring the network and devices to the specifics of human sound perception, engineers and designers have been able to create telephony systems that provide clear, intelligible speech across vast distances. As technology advances, the challenge remains to further refine these systems to enhance acoustic performance, accommodate a wider range of hearing abilities, and leverage new digital capabilities to improve human communication.
