Addressing the Voice-over-IP Challenge
By Richard Parker
Voice-over-IP (VoIP) technology combines voice and data communications in a single datastream. It promises
to drastically reduce telephone bills and simplify IT management while ushering in new and more flexible work patterns. This article looks
into some of the practicalities and pitfalls of implementing the technology successfully.
Read this to learn about
- The hardware and software requirements of VoIP
- How to avoid potential pitfalls in VoIP implementation
- Tools to ease the development of VoIP systems
In the corporate world, the installation of VoIP systems has been in progress for some time. Although most companies will have switched primarily
to slash call costs, the potential of VoIP stretches far beyond economics.
VoIP telephones, for instance, can be transported with the user to almost any location, plugged into the local network and used as normal.
Because VoIP calls are routed in software, the technology also allows the integration of information services and other media, such as video.
However, the promise of these capabilities rests on VoIP gaining widespread acceptance. This means that many of the functional issues
that have plagued the technology to date will have to be addressed. The primary challenge is to give users the illusion of a permanent,
analog, conventional voice connection.
The underlying difficulty is that the Internet is built to ensure delivery in 'reasonable' rather than 'real' time. For the early VoIP
implementations, this meant packets were prone to arriving so delayed and out-of-order that they could not be reassembled reliably
enough to prevent unacceptable distortion and break-up.
The responsibility for delivering a reliable service now rests with design engineers who may not have had to work with packet-data
protocol design. This article aims to give engineers a fast-track tutorial on designing for VoIP. For illustration purposes, it will refer to
VoIP-relevant parts supplied and supported by Future Electronics.
| ITU Standard | Description | Bandwidth (kbit/s) | Conversion Delay |
| G711 | PCM | 64 | < 1ms |
| G721 | ADPCM | 32, 16, 24, 40 | < 1ms |
| G728 | LD-CELP | 16 | ~ 2.5ms |
| G729 | CS-CELP | 8 | ~ 15ms |
| G723.1 | Multi-rate CELP | 6.3, 5.3 | ~ 30ms |
Table 1: Typical VoIP standards
VoIP Basics
VoIP comprises four basic building blocks: signalling, database services, call control and CODer/DECoder (CODEC) operations. Signalling
refers to the low level protocols used to communicate between IP devices. Database services are similar to analog-telephone-number
routing and call logging but using IP addressing.
Call control manages the set up and completion of calls between endpoints. CODEC operations are responsible for converting analog
voice signals to and from the digital domain in real-time, a process referred to in VoIP parlance as 'vocoding', short for voice coding/decoding.
Vocoding is required because VoIP digitizes and compresses analog speech into 'datagrams' which are passed from source to endpoint via
Internet protocols or a corporate local/wide area network. An IP device at the endpoint then reassembles and converts these datagrams back into
their original digital audio format using another vocoder. Table 1 lists some typical standards used for this signal processing.
Unfortunately, some of these VoIP standards are still subject to individual patents, so paying an additional fee may be advisable. This is effectively
an insurance policy against legal action and is formally known as an 'indemnification fee'.
The most popular method of VoIP digital translation to date is Pulse Code Modulation (PCM) as defined by the ITU standard G711. The
analog voice signal is converted into a linear PCM digital stream of 16 bits every 125µs. Line echo is removed from the PCM
stream, which is then further processed for silence suppression and tone detection.
Echo cancellation is an important requirement of VoIP systems. Although echo is not generated inside the IP network, when signals pass
from four wires to two wires inside the circuit switched network, some of the energy in the four-wire circuit is reflected back toward the
speaker. As network latency increases the delay between the speaker's voice and the reflected signal, this echo can become
intrusive.
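The article does not say which algorithm is used to remove the echo. One common textbook choice is a normalized least-mean-squares (NLMS) adaptive filter, which learns the echo path from the far-end signal and subtracts its estimate from the returning signal. The Python sketch below is a minimal illustration under that assumption; the function name, tap count, step size and the simulated echo path (a 3-sample delay with 0.6 gain) are all hypothetical.

```python
import random


def nlms_echo_cancel(far_end, returned, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the far-end echo from `returned`.

    NLMS is assumed here for illustration; it is not named in the article.
    """
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, returned):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        err = d - estimate   # what is left after removing the estimated echo
        norm = eps + sum(h * h for h in history)
        weights = [w + mu * err * h / norm for w, h in zip(weights, history)]
        residual.append(err)
    return residual


# Simulated test signals (hypothetical): white noise and a delayed, attenuated echo.
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo = [0.0] * 3 + [0.6 * s for s in far[:-3]]
residual = nlms_echo_cancel(far, echo)
```

Once the filter has adapted, the residual should be far smaller than the echo itself. A production canceller would also have to cope with both parties talking at once and with much longer echo tails, which this sketch ignores.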
The resulting PCM samples are then packaged into voice frames and compressed by the vocoder to minimize their use of bandwidth. A
typical VoIP application will divide the 128kbit/s linear PCM stream into 10ms-long frames and compress it to just 8kbit/s, so that
each frame carries 10 bytes of speech.
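The frame figures quoted here follow directly from the bit rates and the frame length. The short Python sketch below simply reproduces that arithmetic; the function name `frame_bytes` is hypothetical and chosen for illustration only.

```python
def frame_bytes(bitrate_kbit_s, frame_ms):
    """Bytes of speech carried by one frame at the given bit rate."""
    bits = bitrate_kbit_s * frame_ms  # kbit/s multiplied by ms gives bits
    return bits / 8


linear_pcm = frame_bytes(128, 10)   # uncompressed linear PCM frame
compressed = frame_bytes(8, 10)     # frame after compression to 8kbit/s
ratio = linear_pcm / compressed     # compression ratio achieved by the vocoder
```

Under these figures a 10ms frame shrinks from 160 bytes to 10 bytes, a compression ratio of 16:1.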
The voice frames are then integrated into voice packets. First, a Real-time Transport Protocol (RTP) packet with a 12-byte header is
created. Next, an 8-byte User Datagram Protocol (UDP) header with the source and destination ports is added. Thirdly, a 20-byte IP header
containing the source and destination IP addresses is incorporated into the packet. Finally, the data is usually encrypted for security.
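The header sizes above add up to a significant overhead on a small voice frame. The Python sketch below packs a minimal 12-byte RTP header and works out the resulting bit rate on the wire. It is an illustration only: the function names are hypothetical, payload type 18 assumes a G729 stream, and link-layer framing and any encryption overhead are ignored.

```python
import struct

RTP_HEADER_LEN = 12  # bytes, as quoted in the article
UDP_HEADER_LEN = 8
IP_HEADER_LEN = 20


def build_rtp_header(seq, timestamp, ssrc, payload_type=18):
    """Pack a minimal RTP header: version 2, no padding, extension or CSRC list."""
    first_byte = 2 << 6                 # version = 2, P = 0, X = 0, CC = 0
    second_byte = payload_type & 0x7F   # marker bit = 0
    return struct.pack("!BBHII", first_byte, second_byte,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def packet_bytes(payload_bytes):
    """Total size of one voice packet including RTP, UDP and IP headers."""
    return payload_bytes + RTP_HEADER_LEN + UDP_HEADER_LEN + IP_HEADER_LEN


def wire_bitrate_kbit_s(payload_bytes, frame_ms):
    """Bit rate on the wire when one frame is sent per packet."""
    return packet_bytes(payload_bytes) * 8 / frame_ms  # bits per ms equals kbit/s
```

With one 10-byte frame per packet every 10ms, `packet_bytes(10)` gives 50 bytes and `wire_bitrate_kbit_s(10, 10)` gives 40kbit/s, five times the 8kbit/s codec rate. This is one reason why implementations often carry several frames in each packet, at the cost of extra delay.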
When the destination receives the packet, the process goes into reverse. The IP packets are numbered as they are created and sent
to the destination address. The receiving end must reassemble the packets in the correct order to re-create the original message.
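Reassembly at the receiving end can be pictured as a small reordering buffer keyed on the packet number. The Python sketch below is a simplified illustration: the class name `ReorderBuffer` is hypothetical, sequence numbers are assumed never to wrap around, and a real receiver would also give up on a missing packet after a bounded wait rather than stall playback.

```python
import heapq


class ReorderBuffer:
    """Release packet payloads strictly in sequence-number order."""

    def __init__(self, first_seq=0):
        self._heap = []             # packets that arrived early, smallest seq first
        self._next_seq = first_seq  # sequence number expected next

    def push(self, seq, payload):
        """Store one packet and return any payloads that are now in order."""
        if seq >= self._next_seq:   # ignore packets that were already played out
            heapq.heappush(self._heap, (seq, payload))
        ready = []
        while self._heap and self._heap[0][0] <= self._next_seq:
            s, p = heapq.heappop(self._heap)
            if s == self._next_seq:  # anything else is a duplicate and is dropped
                ready.append(p)
                self._next_seq += 1
        return ready
```

For example, if packets 1, 0, 3 and 2 arrive in that order, successive calls to `push` return nothing, then packets 0 and 1, then nothing, then packets 2 and 3.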
VoIP implementations comprise hardware and software elements. Some of the key hardware elements and their inter-relationships
are shown in Fig. 1. Aside from the IP network itself, there are three major hardware blocks: processing servers, media gateways and
end-user devices.

Fig. 1: Hardware elements present in a VoIP network
Call-processing Servers
Call-processing servers provide call routing and communication to VoIP gateways. These are normally software based and deployed as
a single server, a cluster of servers or as a server farm with the functionality distributed across servers. They can also be based on
router platforms or developed as a dedicated appliance.
The two major handling requirements of a VoIP network are the VoIP voice-traffic payload and signalling-control traffic. The way
control traffic is routed conforms to a client-server model, the clients in this case being the VoIP terminals, including the message
servers that hold voice-mail messages.
In general the call-processing servers do not handle the VoIP payload. This is the RTP stream carrying the voice data and it is handled
in a peer-to-peer fashion. The VoIP terminals will determine the traffic flows and the call-processing servers negotiate those flows within
the control messages. Exceptions to this are where voice traffic is routed to another call-processing server, for call conferencing or
background on-hold music, for example.
Media Gateways
Media gateways form an interface between the IP network and more traditional non-VoIP telephone systems such as those based on PBX.
The complexity of the gateway depends on the different signalling formats or protocols that are being handled. These could include: Public
Switched Telephone Network (PSTN) signals; HDLC channels; E1/T1 and/or E3/T3; DSL; Ethernet and SONET/SDH.
Converting signals from one protocol to another in real-time can be computationally demanding. A high-performance, general-purpose
processor such as the PowerQUICC™ family from Freescale is a good option. An alternative possibility is to use a DSP. The StarCore™
range from Freescale, for example, can accommodate such protocol-handling requirements. The series scales from the cost-effective,
single-core MSC711X devices to the very high performance multicore MSC812X parts, running at an effective 2GHz.
The main function of the media gateway is to provide CODEC functions including compression, echo cancellation, silence suppression
and statistic gathering.
Gateways exist in several forms. A trunk gateway, for example, interfaces between a telephone network and an IP network, handling
a large number of circuits. By contrast, a residential gateway connects a traditional analog interface to an IP network. A corporate VoIP
gateway provides an interface between a traditional PBX or the PSTN and an IP network.
There are currently two competing VoIP signalling standards: H323, defined by the ITU, and the Session-Initiation Protocol (SIP),
defined by the Internet Engineering Task Force. H323 is currently the more widely adopted, as it has been around longer, but SIP's
greater simplicity is expected to change this.
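Part of SIP's simplicity is that its messages are plain text, much like HTTP. The Python sketch below assembles a bare INVITE request to show the general shape. It is an illustration rather than a working client: the function name, addresses and identifiers are hypothetical, and a real INVITE would normally also carry a session description in its body.

```python
def build_sip_invite(caller, callee, local_host, call_id, branch, tag):
    """Return a minimal, body-less SIP INVITE request as text."""
    user = caller.split("@")[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag={tag}",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_host}>",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; a blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical example values.
msg = build_sip_invite(
    caller="alice@example.com",
    callee="bob@example.com",
    local_host="pc1.example.com",
    call_id="a84b4c76e66710",
    branch="z9hG4bKexample1",
    tag="1928301774",
)
```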

Fig. 2: The V2IP development kit provides a solution based on Freescale's DragonBall
iMX21 processor and the Vericall Edge software platform from Trinity Convergence
End-user Products
End-user devices include VoIP phones, which appear to function in a similar way to traditional telephones but interface over an IP connection.
Alternatively, a desktop or software-based phone uses a computer to generate the IP connection.
Encryption of voice packets is likely to be increasingly employed as the popularity and thus perceived vulnerability of VoIP grows. One
approach often recommended here is the use of a coprocessor to handle such security-related processing. Freescale Semiconductor, for
example, supplies specialized processors to support the host processor with encryption, such as the MPC180 device. There are other devices
within the Coldfire™ and PowerQUICC® ranges that incorporate hardware encryption for standards such as AES, DES/3DES and RSA.
Kits are available to ease the task of developing VoIP phones. For example, Freescale is working with VoIP-software supplier Trinity Convergence
to supply the V2IP development kit that can be used for this purpose. The solution presented in this kit is based on the Freescale DragonBall™
iMX21 processor. This can provide both voice and video interfaces. It also employs Trinity Convergence's Vericall Edge™ software platform, a
cost-sensitive product designed to run on a single-processor platform. Fig. 2 illustrates some of the principal elements of the platform.
The above kit encompasses media processing, packet processing, system management and control, telephony signalling, gateway call control,
data transport and the integration of these interdependent functions. By fully integrating these disparate components, Trinity says that OEMs can
gain significant development time and cost savings.
When a large number of channels are involved, a DSP farm might be the best way to handle the CODEC, compression and echo cancellation.
In this case it might be advantageous to connect the DSPs together in a multi-processing environment.
VoIP is both an established and an exciting technology that will allow legacy phone networks to be replaced with a single, more effective
infrastructure. This article has discussed some of the practical issues relevant to design engineers striving to put the underlying technology
in place successfully.