Imagine that you’re walking into a darkened conference room. You switch on the lights and make a few phone calls. All of a sudden, three of your colleagues from across the globe appear at the conference room table as if they were sitting there in the dark all along. This represents the essence of telepresence—an ultra-high-end video-conferencing system.
These systems employ high-definition video on 50-in. or larger flat-panel displays with audio designed to make all of the participants’ voices seem like they’re coming straight from their lips. And that’s not all. Typically, factors such as lighting and even furniture are taken into account, with possibly half a conference table in one room and the other half in the remote room.
A telepresence system like this could cost several hundred thousand dollars, as is the case with the TelePresence 3000 from Cisco Systems. But viable alternatives exist at a variety of price points from companies such as Hewlett-Packard, LifeSize Communications, Polycom, Sony, Telanetix, and Vidyo.
Design engineers wanting to build telepresence and high-definition video-conferencing systems, from high-end setups down to those that might run on PCs and video phones, should begin by surveying the hardware needed to implement these systems. The latest H.264 codecs are a good starting point.
H.264 CODECS
The driving technology behind telepresence and high-definition video conferencing is the H.264 video standard, which provides over twice the compression ratio of MPEG-2. Several companies make H.264 codecs, including Fujitsu Microelectronics America, W&W Communications, and Mobilygen.
Fujitsu’s MB86H51 compresses and decompresses full high-definition video (1920 dots by 1080 lines) in real time using the H.264 format (Fig. 1). This is a single-chip implementation for full HD H.264 High Profile, Level 4.0 video processing that incorporates embedded memory. It also compresses and decompresses audio in real time by utilizing formats such as the MPEG-1 Audio Layer.
The MB86H51 uses a proprietary algorithm that automatically applies less compression to areas of the image where compression artifacts are most noticeable to human vision, such as human faces or slow-moving objects, and more compression to other areas. The effect is to maximize image quality in those critical zones. This feature also makes it possible to reduce the compressed data size to between one-half and one-third that of the MPEG-2 format at an equivalent level of image quality.
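Fujitsu does not disclose how its algorithm works, so the following is only a minimal sketch of the general idea of region-adaptive compression, not the MB86H51's method. It assumes a hypothetical per-macroblock "saliency" score between 0 and 1 (for example, from a face detector) and maps it to an H.264 quantization parameter (QP), where a lower QP means finer quantization and therefore less compression. The function name, the base QP of 30, and the offset of 6 are illustrative assumptions.

```python
# Illustrative only: NOT Fujitsu's proprietary algorithm.
# Maps a hypothetical saliency score per macroblock to an H.264 QP.

QP_MIN, QP_MAX = 0, 51  # QP range for 8-bit H.264 video


def qp_map(saliency, base_qp=30, max_offset=6):
    """Return a per-macroblock QP grid.

    saliency: 2D list of floats in [0, 1], one value per macroblock
              (1.0 = visually critical, 0.0 = least critical).
    base_qp, max_offset: assumed tuning values, not from the article.
    """
    result = []
    for row in saliency:
        out_row = []
        for s in row:
            s = min(max(s, 0.0), 1.0)  # clamp malformed input
            # s = 1.0 -> base_qp - max_offset (less compression)
            # s = 0.0 -> base_qp + max_offset (more compression)
            offset = round((0.5 - s) * 2 * max_offset)
            out_row.append(min(max(base_qp + offset, QP_MIN), QP_MAX))
        result.append(out_row)
    return result


if __name__ == "__main__":
    # One row of three macroblocks: a face, background, and something between.
    print(qp_map([[1.0, 0.0, 0.5]]))
```

In a real encoder, a map like this would feed the rate-control stage, which would still have to hold the overall bit rate to its target.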
“The advantage of our chip lies in our compression algorithm,” says Davy Yoshida, director of Business Development of Fujitsu Microelectronics America. “Comparing the compression of MPEG-2 and H.264 is 2.5 times the compression. So a 25-meg image will be 10 megs, at equal quality. But our chip can compress, with very little depreciation, to a smaller size, like 25 megs to 5 megs, and still show a very good quality picture.”
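The figures in Yoshida's example follow from simple ratio arithmetic, sketched below. The function names are illustrative, and the sizes are the ones quoted above rather than measurements.

```python
def compressed_size(original, ratio):
    """Size after compressing by `ratio`, where ratio = original / compressed."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return original / ratio


def compression_ratio(original, compressed):
    """How many times smaller the compressed data is than the original."""
    if compressed <= 0:
        raise ValueError("compressed size must be positive")
    return original / compressed


if __name__ == "__main__":
    mpeg2_size = 25.0  # the "25-meg image" from the quote

    # Typical H.264 gain over MPEG-2 cited in the quote: 2.5 times.
    print(compressed_size(mpeg2_size, 2.5))

    # Fujitsu's claimed result, 25 megs down to 5 megs, as a ratio.
    print(compression_ratio(mpeg2_size, 5.0))
```

By this arithmetic, the 25-to-5 example corresponds to a fivefold reduction relative to MPEG-2, or twice the 2.5-times figure quoted for H.264 in general.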
The chip also contains two blocks of 256-Mbit fast-cycle random access memory (FCRAM) embedded on-chip. The chip measures only 15 mm squared and consumes just 750 mW. The MB86H51 comes in a 650-pin FBGA package and began mass production in July of last year, priced at $295 in sample quantities. Fujitsu plans to develop a much more cost-effective version of this codec, and it may launch in the latter half of this year.
W&W Communications’ WW10K H.264 HD codec chip set consists of the WW10000BA single-chip encoder and the WW10001BA single-chip decoder (Fig. 2). The low encode-decode tandem delay as well as the ability to encode and decode 1080p and 720p video at low bit rates suit the WW10K chip set for high-definition video-conferencing and telepresence applications.
The WW10K runs at 110 MHz in single-chip implementations of the encoder and decoder. The WW10000BA encoder compresses 1080p or 720p HD video at bit rates half those of MPEG-2 HD encoders, with 15% better peak signal-to-noise ratio (PSNR). The WW10001BA decompresses the encoder’s bit stream into quality 1080i/p or 720p HD video.
The chip set has an encode-decode tandem delay of less than 35 ms or about 1 frame at 30 frames/s, delivering performance very close to the H.264 Joint Model. It can handle up to four video inputs simultaneously at different bit rates and resolutions, up to 1920 by 1088. This makes it possible to design systems that dedicate one camera per participant or group of participants and one display per participant or group of participants, delivering more immersive and lifelike video communications experiences.
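Two of the numbers above can be checked with a short calculation. The sketch below converts the 35-ms tandem delay into frame periods at 30 frames/s. It also shows one likely reason the maximum height is given as 1088 rather than 1080: H.264 codes pictures in 16-by-16-pixel macroblocks, so a 1080-line picture is padded up to the next multiple of 16. The function names are illustrative, and the macroblock explanation is an inference, not something W&W states here.

```python
MACROBLOCK = 16  # H.264 macroblock edge length in pixels


def frame_period_ms(fps):
    """Duration of one frame in milliseconds."""
    return 1000.0 / fps


def delay_in_frames(delay_ms, fps):
    """Express a latency as a number of frame periods."""
    return delay_ms / frame_period_ms(fps)


def coded_dimension(pixels, block=MACROBLOCK):
    """Round a picture dimension up to a whole number of macroblocks."""
    return -(-pixels // block) * block  # integer ceiling division


if __name__ == "__main__":
    print(round(frame_period_ms(30), 1))      # one frame period at 30 frames/s
    print(round(delay_in_frames(35, 30), 2))  # 35-ms delay in frame periods
    print(coded_dimension(1920), coded_dimension(1080))
```

At 30 frames/s one frame lasts about 33.3 ms, so a delay under 35 ms is indeed roughly one frame.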