Videoconferencing Endpoints - Standards

Videoconferencing systems rely on a large collection of standards for everything from displaying video to establishing connectivity with an endpoint across the globe.  The vast majority of features implemented by videoconferencing systems have a corresponding standard.  Not all manufacturers support all of the standards in the same way, resulting in some proprietary implementations of features that can make interoperability a problem.

One of the main standards-issuing bodies in this field is the International Telecommunication Union, which publishes its standards through the ITU Telecommunication Standardization Sector (ITU-T).  Many of the videoconferencing standards, including the G-series and H-series standards, are created by this group.

Video Standards

Video standards address the compression and playback of video data.  They are used in many areas outside of videoconferencing, including multimedia systems both online and on computers.  The two prominent video standards, H.263 and H.264, are widely adopted throughout videoconferencing systems.  Older standards such as H.261 are used less frequently.

The H.264 standard includes a series of annexes that add features or functions to the video standard.  These annexes are optional, and manufacturers do not implement their features uniformly.  One relevant example is Scalable Video Coding (SVC), defined in Annex G.  SVC allows separate detail layers, or sub-bitstreams, to be created, which can translate into a smoother video image on networks with certain connectivity issues.
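As a rough illustration of the layered sub-bitstream idea in Scalable Video Coding, the sketch below picks as many layers as fit within the available bandwidth.  The layer names and bitrates are hypothetical values chosen for the example, not figures from the H.264 standard or any vendor, and real encoders and endpoints negotiate layers in more involved ways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    kbps: int  # hypothetical bitrate contributed by this layer


# Hypothetical layer stack: one base layer plus two enhancement layers.
LAYERS = [
    Layer("base (low resolution)", 256),
    Layer("enhancement 1 (higher frame rate)", 256),
    Layer("enhancement 2 (higher resolution)", 512),
]


def select_layers(available_kbps, layers=LAYERS):
    """Return the layers that fit in the available bandwidth, in order.

    Each enhancement layer depends on the layers before it, so selection
    stops at the first layer that does not fit.
    """
    chosen = []
    used = 0
    for layer in layers:
        if used + layer.kbps > available_kbps:
            break
        chosen.append(layer)
        used += layer.kbps
    return chosen


if __name__ == "__main__":
    for budget in (100, 600, 2000):
        names = [layer.name for layer in select_layers(budget)]
        print(budget, "kbps ->", names)
```

Under these assumed bitrates, a receiver on a constrained link would decode only the base layer, while a receiver with more bandwidth would add the enhancement layers on top of it.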

Annexes that are not universally supported pose a particular challenge in the videoconferencing world, as their benefits will not be seen on systems that have not implemented them.  Similarly, because manufacturers may implement the annexes differently, there is no guarantee that the features will work between systems without the use of intermediary devices.


Along with video standards, there is a set of standards for the resolution of video images.  A variety of terms are used to describe how many pixels there are in a video image, many of which are seen in areas beyond videoconferencing.  High definition and standard definition are the terms commonly used, but additional terms such as high resolution, VGA, 720p, and 1080p are often used as well.  Some of the terms are very clearly defined, while others are defined more broadly.

High definition refers to any video signal that is of a higher resolution than standard definition video, though in practice it usually means vertical pixel counts of 720 and 1080, with respective horizontal pixel counts of 1280 and 1920.  Two additional terms are often attached to these resolutions: progressive and interlaced.  Both refer to the method by which the image is sampled from the imaging sensor.

A progressively scanned image updates the entire image at once, meaning that a paused video image will show a complete, clear picture.  Interlaced scanning samples the image in horizontal lines, alternating between odd and even lines.  On frame one of a video image, all odd lines are captured, while the next frame captures all even lines.  As a result, a paused video image will display alternating bands of video from the most current frame and the preceding frame.
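The effect of interlacing on a paused image can be sketched with a toy example.  The frames below are hypothetical four-line images in which each string stands in for one horizontal scan line; this only illustrates the odd/even alternation described above and is not a model of any real video pipeline.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into odd and even fields.

    Lines are numbered from 1, so frame[0] is line 1 (an odd line).
    """
    odd = frame[0::2]
    even = frame[1::2]
    return odd, even


def weave(odd_field, even_field):
    """Interleave an odd field and an even field back into one frame."""
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.extend([odd_line, even_line])
    return frame


# Two hypothetical frames captured at successive instants.
frame_a = ["A1", "A2", "A3", "A4"]
frame_b = ["B1", "B2", "B3", "B4"]

# Interlaced capture keeps only the odd lines of the first instant
# and only the even lines of the next one.
odd_a, _ = split_fields(frame_a)
_, even_b = split_fields(frame_b)

paused = weave(odd_a, even_b)
print(paused)  # ['A1', 'B2', 'A3', 'B4']
```

The paused image mixes lines from two different instants, which is why motion appears as alternating bands.  A progressive capture would show all four lines from a single instant.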

Standard definition covers video signals that are in either the PAL or NTSC standards.  NTSC is the standard used in the United States, and covers a resolution of 640 x 480 pixels.  Other common resolutions and associated acronyms include 400p, 448p, SIF, CIF, 4SIF, 4CIF, QSIF, QCIF, VGA, XGA, and QVGA.

Despite the increased interest in high-definition video, standard definition video still has a place in telemedicine, as it requires less bandwidth and is still the format of choice for consumers using medical imaging devices that connect to videoconferencing endpoints.
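To give a sense of scale for the resolutions mentioned above, the short calculation below compares the number of pixels per frame.  It counts raw pixels only; the bandwidth a call actually uses also depends on frame rate and on the compression applied, which this sketch does not model.

```python
# Width and height in pixels for the resolutions discussed in the text.
RESOLUTIONS = {
    "640 x 480": (640, 480),
    "1280 x 720": (1280, 720),
    "1920 x 1080": (1920, 1080),
}


def pixels_per_frame(width, height):
    """Return the total number of pixels in one frame."""
    return width * height


baseline = pixels_per_frame(640, 480)

for name, (width, height) in RESOLUTIONS.items():
    count = pixels_per_frame(width, height)
    ratio = count / baseline
    print(f"{name}: {count:,} pixels ({ratio:.2f}x the 640 x 480 frame)")
```

A 1280 x 720 frame carries three times as many pixels as a 640 x 480 frame, and a 1920 x 1080 frame carries 6.75 times as many.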


Audio Standards

The G.nnn standards (such as G.711, G.722, and G.729) are the audio standards used in videoconferencing.  Support for G.711 is mandated by the H.323 standard, which means that all H.323-compliant systems must support G.711.

Other G.nnn standards are included in annexes to the H.323 standard, providing various sampling frequencies and bandwidth optimizations.  G.722 provides an increase in the sampling rate of the incoming audio, which increases fidelity at the cost of additional bandwidth.  G.729 requires less bandwidth, as it is optimized for frequencies associated with speech.  It is unclear how G.729 may affect the transmission of certain sounds associated with telestethoscopy over the standard audio-input and microphone lines.  This should not affect serial stethoscopes, which do not send audio data through an audio input on the endpoint.
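The bandwidth difference between codecs can be made concrete with a small calculation.  The sketch below uses the nominal bitrates of G.711 (8 kHz sampling at 8 bits per sample, or 64 kbit/s) and G.729 (8 kbit/s).  The 20 ms packet interval is an assumption chosen for illustration, and the figures cover the audio payload only, not network packet overhead.

```python
G711_BPS = 8000 * 8  # 8 kHz sampling x 8 bits per sample = 64 kbit/s
G729_BPS = 8000      # nominal G.729 bitrate, 8 kbit/s


def payload_bytes(bitrate_bps, packet_ms=20):
    """Return the audio payload size in bytes for one packet.

    packet_ms is an assumed packetization interval, not a value
    taken from the text.
    """
    return bitrate_bps * packet_ms // 1000 // 8


for name, bps in (("G.711", G711_BPS), ("G.729", G729_BPS)):
    print(f"{name}: {bps // 1000} kbit/s, "
          f"{payload_bytes(bps)} bytes of audio per 20 ms packet")
```

With these assumptions, G.711 carries 160 bytes of audio per packet and G.729 carries 20 bytes, an eightfold reduction in payload.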

Multimedia Call Control Standards

There are two primary standards used for initiating a call between videoconferencing systems – H.323 and Session Initiation Protocol (SIP).  These standards are designed to manage call signaling, call control, and media streaming.  At this time, SIP and H.323 are not interoperable standards, though calls can occur between systems using the different standards if a gateway infrastructure is in place.

Addressing, or defining a unique identifier for the endpoints, is done differently in SIP and H.323.  SIP addresses are formatted as Uniform Resource Identifiers (for example, a user name such as jsmith followed by @ and a domain).  H.323 addresses can be IP addresses, H.323 aliases (for example, a user name followed by @ and a domain), or E.164 addresses (e.g., 18005551234).
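The address forms described above can be told apart with a few simple checks.  The following is a minimal sketch, not a validator for either standard: the function name classify_address is hypothetical, the sample addresses use reserved documentation values (example.com and 192.0.2.10), and a bare user@domain string is reported generically because it may be either a SIP address or an H.323 alias.

```python
import ipaddress
import re

# E.164 numbers contain at most 15 digits; a leading "+" is tolerated here.
E164_RE = re.compile(r"^\+?[1-9]\d{1,14}$")


def classify_address(address):
    """Return a rough label for a videoconferencing dial string."""
    addr = address.strip()
    if addr.lower().startswith(("sip:", "sips:")):
        return "SIP URI"
    try:
        ipaddress.ip_address(addr)
        return "IP address"
    except ValueError:
        pass
    if E164_RE.match(addr):
        return "E.164 number"
    if "@" in addr:
        return "user@domain (SIP address or H.323 alias)"
    return "unknown"


if __name__ == "__main__":
    samples = [
        "sip:jsmith@example.com",  # hypothetical SIP URI
        "192.0.2.10",              # documentation-range IP address
        "user@example.com",        # hypothetical alias
        "18005551234",             # E.164 example from the text
    ]
    for sample in samples:
        print(sample, "->", classify_address(sample))
```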

Content Sharing

Sharing content from a computer is managed through the H.239 standard.  This is an ITU-T recommendation for the H.323 standard, supporting the streaming of a still image (typically taken from a VGA-type input) to another endpoint.  Systems that only use SIP will display the content as a part of the main video, while systems that use H.323 will have the content sent to another screen, if available.  Some calls that are signaled and managed via SIP may use H.323 as a part of their infrastructure, thereby allowing content to be shared.

Serial Data Transfer

Some medical devices, such as electronic stethoscopes, may be capable of providing a serial output.  Some videoconferencing manufacturers support the transmission of this serial data in a separate channel in the course of videoconferencing, allowing for the serial data to be sent alongside standard video and audio content.

It is important to note that at this time vendors do not send serial data in a standardized, agreed-upon way, which results in a lack of interoperability between manufacturers.  In the course of TTAC evaluations, the following was noted:

  1. It was not possible to send serial data between two different manufacturers’ endpoints.
  2. It was not possible to send content via a single manufacturer’s Multipoint Control Unit (MCU) and their respective endpoints in a bridged call.
  3. It was, however, possible to send content via that particular manufacturer’s MCU and a different manufacturer’s endpoints.


Encryption and Security

Both H.323- and SIP-controlled calls can support encryption.  Media encryption is typically done with an implementation of the Advanced Encryption Standard (AES), with some options for encrypting signaling with an implementation of the Secure Sockets Layer (SSL) protocol.  H.323 calls support the H.235 standard, which details requirements for encryption and integrity.  It is used in conjunction with the H.245 standard, which handles many issues, including authentication.  It is important to ensure that endpoints are properly configured; some systems may drop encryption if the connecting site does not support it, while others may disallow connections from or to unencrypted systems.
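The configuration behaviors described above can be summarized as a small decision table.  The sketch below is illustrative only: the policy names (OFF, BEST_EFFORT, REQUIRED) are hypothetical labels, not settings from any particular vendor, and real endpoints negotiate encryption through their signaling protocols rather than through a single function.

```python
from enum import Enum


class Policy(Enum):
    OFF = "off"                  # never encrypt
    BEST_EFFORT = "best effort"  # encrypt when the far end supports it
    REQUIRED = "required"        # refuse unencrypted calls


def negotiate(local_policy, remote_supports_encryption):
    """Return 'encrypted', 'unencrypted', or 'rejected' for a call."""
    if local_policy is Policy.OFF:
        return "unencrypted"
    if remote_supports_encryption:
        return "encrypted"
    if local_policy is Policy.BEST_EFFORT:
        # The call connects, but encryption is silently dropped.
        return "unencrypted"
    return "rejected"


if __name__ == "__main__":
    for policy in Policy:
        for remote in (True, False):
            print(policy.value, remote, "->", negotiate(policy, remote))
```

The best-effort case is the one to watch: the call succeeds, but without encryption, which may not be obvious to the participants.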

Security may be implemented with a variety of protocols, including Secure Sockets Layer (SSL), Transport Layer Security (TLS), Secure Real-Time Transport Protocol (SRTP), Secure Shell (SSH), and Internet Protocol Security (IPsec).

NAT Traversal and Firewalls

Firewalls, which are an important part of organizational network infrastructures, can pose several challenges for videoconferencing.  They can restrict what traffic is allowed into and out of a network, and may perform Network Address Translation (NAT), which effectively obscures the exact IP addresses of computers within a network.  As endpoints often need to communicate through firewalls, NAT traversal techniques and standards need to be implemented.

H.460 is the most common ITU-T standard for NAT traversal.  Other methods for traversal include Session Traversal Utilities for NAT (STUN), Traversal Using Relays around NAT (TURN), and Interactive Connectivity Establishment (ICE).
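As one concrete example of these methods, a STUN client learns its public address by sending a short Binding Request to a STUN server and reading the address the server saw.  The sketch below only builds the 20-byte request header (message type, length, magic cookie, and a 12-byte transaction ID) as laid out in the STUN specification.  It does not send anything on the network, and the function name build_binding_request is a hypothetical helper for this example.

```python
import os
import struct

STUN_BINDING_REQUEST = 0x0001   # message type for a Binding Request
STUN_MAGIC_COOKIE = 0x2112A442  # fixed value defined by the STUN spec


def build_binding_request(transaction_id=None):
    """Return the bytes of a STUN Binding Request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be exactly 12 bytes")
    # Header: 16-bit type, 16-bit attribute length (0 here), 32-bit cookie.
    header = struct.pack("!HHI", STUN_BINDING_REQUEST, 0, STUN_MAGIC_COOKIE)
    return header + transaction_id


if __name__ == "__main__":
    message = build_binding_request()
    print(len(message), "bytes:", message.hex())
```

A full client would send this message over UDP to a STUN server and parse the mapped address from the response; that part is omitted here.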