Desktop Video Applications - Overview

Video teleconferencing, or VTC, has long been recognized as a staple technology in the telehealth community.  VTC hardware has evolved through the years to include a range of solutions, including room-based telepresence systems that offer high-definition cameras with large screens, portable carts with cameras and monitors built into them, and dedicated desktop devices that can double as computer monitors when not in use in a conferencing session.

As healthcare providers have become increasingly comfortable with video teleconferencing solutions, they have become interested in seeing solutions that are available anytime, anywhere, without being limited to a physical location or particular network.  Manufacturers have responded to this need with a variety of software-based desktop videoconferencing applications.

Defining the Terms

The use of the term desktop videoconferencing, as found in this toolkit, may leave some people questioning exactly what is being described.  Various manufacturers use the term differently, and users tend to have their own understanding of what desktop videoconferencing is.  This toolkit defines desktop videoconferencing as a software application that supports real-time audio-video communication and operates on a personal computer.

What Desktop Videoconferencing Is

Desktop videoconferencing includes standards-based software applications: those that can communicate with existing videoconferencing endpoints and other videoconferencing systems using standards defined by the International Telecommunication Union (ITU), through its Telecommunication Standardization Sector (ITU-T), and by other standards-issuing bodies.  These standards are open (meaning non-proprietary), and can be used to facilitate communication between products from different manufacturers.

Desktop videoconferencing also includes consumer-grade software applications, referring to those that only communicate through consumer-grade networks (the internet), with connections only available between personal computers that are running the manufacturer’s software.  These applications may follow some standards, but do not support the range of ITU-T standards that would allow a full audio-video conference with another manufacturer’s videoconferencing product.  The protocols used to communicate between these software clients are mostly closed (meaning proprietary).  Servers and computers that facilitate communication between consumer-grade clients are controlled by the manufacturer.

What Desktop Videoconferencing Is Not

To further clarify what this toolkit is addressing, it is important to define what is not covered.  Web-based applications are beyond the scope of this toolkit.  This includes webinar software, which is often used to host online meetings and presentations, as well as the web-based interfaces that some videoconferencing manufacturers make available for those who do not have their client software installed. Also beyond the scope of this discussion are the various hardware-based endpoints.  Some manufacturers provide all-in-one units that have built-in cameras and conferencing capabilities.  These are condensed versions of traditional room-based hardware codecs, and are not meant to be a part of this discussion on desktop videoconferencing.

An Introduction to Standards-Based Videoconferencing Technology

The term standards-based can be a loaded one when used to define a category of desktop videoconferencing systems.  Look at a product sheet for any videoconferencing system and you will see a veritable alphabet soup: E.164, G.711, G.722, H.263, H.264, H.323, SIP, CIF, VGA, QVGA; the list goes on.  Further complicating the issue is the fact that some consumer-grade products support a couple of the standards.

To loosely define an application that garners the distinction of being standards-based, it is important to look at whether or not the system can communicate with other existing VTC equipment.  Systems that cannot communicate with other VTC products with both audio and video are not considered to be standards-based and, aside from one special case, fall under the category of a consumer-grade system.

The Standards

As this document serves as an overview of videoconferencing technology, this will be a high-level summary of the standards.  A more in-depth look at the standards used can be found here.

Communication standards help systems figure out exactly how to talk to one another.  A handful of standards provide mechanisms for one device to find another, negotiate details of how the call will be handled, manage the call while it is in session, and then successfully terminate the connection when the call is completed.  Examples of these include H.323 (and its predecessor, H.320) and Session Initiation Protocol (SIP).  
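The four stages named above (finding the other device, negotiating the call, managing it in session, and terminating it) can be pictured as a small state machine.  The following Python sketch is illustrative only: the phase names, the transition table, and the CallSession class are hypothetical, and they do not correspond to actual H.323 or SIP messages.

```python
from enum import Enum


class CallPhase(Enum):
    """Hypothetical phase names; not actual H.323 or SIP messages."""
    IDLE = "idle"
    LOCATING = "locating"        # one device finds another
    NEGOTIATING = "negotiating"  # agree on how the call will be handled
    IN_SESSION = "in_session"    # call is active and being managed
    TERMINATED = "terminated"    # connection closed cleanly


# Allowed transitions between phases (an assumption made for illustration).
ALLOWED = {
    CallPhase.IDLE: {CallPhase.LOCATING},
    CallPhase.LOCATING: {CallPhase.NEGOTIATING, CallPhase.TERMINATED},
    CallPhase.NEGOTIATING: {CallPhase.IN_SESSION, CallPhase.TERMINATED},
    CallPhase.IN_SESSION: {CallPhase.TERMINATED},
    CallPhase.TERMINATED: set(),
}


class CallSession:
    """Tracks which phase a single call is in."""

    def __init__(self):
        self.phase = CallPhase.IDLE

    def advance(self, next_phase):
        """Move to next_phase, or raise ValueError if the step is not allowed."""
        if next_phase not in ALLOWED[self.phase]:
            raise ValueError(
                f"cannot go from {self.phase.name} to {next_phase.name}"
            )
        self.phase = next_phase
```

A call that completes normally would pass through LOCATING, NEGOTIATING, IN_SESSION, and TERMINATED in that order; a call that fails during lookup or negotiation would go straight to TERMINATED.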

Audio standards include a variety of G.7xx codecs that define the sampling rate, compression algorithms, and bit rate for the transmission of audio in a videoconferencing session.  The H.323 standard, mentioned above, defines a set of audio standards that must be supported in order to be H.323-compliant, as well as a set of optional audio standards that can be used.

Video standards may feel a little more familiar to some people, as the H.264 codec is used for video encoding in many consumer products.  Some additional video standards have been used in videoconferencing technology, such as the H.261, H.263, and H.263+ standards.  One important note to make about the current, most widely-used standard (H.264) is that it contains various optional components.  Some of these, such as Scalable Video Coding, are implemented by only a handful of vendors.  This means that a “gateway” device may be needed to convert the video to a format that can be used with other products.

Resolution standards are used in some of the literature on desktop videoconferencing products, specifically when describing how much processing power is needed to handle a video of a certain size.  Examples of these resolution standards are VGA (640x480 pixels), QVGA (quarter VGA, or 320x240), CIF (352x288), and 16CIF (1408x1152).  Datasheets may say “QVGA with a 1.5 GHz P4 processor”, meaning that a computer with a 1.5 GHz processor is required to handle sending and receiving a 320x240 pixel video stream with audio.
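To give a feel for how these formats compare, the short Python sketch below computes the number of pixels per frame for each resolution quoted above.  The function names are hypothetical, and pixel count is only a rough proxy for processing load; frame rate, codec, and bit rate also matter.

```python
# Resolutions quoted in the text, as (width, height) in pixels.
RESOLUTIONS = {
    "QVGA": (320, 240),
    "CIF": (352, 288),
    "VGA": (640, 480),
    "16CIF": (1408, 1152),
}


def pixels_per_frame(name):
    """Return the number of pixels in one frame of the named format."""
    width, height = RESOLUTIONS[name]
    return width * height


def relative_to_qvga(name):
    """Return how many times more pixels a format has than QVGA."""
    return pixels_per_frame(name) / pixels_per_frame("QVGA")


if __name__ == "__main__":
    for name in RESOLUTIONS:
        print(f"{name}: {pixels_per_frame(name)} pixels, "
              f"{relative_to_qvga(name):.2f}x QVGA")
```

For example, a VGA frame carries four times as many pixels as a QVGA frame, and a 16CIF frame carries sixteen times as many as a CIF frame, which is why datasheets pair larger formats with faster processors.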

As previously stated, this link provides more information on these standards, additional standards, and what they mean.

Overview of Standards-Based Videoconferencing Technology Infrastructure

While each standards-based vendor has its own specific hardware requirements to enable communication with other systems, and while they may implement different combinations or variations of the standards, there are several key functional components shared between the vendors.  The following section describes what these parts are, and how they work together.  Please see the review of individual manufacturers for exact product descriptions and requirements.

Desktop Video - Standards-Based Graphic

Starting with the standards-based desktop videoconferencing system end-user, the VTC system contains a client application.  This is a piece of software that must be installed on the user’s computer.  The software will be configured to communicate with a centralized computer, or gatekeeper.  Users will log in to their client software, and the gatekeeper will ensure that their login credentials are valid.

The gatekeeper, as mentioned above, authenticates users.  It is also responsible for managing connections to other users’ client applications, hardware-based endpoints, and other connections within the system.  Gatekeepers also limit how many active video connections can be maintained at any given time.
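The gatekeeper's two roles described here, checking credentials and capping the number of active connections, can be sketched as a toy Python class.  Everything in it is an assumption made for illustration: the class and method names are hypothetical, the plain-text password table stands in for whatever directory a real product uses, and no vendor's gatekeeper is claimed to work this way.

```python
class Gatekeeper:
    """Toy model: checks credentials and caps the number of active calls."""

    def __init__(self, credentials, max_active_calls):
        # username -> password; plain text is for illustration only.
        self._credentials = dict(credentials)
        self._max_active_calls = max_active_calls
        # Each active call is stored as an unordered pair of usernames.
        self._active_calls = set()

    def authenticate(self, username, password):
        """Return True when the supplied credentials match a known user."""
        return (username in self._credentials
                and self._credentials[username] == password)

    def place_call(self, caller, callee):
        """Register a call if both users are known and capacity allows."""
        if caller not in self._credentials or callee not in self._credentials:
            return False
        if len(self._active_calls) >= self._max_active_calls:
            return False
        self._active_calls.add(frozenset((caller, callee)))
        return True

    def end_call(self, caller, callee):
        """Release the slot held by a finished call."""
        self._active_calls.discard(frozenset((caller, callee)))
```

With a limit of one active call, a second call attempt is refused until the first call ends, which mirrors the connection limit described above.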

For users connecting from a location outside of their organization’s network, such as a home or remote office, an intermediary device is often required.  This device is called a proxy.  It sits on the “edge” of a network, outside of the firewall and exposed to the internet.  This device routes calls through the firewall and to the gatekeeper, allowing a user to connect to the video system as easily as if they were within the network.  

Note that it is also possible to create a virtual private network (VPN) connection, bypassing the need to use a proxy device.  The VPN option may result in performance concerns, and can add a layer of complexity when handling outside connections.  This may be problematic, especially if people who do not already have network credentials need to access the videoconferencing system, such as patients or consulting physicians from another organization.

With these basic components, many systems can support in-network videoconferencing.  A room-based device can communicate with a user at home, desktop clients can communicate with one another, and other basic needs are met.

To communicate with other networks and video systems, or with different manufacturers’ infrastructure within a single network (such as a new desktop system from Manufacturer A to a legacy room-based system from Manufacturer B), additional devices are needed.  These devices, called gateways, help to translate between different communication standards and video formats.  These gateways can help connect calls from SIP to H.323 systems, or transcode (convert) from an H.264 SVC video stream to straight H.264.
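The gateway's job can be illustrated with a short Python sketch that compares two endpoints and lists the translations a gateway would have to perform.  This is a simplification under stated assumptions: the Endpoint type and gateway_tasks function are hypothetical, and each endpoint is modeled as supporting exactly one signaling protocol and one video format, whereas real systems negotiate from a list of capabilities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    signaling: str  # e.g. "SIP" or "H.323"
    video: str      # e.g. "H.264" or "H.264 SVC"


def gateway_tasks(a, b):
    """List the translations a gateway would need to perform between a and b."""
    tasks = []
    if a.signaling != b.signaling:
        tasks.append(f"translate signaling: {a.signaling} <-> {b.signaling}")
    if a.video != b.video:
        tasks.append(f"transcode video: {a.video} <-> {b.video}")
    return tasks


if __name__ == "__main__":
    desktop = Endpoint("new desktop client", "SIP", "H.264 SVC")
    room = Endpoint("legacy room system", "H.323", "H.264")
    for task in gateway_tasks(desktop, room):
        print(task)
```

An empty result means the two endpoints could talk directly; any entry in the list corresponds to a function that a gateway device would provide.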

The infrastructure described so far will only allow point-to-point communication in most manufacturers’ systems.  To support multiple simultaneous users in the same video conference, a bridge is required.  Note that bridges are sometimes called multipoint control units (MCUs).  This device manages connection speeds to each different endpoint, sending the amount of video data that can be supported.  Some configurations may show all “bridged” conference attendees on one screen, while others may show only the active speaker.
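The idea of a bridge sending each endpoint only as much video data as its connection can support can be sketched in Python as follows.  The list of available bit rates, the function names, and the example link speeds are all assumptions chosen for illustration; actual bridges use their own rate sets and adaptation logic.

```python
# Bit rates (kbit/s) the hypothetical bridge can encode, lowest first.
AVAILABLE_RATES_KBPS = [128, 384, 768, 1920]


def rate_for_endpoint(link_kbps):
    """Pick the highest available rate that fits the endpoint's link.

    Returns None when even the lowest rate does not fit.
    """
    fitting = [rate for rate in AVAILABLE_RATES_KBPS if rate <= link_kbps]
    return max(fitting) if fitting else None


def plan_conference(links_kbps):
    """Map each endpoint name to the rate the bridge would send it."""
    return {name: rate_for_endpoint(kbps) for name, kbps in links_kbps.items()}


if __name__ == "__main__":
    plan = plan_conference({"clinic": 2000, "home office": 500})
    for name, rate in plan.items():
        print(f"{name}: {rate} kbit/s")
```

In this sketch the clinic would receive the 1920 kbit/s stream and the home office the 384 kbit/s stream, so a slow participant does not force everyone else down to the lowest quality.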

Most of these systems will support content sharing with others engaged in the video conference, which means that applications and documents can be viewed within a conference session.  Additionally, features such as instant messaging, contact list management, and other communication tools are included with the infrastructure described above.

Other devices can be a part of the video infrastructure, such as recording and streaming devices that capture and play back video conferences, but they are not required to support basic videoconferencing needs and desktop videoconferencing systems.

An Introduction to Consumer-Grade Videoconferencing Technology

Using the term “consumer-grade” to describe software for videoconferencing can be, as with “standards-based”, a contentious issue.  In this particular context, “consumer-grade” reflects the fact that video traffic is transmitted over the internet, that the products do not openly communicate with other systems, and that they neither require nor allow the installation of organizationally-managed infrastructure devices within an organization’s network. 

“Consumer-grade videoconferencing” is not a reflection on the appropriateness of the technology for use in healthcare, but an acknowledgement of the different way in which these products operate.  [sidebar message: Please see this discussion on the use and configuration of consumer-grade software in medical environments for more information on whether or not it is appropriate for your organization.]

Consumer-grade videoconferencing products may use various standards in their operation.  Depending on the manufacturer, these may include encryption standards for protecting calls, the H.264 video standard for encoding and decoding, SIP for managing voice-only calls, or various text- and file-transfer standards for transmitting instant messages and documents.

Overview of Consumer-Grade Desktop Videoconferencing Infrastructure

Not all consumer-grade desktop videoconferencing systems operate in the same way.  As these are closed, proprietary systems, the exact inner-workings of their infrastructure are not always clear.  That said, there are several features that each product has in place to initiate and control video calls between client applications.

Desktop Video - Consumer Grade Graphic

As can be seen in the diagram above, the components required to support consumer desktop videoconferencing software are significantly reduced in comparison to the standards-based systems.

The client application is installed on the end-user’s computer.  It is preconfigured to communicate with a central server that is managed by the manufacturer of the software.  The end-user has control of various features, such as the logging of instant messages, call records, and which users of the manufacturer’s software can be contacted, including those who are outside of an organization’s network of preferred providers and users.

The client application communicates with a centralized authentication server.  This requires communication with an internet-based service that is hosted by the manufacturer of the product.  As these services are not a part of an organization’s own infrastructure and network, they cannot connect to the organization’s domain controllers for user accounts or other authentication features.

Call routing, similar to the gatekeeper function of the standards-based systems, is managed by the manufacturer of the product.  This may include servers hosted by the manufacturer, or may utilize other peer-to-peer “node” computers that are running the manufacturer’s software.  Using nodes can help bypass certain problems with communicating through firewalls, but raises some concerns about where the traffic is being routed.  Manufacturers may use encryption to ensure that the call initiation and the conferencing session cannot be captured by any intermediary systems.

Once a connection is established, many of these systems try to create direct connections between the client applications, reducing the bandwidth demands on any of the controlling devices.  In some situations, such as where a firewall may be preventing a call from being performed in such a direct fashion, the call may continue to be routed through intermediary routing devices.
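The "direct if possible, otherwise relayed" behavior described above can be sketched in Python.  Because these products are proprietary, the actual logic is not documented here; the function name, its parameters, and the idea of probing each path with a callable are all hypothetical stand-ins.

```python
def establish_media_path(try_direct, relays):
    """Prefer a direct connection; fall back to the first relay that works.

    try_direct: callable returning True if the two clients can reach
                each other without an intermediary.
    relays:     iterable of (name, callable) pairs; each callable returns
                True if that intermediary can carry the call.
    Returns a short description of the chosen path, or None if nothing works.
    """
    if try_direct():
        return "direct"
    for name, try_relay in relays:
        if try_relay():
            return f"relay via {name}"
    return None


if __name__ == "__main__":
    # Simulate a firewall that blocks the direct path.
    path = establish_media_path(
        try_direct=lambda: False,
        relays=[("node-a", lambda: False), ("server-b", lambda: True)],
    )
    print(path)
```

The relay branch is the case that raises the routing concerns mentioned earlier, since the call then passes through systems outside the organization's control.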

Organizations must decide whether they are comfortable with the lack of control that they may have over these products.  Again, please refer to the discussion on consumer-grade software in the medical environment for more information.

Note that not all consumer-grade videoconferencing software supports having multiple participants in a single videoconferencing session, which may make the software less suitable for some settings.  Additional features, such as instant messaging, file transfers, and contact list management, are included with this software.  Note that file transfers include the actual sending of files, as opposed to content sharing (as found with standards-based software), which shares a read-only view of the file, without allowing the viewer to interact with the file directly.

A Note on Webcams

Desktop videoconferencing software requires a webcam of some sort to allow the sending of images and audio.  These may be built into some computers, such as laptops, or may be USB devices.  The quality of the camera will play an important role in the quality of the images viewed in a videoconferencing session; a great network connection will lose much of its benefit if the video quality is too low.

Some webcams are advertised as having the ability to provide “high-definition” video streams.  There is an issue with some of these claims as there are limitations in how much video data can be sent through a USB connection.  While their sensors may be able to capture an image that qualifies as high-definition, extensive compression must be applied, or else the frame rate must be reduced to one that falls within the limitations of the USB standard.  Additionally, the stream of high-definition video places an enormous strain on a computer’s central processing unit, and may be a limiting factor in how high the image quality can be.
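A rough calculation shows why the USB limit matters.  The Python sketch below estimates the bandwidth of uncompressed video and compares it with the nominal 480 Mbit/s signaling rate of USB 2.0 High Speed.  The text does not say which USB version is meant, so USB 2.0 is an assumption here, as is the figure of 16 bits per pixel (typical of 4:2:2 formats such as YUY2); the function names are hypothetical.

```python
# Nominal USB 2.0 High Speed rate; usable throughput is lower in practice.
USB2_SIGNALING_MBPS = 480


def raw_video_mbps(width, height, fps, bits_per_pixel=16):
    """Bandwidth of uncompressed video in megabits per second (1 Mbit = 10**6 bits)."""
    return width * height * bits_per_pixel * fps / 1_000_000


def fits_usb2(width, height, fps, bits_per_pixel=16):
    """True when the raw stream is below the nominal USB 2.0 rate."""
    return raw_video_mbps(width, height, fps, bits_per_pixel) < USB2_SIGNALING_MBPS


if __name__ == "__main__":
    for label, (w, h) in {"VGA": (640, 480),
                          "720p": (1280, 720),
                          "1080p": (1920, 1080)}.items():
        print(f"{label} at 30 fps: {raw_video_mbps(w, h, 30):.1f} Mbit/s, "
              f"fits USB 2.0 nominal rate: {fits_usb2(w, h, 30)}")
```

Under these assumptions, uncompressed 1280x720 video at 30 frames per second needs about 442 Mbit/s, which sits just under the nominal rate and leaves little room once protocol overhead is counted, while 1920x1080 at 30 frames per second needs roughly 995 Mbit/s and cannot fit at all.  That is why a camera must either compress the video or lower the frame rate.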

One solution that has been applied to work around this issue is to provide hardware support and image processing within the camera, and then use proprietary data processing that reduces the strain on the CPU and the limits of the USB transfer speed.  The Tandberg PrecisionHD USB Camera uses this process to transfer video via USB, with specific features within their software providing an increase in performance when decoding the video.

The Telehealth Technology Assessment Center has not performed a comprehensive survey of USB webcams that are currently on the market; however, the TTAC is happy with the Logitech 9000 Professional webcam.  The TTAC does not endorse the Logitech webcam, and encourages organizations to perform their own review of webcams when making their purchasing decision.  A list of webcam manufacturers can be found here.


Desktop videoconferencing is still relatively new in the world of telehealth.  Organizations are finding an increasing demand to provide the ability to connect their providers, their patients, and their partners with always-on video capabilities.  These new technologies are requiring organizations to assess needs, product capabilities, and the new risks posed by changing videoconferencing modalities.  Standards-based desktop videoconferencing solutions have the promise of interoperability, while the consumer-grade products provide free and ubiquitous access to video conferencing for a wide range of users.  Educating healthcare providers, organizational leaders, and curious patients is an important aspect of implementing this technology.

Additional information is available throughout this toolkit to help readers choose between these two classes of products, understand the differences between the products’ specifications, and more.