After a failed initial attempt in the 1950s to bring three-dimensionality into the entertainment sector, 3D technologies have now reached a point where media production, transport, and consumption are of sufficient speed and quality to finally advance the complete media chain from two-dimensional (2D) to three-dimensional (3D) information. Production is based either on stereo‐ or multi‐scopic video capturing techniques or on synthetic environments that can mimic or adopt real‐life behavior. Providers can transport the resulting 3D media over the existing telecommunication infrastructure, and users can consume it on single‐ or multi‐view 3D devices such as TVs, goggles, or projectors. The Visual Communication Department in Bell Labs' Application Research Domain has taken on the research challenge of replacing a user's avatar in a 3D virtual environment with a real-time representation of the user derived from the user's streaming 3D video image. This makes the tedious process of avatar creation obsolete and lets people gesture naturally and convey facial expressions within these synthetic worlds. This paper highlights the technical challenges of designing and implementing an architecture for innovative, immersive communication systems. We also discuss the technical limitations of the current approach and identify major open issues. © 2011 Alcatel‐Lucent.