Media Composer Family

Learn how to edit film, video, and file-based footage faster and in more ways than any other editing system.

How Light-Fields Inspired Stereoscopic 3D Editorial

Mar 17 2013, 12:00 AM by Shailendra Mathur


With only a few days to go until I give my presentation, "Avid's Stereoscopic Editorial Architecture - Light Fields, Intelligent Computing and Beyond," at the NVIDIA GPU Technology Conference (GTC) 2013 on March 19, I wanted to build on my previous blog post, "How Intelligent Computing Powers Our Editorial Architecture," and discuss the stereoscopic editorial architecture itself.


As a refresher, here is a brief recap of my first blog post. Avid’s stereoscopic editorial architecture is based on a unique heterogeneous compute architecture that we built to scale up to the performance demands of high data rate formats such as Stereoscopic 3D while maintaining a seamless editing experience. In the mainstream environment we call this the Avid Intelligent Compute Architecture; we also fondly refer to it as ACPL (Avid Component Processing Library), since that was the original project name.


In this post, I’ll cover the other aspect of the talk: the data model that we came up with for working with the Stereo 3D format. I will give a brief overview of how we started from familiar concepts in the existing Avid multi-cam and multi-resolution editing architecture, and then drew inspiration from an evolving area of research and development called light fields to design our current stereoscopic architecture.


The trend toward capture of increased granularity of visual information is undeniable. Acquisition of high frame rates, high spatial resolutions, and high bit-depth media is on the rise.


In theory, however, an increase in granularity is independent of an increase in “range”. As an illustration of range versus resolution, a 4K image could represent exactly the same framing of a scene as an HD image, but with more pixels.



Alternatively, it could represent a field of view that is twice as large as an HD view. In the first case (illustration above), the spatial range of the 4K and HD images is exactly the same, but the pixel size is smaller in the 4K image than in the HD image. In the second case, the pixel size is exactly the same, but the image ranges are different.
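To make the range-versus-granularity distinction concrete, here is a minimal sketch of the arithmetic. The `Capture` class, the 3840x2160 reading of "4K", and the scene units are assumptions chosen for illustration; they are not part of any Avid data model.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    """A captured image described by its pixel count and the scene width it covers."""
    width_px: int
    height_px: int
    scene_width: float  # horizontal range of the scene covered, in arbitrary units

    @property
    def pixel_size(self) -> float:
        """Scene width represented by one pixel (smaller means finer granularity)."""
        return self.scene_width / self.width_px


# Assumption: "HD" is 1920x1080 and "4K" is 3840x2160.
hd = Capture(1920, 1080, scene_width=1.0)
uhd_same_framing = Capture(3840, 2160, scene_width=1.0)  # case 1: same range
uhd_wider_view = Capture(3840, 2160, scene_width=2.0)    # case 2: same pixel size

# Case 1: identical range, pixels half the size of HD pixels.
assert math.isclose(uhd_same_framing.pixel_size, hd.pixel_size / 2)

# Case 2: identical pixel size, twice the horizontal range.
assert math.isclose(uhd_wider_view.pixel_size, hd.pixel_size)
assert uhd_wider_view.scene_width == 2 * hd.scene_width
```

In the first case the extra pixels buy precision for zooming and reframing; in the second they buy more of the scene to choose a framing from.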



Apart from capturing a larger field of view in a single image, the other method is to put multiple views together to create a spatial field. Similarly, multiple dynamic ranges are now being captured, either with a single highly sensitive sensor or with multiple cameras set to different exposures. Finally, in the temporal domain, capturing a larger range is easy: just leave the video camera running for longer!


These trends contribute to a vision we are following: “Super-sample the world, edit it later!”


As an editor and a storyteller, wouldn’t it be great if you had access to more captured views of the scene? You’d have more elements to work with, even if you were not there when it was initially shot. You could zoom in, pan, re-project, slow down, speed up, increase contrast in low light, expose highlights without blooming, and so on, all without losing precision. Not losing precision means that the data you’ve captured has very high granularity, so you don’t face aliasing when making a new framing decision within the captured scene.


If we are to manage all of this data for you, we need a data model that can organize multiple captured views of the scene. We need tools to ensure that the various views can be expressed as “sync” relationships with each other: temporally, spatially and in color. We need a player that knows how to output one or more views from the same scene. Because the views are sampled at various resolutions, we also need to take resolution into account so we can reconstruct the right quality of view for you. In doing all of this, we need to make sure that common workflows and functions such as proxy editing, transcode, consolidate, delete unused, etc., map to these data sets.


Light field theory provides a framework for storing multiple viewpoints of a scene. If we consider the multiple images capturing the scene as samples of a light field, one requirement that arises is that they need to be placed in common coordinate systems in the temporal, spatial and color domains. That is how we will be able to address individual samples or know how to construct a novel view …


Aha, this is where our multi-cam and sync-clip paradigms come in. The tools that go with these grouped clips provide various methods of sync’ing multiple cameras into a common time reference. What if those tools were extended for aligning spatial views with respect to each other? How about tools to align the colors of the view samples? The same multi-cam architecture also allows us to choose different output views over time from the various camera angles. What if the same architecture allowed us to output various spatial and color views, as well?
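As a rough illustration of the multi-cam idea described above, the sketch below places several camera angles on a common time reference and switches the output angle over time. All names here (`Angle`, `GroupClip`, `sync_offset`, `switches`) are hypothetical and chosen for this example; this is not Avid's actual data model or API.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Angle:
    """One camera angle, synced to the group by a fixed frame offset (assumed model)."""
    name: str
    sync_offset: int  # frames to add to group time to get this camera's own frame

    def source_frame(self, group_frame: int) -> int:
        return group_frame + self.sync_offset


@dataclass
class GroupClip:
    """Several angles sharing a common time reference, plus angle switch points."""
    angles: Dict[str, Angle]
    switches: List[Tuple[int, str]]  # (group_frame, angle_name), sorted by frame

    def resolve(self, group_frame: int) -> Tuple[str, int]:
        """Return (angle name, source frame) to play at the given group frame."""
        starts = [start for start, _ in self.switches]
        index = bisect_right(starts, group_frame) - 1
        if index < 0:
            raise ValueError("no angle selected before this frame")
        angle = self.angles[self.switches[index][1]]
        return angle.name, angle.source_frame(group_frame)


group = GroupClip(
    angles={"A": Angle("A", sync_offset=0), "B": Angle("B", sync_offset=12)},
    switches=[(0, "A"), (100, "B")],
)
```

With this setup, `group.resolve(50)` selects angle A at its own frame 50, while `group.resolve(100)` selects angle B at its own frame 112. Extending the same structure with spatial and color alignment per angle is the step the questions above point toward.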


Using this inspiration, we developed the data model and the run-time behind the new Stereo Clip assets, as well as the editing model. Those of you who are already using this feature are familiar with the methods of temporally sync’ing the left, right or other views that get added to the stereo clip. If you expand the source-side timeline for a stereo clip edited into a sequence, you will also find spatial and color alignment tools there. To output new views, the effects sub-system, project setup and the viewing sub-system can request independent views from the same S3D grouped clip; this is currently limited to the different variations of Stereo 3D output views. The S3D clips also keep track of the different resolutions available for the various output views: the same S3D clip tracks the full-frame versions of the left and right views together with any frame-compatible proxy versions.
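The idea of one clip answering requests for different views at different resolutions can be sketched as follows. This is only an illustration under assumed names (`MediaVersion`, `StereoClip`, `request`) and an assumed selection rule (largest version that fits the requested width); it does not describe how Media Composer implements S3D clips.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class MediaVersion:
    """One stored version of a view (hypothetical structure)."""
    label: str
    width: int   # per-eye width in pixels
    height: int
    frame_compatible: bool  # True when both eyes are packed into a single frame


@dataclass
class StereoClip:
    """Groups the versions available for each view, e.g. 'left' and 'right'."""
    versions: Dict[str, List[MediaVersion]] = field(default_factory=dict)

    def add(self, view: str, version: MediaVersion) -> None:
        self.versions.setdefault(view, []).append(version)

    def request(self, view: str, max_width: int) -> MediaVersion:
        """Pick the largest version of a view no wider than max_width (assumed rule)."""
        candidates = [v for v in self.versions.get(view, []) if v.width <= max_width]
        if not candidates:
            raise LookupError(f"no version of view {view!r} fits width {max_width}")
        return max(candidates, key=lambda v: v.width)


full = MediaVersion("full-frame", 1920, 1080, frame_compatible=False)
# Assumption: a side-by-side proxy carries each eye at half the horizontal width.
proxy = MediaVersion("side-by-side proxy", 960, 1080, frame_compatible=True)

clip = StereoClip()
for eye in ("left", "right"):
    clip.add(eye, full)
    clip.add(eye, proxy)
```

Here `clip.request("left", 1920)` returns the full-frame version, while `clip.request("left", 1000)` falls back to the proxy, so a viewer or effect can ask for the view it needs without knowing which media files back it.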


While the usability and feature set have been tuned to the particular needs of stereoscopic workflows, hopefully you can see where we are coming from, and where we can go as the industry evolves.


Ready to learn more? Join me at the GPU Technology Conference (GTC) 2013 on March 19, where you can hear more about the stereoscopic architecture during my presentation, "Avid's Stereoscopic Editorial Architecture - Light Fields, Intelligent Computing and Beyond." I’ll also be available after the presentation to talk with you about this topic and to answer any questions you have.


Thank you,


Shailendra Mathur


About Shailendra Mathur

Shailendra Mathur is the Chief Architect for video products at Avid, with technology oversight over the editing, video server and broadcast graphics products. His responsibilities involve working with customers and technology partners externally, and with product management and engineering internally, to translate customer and business requirements into architectural and design strategies. With over 18 years of experience in the media industry and a research background in computer vision and medical imaging, Shailendra has contributed to a wide gamut of technical products and solutions in the media space. Beyond his responsibilities in product development, his research and engineering interests have led to multiple publications and patents in the areas of computer vision, medical imaging, visual effects, graphics, animation, media players and high performance compute architectures. Over the past few years, understanding the art and science of stereoscopy, color, high frame rates and high resolutions, and applying them to storytelling tools, has been a passion. Other areas of interest are file-based workflows, asset management and the trends around the merging of IT and broadcast technologies.

© Copyright 2011 Avid Technology, Inc.