
1 Introduction

Entertainment is presently undergoing radical transformations driven by technological developments, advances in the state of the art, and social changes. These transformations will shape the way cultural audio-visual products are delivered and consumed.

A direct consequence of this scenario is that new creative possibilities become available to content producers. Here we explore one such avenue, related to live editing and virtual cinematography.

2 Related Work

A seminal work in the area of interest is “Live Cinema”, an initiative of the film director Coppola [2]. His proposal is an attempt to combine theater, film and television as an experimental form of storytelling.

In this setting, performances are acted live and viewed by an audience in real time on a movie screen. The goal is to achieve a more cinematic look and feel than is typical of dramatic broadcasts, using professional television technology borrowed from TV sports.

In order to demonstrate the concept, Coppola promoted a workshop with American Zoetrope at the UCLA School of Theater, Film and Television in 2016. During a one-month period, 75 UCLA students and faculty produced a 27-minute piece called “Distant Vision” for a live broadcast to a limited audience. The production involved operating over 40 cameras, acting, and working on sound, set design and construction, costume, props, editing, and stage management.

The main issue of the project is the coordination of all these practical aspects within the constraints of traditional video technology. In that respect, we can arguably say that Live Cinema was a vision ‘ahead of its time’, impaired in many ways by the physicality of the medium.

Another related work is [4], where probabilistic editing is applied to video in a post-production phase; consequently, it cannot be used for a live show.

3 VR Kino+Theater

VR Kino+Theater [6] is a new platform for storytelling that shares various aspects of Coppola’s vision of Live Cinema. The main difference is that it is based on 3D Computer Graphics and Digital Network Communications.

The platform we propose integrates traditional forms of entertainment, such as Theater and Cinema, with advanced technology, more specifically Virtual Reality and Gaming.

The main components of VR Kino+Theater exploit the concepts of Situated Participatory Virtual Reality and Live 3D Digital Cinema. We believe this initiative points to directions for the future of media.

3.1 Situated Participatory VR

Situated Participatory Virtual Reality [7] is a modality of VR that allows the creation of Shared Multi-User Virtual Environments. For this purpose, it combines real and virtual objects in tangible spaces, where the participants, represented by digital avatars, are completely immersed in a simulated world. They use VR headsets and markers for full body motion capture.

The above setting implements the Theater component of the platform. As such, the actors perform on a VR stage that is mapped into a virtual set. Figure 1 shows the real actors performing on the VR stage and the corresponding action of their avatars in the CG virtual set.

Fig. 1. VR theater: VR stage and CG virtual set.

3.2 Live 3D Cinema

Live 3D Digital Cinema is the technology behind the non-immersive Audio-Visual presentation format of VR Kino+Theater. It consists of the Computer Graphics infrastructure for Animation, Real-Time Simulation and Rendering of the experience.

The virtual cinematography framework includes Pre-Programmed Cameras and Interactive Editing for generating the cinematic content.

The above setting implements the Kino component of the platform. In this context, the director selects, in real time, the views that are shown on the live movie projection screen.

Figure 2 shows the director operating a multi-camera switcher during a live presentation.

Fig. 2. Director operating the camera switcher and detail of the interface.

Figure 3 shows the image selected by the director at a moment of the presentation (the lower right camera in the interface).

Fig. 3. Image of the selected camera exhibited on the movie screen.

4 VR Kino+Theater Cinematography

In this section we give an overview of the VR Kino+Theater Cinematography. It is composed of a camera specification infrastructure and an interface for real-time camera selection by the director.

The camera specification infrastructure is implemented through a layered architecture with three levels: Unity CG Cameras; Cinemachine Camera Operators; and K+T Virtual Cameras. The director interface consists of a live camera switcher.

4.1 Unity Camera

The Unity Camera Layer corresponds to the low level of the camera specification infrastructure. It consists of a standard Computer Graphics Camera of the Unity Game Engine [5]. The camera is defined by the usual parameters, such as position, orientation, field of view, etc.
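
As an illustration of this low-level layer, the following sketch models the basic camera state as a plain data record. It is a hypothetical Python analogue for exposition only; the actual layer is a standard Unity Camera component, and all field names here are illustrative.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical sketch of the low-level camera state that the higher layers
# ultimately write into the Unity Camera each frame; names are illustrative.
@dataclass
class LowLevelCamera:
    position: Tuple[float, float, float]   # world-space position
    rotation: Tuple[float, float, float]   # orientation as Euler angles (degrees)
    field_of_view: float                   # vertical field of view (degrees)
    near_clip: float = 0.3
    far_clip: float = 1000.0

# Example: a camera at eye height, five meters back from the stage origin.
cam = LowLevelCamera(position=(0.0, 1.7, -5.0),
                     rotation=(0.0, 0.0, 0.0),
                     field_of_view=60.0)
print(cam)
```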

4.2 Cinemachine Operators

The Cinemachine camera operators constitute the intermediate level of the camera specification infrastructure. These camera operators embody a framework for smart, programmable cameras. In that respect the operator knows about the entities in a Unity scene and controls the camera specification based on visual composition rules.

The two main control mechanisms are the Composer and the Transposer, which allow the camera to be specified in screen space and scene space, respectively.
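
To make the distinction concrete, the sketch below contrasts the two mechanisms in plain Python. It is a conceptual analogue only and does not reproduce the actual Cinemachine C# API: the Transposer-like object positions the camera in scene space relative to a follow target, while the Composer-like object aims it so the look-at target stays near a chosen screen anchor.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]

# Conceptual sketch (not the Cinemachine API): scene-space positioning
# versus screen-space aiming.
@dataclass
class TransposerLike:
    follow_offset: Vec3                        # offset from the follow target

    def position(self, target_pos: Vec3) -> Vec3:
        # Place the camera at a fixed offset from the followed entity.
        return tuple(t + o for t, o in zip(target_pos, self.follow_offset))

@dataclass
class ComposerLike:
    screen_anchor: Tuple[float, float] = (0.5, 0.5)   # desired on-screen spot

    def aim(self, camera_pos: Vec3, target_pos: Vec3) -> Vec3:
        # Direction the camera must face to keep the target near the anchor
        # (the screen-space offsetting itself is omitted for brevity).
        return tuple(t - c for t, c in zip(target_pos, camera_pos))

follow = TransposerLike(follow_offset=(0.0, 2.0, -4.0))
aim = ComposerLike()
cam_pos = follow.position((1.0, 0.0, 3.0))
print(cam_pos, aim.aim(cam_pos, (1.0, 1.5, 3.0)))
```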

4.3 Kino+Theater Cameras

The VR Kino+Theater Camera layer forms the highest level of the camera specification infrastructure. It consists of an object-based abstraction that creates the entity of a Virtual Cameraman and allows these objects to be instantiated for specific purposes.

In that respect, the Kino+Theater Cameras are designed for Cinematographic Storytelling and allow the director to compose shots for each scene of the narrative.

There are two classes of camera objects: General; and Timeline. General cameras refer to shot types that can be used freely during a scene, while Timeline cameras are meant to be used at certain times along the scene and are choreographed for specific events of the action.
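
As a minimal sketch of this abstraction, the code below models a Virtual Cameraman with the two classes described above; the class names, fields, and example cameras are hypothetical, not the platform's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sketch of the K+T camera object classes.
@dataclass
class VirtualCameraman:
    name: str
    shot_type: str        # e.g. "close-up", "medium", "wide", "POV"
    target: str           # scene entity the camera frames

@dataclass
class GeneralCamera(VirtualCameraman):
    """Shot type that can be used freely at any point of the scene."""

@dataclass
class TimelineCamera(VirtualCameraman):
    """Choreographed for a specific event of the action."""
    start_time: float = 0.0          # seconds into the scene
    end_time: Optional[float] = None

cameras: List[VirtualCameraman] = [
    GeneralCamera("CU_Miranda", "close-up", "Miranda"),
    GeneralCamera("POV_Prospera", "POV", "Prospera"),
    TimelineCamera("T1_Entrance", "wide", "Stage", start_time=12.0, end_time=20.0),
]
```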

4.4 Kino+Theater Camera Switcher

The Director’s Interface allows control of the live image on the movie screen by selecting the active view using a special-purpose multi-camera switcher.

This interface contains the views of 12 pre-programmed cameras showing the CG simulation in real time. A view is activated by a simple click. The director interface also contains additional controls for triggering simulation events. The cameras are divided into two blocks: one block with 8 general multi-purpose cameras, such as close-ups, medium shots and character points of view; and another block with a sequential list of timeline cameras, which are custom designed for specific parts of the action. Figure 4 shows the Kino+Theater Camera Switcher interface.

Fig. 4. Multi-camera switcher interface.
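
A minimal sketch of the switcher state, assuming hypothetical camera names, might look as follows; the block of 8 general cameras and the sequential timeline block mirror the description above, and selecting a view routes it to the movie screen.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sketch of the camera switcher state (not the actual interface).
@dataclass
class CameraSwitcher:
    general: List[str]          # 8 general multi-purpose cameras
    timeline: List[str]         # sequential, scene-specific timeline cameras
    active: str = ""            # view currently routed to the movie screen

    def select(self, camera_name: str) -> None:
        """Activate a view with a single click."""
        if camera_name not in self.general and camera_name not in self.timeline:
            raise ValueError(f"unknown camera: {camera_name}")
        self.active = camera_name

switcher = CameraSwitcher(
    general=["CU_Miranda", "CU_Prospera", "POV_Miranda", "POV_Prospera",
             "Mid_Front", "Mid_Back", "Mid_Left", "Mid_Right"],
    timeline=["T1_Entrance", "T2_Storm", "T3_Epilogue_Zoom"],
)
switcher.select("CU_Prospera")   # this view is shown on the movie screen
```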

5 Reaching a Higher Level

The VR Kino+Theater Cinematography infrastructure implements a powerful mechanism for live editing of cinematic experiences. Nonetheless, the image director operates the camera switcher interface through rather explicit control: he or she has to execute every single cut of the visual piece at precise moments in real time.

The scenario described above motivates us to seek a higher level of control that provides more expressive power. In that sense, the goal is to allow the director to act as a DJ, using an interface designed for stylistic control and live improvisation.

Such an interface has to expose the “right parameters” in a concise and intuitive way. It should be pre-configured based on the scene content and the desired cinematic style variations. The key to creating this device is to exploit the concept of generative interfaces.

Another important point is that the proposed functionality should be built on top of the VR Kino+Theater Application Framework and rely on the layered architecture of its Cinematography infrastructure.

5.1 Stylistic Control

The Stylistic Control is based on Shot Classes and Timeline Events.

The Shot Classes are related to visual characteristics of the movie image, for example close-up, medium, and wide shots. The Timeline Events are related to specific moments when an action occurs.

These two style elements are combined using Cinematic Rules that are part of the Cinematographic language. Together they deal with aspects such as the pacing of cuts.

5.2 Architecture

The Cinematography Infrastructure of VR Kino+Theater is implemented using a layered architecture. It consists of several layers for the camera entities.

As presented in Sect. 4, the first three layers correspond, respectively, to the abstraction levels of the Unity Camera, the Cinemachine Operator, and the K+T Cameraman.

In order to incorporate style control, we extend this hierarchy to include a higher-order level: the K+T AutoShot. Figure 5 shows the complete camera abstraction hierarchy.

Fig. 5. Camera abstraction hierarchy.

The K+T AutoShot embodies a high-level cinematic style control that is the basis of probabilistic editing, as will be discussed in the next section.
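
As a foretaste of that section, the sketch below reduces an AutoShot to a probability table over camera names and samples the next cut from it; the class name, camera names, and probabilities are illustrative assumptions, not the platform's implementation.

```python
import random
from typing import Dict

# Hypothetical sketch: an AutoShot sits above the K+T Cameraman layer and
# chooses the next shot according to a (here deliberately simple) style model.
class AutoShot:
    def __init__(self, style_weights: Dict[str, float]):
        total = sum(style_weights.values())
        self.style = {cam: w / total for cam, w in style_weights.items()}

    def next_shot(self) -> str:
        cameras = list(self.style)
        weights = [self.style[c] for c in cameras]
        return random.choices(cameras, weights=weights, k=1)[0]

auto = AutoShot({"CU_Miranda": 0.2, "CU_Prospera": 0.8})
print(auto.next_shot())   # probabilistically chosen next cut
```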

6 Probabilistic Editing

Probabilistic Editing is a framework for the design of cinematic style in live audio/visual performance presentations. The framework is based on Film Grammar, Stylistic Edit Patterns and Flow of Action in order to create a Live High-Level Control mechanism.

In this framework the Film Grammar maps to the concept of Camera Groups, the Stylistic Patterns are represented by a Cut Graph and the Flow of Action follows mark-up sequences in the Timeline.

The result is a generative edit interface that extends the VR Kino+Theater camera switcher.

6.1 Camera Groups

Camera Groups embody the main conceptual entity that is manipulated in the probabilistic setting of our framework. They form the building blocks of Cut Graphs (see next subsection) that are used to represent a cinematic style design.

Typically, camera groups are created by the image director following classification principles that are based on shot classes. Furthermore, they are defined per scene, i.e., they depend on the narrative content and the staging of specific scenes.

For example, Fig. 6 illustrates a camera group for the CloseUps in the Cell Scene of the experiment “The Tempest”. This group includes the close-up shots of the characters Miranda and Prospera.

Fig. 6. Camera Group for CloseUp Shots of the Cell Scene in The Tempest.
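
A camera group can be represented by a very small data record, as in the sketch below, which reproduces the CloseUps group of the Cell Scene; field and camera names are illustrative, not the actual implementation.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sketch of a per-scene camera group defined by the director.
@dataclass
class CameraGroup:
    name: str            # group label used in the cut graph
    scene: str           # groups are defined per scene
    shot_class: str      # classification principle, e.g. "close-up"
    cameras: List[str]   # member cameras

closeups_cell = CameraGroup(
    name="CloseUps",
    scene="Cell",
    shot_class="close-up",
    cameras=["CU_Miranda", "CU_Prospera"],
)
```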

6.2 Cut Graph

Cut Graphs are the probabilistic representation of a cinematic style in our framework. They are mathematically a Probabilistic Graphical Model [3], in which the nodes are random variables and the links are statistical dependencies among these variables.

The nodes describe in probability terms the particular cinematic style rules for a given sequence that is part of the narrative. For example, Table 1 shows a rule for the Camera Group “CloseUp of Cell Scene” mentioned above. Essentially, it specifies that close-ups of Miranda have a 20% chance of being selected, while close-ups of Prospera have an 80% chance.

Table 1. Node R1: selection probabilities for the CloseUp group of the Cell Scene.

Camera              Probability
CloseUp Miranda     0.2
CloseUp Prospera    0.8

As a whole, the cut graph models a probability distribution that characterizes a particular cinematic style. Intuitively, this distribution is a way to decide, in a probabilistic sense, which shot to select for each cut.

Figure 7 shows an example of a Cut Graph. The top nodes (R1 to R3) are associated with parametric input decisions, and the bottom node (Rn) is associated with the final cut selection.

Fig. 7. Example of a Cut Graph.
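
To make the evaluation concrete, the sketch below encodes a two-level cut graph in which a top node picks a style and the final node picks a camera conditioned on that style; the structure and all probabilities are invented for the example.

```python
import random

# Hypothetical two-node cut graph: R1 chooses a style, Rn chooses the camera
# conditioned on that style (numbers are examples only).
R1 = {"film": 0.7, "theatrical": 0.3}                  # P(style)
Rn = {                                                  # P(camera | style)
    "film":       {"CU_Miranda": 0.2, "CU_Prospera": 0.8},
    "theatrical": {"Mid_Front": 0.5, "Wide_Master": 0.5},
}

def sample(dist):
    names = list(dist)
    return random.choices(names, weights=[dist[n] for n in names], k=1)[0]

style = sample(R1)           # evaluate the top node
camera = sample(Rn[style])   # evaluate the cut node conditioned on the style
print(style, "->", camera)
```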

6.3 Modeling the Distributions

In order to model the probability distributions in the Cut Graph we can use either parametric or non-parametric models.

In the parametric setting we have the following characterization of the family of distribution functions \({\mathcal P} = \{ P_{\theta } : \theta \in \varTheta \}\), where \(\varTheta \) is the set of parameters. In the non-parametric setting, the distribution is given by a table, for instance in the form of a histogram (as in the example of Table 1).
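
The sketch below contrasts the two settings: a non-parametric node given directly as a histogram (as in Table 1), and a parametric node whose table is produced procedurally from a single parameter. The particular parameterization and all numbers are illustrative assumptions.

```python
import random

# Non-parametric node: an explicit histogram over cameras (as in Table 1).
histogram = {"CU_Miranda": 0.2, "CU_Prospera": 0.8}

# Hypothetical parametric node: theta in [0, 1] blends theatrical (0) and
# film (1) behavior, generating the table procedurally.
def parametric(theta: float):
    return {"CU_Prospera": theta, "Wide_Master": 1.0 - theta}

def sample(dist):
    names = list(dist)
    return random.choices(names, weights=[dist[n] for n in names], k=1)[0]

print(sample(histogram))         # draw from the fixed histogram
print(sample(parametric(0.75)))  # draw from the procedurally generated table
```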

6.4 Timeline

One important aspect that remains to be considered is related to the Timeline. That is, to answer the question: “When to perform a Cut”? In order to model this temporal aspect we resort to Track Markers. They are associated with Camera Groups and defined by a list of time-stamped annotations indicating the moments to evaluate the cut graph for a decision of which cut to make.

In other words, the several layers of track markers collectively specify how often to perform a Cut. In that sense, the image director can determine the Granularity of Markers, which may also have a Nesting structure. Figure 8 illustrates the Timeline and Track Marker layers.

Fig. 8. Timeline and Track Marker Layers.
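
A track marker layer can be sketched as a list of time-stamped entries, each pointing at the camera group whose node should be evaluated at that moment; the field names and timings below are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sketch of one track marker layer.
@dataclass
class TrackMarker:
    time: float          # seconds into the scene
    camera_group: str    # group whose cut-graph node is evaluated here

film_track: List[TrackMarker] = [
    TrackMarker(0.0, "CloseUps"),
    TrackMarker(4.5, "PointsOfView"),
    TrackMarker(9.0, "CloseUps"),
]

def due_markers(track: List[TrackMarker], last: float, now: float):
    """Markers that became due since the previous simulation step."""
    return [m for m in track if last < m.time <= now]

for marker in due_markers(film_track, last=0.0, now=5.0):
    print(f"t={marker.time}s: evaluate cut graph for group {marker.camera_group}")
```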

6.5 Style Design

The Style Design is accomplished by combining the nodes in a cut graph to set the conditional probabilities. Each node is associated with a track marker layer that specifies the camera groups involved, the type of probabilistic model, and the timeline events at which the graph is evaluated for a potential cut.

The general format of the track marker entries is as follows (Fig. 9):

Fig. 9. Track Marker File.

For non-parametric models, the specification is the list of cameras of a camera group and their associated weights. This allows the histogram description to be computed (see Fig. 10).

Fig. 10. Camera/Weight list for the non-parametric model.
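
As a small illustration of this step, the sketch below normalizes a hypothetical camera/weight list into the histogram used by a non-parametric node; the entries are examples only.

```python
# Hypothetical camera/weight list for one camera group.
camera_weights = [
    ("CU_Miranda", 1.0),
    ("CU_Prospera", 4.0),
]

# Normalize the weights into the histogram description of the node.
total = sum(weight for _, weight in camera_weights)
histogram = {camera: weight / total for camera, weight in camera_weights}
print(histogram)   # {'CU_Miranda': 0.2, 'CU_Prospera': 0.8}
```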

Parametric models are associated with camera groups and a procedural layer that has controls for their parameters, which can be exposed in the live editing interface as we will see in the next subsection.

Figure 11 shows an example of a style design using both non-parametric and parametric models.

Fig. 11. Cinematic Style Design using parametric and non-parametric models.

6.6 Editing Interface

The editing interface extends the live image switcher to provide high-level controls. It is programmable with the style parameters for each scene and is meant to be used in Live Cinema.

The interface operates either in auto or manual mode. In auto mode, cuts are selected automatically based on the probabilistic style graph, without intervention from the director. However, the director can also change the style parameters using the interface controls, and these modifications are reflected in real time in the machine's cut decisions. Furthermore, the director can override the probabilistic style machine to select individual cameras at any moment, effectively operating the switcher in manual mode.
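
The sketch below captures this behavior under simplifying assumptions: the style is reduced to a single weight table, a setter adjusts it live, and a manual override bypasses the probabilistic machine for one cut; all names are illustrative, not the actual implementation.

```python
import random

# Hypothetical sketch of the auto/manual behavior of the editing interface.
class EditingInterface:
    def __init__(self, style):
        self.style = dict(style)       # live-adjustable style parameters
        self.manual_override = None    # camera forced by the director, if any

    def set_style(self, camera: str, weight: float) -> None:
        """Director changes a style parameter; future cut decisions follow it."""
        self.style[camera] = weight

    def override(self, camera: str) -> None:
        """Director selects an individual camera, bypassing the style machine."""
        self.manual_override = camera

    def next_cut(self) -> str:
        if self.manual_override is not None:
            camera, self.manual_override = self.manual_override, None
            return camera
        cams = list(self.style)
        return random.choices(cams, weights=[self.style[c] for c in cams], k=1)[0]

ui = EditingInterface({"CU_Miranda": 0.2, "CU_Prospera": 0.8})
print(ui.next_cut())      # auto mode: probabilistic choice
ui.override("Mid_Front")
print(ui.next_cut())      # director override for this cut
```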

Figure 12 shows the Editing interface. Note the highlighted area indicating the style controls.

Fig. 12. Probabilistic Editing interface.

7 Case Study: The Tempest

In this section we present a case study of using the high-level live editing functionality in an actual A/V experiment.

We used the tool for the cinema presentation of Shakespeare’s play “The Tempest” [8].

7.1 Cameras

The production of the experiment consisted of three scenes: the Cell, the Clearing, and the Epilogue.

The cinematic style design is the one depicted in Fig. 11. At the top node of the cut graph, the random variables control the decision between theatrical and film styles; in this case, that means, respectively, wide shots of longer duration versus near shots with fast-paced cuts.

In the Cell scene the camera groups are close-ups and points-of-view for the track of film style and mid shots from fixed angles (front, back, left, right) for the track of theatrical style.

In the Clearing scene the camera groups are also close-ups and points-of-view for the track of film style but for the track of theatrical style we included both wide and mid shots.

The Epilogue scene does not have a probabilistic editing setting, only a deterministic zoom shot.
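
Summarizing the above, the per-scene setup of the case study can be sketched as the following configuration; the group and camera identifiers are hypothetical, but the structure follows the description of the three scenes.

```python
# Hypothetical configuration mirroring the style design of the case study.
TEMPEST_STYLE_DESIGN = {
    "Cell": {
        "film":       ["CloseUps", "PointsOfView"],
        "theatrical": ["Mid_Front", "Mid_Back", "Mid_Left", "Mid_Right"],
    },
    "Clearing": {
        "film":       ["CloseUps", "PointsOfView"],
        "theatrical": ["MidShots", "WideShots"],
    },
    "Epilogue": {
        # no probabilistic editing here, only a deterministic zoom shot
        "deterministic": ["Zoom_Epilogue"],
    },
}

for scene, tracks in TEMPEST_STYLE_DESIGN.items():
    print(scene, "->", tracks)
```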

7.2 Results

In order to verify the effectiveness of the high level cinematic style control of our framework, we performed several laboratory tests and also produced a live presentation with participation of the audience.

The first test was a comparison between Film and Theatrical styles. In that case, we generated two different montages of the experiment “The Tempest”, featuring these two extremes, i.e., respectively 100% Film style and 100% Theatrical style. In the Film style montage, only near shots were used and the cuts were performed when each character started a new line of the dialogue. In the Theatrical style montage, only mid and wide shots were used and the cuts were selected when the action caused a change in the relative positions of the characters.

Also, the Film style was controlled by a parametric probability distribution that would decide between action and reaction shots (i.e., showing the character speaking or listening) and between close-up and point-of-view shots of the selected character. In contrast, the Theatrical style was controlled by a non-parametric probability distribution, created by the director to maximize the visibility of the characters in the frame during their action.

The second test was a montage that combined Film and Theatrical styles with equal probability (i.e., a 50%/50% chance).

7.3 Evaluation

The evaluation of our tests revealed that the proposed probabilistic cinematic style designed for the experiment “The Tempest” provided a simple and intuitive control of the style variables involved.

The extreme cases, the pure Film and Theatrical montages, showed what was expected in terms of framing coverage and pace of cuts.

The intermediate case, combining 50% Film and 50% Theatrical styles, produced a result very close to a realistic editing setting, with a well-balanced combination of wide and near shots and good cut dynamics.

8 Conclusions and Future Work

In this paper we described a new expressive tool for Audio/Visual Presentations in the context of Live Cinema. It extends the VR Kino+Theater image switcher to provide high-level style controls that allow the director to act as a DJ.

Future work goes in two directions: investigating the inverse approach to generating a Cut Graph, and experimenting with an editing interface controlled by the viewer.

Our current work proposes a generative interface for high-level probabilistic live editing with virtual cinematography. In that sense, the probability distributions in the cut graph, as well as the parametric controls in the interface, are created by the director. This is the direct approach to solving the problem.

The other side of the coin is the inverse approach to automatically generate the solution. The problem, then, is to generate the editing machinery from examples, using machine learning techniques, such as Deep Learning with Neural Networks. In this setting, we can have two possible scenarios: the first scenario would be to estimate only the editing style from montages of an author, using supervised learning; the second scenario would entail a full understanding of the editing style structure manifold, including the style, the parametrization and the AIA interface controls.

It is worth noting that this inverse approach to the problem has many potential applications, for example in Live presentations of Sports and Music Shows.

Finally, another way to explore the ideas discussed in this paper is in a live non-linear editing setting where the montage of the story would be controlled by the viewer through an interactive interface for selecting the relevant shots and showing them on the screen. This kind of interface has been proposed by the Eko group in the context of interactive storytelling [1]. They recently released the series “War Games” using this approach.