SceneNet: video crowdsourcing for immersive event reconstructions

Smartphone crowd recordings at public events can be combined into 3D videos


Recording events with smartphones to share on social media is one of modern life’s common practices. However, how often do people actually watch the footage they have taken? Limited perspective and poor image quality can make for unsatisfying viewing that fails to convey the feeling and atmosphere of the occasion. SceneNet aims to revolutionise this by combining your own videos with those recorded by fellow event attendees to create a collaborative, immersive 3D experience.


Aggregating pictures taken at concerts from different angles could lead to a 3D immersive experience. Photo by Elliot Teo on Unsplash.

The project, funded under the EU’s Future and Emerging Technologies (FET) programme, came about after project leader Dr Chen Sagiv, who holds a PhD in applied mathematics from Tel Aviv University, Israel, and her husband attended a Depeche Mode concert. They noticed the hundreds of smartphone screens held up by fans, all attempting to capture the moment for posterity, and the idea of crowdsourcing everyone’s videos from the event was born.


The first obvious challenge in SceneNet’s initial development stage was how to sift through and assemble footage from thousands of different sources, recorded in countless formats and at varying quality. To solve it, the project developed a way to identify matches between images and videos and fit each one into the right place, like the pieces of a puzzle.


According to Sagiv, footage can be enhanced to reduce background noise and motion blur and to compensate for bad lighting, for example. Sounds simple, right? Sagiv explained in the Times of Israel how the system “stitches together the videos at their edges, matching the scenes uploaded by the crowdsourced devices.” It’s a very complex process because “you have to match the colours and compensate for the different lighting, the capabilities of devices and factors that cause one video of even the same scene to look very different.”
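
To give a feel for what compensating for different lighting can involve, here is a deliberately simplified sketch in Python. The article does not describe SceneNet’s actual method; this example assumes a basic brightness transfer that rescales one clip’s pixel values so their mean and spread match those of a reference clip. The function name `match_brightness` and the sample pixel values are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def match_brightness(source, reference):
    """Rescale source pixel values so their mean and spread match the reference.

    Illustrative assumption only: a simple mean/standard-deviation transfer
    on 8-bit grey values, not SceneNet's actual algorithm.
    """
    src_mean, src_std = mean(source), pstdev(source)
    ref_mean, ref_std = mean(reference), pstdev(reference)

    # A perfectly flat source has no contrast to stretch, so only shift it.
    gain = ref_std / src_std if src_std > 0 else 1.0

    adjusted = []
    for value in source:
        new_value = (value - src_mean) * gain + ref_mean
        # Clamp to the valid 8-bit range.
        adjusted.append(min(255.0, max(0.0, new_value)))
    return adjusted


# Hypothetical grey values for the same scene filmed by two phones.
dark_clip = [20, 40, 60, 80]
bright_clip = [100, 140, 180, 220]

print(match_brightness(dark_clip, bright_clip))
```

Here the darker clip is brightened and its contrast stretched so that it sits close to the reference. A real system would have to do far more, working per colour channel and per region and accounting for each device’s camera characteristics, which is part of why Sagiv calls the process very complex.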


Computer science is rapidly developing and may provide the computational resources for crowdsourced video making


The computing power required to support this technology is enormous and currently limits real-time execution of 3D reconstruction, but in this era of constant innovation, processor performance will become less of an issue over time. “SceneNet needs to leverage these technologies to parse through thousands of videos that will be uploaded to the cloud, searching each one for its common denominators and determining what must be done to a clip in order to make it look like a natural part of the final presentation,” said Sagiv.


SceneNet could redefine virtual reality and media, both of which are likely to take on a whole new dimension through immersive and interactive features of this kind, such as letting individuals edit or view 3D videos and environments from any vantage point.


The implications of this technology extend far beyond your favourite singer’s world tour or a sporting event. 3D virtual environments built from mobile crowdsourced videos could be used in law enforcement, allowing witnesses to pool their smartphone images to reconstruct a crime scene, and in defence, for comprehensive terrain mapping of remote areas. Surgeons could view complex medical operations from multiple angles in high-definition 3D. In real estate and construction, architects and builders could remotely monitor progress on the ground in 3D and display it to prospective property buyers. And in education, or just for fun, students could collaborate in making 3D videos of their school projects.


If SceneNet realises its full potential, its applications are limited only by imagination.


Cover image: Joseph Pearson on Unsplash