Software engineers Valentin Bazarevsky and Andrei Tkachenka of the Google Research team have just announced on the Google Research blog that they have launched a feature in the YouTube app that brings real-time, on-device mobile video segmentation to YouTube Stories (which YouTube calls "reels").
Commonly known as the “green screen” effect, video segmentation allows you to separate the foreground of a scene from the background and treat them as two different layers. Essentially, you can change the background and place yourself in a completely different location. This normally requires the footage to have been shot in front of a monochrome screen (commonly green), but Google has been able to achieve the same effect by using convolutional neural networks and machine learning.
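In code, the "green screen" swap amounts to blending two images through a per-pixel mask. The function below is a minimal numpy sketch of that idea (the names and shapes are my own illustration, not Google's implementation), where the mask is 1.0 for person pixels and 0.0 for background pixels:

```python
import numpy as np

def composite(frame, background, mask):
    """Blend the foreground of `frame` over a new `background`,
    using a per-pixel mask (1.0 = person, 0.0 = background)."""
    alpha = mask[..., None]  # broadcast the mask across the RGB channels
    return (alpha * frame + (1.0 - alpha) * background).astype(frame.dtype)

# Tiny 2x2 example: the mask keeps the left column from the frame
# and fills the right column from the new background.
frame = np.full((2, 2, 3), 200, dtype=np.uint8)      # bright "person" pixels
background = np.zeros((2, 2, 3), dtype=np.uint8)     # black replacement scene
mask = np.array([[1.0, 0.0],
                 [1.0, 0.0]])
out = composite(frame, background, mask)
```

A soft mask (values between 0 and 1 around hair and edges) blends the layers smoothly instead of producing hard cut-outs, which is why segmentation networks typically output continuous values rather than hard labels.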
Apparently, the software engineers annotated tens of thousands of portraits featuring a wide spectrum of people in different poses in the foreground against varying backgrounds. These were then labelled with pixel-accurate locations of various elements such as hair, neck, glasses, lips and so forth. Trained on this database, Valentin and Andrei say they achieved a cross-validation result of 98% Intersection-over-Union of human annotator quality.
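Intersection-over-Union, the metric quoted above, measures how well a predicted mask overlaps a ground-truth mask: the area both masks agree on, divided by the area covered by either. A minimal sketch on binary masks (the toy masks here are my own example):

```python
import numpy as np

def iou(pred, truth):
    """Intersection-over-Union between two binary segmentation masks."""
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return intersection / union if union else 1.0

pred = np.array([[1, 1, 0],
                 [1, 0, 0]])
truth = np.array([[1, 1, 0],
                  [0, 1, 0]])
score = iou(pred, truth)  # intersection = 2 pixels, union = 4 pixels
```

An IoU of 1.0 means the masks match exactly; 98% of human-level IoU therefore indicates predictions nearly indistinguishable from the hand-drawn annotations.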
To apply this to video, the engineers separated each incoming frame into its red, green and blue channels and appended the mask computed for the previous frame as an extra channel, giving the network frame-to-frame consistency. They also had to train the machine learning algorithm to properly handle cases where a person suddenly appears in the camera's field of view.
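The per-frame input described above can be sketched as a simple channel concatenation: the three colour channels of the current frame plus the previous frame's mask as a fourth channel. The shapes and function name here are illustrative assumptions, not Google's actual pipeline:

```python
import numpy as np

def build_input(frame, prev_mask):
    """Stack the current RGB frame with the previous frame's
    segmentation mask to form a four-channel network input."""
    return np.concatenate([frame, prev_mask[..., None]], axis=-1)

frame = np.zeros((128, 128, 3), dtype=np.float32)   # current RGB frame
prev_mask = np.zeros((128, 128), dtype=np.float32)  # mask from the last frame
x = build_input(frame, prev_mask)                   # shape (128, 128, 4)
```

For the very first frame, or when a person newly enters the shot, there is no useful previous mask, which is the "suddenly appeared" case the network had to be explicitly trained to handle.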
As you can see from the images above, the results are quite impressive. Unfortunately, though, the feature is slated only for a limited rollout in YouTube Stories with a first set of effects. Hopefully it becomes part of the standard set of tools in YouTube for all users soon!