Google now has a way to listen to one particular person talking in a crowd, and the implications could be immense, from enhancing chat audio quality to mapping voices in a video.
Despite recent advances in AI, computers have so far been unable to pick out a particular voice in a crowd, but Google claims to have devised a solution. In a recent blog post, the company said it has developed a deep learning system smart enough to pick out individual voices by looking at the speakers' faces.
Google also released a video to show how effective the new technology is. As the video shows, moving the slider to either side lets the user listen to a particular voice while dimming all surrounding noise.
Google said it achieved this by first training the computer to recognize individuals and the way they talk. Everything from lip movements onward was mapped and fed to a deep learning neural network, which lets the computer isolate the person talking and focus on his or her voice. The company also created virtual parties in which several individual voices were mixed together, then had the AI split the mixture back into separate tracks.
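To make the virtual party idea concrete, here is a minimal Python sketch of how such a mixture might be built and then split back into tracks. It is only an illustration under loud assumptions and not Google's method: pure tones stand in for recorded voices, the split uses masks computed from the known clean tracks rather than a trained network, and no video of faces is involved. The names `make_tone`, `mix` and `separate_with_ideal_masks` are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed sample rate in Hz


def make_tone(freq_hz, seconds=1.0):
    """Stand-in for a clean voice recording: a pure tone (an assumption)."""
    t = np.arange(int(SAMPLE_RATE * seconds)) / SAMPLE_RATE
    return np.sin(2 * np.pi * freq_hz * t)


def mix(sources):
    """Build a synthetic 'party' by summing the clean tracks sample by sample."""
    return np.sum(sources, axis=0)


def separate_with_ideal_masks(mixture, sources):
    """Split the mixture using masks derived from the known clean tracks.

    A learned system would have to predict such masks itself; here the
    answer is known in advance, so this only shows what is being aimed for.
    """
    mix_spec = np.fft.rfft(mixture)
    mags = np.abs([np.fft.rfft(s) for s in sources])
    # Each mask is that source's share of the energy in every frequency bin.
    masks = mags / (mags.sum(axis=0) + 1e-12)
    return [np.fft.irfft(m * mix_spec, n=len(mixture)) for m in masks]


voices = [make_tone(220.0), make_tone(330.0)]  # two hypothetical "speakers"
party = mix(voices)                            # the mixed-up party audio
tracks = separate_with_ideal_masks(party, voices)
```

Because the clean tracks are known before they are mixed, each mixture comes with its own correct answer, which is presumably what makes synthetic parties useful as training material.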
While this comes naturally to humans, achieving it in machines can be considered a revolutionary development in designing advanced AI-based systems. Only in recent years has computer vision become smart enough to recognize people or objects in photos. The latest advancements in computer vision are also evident in almost all major photo apps, where the system is able to recognize friends and others.
Something along the same lines can be expected from the latest Google research, which might be used to improve chat audio quality. Google already has the Hangouts and Duo chat apps, and both look like ideal candidates for the technology.
Video recording might also benefit from the new tech, which could help enhance speech even in noisy environments. For instance, a viewer who wants to hear what one person in a video had to say could listen to only that person while muting the rest.
Google also cited hearing aids as a possible beneficiary, since the technology could let the wearer listen to the person they choose in, say, a crowd. However, as some sources pointed out, there might be privacy issues as well, since the same technology could let anyone ‘listen’ to others talking in a public space. It seems intimate conversations might now be open to eavesdropping of the AI kind.