
Suggesting sounds for images from video collections, Disney Research, Zurich, Switzerland



Airicist
22nd November 2016, 04:37
Developer - Disney Research (https://pr.ai/showthread.php?5983)

Airicist
22nd November 2016, 04:38
https://youtu.be/NSvhnoddGU0

Suggesting sounds for images from video collections

Published on Nov 15, 2016


Given a still image, humans can easily think of a sound associated with this image. For instance, people might associate the picture of a car with the sound of a car engine. In this paper we aim to retrieve sounds corresponding to a query image. To solve this challenging task, our approach exploits the correlation between the audio and visual modalities in video collections. A major difficulty is the high amount of uncorrelated audio in the videos, i.e., audio that does not correspond to the main image content, such as voice-over, background music, added sound effects, or sounds originating off-screen. We present an unsupervised, clustering-based solution that is able to automatically separate correlated sounds from uncorrelated ones. The core algorithm is based on a joint audio-visual feature space, in which we perform iterated mutual kNN clustering in order to effectively filter out uncorrelated sounds. To this end we also introduce a new dataset of correlated audio-visual data, on which we evaluate our approach and compare it to alternative solutions. Experiments show that our approach can successfully deal with a high amount of uncorrelated audio.
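
For readers wondering what "iterated mutual kNN" filtering in a joint audio-visual space could look like, here is a minimal Python sketch. It is not the paper's implementation. The function names, the plain concatenation of audio and visual features, the Euclidean distance, the min_mutual threshold, and the toy data are all assumptions made for illustration. The idea shown: two samples are linked only if each is among the other's k nearest neighbours, samples with too few such links are dropped, and the process repeats until nothing more is removed.

```python
import numpy as np

def mutual_knn_mask(features, k):
    """Boolean matrix: entry (i, j) is True if i and j are in each other's k nearest neighbours."""
    features = np.asarray(features, dtype=float)
    n = len(features)
    # Pairwise squared Euclidean distances (assumed metric).
    diff = features[:, None, :] - features[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbour
    k = min(k, n - 1)
    knn = np.zeros((n, n), dtype=bool)
    if k > 0:
        idx = np.argsort(dist, axis=1)[:, :k]
        knn[np.arange(n)[:, None], idx] = True
    return knn & knn.T  # keep only links that hold in both directions

def iterated_mutual_knn_filter(audio, visual, k=5, min_mutual=2, max_iter=10):
    """Return indices of samples that survive repeated mutual-kNN filtering.

    Hypothetical helper: joint features are formed by simple concatenation,
    which assumes both modalities are already on comparable scales.
    """
    joint = np.hstack([audio, visual])
    keep = np.arange(len(joint))
    for _ in range(max_iter):
        if len(keep) == 0:
            break
        mutual = mutual_knn_mask(joint[keep], k)
        good = mutual.sum(axis=1) >= min_mutual
        if good.all():
            break  # nothing left to remove
        keep = keep[good]
    return keep

# Toy data (made up): 6 clips whose audio matches the image content,
# plus 3 clips with similar visuals but unrelated audio (e.g. background music).
rng = np.random.default_rng(0)
audio = np.vstack([rng.normal(0.0, 0.1, (6, 2)),
                   [[100.0, 0.0], [-100.0, 0.0], [0.0, 100.0]]])
visual = rng.normal(0.0, 0.1, (9, 2))

kept = iterated_mutual_knn_filter(audio, visual, k=5, min_mutual=2)
print(kept)
```

In this toy setup the three clips with unrelated audio have no mutual neighbours, because the tightly grouped clips only list each other as nearest neighbours, so only the first six indices remain. Real features would need normalisation before concatenation, and the values of k and min_mutual would have to be tuned.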

Airicist
22nd November 2016, 04:39
Article "Disney Research's AI system knows what a car sounds like (https://www.engadget.com/2016/11/16/disney-researchs-ai-system-knows-what-a-car-sounds-like)"
Soon, image recognition software may be able to tell you what sound an object makes.

by Sean Buckley
November 16, 2016