Suggesting sounds for images from video collections, Disney Research, Zurich, Switzerland

Administrator · Nov 22, 2016


Suggesting sounds for images from video collections

Published on Nov 15, 2016

Given a still image, humans can easily think of a sound associated with it. For instance, people might associate the picture of a car with the sound of a car engine. In this paper, we aim to retrieve sounds corresponding to a query image. To solve this challenging task, our approach exploits the correlation between the audio and visual modalities in video collections. A major difficulty is the large amount of uncorrelated audio in the videos, i.e., audio that does not correspond to the main image content, such as voice-over, background music, added sound effects, or sounds originating off-screen. We present an unsupervised, clustering-based solution that automatically separates correlated sounds from uncorrelated ones. The core algorithm is based on a joint audio-visual feature space, in which we perform iterated mutual kNN clustering to filter out uncorrelated sounds effectively. We also introduce a new dataset of correlated audio-visual data, on which we evaluate our approach and compare it to alternative solutions. Experiments show that our approach can successfully handle a large amount of uncorrelated audio.
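The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea under our own assumptions, not the authors' implementation. Assumed: each sample pairs the image features of a video frame with the audio features of the same moment; the joint feature is plain concatenation; distances are Euclidean; a "cluster" is a connected component of the mutual kNN graph (two points are linked only if each is among the other's k nearest neighbours). Points in clusters smaller than a threshold are treated as uncorrelated and dropped, and the graph is rebuilt until nothing more is removed. The function names, the parameters `k` and `min_cluster_size`, and the toy data are all hypothetical.

```python
import math
from collections import deque


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mutual_knn_graph(points, ids, k):
    """Adjacency sets of the mutual kNN graph restricted to the indices in `ids`."""
    knn = {}
    for i in ids:
        # Sort the other active points by distance; ties are broken by index.
        others = sorted((euclidean(points[i], points[j]), j) for j in ids if j != i)
        knn[i] = {j for _, j in others[:k]}
    # Keep an edge only if each endpoint lists the other among its k nearest neighbours.
    return {i: {j for j in knn[i] if i in knn[j]} for i in ids}


def connected_components(adj):
    """Breadth-first search over an undirected adjacency map."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return comps


def iterated_mutual_knn_filter(points, k=2, min_cluster_size=3):
    """Hypothetical filter: repeatedly drop points outside mutual-kNN clusters
    of at least `min_cluster_size` members until no more points are removed."""
    ids = list(range(len(points)))
    while True:
        adj = mutual_knn_graph(points, ids, k)
        kept = [i for comp in connected_components(adj)
                if len(comp) >= min_cluster_size for i in comp]
        if len(kept) == len(ids):
            return sorted(kept)
        ids = kept


def suggest_sound(query_image, samples, candidates):
    """Return the sound label of the candidate whose image features are closest to the query."""
    best = min(candidates, key=lambda i: euclidean(query_image, samples[i][0]))
    return samples[best][2]


# Toy data: (image features, audio features, sound label); all values are made up.
samples = [
    ((0.0, 0.0), (0.0, 0.0), "car engine"),
    ((0.1, 0.0), (0.0, 0.1), "car engine"),
    ((0.0, 0.1), (0.1, 0.0), "car engine"),
    ((0.1, 0.1), (0.1, 0.1), "car engine"),
    ((5.0, 5.0), (5.0, 5.0), "waves"),
    ((5.1, 5.0), (5.0, 5.1), "waves"),
    ((5.0, 5.1), (5.1, 5.0), "waves"),
    ((5.1, 5.1), (5.1, 5.1), "waves"),
    ((0.05, 0.05), (9.0, 0.0), "background music"),  # car frame, uncorrelated audio
    ((5.0, 5.0), (0.0, 9.0), "voice-over"),          # beach frame, uncorrelated audio
]
joint = [img + aud for img, aud, _ in samples]  # concatenated joint feature vectors
kept = iterated_mutual_knn_filter(joint, k=2, min_cluster_size=3)

print(kept)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(suggest_sound((0.06, 0.06), samples, kept))                 # car engine
print(suggest_sound((0.06, 0.06), samples, range(len(samples))))  # background music
```

In this toy run the two mismatched samples have no mutual neighbours and are filtered out. A query image close to the car frames then retrieves a car-engine clip, whereas searching all samples returns the uncorrelated background-music clip, which is the failure mode the abstract describes.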

Administrator · Nov 22, 2016

Article "Disney Research's AI system knows what a car sounds like"
Soon, image recognition software may be able to tell you what sound an object makes.

by Sean Buckley
November 16, 2016
