Study in IRLAB

Learning Word-like Units from Joint Audio-Visual Analysis

Learning Word-like Units from Joint Audio-Visual Analysis

by David Harwath and James R. Glass, Massachusetts Institute of Technology

  • Goal: Given a collection of images and their spoken audio captions, discover word-like acoustic units in the continuous speech signal and ground them to semantically relevant image regions.
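The "grounding" step in the goal can be pictured with a toy sketch. It rests on an assumption that is not stated in these notes: audio segments and image regions have already been turned into vectors in one shared embedding space. Under that assumption, each segment is paired with the region whose vector is most similar by cosine similarity. The names `cosine` and `ground_segments` and the toy vectors are hypothetical, and this is only an illustration, not the authors' implementation.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def ground_segments(
    segments: Dict[str, List[float]],
    regions: Dict[str, List[float]],
) -> Dict[str, Tuple[str, float]]:
    """Pair each audio segment with its most similar image region.

    Assumption (hypothetical): segment and region embeddings live in the
    same vector space, so cosine similarity between them is meaningful.
    Returns {segment_id: (best_region_id, similarity)}.
    """
    result: Dict[str, Tuple[str, float]] = {}
    for seg_id, seg_vec in segments.items():
        # Score every region against this segment and keep the best one.
        best_region, best_score = max(
            ((rid, cosine(seg_vec, rvec)) for rid, rvec in regions.items()),
            key=lambda pair: pair[1],
        )
        result[seg_id] = (best_region, best_score)
    return result


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real ones would come from trained audio/image models.
    segs = {"seg_0": [1.0, 0.0], "seg_1": [0.0, 1.0]}
    regs = {"dog_box": [0.9, 0.1], "sky_box": [0.1, 0.9]}
    for seg, (region, score) in ground_segments(segs, regs).items():
        print(seg, region, round(score, 3))
```

Picking a single best region per segment keeps the sketch simple; a fuller treatment would also have to decide where word-like segments begin and end in the speech signal, which this example takes as given.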