Beyond Mindless Labeling: Really Leveraging Humans to Build Intelligent Machines

October 2, 2014
Devi Parikh | Virginia Tech

Humans' ability to understand natural images far exceeds that of machines today. One reason for this gap is an artificially restrictive learning setup: humans today teach machines via Morse code (e.g., providing binary labels on images, such as “this is a horse” or “this is not”), and machines are typically silent. Such systems have the potential to be significantly more accurate if they tap into the vast common-sense knowledge humans have about the visual world. I will talk about our work on enriching the communication between humans and machines by exploiting mid-level visual properties, or attributes. I will also talk about the more difficult problem of learning common-sense knowledge directly, simply by observing the structure of the visual world around us. Unfortunately, this requires automatic and accurate detection of objects, their attributes, poses, etc. in images, which leads to a chicken-and-egg problem. I will argue that the solution is to give up on photorealism. Specifically, I will talk about our work on exploiting human-generated abstract visual scenes to learn common-sense knowledge and to study high-level vision problems.