From Local Algorithms to Global Results: Human-Machine Collaboration for Robust Analysis of Geographically Diverse Imagery

Modern deep learning-based semantic segmentation models and traditional pattern-matching segmentation methods exhibit similar failure modes when mapping land cover from geographically diverse satellite and aerial imagery. The core problem is that these models respond mostly to textures and colors, which tend to map to consistent land cover labels locally but can correspond to very different labels in imagery acquired farther away, with a different sensor, or under new imaging conditions. One way to resolve this issue is to endow the algorithms with higher-level, human-like reasoning abilities (e.g., an awareness that houses are connected to roads by driveways, that roads connect towns, and that bridges cast shadows) and a mechanism for tracking such objects across larger areas in order to resolve ambiguity. We propose an alternative approach based on human-machine collaboration for creating land cover maps, motivated by the observation that substantial human labor is required even in seemingly automated land cover mapping solutions. We build spatial ensembles of land cover models with weak supervision and treat them as a hypothesis space. We then build an interface through which humans, who are aware of the higher-level structure in the imagery, can select, apply, and refine these models locally, with the goal of minimizing the labor required to create a land cover map.
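As a rough illustration of the selection step only (a minimal sketch, not the paper's implementation or interface), the code below treats a spatial ensemble as a hypothesis space: each ensemble member predicts per-pixel class probabilities for an image tile, a human supplies a few sparse point labels, and the member that best explains those points is selected and applied locally. All names here (select_and_apply, predict_fn, the toy models) are illustrative assumptions.

import numpy as np

def select_and_apply(models, tile, point_rows, point_cols, point_labels):
    """Pick the ensemble member that best matches sparse human point labels.

    models       -- list of callables: tile (H, W, C) -> probs (H, W, K)
    tile         -- image tile as a float array of shape (H, W, C)
    point_rows, point_cols -- pixel coordinates of human-labeled points
    point_labels -- integer class labels the human assigned at those points
    Returns the hard label map (H, W) produced by the selected model.
    """
    best_acc, best_pred = -1.0, None
    for predict_fn in models:
        probs = predict_fn(tile)                       # (H, W, K) class probabilities
        pred = probs.argmax(axis=-1)                   # hard per-pixel labels (H, W)
        acc = np.mean(pred[point_rows, point_cols] == point_labels)
        if acc > best_acc:                             # keep the best-fitting hypothesis
            best_acc, best_pred = acc, pred
    return best_pred

# Toy usage: two "models" that differ only in how they map color to land cover.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tile = rng.random((64, 64, 3))
    # Model A: class 0 where the red channel dominates, class 1 elsewhere.
    model_a = lambda t: np.stack([t[..., 0], 1 - t[..., 0]], axis=-1)
    # Model B: the inverted color-to-class mapping.
    model_b = lambda t: np.stack([1 - t[..., 0], t[..., 0]], axis=-1)
    rows, cols = np.array([5, 20, 40]), np.array([10, 30, 50])
    # Simulated human clicks consistent with model A's mapping.
    labels = (tile[rows, cols, 0] <= 0.5).astype(int)
    label_map = select_and_apply([model_a, model_b], tile, rows, cols, labels)
    print(label_map.shape)  # (64, 64): the locally selected model's labeling

In a real interactive system the scoring signal would come from the user's corrections in the interface rather than a fixed set of points, and "refine" could mean updating the selected model with the new labels; this sketch only shows why a handful of local annotations can disambiguate between ensemble members.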