## AMETHYST: Image Registration Engine for Multiframe Processing

Mohammed Shoaib, Rich Stoakley, Matt Uyttendaele, and Jie Liu

> Sensing and Energy Research Group, Microsoft Research

## MFP: Why should you care?



Multiframe processing (MFP) enables advanced algorithms for image analysis, *e.g.*, high-dynamic-range (HDR) imaging, de-noising, image stabilization, de-blurring, super-resolution imaging, de-hazing, panoramic stitching, *etc.*

## Why is it hard?

### E.g., HDR Photography

Typically, frames are processed serially → <u>frame delays cause issues:</u>

1. Moving objects create artifacts



~ 2 seconds/frame



2. Moving camera also creates artifacts





Frame misalignments lead to artifacts in the fused image
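A minimal sketch of why misalignment matters, under simplifying assumptions: a one-dimensional "image", plain averaging standing in for the fusion step, and a known two-pixel shift between frames. The helper names `shift` and `fuse` are illustrative and do not come from the slides.

```python
def shift(row, dx):
    """Shift a 1-D row right by dx pixels (left if dx < 0), padding with zeros."""
    n = len(row)
    out = [0] * n
    for i, value in enumerate(row):
        j = i + dx
        if 0 <= j < n:
            out[j] = value
    return out


def fuse(a, b):
    """Fuse two frames by per-pixel averaging (a stand-in for real HDR fusion)."""
    return [(x + y) / 2 for x, y in zip(a, b)]


# A bright two-pixel object; the second frame sees it displaced by two pixels.
frame_a = [0, 0, 10, 10, 0, 0, 0, 0]
frame_b = shift(frame_a, 2)

# Fusing without alignment smears the object into a wider, dimmer ghost.
ghosted = fuse(frame_a, frame_b)

# Undoing the (here, known) shift before fusing recovers the original object.
aligned = fuse(frame_a, shift(frame_b, -2))

print(ghosted)
print(aligned)
```

In practice the shift is unknown and has to be estimated from the frames themselves, which is the job of image registration.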

## What are some existing solutions?

### Solution 1: HDR Capture, e.g., Toshiba T4K05



### Solution 2: Algorithmic



The algorithmic solution is more interesting → it needs <u>no hardware change</u> and <u>scales to other applications</u>

## What are others doing about it?

### E.g., NVIDIA: Tegra 4 (2014)



Fig: Camera architecture in current high-end mobile devices



Fig: Chimera: The NVIDIA computational photography arch.





1<sup>st</sup> real-time HDR, 1<sup>st</sup> HDR panorama, 1<sup>st</sup> object tracking

Proprietary ISP-embedded algorithms use the GPU for acceleration $\rightarrow$ $\sim 10\times$ speedup, at the cost of power

## What are their limitations?

1. **Current solutions:** modest speedups; not generally applicable.
2. **Current algorithmic solutions:** slow on CPUs.



Our target: ~100x speedup compared to software

Image registration is a computational bottleneck $\rightarrow$ <u>needs acceleration</u>
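To illustrate what registration computes and why it is expensive, here is a hedged sketch of translation-only registration by exhaustive search. It is a deliberately simple stand-in, not the algorithm used in AMETHYST; the names `translate`, `mean_abs_diff`, and `register` are hypothetical.

```python
def translate(img, tx, ty):
    """Return img moved by (tx, ty) pixels, padding uncovered pixels with zeros."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            yy, xx = y + ty, x + tx
            if 0 <= yy < h and 0 <= xx < w:
                out[yy][xx] = img[y][x]
    return out


def mean_abs_diff(ref, mov, dx, dy):
    """Mean |ref[y][x] - mov[y+dy][x+dx]| over the pixels where both exist."""
    h, w = len(ref), len(ref[0])
    total, count = 0, 0
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                total += abs(ref[y][x] - mov[yy][xx])
                count += 1
    return total / count if count else float("inf")


def register(ref, mov, max_shift):
    """Try every shift in [-max_shift, max_shift]^2 and keep the lowest-cost one."""
    best_cost, best_shift = None, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cost = mean_abs_diff(ref, mov, dx, dy)
            if best_cost is None or cost < best_cost:
                best_cost, best_shift = cost, (dx, dy)
    return best_shift


# An 8x8 test image with a small textured patch, and a displaced copy of it.
ref = [[0] * 8 for _ in range(8)]
ref[2][2], ref[2][3], ref[3][2], ref[3][3] = 9, 7, 5, 3
mov = translate(ref, 2, 1)

estimated = register(ref, mov, max_shift=3)
print(estimated)
```

Even this toy version visits every pixel for every candidate shift, so its work grows as `(2 * max_shift + 1) ** 2` times the pixel count. Real registration must also handle rotation and perspective, which adds further work per frame.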

## What have we done about it?



Fig: Proposed architecture for multi-frame image processing (MFP)

We propose an architecture for MFP that has a dedicated accelerator for image registration

## What are our findings?



| AMETHYST<sup>\*</sup> Performance Summary |                                 |
|-------------------------------------------|---------------------------------|
| Technology                                | 45 nm SOI                       |
| Area                                      | 0.15 mm<sup>2</sup> (30k gates) |
| Memory                                    | < 2 MB                          |
| Frequency                                 | 1.1 GHz                         |
| Power                                     | 62.7 mW                         |
| Exec. Time                                | 30 ms/frame                     |
| Speed-up<sup>\$</sup>                     | 37x over CPU                    |



AMETHYST shows a speed-up of <u>8x</u> over GPU and <u>5x</u> over FPGA, at a power lower by <u>14x</u> than GPU and <u>3x</u> than FPGA


<sup>\$</sup> assuming 60% of the cost is due to (IPD + DFE)

\* synthesis results for (IPD+DFE) blocks only
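The reported numbers imply a few derived figures, worked out below as a rough sketch. The inputs are taken from the summary above; the derived quantities (energy per frame, frame rate, cycles per frame, energy ratios) are not stated on the slides. The energy ratios assume that the quoted speed-up and power factors refer to the same workload, so that energy scales as power times execution time.

```python
# Reported figures (synthesis results for the IPD+DFE blocks only).
power_mw = 62.7        # power in milliwatts
exec_time_s = 30e-3    # execution time per frame in seconds
freq_hz = 1.1e9        # clock frequency in hertz

# Energy per frame: mW * s = mJ.
energy_per_frame_mj = power_mw * exec_time_s      # about 1.88 mJ

# Throughput if frames are processed back to back.
frames_per_second = 1.0 / exec_time_s             # about 33 frames/s

# Clock cycles available for each frame.
cycles_per_frame = freq_hz * exec_time_s          # about 33 million cycles

# Energy ratio = (speed-up) * (power reduction), under the assumption above.
energy_gain_vs_gpu = 8 * 14                       # 112x less energy than GPU
energy_gain_vs_fpga = 5 * 3                       # 15x less energy than FPGA

print(f"{energy_per_frame_mj:.3f} mJ/frame, {frames_per_second:.1f} frames/s")
print(f"{cycles_per_frame / 1e6:.0f} M cycles/frame")
print(f"energy gain: {energy_gain_vs_gpu}x vs GPU, {energy_gain_vs_fpga}x vs FPGA")
```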

## What are our findings? Contd...





### <u>Highlights</u>

- State-of-the-art algorithm<sup>\$</sup> (from Photosynth)
- 1<sup>st</sup> MFP engine for re-targetable applications
- Extensively configurable parallelism
- Multi-level data pipelining and interleaving
- Systolic ops w/ 2-level vector reduction
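The slides do not spell out what "2-level vector reduction" means in AMETHYST. As one common reading, offered purely as an assumption, a long vector is first reduced within fixed-width lanes and the per-lane partial results are then reduced again. The function name `two_level_sum` and the parameter `lane_width` are hypothetical.

```python
def two_level_sum(values, lane_width):
    """Sum a vector in two stages: within lanes, then across lane results."""
    # Level 1: each lane reduces its own slice; in hardware these run in parallel.
    partials = [
        sum(values[i:i + lane_width])
        for i in range(0, len(values), lane_width)
    ]
    # Level 2: a second, much shorter reduction combines the partial sums.
    return sum(partials), partials


total, partials = two_level_sum(list(range(1, 17)), lane_width=4)
print(partials)
print(total)
```

Splitting the reduction this way keeps each stage short, which is one reason such schemes pair well with pipelined datapaths.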

### Next

**Technical steps:**

- Finish implementing RTL for HE and IWP modules
- Verify the full design on FPGA-based programmable SoCs (e.g., Zynq)
- Develop HW-SW co-design with ARM core towards custom SoC
- Perform physical design and post-layout validation of SoC
- Integrate silicon-proven design IP with ISP core

