Speech Super Resolution Generative Adversarial Network

Sefik Emre Eskimez; Kazuhito Koishida

ICASSP 2019 | May 2019

The goal of speech super-resolution (SSR), also called speech bandwidth expansion, is to generate the missing high-frequency components of a given low-resolution speech signal. It has the potential to improve the quality of telecommunications. We propose a new method for SSR that leverages generative adversarial networks (GANs) together with a regularization method that stabilizes GAN training. The generator network is a convolutional autoencoder with 1D convolution kernels that operate along the time axis; it generates the high-frequency log-power spectra from the low-frequency log-power spectra given as input. We compare our proposed method with two recent deep neural network (DNN) based approaches, using both objective speech quality metrics and subjective perceptual tests. Our proposed method outperforms both baselines in the objective and the subjective evaluations.
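
To make the phrase "1D convolution kernels operating along the time axis" concrete, the following is a minimal sketch of a single such layer in plain Python. It is not the authors' network: the abstract does not give layer counts, kernel sizes, or bin counts, so the function name `conv1d_time`, the sizes `T`, `F_LOW`, `F_HIGH`, `K`, the random weights, and the final concatenation into a full-band spectrum are all assumptions made for illustration. Frequency bins are treated as channels, and the kernel slides over frames only.

```python
import random


def conv1d_time(x, weight, bias):
    """1D convolution along the time axis with zero ("same") padding.

    x:      list of T frames, each a list of c_in values
            (frequency bins are treated as channels)
    weight: kernel indexed as weight[tap][c_in][c_out], odd number of taps
    bias:   list of c_out values
    Returns a list of T frames, each a list of c_out values.
    """
    k = len(weight)
    c_in = len(weight[0])
    c_out = len(bias)
    pad = k // 2
    n_frames = len(x)
    out = []
    for t in range(n_frames):
        frame = list(bias)
        for j in range(k):
            src = t + j - pad
            if src < 0 or src >= n_frames:
                continue  # outside the signal: contributes zero
            for ci in range(c_in):
                v = x[src][ci]
                w = weight[j][ci]
                for co in range(c_out):
                    frame[co] += v * w[co]
        out.append(frame)
    return out


# Hypothetical sizes: 8 frames, 4 low-band bins in, 4 high-band bins out, 3 taps.
T, F_LOW, F_HIGH, K = 8, 4, 4, 3
rng = random.Random(0)

# Stand-in for low-frequency log-power spectra (random values, not real speech).
low_lps = [[rng.uniform(-5.0, 0.0) for _ in range(F_LOW)] for _ in range(T)]

# Untrained random weights; a real generator would learn these adversarially.
weight = [[[rng.gauss(0.0, 0.1) for _ in range(F_HIGH)]
           for _ in range(F_LOW)]
          for _ in range(K)]
bias = [0.0] * F_HIGH

high_lps = conv1d_time(low_lps, weight, bias)

# Assumption: the full-band spectrum is the low band followed by the estimate.
full_lps = [lo + hi for lo, hi in zip(low_lps, high_lps)]
```

Each output frame depends on `K` neighbouring input frames and on every low-band bin, which is the sense in which the convolution runs along time rather than along frequency. A full autoencoder would stack several such layers with nonlinearities, first shrinking and then restoring the representation.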