Spectral manipulation improves elevation perception with non-individualized head-related transfer functions

Journal of the Acoustical Society of America (JASA), Vol 145(3): pp. EL222-EL228

Publication

Spatially rendering sounds using head-related transfer functions
(HRTFs) is an important part of creating immersive audio experiences
for virtual reality applications. However, elevation perception
remains challenging when generic, non-personalized HRTFs are used.
This study investigated whether digital audio effects applied to a generic
set of HRTFs could improve sound localization in the vertical plane.
Several of the tested effects significantly improved elevation judgment,
and trial-by-trial variability in spectral energy between 2 and 10 kHz
correlated strongly with perceived elevation. Digital audio effects may
therefore be a promising strategy to improve elevation perception where
personalized HRTFs are not available.


Figure: (A) Directional bias across subjects observed for each audio effect, pooled across sound type and elevation. UP, MD, and DN responses were assigned values of +1, 0, and -1, respectively, and directional bias was calculated as the average across all responses in a condition. A positive directional bias indicates that stimuli were, on average, perceived as coming from above the horizon. The 8 kHz peak, 7 kHz notch, and horizontal blur effects show a significant directional bias relative to the control condition (p < 10^-3, N=15 subjects, Wilcoxon paired signed-rank test, Bonferroni corrected). (B) Accuracy [% correct] of responses across subjects for each audio effect, pooled across sound type and elevation. The rightmost boxplot shows that an “ideal” condition that combined the most accurate effects at each elevation would have achieved a median accuracy of 51%.
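The directional bias and accuracy measures described in this caption can be illustrated with a minimal sketch. The response coding (UP = +1, MD = 0, DN = -1) comes from the caption; the function names and the example responses are hypothetical and are not taken from the study's analysis code or data.

```python
# Response coding taken from the figure caption: UP = +1, MD = 0, DN = -1.
RESPONSE_VALUES = {"UP": 1, "MD": 0, "DN": -1}


def directional_bias(responses):
    """Average coded response for one condition (hypothetical helper).

    A positive result means responses leaned toward "UP" on average.
    """
    if not responses:
        raise ValueError("need at least one response")
    return sum(RESPONSE_VALUES[r] for r in responses) / len(responses)


def accuracy_percent(responses, targets):
    """Percentage of responses that match the presented elevation."""
    if not responses or len(responses) != len(targets):
        raise ValueError("responses and targets must be non-empty and equal in length")
    n_correct = sum(r == t for r, t in zip(responses, targets))
    return 100.0 * n_correct / len(responses)


# Made-up example trials, for illustration only.
example_responses = ["UP", "UP", "MD", "DN"]
example_targets = ["UP", "MD", "MD", "DN"]

bias = directional_bias(example_responses)
accuracy = accuracy_percent(example_responses, example_targets)
```

In this sketch the bias is computed per condition; averaging per subject first and then comparing conditions across subjects, as the paired test in the caption implies, would be a separate step.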


Figure: (A) Each panel shows the average frequency spectrum of individual trials for each sound type (columns) and each elevation (rows). The data are split by correct (dotted lines) and incorrect (solid lines) responses. Note the divergence in the average frequency spectra between correct and incorrect responses in the 2-10 kHz range. (B) The summed energy between 2 and 10 kHz plotted against mean directional bias. There is a strong positive correlation between spectral energy in this range and perceived elevation above the horizon for both sound types tested (white noise: R=0.84, p<0.01; pink noise: R=0.91, p<0.01, Pearson correlation).
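The analysis in panel (B) relates summed 2-10 kHz energy to mean directional bias. The following is a rough sketch of that kind of computation, assuming a spectrum is already available as paired lists of frequencies and power values. The helper names, the inclusive band edges, and the example numbers are assumptions for illustration; the caption does not specify how the spectra were estimated.

```python
import math


def band_energy(freqs_hz, power, lo_hz=2000.0, hi_hz=10000.0):
    """Sum the power of all bins whose frequency lies in [lo_hz, hi_hz].

    Inclusive band edges are an assumption of this sketch.
    """
    return sum(p for f, p in zip(freqs_hz, power) if lo_hz <= f <= hi_hz)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up spectrum: only the 2, 5, and 10 kHz bins fall inside the band.
example_energy = band_energy([1000, 2000, 5000, 10000, 12000], [1, 2, 3, 4, 5])
```

With one band-energy value and one mean directional bias per condition, `pearson_r` would give a correlation coefficient of the kind reported in the caption; the significance test is not covered here.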