Acoustic Echo Cancellation Challenge – ICASSP 2023

Region: Global

Program dates: December 2022-February 2023

The ICASSP 2023 Acoustic Echo Cancellation Challenge is intended to stimulate research in acoustic echo cancellation (AEC), which is an important part of speech enhancement and remains a top issue in audio communication and conferencing systems. This is the fourth AEC challenge. The winners of this challenge are selected based on the average Mean Opinion Score achieved across all single-talk and double-talk scenarios, together with the speech recognition rate.

Registration procedure

To register for the challenge, participants are required to email the Acoustic Echo Cancellation Challenge team at aec_challenge@microsoft.com with their team members' names, email addresses, affiliations, team name, and a tentative paper title. Participants also need to register on the Challenge CMT site, where they can submit the enhanced clips.

Challenge tracks

There are two tracks for this challenge:

  1. Non-personalized AEC. This is similar to the ICASSP 2022 AEC Challenge.
  2. Personalized AEC. This track adds speaker enrollment for the near-end speaker. A speaker enrollment is a 15-25 second recording of the near-end speaker that can be used for adapting the AEC for personalized echo cancellation. For training and model evaluation, the datasets on the AEC-Challenge GitHub page can be used, which include both echo and near-end-only clips from users. For the blind test set, the enrollment clips will be provided.

Latency and runtime requirements

Algorithmic latency: The offset introduced by the whole processing chain (STFT, iSTFT, overlap-add, additional lookahead frames, etc.) compared to simply passing the signal through without modification. This does not include buffering latency. A short sketch of the arithmetic follows the examples below.

  • Ex.1: An STFT-based processing with window length = 20 ms and hop length = 10 ms introduces an algorithmic delay of window length – hop length = 10 ms.
  • Ex.2: An STFT-based processing with window length = 32 ms and hop length = 8 ms introduces an algorithmic delay of window length – hop length = 24 ms.
  • Ex.3: An overlap-save-based processing algorithm introduces no additional algorithmic latency.
  • Ex.4: A time-domain convolution with a filter kernel size = 16 samples introduces an algorithmic latency of kernel size – 1 = 15 samples. Using one-sided padding, the operation can be made fully “causal”, i.e. a left-sided padding with kernel size – 1 samples would result in no algorithmic latency.
  • Ex.5: An STFT-based processing with window length = 20 ms and hop length = 10 ms using 2 future frames of information introduces an algorithmic latency of (window length – hop length) + 2 × hop length = 30 ms.
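The arithmetic in these examples can be captured in a short Python sketch; the function names below are ours, for illustration only, and are not part of any challenge tooling.

    # Algorithmic latency helpers (illustrative names, not challenge tooling).
    def stft_algorithmic_latency_ms(window_ms, hop_ms, lookahead_frames=0):
        """STFT -> processing -> iSTFT/overlap-add with optional lookahead frames."""
        return (window_ms - hop_ms) + lookahead_frames * hop_ms

    def conv_algorithmic_latency_samples(kernel_size, causal=False):
        """Time-domain convolution; left-sided padding makes it causal."""
        return 0 if causal else kernel_size - 1

    print(stft_algorithmic_latency_ms(20, 10))     # Ex.1 -> 10 ms
    print(stft_algorithmic_latency_ms(32, 8))      # Ex.2 -> 24 ms
    print(conv_algorithmic_latency_samples(16))    # Ex.4 -> 15 samples
    print(stft_algorithmic_latency_ms(20, 10, 2))  # Ex.5 -> 30 ms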

Buffering latency: The latency introduced by block-wise processing; the block size is often referred to as the hop size, frame shift, or temporal stride. A combined latency sketch follows the examples below.

  • Ex.1: An STFT-based processing has a buffering latency corresponding to the hop size.
  • Ex.2: An overlap-save processing has a buffering latency corresponding to the frame size.
  • Ex.3: A time-domain convolution with stride 1 introduces a buffering latency of 1 sample.
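For an STFT-based pipeline, the end-to-end delay is the sum of the two quantities. A minimal sketch, assuming a 20 ms window and 10 ms hop with no lookahead (the configuration of Ex.1 above):

    # Combined latency for an assumed 20 ms window / 10 ms hop configuration.
    window_ms, hop_ms, lookahead_frames = 20, 10, 0

    algorithmic_ms = (window_ms - hop_ms) + lookahead_frames * hop_ms  # 10 ms
    buffering_ms = hop_ms                                              # 10 ms
    total_ms = algorithmic_ms + buffering_ms                           # 20 ms

    print(total_ms)  # must stay within the 20 ms budget required below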

Real-time factor (RTF): The ratio of the compute time for one processing step to the duration of that step, i.e. RTF = compute time / duration of one processing step. For an STFT-based algorithm, one processing step is the hop size; for a time-domain convolution, one processing step is 1 sample.
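One way to estimate RTF is to time a per-hop processing call over many frames and divide by the hop duration. A minimal sketch, assuming a 16 kHz sampling rate and a placeholder enhance_frame function standing in for the actual model:

    import time
    import numpy as np

    sample_rate = 16_000
    hop = 160  # 10 ms hop at 16 kHz -> one processing step
    frames = [np.random.randn(hop).astype(np.float32) for _ in range(1000)]

    def enhance_frame(x):
        # Placeholder for the real per-frame model call.
        return x

    start = time.perf_counter()
    for frame in frames:
        enhance_frame(frame)
    compute_time_per_step = (time.perf_counter() - start) / len(frames)

    rtf = compute_time_per_step / (hop / sample_rate)
    print(f"RTF = {rtf:.3f}")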

All models submitted to this challenge must meet all of the requirements below; a short self-check sketch follows the list.

  1. To be able to execute an algorithm in real time, and to accommodate the variance in compute time that occurs in practice, we require RTF <= 0.5 on an Intel Core i5 quad-core clocked at 2.4 GHz using a single thread.
  2. Algorithmic latency + buffering latency <= 20 ms.
  3. No future information can be used during model inference.
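As a quick self-check before submission, the first two requirements can be asserted directly; the numbers below are assumed examples, not measurements. If you use PyTorch, torch.set_num_threads(1) is one way to pin the RTF measurement to a single thread.

    # Self-check with assumed example numbers (replace with your own measurements).
    rtf = 0.31                        # e.g. measured as in the RTF sketch above
    algorithmic_ms, buffering_ms = 10, 10

    assert rtf <= 0.5, "RTF must be <= 0.5 on the reference CPU, single thread"
    assert algorithmic_ms + buffering_ms <= 20, "latency budget is 20 ms"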

Submission instructions

Please use the Microsoft Conference Management Toolkit to submit your results. After logging in, complete the following steps:

  1. Choose “Create new submission” in the Author Console.
  2. Enter the title, abstract, and co-authors, and upload a lastname.txt file (it can be empty or contain additional information regarding the submission).
  3. Compress the enhanced result files into a single lastname.zip file, retaining the same folder and file names as the blind test set (max file size: 1.8 GB); a packaging sketch follows these steps.
  4. After creating the submission, return to the “Author Console” (by clicking on “Submissions” at the top of the page) and upload the lastname.zip file via “Upload Supplementary Material”.
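A minimal packaging sketch for step 3, assuming the enhanced clips sit in a local enhanced/ folder that mirrors the blind test set layout; the folder name and the lastname placeholder are illustrative only:

    import zipfile
    from pathlib import Path

    enhanced_dir = Path("enhanced")  # assumed folder mirroring the blind test set
    with zipfile.ZipFile("lastname.zip", "w", zipfile.ZIP_DEFLATED) as zf:
        for wav in sorted(enhanced_dir.rglob("*.wav")):
            # Store paths relative to the root so folder and file names are retained.
            zf.write(wav, wav.relative_to(enhanced_dir).as_posix())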

Contact us: For questions, please contact aec_challenge@microsoft.com