
Task Agnostic and Task Specific Self-Supervised Learning from Speech with LeBenchmark
Evain, Solène; Nguyen, Manh Ha; Le, Hang; Zanon Boito, Marcely; Mdhaffar, Salima; Alisamir, Sina; Tong, Ziyi; Tomashenko, Natalia; Dinarelli, Marco; Parcollet, Titouan; Allauzen, Alexandre (2021), Task Agnostic and Task Specific Self-Supervised Learning from Speech with LeBenchmark, Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021), 2021-12
Type
Conference paper
Date
2021
Conference title
Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021)
Conference date
2021-12
Author(s)
Evain, Solène
Nguyen, Manh Ha
Le, Hang
Zanon Boito, Marcely
Mdhaffar, Salima
Alisamir, Sina
Tong, Ziyi
Tomashenko, Natalia
Dinarelli, Marco
Parcollet, Titouan
Allauzen, Alexandre
Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision [LAMSADE]
Abstract (EN)
Self-Supervised Learning (SSL) has yielded remarkable improvements in many domains, including computer vision, natural language processing and speech processing, by leveraging large amounts of unlabeled data. In the specific context of speech, however, and despite promising results, there is a clear lack of standardization in the evaluation process that would allow comprehensive comparisons of these models. The issue is even more pronounced for SSL approaches applied to languages other than English. We present LeBenchmark, an open-source and reproducible framework for assessing SSL from French speech data. It includes a documented, large-scale and heterogeneous corpus, seven pre-trained SSL wav2vec 2.0 models shared with the community, and a clear evaluation protocol made of four downstream tasks along with their scoring scripts: automatic speech recognition, spoken language understanding, automatic speech translation and automatic emotion recognition. For the first time, SSL models are analyzed and compared on these tasks from both task-agnostic (i.e., frozen) and task-specific (i.e., fine-tuned w.r.t. the downstream task) perspectives. We report state-of-the-art performance on most of the French tasks considered and provide a readable evaluation set-up for the development of future SSL models for speech processing.
Subjects / Keywords
Spoken language understanding; Automatic speech recognition; Speech translation; Automatic emotion recognition; Self-supervised learning
Related items
Showing items related by title and author.
- Evain, Solène; Nguyen, Ha; Le, Hang; Zanon Boito, Marcely; Mdhaffar, Salima; Alisamir, Sina; Tong, Ziyi; Tomashenko, Natalia; Dinarelli, Marco; Parcollet, Titouan; Allauzen, Alexandre (2021) Conference paper
- Pirogov, Aleksandr; Gurevsky, Evgeny; Rossi, André; Dolgui, Alexandre (2019) Article accepted for publication or published
- Saltiel, David; Benhamou, Eric (2018) Working paper
- Saltiel, David; Benhamou, Eric; Laraki, Rida; Atif, Jamal (2021) Conference paper
- Cvetkov-Iliev, Alexis; Allauzen, Alexandre; Varoquaux, Gaël (2023) Article accepted for publication or published