This startup thesis develops and compares Speech Recognition Machine Learning models for Swiss German dialects. It is divided into two parts: a business plan for the startup and the technical development of an Automatic Speech Recognition (ASR) system. The business plan was mainly created by my university colleague, Viktoryia Kananchuk. This part focuses on the technical implementation. The research also covers Neural Machine Translation (NMT) from Swiss German into Standard German. During the study, different Deep Learning architectures and text preprocessing methodologies were explored. In addition, considerable effort went into collecting an audio dataset. The main difficulty in Speech Recognition for dialects is that they have no official grammar and no pre-trained models are available. This research compares the performance of fine-tuned ASR models with that of our own Speech Recognition architecture on the collected data and finds that novel methods of audio data preprocessing and probabilistic language models significantly improve the quality of text prediction. Our final ASR model achieves a Character Error Rate (CER) of 14 % and a Word Error Rate (WER) of 35 %. The Sequence to Sequence model with Attention mechanism for the NMT task, trained on a text dataset of 38 000 sentence pairs, achieves a BLEU score of 58.8.
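For readers unfamiliar with the two ASR metrics quoted above, the following is a minimal sketch of how CER and WER are commonly computed: the Levenshtein edit distance between reference and predicted text, divided by the reference length, counted in characters for CER and in whitespace-separated words for WER. The function names and the example sentence are illustrative assumptions, not the evaluation code used in the thesis.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits per reference word."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical example: one wrong word out of three, one missing character.
reference = "das isch guet"
hypothesis = "das isch gut"
print(f"WER = {wer(reference, hypothesis):.3f}, CER = {cer(reference, hypothesis):.3f}")
```

Because a single wrong character makes the whole word count as an error, WER is typically higher than CER for the same output, which is consistent with the 35 % and 14 % figures reported here.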
Subject
Computer Science