This startup thesis is dedicated to developing and comparing Machine Learning
models for Speech Recognition of Swiss German dialects. It is divided into two
parts: a business plan for the startup and the technical development of an
Automatic Speech Recognition (ASR) system. The business plan was mainly created
by my university colleague, Viktoryia Kananchuk; this part focuses on the
technical implementation.
The research also includes a part on Neural Machine Translation (NMT) from
Swiss German into Standard German. During the study, different Deep Learning
architectures and text preprocessing methodologies were explored. In addition,
considerable effort was put into collecting an audio dataset. The main issue in
Speech Recognition for dialects is that there is no official grammar and there
are no pre-trained models. This research compares the performance of fine-tuned
ASR models with a custom-developed Speech Recognition architecture on the
collected data and finds that novel ways of audio data preprocessing and
probabilistic language models significantly improve the quality of text
prediction. Our final ASR model
achieves a 14 % Character Error Rate (CER) and a 35 % Word Error Rate (WER).
The Sequence to Sequence model with an Attention mechanism for the NMT task,
trained on a textual dataset of 38 000 sentence pairs, achieves a BLEU score
of 58.9.
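
For readers unfamiliar with the two ASR metrics quoted above, the following is a minimal sketch of how CER and WER are conventionally computed: the Levenshtein (edit) distance between the reference and the predicted text, divided by the reference length, counted in characters for CER and in words for WER. The function names and the example sentence are illustrative assumptions and are not taken from the thesis; the thesis may have used a different implementation or text normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Made-up example pair, used only to illustrate the metrics.
    reference = "ich gang hei"
    hypothesis = "ich gang heim"
    print(f"CER = {cer(reference, hypothesis):.3f}")
    print(f"WER = {wer(reference, hypothesis):.3f}")
```

A single wrong character makes the whole word count as an error, which is why WER is typically much higher than CER on the same output, as in the figures reported above.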
Department: Computer Science and Software Engineering
Subject: Computer Science