This study presents a multi-task prosody prediction model for German articulatory speech synthesis that jointly predicts duration, F0 and voicing within an LSTM-based framework. The goal was to investigate how different self-supervised pre-training strategies affect downstream prosody modelling.

The supplemental material contains the 20 Kiel Corpus sentences used in the perceptual evaluation experiment reported in the paper. Each sentence is synthesized once with every pre-training strategy.