Synthesis Examples

Examples of VTL 2.3 (September 2020)

The following examples were re-synthesized from natural utterances using VTL 2.3 (copy synthesis). First, the original (human) utterance was segmented and annotated using the annotation tier in VTL. The gestural score was then created automatically from this segment sequence. Finally, the f0 contour of the natural utterance was manually re-created in the f0 tier of the gestural score, using one pitch target per syllable. Because the segment sequences are translated into gestural scores fully automatically, re-synthesizing one utterance typically took less than one hour in total.
"Es kann hilfreich sein, wenn man weiß, wie ein Unterstand gebaut wird." (It can be helpful if you know how to build a shelter.)
Original (human) audio:

Re-synthesized audio:

"Er schützt vor Kälte, Wind und Niederschlägen." (It protects against cold, wind and precipitation.)
Original (human) audio:

Re-synthesized audio:

"Conny glaubt eigentlich nicht mehr an den Osterhasen." (Conny actually no longer believes in the Easter Bunny.)
Original (human) audio:

Re-synthesized audio:

"Aber sehen will sie ihn doch." (But she does want to see him.)
Original (human) audio:

Re-synthesized audio:

"Sie läuft schnell hin." (She's running there fast.)
Original (human) audio:

Re-synthesized audio:

"I think I have a German accent."
Original (human) audio:

Re-synthesized audio:

The following examples demonstrate how the speaking style can be changed with VTL. For each sentence, a neutrally spoken version was re-synthesized ("Neutral style"). The gestural scores of these neutral sentences were then globally manipulated with regard to speaking rate, voice quality, f0, and subglottal pressure to obtain three further variants of each sentence: one with a fast speaking rate, one with a more excited/lively speaking style, and one with a rough voice and a congested nasal cavity (as if the speaker had a cold).

Sentences:
"Die Straßenbahn fuhr weiter geradeaus." (The tram continued straight ahead.)
"Diese Zeitung ist bereits veraltet." (This newspaper is already out of date.)
"Sie fährt keinen Ferrari, sondern einen Maserati." (She doesn't drive a Ferrari, she drives a Maserati.)

Variants of each sentence:
Neutral style
Excited style
Having a cold
Fast speaking

Examples of *older* VTL versions


The following videos contain utterances synthesized with older versions of the software:

"Guten Tag liebe Zuhörer." ("Good afternoon, dear listeners.")
"Nächster Halt: Hamburg." ("Next stop: Hamburg.")
"Der Zug hat eine Stunde Verspätung." ("The train has a delay of one hour.")

The following examples demonstrate the synthesis of fricatives using VocalTractLab. We developed a fricative noise model that predicts the position, strength, and spectral shape of monopole and dipole noise sources in the vocal tract from the properties of the area functions and the flow conditions. In each of the following WAV files, you will first hear a natural fricative in isolation, then the corresponding synthetic fricative in isolation, and finally the synthetic fricative in a symmetric /a:/ context.
In mid-2007, VocalTractLab was extended to synthesize singing. It took part in the Synthesis of Singing Challenge at Interspeech 2007 in Antwerp, where it placed second among the six participating synthesis systems. Two songs had to be prepared for this event: one of our own choice ("Dona Nobis Pacem") and one compulsory song ("The Synthesizer Song").
A brief description of the extensions made to the speech synthesizer for the Synthesis of Singing Challenge is here.

For the opening ceremony of the 2010 annual meeting of the DGPP, Peter Birkholz created the following music video using VocalTractLab. The song is based on the well-known "Canon in D" by Pachelbel (1653-1706), with customized Latin lyrics. The instrumental accompaniment was created with the software FRINIKA. Please watch the video here!
