ÚFAL Linguistic Mondays

Date(s) - 13/11/2023
2:00 pm - 3:30 pm

Posted by: Radim Hladík

ÚFAL MFF UK, room S1, 4th floor

The Role of Pseudo-Parallel Data in Unsupervised Machine Translation

Unsupervised machine translation (UMT) has gained considerable recognition for its capacity to produce translations without relying on parallel corpora. We investigate the role of training on pseudo-parallel data in advancing UMT. Pseudo-parallel data is a valuable resource obtained from two monolingual corpora by matching equivalent or similar sentences. However, the benefits of this technique vary across language pairs. We analyze the limitations of using pseudo-parallel data, including noise, domain mismatch, and data scarcity. Addressing these challenges is vital for enhancing the robustness and real-world applicability of UMT systems.

The talk will be delivered in person (MFF UK, Malostranské nám. 25, 4th floor, room S1) and will be streamed via Zoom. For details on how to join the Zoom meeting, please write to sevcikova at ufal.mff.cuni.cz