Language Data Processing (06SM271-518)

General description	This two-course module introduces automatic corpus annotation ("Introduction to Language Data Processing") and programming ("Programming for Linguists"). It covers the following topics: annotation, manipulation and extraction of linguistic data; basic skills in Natural Language Processing on various linguistic levels (morphology, syntax, semantics); basic Unix commands for text handling; regular expressions for pattern matching; file formats and markup languages; encoding and compression; programming in a modern scripting language (e.g. R or Python); and an introduction to linguistic databases.
ECTS	15
Learning outcome	Students get to know the core methods and tools for automatic corpus analysis, annotation and evaluation. They learn about cross-language alignment and gain insights into the advantages of parallel corpora. Students learn how to use Unix language processing tools and obtain programming knowledge in a modern scripting language (e.g. R or Python) with a focus on the processing of linguistic data.
Language of instruction	English
Prerequisites	None
Assessment	Portfolio (40% written exam for Introduction to Language Data Processing, 40% written exam for Programming for Linguists, 20% proof of self-study achievements in both courses). All elements of this portfolio must be completed. If any element is not completed, the module is considered failed.
Repeatability	Repeatable once; the module must be booked again
Duration / Offered in	1 semester / every fall semester
Courses within this module	Introduction to Language Data Processing - Lecture (06VU271-518a); Introduction to Language Data Processing - Tutorial (06TT271-518a); Programming for Linguists - Lecture (06VU271-518b); Programming for Linguists - Tutorial (06TT271-518b)