Language Data Processing (06SM271-518)
General description | This two-course module introduces automatic corpus annotation ("Introduction to Language Data Processing") and programming ("Programming for Linguists"). It covers the following topics: annotation, manipulation and extraction of linguistic data; basic skills in Natural Language Processing at various linguistic levels (morphology, syntax, semantics); basic Unix commands for text handling; regular expressions for pattern matching; file formats and markup languages; encoding and compression; programming in a modern scripting language (e.g. R or Python); and an introduction to linguistic databases. |
ECTS | 15 |
Learning outcome | Students become familiar with the core methods and tools for automatic corpus analysis, annotation and evaluation. They learn about cross-language alignment and gain insight into the advantages of parallel corpora. Students learn to use Unix tools for language processing and acquire programming skills in a modern scripting language (e.g. R or Python), with a focus on processing linguistic data. |
Language of instruction | English |
Prerequisites | None |
Assessment | Portfolio (40% written exam for Introduction to Language Data Processing, 40% written exam for Programming for Linguists, 20% proof of self-study achievements in both courses). All elements of this portfolio must be completed. If any element is not completed, the module is graded as «failed». |
Repeatability | Repeatable once; the module must be booked again |
Duration / Offered in | 1 semester / every fall semester |
Courses within this module |
|