Language Data Processing (06SM271-518)

General description

This two-course module introduces automatic corpus annotation ("Introduction to Language Data Processing") and programming ("Programming for Linguists"). It covers the following topics: annotation, manipulation and extraction of linguistic data; basic skills in Natural Language Processing at various linguistic levels (morphology, syntax, semantics); basic Unix commands for text handling; regular expressions for pattern matching; file formats and markup languages; encoding and compression; programming in a modern scripting language (e.g. R or Python); and an introduction to linguistic databases.

ECTS: 15
Learning outcome: Students become familiar with the core methods and tools for automatic corpus analysis, annotation and evaluation. They learn about cross-language alignment and the advantages of parallel corpora. They also learn to use Unix language processing tools and acquire programming skills in a modern scripting language (e.g. R or Python), with a focus on processing linguistic data.
Language of instruction: English
Prerequisites

None

Assessment

Portfolio (40% written exam for Introduction to Language Data Processing, 40% written exam for Programming for Linguists, 20% proof of self-study achievements in both courses). All elements of the portfolio must be completed; if any element is not completed, the module counts as failed.

Repeatability: Repeatable once; the module must be booked again
Duration: 1 semester
Offered in: every fall semester
Courses within this module
  • Introduction to Language Data Processing - Lecture (06VU271-518a)
  • Introduction to Language Data Processing - Tutorial (06TT271-518a)
  • Programming for Linguists - Lecture (06VU271-518b)
  • Programming for Linguists - Tutorial (06TT271-518b)