This course provides practical training in the use of modern regression
techniques for understanding linguistic and psycholinguistic data. In the
first part of the course, the standard linear model is introduced, with special
attention to model diagnostics, methods for dealing with collinearity, the
dummy coding of factors, and the use of link functions. The second part of the
course introduces the linear mixed-effects model, which is essential for
modeling data sets with repeated observations for predictors such as
participants in experiments and linguistic units such as words, sentences, or
texts. The focus in this part of the course will be on the interpretation of
the parameters for these so-called random-effect factors. The third part of
the course moves on to generalized additive models, a relatively recent
development in regression modeling that makes it possible to capture nonlinear
relations between predictors and the response variable, including wiggly curves
and wiggly (hyper)surfaces.
Each class consists of a lecture introducing basic concepts and methods,
followed by a hands-on lab session in which participants receive training in
using the R statistical programming environment. Data sets discussed in the
lab sessions range from dialectometry to eye movements and from reaction time
data to evoked response potentials. By the end of this course, participants
will be able to apply state-of-the-art methods in regression to their own data
sets, as well as critically evaluate analyses reported in the literature.
- Instructor: Rolf Harald Baayen
Mathematical methods are essential for understanding and working in theoretical and computational linguistics. This course introduces the key concepts from the areas of set theory, algebra and logic, which belong to the basic repertoire of linguistic methods. The main goal of the course is to provide the students with sufficient competence in basic notations, terminology and concepts of discrete mathematics for their studies in theoretical and computational linguistics. Familiarity with concepts such as sets, functions and propositions, and the ability to work with simple proof techniques are a crucial prerequisite for subsequent courses.
Students should acquire sufficient competence in basic notations,
terminology and concepts of mathematics for their studies in
linguistics. The topics of the course comprise the most essential
mathematical notions needed in general linguistics, computational
linguistics, document processing and information management. Familiarity
with concepts such as sets, functions and propositions, and the ability
to work with simple proof techniques will be expected in subsequent
courses. The main purpose of the course is to equip the participants
with the most basic mathematical tools which they will need in their
Given that natural languages cannot be characterized by simply listing all possible sentences and their meanings, a range of grammar formalisms has been developed to characterize form and meaning in a general and compact way. The approaches differ in terms of their focus, empirical coverage, formal foundations, expressive power, conceptualization of generalizations, and the processing regimes that have been developed for those formalisms.
After a general overview of grammar types in the Chomsky hierarchy, we will discuss plain context-free grammars as a baseline against which we will introduce and compare several current grammar formalisms. The plan is to discuss unification-based phrase structure grammars and dependency grammars such as Head-Driven Phrase Structure Grammar (HPSG), Lexical Functional Grammar (LFG), and Slot Grammar and, if time allows, others such as Categorial Grammar. The focus will be on obtaining a sound working knowledge of how different formalisms capture some of the fundamental phenomena of natural language syntax: argument and adjunct realization, agreement and government, middle-distance phenomena (e.g., equi, raising), and long-distance phenomena (e.g., fronting).
Data structures and algorithms are core topics in linguistic programming. Data structures are used to store and retrieve data, and algorithms are the recipes used to process that data. This course emphasizes the understanding and Java implementation of basic data structures such as linked lists and trees, together with the algorithms used to store and retrieve the information they hold. We will see how these data structures are used in natural language processing programs.
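As an illustration of the kind of structure named above, the following is a minimal sketch of a singly linked list of word tokens in Java. The class name TokenList and its methods are hypothetical and are not taken from the course materials; the sketch assumes only that tokens are plain strings.

```java
// Hypothetical example class for illustration; not part of the course materials.
class TokenList {
    // One node of the singly linked list: a token plus a link to the next node.
    private static final class Node {
        final String token;
        Node next;

        Node(String token) {
            this.token = token;
        }
    }

    private Node head; // first node, or null if the list is empty
    private Node tail; // last node, kept so that appending takes constant time
    private int size;

    // Store: append a token at the end of the list.
    void add(String token) {
        Node node = new Node(token);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
    }

    // Retrieve: follow the links from the head and return the position of the
    // first matching token, or -1 if the token does not occur in the list.
    int indexOf(String token) {
        int index = 0;
        for (Node n = head; n != null; n = n.next) {
            if (n.token.equals(token)) {
                return index;
            }
            index++;
        }
        return -1;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        TokenList sentence = new TokenList();
        for (String t : "the dog chased the cat".split(" ")) {
            sentence.add(t);
        }
        System.out.println(sentence.indexOf("chased")); // prints 2
        System.out.println(sentence.indexOf("bird"));   // prints -1
    }
}
```

Appending is constant-time because the list keeps a reference to its last node, whereas lookup has to walk the links one by one. A tree would trade that linear walk for faster retrieval at the cost of a more involved insertion step.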
Texts in digital form are an essential prerequisite for any subsequent analysis. The course offers a multi-faceted perspective on how texts are represented in computers, with topics including (among others) character encodings (e.g. UTF-8), text structuring and data modeling (e.g. XML and HTML), text licensing (e.g. Creative Commons licenses), text visualization (e.g. CSS), and text querying tools (e.g. XQuery). The course combines theoretical discussion with practical work that illustrates the concepts.
- Instructor: Ching-Chu Sun
This course introduces a number of core methods and applications in natural language processing (NLP). On the one hand, the course will
focus on core tasks in computational linguistics (e.g., part-of-speech tagging and statistical parsing) and major NLP applications
(e.g., named entity recognition or machine translation). On the other hand, it will introduce a selection of relevant concepts and methods
from probability theory, statistics, and machine learning.
The course is compulsory for the BA degree International Studies in Computational Linguistics. For other degree programs, please contact the instructor before signing up.