This course provides practical training in the use of modern regression
techniques for understanding linguistic and psycholinguistic data.  In the
first part of the course, the standard linear model is introduced, with special
attention to model diagnostics, methods for dealing with collinearity, the
dummy coding of factors, and the use of link functions.  The second part of the
course introduces the linear mixed-effects model, which is essential for
modeling data sets with repeated observations for predictors such as
participants in experiments, and linguistic units such as words, sentences, or
texts.  The focus in this part of the course will be on the interpretation of
the parameters for these so-called random-effect factors.  The third part of
the course moves on to generalized additive models, a relatively recent
development in regression modeling that makes it possible to capture nonlinear
relations between predictors and the response variable, including wiggly curves
and wiggly (hyper)surfaces.
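
As a small illustration of the dummy coding of factors mentioned above, the following sketch fits a one-predictor linear model by ordinary least squares. It is written in Python rather than the R used in the lab sessions, and the reaction times, the factor levels "high" and "low", and all variable names are made up for the example.

```python
from statistics import mean

# Hypothetical reaction times (ms) for two levels of a frequency factor.
rts = {"high": [520.0, 540.0, 500.0], "low": [610.0, 590.0, 600.0]}

# Treatment (dummy) coding: reference level "high" -> 0, level "low" -> 1.
x = [0.0] * len(rts["high"]) + [1.0] * len(rts["low"])
y = rts["high"] + rts["low"]

# Ordinary least squares for the model y = intercept + slope * x.
x_bar, y_bar = mean(x), mean(y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

print(f"intercept = {intercept:.1f}, slope = {slope:.1f}")
```

With this coding, the intercept equals the mean of the reference level and the slope equals the difference between the two group means.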

Each class consists of a lecture introducing basic concepts and methods,
followed by a hands-on lab session in which participants receive training in
using the R statistical programming environment.  Data sets discussed in the
lab sessions range from dialectometry to eye-movements and from reaction time
data to evoked response potentials.  By the end of this course, participants
will be able to apply state-of-the-art methods in regression to their own data
sets, as well as critically evaluate analyses reported in the literature.

Mathematical methods are essential for understanding and working in theoretical and computational linguistics. This course introduces the key concepts from the areas of set theory, algebra and logic, which belong to the basic repertoire of linguistic methods. The main goal of the course is to provide the students with sufficient competence in basic notations, terminology and concepts of discrete mathematics for their studies in theoretical and computational linguistics. Familiarity with concepts such as sets, functions and propositions, and the ability to work with simple proof techniques are a crucial prerequisite for subsequent courses.

Students should acquire sufficient competence in basic notations, terminology and concepts of mathematics for their studies in linguistics. The topics of the course comprise the most essential mathematical notions needed in general linguistics, computational linguistics, document processing and information management. Familiarity with concepts such as sets, functions and propositions, and the ability to work with simple proof techniques will be expected in subsequent courses. The main purpose of the course is to equip the participants with the most basic mathematical tools which they will need in their linguistics courses.

Given that natural languages cannot be characterized by simply listing all possible sentences and their meaning, a range of grammar formalisms have been developed to characterize form and meaning in a general and compact way. The approaches differ in terms of their focus, empirical coverage, formal foundations, expressive power, conceptualization of generalizations, and the processing regimes that have been developed for those formalisms.  

After a general overview of grammar types in the Chomsky Hierarchy, we will discuss plain context-free grammars as a baseline against which we will introduce and compare several current grammar formalisms. The plan is to discuss unification-based phrase structure grammars and dependency grammars such as Head-Driven Phrase Structure Grammar (HPSG), Lexical Functional Grammar (LFG), and Slot Grammar and, if time allows, others such as Categorial Grammar. The focus will be on obtaining a sound working knowledge of how different formalisms capture some of the fundamental phenomena of natural language syntax: argument and adjunct realization, agreement and government, middle-distance phenomena (e.g., equi, raising), and long-distance phenomena (e.g., fronting).


Data structures and algorithms are core topics in linguistic programming. Data structures are used to store and retrieve data, and algorithms are the recipes used to process data. This course emphasizes the understanding and Java implementation of basic data structures such as linked lists and trees, and of the algorithms used to store and retrieve information in them. We will see how these data structures are used in natural language processing programs.
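
As one illustration of a tree used in a natural language processing program, the sketch below stores a small word list in a trie (a prefix tree) and looks words up in it. It is written in Python rather than the Java used in the course, and the class names, method names, and sample words are hypothetical choices made for the example.

```python
class TrieNode:
    """One node of the trie: outgoing edges plus an end-of-word flag."""

    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    """A prefix tree for storing and retrieving words."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Store a word, creating nodes for unseen characters."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        """Return True only if exactly this word was stored."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word


lexicon = Trie()
for w in ["walk", "walked", "talk"]:
    lexicon.insert(w)

print(lexicon.contains("walked"))  # a stored word
print(lexicon.contains("wal"))     # only a prefix of stored words
```

Words that share a prefix, such as "walk" and "walked", share nodes, so lookup time depends on the length of the word rather than on the size of the lexicon.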

Texts in digital form are an essential prerequisite for any subsequent analysis. The course offers a multi-faceted perspective on how texts are represented in computers, with topics including (among others) character encodings (e.g. UTF-8), text structuring and data modeling (e.g. XML, HTML), text licensing (e.g. Creative Commons licenses), text visualization (e.g. CSS), and text querying tools (e.g. XQuery). The course combines a theoretical discussion with a practical approach that illustrates the concepts.
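
To make the notion of a character encoding concrete, the following Python sketch shows how many bytes UTF-8 uses for a few characters. The helper name utf8_bytes and the sample characters are chosen only for this illustration.

```python
def utf8_bytes(char: str) -> bytes:
    """Return the UTF-8 byte sequence for a single character."""
    return char.encode("utf-8")


# Sample characters from different Unicode ranges.
for char in ["a", "é", "中", "😀"]:
    encoded = utf8_bytes(char)
    print(f"U+{ord(char):04X} {char!r}: {len(encoded)} byte(s), hex {encoded.hex()}")
```

The four sample characters take one, two, three, and four bytes respectively, which is why the number of characters in a text and its size in bytes generally differ.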

The word frequency effect is one of the hallmark effects in experimental linguistics: common words are processed faster than rare words. Recently, a number of studies have documented frequency effects for multiword sequences as well. In the first half of this course, we will read and discuss the findings of these studies as well as their implications for our understanding of language processing. Until now, experimental work on frequency effects of multiword sequences has focused on alphabetic languages. In the second half of this course, we will carry out a psycholinguistic experiment that looks at the effects of the frequency of multiword sequences in a non-alphabetic language: Mandarin Chinese.
There is no midterm exam. Attendance and homework will be required but not graded for this course. The final grade will be determined on the basis of a final report.

When people are speaking, not all words are fully pronounced. Many acoustic forms are subject to reduction. The sentence "I don't know", for instance, is often reduced to "I dunno", or even "I ono". Recently, the phenomenon of acoustic reduction has enjoyed increased popularity in phonetic research in different languages. In this course, we will review this research to get an idea about the circumstances in which acoustic reduction occurs. Furthermore, you will gain hands-on experience by looking at acoustic reduction in actual speech data in Mandarin Chinese.

There is no midterm exam. Attendance and homework will be required but not graded for this course. The final grade will be determined on the basis of a final report.

This course introduces a number of core methods and applications in natural language processing (NLP). On the one hand, the course will
focus on core tasks in computational linguistics (e.g., part-of-speech tagging and statistical parsing) and major NLP applications
(e.g., named entity recognition or machine translation). On the other hand, a selection of relevant concepts and methods from probability theory,
statistics and machine learning will be introduced.
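
As a small example of how probability theory enters NLP, the following Python sketch estimates bigram probabilities by maximum likelihood from a toy token sequence. The sentence, the function name bigram_prob, and the choice of the bigram model are assumptions made for this illustration and are not taken from the course materials.

```python
from collections import Counter

# Toy corpus; a real application would use a large tokenized text.
tokens = "the dog saw the cat and the dog barked".split()

# Count each word as a history (every token except the last) and each bigram.
history_counts = Counter(tokens[:-1])
bigram_counts = Counter(zip(tokens, tokens[1:]))


def bigram_prob(prev, word):
    """Maximum likelihood estimate of P(word | prev); 0.0 for unseen histories."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]


print(bigram_prob("the", "dog"))
print(bigram_prob("the", "cat"))
```

In this toy sequence "the" is followed by "dog" in two of its three occurrences, so the estimate of P(dog | the) is 2/3. Unseen bigrams receive probability zero, which is the problem that smoothing methods address.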

The course is compulsory for the BA degree International Studies in Computational Linguistics. For other degree programs, please contact the instructor before signing up.