Thesis Proposals

Supervisor: Jelle Zuidema (zuidema@uva.nl)
Project title: Simulated Neuroimaging for Language Processing
Status: in negotiation
Project description: N/A
Maximum number of students: 1
Level: MSc
Required background: good programming skills, strong cognitive neuroscience background

Supervisor: Jelle Zuidema (zuidema@uva.nl)
Project title: Predicting MEG brain imaging data using NLP and Machine Learning
Status: filled
Project description: N/A
Maximum number of students: 1
Level: MSc
Required background: good programming skills, interest in cognitive (neuro)science

Supervisor: Jelle Zuidema (zuidema@uva.nl)
Project title: Symbolic Guidance -- Using Symbolic NLP to improve Neural NLP
Status: open
Project description: N/A
Maximum number of students: 1
Level: MSc
Required background: NLP-1, at least one advanced NLP course. Excellent programming skills.

Supervisor: Floris Roelofsen
Project title: Automatic translation into Dutch Sign Language

Project description: Developing a rule-based system that delivers high-quality translations of Dutch text into a representation language for Dutch Sign Language, focusing on specific grammatical aspects of choice. Intended for bachelor and master thesis projects in AI and in Logic. Details of the project can be tailored to the background and the interests of the student. Students need good programming skills and are preferably familiar with rule-based machine translation, in particular the Grammatical Framework.

Maximum number of students: 3

Supervisor: Floris Roelofsen (possibly co-supervised with other members of the inquisitive semantics group)
Project title: Linguistic applications of inquisitive semantics

Project description: Using inquisitive semantics to analyze a linguistic phenomenon of choice. Intended for MoL students. Details of the project can be tailored to the background and the interests of the student. Requires a solid background in logic and natural language semantics.

Maximum number of students: 3

Supervisor: Ulle Endriss

Project title: Topics in Computational Social Choice

Target group: MSc Logic / MSc AI

Project description: I'm happy to supervise a small number of tailor-made thesis projects in the area of Computational Social Choice (including voting theory, judgement aggregation, fair division, ...). I would usually expect interested students to have taken my course on the topic and to have done well in it. Most of these projects will be of a theoretical nature, but if you would like to combine this with an implementation and/or an empirical component, then this will usually be possible as well. Please have a look at some examples of past MSc theses in this area before contacting me.

Maximum number of students: 3 [already fully booked for 2018/19]

Supervisor: Ulle Endriss

I will add a couple of concrete proposals soon (mostly for BSc).

Supervisor: Ekaterina Shutova and Wilker Aziz

Project title: Modelling semantic variation across languages

Target group: MSc Logic / MSc AI

Project description: Languages may share universal features at a deep, abstract level, but the structures found in real-world, surface-level natural language vary significantly. This variation makes it challenging to transfer NLP models across languages or to develop systems that apply to a wide range of languages. As a consequence, the availability of NLP technology is limited to a handful of resource-rich languages, leaving many other languages behind. Understanding linguistic variation in a systematic way is crucial for the development of effective multilingual NLP applications, and thus for making NLP technology more accessible globally.

In recent years, much NLP research has focused on the development of multilingual models, typically based on multilingual joint learning. The intuition behind this class of methods is that information unambiguously present in one language is likely to help resolve ambiguities in another (since the specific kinds of ambiguities differ across languages). Multilingual models trained in this fashion have been shown to produce more accurate predictions than their monolingual counterparts in several tasks, most notably in speech recognition (Huang et al., 2013) and syntactic parsing (Vilares et al., 2015; Ammar et al., 2016; Guo et al., 2016a). The benefits of this approach have also been demonstrated in semantic tasks, such as word sense disambiguation (Navigli and Ponzetto, 2012). This project will apply multilingual joint learning to metaphor processing. Given the variation in the use of metaphor across cultures on the one hand, and the systematicity of its use within individual languages on the other, the application of multilingual joint learning to this task holds particular promise. The project aims both to advance the state of the art in metaphor processing and to design a model that will help us better understand the nature of semantic variation more generally. We will evaluate the resulting models on monolingual tasks (e.g. metaphor identification, paraphrasing) and on a multilingual task: machine translation.

This is an ambitious project, suitable for students with a background and interest in machine learning who are keen to conduct novel research. We hope that it will lead to a publication.

What are our expectations of the student?
- Independent and proactive attitude and an interest in artificial intelligence
- Solid maths background: calculus, linear algebra, probability and statistics
- Advanced programming skills (algorithms and data structures; ideally experience with PyTorch or other deep learning libraries)
- Knowledge and skill in developing and applying machine learning algorithms (particularly, interest and experience in deep learning)
- Good familiarity with and experience in NLP (but don't worry: we will help you fill in the gaps)

Maximum number of students: 1

Supervisor: Ronald de Haan
Project title: Developing a Python Package for Efficient Reasoning with DNNFs
Target group: BSc AI / BSc Informatica

Project description: Boolean functions are often represented using propositional logic formulas. In this representation, many reasoning tasks are computationally intractable (e.g., checking satisfiability, finding maximal models). For this reason, many alternative representation languages for Boolean functions have been investigated. One example is Boolean circuits in Decomposable Negation Normal Form (DNNF); see, e.g., https://jair.org/index.php/jair/article/view/10311. For this language, many reasoning tasks can be performed efficiently. For example, even finding weighted maximal models can be done in polynomial time; see https://doi.org/10.1016/j.jal.2016.11.031 and https://arxiv.org/abs/1211.4475. This project consists of developing a Python package that implements various reasoning tasks for DNNF circuits efficiently. The package could then be used in future research that applies DNNF circuits in other areas, such as computational social choice.
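To give an idea of why DNNF makes these tasks easy, here is a minimal sketch of two of them in plain Python. The node classes (`Lit`, `And`, `Or`) and the functions `is_satisfiable` and `max_weight` are hypothetical names chosen for illustration, not an existing package's API. Both are single bottom-up passes that rely on decomposability (the children of an AND node share no variables); the sketch assumes its input is decomposable and does not check this. Weights are given per literal, with missing entries counting as 0.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Tuple, Union


@dataclass(frozen=True)
class Lit:
    """Literal over variable `var`; negated when `positive` is False."""
    var: str
    positive: bool = True


@dataclass(frozen=True)
class And:
    """Decomposable AND: children must not share variables. And(()) is true."""
    children: Tuple["Node", ...]


@dataclass(frozen=True)
class Or:
    """OR node. Or(()) is false."""
    children: Tuple["Node", ...]


Node = Union[Lit, And, Or]


def variables(node: Node, memo: Dict[int, FrozenSet[str]]) -> FrozenSet[str]:
    """Variables occurring below `node`, memoised by node identity."""
    key = id(node)
    if key not in memo:
        if isinstance(node, Lit):
            memo[key] = frozenset([node.var])
        else:
            memo[key] = frozenset().union(*(variables(c, memo) for c in node.children))
    return memo[key]


def is_satisfiable(node: Node, memo: Optional[Dict[int, bool]] = None) -> bool:
    """Linear-time check; correct because AND children share no variables."""
    memo = {} if memo is None else memo
    key = id(node)
    if key not in memo:
        if isinstance(node, Lit):
            memo[key] = True
        elif isinstance(node, And):
            memo[key] = all(is_satisfiable(c, memo) for c in node.children)
        else:
            memo[key] = any(is_satisfiable(c, memo) for c in node.children)
    return memo[key]


def max_weight(root: Node, weights: Dict[Tuple[str, bool], float],
               all_vars: Iterable[str]) -> float:
    """Maximum total literal weight over all models of `root` (-inf if unsatisfiable)."""
    var_memo: Dict[int, FrozenSet[str]] = {}
    val_memo: Dict[int, float] = {}

    def w(v: str, positive: bool) -> float:
        return weights.get((v, positive), 0.0)

    def free(vs: Iterable[str]) -> float:
        # Variables that a sub-circuit does not mention can take their best value.
        return sum(max(w(v, True), w(v, False)) for v in vs)

    def value(n: Node) -> float:
        key = id(n)
        if key not in val_memo:
            if isinstance(n, Lit):
                val_memo[key] = w(n.var, n.positive)
            elif isinstance(n, And):
                # Children have disjoint variables, so their optima simply add up.
                val_memo[key] = sum(value(c) for c in n.children)
            else:
                vs = variables(n, var_memo)
                val_memo[key] = max(
                    (value(c) + free(vs - variables(c, var_memo)) for c in n.children),
                    default=-math.inf,
                )
        return val_memo[key]

    return value(root) + free(set(all_vars) - variables(root, var_memo))
```

For example, for the circuit `Or((And((Lit("x"), Lit("y"))), And((Lit("x", False), Lit("z")))))` with weight 1 for x, 2 for y and 5 for z (all other literals 0), `max_weight` returns 8.0: the first branch scores 1 + 2, and the unmentioned z adds its best value 5. A real package would likely replace the recursion with an iterative pass over a topologically sorted circuit to cope with deep circuits.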

Maximum number of students: 2
Required background: Good programming skills, familiarity with propositional logic

Supervisor: Ronald de Haan
Project title: Experimentally Evaluating Heuristic Algorithms for Voting Rules
Target group: BSc AI
Project description: For many voting rules, computing the outcome is computationally intractable. One way of providing tools to work with such voting rules is to use heuristic algorithms, which work well in many cases but not in the worst case. One example of this approach is to encode voting rules in the framework of Judgment Aggregation (JA) and to use heuristic algorithms for JA based on Answer Set Programming (ASP) solvers. These encodings and heuristic algorithms are available (see, e.g., https://github.com/rdehaan/ja-asp). This project consists of experimentally evaluating how well these heuristic algorithms perform on different types of voting data (a library of such benchmark data is available at http://www.preflib.org/).
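The shape of such an experiment can be prototyped on small instances before moving to the ASP-based encodings. The Kemeny rule is a standard example of a voting rule whose outcome is NP-hard to compute. The sketch below compares a heuristic against an exact brute-force baseline and reports how far the heuristic falls short of the optimum. A simple Borda-count ranking stands in for the JA/ASP heuristics and does not reproduce them; all function names are hypothetical, and loading PrefLib data files is left out.

```python
from itertools import combinations, permutations
from typing import Callable, Dict, List, Sequence, Tuple

Ranking = Tuple[str, ...]


def kendall_tau(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of candidate pairs that rankings a and b order differently."""
    pos_a = {c: i for i, c in enumerate(a)}
    pos_b = {c: i for i, c in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def kemeny_score(ranking: Sequence[str], profile: List[Ranking]) -> int:
    """Total Kendall tau distance from `ranking` to every vote (lower is better)."""
    return sum(kendall_tau(ranking, vote) for vote in profile)


def exact_kemeny(profile: List[Ranking]) -> Ranking:
    """Brute force over all m! rankings: only feasible for a handful of candidates."""
    return min(permutations(profile[0]), key=lambda r: kemeny_score(r, profile))


def borda_heuristic(profile: List[Ranking]) -> Ranking:
    """Stand-in heuristic: order candidates by Borda score (ties broken alphabetically)."""
    m = len(profile[0])
    scores: Dict[str, int] = {c: 0 for c in profile[0]}
    for vote in profile:
        for i, c in enumerate(vote):
            scores[c] += m - 1 - i
    return tuple(sorted(scores, key=lambda c: (-scores[c], c)))


def excess_score(profile: List[Ranking],
                 heuristic: Callable[[List[Ranking]], Ranking]) -> int:
    """How much worse the heuristic's Kemeny score is than the optimum (0 = optimal)."""
    best = kemeny_score(exact_kemeny(profile), profile)
    return kemeny_score(heuristic(profile), profile) - best
```

On real benchmark profiles the brute-force baseline quickly becomes impractical, since it enumerates all m! rankings; for larger instances an exact solver would be needed as the reference point.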

Maximum number of students: 1
Required background: Good programming skills, familiarity with logic programming

Supervisor: Tom Lentz
Project title: Prosody extraction from speech input
Project description:
Prosody is a bundle of speech features (e.g., pitch accents, word stress, sentence intonation) that are expressed by, among other things, fundamental frequency and duration/speed. Linguists (phonologists) have developed numerous theories on the properties that are employed for, e.g., focus marking (indicating the new information in an utterance) and irony and/or sarcasm. However, machine learning has been applied only sparingly, due to a lack of annotated data (training sets) and of a clear definition of both the markers identified by phonological theory and the possible categories. In other words, for prosodic annotation there is debate both on how to find the things that are in the signal and on which things are actually in the signal. For instance, there is no good tool to detect irony and sarcasm (or other prosodic events) in speech.
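Because much of the prosodic signal is carried by fundamental frequency (F0), an F0 contour is a natural input feature for a proof-of-concept learner. The sketch below is a minimal textbook autocorrelation pitch estimator in plain Python. The 75 to 400 Hz search range and the voicing threshold of 0.3 are illustrative assumptions, the function names are hypothetical, and established tools such as Praat use more robust variants of this idea.

```python
import math
from typing import List, Optional, Sequence


def estimate_f0(
    frame: Sequence[float],
    sample_rate: int,
    fmin: float = 75.0,
    fmax: float = 400.0,
    voicing_threshold: float = 0.3,
) -> Optional[float]:
    """Estimate F0 (Hz) of one frame from its autocorrelation; None if silent/unvoiced."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None
    # Candidate pitch periods (in samples) for the assumed F0 range.
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    # A lag only counts as a pitch period if its correlation exceeds the threshold.
    best_lag, best_r = None, voicing_threshold
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag, normalised by the frame energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return None if best_lag is None else sample_rate / best_lag


def f0_contour(signal: Sequence[float], sample_rate: int,
               frame_len: int, hop: int) -> List[Optional[float]]:
    """One F0 estimate per frame of `signal`, frames starting every `hop` samples."""
    return [
        estimate_f0(signal[start:start + frame_len], sample_rate)
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
```

With a 16 kHz signal, 25 ms frames (400 samples) and a 10 ms hop (160 samples) are common choices. A real pipeline would likely add duration and intensity features and align the contour with annotated prosodic events before any learning takes place.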

I am looking for students who would be interested in:

1) doing some literature research and formulating a more specific research question on the issues mentioned above (in cooperation with me and tailored to the student's background knowledge)

2) building a 'proof of concept' learning algorithm and applying it to available data

3) writing a publishable technical report or article on the results, if possible (which would be the thesis or an optional continuation of it at a later phase)

This research area is obviously quite broad, but ideally we would integrate the students' research with a PhD project called 'The Sound of Political Irony', starting in February, in which a PhD researcher works in this field together with me, a communication scientist, a logician and a political scientist. By integration I mainly mean that the students writing their MSc thesis on this topic and the PhD researcher would work on complementary issues and, e.g., read literature together.

Don't hesitate to contact me for a short chat about the possibilities.

Students who apply for this project should have at least some background knowledge in speech processing and ideally also in (non-speech) language processing.

Maximum number of students: 3