Collective annotation is the problem of aggregating the judgements of multiple individuals regarding a linguistic annotation task (or a similar task) into a single collective judgement that reflects the view of the community. We approach this problem by combining ideas from computational linguistics and social choice theory (see also this blog post). This website contains publications and resources related to our ongoing research on this topic.
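As a minimal illustration of the aggregation problem, the simplest conceivable rule is a plurality vote per item. The sketch below uses a hypothetical data layout (items mapped to lists of labels); it is not taken from any of the papers listed here.

```python
from collections import Counter

def majority_annotation(judgements):
    """Aggregate per-item judgements by plurality vote.

    judgements: dict mapping each item to the list of labels
    that the annotators assigned to it.
    Returns a dict mapping each item to its most frequent label
    (ties are broken arbitrarily).
    """
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in judgements.items()}

votes = {
    "item1": ["yes", "yes", "no"],
    "item2": ["no", "no", "yes", "no"],
}
print(majority_annotation(votes))  # {'item1': 'yes', 'item2': 'no'}
```

Plurality voting treats all annotators as interchangeable; the rules studied in the papers below refine this baseline by accounting for annotator bias and accuracy.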

Ulle Endriss and Raquel Fernández. Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL-2013).
This is the original paper on collective annotation, in which we present the fundamental idea of harnessing techniques from social choice theory to aggregate noisy data collected from individual annotators by means of crowdsourcing. The paper proposes several novel aggregation rules, in particular bias-correcting rules and greedy consensus rules, and presents experimental results for crowdsourced data on textual entailment.
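The rules themselves are defined formally in the paper. Purely as a hypothetical toy illustration of the bias-correcting intuition (discounting votes for labels that an annotator over-uses across the whole dataset), one might write something like the following; the data layout and weighting scheme are illustrative assumptions, not the rules from the paper.

```python
from collections import defaultdict

def bias_corrected_annotation(judgements):
    """Toy bias-correcting rule (illustration only).

    judgements: dict mapping each item to a dict annotator -> label.
    A vote for label l by annotator a is weighted by 1 / freq_a(l),
    where freq_a(l) is how often a uses l across all items, so votes
    for labels an annotator rarely uses count for more.
    """
    # Estimate each annotator's empirical label distribution.
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for labels in judgements.values():
        for annotator, label in labels.items():
            counts[annotator][label] += 1
            totals[annotator] += 1
    # Weighted plurality per item.
    result = {}
    for item, labels in judgements.items():
        scores = defaultdict(float)
        for annotator, label in labels.items():
            freq = counts[annotator][label] / totals[annotator]
            scores[label] += 1.0 / freq
        result[item] = max(scores, key=scores.get)
    return result

j = {
    "i1": {"a1": "no",  "a2": "yes", "a3": "yes"},
    "i2": {"a1": "yes", "a2": "yes", "a3": "yes"},
    "i3": {"a1": "yes", "a2": "yes", "a3": "yes"},
    "i4": {"a1": "yes", "a2": "yes", "a3": "yes"},
}
# a2 and a3 always answer "yes"; a1's rare "no" on i1 outweighs them,
# so i1 gets "no" here even though a plain majority would say "yes".
print(bias_corrected_annotation(j))
```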
Justin Kruger, Ulle Endriss, Raquel Fernández, and Ciyang Qing. Axiomatic Analysis of Aggregation Methods for Collective Annotation. Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-2014).
This paper introduces a powerful formal framework for modelling aggregation rules for collective annotation. It presents theoretical results regarding the axiomatic characterisation of such rules, with a special focus on the family of bias-correcting rules.
Ciyang Qing, Ulle Endriss, Raquel Fernández, and Justin Kruger. Empirical Analysis of Aggregation Methods for Collective Annotation. Proceedings of the 25th International Conference on Computational Linguistics (COLING-2014). Shortlisted for the Best Paper Award (received Honourable Mention).
This paper presents experimental results for collective annotation in three different domains: textual entailment, preposition sense disambiguation, and question dialogue act classification. For the latter two domains, we present new crowdsourced datasets of linguistic judgements collected specifically for this paper. The paper also introduces one further aggregation rule, based on a simple probabilistic generative model that tracks the accuracy of individual annotators.
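The generative model itself is specified in the paper; the following is only a rough sketch in the same spirit, with a single accuracy parameter per annotator estimated by expectation-maximisation (loosely in the tradition of Dawid and Skene, 1979). All names and the data layout are hypothetical, and this is not the implementation distributed below.

```python
import math
from collections import defaultdict

def em_aggregate(judgements, labels, iterations=20):
    """EM sketch with one accuracy parameter per annotator.

    judgements: dict mapping each item to a dict annotator -> label.
    labels: list of all possible labels.
    An annotator with accuracy p reports the true label with
    probability p and any other label with probability (1-p)/(k-1).
    """
    k = len(labels)
    acc = defaultdict(lambda: 0.7)  # initial accuracy guess
    post = {}
    for _ in range(iterations):
        # E-step: posterior over the true label of each item.
        for item, votes in judgements.items():
            logw = {}
            for t in labels:
                logp = 0.0
                for a, l in votes.items():
                    p = acc[a] if l == t else (1 - acc[a]) / (k - 1)
                    logp += math.log(max(p, 1e-9))
                logw[t] = logp
            m = max(logw.values())
            expw = {t: math.exp(w - m) for t, w in logw.items()}
            z = sum(expw.values())
            post[item] = {t: w / z for t, w in expw.items()}
        # M-step: re-estimate each annotator's accuracy as the
        # expected fraction of items they labelled correctly.
        correct = defaultdict(float)
        total = defaultdict(float)
        for item, votes in judgements.items():
            for a, l in votes.items():
                correct[a] += post[item][l]
                total[a] += 1.0
        for a in total:
            acc[a] = min(max(correct[a] / total[a], 0.01), 0.99)
    return {item: max(p, key=p.get) for item, p in post.items()}

j = {
    "i1": {"a1": "yes", "a2": "yes", "a3": "no"},
    "i2": {"a1": "no",  "a2": "no",  "a3": "no"},
    "i3": {"a1": "yes", "a2": "yes", "a3": "yes"},
    "i4": {"a1": "no",  "a2": "no",  "a3": "yes"},
}
print(em_aggregate(j, ["yes", "no"]))
```

On this data, annotators a1 and a2 are mutually consistent while a3 disagrees half the time, so the estimated accuracies down-weight a3's votes.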
Crowdsourced datasets of linguistic judgements
The datasets of linguistic judgements on textual entailment (originally collected by Snow et al., 2008), preposition sense disambiguation, and question dialogue act classification, gathered via Amazon Mechanical Turk (AMT) and described and analysed in the COLING-2014 paper cited above:

Implementation of aggregation rules
An implementation (in R, by Ciyang Qing) of the aggregation rules discussed in the same COLING-2014 paper:

Ulle Endriss


Raquel Fernández