Inferring the location of authors from words in their texts

Max Berggren, Jussi Karlgren, Robert Östling, and Mikael Parkvall

This paper describes a series of experiments to determine how positionally annotated Twitter texts can be used to learn words that indicate the location of other texts and their authors. Many texts are locatable, but most have no explicit indication of place. Many applications, both commercial and academic, have an interest in knowing where a text or its author is from.

The notion of the placeness of a word is introduced as a measure of how locational the word is. We find that modelling word distributions to account for several locations, using local distributional context, and aggregating locational information in a centroid for each text together give the most useful results. The results are applied to data in the Swedish language.

Presented at the 20th Nordic Conference on Computational Linguistics (NoDaLiDa), May 11-13, 2015, Vilnius. This work was done in cooperation with Stockholm University and was partially funded by Vetenskapsrådet, the Swedish Research Council, under its grant SINUS (Spridning av innovationer i nutida svenska).