Speaker: Paul Cook, University of Melbourne
Title: User-level geolocation prediction in social media
Geolocation prediction is vital to geospatial applications such as localised search and local event detection. Text-based social media geolocation models are often based on full text data, including common words with little geospatial dimension (e.g., "today"), which can hamper prediction and lead to slower, more memory-intensive models. In this talk, we first present methods for finding location indicative words (LIWs) via feature selection. Our results show that an information gain ratio-based approach surpasses other methods at LIW selection, and outperforms state-of-the-art geolocation prediction methods. The identified LIWs also reveal regional language differences, which could be useful for lexicographers. We further formulate notions of prediction confidence and demonstrate that performance is even higher in cases where our model is more confident, allowing a trade-off between accuracy and coverage. We then consider the incorporation of other sources of information, including user-declared metadata, into our model using a stacking approach. We demonstrate that the stacking method substantially improves performance, achieving 49% accuracy on a benchmark dataset. We further evaluate our method on a recent crawl of Twitter data to investigate the impact of temporal factors on model generalisation. Our results suggest that user-declared location metadata is more sensitive to temporal change than the text of Twitter messages. Finally, we present a web-based demo of our geolocation system.
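
To give a rough sense of how an information gain ratio can separate location indicative words from common ones, here is a minimal sketch. It is not the speaker's implementation: the function names, the one-word-set-per-user representation, and the four toy users are all assumptions made for illustration, using the textbook definition (information gain about the city label from a word's presence or absence, divided by the entropy of that presence/absence split).

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain_ratio(word, docs):
    """Score a word by how much its presence tells us about the city label.

    docs: list of (set_of_words, city_label) pairs, one per user
    (a hypothetical representation chosen for this sketch).
    """
    all_labels = Counter(label for _, label in docs)
    with_word = Counter(label for words, label in docs if word in words)
    without_word = all_labels - with_word

    n = len(docs)
    n_with = sum(with_word.values())
    n_without = n - n_with
    if n_with == 0 or n_without == 0:
        return 0.0  # the word never splits the data, so it carries no signal

    # Information gain: reduction in label entropy after splitting on the word.
    gain = (entropy(all_labels.values())
            - (n_with / n) * entropy(with_word.values())
            - (n_without / n) * entropy(without_word.values()))
    # Intrinsic value: entropy of the presence/absence split itself.
    intrinsic = entropy([n_with, n_without])
    return gain / intrinsic

# Toy data, invented for illustration only.
docs = [
    ({"tram", "today"}, "melbourne"),
    ({"tram", "footy"}, "melbourne"),
    ({"subway", "today"}, "new york"),
    ({"subway", "bagel"}, "new york"),
]

for w in ("tram", "today"):
    print(w, information_gain_ratio(w, docs))
```

On this toy data "tram" appears only for Melbourne users and scores 1.0, while "today" is spread evenly across both cities and scores 0.0, so a feature selector keeping the top-ranked words would retain the former and drop the latter.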