Geoff Ford Political Science x Digital Methods

Corpus construction: The New Zealand Parliamentary Language Corpus

Much of my research involves building data-sets from web texts. For my PhD research I built a 57-million-word corpus from NZ’s parliamentary debates. The NZ Parliamentary Language Corpus (Version 2; 2016) is annotated to allow comparisons between speakers, between political parties and over time.

I’ve also built corpora for a number of other research projects. For example, for the Mapping LAWS project I built multiple corpora of military, political, activist, academic, media and other discourse about autonomous weapons.

I also teach web scraping and corpus building, and I have built corpora to give students timely and relevant data-sets to analyse in computer labs.

Skills & Tools: Web scraping; Data wrangling; Corpus construction; Python; MySQL.