Introduction

Most of my research applies digital methods in some way. This includes my PhD research, projects during my Postdoctoral Fellowship with the Arts Digital Lab, my teaching in Digital Humanities, and my recent work as a Postdoctoral Fellow on the Mapping LAWS project. I've also helped and advised colleagues and students at UC on their projects applying digital methods.

This page is a showcase of some of my work, experiments and works in progress. Scroll down to begin.

Browser-based concordancing and text analysis tool

During the Mapping LAWS project I've been developing 'ConText', a browser-based concordancing and corpus analysis tool. The prototype has been developed using Plotly's Dash, with language data processed using Spacy. Many corpus analysis tools are feature-heavy, which can be confusing for non-specialists. ConText prioritises an intuitive interface focused on accessing and analysing texts and contexts rather than on settings. It allows users to efficiently search, compare and analyse multiple large corpora. An accompanying Jupyter notebook allows reproducible, documented, shareable text analysis, and can be extended to incorporate novel visualisations and methods beyond the scope of the web application.
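
At its core, a concordancer retrieves keywords in context (KWIC). Here is a minimal sketch of that step using Spacy; the sample text, model name and window size are placeholders, and ConText adds indexing, corpus comparison and the Dash interface on top of this:

    import spacy

    nlp = spacy.load("en_core_web_sm")  # placeholder model; any pipeline works

    def kwic(text, keyword, window=5):
        """Yield (left context, keyword, right context) for each hit."""
        doc = nlp(text)
        for i, token in enumerate(doc):
            if token.text.lower() == keyword.lower():
                yield (doc[max(0, i - window):i].text,
                       token.text,
                       doc[i + 1:i + 1 + window].text)

    for left, node, right in kwic("Debate about the economy continued.", "economy"):
        print(f"{left:>30} | {node} | {right}")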

I will release ConText as part of the Mapping LAWS project.

Skills & Tools: Python; Dash; Spacy; Numpy; Interface design; Web application development.

sn0rt

sn0rt is a software tool that automates real-time analysis of topics on Twitter via its API. It tracks user-defined keywords to collect tweeted news articles. I've been developing it across multiple projects since my PhD.

The Mapping LAWS project is using sn0rt to monitor new web content (e.g. news stories, government and NGO reports, academic articles, and events) related to autonomous weapons (or 'killer robots'). Web pages that are frequently shared on Twitter are then sent (sn0rted) to a channel on our project Slack, so project members stay up to date with recent developments.
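
sn0rt itself isn't public yet, so the sketch below only illustrates the general pattern, assuming Tweepy for the Twitter streaming API (as it stood at the time) and slack_sdk for posting; the rule text, channel, threshold and tokens are all placeholders:

    import tweepy
    from collections import Counter
    from slack_sdk import WebClient

    slack = WebClient(token="SLACK_BOT_TOKEN")  # placeholder
    shares = Counter()
    THRESHOLD = 10  # placeholder: shares before a link is posted

    class Sn0rtStream(tweepy.StreamingClient):
        def on_tweet(self, tweet):
            # Count each URL shared in tweets matching our rules
            for url in (tweet.entities or {}).get("urls", []):
                link = url.get("expanded_url")
                if link:
                    shares[link] += 1
                    if shares[link] == THRESHOLD:
                        slack.chat_postMessage(channel="#mapping-laws",
                                               text=f"Trending: {link}")

    stream = Sn0rtStream("TWITTER_BEARER_TOKEN")  # placeholder
    stream.add_rules(tweepy.StreamRule('"autonomous weapons" OR "killer robots"'))
    stream.filter(tweet_fields=["entities"])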

sn0rt will be released as part of the Mapping LAWS project.

Skills & Tools: Python; Twitter API; Web scraping; Slack API.

Robot Dreams

For the Mapping LAWS project I developed an online interface to explore themes related to robots in popular culture. A robotic voice reads 1,800 references to robots extracted from a data-set of over 82,000 Wikipedia movie plots, while the interface visualises associated themes.
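
The theme-detection step can be illustrated with the Empath library; a minimal sketch, with an invented plot sentence:

    from empath import Empath

    lexicon = Empath()
    plot = "The robot dreams of rebellion and wages war on its creators."  # invented
    # Score the text against Empath's built-in lexical categories
    scores = lexicon.analyze(plot, normalize=True)
    themes = {cat: score for cat, score in scores.items() if score > 0}
    print(sorted(themes, key=themes.get, reverse=True)[:5])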

Skills & Tools: Python script to retrieve and process movie plots using Wikipedia's API; Python code to generate a circle packing visualisation using Plotly; Automating generation of audio files using a text-to-speech engine; Detecting themes using the Empath library; HTML/CSS/Javascript to create an interactive web-based interface.

Robot Dreams >

Visualising themes in debates about autonomous weapons

A "scrollytelling" article that introduces an interactive tangled-tree visualisation of debates in autonomous weapons debates. I extended an existing Observable Notebook to add interactivity to allow people to explore our taxonomy of features of debates about autonomous weapons.

Skills & Tools: Javascript; D3.js; Scrollama.js scrollytelling library; Observable Notebook.

Article and visualisation >

Building corpora

Much of my research applies web scraping to build data-sets from web texts. For my PhD research I built a 57-million-word corpus from NZ's parliamentary debates. The resulting NZ Parliamentary Language Corpus is annotated to allow comparisons between speakers and political parties, and over time.
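
The scraping pattern behind corpora like these is conventional; a minimal sketch using requests and BeautifulSoup, with a hypothetical URL and selectors rather than any real site's markup:

    import requests
    from bs4 import BeautifulSoup

    def scrape_debate(url):
        """Fetch one debate page and yield (speaker, speech) pairs."""
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        for speech in soup.select("div.speech"):  # hypothetical selector
            speaker = speech.select_one("span.speaker")
            yield (speaker.get_text(strip=True) if speaker else "",
                   speech.get_text(strip=True))

    for speaker, text in scrape_debate("https://example.org/hansard/debate-1"):
        print(speaker, len(text))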

As a postdoctoral fellow and lecturer I've built corpora for my own research, as well as colleagues' research projects. I also teach web scraping and corpus building and have built corpora to provide timely and relevant data-sets for students to analyse in labs.

For the Mapping LAWS project I'm building multiple corpora of military, political, activist, academic, media and other discourse about autonomous weapons. I'm releasing several data-sets as part of the project.

Skills & Tools: Web scraping; Data wrangling; Corpus construction; Python; MySQL.

WART (Wayback As Research Tool)

WART is a set of notebooks I will be releasing as part of the Mapping LAWS project. We are using these to collect archived versions of websites from the Wayback Machine. In some cases we've been able to gather over 20 years of comparable texts (e.g. speeches). The notebooks include functionality to search and visualise changes in textual features over time.
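
The Wayback Machine's public CDX API lists archived snapshots of a URL, which is the natural starting point for this kind of collection; a minimal sketch, with a placeholder target URL:

    import requests

    CDX = "http://web.archive.org/cdx/search/cdx"

    def snapshots(url, from_year=2000):
        """Yield (timestamp, archived_url) pairs for a page."""
        params = {"url": url, "output": "json", "from": str(from_year),
                  "filter": "statuscode:200", "collapse": "digest"}
        rows = requests.get(CDX, params=params, timeout=30).json()
        for row in rows[1:]:  # first row is the field header
            timestamp, original = row[1], row[2]
            yield timestamp, f"http://web.archive.org/web/{timestamp}/{original}"

    for ts, archived in snapshots("example.org/speeches"):  # placeholder URL
        print(ts, archived)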

Skills & Tools: Python; Data visualisation; Wayback's API.

Text analysis example: Twitter sentiment

As part of the Mapping LAWS project I've worked on a smaller research project with Jeremy Moses on robot quadrupeds. Our article 'See Spot save lives: fear, humanitarianism, and war in the development of robot quadrupeds' is published in Digital War. I collected a large corpus of tweets about Boston Dynamics' robot quadrupeds and applied sentiment analysis, extending the VADER sentiment library to allow visualisation and analysis of the words driving negative and positive sentiment across many tweets. I then applied techniques from corpus linguistics, using concordancing to conduct closer analysis of these sentiment-laden words.
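
VADER's standard scoring interface looks like this; a minimal sketch with an invented tweet (the word-level extension for visualisation is not shown):

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    tweet = "That robot dog is terrifying but kind of amazing"  # invented
    # 'compound' runs from -1 (most negative) to +1 (most positive)
    print(analyzer.polarity_scores(tweet))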

Skills & Tools: Sentiment analysis; Data visualisation; Data wrangling; Python; Text analysis; Corpus Linguistics.

'Spot' article >

Text analysis example: 'Economy'-rhetoric

My PhD research analysed the use of 'economy' in NZ's parliamentary debates. This research quantified a large increase in discussion of the economy after the 2008 financial crisis. It also identified and quantified the shared vocabulary of the major parties, National and Labour, when talking about the economy, and the Greens' divergence from the major parties' prioritisation of economic growth. I developed a technique I called 'key collocates analysis' to identify shifts in collocated words over time, allowing me to analyse common features of rhetoric about the economy and the specific rhetorical strategies used by political parties. I've drawn on this research since my PhD in publications on NZ's political parties and the Greens. Analysis was conducted using a corpus analysis tool built for the project (the Political Language Browser).
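
The full 'key collocates' method is described in the thesis; as a rough illustration of the underlying collocation step, here is a minimal window-based collocate count (a real analysis would add an association measure such as log-likelihood):

    from collections import Counter

    def collocates(tokens, node, window=4):
        """Count words within +/- window tokens of each occurrence of node."""
        counts = Counter()
        for i, tok in enumerate(tokens):
            if tok == node:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        counts[tokens[j]] += 1
        return counts

    tokens = "the economy needs growth and the economy is fragile".split()
    print(collocates(tokens, "economy").most_common(5))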

Skills & Tools: Corpus-assisted discourse analysis; Keyword analysis; Collocation analysis; Web application development; PHP/MySQL/Javascript.

PhD Thesis >

Text analysis example: EQC in The Press

I wrote a submission to the Public Inquiry into the Earthquake Commission (EQC) for the UC Arts Digital Lab in 2019. This involved building and analysing a corpus of 112 million words of articles from The Press (2010-2019). The analysis quantified EQC coverage over time, comparing this to the visibility of other organisations and keywords related to the period after the Canterbury and Christchurch earthquakes of 2010 and 2011. I applied collocation analysis to identify words predictably associated with EQC, including negative terms related to EQC repairs (e.g. "botched", "defective", "poor", "shoddy", "unconsented", "substandard") reflecting public reports of poor-quality repairs that featured in Press reporting as early as January 2012.
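
Quantifying coverage over time reduces to counting keyword mentions per period; a minimal pandas sketch, with invented rows and hypothetical column names:

    import pandas as pd

    # Invented rows; the real corpus table has one row per Press article
    df = pd.DataFrame({
        "date": pd.to_datetime(["2012-01-05", "2012-01-20", "2012-02-03"]),
        "text": ["EQC repairs criticised ...", "Council meets ...", "EQC responds ..."],
    })
    df["mentions"] = df["text"].str.count(r"\bEQC\b")
    monthly = df.set_index("date")["mentions"].resample("M").sum()
    print(monthly)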

Skills & Tools: Corpus construction; Data wrangling; Python; Corpus-assisted discourse analysis; Data visualisation.

Research briefing >

Concordancing audio

For my PhD thesis I developed a workflow to create an audio concordance, allowing 'keywords in context' in audio to be both listened to and read. I applied this to find and analyse mentions of 'economy' in 1,788 hours of talkback radio. Examples of 'economy' could be read and listened to via a web application, which also allowed listening to much longer sections of audio for context. I also applied this data and method to analyse anti-immigrant discourse for the 'Research-based solutions to online hate and offline consequences' workshop in 2019.
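
Once a recogniser such as CMUSphinx emits word-level timestamps, audio 'concordance lines' are simply clips cut around each hit; a minimal sketch using pydub, with invented timestamps and a placeholder file:

    from pydub import AudioSegment

    # Invented recogniser output: (word, start_ms, end_ms)
    timestamps = [("the", 0, 200), ("economy", 200, 850), ("is", 850, 1000)]
    audio = AudioSegment.from_file("talkback.wav")  # placeholder file
    CONTEXT_MS = 5000  # milliseconds of audio either side of the keyword

    for word, start, end in timestamps:
        if word == "economy":
            clip = audio[max(0, start - CONTEXT_MS):end + CONTEXT_MS]
            clip.export(f"kwic_{start}.wav", format="wav")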

Skills & Tools: Web application development; PHP; MySQL; CMUSphinx.

PhD Thesis >

Building corpora from television captions

During my PhD research I developed a workflow to extract captions from New Zealand's digital TV broadcasts to build data-sets of broadcast-related texts. Captions were encoded as images, so part of the process used Optical Character Recognition (OCR) to extract the text. At the 2016 NZEENZ conference I presented on the possibility of using caption-based corpora to study and monitor the content of NZ media and NZ English.
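
The OCR step can be illustrated with pytesseract (one common choice, not necessarily the engine used here; the filename is a placeholder):

    from PIL import Image
    import pytesseract

    # Caption frames are broadcast as images; OCR turns them back into text
    text = pytesseract.image_to_string(Image.open("caption_frame.png"))
    print(text.strip())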

Skills & Tools: Wrangling hardware; Automating capture of specific TV shows; Integrating software to extract and OCR image captions.

Website and web-based application development

I've been building websites and web-based applications for over 20 years (13 years as lead developer in my own consultancy prior to my PhD). I'm a generalist with expertise in web technologies, databases and server-side application development, and I've advised on and led many web projects. Through my work with the Arts Digital Lab I've assisted with a number of website projects at UC, most using WordPress (which I've been working with since the mid-2000s). I've also built websites for research projects and advised colleagues on online survey implementation and website upgrades.

Many of the tools I build are implemented as web-based applications. For example, the Political Language Browser built during my PhD combined frontend (HTML/CSS/Javascript) and backend technologies (PHP/MySQL) to allow searching, browsing and analysing a large corpus of parliamentary debates. This was the inspiration for the ConText browser-based corpus analysis tool I'm developing.

Skills & Tools: Python; PHP; MySQL; HTML; CSS; Javascript; Responsive design; CMS applications/frameworks.

Get Papers Past

You can of course get Papers Past content via the DigitalNZ API; however, not all publications are accessible in this way. I've released a Jupyter Notebook that allows others to collect a specific periodical in a conservative way.
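
The notebook in the repository is the authoritative version; as a flavour of the approach, a minimal Selenium sketch with a deliberately conservative delay (the URL fragment and selector are placeholders):

    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get("https://paperspast.natlib.govt.nz/...")  # placeholder issue URL
    for article in driver.find_elements(By.CSS_SELECTOR, "div.article"):  # hypothetical selector
        print(article.text[:80])
    time.sleep(5)  # conservative delay before requesting the next page
    driver.quit()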

Skills & Tools: Python; Web scraping; Selenium.

Github repository >

Notebooks to run topic models in your browser

For a Digital Humanities course I teach, I released a couple of Jupyter Notebooks that run LDA topic models using Mallet via Binder and Google Colab.
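
A minimal sketch of the Gensim-to-Mallet handoff such notebooks wrap, assuming gensim 3.x (the Mallet wrapper was removed in gensim 4) and a local Mallet install at a placeholder path:

    from gensim.corpora import Dictionary
    from gensim.models.wrappers import LdaMallet  # gensim 3.x only

    docs = [["robot", "weapon", "debate"], ["economy", "growth", "debate"]]  # toy corpus
    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    model = LdaMallet("/path/to/mallet/bin/mallet",  # placeholder path
                      corpus=corpus, num_topics=2, id2word=dictionary)
    for topic_id, words in model.show_topics(num_topics=2, formatted=False):
        print(topic_id, [w for w, _ in words])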

Skills & Tools: Python; Gensim; Mallet; Binder; Colab.

Github repository >