This page provides links to papers. I indicate in each title whether the paper has been published and provide a link to the published version; please contact me if you cannot access the paywalled version.
ACCEPTED OR PUBLISHED
Abstract: The rise of the internet and mobile telecommunications has created the possibility of using large datasets to understand behavior at unprecedented levels of temporal and geographic resolution. Online social networks attract the most users, though users of these new technologies provide their data through multiple sources, e.g. call detail records, blog posts, web forums, and content aggregation sites. These data allow scholars to adjudicate between competing theories as well as develop new ones, much as the microscope facilitated the development of the germ theory of disease. Of those networks, Twitter presents an ideal combination of size, international reach, and data accessibility that makes it the preferred platform in academic studies. Acquiring, cleaning, and analyzing these data, however, require new tools and processes. This Element introduces these methods to social scientists and provides scripts and examples for downloading, processing, and analyzing Twitter data.
Abstract: Political scientists lack a low-cost methodology for analyzing structural properties of large-scale networks. This paper shows how to analyze individuals’ changing structural position at a daily level, using the social network Twitter. To do so, two innovations are introduced. First, one can infer when two individuals connected with each other an arbitrary amount of time after they actually connected, a task made difficult by how Twitter delivers data to researchers. Communities which connect individuals from different countries can also be identified with this first method. Observing daily network change reveals changing communities and individuals’ position therein. Second, a network measure from computer science, neighbor cumulative indegree centrality (NCC), is introduced; it preserves the rank ordering of individuals’ centrality without the complete network data that those measures require. Combining the first method with the second creates daily data on network centrality. Moreover, these methods can be applied to a network after the period under study has passed. Without these methods, daily data on the structural position of individuals would be prohibitively costly to obtain. These methods are demonstrated with 21 Twitter accounts from Bahrain and Egypt during a 3-month period in early 2011. Ground truth data on their number of followers confirm the accuracy of the post hoc inference; the activists’ network centrality changes, both absolutely and relative to each other; and individuals who link activists in each country are identified.
Abstract: Who is responsible for protest mobilization? Models of disease and information diffusion suggest that those central to a social network (the core) should have a greater ability to mobilize others than those who are less well-connected. To the contrary, this paper argues that those not central to a network (the periphery) can generate collective action, especially in the context of large-scale protests in authoritarian regimes. To show that those on the edge of a social network do affect levels of protest, this paper develops a dataset of daily protests across 16 countries in the Middle East and North Africa over 14 months from 2010 through 2011. It combines that dataset with geocoded, individual-level communication from the same period and measures the number of connections of each person. Those on the periphery are shown to be responsible for changing levels of protest, with some evidence suggesting that the core’s mobilization efforts lead to fewer protests. These results have implications for a wide range of social choices that rely on interdependent decision making.
This report, which combines qualitative and quantitative methodology with new sources of data to create a “digital case study”, was created with support from the United States Agency for International Development. It informed subsequent components of the dissertation project, especially theorizing about activism in authoritarian contexts, combining microlevel qualitative and quantitative data, and the development of methodologies for social network analysis using Twitter data.
Abstract: This paper develops a theory of activism and protest in authoritarian regimes. Activists engage in three primary behaviors: common knowledge creation, protest coordination, and negotiation with regime leadership. They are most [...] effect protest turnout. This theory is tested using a novel combination of Twitter data and case studies. Complete Twitter activity is obtained for 19 activists from Bahrain and Egypt from January 11th, 2011 through April 5th of the same year. These histories are used to understand activists’ activity before, during, and after protests. Primary source material corroborates these findings. Protests are one of the key sources of policy change in authoritarian regimes, and activists are prominent actors before, during, and after them. Understanding how activism works and how it affects protest therefore provides a new way of understanding policy change in authoritarian regimes.
Abstract: This paper presents two new findings about protest, using the United States’ Women’s March as a motivating example. First, it shows that two new data sources – the Crowd Counting Consortium and geolocated Twitter accounts – provide better measures of protest than existing datasets on two dimensions. They record more protests with greater geographic precision than other datasets, and they can measure crowd size. Existing datasets provide coverage, sometimes daily, with less precision. Second, protests scale: they appear to follow a power law above small thresholds, and larger cities have more protesters per capita than smaller ones. While absolute measures suggest larger cities have larger protests, residuals show that smaller towns, often with public universities, have the largest protests. While it is epistemologically impossible to know which data source provides the most accurate crowd size, Twitter and the Crowd Counting Consortium record broad agreement on size and city ranking by size. The paper concludes with a discussion of generating automatic events data using Twitter.
Abstract: The distribution of the size of protests follows a fat tail, as does the distribution of connections in social networks. The latter explains the former. The desire to protest accumulates over time, and this desire is stochastically activated. Most of the time, the resulting mobilization is small; sometimes, because of the distribution of network connections and idiosyncratic network structure, the protests are large. Grounding protest mobilization in network structure clarifies some puzzles, such as the apparently random response of protestors to repression, and suggests tests for others, such as the impact of social media on protest. It also generates testable implications for the distribution of protests across regime types and technological eras. We then explain tests to distinguish between two particular fat tails, lognormal and power law distributions, finding the weight of the evidence in favor of power laws as an explanation for protest size. Analyses of the distribution of civil war intensity, terrorist attack outcomes, violent events, and interstate war also find support for fat tail outcomes, especially power laws. The paper concludes with a discussion of whether power laws may explain phenomena of interest to scholars of international relations and American politics.