Publications

This page provides links to papers. I have tried to make free versions available because journal publishing is extortionate; if a link leads to a paywall, contact me and I will provide a copy.

Most of these articles have coauthors; to save space, the entries below omit all author names.

PEER REVIEWED

Collective Identity in Collective Action: Evidence from the 2020 Summer BLM Protests.

Does collective identity drive protest participation? A long line of research argues that collective identity can explain why protesters do not free ride and how specific movement strategies are chosen. Quantitative studies, however, are inconsistent in defining and operationalizing collective identity, making it difficult to understand under what conditions and to what extent collective identity explains participation. In this paper, we clearly differentiate between interest and collective identity to isolate the individual-level signals of collective action. We argue that these quantities have been conflated in previous research, causing overestimation of the role of collective identity in protest behavior. Using a novel dataset of Twitter users who participated in Black Lives Matter protests during the summer of 2020, we find that, conditional on participating in a protest, individuals show higher levels of interest in BLM on the day of the protest and the days following it. This effect diminishes over time. Participation has little observed effect on subsequent collective identity. In addition, higher levels of interest in the protest increase an individual's chance of participating in a protest, while levels of collective identity do not have a significant effect. These findings suggest that collective identity plays a weaker role in driving collective action than previously suggested. We claim that this overestimation is a byproduct of misidentifying interest as identity.

A Causal Approach for Detecting Team-Level Momentum in NBA Games.

This paper provides new evidence that team-level momentum exists in the National Basketball Association (NBA). Whether momentum exists is one of the most prominent and longstanding questions in sports analytics. But for all its importance to announcers, coaches, and players, existing literature has found little evidence of momentum in professional basketball. This paper exploits a natural experiment in the flow of basketball games: television (TV) timeouts. Because TV timeouts occur at points exogenous to momentum, they make it possible to measure the effect of a pause in the game separately from the effect of strategy changes. We find that TV timeouts cause an 11.2% decline in the number of points that the team with momentum subsequently scores. This effect is robust to the size of a run, substitutions, and game context. This result has far-reaching implications for basketball strategy and for the understanding of momentum in sports more broadly.

GatherTweet: A Python Package for Collecting Social Media Data on Online Events.

Social media plays a crucial role in the organization of massive social movements. However, the sheer quantity of data generated by these events, as well as the data collection restrictions that researchers encounter, creates a series of challenges for researchers who want to analyze dynamic public discourse and opinion in response to, and in the creation of, world events. In this paper we present gatherTweet, a Python package that helps researchers efficiently collect social media data for events that are composed of many decentralized actions (across both space and time). The package is useful for studies that require analysis of the organizational or baseline messaging before an action, the action itself, and the effects of the action on subsequent public discourse. By capturing these aspects of world events, gatherTweet enables the study of events and actions such as protests, natural disasters, and elections.

Mask Images on Twitter increase during COVID-19 mandates, especially in Republican counties.

Wearing masks reduces the spread of COVID‐19, but compliance with mask mandates varies across individuals, time, and space. Accurate and continuous measures of mask wearing, as well as other health‐related behaviors, are important for public health policies. This article presents a novel approach to estimate mask wearing using geotagged Twitter image data from March through September 2020 in the United States. We validate our measure using public opinion survey data and extend the analysis to investigate county‐level differences in mask wearing. We find a strong association between mask mandates and mask wearing: an average increase of 20%. Moreover, this association is greatest in Republican‐leaning counties. The findings have important implications for understanding how governmental policies shape and monitor citizen responses to public health crises.

 

Taxing dissent: The impact of a social media tax in Uganda.

We examine the impact of a new tool for digital repression — a daily tax on social media use in Uganda. Using a synthetic control framework and exploiting the exogenous timing of the tax induced by the legislative calendar, we estimate that the tax reduced the number of georeferenced Twitter users by 13 percent. The effects are larger for poorer and less frequent users.  Despite the overall decline in Twitter use, tweets referencing collective action and observed protests both increased around the onset of the tax relative to the synthetic control. The high salience of the tax as digital repression and its impact on the composition of users are two potential mechanisms for this backlash effect.

Introducing MMCHIVED: Multimodal Chile and Venezuela Event Data

This paper introduces the Multimodal Chile & Venezuela Protest Event Dataset (MMCHIVED). MMCHIVED generates city-day event data using a new source of data: text and images shared on social media. These data enable improved measurement of theoretically important variables such as protest size, protester and state violence, protester demographics, and emotions. In Venezuela, MMCHIVED records many more protests than existing datasets. In Chile, it records slightly more events than the Armed Conflict Location and Events Dataset (ACLED). These extra events come from small cities far from Caracas and Santiago, an improvement in coverage over datasets that rely on newspapers, and the paper confirms that they are true positives. MMCHIVED’s methodology can generate protest data in 107 countries that contain 97.14% of global GDP and 82.7% of the world’s population.

Image as Data: Automated Content Analysis for Visual Presentations of Political Actors and Events [CCR version]

Images matter because they help individuals evaluate policies, primarily through emotional resonance, and can help researchers from a variety of fields measure otherwise difficult-to-estimate quantities. The lack of scalable analytic methods, however, has prevented researchers from incorporating large-scale image data in studies. This article offers an in-depth overview of automated methods for image analysis and explains their usage and implementation. It elaborates on how these methods and results can be validated and interpreted and discusses ethical concerns. Two examples then highlight approaches to systematically understanding visual presentations of political actors and events from large-scale image datasets collected from social media. The first study examines gender and party differences in the self-presentation of U.S. politicians through their Facebook photographs, using an off-the-shelf computer vision model, Google’s Label Detection API. The second study develops image classifiers based on convolutional neural networks to detect custom labels from images of protesters shared on Twitter to understand how protests are framed on social media. These analyses demonstrate the advantages of computer vision and deep learning as a novel analytic tool that can expand the scope and size of traditional visual analysis to thousands of features and millions of images. The paper also provides comprehensive technical details and practices to help guide political communication scholars and practitioners.

COVID-19 Increased Censorship Circumvention and Access to Sensitive Topics in China [PNAS version]

Crisis motivates people to track news closely, and this increased engagement can expose individuals to politically sensitive information unrelated to the initial crisis. We use the case of the COVID-19 outbreak in China to examine how crisis affects information seeking in countries that normally exert significant control over access to media. The crisis spurred censorship circumvention and access to international news and political content on websites blocked in China. Once individuals circumvented censorship, they not only received more information about the crisis itself but also accessed unrelated information that the regime has long censored. Comparisons with democratic and other authoritarian countries also affected by early outbreaks suggest that people blocked from accessing information most of the time might disproportionately and collectively access that long-hidden information during a crisis. Evaluations resulting from this access, negative or positive for a government, might draw on both current events and censored history.

How State and Protester Violence Affects Protest Dynamics [JoP version]

How do state and protester violence affect whether protests grow or shrink? Previous research finds conflicting results for how violence affects protest dynamics. This paper argues that expectations and emotions should generate an n-shaped relationship between the severity of state repression and changes in protest size the next day. Protester violence should reduce the appeal of protesting and increase the expected cost of protesting, decreasing subsequent protest size. Since testing this argument requires precise measurements, a pipeline is built that applies convolutional neural networks to images shared in geolocated tweets. Continuously valued estimates of state and protester violence are generated per city-day for 24 cities across five countries, as are estimates of protest size and the age and gender of protesters. The results suggest a solution to the repression-dissent puzzle and join a growing body of research benefiting from the use of social media to understand subnational conflict.

How Social Networks Affect the Repression-Dissent Puzzle [PLoS ONE version]

Scholars have offered multiple theoretical resolutions to explain inconsistent findings about the relationship between state repression and protest, but this repression-dissent puzzle remains unsolved. We simulate the spread of protest on social networks to suggest that the repression-dissent puzzle arises from the nature of statistical sampling. Even though the paper’s simulations construct repression so it can only decrease protest size, the strength of repression sometimes correlates with a decrease, increase, or no change in protest size, regardless of the type of network or sample size chosen. Moreover, the results are most contradictory when the repression rate most closely matches that observed in real-world data. These results offer a new framework for understanding state and protester behavior and suggest the importance of collecting network data when studying protests.

News and Social Media Accurately Measure Protest Size Variation [APSR version]

Larger protests are more likely to lead to policy changes than small ones are, but whether or not attendance estimates provided in news or generated from social media are biased is an open question. This letter closes the question: news and geolocated social media data generate accurate estimates of protest size variation. This claim is substantiated using cellphone location data from more than 10 million individuals during the 2017 United States Women’s March protests. These cellphone estimates correlate strongly with those provided in news media as well as three size estimates generated using geolocated tweets, one text-based and two based on images. Inferences about protest attendance from these estimates match others’ findings about the Women’s March.

Social media and Russian territorial irredentism: some facts and a conjecture [Post-Soviet Affairs version]

After Kremlin policymakers decided to incorporate the territory of Crimea into Russia, updates on public attitudes in Russian-speaking communities elsewhere in Ukraine would have been in high demand. Because social media users produce content in order to communicate ideas to their social networks, online political discourse can provide important clues about the political dispositions of communities. We map the evolution of Russian-speakers’ attitudes, expressed on social media, across the course of the conflict as Russian analysts might have observed them at the time. Results suggest that the Russian-Ukrainian interstate border only moved as far as their military could have advanced while incurring no occupation costs – Crimea, and no further.

Understanding the Political Ideology of Legislators from Social Media Images [ICWSM here]

Abstract: In this paper, we seek to understand how politicians use images to express ideological rhetoric through Facebook images posted by members of the U.S. House and Senate. In the era of social media, politics has become saturated with imagery, a potent and emotionally salient form of political rhetoric which has been used by politicians and political organizations to influence public sentiment and voting behavior for well over a century. To date, however, little is known about how images are used as political rhetoric. Using deep learning techniques to automatically predict Republican or Democratic party affiliation solely from the Facebook photographs of the members of the 114th U.S. Congress, we demonstrate that predicted class probabilities from our model function as an accurate proxy of the political ideology of images along a left-right (liberal-conservative) dimension. After controlling for the gender and race of politicians, our method achieves an accuracy of 59.28% from single photographs and 82.35% when aggregating scores from multiple photographs (up to 150) of the same person. To better understand image content distinguishing liberal from conservative images, we also perform in-depth content analyses of the photographs. Our findings suggest that conservatives tend to use more images supporting status quo political institutions and hierarchy maintenance, featuring individuals from dominant social groups, and displaying greater happiness than liberals.

Twitter as Data [Cambridge Elements here]

Abstract: The rise of the internet and mobile telecommunications has created the possibility of using large datasets to understand behavior at unprecedented levels of temporal and geographic resolution. Online social networks attract the most users, though users of these new technologies provide their data through multiple sources, e.g., call detail records, blog posts, web forums, and content aggregation sites. These data allow scholars to adjudicate between competing theories as well as develop new ones, much as the microscope facilitated the development of the germ theory of disease. Of those networks, Twitter presents an ideal combination of size, international reach, and data accessibility that makes it the preferred platform in academic studies. Acquiring, cleaning, and analyzing these data, however, require new tools and processes. This Element introduces these methods to social scientists and provides scripts and examples for downloading, processing, and analyzing Twitter data.

Protest Activity Detection and Perceived Violence Estimation from Social Media Images [ACM Multimedia here]

We develop a novel visual model which can recognize protesters, describe their activities by visual attributes and estimate the level of perceived violence in an image. Studies of social media and protests use natural language processing to track how individuals use hashtags and links, often with a focus on those items’ diffusion. These approaches, however, may not be effective in fully characterizing actual real-world protests (e.g., violent or peaceful) or estimating the demographics of participants (e.g., age, gender, and race) and their emotions. Our system characterizes protests along these dimensions. We have collected geotagged tweets and their images from 2013 to 2017 and analyzed multiple major protest events in that period. A multi-task convolutional neural network is employed to automatically classify the presence of protesters in an image and predict its visual attributes, perceived violence, and exhibited emotions. We also release the UCLA Protest Image Dataset, our novel dataset of 40,764 images (11,659 protest images and hard negatives) with various annotations of visual attributes and sentiments. Using this dataset, we train our model and demonstrate its effectiveness. We also present experimental results from various analyses of geotagged image data in several prevalent protest events.

Longitudinal Network Centrality Using Incomplete Data [Political Analysis here]

Abstract: Political scientists lack a low-cost methodology for analyzing structural properties of large-scale networks. This paper shows how to analyze individuals’ changing structural position at a daily level, using the social network Twitter. To do so, two innovations are introduced. First, one can infer when two individuals connected with each other an arbitrary amount of time after they actually connected, a task made difficult by how Twitter delivers data to researchers. Communities that connect individuals from different countries can also be identified with this first method. Observing daily network change reveals changing communities and individuals’ position therein. Second, a network measure from computer science, neighbor cumulative indegree centrality (NCC), is introduced; it preserves the rank ordering of individuals’ centrality without requiring the complete network data that other centrality measures need. Combining the first method with the second creates daily data on network centrality. Moreover, these methods can be applied to a network after the period under study has passed. Without these methods, daily data on the structural position of individuals would be prohibitively costly to obtain. These methods are demonstrated with 21 Twitter accounts from Bahrain and Egypt during a three-month period in early 2011. Ground-truth data on their number of followers confirms the accuracy of the post hoc inference; the activists’ network centrality changes, both absolutely and relative to each other; and individuals who link activists in each country are identified.

Spontaneous Collective Action [American Political Science Review here]

Abstract: Who is responsible for protest mobilization? Models of disease and information diffusion suggest that those central to a social network (the core) should have a greater ability to mobilize others than those who are less well-connected. To the contrary, this paper argues that those not central to a network (the periphery) can generate collective action, especially in the context of large-scale protests in authoritarian regimes. To test whether those on the edge of a social network affect levels of protest, this paper develops a dataset of daily protests across 16 countries in the Middle East and North Africa over 14 months from 2010 through 2011. It combines that dataset with geocoded, individual-level communication from the same period and measures the number of connections of each person. Those on the periphery are shown to be responsible for changing levels of protest, with some evidence suggesting that the core’s mobilization efforts lead to fewer protests. These results have implications for a wide range of social choices that rely on interdependent decision making.

SCM: Supplementary Materials

Predictability Versus Flexibility: Secrecy in International Investment Arbitration [World Politics here]

Abstract: There is heated debate over the wisdom and effect of secrecy in international negotiations. This debate has become central to the process of foreign investment arbitration because parties to disputes nearly always can choose to hide arbitral outcomes from public view. Working with a new database of disputes at the world’s largest investor-state arbitral institution, the World Bank’s International Centre for Settlement of Investment Disputes, the authors examine the incentives of firms and governments to keep the details of their disputes secret. The authors argue that secrecy in the context of investment arbitration works like a flexibility-enhancing device, similar to the way escape clauses function in the context of international trade. To attract and preserve investment, governments make contractual and treaty-based promises to submit to binding arbitration in the event of a dispute. They may prefer secrecy in cases when they are under strong political pressure to adopt policies that violate international legal norms designed to protect investor interests. Investors favor secrecy when managing politically sensitive disputes over assets they will continue to own and manage in host countries long after the particular dispute has passed. Although governments prefer secrecy to help facilitate politically difficult bargaining, secrecy diminishes one of the central purposes of arbitration: to allow governments to signal publicly their general commitment to investor-friendly policies. Understanding the incentives for keeping the details of dispute resolution secret may help future scholars explain more accurately the observed patterns of wins and losses from investor-state arbitration as well as patterns of investment.

Online Social Networks and Offline Protest [EPJ here]

Abstract: Large-scale protests occur frequently and sometimes overthrow entire political systems. Meanwhile, online social networks have become an increasingly common component of people’s lives. We present a large-scale longitudinal study that connects online social media behaviors to offline protest. Using almost 14 million geolocated tweets and data on protests from 16 countries during the Arab Spring, we show that increased coordination of messages on Twitter using specific hashtags is associated with increased protests the following day. The results also show that traditional actors like the media and elites are not driving the results. These results indicate social media activity correlates with subsequent large-scale decentralized coordination of protests, with important implications for the future balance of power between citizens and their states.

Online and Offline Activism in Egypt and Bahrain [Published here]

This report, which combines qualitative and quantitative methodology with new sources of data to create a “digital case study”, was created with support from the United States Agency for International Development. It informed subsequent components of the dissertation project, especially theorizing about activism in authoritarian contexts, combining microlevel qualitative and quantitative data, and the development of methodologies for social network analysis using Twitter data.

INVITED CONTRIBUTIONS

Moving Beyond Newspapers and Text to Study Contentious Politics [The Political Economist]

Event datasets should use images, especially ones from social media, to generate their data. 

Changing Sources: Social Media Activity During Civil War [Digital Activism and Authoritarian Adaptation in the Middle East]

In this essay, we investigate how social media usage may be influenced by local conflict dynamics. We study Twitter usage by individuals based in Syria during the conflict, including data from 2014 to 2017. Instead of studying the content posted by individuals using Twitter, we focus on account activity as an indicator of changing offline dynamics.  Narratives on social media may change not only because individuals change the type of content they post, but also because the composition of users posting from a certain location changes.

Plus Ça Change: New Media and Eternal Protest [APSA Comparative Politics newsletter]

Social media have transformed protest in some ways, yet in more ways than not they do not alter the fundamental dynamics of protest. While social media can create political power by opening new spaces for dissent, any pro-protest effect of this opening starts to dissipate by the first half of 2011, the middle stages of the Arab Spring. The reason for this muted effect is, like the solution to homelessness, eponymous. The combination of humans’ instinct to communicate (social) and the ability to disseminate that communication broadly (media) is a new development for humanity, and that merger transforms protest in specific ways. At the same time, since social media is a recombination of universal, preexisting behaviors and technology, it does not transform most of the important parts of protest, including states’ efforts to suppress it.

How to Use Social Media Data for Political Science Research [The Sage Handbook of Research Methods in Political Science and International Relations]

Comment: The Future of Data is Images [Sociological Methodology version]

First two paragraphs: Zhang and Pan (this volume, p. 000) impresses. Showing how the combination of social media, text, and images can generate new data, the authors have greatly pushed forward the state of event data. Before this article, the frontier methodology for creating event data had been to read large volumes of newspapers by hand or to train a computer to extract this information automatically. Combining an image classifier with a text model, both trained on validated Sina Weibo posts, Zhang and Pan point the way to a bright future for the study of collective action. This article is the new frontier.

Despite the impressive accomplishments of this article, it has one overriding fault: it does not argue forcefully enough for the potential of social media image data in the generation of event data for collective action. In this commentary I will therefore make an even stronger argument than the two authors: although there is much exciting, necessary work coming from newspaper-based event data sets, images generated via social media hold so much promise that they should generate their own research agenda.

News and Views: Moralization, Protest, and Violence [Nature Human Behaviour version]

First two paragraphs: Since the rise of the Internet, scholars have asked how digital technologies affect our political lives. One area that has received increasing attention, since at least Iran’s 2009 Green Movement, is how social media — platforms such as Facebook, Twitter, VKontakte and Sina Weibo — affect the course of protest movements. More recently, researchers have started to ask not ‘How do social media affect protests?’, but ‘What can we learn about protests from social media that we otherwise could not?’ A study in Nature Human Behaviour by Marlon Mooijman and co-workers is at the cutting edge of this new wave of work. Combining geocoded Twitter messages (tweets) from Baltimore’s Freddie Gray protests with lab experiments, they find that protests are more likely to become violent when the underlying issue is moralized.

Mooijman et al. argue that a protest is more likely to become violent as the result of two conditions: (1) “the degree to which people see protest as a moral issue” and (2) when protesters believe others see the issue in moral terms. Using Twitter and experiments to study morals is important because it provides a means of testing a class of explanations that is often difficult to test. That is, the protest literature analyses participation as a result of either cost–benefit calculations or ‘emotional’ decisions. Cost–benefit calculations have been substantially tested because their concepts are clearer to operationalize, and a central question of this strand of literature is how to overcome free-riding that naturally follows if individuals are assumed to undertake a cost–benefit analysis before protesting.
