This page provides links to my papers. For each published paper, I note the venue next to the title and link to the published version; please contact me if you cannot access a paywalled version.
ACCEPTED OR PUBLISHED
First two paragraphs: Zhang and Pan (this volume, p. 000) impress. Showing how the combination of social media, text, and images can generate new data, the authors have greatly pushed forward the state of event data. Before this article, the frontier methodology for creating event data had been to read large volumes of newspapers by hand or to train a computer to extract this information automatically. By combining an image classifier with a text model, both trained on validated Sina Weibo posts, Zhang and Pan point the way to a bright future for the study of collective action. This article is the new frontier.
Despite the impressive accomplishments of this article, it has one overriding fault: it does not argue forcefully enough for the potential of social media image data in the generation of event data for collective action. In this commentary I will therefore make an even stronger argument than the authors do: although there is much exciting, necessary work coming from newspaper-based event data sets, images generated via social media hold so much promise that they should generate their own research agenda.
Abstract: In this paper, we seek to understand how politicians use images to express ideological rhetoric through Facebook images posted by members of the U.S. House and Senate. In the era of social media, politics has become saturated with imagery, a potent and emotionally salient form of political rhetoric which has been used by politicians and political organizations to influence public sentiment and voting behavior for well over a century. To date, however, little is known about how images are used as political rhetoric. Using deep learning techniques to automatically predict Republican or Democratic party affiliation solely from the Facebook photographs of the members of the 114th U.S. Congress, we demonstrate that predicted class probabilities from our model function as an accurate proxy of the political ideology of images along a left-right (liberal-conservative) dimension. After controlling for the gender and race of politicians, our method achieves an accuracy of 59.28% from single photographs and 82.35% when aggregating scores from multiple photographs (up to 150) of the same person. To better understand image content distinguishing liberal from conservative images, we also perform in-depth content analyses of the photographs. Our findings suggest that conservatives tend to use more images supporting status quo political institutions and hierarchy maintenance, featuring individuals from dominant social groups, and displaying greater happiness than liberals.
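The jump from 59.28% accuracy on single photographs to 82.35% when aggregating up to 150 photographs per person can be illustrated with a small sketch. It assumes, for illustration only, that aggregation means averaging each photo's predicted probability of Republican affiliation per politician and thresholding the mean at 0.5; the paper's exact aggregation rule may differ, and the function names are hypothetical. Averaging many weakly informative per-photo scores tends to cancel out photo-level noise, which is one plausible reason the aggregated score is more accurate.

```python
from collections import defaultdict


def aggregate_person_scores(photo_scores, max_photos=150):
    """Average per-photo P(Republican) into one score per person.

    photo_scores: iterable of (person_id, p_republican) pairs, where
    p_republican is a classifier's predicted probability for one photo.
    Only the first `max_photos` photos seen for each person are used.
    """
    by_person = defaultdict(list)
    for person, p in photo_scores:
        if len(by_person[person]) < max_photos:
            by_person[person].append(p)
    # The mean probability doubles as a left-right (liberal-conservative) score.
    return {person: sum(ps) / len(ps) for person, ps in by_person.items()}


def predict_party(person_scores, threshold=0.5):
    """Label each person "R" or "D"; the 0.5 cutoff is an assumption."""
    return {
        person: "R" if score >= threshold else "D"
        for person, score in person_scores.items()
    }


if __name__ == "__main__":
    # Toy, made-up probabilities for two hypothetical politicians.
    photos = [("a", 0.7), ("a", 0.4), ("a", 0.8), ("b", 0.3), ("b", 0.55)]
    scores = aggregate_person_scores(photos)
    print(scores, predict_party(scores))
```

Note that politician "a" has one photo below 0.5, which a single-photo classifier would label Democratic, yet the averaged score still places "a" on the Republican side.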
Abstract: The rise of the internet and mobile telecommunications has created the possibility of using large datasets to understand behavior at unprecedented levels of temporal and geographic resolution. Online social networks attract the most users, though users of these new technologies provide their data through multiple sources, e.g. call detail records, blog posts, web forums, and content aggregation sites. These data allow scholars to adjudicate between competing theories as well as develop new ones, much as the microscope facilitated the development of the germ theory of disease. Of those networks, Twitter presents an ideal combination of size, international reach, and data accessibility that makes it the preferred platform in academic studies. Acquiring, cleaning, and analyzing these data, however, require new tools and processes. This Element introduces these methods to social scientists and provides scripts and examples for downloading, processing, and analyzing Twitter data.
Protest Activity Detection and Perceived Violence Estimation from Social Media Images [ACM Multimedia here]
We develop a novel visual model which can recognize protesters, describe their activities by visual attributes and estimate the level of perceived violence in an image. Studies of social media and protests use natural language processing to track how individuals use hashtags and links, often with a focus on those items’ diffusion. These approaches, however, may not be effective in fully characterizing actual real-world protests (e.g., violent or peaceful) or estimating the demographics of participants (e.g., age, gender, and race) and their emotions. Our system characterizes protests along these dimensions. We have collected geotagged tweets and their images from 2013 to 2017 and analyzed multiple major protest events in that period. A multi-task convolutional neural network is employed to automatically classify the presence of protesters in an image and predict its visual attributes, perceived violence and exhibited emotions. We also release the UCLA Protest Image Dataset, our novel dataset of 40,764 images (11,659 protest images and hard negatives) with various annotations of visual attributes and sentiments. Using this dataset, we train our model and demonstrate its effectiveness. We also present experimental results from various analyses of geotagged image data in several prevalent protest events.
Abstract: Political scientists lack a low-cost methodology for analyzing structural properties of large-scale networks. This paper shows how to analyze individuals’ changing structural position at a daily level, using the social network Twitter. To do so, two innovations are introduced. First, one can infer when two individuals connected with each other an arbitrary amount of time after they actually connected, a task made difficult by how Twitter delivers data to researchers. Communities which connect individuals from different countries can also be identified with this first method. Observing daily network change reveals changing communities and individuals’ position therein. Second, a network measure from computer science, neighbor cumulative indegree centrality (NCC), is introduced; it preserves the rank ordering of individuals’ centrality without the complete network data that standard centrality measures require. Combining the first method with the second creates daily data on network centrality. Moreover, these methods can be applied to a network after the period under study has passed. Without these methods, daily data on the structural position of individuals would be prohibitively costly to obtain. These methods are demonstrated with 21 Twitter accounts from Bahrain and Egypt during a three-month period in early 2011. Ground-truth data on their number of followers confirm the accuracy of the post hoc inference. The activists’ network centrality is shown to change, both absolutely and relative to each other, and individuals who link activists in each country are identified.
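The two innovations can be sketched together, under stated assumptions. For the post hoc inference, the sketch uses a well-known rule for follower lists ordered by follow time: an account cannot follow before it was created, so each follow happened no earlier than the latest creation date among that follower and everyone who followed before it. For NCC, it assumes the measure sums the follower counts (indegrees) of an account's followers, which Twitter user objects report without crawling the rest of the network. Whether the paper uses exactly these definitions is an assumption, and all function names are hypothetical.

```python
from datetime import date


def follow_time_lower_bounds(creation_dates_oldest_first):
    """Lower-bound when each follower began following an account.

    Assumes the follower list is ordered by follow time, oldest first,
    and that no account can follow before its own creation date.
    """
    bounds, latest = [], None
    for created in creation_dates_oldest_first:
        # Running maximum of creation dates seen so far.
        if latest is None or created > latest:
            latest = created
        bounds.append(latest)
    return bounds


def neighbor_cumulative_indegree(followers, follower_counts):
    """Assumed NCC: sum of the follower counts of an account's followers."""
    return sum(follower_counts[u] for u in followers)


def ncc_as_of(day, followers_oldest_first, bounds, follower_counts):
    """NCC restricted to followers whose inferred follow date is <= day.

    Because `bounds` are lower bounds, a follow counted here may in fact
    have happened after `day`; this is an approximation.
    """
    active = [u for u, b in zip(followers_oldest_first, bounds) if b <= day]
    return neighbor_cumulative_indegree(active, follower_counts)


if __name__ == "__main__":
    # Toy, made-up follower data for one hypothetical account.
    followers = ["u1", "u2", "u3"]
    created = [date(2010, 1, 1), date(2009, 5, 1), date(2011, 1, 1)]
    counts = {"u1": 10, "u2": 5, "u3": 100}
    bounds = follow_time_lower_bounds(created)
    print(ncc_as_of(date(2010, 6, 1), followers, bounds, counts))
```

Evaluating `ncc_as_of` for each day in the study window yields a daily centrality series without a full network snapshot for every day.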
Abstract: Who is responsible for protest mobilization? Models of disease and information diffusion suggest that those central to a social network (the core) should have a greater ability to mobilize others than those who are less well-connected. To the contrary, this paper argues that those not central to a network (the periphery) can generate collective action, especially in the context of large-scale protests in authoritarian regimes. To show that those on the edge of a social network affect levels of protest, this paper develops a dataset of daily protests across 16 countries in the Middle East and North Africa over 14 months from 2010 through 2011. It combines that dataset with geocoded, individual-level communication from the same period and measures the number of connections of each person. Those on the periphery are shown to be responsible for changing levels of protest, with some evidence suggesting that the core’s mobilization efforts lead to fewer protests. These results have implications for a wide range of social choices that rely on interdependent decision making.
Predictability Versus Flexibility: Secrecy in International Investment Arbitration [World Politics here]
Abstract: There is heated debate over the wisdom and effect of secrecy in international negotiations. This debate has become central to the process of foreign investment arbitration because parties to disputes can nearly always choose to hide arbitral outcomes from public view. Working with a new database of disputes at the world’s largest investor-state arbitral institution, the World Bank’s International Centre for Settlement of Investment Disputes, the authors examine the incentives of firms and governments to keep the details of their disputes secret. The authors argue that secrecy in the context of investment arbitration works like a flexibility-enhancing device, similar to the way escape clauses function in the context of international trade. To attract and preserve investment, governments make contractual and treaty-based promises to submit to binding arbitration in the event of a dispute. They may prefer secrecy in cases when they are under strong political pressure to adopt policies that violate international legal norms designed to protect investor interests. Investors favor secrecy when managing politically sensitive disputes over assets they will continue to own and manage in host countries long after the particular dispute has passed. Although governments prefer secrecy to help facilitate politically difficult bargaining, secrecy diminishes one of the central purposes of arbitration: to allow governments to signal publicly their general commitment to investor-friendly policies. Understanding the incentives for keeping the details of dispute resolution secret may help future scholars explain more accurately the observed patterns of wins and losses from investor-state arbitration as well as patterns of investment.
Abstract: Large-scale protests occur frequently and sometimes overthrow entire political systems. Meanwhile, online social networks have become an increasingly common component of people’s lives. We present a large-scale longitudinal study that connects online social media behaviors to offline protest. Using almost 14 million geolocated tweets and data on protests from 16 countries during the Arab Spring, we show that increased coordination of messages on Twitter using specific hashtags is associated with increased protests the following day. The results also show that traditional actors like the media and elites are not driving the results. These results indicate that social media activity correlates with subsequent large-scale decentralized coordination of protests, with important implications for the future balance of power between citizens and their states.
This report, which combines qualitative and quantitative methodology with new sources of data to create a “digital case study”, was created with support from the United States Agency for International Development. It informed subsequent components of the dissertation project, especially theorizing about activism in authoritarian contexts, combining microlevel qualitative and quantitative data, and the development of methodologies for social network analysis using Twitter data.
Abstract: Larger protests are more likely to lead to policy changes than small ones, but whether or not attendance estimates provided in newspapers or generated from social media are biased is an open question. This research note closes the question: news and geolocated social media data generate accurate estimates of the size of protests. This claim is substantiated using cell phone location data from ten million individuals during the 2017 United States Women’s March protests. These cell phone estimates correlate strongly with those provided in news media as well as three size estimates generated using geolocated tweets, one text-based and two based on images. In testing these estimates, we also show that wealthier, more Democratic, and more urbanized areas generated larger protests.
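Checking whether two sources "correlate strongly" amounts to comparing their size estimates location by location. The sketch below computes a Pearson correlation on log-transformed estimates for locations that appear in both sources. The log transform is an assumption, chosen because crowd sizes span several orders of magnitude; the research note's exact procedure may differ, and the function name is hypothetical.

```python
import math


def log_size_correlation(estimates_a, estimates_b):
    """Pearson correlation between log crowd sizes from two sources.

    estimates_a, estimates_b: dicts mapping a location id to a positive
    size estimate (e.g. cell phone counts vs. news reports). Only
    locations present in both dicts are compared.
    """
    shared = sorted(set(estimates_a) & set(estimates_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared locations")
    xs = [math.log(estimates_a[k]) for k in shared]
    ys = [math.log(estimates_b[k]) for k in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant estimates")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy, made-up estimates: source B is always twice source A.
    a = {"x": 100, "y": 1000, "z": 10000}
    b = {"x": 200, "y": 2000, "z": 20000, "w": 5}
    print(log_size_correlation(a, b))
```

On the log scale, a source that consistently over- or under-counts by a fixed factor still correlates perfectly, so a high correlation speaks to relative accuracy across locations rather than to the absolute level of each estimate.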