Author Archives: zacharysteinertthrelkeld

In R, use openxlsx instead of xlsx

I recently had to read an Excel spreadsheet into R.  Why Excel?  The original data were in a Google Sheet, and it appears that Google downloads everything to a .xlsx.  (There HAS to be a way to download to .csv, but I did not feel like searching.)  Opening the file – it was only 12 […]

Parallelize a Multifunction Argument in Python

How do you parallelize a function with multiple arguments in Python? It turns out that it is not much different from parallelizing a function with one argument, but I could not find any documentation of that online. An “embarrassingly parallel” computing task is one in which each calculation is independent of the ones that came […]
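
As a rough illustration of the idea in this excerpt, here is a minimal sketch using the standard library's multiprocessing module. It is one common way to do this and not necessarily the approach the full post takes; power and run_parallel are hypothetical names invented for the example.

```python
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def power(base, exponent):
    """Toy two-argument function; each call is independent of the others."""
    return base ** exponent


def run_parallel(func, arg_tuples, workers=2, pool_cls=Pool):
    """Apply func to every argument tuple in parallel, keeping input order.

    starmap unpacks each tuple into positional arguments, so the tuple
    (2, 3) becomes the call func(2, 3).
    pool_cls defaults to a process pool; ThreadPool exposes the same
    interface and can be passed instead for I/O-bound work.
    """
    with pool_cls(workers) as pool:
        return pool.starmap(func, arg_tuples)
```

Calling run_parallel(power, [(2, 3), (3, 2), (10, 0)]) returns the results in input order. With a process pool, the function has to be defined at module level so it can be pickled, and on platforms that spawn worker processes the call should sit under an if __name__ == "__main__": guard.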

Assign Country Code to Tweets Based on GPS Coordinates

When looking at tweets, it is often important to know where the tweet was created.  For tweets with GPS coordinates, Twitter is nice enough to provide metadata about those coordinates; one piece of metadata is the ISO 3166-1 alpha-2 country code, making it very easy to find tweets from any country.  Unfortunately, Twitter appears to not […]
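
To make the metadata described above concrete, here is a small sketch that pulls the country code out of a tweet. It assumes the tweet has already been parsed into a dictionary whose place object carries a country_code field, as in Twitter's classic tweet JSON; the function name is hypothetical.

```python
def country_code_from_tweet(tweet):
    """Return the two-letter country code attached to a tweet, or None.

    Assumes `tweet` is a dict parsed from tweet JSON in which the optional
    "place" object holds a "country_code" string.
    """
    place = tweet.get("place")
    if not place:
        # Tweets without place metadata have no country code to report.
        return None
    code = place.get("country_code")
    return code.upper() if code else None
```

Tweets for which this returns None are the ones that would need some other lookup based on their coordinates.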

A Simple Function for Forest Plots

A great way of conveying regression results is through a forest plot.  Forest plots are widely used in meta-analyses to compare results across models, and they are also a convenient way to visualize regression results.  Wanting to make one for a presentation, I naturally turned to R and its seemingly infinite packages. The package the internet recommends is forestplot. […]

Twitter Descriptive Statistics, Part 2, Or: Let’s Use Twitter to Study Antarctica

The chart below replicates the data presented in my earlier post about Twitter but ranked by accounts per million inhabitants. Antarctica, of course, does not really have the greatest number of accounts per capita.  In this sample, however, Twitter identified 4 tweets from there, and Antarctica has a population of 0, as do the 2nd […]
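
The Antarctica result follows directly from the arithmetic of a per-capita rate. Here is a minimal sketch, with hypothetical function names and made-up figures, of why a zero population pushes a place to the top of the ranking:

```python
import math


def per_million(count, population):
    """Rate per million inhabitants; a zero population gives an infinite rate."""
    if population == 0:
        # Any positive count divided by zero inhabitants is unbounded.
        return math.inf if count > 0 else 0.0
    return count * 1_000_000 / population


def rank_by_rate(rows):
    """Sort (name, count, population) rows from highest to lowest rate."""
    return sorted(rows, key=lambda row: per_million(row[1], row[2]), reverse=True)
```

With invented inputs such as [("A", 10, 2_000_000), ("Antarctica", 4, 0)], the second row sorts first because its rate is infinite.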

Twitter Descriptive Statistics, Part 1

How many followers does the average Twitter user have?  How many accounts does the average Twitter account follow?  How many times has the average account tweeted?  What about the median?  These questions seem simple, but it is not easy to find answers to them.  Twitter only discloses how many monthly active users exist, and other […]
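
The gap between the mean and the median matters for counts like these, where a few very large accounts can dominate. Here is a short sketch with made-up follower counts (not data from the post) and a hypothetical summarize helper:

```python
from statistics import mean, median


def summarize(counts):
    """Return the mean and median of a list of per-account counts."""
    return {"mean": mean(counts), "median": median(counts)}


# Hypothetical follower counts: five small accounts and one very large one.
followers = [0, 3, 10, 25, 40, 1_000_000]
summary = summarize(followers)
```

In this invented sample the single large account pulls the mean far above the median, which is why the excerpt asks about both.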

Formatting CAMEO Event Codes in ICEWS

UPDATE: Thanks to @icews for helping me figure this out.  It turns out that the CAMEO Code field is saved as a string, but Pandas interprets that column as integers and drops the leading zero.  To read that column correctly, use the following line: data = pd.read_csv('/Data/ICEWS/', sep='\t', dtype={'CAMEO Code': object})

Wanting […]
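
The effect of the dtype argument can be seen on a tiny in-memory example. The two rows and the Event ID column below are invented for illustration; only the CAMEO Code column name and the read_csv arguments come from the update above.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; the codes are illustrative values
# that start with a zero.
sample = "Event ID\tCAMEO Code\n1\t010\n2\t0211\n"

# Default parsing infers integers, so the leading zeros disappear.
as_int = pd.read_csv(io.StringIO(sample), sep="\t")

# Forcing the column to object keeps each code as the original string.
as_str = pd.read_csv(io.StringIO(sample), sep="\t", dtype={"CAMEO Code": object})

print(as_int["CAMEO Code"].tolist())
print(as_str["CAMEO Code"].tolist())
```

Because the codes vary in length, the dropped zero cannot be reliably restored by padding afterwards, so it is safer to read the column as text from the start.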

What sources are in ICEWS?

ICEWS was released to the public on April 1st, and the event studies community has had a field day getting to know this early (or late?) holiday present.  The dataset, which was created by Lockheed-Martin on behalf of the Department of Defense, appears to represent the new frontier for historic events data.  I use the […]

Advances in Using Social Media to Understand Protests

The study of collective action can benefit greatly from big data. Research on collective action examines how large numbers of individuals engage each other to accomplish a common task; big data illuminate how large numbers of individuals engage each other over time. These data, however, have yet to show how they can improve our […]

So, you want historic events data

As far as I am aware, there are no contemporary machine-coded events data if you do not want to use GDELT.*  Phil Schrodt and his colleagues are working on a GDELT replacement that promises to reduce event duplication and provide better geospatial resolution.  Once that project, Phoenix, goes live, it will create real-time data based on 542 […]