Author Archives: zacharysteinertthrelkeld

In LA, Rush Hour Commuters Should Use Mass Transit

I am grateful for many things; one of those things is living in LA.  Another thing is that I am reasonably close to my job, by LA standards: 9 miles as the crow flies.  I learned quickly what any Angeleno will tell you: wow, traffic.  While I have been fortunate not to have a Michael […]

Header for GDELT 2.0 and Phoenix

Working with machine-coded events data is cool.  What’s not cool is that the raw data from two of the main datasets, GDELT 2.0 and Phoenix, do not include headers in their files.  It is simple to create a list with the column names, but the closest I could find that already existed for GDELT 2.0 […]

Assigning the Correct Time to a Tweet

When Twitter provides a tweet, the ‘created_at’ field provides a timestamp for when the tweet was authored.  This timestamp is useful, but it cannot be used right away because it is in Greenwich Mean Time.  Unless the tweet happens to have come from that timezone, its time needs to be adjusted to account for this discrepancy. […]
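A minimal sketch of that adjustment in Python, assuming the v1.1-style `created_at` string format and assuming the author's offset from UTC is already known in seconds. The function name `to_local_time` and the `utc_offset_seconds` parameter are illustrative and not part of Twitter's API.

```python
from datetime import datetime, timedelta, timezone

# Assumed format of Twitter's v1.1 'created_at' field,
# e.g. "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def to_local_time(created_at, utc_offset_seconds):
    """Shift a tweet timestamp from UTC to the author's local time.

    utc_offset_seconds is a hypothetical input: the number of seconds
    the author's timezone is ahead of (positive) or behind (negative) UTC.
    """
    # Parse into a timezone-aware datetime (the +0000 marks it as UTC).
    utc_time = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
    # Build a fixed-offset timezone and convert to it.
    local_zone = timezone(timedelta(seconds=utc_offset_seconds))
    return utc_time.astimezone(local_zone)


# A tweet authored 7 hours behind UTC.
local = to_local_time("Wed Oct 10 20:19:24 +0000 2018", -7 * 3600)
print(local.isoformat())
```

A fixed offset ignores daylight saving changes, so this only holds when the offset supplied is the one in effect at the moment the tweet was written.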

Zelig for Clustered Standard Errors

In regression modeling, it is common to correct standard errors for natural groupings (clusters) in the data.  There are various ways to calculate these values using R, from doing it manually to using one of many packages. Theoretically, Zelig is an R package that will cluster standard errors automatically.  In my experience, however, it does […]

Copy of Twitter REST API v1.1 Rate Limits

I’ve been writing some scripts to work with Twitter’s REST API.  Naturally, I went to their developer documentation to refresh myself on their rate limits.  As of today, the link they provide to their rate limit chart is broken. Fortunately, I clipped this page to Evernote a long time ago.  I was therefore able to […]

In R, use openxlsx instead of xlsx

I recently had to read an Excel spreadsheet into R.  Why Excel?  The original data were in a Google Sheet, and it appears that Google downloads everything to a .xlsx.  (There HAS to be a way to download to .csv, but I did not feel like searching.)  Opening the file – it was only 12 […]

Parallelize a Multifunction Argument in Python

How do you parallelize a function with multiple arguments in Python? It turns out that it is not much different from doing so for a function with one argument, but I could not find any documentation of that online. An “embarrassingly parallel” computing task is one in which each calculation is independent of the ones that came […]
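One way to do this, sketched here under assumptions and not necessarily the approach the full post takes, is `starmap`, which unpacks each tuple of arguments into one function call. The function `scaled_power` and the wrapper `run_in_parallel` are made up for illustration. The sketch uses the thread-backed `ThreadPool` so it runs in any context; `multiprocessing.Pool` exposes the same `starmap` method for process-based parallelism.

```python
from multiprocessing.pool import ThreadPool


def scaled_power(base, exponent, scale):
    """Toy multi-argument function; each call is independent of the others."""
    return scale * base ** exponent


def run_in_parallel(argument_tuples, workers=4):
    """Apply scaled_power to every tuple of arguments using a worker pool.

    starmap unpacks each tuple into positional arguments, so
    (2, 3, 1) becomes scaled_power(2, 3, 1). Results keep input order.
    """
    with ThreadPool(processes=workers) as pool:
        return pool.starmap(scaled_power, argument_tuples)


results = run_in_parallel([(2, 3, 1), (3, 2, 10), (5, 1, 2)])
print(results)  # [8, 90, 10]
```

With a process pool, the function has to be defined at module level and the pool should be created under an `if __name__ == "__main__":` guard so that worker processes can import it cleanly.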

Assign Country Code to Tweets Based on GPS Coordinates

When looking at tweets, it is often important to know where the tweet was created.  For tweets with GPS coordinates, Twitter is nice enough to provide metadata about those coordinates; one piece of metadata is the ISO 3166-1-alpha-2 country code, making it very easy to find tweets from any country.  Unfortunately, Twitter appears to not […]

A Simple Function for Forest Plots

A great way of conveying regression results is through a forest plot.  Widely used in meta-analyses to compare results across models, forest plots are also a convenient way to visualize regression results.  Wanting to make one for a presentation, I naturally turned to R and its seemingly infinite packages. The package the internet recommends is forestplot. […]

Twitter Descriptive Statistics, Part 2, Or: Let’s Use Twitter to Study Antarctica

The chart below replicates the data presented in my earlier post about Twitter but ranked by accounts per million inhabitants. Antarctica, of course, does not really have the greatest number of accounts per capita.  In this sample, however, Twitter identified 4 tweets from there, and Antarctica has a population of 0, as do the 2nd […]