With some free time on my hands, I sat down to update my code that extracts tweets from my tweet collection based on user-supplied keywords or locations.  In doing that, however, I ended up making a major improvement, one that should have existed from day one. You see, simply trying to read a file of […]
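The keyword-extraction step described above can be sketched in a few lines. This is a minimal illustration, not the actual script: the field name `text` follows Twitter's JSON, `matches` is a hypothetical helper, and the real code also filters on location.

```python
def matches(tweet, keywords):
    """Return True if the tweet's text mentions any user-supplied keyword.

    A minimal sketch of the keyword filter described above; the real
    script also matches on location.
    """
    text = tweet.get("text", "").lower()
    return any(kw.lower() in text for kw in keywords)

hit = matches({"text": "Traffic in Yaounde today"}, ["yaounde", "douala"])
miss = matches({"text": "Hello world"}, ["yaounde"])
```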

I thought I was going to spend some time on Friday analyzing tweets from Cameroon.  Instead, starting that process led me down a rabbit hole that has, I hope, culminated in my realizing I should have used Python’s simplejson library this whole time. A script of mine used a try-except sequence to enclose the section […]
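The try-except pattern in question looks roughly like this. A sketch only: simplejson is a drop-in replacement for the standard library's json module, so the same pattern works with either, and `parse_tweet` is an illustrative name, not the script's actual function.

```python
import json  # simplejson is a drop-in replacement: `import simplejson as json`

def parse_tweet(line):
    """Try to parse one line of a tweet file as JSON; return None on failure.

    A minimal sketch of the try-except pattern described above.
    """
    try:
        return json.loads(line)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None

good = parse_tweet('{"text": "hello"}')
bad = parse_tweet('not json at all')
```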

I recently wrote a script that reads thousands of files  of tweets, transforms them, and spits out “only” hundreds of files.  Having tested the script on my computer on a few files, I was surprised to find the execution taking much longer than anticipated on my server, especially since the server’s CPUs are more powerful […]

[Most recent update: 09.09.2017.] The purpose of this post is to catalogue advice from the internet about how to achieve tenure at a research university. When I was a PhD student, one method of calming my anxiety was to read advice from professors to PhD students; The Professor is In, Fabio Rojas, and Chris Blattman are particularly helpful.  Now […]

I am grateful for many things; one of those things is living in LA.  Another thing is that I am reasonably close to my job, by LA standards: 9 miles as the crow flies.  I learned quickly what any Angeleno will tell you: wow, traffic.  While I have been fortunate not to have a Michael […]

Working with machine-coded events data is cool.  What’s not cool is that the raw data from two of the main datasets, GDELT 2.0 and Phoenix, do not include headers in their files.  It is simple to create a list with the column names, but the closest I could find that already existed for GDELT 2.0 […]
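Attaching a header list to the raw files can be sketched like so. The column names below are an illustrative subset only — the real GDELT 2.0 layout has dozens of columns, and the exact names should come from the GDELT codebook, not from this example.

```python
import csv
import io

# Illustrative subset of GDELT-style column names; the real header list
# is much longer (the raw files ship without headers).
GDELT_COLUMNS = ["GLOBALEVENTID", "SQLDATE", "Actor1Code", "Actor2Code", "EventCode"]

def read_gdelt(fileobj, columns=GDELT_COLUMNS):
    """Read a headerless tab-delimited GDELT export, attaching column names."""
    reader = csv.reader(fileobj, delimiter="\t")
    return [dict(zip(columns, row)) for row in reader]

sample = io.StringIO("1\t20150101\tUSA\tRUS\t043\n")
rows = read_gdelt(sample)
```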

When Twitter provides a tweet, the ‘created_at’ field provides a timestamp for when the tweet was authored.  This timestamp is useful, but it cannot be used right away because it is in Greenwich Mean Time.  Unless the tweet happens to have come from that timezone, its time needs to be adjusted to account for this discrepancy. […]
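The adjustment can be sketched with the standard library alone. The timestamp format string matches Twitter's `created_at` field; the offset argument is an assumption — you must supply the target timezone's offset yourself (e.g. from the tweet's metadata or the user's profile).

```python
from datetime import datetime, timedelta

def localize_created_at(created_at, utc_offset_seconds):
    """Parse Twitter's 'created_at' timestamp (given in GMT/UTC) and shift
    it by a caller-supplied UTC offset in seconds.

    A minimal sketch; the offset is an assumption the caller provides.
    """
    # Twitter's format looks like "Wed Aug 27 13:08:45 +0000 2014"
    gmt = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    return gmt + timedelta(seconds=utc_offset_seconds)

# Shift a GMT timestamp back seven hours (e.g. US Pacific Daylight Time)
local = localize_created_at("Wed Aug 27 13:08:45 +0000 2014", -7 * 3600)
```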

In regression modeling, it is common to correct standard errors for natural groupings (clusters) in the data.  There are various ways to calculate these values using R, from doing it manually to using one of many packages. Theoretically, Zelig is an R package that will cluster standard errors automatically.  In my experience, however, it does […]
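For readers outside R, the calculation those packages perform can be illustrated in plain NumPy. This is the basic cluster-robust "sandwich" estimator (CR0, without small-sample corrections), sketched for intuition — it is not the post's R code and not a replacement for a tested package.

```python
import numpy as np

def cluster_robust_se(X, y, groups):
    """OLS coefficients with cluster-robust (CR0 sandwich) standard errors.

    An illustrative sketch of what clustering packages compute; no
    small-sample corrections are applied.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    # Sum the score outer products within each cluster ("the meat")
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        Xg = X[groups == g]
        ug = resid[groups == g]
        score = Xg.T @ ug
        meat += np.outer(score, score)
    vcov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(vcov))

X = np.array([[1.0, 0], [1, 1], [1, 2], [1, 3]])  # intercept + one regressor
y = np.array([0.1, 0.9, 2.2, 2.8])
groups = np.array([0, 0, 1, 1])  # two clusters
beta, se = cluster_robust_se(X, y, groups)
```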

I’ve been writing some scripts to work with Twitter’s REST API.  Naturally, I went to their developer documentation to refresh myself on their rate limits.  As of today, the link they provide to their rate limit chart is broken. Fortunately, I clipped this page to Evernote a long time ago.  I was therefore able to […]

I recently had to read an Excel spreadsheet into R.  Why Excel?  The original data were in a Google Sheet, and it appears that Google downloads everything to a .xlsx.  (There HAS to be a way to download to .csv, but I did not feel like searching.)  Opening the file – it was only 12 […]
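For comparison, the Python analogue of that task (the post itself uses R) is a one-liner in pandas, assuming the optional `openpyxl` engine is installed. The round-trip below is purely illustrative; the column names are made up.

```python
import os
import tempfile

import pandas as pd

# Illustrative data; the real spreadsheet came from a Google Sheet export.
df = pd.DataFrame({"country": ["Cameroon", "Nigeria"], "tweets": [12, 34]})

path = os.path.join(tempfile.mkdtemp(), "sheet.xlsx")
df.to_excel(path, index=False)   # write a .xlsx (requires openpyxl) ...
loaded = pd.read_excel(path)     # ... then read it straight back in
```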