Written by ZacharyST, May 4, 2023 (updated March 5, 2025)

Working Through “Killed: 9” Message When Launching Python

I have just returned from paternity leave, and it has been at least 8 months since I last used Python on my macOS Monterey M1 laptop. To my surprise, I could not use it. Any time I typed python or ipython, or tried to launch an environment using conda, the command line returned a “Killed: 9” message. Everything […]
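“Killed: 9” is how the shell reports that a process received signal 9, SIGKILL. The excerpt stops before the diagnosis, so the sketch below says nothing about the cause. Assuming a POSIX system such as macOS or Linux, it only shows how Python's `subprocess` surfaces the same condition: a negative `returncode` of -N means the child was terminated by signal N. The child deliberately kills itself to simulate the crash, and `run_and_report` is a hypothetical helper.

```python
import signal
import subprocess
import sys

# Child program that sends SIGKILL to itself, simulating a "Killed: 9" crash.
CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"


def run_and_report(args):
    """Run a command and describe how it exited (POSIX only)."""
    proc = subprocess.run(args)
    if proc.returncode < 0:
        # A negative return code means the child was terminated by that signal.
        sig = signal.Signals(-proc.returncode)
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with status {proc.returncode}"


if __name__ == "__main__":
    print(run_and_report([sys.executable, "-c", CHILD]))
    # prints: killed by SIGKILL (signal 9)
```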

Written by ZacharyST, May 8, 2020 (updated March 5, 2025)

Understanding Subnational Variation in Tweets

My primary source of data is tweets I get from Twitter’s POST statuses/filter endpoint, what I believe was called the “Streaming Endpoint” when I started working with Twitter data eons ago. While it has always been straightforward to use a bounding box to get tweets with geographic information, exactly what Twitter reports and how it […]
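A rough sketch of the distinction that matters here, assuming tweets in Twitter's v1.1 JSON format: `coordinates` holds an exact GeoJSON point as `[longitude, latitude]` when the user shared one, while `place` carries only a bounding-box polygon for a tagged place. The helper names, the `(west, south, east, north)` box order, and using a place's box center as a stand-in location are illustrative assumptions, not Twitter's own matching logic.

```python
from typing import Optional, Tuple

# Assumed box order: (west, south, east, north); boxes crossing the antimeridian are not handled.
Box = Tuple[float, float, float, float]


def geo_source(tweet: dict) -> str:
    """Say where a v1.1-style tweet's location comes from: 'point', 'place', or 'none'."""
    if tweet.get("coordinates"):
        return "point"
    if tweet.get("place"):
        return "place"
    return "none"


def point_in_box(lon: float, lat: float, box: Box) -> bool:
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def place_center(place: dict) -> Tuple[float, float]:
    """Average the corners of a place's bounding-box polygon (a crude proxy)."""
    ring = place["bounding_box"]["coordinates"][0]
    lons = [corner[0] for corner in ring]
    lats = [corner[1] for corner in ring]
    return sum(lons) / len(lons), sum(lats) / len(lats)


def inside_box(tweet: dict, box: Box) -> Optional[bool]:
    """True/False for geolocated tweets, None when the tweet has no location at all."""
    source = geo_source(tweet)
    if source == "point":
        lon, lat = tweet["coordinates"]["coordinates"]  # GeoJSON order: [lon, lat]
        return point_in_box(lon, lat, box)
    if source == "place":
        lon, lat = place_center(tweet["place"])
        return point_in_box(lon, lat, box)
    return None
```

A large place (a whole country, say) can have a center far from where the tweet was actually sent, which is one reason point-level and place-level tweets are worth keeping apart.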

Written by ZacharyST, April 29, 2020 (updated March 5, 2025)

Crawling Followers with Intelligent Stopping

Like almost every other academic, I have started a Covid-19 project. I think my team has a unique angle because of the kind of data I collect. One dynamic we are interested in is patterns of following, and being able to analyze that across enough accounts required me to work with Twitter endpoints I have […]
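One possible stopping rule, sketched under assumptions: if an endpoint pages through followers newest-first (v1.1 `followers/ids` documented that ordering, while warning it could change), a crawler that already holds an account's older followers can stop at the first ID it has seen before. `fetch_page` is a hypothetical stand-in for the API call, the page cap is arbitrary, and the cursor conventions (start at -1, `next_cursor` of 0 on the last page) follow Twitter's v1.1 cursoring. This is not necessarily the criterion the post itself uses.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical fetcher: given a cursor, return (follower_ids, next_cursor).
FetchPage = Callable[[int], Tuple[List[int], int]]


def crawl_new_followers(fetch_page: FetchPage, known: Set[int], max_pages: int = 15) -> List[int]:
    """Collect follower IDs newer than any already in `known`.

    Stops at the first previously seen ID (assumes newest-first ordering),
    at the last page (next_cursor == 0), or after max_pages requests.
    """
    new_ids: List[int] = []
    cursor = -1  # v1.1 cursoring starts at -1
    for _ in range(max_pages):
        ids, cursor = fetch_page(cursor)
        for follower_id in ids:
            if follower_id in known:
                return new_ids  # everything after this should already be stored
            new_ids.append(follower_id)
        if cursor == 0:
            break  # no more pages
    return new_ids
```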

Written by ZacharyST, March 6, 2019 (updated March 5, 2025)

group_by() %>% mutate() using pandas

While I have my issues with the tidyverse, one feature I am enamored with is the ability to assign values to observations in grouped data without aggregating the data. This assigning is done by using the mutate() command instead of summarize(). I am in the middle of some data processing in a Python pipeline where I […]
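In pandas the closest analogue to `mutate()` on grouped data is `groupby(...).transform(...)`, which returns a result aligned to the original rows instead of one row per group (the `summarize()` analogue would be `.agg()`). A minimal sketch with made-up columns `user` and `retweets`:

```python
import pandas as pd

df = pd.DataFrame({
    "user": ["a", "a", "b", "b", "b"],
    "retweets": [1, 3, 2, 4, 6],
})

grouped = df.groupby("user")["retweets"]

# group_by(user) %>% mutate(mean_rt = mean(retweets)):
# transform() broadcasts each group's result back to that group's rows.
df["mean_rt"] = grouped.transform("mean")

# Row-level expressions can mix raw columns with group results,
# e.g. each tweet's share of its user's total retweets.
df["rt_share"] = df["retweets"] / grouped.transform("sum")

print(df)
```

The row count stays at five throughout, which is exactly the "assign without aggregating" behavior `mutate()` gives.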

Written by ZacharyST, September 14, 2017 (updated March 5, 2025)

Proper Handling of Exceptions in Python

With some free time on my hands, I sat down to update my code that extracts tweets from my tweet collection based on user-supplied keywords or locations. In doing that, however, I ended up making a major improvement, one that should have existed from day one. You see, simply trying to read a file of […]
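A sketch of the kind of handling the title points to, assuming the tweet file stores one JSON object per line: catch only the decoding error, log it, and keep going, so one corrupt line does not kill a whole run. The function name and the skip-and-log policy are assumptions, not the post's code.

```python
import io
import json
import logging
from typing import Iterator, TextIO

logger = logging.getLogger(__name__)


def read_tweets(handle: TextIO) -> Iterator[dict]:
    """Yield tweets from a one-JSON-object-per-line file, skipping malformed lines."""
    for lineno, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines carry no tweet
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError as err:
            # Catch only the error we expect; anything else should still fail loudly.
            logger.warning("Skipping malformed line %d: %s", lineno, err)
            continue
        yield tweet


if __name__ == "__main__":
    sample = io.StringIO('{"id": 1}\nnot json\n\n{"id": 2}\n')
    print([t["id"] for t in read_tweets(sample)])  # prints [1, 2]
```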

Written by ZacharyST, September 10, 2017 (updated March 5, 2025)

I prefer simplejson to json

I thought I was going to spend some time on Friday analyzing tweets from Cameroon. Instead, starting that process led me down a rabbit hole that has, I hope, culminated in my realizing I should have used Python’s simplejson library this whole time. A script of mine used a try-except sequence to enclose the section […]
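The excerpt stops before the reasons, so this sketch does not argue for simplejson; it only shows one common way to prefer it while keeping the standard library as a fallback. Because both `json.JSONDecodeError` and `simplejson.JSONDecodeError` subclass `ValueError`, an `except ValueError` clause behaves the same whichever module loaded. `parse_or_none` is a hypothetical helper.

```python
# Prefer simplejson when installed, fall back to the standard library.
try:
    import simplejson as json
except ImportError:  # simplejson is a third-party package (pip install simplejson)
    import json


def parse_or_none(text):
    """Return the decoded object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:
        # json.JSONDecodeError and simplejson.JSONDecodeError both subclass
        # ValueError, so this clause works whichever module was imported.
        return None
```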

Written by ZacharyST, June 2, 2017 (updated March 5, 2025)

How to Profile Python Code, With an Aside on Parallelism

I recently wrote a script that reads thousands of files of tweets, transforms them, and spits out “only” hundreds of files. Having tested the script on my computer on a few files, I was surprised to find the execution taking much longer than anticipated on my server, especially since the server’s CPUs are more powerful […]
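A minimal profiling sketch using the standard library's `cProfile` and `pstats`. `transform_tweet` and `process_all` are hypothetical stand-ins for the real script; sorting by cumulative time shows which calls dominate end to end.

```python
import cProfile
import io
import json
import pstats


def transform_tweet(raw):
    """Hypothetical per-tweet transformation: parse and keep a few fields."""
    tweet = json.loads(raw)
    return {"id": tweet["id"], "n_chars": len(tweet["text"])}


def process_all(lines):
    return [transform_tweet(line) for line in lines]


def profile_report(lines, top=10):
    """Run process_all under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    process_all(lines)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [json.dumps({"id": i, "text": "x" * i}) for i in range(10_000)]
    print(profile_report(sample))
```

For a whole script, `python -m cProfile -s cumulative script.py` prints a similar table without modifying the code.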

Written by ZacharyST, April 7, 2017 (updated March 5, 2025)

Header for GDELT 2.0 and Phoenix

Working with machine-coded events data is cool. What’s not cool is that the raw data from two of the main datasets, GDELT 2.0 and Phoenix, do not include headers in their files. It is simple to create a list with the column names, but the closest I could find that already existed for GDELT 2.0 […]
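Once a column list exists, attaching it is one argument to `pandas.read_csv`. In this sketch the five names are placeholders, not the actual GDELT 2.0 or Phoenix schemas, and the file is assumed to be tab-delimited; `dtype=str` keeps codes such as CAMEO's `010` from losing their leading zero.

```python
import io
from typing import List, Sequence

import pandas as pd

# Placeholder names for illustration only; take the real lists from the
# GDELT 2.0 and Phoenix codebooks, which define many more columns.
COLUMNS: List[str] = ["event_id", "date", "actor1", "actor2", "event_code"]


def read_headerless(source, columns: Sequence[str] = COLUMNS, sep: str = "\t") -> pd.DataFrame:
    """Read a header-less delimited events file and attach column names."""
    # header=None: the first line is data, not a header.
    # dtype=str: keep codes such as "010" from being parsed as the integer 10.
    return pd.read_csv(source, sep=sep, header=None, names=list(columns), dtype=str)


if __name__ == "__main__":
    raw = io.StringIO("1\t20170407\tUSA\tSYR\t190\n2\t20170407\tRUS\tUSA\t010\n")
    events = read_headerless(raw)
    print(events[["event_id", "event_code"]])
```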