Batch zip files

One of my hard drives is down to its final terabyte of 8, so it's time for me to compress some files. Since I have thousands of files on that drive, it would be inefficient to select them one by one. It turns out it's easy to pass a bunch of files to gzip. I […]
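The excerpt cuts off before showing the actual command. The sketch below illustrates the same batch idea in Python, and it swaps the gzip command-line tool for Python's standard-library `gzip` module, so it is not the post's method. The function name `gzip_directory` and its `pattern` parameter are hypothetical. Like plain `gzip`, the function removes each original after writing its `.gz` copy.

```python
import gzip
import shutil
from pathlib import Path


def gzip_directory(directory, pattern="*"):
    """Compress every file matching `pattern` in `directory` (hypothetical helper).

    Each file is written to a sibling `<name>.gz` and the original is then
    deleted, mimicking gzip's default behaviour. Existing .gz files are skipped.
    """
    compressed = []
    # sorted() materializes the listing first, so the new .gz files we create
    # are never picked up by the same loop.
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file() or path.suffix == ".gz":
            continue
        target = path.with_name(path.name + ".gz")
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            # Stream in chunks so large files never sit fully in memory.
            shutil.copyfileobj(src, dst)
        path.unlink()  # remove the original, as the gzip CLI does by default
        compressed.append(target)
    return compressed
```

For example, `gzip_directory("/mnt/archive", "*.csv")` would compress only the CSV files at the top level of that (hypothetical) directory. `glob` does not recurse here, so use `"**/*.csv"` to descend into subfolders.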

Crawling Followers with Intelligent Stopping

Like almost every other academic, I have started a Covid-19 project.  I think my team has a unique angle because of the kind of data I collect.  One dynamic we are interested in is patterns of following, and being able to analyze that across enough accounts required me to work with Twitter endpoints I have […]
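The excerpt doesn't say what the stopping rule is. One plausible reading, offered only as an assumption, is to page through an account's follower list and stop once previously collected followers start reappearing. That rule only works if pages arrive newest-follower-first. The sketch below keeps the Twitter request abstract. `fetch_page` is a hypothetical stand-in for whatever client call returns one page of follower IDs plus a next cursor, and `crawl_new_followers` and `stop_after_known` are illustrative names, not a real API.

```python
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical page fetcher: given a cursor (None for the first page), return
# (follower_ids, next_cursor), where next_cursor is None on the last page.
FetchPage = Callable[[Optional[str]], Tuple[List[int], Optional[str]]]


def crawl_new_followers(fetch_page: FetchPage, known_ids: Set[int],
                        stop_after_known: int = 1) -> List[int]:
    """Collect follower IDs until `stop_after_known` already-known IDs appear.

    Assumes newest-first ordering: once known followers show up, the rest of
    the list is probably already collected, so later pages are not requested.
    """
    new_ids: List[int] = []
    known_seen = 0
    cursor: Optional[str] = None
    while True:
        ids, cursor = fetch_page(cursor)
        for follower_id in ids:
            if follower_id in known_ids:
                known_seen += 1
                if known_seen >= stop_after_known:
                    return new_ids  # early stop: skip the remaining pages
            else:
                new_ids.append(follower_id)
        if cursor is None:
            return new_ids  # reached the end of the follower list
```

With a fake three-page fetcher whose second page contains an already-known ID, the crawl returns after two requests instead of three. That saving in rate-limited calls is the point of stopping early. Raising `stop_after_known` trades a few extra requests for tolerance of imperfect ordering.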

My Ongoing Twitter Collections

I recently spent a lot of time reviewing my Twitter data collection infrastructure in order to start some more collections.  In that process, I discovered some tokens and streams I had forgotten about.  The purpose of this post is to document what data I am collecting as of 04.29.2020 so that I have an easy reference […]

group_by() %>% mutate() using pandas

While I have my issues with the tidyverse, one feature I am enamored with is the ability to assign values to observations in grouped data without aggregating the data.  This assigning is done by using the mutate() command instead of summarize().  I am in the middle of some data processing in a Python pipeline where I […]
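The excerpt stops before the pandas version. A common pandas equivalent of `group_by() %>% mutate()` is `groupby(...).transform(...)`, though this sketch is not necessarily the approach the post settles on. `transform` returns a result aligned to the original rows, so each observation receives its group's value without the frame being collapsed the way `summarize()` or `agg()` would collapse it. The `user` and `retweets` columns are made-up example data.

```python
import pandas as pd

# Hypothetical example data: one row per tweet.
df = pd.DataFrame({
    "user": ["a", "a", "b", "b", "b"],
    "retweets": [1, 3, 2, 2, 8],
})

# R: df %>% group_by(user) %>% mutate(mean_rt = mean(retweets))
# transform() returns a Series indexed like df, so every row keeps its place
# and is assigned its group's mean instead of getting one row per group.
df["mean_rt"] = df.groupby("user")["retweets"].transform("mean")

# Several grouped columns at once; assign() keeps a %>%-like chaining style.
df = df.assign(
    rt_share=lambda d: d["retweets"] / d.groupby("user")["retweets"].transform("sum"),
    rt_rank=lambda d: d.groupby("user")["retweets"].rank(method="first"),
)
print(df)
```

The frame still has five rows afterwards. User `a` gets a `mean_rt` of 2.0 on both rows and user `b` gets 4.0 on all three rows. Grouped `rank()` is already index-aligned, so it needs no `transform` wrapper.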