Today I needed to launch a new AWS EC2 instance to collect some more tweets. Since I've done this several times, I used my AWS console to clone the most recent tweet-collecting instance I had created. Everything ran smoothly until I tried to install streamR, my go-to library for collecting tweets in R.
The installation failed on ndjson, a streamR dependency, because of a gcc compiler issue. I've learned by now that any time I see the words "gcc" or "blas" in an error message, I'm in for a bad morning and a deep dive into StackOverflow. This post helped me figure out that my system's gcc compiler was version 4.8, which has known bugs, so ndjson supports only version 4.9+. I briefly debated updating the compiler, but a git page with instructions confirmed my fear: it would be far more delicate than I wanted to handle. The alternative was to launch a fresh AWS instance that would presumably have a more recent compiler, but that would take another 30-60 minutes. I could have done it, but blah.
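To check the compiler from inside R before installing, a minimal sketch like the following could work. It assumes gcc is on the PATH and that `gcc -dumpversion` prints a version string; the helpers `gcc_meets_minimum` and `installed_gcc_version` are hypothetical and not part of streamR or ndjson. The 4.9 default mirrors ndjson's minimum.

```r
# Hypothetical helpers for checking whether the system gcc is new enough.

# Returns TRUE if a gcc version string meets the minimum (default 4.9,
# the oldest version ndjson supports).
gcc_meets_minimum <- function(version_string, minimum = "4.9") {
  numeric_version(version_string) >= numeric_version(minimum)
}

# Reads the installed gcc version; assumes gcc is on the PATH.
installed_gcc_version <- function() {
  system2("gcc", "-dumpversion", stdout = TRUE)[1]
}

# Only query the compiler if one is actually available.
if (nzchar(Sys.which("gcc"))) {
  v <- installed_gcc_version()
  if (!gcc_meets_minimum(v)) {
    message("gcc ", v, " is older than 4.9; ndjson will likely fail to build")
  }
}
```

`numeric_version()` is used rather than plain string comparison so that, for example, "10.2.0" correctly counts as newer than "4.9".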
Instead, I took the one-minute route: I installed an older version of streamR. You see, streamR on my old instance was version 0.2.1, but the current version is 0.4.2. In the interim, the package switched from rjson to ndjson. rjson is cool with gcc 4.8; ndjson is not. So I decided to jump back to February 2017, when I set up the instance I had just cloned. The tweets are collecting, and I'm happy.
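One way to pin an older release is `remotes::install_version()`, which installs a specific version from the CRAN archive. The sketch below assumes the remotes package is installed and uses a hypothetical helper, `streamR_target_version`, that keeps the current streamR when gcc is 4.9+ and falls back to 0.2.1 (the rjson-based version from the old instance) otherwise. The "4.8" string stands in for whatever `gcc -dumpversion` reports on your machine.

```r
# Hypothetical helper: pick a streamR version compatible with the compiler.
# Returning NULL tells remotes::install_version() to install the latest release.
streamR_target_version <- function(gcc_version, minimum_gcc = "4.9") {
  if (numeric_version(gcc_version) >= numeric_version(minimum_gcc)) {
    NULL      # gcc is new enough for ndjson: take the current streamR
  } else {
    "0.2.1"   # older streamR that depends on rjson, not ndjson
  }
}

# Example run, limited to interactive sessions because it downloads from CRAN.
if (interactive() && requireNamespace("remotes", quietly = TRUE)) {
  remotes::install_version(
    "streamR",
    version = streamR_target_version("4.8"),
    repos = "https://cloud.r-project.org"
  )
}
```

Pinning the version keeps the rest of the setup unchanged, at the cost of missing any fixes made to streamR since 0.2.1.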
Note to self: going forward, do not clone this instance. Second note to self: find a postdoc or graduate student to handle these issues!