I thought I was going to spend some time on Friday analyzing tweets from Cameroon. Instead, starting that process led me down a rabbit hole that has, I hope, culminated in me realizing I should have used Python’s simplejson library this whole time.
A script of mine used a try-except block to enclose the section of code that reads files of tweets line by line. I did not specify which exceptions to handle, so my code barreled along, a train obliterating all obstacles. Soon, I noticed that I did not have tweets starting on 03.01.2017 from Cameroon. The simple answer is that a bare try-except is too powerful: it swallows every error, including the ones I needed to see, so I had to modify my code to properly handle quirks in the underlying data.
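To make the problem concrete, here is a minimal sketch of a catch-everything loop. The function name `parse_lines`, the `text` key, and the sample lines are my own illustrations, not the original script. Counting the exception types that get swallowed shows how two unrelated problems disappear into the same handler.

```python
import json
from collections import Counter


def parse_lines(lines):
    """Parse tweet lines with a catch-all handler and tally what it hides."""
    texts = []
    swallowed = Counter()
    for line in lines:
        try:
            tweet = json.loads(line)
            texts.append(tweet["text"])
        except Exception as err:  # catches everything, like a bare except
            swallowed[type(err).__name__] += 1
    return texts, swallowed


# Hypothetical sample: one good line, one truncated line, one missing key.
sample = ['{"text": "hello"}', '{"text": "truncated', '{"id": 1}']
texts, swallowed = parse_lines(sample)
print(texts)
print(dict(swallowed))
```

Without the tally, the truncated line and the missing key would both vanish silently, which is roughly how a whole stretch of tweets can go missing unnoticed.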
I discovered, however, that json, the library that is Python’s default, does not raise a JSONDecodeError when json.loads cannot parse a string. Instead, it raises ValueError. The problem is that many kinds of errors are labelled as ValueError, so seeing a ValueError does not tell you what actually went wrong, and does not help you fix the underlying code.
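A small sketch of why that is frustrating: a failed `json.loads` and a failed `int` conversion both land in the same `except ValueError` branch. The helper `describe_failure` is a made-up name for illustration. One caveat worth stating: on Python 3.5 and later, `json` does define `json.JSONDecodeError`, but it is a subclass of `ValueError`, so the handler below still catches it.

```python
import json


def describe_failure(func, arg):
    """Call func(arg) and report whether it ended in a ValueError."""
    try:
        func(arg)
    except ValueError as err:
        # Both bad JSON and bad integers arrive here.
        return "ValueError: {}".format(err)
    return "ok"


bad_json = describe_failure(json.loads, '{"text": "truncated')
bad_int = describe_failure(int, "not a number")
print(bad_json)
print(bad_int)
```

From inside the handler, the only way to tell the two apart is to read the message, which is exactly the kind of guesswork a dedicated exception avoids.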
The answer, as comments at these SO threads mention, is to use a different library to load JSON files. simplejson looked good. Though Python’s json library is built on simplejson, simplejson is updated more frequently than Python is, so it has more features. One of those features is a dedicated JSONDecodeError exception. Once I realized that, installed simplejson, actually installed ipython in my virtual environment, and remembered to write except simplejson.JSONDecodeError: instead of just except JSONDecodeError:, everything worked like a charm!
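Here is a rough sketch of what the narrower handler can look like. The function `load_tweets` and the sample lines are hypothetical, and the fallback import is my own addition so the snippet still runs where simplejson is not installed (it assumes Python 3.5 or later in that case, where `json.JSONDecodeError` exists).

```python
try:
    import simplejson as json_lib  # third-party: pip install simplejson
except ImportError:
    import json as json_lib  # assumption: Python 3.5+ for JSONDecodeError


def load_tweets(lines):
    """Parse one JSON tweet per line, recording only genuine decode failures."""
    tweets = []
    bad_lines = []
    for number, line in enumerate(lines, start=1):
        try:
            tweets.append(json_lib.loads(line))
        except json_lib.JSONDecodeError as err:
            # Only malformed JSON is handled; any other error still surfaces.
            bad_lines.append((number, err.msg))
    return tweets, bad_lines


# Hypothetical sample: the second line is cut off mid-string.
tweets, bad_lines = load_tweets(['{"text": "hi"}', '{"text": "oops'])
print(tweets)
print(bad_lines)
```

Note the qualified name in the `except` clause: a bare `JSONDecodeError` would be a `NameError` unless it had been imported by name.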
Now let’s just hope that, and the other conditions I accounted for, are the only problems I run into.