I thought I was going to spend some time on Friday analyzing tweets from Cameroon. Instead, starting that process led me down a rabbit hole that has, I hope, culminated in me realizing I should have used Python’s simplejson library this whole time.
A script of mine used a try-except sequence to enclose the section of code which reads files of tweets line by line. I did not specify which exceptions to handle, so my code barreled along, a train obliterating all obstacles. Soon, I noticed that I did not have tweets starting on 03.01.2017 from Cameroon. The simple answer is that a bare try-except is too powerful: it swallows every error, including the ones that point to a real problem. I had to modify my code to properly handle quirks in the underlying data.
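To make the difference concrete, here is a minimal sketch rather than my actual script. The function names, the sample lines, and the "text" field are made up for illustration. The first function skips every failure silently; the second catches only the parse failure, so a record that is missing a field stops the run with a traceback.

```python
import json


def count_parsed_bare(lines):
    """Bare except: every failure, whatever its cause, is silently skipped."""
    parsed = 0
    for line in lines:
        try:
            tweet = json.loads(line)
            tweet["text"]  # hypothetical field; a missing key is swallowed too
            parsed += 1
        except:  # deliberately too broad, to show the problem
            continue
    return parsed


def count_parsed_narrow(lines):
    """Catch only the parse failure; anything else surfaces as a traceback."""
    parsed = 0
    for line in lines:
        try:
            tweet = json.loads(line)
        except ValueError:  # what json.loads raises on malformed input
            continue
        tweet["text"]  # a KeyError here is no longer hidden
        parsed += 1
    return parsed


# Made-up sample: one good record, one malformed line, one record without "text".
sample = ['{"text": "hello"}', 'not json', '{"user": "no text field"}']
```

With this sample, the bare version quietly counts one tweet and moves on, while the narrow version raises KeyError on the third line, which is exactly the kind of quirk I needed to see.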
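I discovered, however, that the json library that is Python’s default does not raise a JSONDecodeError when it cannot parse a string with json.loads, at least in the Python version I was running. Instead, it raises ValueError. (The standard library only gained json.JSONDecodeError, itself a subclass of ValueError, in Python 3.5.) The problem is that many kinds of errors are labelled as ValueError, so seeing a ValueError does not tell you which part of the code needs fixing.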
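A small sketch of why that is a problem, assuming a hypothetical "retweets" field that sometimes holds text instead of a number. Both the JSON parser and int() end up in the same handler, so the handler cannot tell a broken line from a bad value.

```python
import json


def retweet_count(line):
    """Return the retweet count on a line, or None if a ValueError occurs."""
    try:
        record = json.loads(line)        # malformed JSON lands in the handler
        return int(record["retweets"])   # so does a non-numeric count
    except ValueError:
        return None
```

Calling it on '{"retweets": "many"}' and on a truncated line such as '{"retweets": ' gives None both times, even though the two failures call for different fixes.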
The answer, as comments at these SO threads mention, is to use a different library to load JSON files. simplejson looked good. Though Python’s json library is built on simplejson, simplejson is updated more frequently than Python is, so it has more features. One of those features is a dedicated JSONDecodeError that you can catch by name. Once I realized that, installed simplejson, actually installed ipython in my virtual environment, and remembered to write except simplejson.JSONDecodeError: instead of just except JSONDecodeError:, everything worked like a charm!
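Roughly, the reading loop now looks like the sketch below. The function name and the bad-line bookkeeping are illustrative, not lifted from my script. The exception is reached through the module name because a plain import of the module does not put JSONDecodeError into the local namespace. The fallback import is an assumption on my part so the sketch still runs where simplejson is not installed; it relies on Python 3.5 or later.

```python
try:
    import simplejson as json_lib  # third-party: pip install simplejson
except ImportError:
    import json as json_lib        # fallback; has JSONDecodeError on Python 3.5+


def read_tweets(lines):
    """Parse one JSON document per line, keeping a record of lines that fail."""
    tweets, bad_lines = [], []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are not errors, just skipped
        try:
            tweets.append(json_lib.loads(line))
        except json_lib.JSONDecodeError as err:
            # Only parse failures are caught; any other exception still surfaces.
            bad_lines.append((number, str(err)))
    return tweets, bad_lines
```

Passing an open file object as lines works, since iterating over a file yields one line at a time. Keeping the line numbers of the failures makes it easy to go back and look at what was actually wrong with the data.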
Now let’s just hope that, and the other conditions I accounted for, are the only problems I run into.