Proper Handling of Exceptions in Python

With some free time on my hands, I sat down to update my code that extracts tweets from my tweet collection based on user-supplied keywords or locations.  In doing that, however, I ended up making a major improvement, one that should have existed from day one.

You see, simply trying to read a file of tweets line by line sometimes generates an error.  Perhaps the download was interrupted, so the file is cut off partway through a tweet.  (Why the computer isn’t smart enough to ignore that, I don’t know.)  More frequently, part of the tweet, usually the text, contains “\r” or “\n”.  Computers interpret those characters as a carriage return and a line feed, that is, as the end of a line, not as ordinary text.  A tweet containing them gets split across lines, and my later processing fails.  Since a script stops when it hits an error you have not told it how to handle, I told mine to skip any file that contained an error, regardless of the error.  The script would run, I would have tweets, and the world seemed content.
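To make the newline problem concrete, here is a minimal sketch.  It assumes the tweets are stored as one JSON object per line, which is my assumption for illustration; the sample records and the helper name count_bad_lines are made up.

```python
import json

# A correctly serialized tweet: the newline is escaped, so the record stays on one line.
escaped = json.dumps({"text": "first\nsecond"})

# A damaged record (hypothetical): a real newline sits inside the text field.
damaged = '{"text": "first\nsecond"}'


def count_bad_lines(data):
    """Count the lines in data that do not parse as JSON."""
    bad = 0
    for line in data.splitlines():
        try:
            json.loads(line)
        except json.JSONDecodeError:
            bad += 1
    return bad


escaped_bad = count_bad_lines(escaped)  # the single line parses cleanly
damaged_bad = count_bad_lines(damaged)  # the record splits into two unparseable halves
```

The damaged record turns into two broken lines, so a reader that gives up on the whole file at the first error loses every good tweet in that file as well.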

That was a bad idea; it turned out I was missing many tweets that otherwise would have a hashtag or location match.  So, I made two corrections.  First, I moved my try-except statements in one level, so they wrap each line of a file instead of the whole file.  That way, an error skips only that line, not the file containing it.  Second, I made my except clauses specific to the kind of exception, and I now write these errors to a file.  Before, I used what’s called a “bare except”, meaning I did not differentiate between, say, a ValueError and a JSONDecodeError.  (In Python’s json module, JSONDecodeError is itself a subclass of ValueError, so the more specific clause has to come first.)  Together, these two oversights meant that when my code hit an error, I knew neither where it occurred nor why.  The improvements are noticeable.  For a three-month period in Gabon, I went from 38,000 to 150,000 tweets.
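The two corrections can be sketched roughly as follows.  This is not my actual script: the function name extract_matching_tweets, the "text" field, the sample lines, and the file name sample.json are all assumptions for illustration, and an in-memory buffer stands in for the error file.

```python
import io
import json


def extract_matching_tweets(lines, keywords, error_log, source="<unknown>"):
    """Return tweets whose text contains any keyword, skipping only the bad lines."""
    matches = []
    for line_number, line in enumerate(lines, start=1):
        # The try block wraps a single line, so one bad tweet cannot sink the file.
        try:
            tweet = json.loads(line)
            text = tweet["text"].lower()
        # JSONDecodeError subclasses ValueError, so the specific clause comes first.
        except json.JSONDecodeError as err:
            error_log.write(f"{source}:{line_number}: malformed JSON: {err}\n")
            continue
        except (KeyError, TypeError, AttributeError) as err:
            error_log.write(f"{source}:{line_number}: no usable text field: {err!r}\n")
            continue
        if any(keyword.lower() in text for keyword in keywords):
            matches.append(tweet)
    return matches


# Hypothetical sample: two good tweets, one truncated line, one record with no text.
sample_lines = [
    '{"text": "Hello Libreville"}',
    '{"text": "truncated',
    '{"id": 1}',
    '{"text": "Gabon votes today"}',
]
error_log = io.StringIO()  # stands in for an open error file
found = extract_matching_tweets(
    sample_lines, ["gabon", "libreville"], error_log, source="sample.json"
)
```

Each skipped line leaves a record of which file and line failed and why, which is exactly what the bare except around the whole file was hiding.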

The moral of the story?  Code right the first time!
