Twitter Descriptive Statistics, Part 1

How many followers does the average Twitter user have?  How many accounts does the average Twitter account follow?  How many times has the average account tweeted?  What about the median?  These questions seem simple, but it is not easy to find answers to them.  Twitter only discloses how many monthly active users exist, and other […]

Formatting CAMEO Event Codes in ICEWS

UPDATE: Thanks to @icews for helping me figure this out.  It turns out that the CAMEO Code field is saved as a string, but Pandas interprets that column as integers and drops the leading zero.  To read that column correctly, use the following line: data = pd.read_csv('/Data/ICEWS/events.2010.20150313084533.tab', sep='\t', dtype={'CAMEO Code': object}) ---------- Wanting […]
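
The leading-zero problem described in the update can be reproduced without the real file. The sketch below is only an illustration: the two-row sample and the "Event ID" column are made up, and only the "CAMEO Code" column name and the dtype argument come from the post.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for an ICEWS .tab file.
# Only the "CAMEO Code" column name is taken from the post.
sample = "Event ID\tCAMEO Code\n1\t010\n2\t0231\n"

# Default parsing infers an integer column, so "010" becomes 10.
as_int = pd.read_csv(io.StringIO(sample), sep="\t")

# Reading the column as object keeps the codes as strings,
# which preserves the leading zero.
as_str = pd.read_csv(io.StringIO(sample), sep="\t", dtype={"CAMEO Code": object})

print(as_int["CAMEO Code"].tolist())
print(as_str["CAMEO Code"].tolist())
```

Comparing the two printed lists shows why the dtype argument matters: codes that differ only by a leading zero cannot be recovered once the column has been read as integers.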

What sources are in ICEWS?

ICEWS was released to the public on April 1st, and the event studies community has had a field day getting to know this early (or late?) holiday present.  The dataset, which was created by Lockheed Martin on behalf of the Department of Defense, appears to represent the new frontier for historic events data.  I use the […]

So, you want historic events data

As far as I am aware, there are no contemporary machine-coded events data if you do not want to use GDELT.*  Phil Schrodt and his colleagues are working on a GDELT replacement that promises to reduce event duplication and provide better geospatial resolution.  Once that project, Phoenix, goes live, it will create real-time data based on 542 […]

Machine coded events data and hand-coded data

Working with events data has long posed a fundamental dilemma.  On one hand, the events one wants to study – state-sponsored killings, battles in a war, or protests, for example – have a complex, intertwined nature that requires either detailed case studies or detailed hand-coding of the events.  On the other hand, gathering such detailed […]

The Arab Spring and GDELT

My dissertation, still in its very early stages, seeks to understand how protests spread across countries, with a focus on the Arab Spring.  One source of data I am exploring is Phil Schrodt and Kalev Leetaru’s Global Database of Events, Location, and Tone (GDELT), a machine-coded events dataset.  (GDELT is an amazing resource; to read […]