Header for GDELT 2.0 and Phoenix

Working with machine-coded event data is cool. What’s not cool is that the raw files from two of the main datasets, GDELT 2.0 and Phoenix, do not include headers. Creating a list of column names is simple, but the closest existing one I could find for GDELT 2.0 was in this Python scraper on GitHub. Since it was off by three fields, and I could find no one who had published a header for Phoenix, I had to bite the bullet and do it myself. Both headers are written as Python lists, though you can replace the square brackets with c() to use them in R.
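If you would rather not edit the brackets by hand, the conversion to R can be scripted. This is only a small sketch: the function name `to_r_vector` is made up for illustration, and the three-name list stands in for either full header.

```python
def to_r_vector(names):
    """Render a Python list of column names as an R character vector literal."""
    quoted = ", ".join('"{}"'.format(name) for name in names)
    return "c({})".format(quoted)


# Shortened, illustrative list; pass either full header instead.
print(to_r_vector(["EventID", "Date", "Year"]))
```

The printed text can be pasted into an R script as the value of a `header` variable.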

Here is the header for GDELT 2.0:

header = ['GlobalEventID', 'Day', 'MonthYear', 'Year', 'FractionDate', 'Actor1Code', 'Actor1Name', 'Actor1CountryCode', 'Actor1KnownGroupCode', 'Actor1EthnicCode', 'Actor1Religion1Code', 'Actor1Religion2Code', 'Actor1Type1Code', 'Actor1Type2Code', 'Actor1Type3Code', 'Actor2Code', 'Actor2Name', 'Actor2CountryCode', 'Actor2KnownGroupCode', 'Actor2EthnicCode', 'Actor2Religion1Code', 'Actor2Religion2Code', 'Actor2Type1Code', 'Actor2Type2Code', 'Actor2Type3Code', 'IsRootEvent', 'EventCode', 'EventBaseCode', 'EventRootCode', 'QuadClass', 'GoldsteinScale', 'NumMentions', 'NumSources', 'NumArticles', 'AvgTone', 'Actor1Geo_Type', 'Actor1Geo_Fullname', 'Actor1Geo_CountryCode', 'Actor1Geo_ADM1Code', 'Actor1Geo_ADM2Code', 'Actor1Geo_Lat', 'Actor1Geo_Long', 'Actor1Geo_FeatureID', 'Actor2Geo_Type', 'Actor2Geo_Fullname', 'Actor2Geo_CountryCode', 'Actor2Geo_ADM1Code', 'Actor2Geo_ADM2Code', 'Actor2Geo_Lat', 'Actor2Geo_Long', 'Actor2Geo_FeatureID', 'ActionGeo_Type', 'ActionGeo_Fullname', 'ActionGeo_CountryCode', 'ActionGeo_ADM1Code', 'ActionGeo_ADM2Code', 'ActionGeo_Lat', 'ActionGeo_Long', 'ActionGeo_FeatureID', 'Dateadded', 'Sourceurl']
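A header that is off by a field or two is easy to miss, so it may be worth comparing its length against a raw row before trusting it. The sketch below assumes the file is tab-delimited; `column_mismatch` is a hypothetical helper name, and the sample row is made up rather than taken from GDELT.

```python
def column_mismatch(header, raw_line, delimiter="\t"):
    """Return (number of fields in the row) minus (number of names in the header)."""
    # Strip only the line ending so that empty trailing fields are still counted.
    fields = raw_line.rstrip("\r\n").split(delimiter)
    return len(fields) - len(header)


# Made-up three-column row, not real GDELT data.
sample_header = ["GlobalEventID", "Day", "MonthYear"]
sample_row = "1\t20150218\t201502\n"

# A matching header gives 0; a header that is one name short gives 1.
print(column_mismatch(sample_header, sample_row))
print(column_mismatch(sample_header[:2], sample_row))
```

Any nonzero result means the header and the data disagree on the number of columns.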

Here is the header for Phoenix:

header = ['EventID', 'Date', 'Year', 'Month', 'Day', 'SourceActorFull', 'SourceActorEntity', 'SourceActorRole', 'SourceActorAttribute', 'TargetActorFull', 'TargetActorEntity', 'TargetActorRole', 'TargetActorAttribute', 'EventCode', 'EventRootCode', 'PentaClass', 'GoldsteinScore', 'Issues', 'Lat', 'Lon', 'LocationName', 'StateName', 'CountryCode', 'SentenceID', 'URLs', 'NewsSources']
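Either list can then be attached to the raw data when reading it. Here is a minimal sketch using only the standard library, assuming the file is tab-delimited with one event per line; `read_events` is a made-up name, and the in-memory sample below is invented for the demonstration rather than real Phoenix output.

```python
import csv
import io


def read_events(handle, header, delimiter="\t"):
    """Yield one dict per line of a headerless delimited file, keyed by the given header."""
    reader = csv.DictReader(
        handle,
        fieldnames=header,       # supply the column names the file lacks
        delimiter=delimiter,
        quoting=csv.QUOTE_NONE,  # treat quote characters in the text as ordinary data
    )
    for row in reader:
        yield row


# Invented two-line sample with a shortened header, for illustration only.
sample = io.StringIO("123\t2015-02-18\t2015\n456\t2015-02-19\t2015\n")
rows = list(read_events(sample, ["EventID", "Date", "Year"]))
print(rows[0]["Date"])
```

For a real file, pass a handle opened with `open(path, newline="")` along with the full header list. All values come back as strings, so numeric columns still need converting.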

3 thoughts on “Header for GDELT 2.0 and Phoenix”

  1. Hi Zachary,

    Thanks for sharing the lists.

    I think that in the GDELT 2.0 list the `Actor1Type1Code` field between `Actor1Religion2Code` and `Actor1Type2Code` may be missing.

    Cheers
