Working with machine-coded event data is cool. What’s not cool is that the raw files from two of the main datasets, GDELT 2.0 and Phoenix, do not include headers. It is simple to create a list of the column names, but the closest existing one I could find for GDELT 2.0 was in this Python scraper on GitHub. Since it was off by three fields, and I could find no published header for Phoenix, I had to bite the bullet and do it myself. Both headers are written as Python lists, though you can replace the square brackets with c() to use them in R.
Here is the header for GDELT 2.0:
header = ['GlobalEventID', 'Day', 'MonthYear', 'Year', 'FractionDate', 'Actor1Code', 'Actor1Name', 'Actor1CountryCode', 'Actor1KnownGroupCode', 'Actor1EthnicCode', 'Actor1Religion1Code', 'Actor1Religion2Code', 'Actor1Type1Code', 'Actor1Type2Code', 'Actor1Type3Code', 'Actor2Code', 'Actor2Name', 'Actor2CountryCode', 'Actor2KnownGroupCode', 'Actor2EthnicCode', 'Actor2Religion1Code', 'Actor2Religion2Code', 'Actor2Type1Code', 'Actor2Type2Code', 'Actor2Type3Code', 'IsRootEvent', 'EventCode', 'EventBaseCode', 'EventRootCode', 'QuadClass', 'GoldsteinScale', 'NumMentions', 'NumSources', 'NumArticles', 'AvgTone', 'Actor1Geo_Type', 'Actor1Geo_Fullname', 'Actor1Geo_CountryCode', 'Actor1Geo_ADM1Code', 'Actor1Geo_ADM2Code', 'Actor1Geo_Lat', 'Actor1Geo_Long', 'Actor1Geo_FeatureID', 'Actor2Geo_Type', 'Actor2Geo_Fullname', 'Actor2Geo_CountryCode', 'Actor2Geo_ADM1Code', 'Actor2Geo_ADM2Code', 'Actor2Geo_Lat', 'Actor2Geo_Long', 'Actor2Geo_FeatureID', 'ActionGeo_Type', 'ActionGeo_Fullname', 'ActionGeo_CountryCode', 'ActionGeo_ADM1Code', 'ActionGeo_ADM2Code', 'ActionGeo_Lat', 'ActionGeo_Long', 'ActionGeo_FeatureID', 'Dateadded', 'Sourceurl']
Here is the header for Phoenix:
header = ['EventID', 'Date', 'Year', 'Month', 'Day', 'SourceActorFull', 'SourceActorEntity', 'SourceActorRole', 'SourceActorAttribute', 'TargetActorFull', 'TargetActorEntity', 'TargetActorRole', 'TargetActorAttribute', 'EventCode', 'EventRootCode', 'PentaClass', 'GoldsteinScore', 'Issues', 'Lat', 'Lon', 'LocationName', 'StateName', 'CountryCode', 'SentenceID', 'URLs', 'NewsSources']
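Since a header that is off by even one field silently mislabels every column after the gap, it may be worth checking the field count while attaching the names. Below is a minimal sketch of one way to do that with Python's standard csv module. It assumes the raw file is tab-delimited; the function name `read_headerless` and the three-column sample are hypothetical stand-ins, not part of either dataset's tooling.

```python
import csv
import io


def read_headerless(handle, header, delimiter="\t"):
    """Yield one dict per row, keyed by the supplied column names.

    Raises ValueError if a row's field count differs from the header's,
    which is the symptom of a header with missing or extra names.
    """
    reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for line_number, row in enumerate(reader, start=1):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_number}: expected {len(header)} fields, "
                f"got {len(row)}"
            )
        yield dict(zip(header, row))


# Hypothetical three-column sample standing in for a real data file.
sample_header = ["EventID", "Date", "Year"]
sample = io.StringIO("1\t2015-06-01\t2015\n2\t2015-06-02\t2015\n")
rows = list(read_headerless(sample, sample_header))
```

With a real file, you would pass an open file handle and one of the `header` lists above in place of the sample values.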
Hi Zachary,
Thanks for sharing the lists.
I think that in the GDELT 2.0 list the `Actor1Type2Code` field between `Actor1Religion2Code` and `Actor1Type2Code` may be missing.
Cheers
Sorry, I meant the field `Actor1Type1Code` may be missing.
Man you saved me so much time! Thank you!