My Ongoing Twitter Collections

I recently spent a lot of time reviewing my Twitter data collection infrastructure so that I could start some more collections. In the process, I discovered some tokens and streams I had forgotten about. The purpose of this post is to document what data I am collecting as of April 29, 2020, so that I have an easy reference later.

  1. Global geostream – This connection is my main collection process. I provide a bounding box that covers the whole world and collect the tweets that match it. Started August 2013.
  2. Turkish keywords – This connection collects tweets containing a random sample of Turkish keywords, politically relevant keywords, or prominent Turkish politicians. Started June 2018.
  3. Media streaming – This connection follows approximately 4,700 media accounts. Of these, 4,500 were identified using DocNow’s news outlet dataset, and the rest come from an iterative process of downloading Twitter media lists to identify accounts. Started December 2019.
  4. Random sample – This connection uses the streaming endpoint with no parameters. A colleague started this stream around March 2015.
  5. US geotagged – This connection uses the streaming endpoint with a United States bounding box. A colleague started this stream around March 2015.
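
For later reference, the five connections above can be sketched as request parameters. This is only an illustration, not the actual collection code: it assumes the v1.1 streaming endpoints (`statuses/filter` with its `locations`, `track`, and `follow` parameters, and `statuses/sample` with none), and the US bounding box, keywords, and account IDs are made-up placeholders.

```python
from typing import Dict, Iterable, Optional, Tuple

# Bounding boxes are (west_lon, south_lat, east_lon, north_lat):
# longitude comes first and the southwest corner comes first.
BBox = Tuple[float, float, float, float]

WORLD_BBOX: BBox = (-180.0, -90.0, 180.0, 90.0)
# Assumption: a rough box around the contiguous United States,
# not the coordinates the real stream uses.
US_BBOX: BBox = (-125.0, 24.0, -66.0, 50.0)


def filter_params(
    bbox: Optional[BBox] = None,
    keywords: Optional[Iterable[str]] = None,
    user_ids: Optional[Iterable[int]] = None,
) -> Dict[str, str]:
    """Build the comma-separated parameters for a filtered stream."""
    params: Dict[str, str] = {}
    if bbox is not None:
        params["locations"] = ",".join(f"{v:g}" for v in bbox)
    if keywords:
        params["track"] = ",".join(keywords)
    if user_ids:
        params["follow"] = ",".join(str(u) for u in user_ids)
    return params


# Hypothetical placeholders standing in for the real keyword and account lists.
EXAMPLE_KEYWORDS = ["seçim", "meclis"]
EXAMPLE_MEDIA_IDS = [111, 222, 333]

# One entry per connection: (endpoint, parameters).
STREAMS = {
    "global_geostream": ("statuses/filter", filter_params(bbox=WORLD_BBOX)),
    "turkish_keywords": ("statuses/filter", filter_params(keywords=EXAMPLE_KEYWORDS)),
    "media_streaming": ("statuses/filter", filter_params(user_ids=EXAMPLE_MEDIA_IDS)),
    "random_sample": ("statuses/sample", {}),
    "us_geotagged": ("statuses/filter", filter_params(bbox=US_BBOX)),
}

if __name__ == "__main__":
    for name, (endpoint, params) in STREAMS.items():
        print(name, endpoint, params)
```

Keeping the definitions in one dictionary like this makes it easy to see at a glance which connection uses which parameter, and the random sample stands out as the only one with an empty parameter set.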
