Pulling data from Twitter: Tools | Louvigné Sébastien:
'via Blog this'
witter exposes its data via different types of API (Application Programming Interface): REST, Search and Streaming. In order to collect nearly real-time data from Twitter, we will be using Streaming API to access public statuses filtered in various ways. The following URL is called:https://stream.twitter.com/1/statuses/filter.json
When a request is sent, at least 1 filter option must be specified between keywords (“track“), follow user ids (“follow“) and geographic locations (“location“). With too much parameters the URL might be too long and then rejected which is why we use POST header parameters when we send a request.
Filter parameters are rate-limited. We can find different information on the Twitter documentation pages:
I think that in any case these limitations are high enough. However I know by experience that the Stream is limiting the amount of statuses collected so if you are using a lot of keywords or other filter options and if you leave your stream running for too long, it might be stopped and you will have to restart it.
So far my biggest streaming programs is collecting around 1,5 million tweets a day and I have to restart my program around once a week. Twitter is not giving precise information about this limit and just says it is appropriate for long-term connections: