Get conversation tree from question
Each #d status with a reply_count > 0 is the start of a diagnostic conversation tree.
reply_count
- Not available in the standard API
- Available from TweetScraper
- dataset/replycount.py
The standard (free of charge) Twitter API does not provide a way to get all replies to a specific status. The workaround:
- Use TweetScraper
- Search for all replies addressed to the user who posted the question status, from a given date and time onward
- These replies need to be filtered on "in_reply_to_status_id", but this field is not present in the JSON object returned by TweetScraper
- Get the full Twitter object with the standard API
- Store those objects in a database to reduce API calls (and throttling) and to speed up later lookups
- Filter all collected replies with status["in_reply_to_status_id"] == status_id
- If true, add the reply to the corpus database
- Repeat the process recursively for each reply with a reply_count > 0
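The steps above can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: `collect_tree`, `search_replies`, `fetch_full` and the flat dictionary keys (`user`, `created_at`) are hypothetical names. `search_replies` stands in for a TweetScraper run, `fetch_full` stands in for a standard API lookup, and a plain dict plays the role of the database cache.

```python
from typing import Callable, Dict, Iterable, List

Status = Dict[str, object]


def collect_tree(
    root: Status,
    search_replies: Callable[[str, str], Iterable[Status]],
    fetch_full: Callable[[int], Status],
    cache: Dict[int, Status],
) -> List[Status]:
    """Return all statuses below `root` in the conversation tree, depth first.

    Assumptions (hypothetical, simplified data model):
    - search_replies(user, since) yields scraped statuses with "id" and "reply_count"
    - fetch_full(status_id) returns the full object with "in_reply_to_status_id"
    - cache maps status id -> full object and stands in for the database
    """
    corpus: List[Status] = []
    # Candidates: everything addressed to the author since the status was posted.
    for scraped in search_replies(root["user"], root["created_at"]):
        status_id = scraped["id"]
        # Only call the API for objects that are not stored yet.
        if status_id not in cache:
            cache[status_id] = fetch_full(status_id)
        full = cache[status_id]
        # Keep only direct replies to the current status.
        if full.get("in_reply_to_status_id") != root["id"]:
            continue
        corpus.append(full)
        # reply_count is only available on the scraped object.
        if scraped.get("reply_count"):
            corpus.extend(collect_tree(full, search_replies, fetch_full, cache))
    return corpus
```

Passing the two lookups in as callables keeps the sketch independent of any particular scraper or API client. The cache is shared across recursive calls, so a status fetched once is never requested again.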
The original tweet is the first doc(s)toctoc tweet, posted on 2012-06-06: https://twitter.com/DrKoibo/status/210290960695959553
The request is "to:DrKoibo since:2012-06-06"
# using pipenv
pipenv run scrapy crawl TweetScraper -a query="to:DrKoibo since:2012-06-06"
This returns 8111 statuses (as of 2018-03-29).
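For other question statuses, the same kind of query can be built from the status URL and its posting date. The helpers below are a hypothetical sketch (`parse_status_url` and `build_query` are not part of TweetScraper); they only assume the `https://twitter.com/<user>/status/<id>` URL layout and the `to:<user> since:<date>` query form shown above.

```python
from datetime import date
from typing import Tuple
from urllib.parse import urlparse


def parse_status_url(url: str) -> Tuple[str, int]:
    """Split a status URL into (screen_name, status_id)."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[1] != "status":
        raise ValueError(f"not a status URL: {url}")
    return parts[0], int(parts[2])


def build_query(screen_name: str, since: date) -> str:
    """Build the search query for replies addressed to a user since a date."""
    return f"to:{screen_name} since:{since.isoformat()}"


if __name__ == "__main__":
    user, status_id = parse_status_url(
        "https://twitter.com/DrKoibo/status/210290960695959553"
    )
    print(build_query(user, date(2012, 6, 6)))
```

The resulting string can be passed as the `query` argument of the crawl command.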