How would I design a system for processing Twitter data?

by Em Ae   Last Updated October 19, 2019 02:05 AM

I am working on a project where a user can input different criteria that will be used to fetch tweets; let's call this action a TweetAnalysis. These tweets will then be sent to another internal system (a REST API) that does some calculation and returns results. Each tweet gets its own unique result from the REST API. For each TweetAnalysis a user creates, there could be millions of tweets, each with its own result returned from the API. (Only 2 values in a result are aggregatable; every other value in the result is unique to each tweet.)
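To make the data shape concrete, here is a minimal Python sketch, assuming the two aggregatable values are numeric and can be summed. The names `TweetResult`, `score`, `weight`, `AnalysisSummary` and `summarize` are hypothetical placeholders, not part of any existing API. The idea is to store the unique per-tweet values as one row per tweet, and to keep the two aggregatable values as running totals per TweetAnalysis. Because sums and counts can be combined in any order, each worker can build a partial summary for its own batch of tweets and the partials can be merged later.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable


@dataclass
class TweetResult:
    """One REST API result for one tweet (hypothetical shape)."""
    analysis_id: str   # the TweetAnalysis (TA) this tweet belongs to
    tweet_id: str
    score: float       # hypothetical aggregatable value #1
    weight: int        # hypothetical aggregatable value #2
    details: Dict[str, Any] = field(default_factory=dict)  # unique, non-aggregatable values


@dataclass
class AnalysisSummary:
    """Running totals of the aggregatable values for one TA."""
    analysis_id: str
    tweet_count: int = 0
    score_sum: float = 0.0
    weight_sum: int = 0

    def add(self, result: TweetResult) -> None:
        # Fold a single per-tweet result into the running totals.
        if result.analysis_id != self.analysis_id:
            raise ValueError("result belongs to a different TweetAnalysis")
        self.tweet_count += 1
        self.score_sum += result.score
        self.weight_sum += result.weight

    def merge(self, other: "AnalysisSummary") -> "AnalysisSummary":
        # Combine two partial summaries (e.g. produced by two workers).
        if other.analysis_id != self.analysis_id:
            raise ValueError("cannot merge summaries of different TweetAnalyses")
        return AnalysisSummary(
            self.analysis_id,
            self.tweet_count + other.tweet_count,
            self.score_sum + other.score_sum,
            self.weight_sum + other.weight_sum,
        )

    @property
    def mean_score(self) -> float:
        # Averages are derived from sums and counts, so partials stay mergeable.
        return self.score_sum / self.tweet_count if self.tweet_count else 0.0


def summarize(analysis_id: str, results: Iterable[TweetResult]) -> AnalysisSummary:
    """Build a partial summary for one batch of results."""
    summary = AnalysisSummary(analysis_id)
    for r in results:
        summary.add(r)
    return summary
```

For example, summarizing one batch with scores 0.5 and 1.5 and another with score 1.0, then merging the two, gives a tweet count of 3 and a score sum of 3.0. Only the small summary row needs to be updated per batch, while the per-tweet rows are written once and never re-aggregated.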

How would I design such a system?

What I was thinking is:

  • A user creates a TweetAnalysis (let's call it TA), which is stored in the db.
  • A separate process picks up a TA and retrieves all of its tweets. Could these tweets be dumped into S3, with the objects unique to each TA and each object holding a chunk of 1000 tweets?
  • Could another separate process then pick up those S3 objects, fetch each tweet's result from the REST API, and persist the values in the db?
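The three steps above can be sketched as a small pipeline. This is a minimal, single-process Python sketch under several assumptions: a plain dict stands in for S3 (a real system would use an S3 client, plus a queue or S3 event notifications to hand chunks to workers), `fetch_result` is a hypothetical stand-in for the internal REST API call, and the key layout `tweet-analysis/<ta_id>/chunk-<n>.jsonl` is made up for illustration. Each object holds at most 1000 tweets as JSON Lines, and the worker records which chunks it has finished so that a retried chunk is not written twice.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List, Set

CHUNK_SIZE = 1000  # tweets per stored object, as proposed in the question


def chunked(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` items."""
    batch: List[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def store_tweets(ta_id: str, tweets: Iterable[dict], store: Dict[str, str],
                 size: int = CHUNK_SIZE) -> List[str]:
    """Step 2: write a TA's tweets as JSON Lines objects under a per-TA prefix."""
    keys = []
    for n, batch in enumerate(chunked(tweets, size)):
        key = f"tweet-analysis/{ta_id}/chunk-{n:06d}.jsonl"  # hypothetical key layout
        store[key] = "\n".join(json.dumps(t) for t in batch)
        keys.append(key)
    return keys  # in a real system, each key would be pushed onto a work queue


def process_chunk(key: str, store: Dict[str, str],
                  fetch_result: Callable[[dict], dict],
                  results_db: List[dict], done: Set[str]) -> int:
    """Step 3: call the results API for every tweet in one chunk and persist the rows."""
    if key in done:  # skip chunks that were already processed (e.g. on retry)
        return 0
    ta_id = key.split("/")[1]  # assumes TA ids contain no "/"
    rows = []
    for line in store[key].splitlines():
        tweet = json.loads(line)
        result = fetch_result(tweet)  # hypothetical REST API call
        rows.append({"ta_id": ta_id, "tweet_id": tweet["id"], **result})
    # A real db would write the rows and mark the chunk done in one transaction.
    results_db.extend(rows)
    done.add(key)
    return len(rows)
```

For example, 2,500 tweets for one TA produce three objects (1000, 1000 and 500 tweets); processing each key once yields 2,500 result rows, and calling `process_chunk` again on an already-processed key is a no-op. Keeping chunks small and independent means the slow part, the REST API calls, can be spread across as many workers as the API can tolerate.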
