I am working on a project where a user can input different criteria which will be used to fetch tweets; let's call this action a TweetAnalysis. These tweets will then be sent to another internal system (a REST API) to do some calculation and get results. Each tweet will have a unique result from the REST API. For each TweetAnalysis created by a user, there could be millions of tweets, and each tweet has its own result returned from the API. (Only 2 values from each result are aggregatable; every other value is unique to the tweet.)
How would I design such a system?
What I was thinking is:
1. A TweetAnalysis (let's call it TA) is created and stored in the db.
2. Take the TA and retrieve all the respective tweets for it. These tweets can be dumped into S3 objects? The S3 objects would be unique to each TA, and the tweets could be broken down into chunks of 1000 per object?
3. Read the S3 objects, gather their respective info from the REST API system and persist the values in the db?
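
To make the idea more concrete, here is a rough Python sketch of the chunking and processing steps as I imagine them. Everything in it is an assumption on my part: the function names, the key layout `tweet-analysis/<ta_id>/chunk-NNNNNN.json`, the `id` field on a tweet, a plain dict standing in for S3, and a `score_tweet` callable standing in for the REST API call.

```python
import json
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

CHUNK_SIZE = 1000  # tweets per S3 object, as proposed in the steps above


def chunk_tweets(tweets: Iterable[dict], size: int = CHUNK_SIZE) -> Iterator[List[dict]]:
    """Yield lists of at most `size` tweets without holding all tweets in memory."""
    it = iter(tweets)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def s3_key_for(ta_id: str, chunk_index: int) -> str:
    """Hypothetical key layout: one prefix per TA, one object per chunk."""
    return f"tweet-analysis/{ta_id}/chunk-{chunk_index:06d}.json"


def dump_chunks(ta_id: str, tweets: Iterable[dict], store: Dict[str, str],
                size: int = CHUNK_SIZE) -> List[str]:
    """Write each chunk under its own key; `store` is a stand-in for an S3 bucket."""
    keys = []
    for index, chunk in enumerate(chunk_tweets(tweets, size)):
        key = s3_key_for(ta_id, index)
        store[key] = json.dumps(chunk)
        keys.append(key)
    return keys


def process_chunk(key: str, store: Dict[str, str],
                  score_tweet: Callable[[dict], dict]) -> List[dict]:
    """Read one chunk and attach the result returned for each tweet.

    `score_tweet` is a placeholder for the call to the internal REST API.
    """
    tweets = json.loads(store[key])
    return [{"tweet_id": tweet["id"], **score_tweet(tweet)} for tweet in tweets]
```

With a real bucket, I assume the `store[key] = ...` line would become a boto3 `put_object(Bucket=..., Key=key, Body=...)` call, and each key could be handed to a separate worker so that chunks are processed independently and a failed chunk can be retried on its own.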