Storing 450,000,000,000 data points in a bitemporal datastore

by Tom, Last Updated October 18, 2019 22:06

I'm really looking for guidance on an appropriate way to tackle persisting and retrieving 450 billion data points. The details:

  • 15,000 equities
  • 15,000 days of history (approximately 40 years)
  • 2,000 columns/properties
  • 450 billion = 15,000 * 15,000 * 2,000
  • 30,000,000 inserts per day (15,000 equities * 1 day * 2,000 properties)

Caveats: Some properties have values for nearly every day, e.g. price. Some properties have very few values per year, e.g. earnings. Therefore the estimates above are upper bounds.
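To make the scale concrete, here is a back-of-envelope sizing sketch. The 16 bytes per point (8-byte value plus two 4-byte dates for the bitemporal axes) is my assumption, not a measured figure, and sparsity will pull the real total well below this upper bound.

```python
# Hypothetical sizing sketch; bytes_per_point is an assumed layout,
# not a measurement of any particular datastore.
equities = 15_000
days = 15_000          # ~40 years of history
properties = 2_000

points = equities * days * properties          # upper bound on stored points
daily_inserts = equities * 1 * properties      # one new data date per day

bytes_per_point = 8 + 4 + 4                    # value + data date + effective date
raw_tb = points * bytes_per_point / 1e12

print(f"{points:,} points upper bound")
print(f"{daily_inserts:,} inserts/day")
print(f"~{raw_tb:.1f} TB raw upper bound")
```

Even the naive uncompressed upper bound is on the order of single-digit terabytes, which is a useful anchor when comparing storage technologies.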

Inserts happen throughout the day and do not need to be "fast". Reads happen during a short window in the morning and need to be "fast": retrieve all data and perform business calculations within 1 hour.

Properties are bitemporal - they have a data date and an effective date. The latter is to support corrections to erroneous data and backtesting.
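To illustrate the bitemporal scheme described above, here is a minimal sketch. The record and query names are hypothetical; the point is that a correction is a new row with the same data date and a later effective date, so old rows are never mutated and backtests can query "as of" any past knowledge date.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical bitemporal record: data_date is the date the value applies
# to; effective_date is when the value was recorded (or corrected).
@dataclass(frozen=True)
class Point:
    equity: str
    prop: str
    data_date: date
    effective_date: date
    value: float

def as_of(points, equity, prop, data_date, knowledge_date):
    """Latest value for (equity, prop, data_date) known on knowledge_date."""
    candidates = [p for p in points
                  if p.equity == equity and p.prop == prop
                  and p.data_date == data_date
                  and p.effective_date <= knowledge_date]
    return max(candidates, key=lambda p: p.effective_date, default=None)
```

A correction then looks like inserting a second `Point` for the same `data_date`; a backtest run with an earlier `knowledge_date` still sees the original (erroneous) value, exactly as it would have at the time.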

Some properties are calculated from other properties, i.e. we read data, calculate a derived value, and then insert the result.
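As a sketch of such a read-calculate-insert cycle, consider a moving average derived from stored prices; the property and window are illustrative assumptions, not from the question, and the result would be written back as its own bitemporal property.

```python
# Hypothetical derived property: a trailing moving average over the
# stored price series, to be inserted back as e.g. "price_ma20".
def moving_average(prices, window=20):
    out = []
    for i in range(window - 1, len(prices)):
        out.append(sum(prices[i - window + 1 : i + 1]) / window)
    return out
```

One design consequence worth noting: because derived values depend on inputs that may later be corrected, they either need recomputation when an input changes or their effective dates must record which version of the inputs they were computed from.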

There is no official budget for this project, so assume high-spec hardware, cloud infrastructure, and custom software are all possibilities.

So, any suggestions on an appropriate persistence technology for this problem? Any design considerations I should be thinking about?
