I'm really looking for guidance on an appropriate way to tackle persisting and retrieving 450 billion data points. The details:
Caveats: some properties have values for nearly every day, e.g. price, while others have very few values per year, e.g. earnings. The estimates above are therefore upper bounds.
Inserts happen throughout the day and do not need to be "fast". Reads happen during a short window in the morning and do need to be "fast": we must retrieve all the data and complete our business calculations within one hour.
Properties are bitemporal: each has a data date and an effective date. The latter supports corrections to erroneous data and backtesting.
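To illustrate what I mean by bitemporal, here is a minimal sketch (the row layout and `as_of` helper are hypothetical, just for illustration): corrections are appended with a later effective date rather than overwriting, so a backtest can ask "what did we believe on day X?".

```python
from datetime import date

# Hypothetical bitemporal store: each row carries the observation's data
# date and the effective date on which that value became known.
rows = [
    # (property, data_date, effective_date, value)
    ("price", date(2013, 5, 1), date(2013, 5, 1), 101.0),
    ("price", date(2013, 5, 1), date(2013, 5, 3), 100.5),  # later correction
]

def as_of(rows, prop, data_date, effective_date):
    """Latest value for (prop, data_date) known on or before effective_date."""
    candidates = [r for r in rows
                  if r[0] == prop and r[1] == data_date
                  and r[2] <= effective_date]
    return max(candidates, key=lambda r: r[2])[3] if candidates else None

# A backtest run "as of" 2013-05-02 sees the original value;
# any run after the correction sees the fixed one.
print(as_of(rows, "price", date(2013, 5, 1), date(2013, 5, 2)))  # 101.0
print(as_of(rows, "price", date(2013, 5, 1), date(2013, 5, 4)))  # 100.5
```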
Some properties are calculated from other properties, i.e. we read data, calculate a derived value, and then insert the result.
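A toy example of such a derived property (the `pe_ratio` name and dict-based store are made up for illustration): read two stored properties, compute, and write the result back as a new property.

```python
# Hypothetical key/value store keyed by (property, data_date).
store = {
    ("price", "2013-05-01"): 100.5,
    ("earnings", "2013-05-01"): 5.0,
}

def derive_pe(store, data_date):
    """Compute a price/earnings ratio and insert it as its own property."""
    price = store[("price", data_date)]
    earnings = store[("earnings", data_date)]
    store[("pe_ratio", data_date)] = price / earnings  # insert the result

derive_pe(store, "2013-05-01")
print(store[("pe_ratio", "2013-05-01")])  # 20.1
```

The read-calculate-insert cycle matters for the design: derived values depend on upstream corrections, so a store that makes "recompute everything affected by a correction" cheap is a plus.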
There is no official budget for this project, so assume high-spec hardware, cloud infrastructure, and custom software are all possibilities.
So: any suggestions on an appropriate persistence technology for this problem? Any design considerations I should be thinking about?