I've stayed in the same room for "Terabytes of Business Intelligence: Design and Administration of Very Large Data Warehouses on PostgreSQL" by Josh Berkus and Joseph Conway. The first case study was on weblog analysis data, with ad-hoc reports over one year of data and large nightly ETL batch loads. It was all quite interesting, showing the problems faced on the way to a final solution, ranging from server memory and disk allocation to query reoptimising and fiddling with kernel versions. To handle the ad-hoc querying, they produced aggregate tables at ETL time. One trick mentioned: VACUUM is a big IO load, so avoid it. The second case study was on equipment performance data, which was stored on a NAS mounted over NFS (!) with jumbo frames. Interestingly, this used PostgreSQL's table inheritance, which I haven't found useful in the past, but maybe I should have another look. The rest was fairly routine data warehouse stuff, but with some interesting notes on upcoming features such as BitmapScan and constraint exclusion. Best quote: "... and you notice, there's no WHERE clause".
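The aggregate-tables-at-ETL-time idea from the first case study can be sketched roughly as follows. This is only an illustration, not what the speakers showed: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the table names (`hits`, `hits_daily`), columns and functions are all made up. The point is that the nightly load refreshes a small summary table, so ad-hoc reports never have to scan the raw data.

```python
import sqlite3


def make_db():
    # In-memory SQLite stands in for the warehouse; the schema is hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (day TEXT, url TEXT, bytes INTEGER)")
    conn.execute(
        "CREATE TABLE hits_daily ("
        "day TEXT, url TEXT, hit_count INTEGER, total_bytes INTEGER, "
        "PRIMARY KEY (day, url))"
    )
    return conn


def load_batch(conn, rows):
    """Load one batch of (day, url, bytes) rows and rebuild the
    aggregates for the affected days in the same transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO hits (day, url, bytes) VALUES (?, ?, ?)", rows
        )
        for day in sorted({row[0] for row in rows}):
            # Recompute only the days touched by this batch.
            conn.execute("DELETE FROM hits_daily WHERE day = ?", (day,))
            conn.execute(
                "INSERT INTO hits_daily (day, url, hit_count, total_bytes) "
                "SELECT day, url, COUNT(*), SUM(bytes) "
                "FROM hits WHERE day = ? GROUP BY day, url",
                (day,),
            )


def daily_report(conn, day):
    # Ad-hoc reports read the small aggregate table, not the raw hits.
    return conn.execute(
        "SELECT url, hit_count, total_bytes FROM hits_daily "
        "WHERE day = ? ORDER BY url",
        (day,),
    ).fetchall()
```

Calling `load_batch(conn, rows)` once per nightly batch keeps `hits_daily` current, and `daily_report(conn, "2005-08-01")` then answers from the summary alone.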
Re:This looks interesting
acme on 2005-08-03T20:55:28
I assume so, but have no details atm.