This paper describes the rationale for integrating multiple, diverse data sources in order to produce information efficiently. It also describes the key statistical challenges involved in integrating and interpreting that information. The fundamental issue underpinning the use of large data streams is the poolability of the data sources. New statistical tools are required to integrate multiple, diverse data streams and produce valid scientific findings.
(Summer 2010)
Academy of Sciences
2010
http://www.nap.edu/openbook.php?record_id=12197&page=269