Data science involves hypothesizing about, or discovering, systematically observable properties of a phenomenon. It can be used to discover correlations (what phenomena occurred) but cannot be used to establish causality (why the phenomena occurred).
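To make the distinction concrete, the following is a minimal sketch that computes a Pearson correlation coefficient between two series. The data values and the names `pearson`, `ice_cream_sales`, and `sunburn_cases` are hypothetical and chosen only for illustration. The resulting coefficient describes what moved together; nothing in the calculation says why.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical observations, invented for this illustration.
ice_cream_sales = [20, 26, 31, 38, 44]
sunburn_cases = [2, 3, 5, 6, 8]

r = pearson(ice_cream_sales, sunburn_cases)
print(f"correlation: {r:.2f}")
```

A coefficient close to 1 here records that the two series rise together. Whether one causes the other, or both depend on something else, is a question the coefficient cannot answer.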
In all fields, new facts and knowledge are constantly being produced from new data, discoveries, experience, and research, far more than a single individual can absorb, let alone put into practice. So how do professionals, or anyone else, come to understand that they have a bias, and what its nature and limitations are? And how do they re-evaluate their knowledge (world view) in light of new facts (“ground truth”) and conclusions?
The data curation step involves discovering, analyzing, cleaning, transforming, combining, and de-duplicating data sources to produce target data sources that meet the requirements for input to the analysis. Every data curation step should be documented as data provenance that is then compared against the controls to determine the extent to which the appropriate data governance was followed and the required data quality was achieved.
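The curation and provenance idea above can be sketched roughly as follows. This is an illustrative sketch, not a prescribed implementation: the `CurationRun` class, the three step functions, the `REQUIRED_STEPS` control list, and the sample records are all assumptions introduced here. Each step is logged as a provenance entry, and the log is then compared against the list of required steps.

```python
from dataclasses import dataclass, field


@dataclass
class CurationRun:
    """Holds the working records plus a provenance log of every step applied."""
    records: list
    provenance: list = field(default_factory=list)

    def apply(self, step_name, func):
        rows_in = len(self.records)
        self.records = func(self.records)
        # Document the step: its name and how many records went in and came out.
        self.provenance.append(
            {"step": step_name, "rows_in": rows_in, "rows_out": len(self.records)}
        )
        return self


def clean(records):
    # Drop records that have no email value.
    return [r for r in records if r.get("email")]


def transform(records):
    # Normalize emails so that equivalent values compare equal.
    return [{**r, "email": r["email"].strip().lower()} for r in records]


def deduplicate(records):
    # Keep the first record seen for each email.
    seen, unique = set(), []
    for r in records:
        if r["email"] not in seen:
            seen.add(r["email"])
            unique.append(r)
    return unique


# Hypothetical control: the steps that governance requires for this data set.
REQUIRED_STEPS = ["clean", "transform", "deduplicate"]


def missing_controls(provenance, required):
    """Return the required steps that do not appear in the provenance log."""
    done = {entry["step"] for entry in provenance}
    return [step for step in required if step not in done]


# Hypothetical raw input, invented for this illustration.
raw = [
    {"id": 1, "email": "Ana@Example.com "},
    {"id": 2, "email": "ana@example.com"},
    {"id": 3, "email": None},
    {"id": 4, "email": "bo@example.com"},
]

run = (
    CurationRun(raw)
    .apply("clean", clean)
    .apply("transform", transform)
    .apply("deduplicate", deduplicate)
)

for entry in run.provenance:
    print(entry)
print("missing controls:", missing_controls(run.provenance, REQUIRED_STEPS))
```

Recording row counts per step is one simple choice of provenance detail. A real log would likely also capture who ran the step, when, and with which parameters, so that data quality can be audited against the controls.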