Theophano Mitsa, Ph.D., is Managing Member at Aretisoft, LLC, and a Data Scientist with academic and industrial work experience in data mining and machine learning. Author of "Temporal Data Mining" and co-inventor of the Blue Noise Mask, she has a proven record of innovation and scholarly ability, with 11 U.S. patents and 47 publications.

How Do You Know You Have Enough Training Data?

A crucial issue in machine learning projects is determining how much training data is needed to achieve a specific performance goal (e.g., a target classifier accuracy). In this post, we will do a quick but broad review of empirical and research-literature results on training data size, in areas ranging from regression analysis to deep learning. The training data size issue is also known in the literature as sample complexity. Specifically, we will present empirical training data size limits for regression and computer vision tasks.

