
Building Early, Accurate Forecasts of Key Government Reports

Industry: Hi-Tech

Specialization or Business Function

Technical Function: Analytics (Predictive Modeling, Forecasting, Machine Learning)

Technology & Tools: Data Analysis and AI Tools (Scikit-Learn), Programming Languages and Frameworks (R, Python)


Project Description


We are an early-stage tech company that combines survey and physical data to create accurate, early forecasts and metrics for a global customer base.


  • A government organization has been monitoring and forecasting a key economic activity for more than 100 years. Its monthly outlook reports have become a critical source of truth for the markets.
  • The task is to forecast these government reports at both the national and subnational level at least one week before each report is released, with forecasts refreshed daily.
  • Final models will be assessed on both accuracy and earliness, backtested across a window of more than 10 years.
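Because models are judged on both accuracy and earliness, it helps to pin both down as numbers before any backtesting starts. The sketch below is one possible scoring scheme, not the client's: the names `ReportBacktest`, `earliness`, and `mape_at_lead`, the relative-error tolerance, and the choice to score accuracy at a fixed lead time (in days before release) are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReportBacktest:
    """Backtest record for one government report (hypothetical structure)."""
    reported_value: float                                       # value the government published
    forecasts: dict[int, float] = field(default_factory=dict)   # lead_days -> forecast issued that early


def abs_pct_error(forecast: float, actual: float) -> float:
    """Relative absolute error of a single forecast."""
    return abs(forecast - actual) / abs(actual)


def earliness(bt: ReportBacktest, tolerance: float) -> int:
    """Largest lead time L (days before release) such that every forecast issued
    at a lead of L days or less is within `tolerance` of the reported value.
    Returns 0 if even the shortest-lead forecast misses."""
    best = 0
    for lead in sorted(bt.forecasts):  # shortest lead first, then further back in time
        if abs_pct_error(bt.forecasts[lead], bt.reported_value) > tolerance:
            break
        best = lead
    return best


def mape_at_lead(backtests: list[ReportBacktest], lead: int) -> float:
    """Mean absolute percentage error across reports for forecasts issued `lead` days early."""
    errors = [abs_pct_error(bt.forecasts[lead], bt.reported_value)
              for bt in backtests if lead in bt.forecasts]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up example values, for illustration only
    reports = [
        ReportBacktest(100.0, {1: 100.5, 7: 101.0, 14: 103.0, 21: 110.0}),
        ReportBacktest(200.0, {1: 199.0, 7: 198.0, 14: 197.0, 21: 190.0}),
    ]
    print([earliness(r, tolerance=0.02) for r in reports])  # [7, 14]
    print(mape_at_lead(reports, lead=7))
```

Scoring earliness as "the lead time from which the forecast stays within tolerance" rewards models that settle early and do not drift, which is closer to what a market user needs than a single best-lead error.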


Data (all available at the state and national levels on a daily basis for 2003-2016, unless otherwise specified):

  • Government forecasts at the state and national levels, at a monthly cadence (this is what is being forecast in this pilot work)
  • Government end-of-year actuals at the state and national levels (this is what the government is forecasting)
  • Survey-based data relevant to the target outcome, on a weekly basis at the state level
  • Earth data features at the state and sub-state levels for key physical variables, formatted for tabular ingest; hundreds of predictors on a daily basis
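The inputs arrive at three cadences (daily Earth data, weekly surveys, monthly government forecasts), so each daily forecast needs a feature row built only from what was already available on its issue date. The following is a minimal sketch of such an "as-of" join, assuming each series is keyed by the date it became available (not the period it describes); the function names, the 30-day trailing window, and the variable name `soil_moisture` are hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta


def trailing_mean(daily: dict[date, float], as_of: date, window_days: int) -> float | None:
    """Mean of daily values dated within the `window_days` days ending on `as_of`."""
    start = as_of - timedelta(days=window_days - 1)
    vals = [v for d, v in daily.items() if start <= d <= as_of]
    return sum(vals) / len(vals) if vals else None


def latest_as_of(series: dict[date, float], as_of: date) -> float | None:
    """Most recent value whose key (assumed to be its release date) is on or before `as_of`."""
    past = [d for d in series if d <= as_of]
    return series[max(past)] if past else None


def build_features(as_of: date,
                   earth_daily: dict[str, dict[date, float]],
                   survey_weekly: dict[date, float],
                   gov_monthly: dict[date, float],
                   window_days: int = 30) -> dict[str, float | None]:
    """One feature row for a forecast issued on `as_of`, using no data from after that date."""
    row = {f"{name}_mean{window_days}d": trailing_mean(series, as_of, window_days)
           for name, series in earth_daily.items()}
    row["survey_latest"] = latest_as_of(survey_weekly, as_of)        # last weekly survey released
    row["gov_prev_forecast"] = latest_as_of(gov_monthly, as_of)      # last published monthly report
    return row


if __name__ == "__main__":
    d0 = date(2010, 6, 1)
    earth = {"soil_moisture": {d0 + timedelta(days=i): float(i) for i in range(10)}}  # hypothetical variable
    survey = {date(2010, 6, 3): 0.5, date(2010, 6, 10): 0.7}
    gov = {date(2010, 5, 12): 100.0, date(2010, 6, 10): 105.0}
    print(build_features(date(2010, 6, 5), earth, survey, gov, window_days=3))
```

Keying every series by availability date is what keeps a backtest honest: a survey value describing last week but released tomorrow must not appear in today's row.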

Timeframe & Project Plan: 

  • Overall aim: production-ready pilot models by the end of July, ideally starting work as soon as possible
  • Four-week timeline, assuming one full-time resource
  • Weekly milestones, with a project go/no-go decision at the end of week one

Ramp Up | week 1

  • 2-3 meetings with the team to understand the domain / challenge
  • Data onboarding for all of the materials
  • Discussion of workplan
  • Data exploration
  • Variable selection
  • Modeling plan
  • Outline of white paper (2-3 pages)
  • Very rough first prototype models (initial results)
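With hundreds of daily predictors, a quick univariate screen can shrink the candidate set before heavier modeling. This is a swapped-in technique for illustration, not the project's stated method: it ranks predictors by absolute Pearson correlation with the target and ignores nonlinear effects that tree models such as Random Forest or Cubist can pick up. The function names and `top_k` are hypothetical, and the screen should be computed on training years only to avoid leaking test-period information.

```python
from __future__ import annotations

from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant column (no linear signal)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_predictors(features: dict[str, list[float]], target: list[float], top_k: int) -> list[str]:
    """Names of the `top_k` predictors with the largest absolute correlation to the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:top_k]
```

A screen like this is cheap enough to rerun per state, which matters if the most useful physical variables differ by region.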

Preliminary Models | week 2

  • Analysis of early results; backtesting prepared
  • Prototype models
  • Prioritization for model refinements
  • Pre-engineering for putting into production
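Preparing the backtest mostly means making sure each test period is forecast with a model trained only on earlier data. A minimal sketch of an expanding-window (rolling-origin) backtest by year over the 2003-2016 history follows; the yearly granularity, the `fit`/`predict` callables, and the naive mean model in the usage note are assumptions standing in for the real models (Random Forest, GBM, and so on).

```python
from __future__ import annotations

from statistics import mean
from typing import Callable, Iterator


def expanding_year_splits(first_year: int, last_year: int,
                          min_train_years: int) -> Iterator[tuple[list[int], int]]:
    """Yield (train_years, test_year); each test year sees only strictly earlier years."""
    for test_year in range(first_year + min_train_years, last_year + 1):
        yield list(range(first_year, test_year)), test_year


def run_backtest(data: dict[int, list[tuple[list[float], float]]],
                 fit: Callable, predict: Callable,
                 min_train_years: int = 5) -> dict[int, float]:
    """`data` maps year -> list of (features, target) rows. Returns year -> mean absolute error."""
    years = sorted(data)
    results = {}
    for train_years, test_year in expanding_year_splits(years[0], years[-1], min_train_years):
        train_rows = [row for yr in train_years for row in data[yr]]
        model = fit(train_rows)  # refit from scratch for every test year
        errors = [abs(predict(model, x) - target) for x, target in data[test_year]]
        results[test_year] = mean(errors)
    return results
```

For example, `run_backtest(data, fit=lambda rows: mean(t for _, t in rows), predict=lambda m, x: m)` gives a climatology-style baseline that any real model should beat. Scikit-Learn's `TimeSeriesSplit` offers a similar expanding-window split over row indices.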

Model Improvements | week 3

  • Revised prototype models
  • Updated analysis and backtesting prepared
  • Pre-engineering for putting into production
  • Draft white paper (2-3 pages)

Model Improvements | week 4

  • Final set of changes / permutations
  • Placing models into production alongside the engineering team
  • Final QA and backtesting
  • White paper finalized (2-3 pages)


Location: Preference for onsite work, with flexibility for video / remote work; strong preference for candidates roughly in the same time zone

Engagement: Full-time engagement, 40 hours per week

Tools: R and/or Python, including the Scikit-Learn framework; the collaborator should have mastered either or both of these environments

Modeling: Deep experience in machine-learning-based predictive modeling and time series; the candidate should have long applied experience with models such as Random Forest, SVM, Cubist, and GBM

Data engineering: Our team will deliver large, structured data cubes (flat files) for modeling, so the candidate should be comfortable handling data at scale; that said, execution on a local machine should be adequate (there is no obvious need for distributed or high-performance computing)
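Since local execution should suffice, one way to keep memory flat on a single machine is to stream a flat file row by row and aggregate as you go. This sketch assumes a CSV cube with hypothetical columns `state`, `date` (ISO format), and a value column such as `precip_mm`; the real cube layout is not specified. pandas `read_csv` with `chunksize` is an equivalent chunked alternative.

```python
import csv
import io
from collections import defaultdict


def monthly_state_means(stream, value_col: str) -> dict:
    """Stream a CSV and return {(state, 'YYYY-MM'): mean of value_col}.
    Memory grows with the number of (state, month) groups, not with file size."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(stream):
        raw = row[value_col]
        if raw == "":
            continue  # skip missing values rather than counting them as zero
        key = (row["state"], row["date"][:7])  # assumes ISO dates, e.g. 2010-06-15
        sums[key] += float(raw)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    # In practice: with open("cube.csv", newline="") as f: monthly_state_means(f, "precip_mm")
    sample = io.StringIO("state,date,precip_mm\nIA,2010-06-01,2.0\nIA,2010-06-02,4.0\n")
    print(monthly_state_means(sample, "precip_mm"))  # {('IA', '2010-06'): 3.0}
```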

Project Overview

  • Posted: June 22, 2017
  • Planned Start: June 26, 2017
  • Delivery Date: August 05, 2017
  • Preferred Location: Massachusetts, United States
