
Machine Learning Python-Based Imaging Pipeline

Industry Hi-Tech

Specialization Or Business Function

Technical Function Software and Web Development (Desktop Applications, Scripts & Utilities)

Technology & Tools Programming Languages and Frameworks (Python)

CLOSED FOR BIDDING

Project Description

PROJECT UPDATED MARCH 15th, 2016

We are a health technology startup focused on building machine-learning-driven classifiers for medical imaging.  We are currently training on the world’s largest diagnostic-image training set (larger than any other by at least 50x).  While an enormous treasure trove, this training set also introduces unique challenges.  We hope resource(s) from Experfy can help us overcome some of them.

Our official runs are conducted on Amazon EC2 instances reading from S3 storage.  Most of our experimental and exploratory research is conducted in a local development environment -- an 8-core i7 with 3x Titan X GPUs and 64GB RAM, reading from a QNAP NAS via NFS.

We are seeking external help on three fronts:

  1. Software Architecture Enhancements
  2. Hardware Setup Validation
  3. Neural Network Architecture and General ML Advisory

Software Architecture Enhancements

Our training set is 14 terabytes -- approximately 1 million images of 3000x3000 resolution.  Extracting images from DICOM medical data files also takes time.  We’d like to speed up non-core parts of the training cycle and pipeline.

  • Obviously the entire set cannot be staged in the execution environment, so images are brought over from the NAS as needed, which is slow.  Python multiprocessing does not help beyond 5 worker processes, even though the machine is mostly underutilized.  We’d like to re-architect the pipeline to make better use of the hardware.
  • We would like to implement any enhancements that could cut down image processing time.
  • We would like to consider more intelligent ways of processing the data (perhaps three separate threads running in waves, pre-processing soon-to-be-needed images).
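The "pre-process soon-to-be-needed images" idea in the last bullet can be sketched as a bounded prefetch buffer. This is a minimal illustration, not our current pipeline: the function name `prefetch`, its parameters, and the `load_fn` callback (whatever reads one image from the NAS and decodes it) are all hypothetical. It assumes loading is dominated by I/O waits, where threads can overlap; CPU-heavy decoding might still need worker processes.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def prefetch(paths, load_fn, num_workers=8, buffer_size=32):
    """Yield load_fn(path) for each path, in order, loading ahead in the background.

    All names and defaults here are illustrative assumptions.
    """
    buf = queue.Queue(maxsize=buffer_size)  # bounded, so read-ahead memory stays capped
    done = object()  # sentinel marking the end of the stream

    def producer():
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            for path in paths:
                # put() blocks while the buffer is full, which throttles read-ahead
                buf.put(pool.submit(load_fn, path))
        buf.put(done)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buf.get()
        if item is done:
            return
        # result() waits for this load and re-raises any exception from load_fn
        yield item.result()
```

A training loop would then iterate `for image in prefetch(paths, read_image): ...`, where `read_image` is a placeholder for the existing loader. Up to `num_workers` loads run concurrently while the consumer works on earlier images, and results come back in the original order.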

Neural Network Architecture and General ML Advisory

Our problem does not fit neatly into any of the standard problems in ML-driven image classification.  Specifically, we have several complexities that preclude out-of-the-box conv-net approaches:

  1. Our images are big: 3000x3000.  It is debatable whether reducing the resolution would preserve the features that define the classes.
  2. While we have only two classes (normal, abnormal), the abnormal class can be any of about five dozen feature types.
  3. The features defining class membership (abnormal specifically) are not prominent in any of the images; they usually occupy a small percentage of the overall image.
  4. The features defining class membership (abnormal specifically) are not consistent in size.
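To make points 1 and 3 concrete, one option that avoids reducing resolution is to cover each full-resolution image with overlapping patches. The sketch below only computes patch origins and is an illustration, not the approach we currently use: the function names and the 512-pixel tile and 384-pixel stride are arbitrary assumptions.

```python
def tile_starts(length, tile, stride):
    """Start offsets of tiles of size `tile` that together cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)  # final tile sits flush with the far edge
    return starts


def tile_origins(height, width, tile=512, stride=384):
    """Top-left (row, col) of every tile in an overlapping grid over the image.

    The tile and stride defaults are illustrative assumptions only.
    """
    return [
        (row, col)
        for row in tile_starts(height, tile, stride)
        for col in tile_starts(width, tile, stride)
    ]
```

With these defaults a 3000x3000 image yields an 8 by 8 grid of 64 patches. For an image held as a NumPy array, each patch would be `image[row:row + 512, col:col + 512]`. Whether patch-level processing suits features of inconsistent size (point 4) is an open question this sketch does not answer.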

We’re currently using two approaches: Support Vector Machines and Deep Neural Networks (specifically convolutional networks that are variants of AlexNet).  We have a prioritized research path we’re following, but we’re very interested in variations, enhancements, and any out-of-the-box ideas.

Hardware Setup Validation

We think we’ve set up our hardware pretty well, but there are obviously some bottlenecks.  Our guess is that we’re currently network-constrained.

We’re in the process of:

  1. Direct connection of computer to QNAP NAS via cross-over cable
  2. Port trunking (2x) QNAP NAS to switch
  3. Port trunking (2x) computer to switch
  4. Exploring NFS alternatives such as the QNAP HTTP-based web server

We’re open to expert advice on intelligent tweaks.
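As a rough check on the network-constrained guess, the time to stream the whole 14-terabyte set once can be estimated from the link speed. The link speeds below are assumptions for illustration (this posting does not state ours), and the function name and `efficiency` parameter are hypothetical. The calculation ignores protocol overhead and disk limits unless `efficiency` is lowered.

```python
def full_pass_hours(dataset_bytes, link_gbit_per_s, efficiency=1.0):
    """Hours needed to move dataset_bytes once over a link of the given speed.

    efficiency is an assumed fraction of the nominal link rate actually achieved.
    """
    bytes_per_second = link_gbit_per_s * 1e9 / 8 * efficiency
    return dataset_bytes / bytes_per_second / 3600


DATASET_BYTES = 14e12  # 14 terabytes, from the description above

for gbit in (1, 2, 10):  # assumed link speeds, not measured values
    print(f"{gbit:>2} Gbit/s: {full_pass_hours(DATASET_BYTES, gbit):.1f} hours per full pass")
```

At a nominal 1 Gbit/s this works out to about 31 hours per pass over the data, and about 15.6 hours at 2 Gbit/s, before any overhead. With roughly 14 MB per image, a 1 Gbit/s link also tops out near 9 images per second.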

Project Overview

  • Posted
    January 15, 2016
  • Planned Start
    January 18, 2016
  • Delivery Date
    January 22, 2016
  • Preferred Location
    From anywhere


EXPERTISE REQUIRED
Machine Learning
Deep Networks
deep learning
Deep neural networks
Machine Learning Algorithms
