Machine Learning Python-Based Imaging Pipeline

Industry: Hi-Tech

Specialization or Business Function

Technical Function: Software and Web Development (Desktop Applications, Scripts & Utilities)

Technology & Tools: Programming Languages and Frameworks (Python)

CLOSED FOR BIDDING

Project Description

PROJECT UPDATED MARCH 15th, 2016

We are a health technology startup focused on building machine-learning-driven classifiers for medical imaging. We are currently training on the world’s largest diagnostic image training set (larger by a factor of at least 50). While it is an enormous treasure trove, this training set also introduces unique challenges. We hope resources from Experfy can help us overcome some of them.

Our official runs are conducted on Amazon EC2 instances reading from S3 storage. Most of our experimental and exploratory research is carried out in a local development environment: an 8-core Intel Core i7, 3x NVIDIA Titan X GPUs, and 64 GB of RAM, reading from a QNAP NAS over NFS.

We are seeking external help on three fronts:

Software Architecture Enhancements
Hardware Setup Validation
Neural Network Architecture and General ML Advisory

Software Architecture Enhancements

Our training set is 14 terabytes: approximately 1 million images at 3000x3000 resolution. Extracting the images from DICOM medical data files also takes time. We’d like to speed up the non-core parts of the training cycle and pipeline.
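If the same DICOM files are decoded again on every pass over the data (the posting does not say whether decoded pixels are reused), one option is to decode each file once and cache the pixel array. The sketch below assumes `pydicom` (`dcmread` and `.pixel_array`) for decoding and NumPy `.npy` files for the cache; `cached_pixels`, `decode_dicom` and the cache layout are hypothetical names chosen for illustration. It trades storage for decode time, and decoded 16-bit pixels may be larger than the compressed DICOM source, so where the cache lives (local disk for a working subset, or the NAS as a one-time conversion) matters.

```python
import hashlib
import os
import tempfile

import numpy as np


def decode_dicom(path):
    """Decode one DICOM file to a pixel array (requires pydicom)."""
    import pydicom  # imported lazily so the cache logic is usable without it
    return pydicom.dcmread(path).pixel_array


def cached_pixels(src_path, cache_dir, decode=decode_dicom):
    """Return the decoded pixel array for src_path, decoding it at most once.

    The first call decodes the file and saves <cache_dir>/<sha1 of path>.npy;
    later calls (e.g. in later epochs) load that .npy file instead.
    Note: the cache is not invalidated if the source file changes.
    """
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.sha1(os.path.abspath(src_path).encode("utf-8")).hexdigest()
    cache_path = os.path.join(cache_dir, key + ".npy")
    if os.path.exists(cache_path):
        return np.load(cache_path)
    pixels = np.asarray(decode(src_path))
    # Write under a unique temporary name, then rename, so a concurrent
    # worker never loads a half-written cache file.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        np.save(f, pixels)
    os.replace(tmp_path, cache_path)
    return pixels
```

Writing to a temporary file and renaming with `os.replace` keeps the cache safe to share between parallel loader processes on the same filesystem.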

Obviously the entire set cannot be staged in the execution environment, so images are brought over from the NAS as needed, which is slow. Python multiprocessing stops helping beyond 5 workers, even though the machine remains mostly underutilized. We’d like to re-architect the pipeline to make better use of the hardware.
We would like to implement any enhancements that could cut down image processing time.
We would like to consider more intelligent ways of processing the data (perhaps as three separate threads running in waves, pre-processing soon-to-be-needed images).
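The "pre-process soon-to-be-needed images" idea can be sketched as a bounded look-ahead: keep a fixed number of load jobs in flight so reading from the NAS overlaps with training instead of alternating with it. This is a minimal sketch using the standard library's `concurrent.futures`; `prefetch_map` is a hypothetical helper, and the worker and look-ahead counts are placeholders to be tuned by measurement. Threads suit blocking NFS reads because file I/O releases the GIL; for CPU-heavy decoding, passing `executor_cls=cf.ProcessPoolExecutor` (with a picklable, module-level `fn`) is the usual alternative.

```python
import concurrent.futures as cf
from collections import deque
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def prefetch_map(fn: Callable[[T], R], items: Iterable[T], workers: int = 8,
                 lookahead: int = 32,
                 executor_cls=cf.ThreadPoolExecutor) -> Iterator[R]:
    """Yield fn(item) for each item, in input order, while keeping up to
    `lookahead` calls in flight so loading runs ahead of the consumer."""
    if lookahead < 1:
        raise ValueError("lookahead must be >= 1")
    with executor_cls(max_workers=workers) as pool:
        pending = deque()
        for item in items:
            pending.append(pool.submit(fn, item))
            if len(pending) >= lookahead:
                # Block only on the oldest job; newer ones keep running.
                yield pending.popleft().result()
        while pending:
            yield pending.popleft().result()
```

Usage would look like `for pixels in prefetch_map(load_image, paths): train_step(pixels)`, where `load_image` and `train_step` stand in for the existing loader and training step.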

Neural Network Architecture and General ML Advisory

Our problem is not easily shoehorned into any of the existing problems in ML-driven image classification. Specifically, several complexities preclude out-of-the-box conv-net approaches:

Our images are large (3000x3000). It is debatable whether reducing their resolution would preserve the features that define the classes.
While we only have two classes (normal, abnormal), the abnormal class can be any of about five dozen feature types.
The features defining class membership (the abnormal class specifically) are not prominent in any of the images; they usually occupy a small percentage of the overall image.
The features defining class membership (the abnormal class specifically) are not consistent in size.
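To make the resolution concern concrete: shrinking a 3000x3000 image to an AlexNet-sized input of about 227x227 reduces each side by roughly 13x, so a hypothetical 40-pixel feature would span about 3 pixels. One common way to keep full resolution is to classify overlapping tiles. The sketch below assumes NumPy arrays; the tile size (256) and stride (192) are arbitrary placeholders, and `tile_origins`/`extract_tiles` are hypothetical helpers. With these values, any feature smaller than the 64-pixel overlap falls entirely inside at least one tile.

```python
import numpy as np


def tile_origins(size: int, tile: int, stride: int) -> list:
    """Start offsets of tiles covering [0, size); the last tile is pinned
    to the edge so no border pixels are dropped."""
    if tile > size:
        raise ValueError("tile larger than image")
    origins = list(range(0, size - tile + 1, stride))
    if origins[-1] != size - tile:
        origins.append(size - tile)
    return origins


def extract_tiles(image: np.ndarray, tile: int = 256, stride: int = 192):
    """Split a 2-D image into overlapping full-resolution tiles.

    Returns (tiles, origins): tiles has shape (n, tile, tile) and origins[i]
    is the (row, col) of tile i's top-left corner in the source image.
    """
    rows = tile_origins(image.shape[0], tile, stride)
    cols = tile_origins(image.shape[1], tile, stride)
    origins = [(r, c) for r in rows for c in cols]
    tiles = np.stack([image[r:r + tile, c:c + tile] for r, c in origins])
    return tiles, origins
```

For a 3000x3000 image this yields 16 offsets per axis (0, 192, ..., 2688, then 2744), i.e. 256 tiles. Because features vary in size, more than one tile scale may be needed.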

We’re currently using two approaches: Support Vector Machines and deep neural networks (specifically convolutional networks, variants of AlexNet). We have a prioritized research path we’re following, but we’re very interested in variations, enhancements, and any out-of-the-box ideas.
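Since an abnormal image may contain only one small abnormal region, one variation worth considering is a multiple-instance framing: score tiles or patches with either classifier (SVM or CNN), then pool the tile scores into one image-level score. This is not the approach described in the posting, only a sketch of the pooling step; `image_score` is a hypothetical helper, and noisy-OR assumes tiles are independent, which rarely holds exactly.

```python
import math
from typing import Sequence


def image_score(tile_probs: Sequence[float], method: str = "max") -> float:
    """Combine per-tile abnormality probabilities into one image-level score.

    "max":      the image is as abnormal as its most suspicious tile.
    "noisy_or": 1 - prod(1 - p), the probability that at least one tile is
                abnormal if tile outcomes were independent.
    """
    if len(tile_probs) == 0:
        raise ValueError("need at least one tile probability")
    if method == "max":
        return max(tile_probs)
    if method == "noisy_or":
        return 1.0 - math.prod(1.0 - p for p in tile_probs)
    raise ValueError(f"unknown method: {method!r}")
```

Max pooling is robust to the many normal tiles in an abnormal image; noisy-OR rewards several moderately suspicious tiles but tends toward 1 as the tile count grows, so it usually needs calibration.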

Hardware Setup Validation

We think we’ve set up our hardware pretty well, but there are obviously some bottlenecks. Our guess is that we’re currently network-constrained.

We’re in the process of:

Connecting the computer directly to the QNAP NAS via a crossover cable
Port trunking (2x) from the QNAP NAS to the switch
Port trunking (2x) from the computer to the switch
Exploring NFS alternatives, such as the QNAP HTTP-based web server

We’re open to expert advice on intelligent tweaks.
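A quick back-of-envelope check helps test the network-constrained hypothesis. The sketch assumes 1 TB = 10^12 bytes, Gigabit Ethernet links (the posting does not state link speed) and an ideal link with no protocol overhead, so real figures will be lower; `transfer_estimate` is a hypothetical helper.

```python
def transfer_estimate(dataset_bytes: float, n_images: int,
                      link_gbit_per_s: float, efficiency: float = 1.0):
    """Back-of-envelope NAS throughput limits.

    efficiency is the fraction of the raw link rate actually delivered
    after protocol overhead (1.0 = ideal, an optimistic upper bound).
    Returns (mean bytes per image, images per second, hours per full pass).
    """
    bytes_per_s = link_gbit_per_s * 1e9 / 8 * efficiency
    mean_image_bytes = dataset_bytes / n_images
    images_per_s = bytes_per_s / mean_image_bytes
    hours_per_pass = dataset_bytes / bytes_per_s / 3600
    return mean_image_bytes, images_per_s, hours_per_pass


if __name__ == "__main__":
    # 14 TB across ~1 million images, at 1, 2 (ideal trunking) and 10 Gbit/s.
    for gbit in (1, 2, 10):
        size, ips, hours = transfer_estimate(14e12, 1_000_000, gbit)
        print(f"{gbit:>2} Gbit/s: {size / 1e6:.0f} MB/image, "
              f"{ips:.1f} images/s, {hours:.1f} h per pass")
```

At an ideal 1 Gbit/s this works out to about 14 MB per image, roughly 9 images per second, and about 31 hours per full pass. Comparing the images-per-second figure with what the three GPUs actually consume shows whether the link is the bottleneck. Link aggregation usually balances traffic per connection, so a single NFS mount may not see the doubled rate from 2x trunking.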

Project Overview

Posted: January 15, 2016

Planned Start: January 18, 2016

Delivery Date: January 22, 2016

Preferred Location: From anywhere

Client Overview

D********** ***
Projects: 0% awarded (0 of 1)

EXPERTISE REQUIRED

Machine Learning

Deep Networks

Deep Learning

Deep neural networks

Machine Learning Algorithms
