Browse Projects

228 Projects that match your criteria



Training Data Generation for Ansible Build Time Prediction

Summary

We would like to build a continuous learning algorithm that will be able to predict the execution times of Ansible builds (Playbooks) based on historical Ansible build data. As a precursor to developing the algorithm, we are seeking a technologist to develop the continuous learning environment using free versions of Ansible, Splunk and Elastic Search, and to generate training data from which the algorithm can learn.

Proposal

As part of your proposal please answer the following questions:

  • What cloud environment will you use to develop the continuous learning environment?  Please describe or diagram the system and provide an estimated cost (e.g. EC2 instance costs or Heroku dyno costs) for maintaining the environment for data generation.
  • Please provide an estimate of hours required to build and configure the environment. Please provide an estimate of hours required to generate the training data.
  • What will be your strategy/approach for configuring Playbooks based on the specified Galaxy Roles?  How will you ensure the generated data provides a variety of Playbook structures (singletons, clusters, single and multi-target builds) for optimal machine learning?
  • How do you plan to structure the resulting training data?
  • Please describe your knowledge of and past experience with the technologies required for this project.

Scope of Work

The selected consultant will be responsible for:

  • Setting up a cloud environment for data generation (included in this project posting) and continuous learning (for a future project posting).
  • Setting up and configuring at least one Ansible instance.
  • Setting up and configuring Splunk.
  • Installing and configuring the Ansible App for Splunk (used to import Ansible data into Splunk).
  • Installing and configuring Elastic Search to access the Ansible data within Splunk.
  • Setting up multiple hosts upon which Ansible builds can be executed.
  • Configuring a selected set of publicly available Ansible Roles (from galaxy.ansible.com) into both singleton (single-Role) and cluster (multiple-Role) Ansible Playbooks for the purpose of data generation.
  • Developing a script to execute the resulting Ansible playbooks against single and multiple hosts in order to generate approximately 2,000 rows of test data (a minimal sketch follows this list).
  • Providing a method for extracting the training data for machine learning (extracted data must be in a flat-file format).
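As a rough illustration of what such a data-generation script could look like, here is a minimal Python sketch that runs every playbook against every inventory and appends one timing row per build to a flat CSV file. The directory layout, file names, and CSV columns are assumptions for illustration; in the full environment, build data would also flow into Splunk via the Ansible App rather than only being timed directly:

```python
# Minimal sketch (assumed layout): run every playbook against every inventory
# and append one row of timing data per build to a flat CSV file.
import csv
import subprocess
import time
from datetime import datetime, timezone
from itertools import product
from pathlib import Path

PLAYBOOKS = sorted(Path("playbooks").glob("*.yml"))      # singleton and cluster playbooks
INVENTORIES = sorted(Path("inventories").glob("*.ini"))  # single- and multi-host targets

def run_build(playbook: Path, inventory: Path) -> dict:
    """Execute one Ansible build and return a flat record of the result."""
    start = time.monotonic()
    proc = subprocess.run(
        ["ansible-playbook", "-i", str(inventory), str(playbook)],
        capture_output=True, text=True,
    )
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "playbook": playbook.name,
        "inventory": inventory.name,
        "return_code": proc.returncode,
        "duration_sec": round(time.monotonic() - start, 2),
    }

rows = [run_build(p, i) for p, i in product(PLAYBOOKS, INVENTORIES)]
with open("training_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
```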

The primary outputs of this project are both the test data and the environment for generating additional test data, which can be accessed by the continuous learning environment.

The attached presentation provides additional details around the environment and data requirements and gives additional context to the broader project scope (beyond the environment and data generation scope of this first project).  Details relevant to the scope of this Experfy project posting have been highlighted in yellow in the presentation for clarification.

Hi-Tech
Application Deployment
System Provisioning & Configuration

$13,000 - $18,000

2 Proposals Status: HIRING


Client: D**********************

Posted: Jan 13, 2017


Data Scientist for Exploratory Analysis

This project will be awarded to the same data scientist as before. The details have already been discussed with the expert.

Consumer Goods and Retail

$100/hr - $150/hr

Starts Jan 13, 2017

1 Proposal Status: HIRING

Net 30


Client: M***

Posted: Jan 10, 2017


Machine Learning pipelines for optimizing online marketing performance

We would like to create ML pipelines to improve the conversion performance (leads and sales) of the ads we manage on AdWords, Facebook Ads, Instagram, and Twitter Ads.

We're looking for a long-term engagement with someone who ideally has some experience with applied ML in digital advertising.

About us

We're a digital advertising management company for SMBs. We launched in April 2016 and currently have ~150 active customers.

Project

Overall, we're trying to improve the conversion performance of our customers' campaigns in an automated way. We believe that, in order to do this, we need to start by using ML pipelines to output suggested values for the percentage of budget allocated to each of the different channels (see above). We also believe there are other pieces to this puzzle, but we want to start with the channel allocation suggestions and then move on from there.

We already have a team of developers that will be performing any of the DevOps work needed for this project.

So we're looking for someone to help us do the following:

  • develop models in R or python using past experience and our data
  • help us develop the proper techniques to utilize the model pipelines

We plan on using AzureML to construct our pipelines/APIs. You do not need to know AzureML; you can pick it up along the way as we work together to implement the pipeline(s) you construct.
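To make the starting point concrete, here is a hedged Python sketch of one possible first-pass model: estimate each channel's conversions per dollar from historical data, then suggest budget shares proportional to that efficiency. The column names and figures are illustrative assumptions, not your schema, and a production model would also need to handle diminishing returns and attribution:

```python
# Minimal sketch of a first-pass allocation suggestion: estimate each channel's
# conversions per dollar from history, then propose budget shares proportional
# to that efficiency. Columns and numbers are illustrative assumptions.
import pandas as pd

def suggest_allocation(history: pd.DataFrame) -> pd.Series:
    """history: one row per channel/day with 'channel', 'spend', 'conversions'."""
    per_channel = history.groupby("channel")[["spend", "conversions"]].sum()
    efficiency = per_channel["conversions"] / per_channel["spend"]
    return efficiency / efficiency.sum()  # suggested fraction of budget per channel

history = pd.DataFrame({
    "channel":     ["adwords", "facebook", "instagram", "twitter"] * 2,
    "spend":       [120.0, 90.0, 40.0, 30.0, 110.0, 95.0, 45.0, 25.0],
    "conversions": [12, 10, 3, 1, 11, 9, 4, 2],
})
print(suggest_allocation(history).round(3))
```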

Data

The data will be advertising performance data from AdWords, Facebook Ads, Instagram, and Twitter Ads, as well as website and conversion analytics data.

This data will be ETL'd by us and made available to you in DBs to conduct your work.

However, ideally you would also have a method to access the APIs directly (e.g., Pentaho) to streamline the workflow, for example if you need access to something that we're not currently capturing from the APIs. That would be ideal, but it is not required.

Online Advertising
Machine Learning
Market Segmentation and Targeting

$100/hr - $200/hr

Starts Jan 23, 2017

16 Proposals Status: HIRING


Client: A*****

Posted: Jan 09, 2017


Proof of concept for a Web Page Classifier that identifies reader intent

Background:

Taboola is widely recognized as the world’s leading content discovery platform, reaching 1B unique visitors and serving over 360 billion recommendations every month. Recent ComScore data shows that Taboola is second only to Facebook in terms of reach (https://www.taboola.com/press-release/taboola-crosses-one-billion-user-mark-second-only-facebook-world%E2%80%99s-largest-discovery).

Publishers, marketers, and agencies leverage Taboola to retain users on their site, monetize their traffic and distribute their content to drive high quality audiences. Publishers using Taboola include USA Today, NYTimes, TMZ, Politico.com, BusinessInsider, CafeMom, Billboard.com, Fox Television, Weather.com, Examiner, and many more.

Taboola's operation is vast, with ~2,000 servers in 6 data centers processing big data about users and user behavior, content, pages, etc.


General:

The premise behind this project is that web pages can be used to identify a specific reader intent.
For example, people who read about a store’s opening hours probably intend to visit that store, and people who read about “how to write a great CV” probably intend to seek employment.


What we are looking for:

Our goal is to have a reproducible methodology for building web page classifiers that identify specific user intents.

Given a specific user intent, we would like to build a binary classifier that determines whether the reader has the specific intent, and would like to be able to reproduce this methodology with different intents.

Once operational, the classifier should run efficiently and be able to scale to classifying millions of web pages in a short amount of time.
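A minimal sketch of one way such a per-intent binary classifier could look, assuming TF-IDF features over extracted page text and logistic regression; the truth set, feature choice, and extraction pipeline are exactly what the proposal should define, and the tiny training set here is purely illustrative:

```python
# Minimal sketch of a per-intent binary classifier: TF-IDF features over page
# text plus logistic regression. Page text extraction and the labeled truth
# set are assumed to exist; the training data below is purely illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

pages = [
    "store opening hours location directions parking",
    "how to write a great cv resume cover letter tips",
    "celebrity gossip awards red carpet fashion",
    "job interview questions salary negotiation advice",
]
has_intent = [0, 1, 0, 1]  # 1 = page signals the 'job seeking' intent

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(pages, has_intent)

# Linear models over sparse TF-IDF vectors score cheaply, which matters when
# classifying millions of pages.
print(clf.predict_proba(["tips for updating your resume"])[:, 1])
```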


Project scope:

The project deliverable should be a working classifier that will serve as a proof of concept for a reusable methodology for creating such classifiers.

In addition, the project should include ample documentation describing the general methodology used, so it can be recreated for additional intents.

We will decide on the initial intent for the proof of concept with the selected candidate.


Your proposal:

Your proposal should outline your approach in general terms: which algorithms you intend to use, which features you would extract from each URL and how, how you would determine a truth set for the classifier, and how you would measure correctness and/or other KPIs.

We will share additional information with the selected expert and define the approach and scope in detail with them.


(Image provided by Mimooh under the Creative Commons Attribution-Share Alike 3.0 Unported License - https://commons.wikimedia.org/wiki/File:Med_classifier3_by_mimooh.svg)

Consumer Goods and Retail
Financial Services
Healthcare

$10,000 - $15,000

Starts Jan 15, 2017

9 Proposals Status: HIRING

Net 30


Client: T*******

Posted: Jan 09, 2017


R package for media data validation and cleaning engine

Every day we receive media data featuring several media metrics whose business logic needs to be upheld. Some of these business rules are easy to uphold; others might be trickier. At Blackwood Seven we rely on massive quantities of data, and the correctness of this data is naturally crucial.

The following concepts need to be fully understood:

  • Online media metrics
  • Offline media metrics
  • CPM, CPC, CPA
  • ROI
  • Marginal cost analysis
  • Time domain filtering

For programming:

  • R
  • R-Studio
  • Python
  • R6 Classes

Specifically, the R package to be developed should handle sanity checks of the data as well as methods for fixing issues on a best-effort basis. The approach could be to use deep learning on generated sane and insane data. The package needs to be structured and built using R6 classes; worst case, S3 classes can be used. Alternatively, all of it can be implemented in Python with an interface to R.

Sample sane and insane data are provided as CSVs; this is of course not enough to train a network, but it reveals some of the potential issues.

Explanation of the data

The data set here consists of Impressions, Clicks and Net as metrics. As dimensions we have Date, Channel and Supplier. 

  • Impressions: The number of times a banner has been shown to a user
  • Clicks: The number of Impressions that users clicked on. Thus the following MUST always be true: Clicks < Impressions.
  • Net: The amount of money paid for the banner, usually reconciled as CPC (cost per click) or CPM (cost per thousand impressions). In other words, (Impressions > 0) => (Net > 0) and (Clicks > 0) => (Net > 0), while the reverse is not true: just because you paid does not mean you received any clicks, although that is very unlikely. These rules are sketched in code after this list.
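A minimal sketch of the rule-based layer of these checks, written here in Python since the posting allows a Python implementation with an interface to R; the sample rows are illustrative, and a learned (deep learning) layer could sit on top of this:

```python
# Minimal sketch of the stated rule-based sanity checks. A deep learning layer
# trained on generated sane/insane data could complement these hard rules.
import pandas as pd

def flag_insane(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows that violate the stated business rules."""
    violations = (
        (df["Clicks"] >= df["Impressions"])             # Clicks < Impressions must hold
        | ((df["Impressions"] > 0) & (df["Net"] <= 0))  # (Impressions > 0) => (Net > 0)
        | ((df["Clicks"] > 0) & (df["Net"] <= 0))       # (Clicks > 0) => (Net > 0)
    )
    return df[violations]

df = pd.DataFrame({
    "Date": ["2016-12-01"] * 3,
    "Channel": ["Display", "Display", "Search"],
    "Supplier": ["A", "B", "C"],
    "Impressions": [1000, 500, 0],
    "Clicks": [20, 600, 0],
    "Net": [150.0, 40.0, 25.0],
})
print(flag_insane(df))  # rows 2 and 3 violate the rules above
```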
Data Cleaning
RStudio
R package development

$100/hr - $200/hr

Starts Jan 01, 2017

12 Proposals Status: HIRING


Client: B***************

Posted: Dec 30, 2016


Risk Analysis of Fund Investments

We track a number of statistics for our fund investments. All of the numbers are derived from the monthly returns of a fund/index. We are limited to monthly data because each fund only reports on a monthly basis.

Attached are the most common statistical measures we track for each manager. Almost all of these numbers are derived from general finance industry practice. It would be helpful for us to understand how a data analyst would evaluate the risk of our fund investments, given the constraint on the frequency of data points, and with an unbiased approach to attacking the problem.

We would like to implement more sophisticated risk analytics based on the limited data at our disposal. Please provide your approach and how it may benefit us.
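As a hedged illustration of what can still be derived from monthly observations, the sketch below computes a few standard measures (annualized return and volatility, Sharpe ratio, maximum drawdown) from an assumed monthly return series; the returns and risk-free rate are illustrative, not your data:

```python
# Minimal sketch of common risk measures computed from monthly fund returns,
# illustrating what can be derived despite the monthly-frequency constraint.
# The return series and risk-free rate are illustrative assumptions.
import numpy as np

monthly_returns = np.array([0.012, -0.008, 0.021, 0.004, -0.015, 0.018,
                            0.007, -0.003, 0.011, 0.009, -0.006, 0.014])
rf_monthly = 0.02 / 12  # assumed 2% annual risk-free rate

ann_return = (1 + monthly_returns).prod() ** (12 / len(monthly_returns)) - 1
ann_vol = monthly_returns.std(ddof=1) * np.sqrt(12)
sharpe = (monthly_returns.mean() - rf_monthly) / monthly_returns.std(ddof=1) * np.sqrt(12)

# Maximum drawdown from the cumulative growth path
growth = (1 + monthly_returns).cumprod()
max_drawdown = (growth / np.maximum.accumulate(growth) - 1).min()

print(f"annualized return {ann_return:.2%}, vol {ann_vol:.2%}, "
      f"Sharpe {sharpe:.2f}, max drawdown {max_drawdown:.2%}")
```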

Financial Services
Finance
Risk and Compliance

$100/hr - $150/hr

12 Proposals Status: HIRING


Client: S****************

Posted: Dec 27, 2016


Curate and Analyze Publicly Available Financial Data for Business Intelligence

We would like to aggregate both structured and unstructured financial data to inform our decision making process.

  • 13F data: Funds with assets of over $100M are required to publish their equity holdings on a quarterly basis via SEC Form 13F. We want to incorporate the information disclosed in these filings to evaluate current and prospective investments. Our goal would be to analyze the data to help inform our decision-making process. A solution would involve a mix of data feeds: 13F filings, plus stock prices for the holdings listed in the 13F. We currently have multiple
  • Public Company Filings & Relevant News: Feed relevant public filing and news data on companies within our portfolio. This would include holdings brought in via 13F data as well as any additional companies that we wish to track.
  • Stock Price Alerts: Track any large fluctuations in the tracked companies listed above (a minimal sketch follows this list).
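A minimal sketch of the alerting check, assuming daily closing prices from whichever market-data feed is chosen and an illustrative 5% threshold; the tickers and prices below are made up for illustration:

```python
# Minimal sketch of the price-alert check: flag any tracked ticker whose
# day-over-day move exceeds a threshold. Prices and the 5% threshold are
# illustrative assumptions; real prices would come from the chosen feed.
import pandas as pd

THRESHOLD = 0.05  # alert on moves larger than +/-5%

prices = pd.DataFrame({
    "AAPL": [115.8, 116.0, 121.9],
    "MSFT": [62.6, 62.3, 62.8],
}, index=pd.to_datetime(["2017-01-09", "2017-01-10", "2017-01-11"]))

daily_change = prices.pct_change().iloc[1:]
alerts = daily_change[daily_change.abs() > THRESHOLD].stack()
for (day, ticker), change in alerts.items():
    print(f"{day.date()} {ticker}: {change:+.1%}")
```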

Technology Stack: 

  • Database: We maintain a SQL Server instance at AWS and anticipate utilizing it going forward.
  • Business Intelligence Tool: We have evaluated multiple BI solutions (Tableau, MSFT Power BI, Domo, Qlik, etc.); the goal would be to transition our current reporting setup (via SSRS) to a more dynamic BI solution that is available on demand. The solution needs to be mobile friendly.

Please describe your approach, how you would structure the milestones, and how much time it would take.

Financial Services
Portfolio Optimization
Risk and Compliance

$100/hr - $150/hr

11 Proposals Status: HIRING


Client: S****************

Posted: Dec 27, 2016


Health Economics / Outcomes Research Data Scientist

We need the following skill set to help support a project and to train another data scientist.

Required Skills:

  • Deep understanding of healthcare databases (e.g., claims, EHR, hospital, registry) and the pros/cons of various sources
  • Experience in conducting a range of real world health research studies (e.g., retrospective database analyses, cost effectiveness, comparative effectiveness)
  • Strong background in research methodology and study design
  • Experience developing Research Plans and Protocols
  • Experience in creating and programming epidemiologic and economic models
  • Ability to manage time and prioritize tasks


Qualifications:

  • Master’s or PhD in relevant area (e.g., health economics, epidemiology)
  • Minimum 3 years of combined experience in outcomes research, health economics, epidemiology, or a directly related field

Pharmaceutical and Life Sciences
Biology, Health and Medicine
Analytics

$50/hr - $150/hr

Starts Jan 01, 2017

10 Proposals Status: HIRING


Client: D******************

Posted: Dec 23, 2016


Development of a Resume Scoring Algorithm Follow-up

We are a provider of eRecruitment technology, which our clients use to manage the workflow of recruiting new hires, including the following steps: posting vacancies, providing online application forms, integrating recruitment tests, communicating with candidates, etc.

This is a follow-up project and will be awarded to the same data scientist as before. The details have been discussed already.

Professional Services
Job Applicant Scoring
Human Resources

$24,000 - $25,000

3 Proposals Status: HIRING

Net 30


Client: W*******

Posted: Dec 16, 2016


Cluster Analysis and Wheel Visualization

We are building a series of breakthrough visualizations for many analysis tasks on the DOMO platform. We are seeking a qualitatively improved way to view clusters of information, compared to existing methods. Viewing data that naturally “clusters together” is of value in many application domains, including data formatted as surveys, transactions, and text. In preparation for this project, we collaborated on a UI sketch, which we have rendered in a mockup image below.

Key elements include the following:

1) Data for the cluster analysis is derived by a process similar to the one we followed in Market Basket. A transaction file is converted to a cluster file by our back-end AWS framework, and the result is then returned to the front end.

The difference is that, here, clusters may contain more than two items. Some details:

a. As with the Market Basket application, we will create an iFrame within the DOMO application that the user can use to send transaction data to a back-end processor, which performs a combinatoric search to find the clusters with the greatest support. Note that, as with Market Basket, this back-end task is nontrivial and, if configured incorrectly, can take a very long time.

b. To generate the clusters, we will use the frequent itemset method. This approach uses heuristics to reduce the time required for the 2^n search over all clusters; the algorithm involves
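As a hedged illustration of the frequent itemset idea, the sketch below performs an Apriori-style level-wise search: only itemsets that already meet the support threshold are extended, which is the kind of pruning heuristic that avoids the full 2^n enumeration. The transactions and threshold are illustrative, not the production algorithm:

```python
# Minimal sketch of frequent itemset mining (Apriori-style level-wise search).
# Only frequent k-itemsets are extended to k+1 candidates, since any superset
# of an infrequent itemset is itself infrequent. Data is illustrative.
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
MIN_SUPPORT = 2  # minimum number of transactions containing the itemset

def support(itemset):
    return sum(itemset <= t for t in transactions)

items = sorted({i for t in transactions for i in t})
level = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]
clusters = list(level)
while level:
    # Candidate (k+1)-itemsets are unions of frequent k-itemsets.
    candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1}
    level = [c for c in candidates if support(c) >= MIN_SUPPORT]
    clusters.extend(level)

for c in sorted(clusters, key=len):
    print(sorted(c), support(c))
```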

Market Research
Marketing and Brand Management
Domo

$22,000

Starts Dec 13, 2016

1 Proposal Status: IN PROGRESS


Client: V********

Posted: Dec 13, 2016
