Taboola is widely recognized as the world’s leading content discovery platform, reaching 1B unique visitors and serving over 360 billion recommendations every month. Recent ComScore data shows that Taboola is second only to Facebook in terms of reach (https://www.taboola.com/press-release/taboola-crosses-one-billion-user-mark-second-only-facebook-world%E2%80%99s-largest-discovery).
Publishers, marketers, and agencies leverage Taboola to retain users on their site, monetize their traffic and distribute their content to drive high quality audiences. Publishers using Taboola include USA Today, NYTimes, TMZ, Politico.com, BusinessInsider, CafeMom, Billboard.com, Fox Television, Weather.com, Examiner, and many more.
Taboola's operation is vast: roughly 2,000 servers in 6 data centers process big data about users, user behavior, content, pages, and more.
The premise behind this project is that web pages can be used to identify a specific reader intent.
For example, people who read about a store’s opening hours likely intend to visit that store, and people who read about “how to write a great CV” probably intend to seek employment.
What we are looking for:
Our goal is a reproducible methodology for building web page classifiers that identify specific user intents.
Given a specific user intent, we would like to build a binary classifier that determines whether the reader has the specific intent, and would like to be able to reproduce this methodology with different intents.
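To make the idea of one binary classifier per intent concrete, here is a minimal sketch in Python. It is only an illustration and not a prescribed approach: the choice of Naive Bayes, the class name IntentClassifier, the intent label "visit_store", and the sample pages are all assumptions made for this example. The same fit/predict steps could be repeated with a different labeled set for each new intent.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the page text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


class IntentClassifier:
    """Binary multinomial Naive Bayes with add-one smoothing.

    Hypothetical helper for illustration; the brief does not prescribe
    any particular algorithm.
    """

    def __init__(self, intent_name):
        self.intent_name = intent_name
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocabulary = set()

    def fit(self, pages, labels):
        """Count tokens per class from labeled page texts."""
        for text, label in zip(pages, labels):
            label = bool(label)
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocabulary.update(tokens)
        return self

    def _log_score(self, tokens, label):
        """Log prior plus summed log likelihoods of known tokens."""
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
        for token in tokens:
            if token in self.vocabulary:  # unseen words are ignored
                score += math.log((self.word_counts[label][token] + 1) / denom)
        return score

    def predict(self, text):
        """Return True if the page more likely reflects the intent."""
        tokens = tokenize(text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


# Made-up training pages for a hypothetical "visit a store" intent.
pages = [
    "Store opening hours and directions to our downtown location",
    "Opening hours, address and parking for the new store",
    "Celebrity gossip and red carpet photos",
    "Football scores and match highlights from last night",
]
labels = [True, True, False, False]

clf = IntentClassifier("visit_store").fit(pages, labels)
has_intent = clf.predict("what are the store opening hours today")
no_intent = clf.predict("celebrity red carpet photos")
```

A real system would need far more training data and richer features than raw page words; the point here is only the shape of a per-intent, retrainable binary classifier.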
Once operational, the classifier should run efficiently and scale to classifying millions of web pages in a short amount of time.
The project deliverables should be a working classifier which will serve as a proof of concept for a re-usable methodology for creating such classifiers.
In addition, the project should include ample documentation describing the general methodology used, so that it can be recreated for additional intents.
We will decide on the intent for the initial proof of concept together with the selected candidate.
Your proposal should outline your approach in general terms: which algorithms you intend to use, which features you would extract from each URL and how, how you would determine a truth set for the classifier, and how you would measure correctness and/or other KPIs.
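As an illustration of how correctness might be measured against a truth set, the following Python sketch computes precision, recall, F1, and accuracy for a binary intent classifier. The function name binary_metrics and the sample labels are assumptions made for this example; a proposal may well choose different KPIs.

```python
def binary_metrics(truth, predicted):
    """Compare predicted labels with truth-set labels (both lists of bools)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # intent correctly found
    fp = sum(1 for t, p in pairs if not t and p)      # intent wrongly claimed
    fn = sum(1 for t, p in pairs if t and not p)      # intent missed
    tn = sum(1 for t, p in pairs if not t and not p)  # non-intent correctly rejected

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Made-up truth set of 8 pages and the classifier's output for them.
truth = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]

metrics = binary_metrics(truth, predicted)
```

Precision and recall are reported separately because pages showing a given intent are likely to be a small minority of all pages, in which case accuracy alone can look high even for a classifier that never detects the intent.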
We will share additional information with the selected expert and define the approach and scope in detail together.
(Image provided by Mimooh under the Creative Commons Attribution-Share Alike 3.0 Unported License - https://commons.wikimedia.org/wiki/File:Med_classifier3_by_mimooh.svg)