
Big Data Architecture of Realtime System (Storm)

Industry: Media and Advertising, Hi-Tech

Technical Function: Data Management (Master Data Management), Data Warehousing (Scheduling & Monitoring), Data Engineering (Data Center), Mobile Apps (Mobile Advertising)

Technology & Tools: Big Data and Cloud (Amazon Elastic MapReduce, MongoDB, MySQL, Hadoop MapReduce, Redis, Apache Storm, Apache Kafka, Google Compute Engine, Google Cloud Platform, Amazon RDS, Amazon EC2, Amazon Web Services)

WORK IN PROGRESS

Project Description

We are a proximity marketing company. We have a network of beacons installed in hundreds of locations, including malls and restaurants. Our beacons currently see over 25 million users per month. We need help designing our big data architecture and possibly implementing it as well.

Overview

To offer a realtime solution for the event data generated by the Mobile API component, we need to define an architecture that delivers information about the user devices interacting with our system. In this process, we receive events from devices and need to enrich these events with additional information.
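The posting does not say what "additional information" is added to each event. The sketch below is a minimal illustration that assumes enrichment means joining an event to the beacon-location data listed under Sources of Data. The `BEACON_LOCATIONS` table, the `beacon_id` field, and `enrich_event` are hypothetical names. In production, the lookup would more likely hit MongoDB or Redis than an in-memory dict.

```python
from typing import Any, Dict

# Hypothetical lookup table standing in for the "data on beacon locations"
# source; a real deployment might keep this in MongoDB or Redis.
BEACON_LOCATIONS: Dict[str, Dict[str, Any]] = {
    "beacon-001": {"venue": "Mall A", "venue_type": "mall"},
    "beacon-002": {"venue": "Restaurant B", "venue_type": "restaurant"},
}


def enrich_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the event with location fields merged in.

    Events from unknown beacons are flagged rather than dropped,
    so they can be audited later. The input dict is not modified.
    """
    enriched = dict(event)
    location = BEACON_LOCATIONS.get(event.get("beacon_id"))
    if location is None:
        enriched["location_known"] = False
    else:
        enriched.update(location)
        enriched["location_known"] = True
    return enriched


if __name__ == "__main__":
    print(enrich_event({"beacon_id": "beacon-001", "idfa": "ABC", "timestamp": 1442400000}))
```

Copying the event instead of mutating it keeps the function safe to call on tuples that other bolts may still hold references to.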

The topology is composed of the following components:

  • Kafka spout: This component receives data (events) from the backend component via a Kafka queue and emits them to the rest of the topology.
  • Validation bolt: Receives events from the Kafka spout and validates whether each event complies with the defined JSON schema. If the event is compliant, the bolt emits it to the rest of the topology and inserts it into the events MongoDB collection. If not, the bolt does not emit it and stores it in the errors MongoDB collection.
  • IDFA bolt: Given an event emitted by the Validation bolt, with its corresponding IDFA, this bolt stores information associated with that IDFA in the CIIM MongoDB collection.
  • S3 bolt: Receives events from the Validation bolt, converts them into tuples (using selected fields of the JSON document), and stores them in S3 to be processed later by EMR.
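The Validation bolt's routing logic can be sketched in plain Python. This sketch assumes a simplified schema: the `REQUIRED_FIELDS` mapping, its field names, and the `process_message` helper are all hypothetical, since the actual JSON schema is not described here. Two Python lists stand in for the events and errors MongoDB collections. A production version might instead use the `jsonschema` package's `validate` function. Inside a streamparse `Bolt.process` method, the returned event would be passed to `self.emit`.

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for the real JSON schema:
# required field name -> accepted Python type(s).
REQUIRED_FIELDS: Dict[str, tuple] = {
    "idfa": (str,),
    "beacon_id": (str,),
    "timestamp": (int,),
}


def validate(event: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; an empty list means compliant."""
    errors = []
    for field, types in REQUIRED_FIELDS.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif not isinstance(event[field], types) or isinstance(event[field], bool):
            errors.append(f"wrong type for field: {field}")
    return errors


def process_message(raw: str, events: List[dict], errors: List[dict]) -> Optional[dict]:
    """Mimic the Validation bolt.

    Valid events are stored in `events` and returned (i.e. emitted).
    Invalid ones are stored in `errors` and None is returned (not emitted).
    """
    try:
        event = json.loads(raw)
    except json.JSONDecodeError as exc:
        errors.append({"raw": raw, "errors": [f"invalid JSON: {exc.msg}"]})
        return None
    if not isinstance(event, dict):
        errors.append({"raw": raw, "errors": ["event is not a JSON object"]})
        return None
    problems = validate(event)
    if problems:
        errors.append({"raw": raw, "errors": problems})
        return None
    events.append(event)
    return event
```

Storing the raw message alongside the error list in the errors collection makes it possible to replay rejected events once a schema issue has been fixed.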

This Storm topology has been implemented using Python and StreamParse. More information about the project is attached.
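The S3 bolt's "convert to tuples, store in S3" step could look roughly like this in Python. The `TUPLE_FIELDS` list, the tab-separated format, the key layout, and the `S3Batcher` class are all hypothetical, because the posting does not specify which fields are kept or how files are named. Batching rows before upload avoids writing one tiny S3 object per event, which Hadoop/EMR jobs handle poorly. The actual upload would likely be a boto3 `put_object` call. It is left as a comment so the sketch runs without AWS credentials.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical subset of JSON fields kept for the EMR jobs.
TUPLE_FIELDS = ("timestamp", "idfa", "beacon_id", "event_type")


def to_tuple(event: Dict[str, Any]) -> Tuple[str, ...]:
    """Project an event onto TUPLE_FIELDS; missing values become empty strings."""
    return tuple("" if event.get(f) is None else str(event[f]) for f in TUPLE_FIELDS)


class S3Batcher:
    """Buffer tuples as tab-separated lines and hand back a batch when full.

    Assumes field values contain no tabs or newlines.
    """

    def __init__(self, batch_size: int = 1000, prefix: str = "events"):
        self.batch_size = batch_size
        self.prefix = prefix
        self.rows: List[str] = []
        self.batches_written = 0

    def add(self, event: Dict[str, Any]) -> Optional[Tuple[str, str]]:
        """Add one event; return (key, body) when a batch fills up, else None."""
        self.rows.append("\t".join(to_tuple(event)))
        if len(self.rows) >= self.batch_size:
            return self.flush()
        return None

    def flush(self) -> Optional[Tuple[str, str]]:
        """Return any buffered rows as one (key, body) object and reset the buffer."""
        if not self.rows:
            return None
        key = f"{self.prefix}/part-{self.batches_written:05d}.tsv"
        body = "\n".join(self.rows) + "\n"
        self.rows = []
        self.batches_written += 1
        # A real bolt might upload here, e.g. with boto3:
        # s3_client.put_object(Bucket=bucket_name, Key=key, Body=body)
        return key, body
```

A time-based flush (for example, on Storm tick tuples) would typically be added as well, so that low-traffic periods do not leave rows sitting in memory indefinitely.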

Sources of Data

  1. Data coming from mobile devices
  2. Data on beacon locations
  3. Campaign data
  4. API reporting usage data against which we are billing

We currently have two MongoDB instances in production, interacting with Storm, S3 and Hadoop.

We are looking for an architect to critique our current plans and help build a highly scalable system. We are looking for short-term fixes to the current system and also a long-term architecture that will let us scale as we increase the number of beacons we have deployed.

In your proposal, please provide 1) previous work that you have done that is relevant; 2) how you would approach this architecture exercise; and 3) estimated hours and budget.

Project Overview

  • Posted
    September 16, 2015
  • Planned Start
    September 21, 2015
  • Delivery Date
    November 30, 2015
  • Preferred Location
    From anywhere
  • Payment Due
    Net 30
