Big Data Operations

Join Big Data Operations TalentCloud

If you possess mastery in any of the roles or skills below, you can apply to this TalentCloud. Once you become an approved Experfy TalentCloud member, you will get exclusive access to jobs and project opportunities from our clients.

Popular Roles in this TalentCloud

  • Big Data Operations Engineer
  • NoSQL Operations Engineer

Cloud Description

Big Data Operations Specialists participate in the operation and deployment of NoSQL databases. They also play a pivotal role in operating big data pipelines both on premises and in the cloud. TalentCloud members should be able to serve as senior team members focused on the availability, reliability, and sustainability of the data platform's components. Other abilities and experience include:

  • Experts in this TalentCloud should be able to work closely with the Data Platform and DevOps teams to enable faster deployment of data-driven applications
  • Expertise in state-of-the-art tools and frameworks for building scalable, efficient solutions for data management, data pre-processing, and dataset building
  • Experience deploying databases and data pipelines end to end into production environments
  • The data platform may consist of large Hadoop, Spark, HBase (or other NoSQL database), and Kafka clusters, on premises or in the cloud

Responsibilities

  • Deploy data pipelines and testing frameworks to different development, QA, Stage, and production environments
  • Monitor, maintain, provision, upgrade, and troubleshoot Hadoop, HBase, Spark, and Kafka systems to support a complex data pipeline platform
  • Participate in an on-call rotation, responding to alerts and system issues for Hadoop, HBase, Kafka, and more
  • Troubleshoot, repair, and recover from hardware or software failures
  • Identify and resolve faults, inconsistencies, and systemic issues; coordinate and communicate with affected stakeholders
  • Manage user access and resource allocations for the data pipeline platform
  • Develop tools to automate routine day-to-day tasks such as security patching, software upgrades, and hardware allocation; utilize automated system monitoring tools to verify the integrity and availability of all hardware, server resources, and critical processes
  • Engage other teams during outages or planned maintenance
  • Administer development, test, QA and production servers
  • Triage outages and defects and resolve them within established SLAs
  • Work with Application Developers and Solution Architects to identify opportunities to improve the operational and supportability model as part of continuous improvement and the maturing of CDA’s Production Operations function
  • Develop and monitor a milestone-based schedule of Production Readiness activities
  • Monitor the stability and performance of the production environment after major releases and provide insight for future releases on any improvement opportunities
  • Recommend cluster upgrades and ensure reliable functionality for CDA customers
  • Perform cluster and system performance tuning

Required Skills

  • Relevant experience implementing, troubleshooting, and supporting Unix/Linux operating systems, with solid knowledge of system administration and internals
  • Relevant experience in scripting/writing/modifying code for monitoring/deployment/automation in one of the following (or comparable): Python, Shell, Go, Perl, Java, C
  • Relevant experience with any of the following technologies: Hadoop-HDFS, Yarn-MapReduce, HBase, Kafka
  • Relevant experience with any of the following technologies: Puppet, Chef, Ansible or equivalent configuration management tool
  • Familiarity with TCP/IP networking (DNS, DHCP, HTTP, etc.)
  • Strong written and oral communication skills with the ability to interface with technical and non-technical stakeholders at various levels of the organization

Beneficial skills and experience (if you don’t have all of them, you can learn them at Xandr):

  • Experience with JVM and GC tuning
  • Regular expression fluency
  • Experience with Nagios or similar monitoring tools
  • Experience with data collection/graphing tools like Cacti, Ganglia, Graphite, and Grafana
  • Experience with tcpdump, Wireshark (formerly Ethereal), tshark, and other packet capture and analysis tools
  • Demonstrated ability to quickly adapt, learn new skill sets, and understand operational challenges; a self-starter
  • Strong analytical, problem-solving, negotiation, and organizational skills with a clear focus under pressure
  • Must be proactive with proven ability to execute multiple tasks simultaneously
  • Resourceful and results-oriented, with the ability to get things done and overcome obstacles
  • Excellent interpersonal skills, including relationship building with a diverse, global, cross-functional team
  • Proficient in SQL and creating ETL processes
  • Previous experience building or deploying efficient large-scale data collection, storage, and processing pipelines
  • Knowledge of database systems, big data concepts, and cluster computing frameworks (e.g. Spark, Hadoop, or other tools)
  • Experience working in a cloud machine-learning environment, including the deployment of models to production
  • Experience with Agile, Continuous Integration, Continuous Deployment, Test Driven Development, Git
  • Understanding of time, RAM, and I/O scalability aspects of data science applications (e.g. CPU and GPU acceleration, operations on sparse arrays, model serialization and caching)