Big Data Operations

Big Data Operations Specialists participate in the operation and deployment of NoSQL databases. They also play a pivotal role in operating big data pipelines, both on-premises and in the cloud. TalentCloud members should be able to serve as senior team members who focus on the availability, reliability, and sustainability of the data platform components. Other abilities and experience include:

Experts in this TalentCloud should be able to work closely with the Data Platform and DevOps teams to enable faster deployment of data-driven applications
Expertise in state-of-the-art tools and frameworks to build scalable and efficient solutions for data management, data pre-processing and data set building
Experience in deploying databases and data pipelines end to end into production environments
The data platform could consist of large Hadoop, Spark, HBase (or other NoSQL databases), and Kafka clusters, on-premises or in the cloud

Responsibilities

Deploy data pipelines and testing frameworks to development, QA, staging, and production environments
Monitor, maintain, provision, upgrade, and troubleshoot Hadoop, HBase, Spark, and Kafka systems to support a complex Data Pipeline Platform
Participate in an on-call rotation, responding to alerts and system issues for Hadoop, HBase, Kafka, and more
Troubleshoot, repair, and recover from hardware or software failures
Identify and resolve faults, inconsistencies, and systemic issues. Coordinate and communicate with impacted constituencies
Manage user access and resource allocations to the Data Pipeline Platform
Develop tools to automate routine day-to-day tasks such as security patching, software upgrades, and hardware allocation. Utilize automated system monitoring tools to verify the integrity and availability of all hardware, server resources, and critical processes
Engage other teams during outages or planned maintenance
Administer development, test, QA and production servers
Triage outages and defects and resolve them within established SLAs
Work with Application Developers and Solution Architects to identify opportunities to improve the operational and supportability model as part of continuous improvement and the maturing of CDA’s Production Operation function
Develop and monitor a milestone-based schedule of Production Readiness activities
Monitor the stability and performance of the production environment after major releases and provide insight into improvement opportunities for future releases
Recommend cluster upgrades and ensure reliable functionality for CDA customers
Perform cluster and system performance tuning

Required Skills

Relevant experience in implementing, troubleshooting and supporting the Unix/Linux operating system with concrete knowledge of system administration/internals
Relevant experience in scripting/writing/modifying code for monitoring/deployment/automation in one of the following (or comparable): Python, Shell, Go, Perl, Java, C
Relevant experience with any of the following technologies: Hadoop HDFS, YARN/MapReduce, HBase, Kafka
Relevant experience with any of the following technologies: Puppet, Chef, Ansible or equivalent configuration management tool
Familiarity with TCP/IP networking and related protocols (DNS, DHCP, HTTP, etc.)
Strong written and oral communication skills with the ability to interface with technical and non-technical stakeholders at various levels of the organization
Beneficial skills and experience (if you don’t have all of them, you can learn them at Xandr):
Experience with JVM and GC tuning is a plus
Regular expression fluency
Experience with Nagios or similar monitoring tools
Experience with data collection/graphing tools like Cacti, Ganglia, Graphite, and Grafana
Experience with tcpdump, Wireshark (formerly Ethereal), tshark, and other packet capture and analysis tools
Self-starter with a demonstrated ability to quickly adapt, learn new skill sets, and understand operational challenges
Strong analytical, problem-solving, negotiation, and organizational skills with a clear focus under pressure
Must be proactive with proven ability to execute multiple tasks simultaneously
Resourceful and results-oriented, with the ability to get things done and overcome obstacles
Excellent interpersonal skills, including relationship building with a diverse, global, cross-functional team
Proficient in SQL and creating ETL processes
Previous experience building or deploying efficient large-scale data collection, storage, and processing pipelines
Knowledge of database systems, big data concepts, and cluster computing frameworks (e.g. Spark, Hadoop, or other tools)
Experience working in a cloud learning environment, including the deployment of models to production
Experience with Agile, Continuous Integration, Continuous Deployment, Test Driven Development, Git
Understanding of time, RAM, and I/O scalability aspects of data science applications (e.g. CPU and GPU acceleration, operations on sparse arrays, model serialization and caching)
