Big Data Engineers in this TalentCloud are required to have extensive experience building end-to-end data platforms, data pipelines, and data flows. This should include data ingestion/integration, data storage, data transformation, data processing, data deployment, data operations, and data cataloging.
Engineers in this TalentCloud should be able to design data platforms and work closely with big data architects, big data developers, data scientists, and DevOps and DataOps engineers to develop a platform capable of executing operational, analytic, and predictive workloads that serves thousands of applications and supports machine learning deployment and inference.
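The ingestion, transformation, and storage stages above can be sketched end to end in miniature. This is an illustrative sketch only, not a production pipeline; the CSV columns, the `events` table, and the use of in-memory SQLite as the storage layer are all assumptions made for the example:

```python
import csv
import io
import sqlite3

# Hypothetical raw input; column names are illustrative assumptions.
RAW_CSV = """user_id,amount
1,10.50
2,3.25
1,7.00
"""

def ingest(raw: str) -> list[dict]:
    """Ingestion: parse raw CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple[int, float]]:
    """Transformation: cast types and drop malformed rows."""
    out = []
    for row in rows:
        try:
            out.append((int(row["user_id"]), float(row["amount"])))
        except (KeyError, ValueError):
            continue  # a real pipeline would route bad rows to a dead-letter store
    return out

def load(rows: list[tuple[int, float]]) -> sqlite3.Connection:
    """Storage: load transformed rows into a relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn

conn = load(transform(ingest(RAW_CSV)))
total_per_user = dict(
    conn.execute("SELECT user_id, SUM(amount) FROM events GROUP BY user_id")
)
print(total_per_user)  # {1: 17.5, 2: 3.25}
```

In a real platform each stage would be a separate, independently scalable component (e.g., Kafka for ingestion, Spark for transformation, a warehouse for storage), but the stage boundaries are the same.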
- Extensive experience as a data engineer and database developer, building data-driven applications
- Good understanding of distributed systems and distributed databases
- Experience with ETL/ELT development, batch processing, and stream processing
- Familiarity with frameworks such as Spark and Kafka and their surrounding tooling
- Understanding of big data ingestion, integration, storage, and processing, as well as transformation/ETL tools and data formats for storage
- Ability to debug, troubleshoot, and optimize data solutions in the big data ecosystem, using tools such as Spark, Presto, Hive, and Kafka, along with NoSQL and relational databases and data warehouses
- Experience working with SQL engines on large datasets: Presto, Impala, Dremio, Spark SQL, Hive, Drill, Druid, and others
- Knowledge of and experience working with DevOps and DataOps teams, collaborating with them to develop processes and automate deployment
- Programming experience in one or more of Java, Scala, or Python
- Expertise in both intermediate- and advanced-level SQL query development
- Ability to understand and work with complex datasets and build solutions around them with data modeling
- Ability to work with other team members, such as business analysts, data analysts, and data stewards, to understand requirements and build solutions
- Ability, passion, and aptitude for learning new programming and querying languages, and applying them to build data solutions
- Good understanding of tools in the DevOps ecosystem, with a basic understanding of Docker and CI/CD processes
- Good level of expertise working with Git
- Experience with data warehousing, data modeling, data marts, data virtualization, and MPP-based engines like Redshift, Vertica, BigQuery, Snowflake, etc.
- Experience with relational databases such as Postgres, MySQL, MariaDB, Oracle, etc.
- Experience working with one or more NoSQL databases, and the ability to develop a data model for at least one of the main NoSQL database types:
  - Key-value stores: Redis, DynamoDB, Riak
  - Document databases: MongoDB, CouchDB, Couchbase
  - Graph databases: Neo4j
  - Wide-column databases: Cassandra, HBase, Scylla
  - Time-series databases: InfluxDB, TimescaleDB
  - Search engines and databases: Elasticsearch, Solr
  - In-memory databases and grids: Apache Ignite, GridGain, etc.