
Senior Data Platform Engineer
Qualifications & Requirements
Experience Level: Senior Level
Senior Data Platform Engineer - Bangalore
Cognite is at the forefront of revolutionizing industrial data management with Cognite Data Fusion, our advanced SaaS platform. We are seeking a talented Senior Data Platform Engineer to join our dynamic team in Bangalore. This role is ideal for an individual who excels at building high-performance distributed systems and thrives in a fast-paced startup environment. You will tackle complex data infrastructure challenges, directly influencing how Fortune 500 industrial companies manage their critical operational data.
Responsibilities
High-Performance Data Systems
- Design and implement robust data processing pipelines using Apache Spark, Flink, and Kafka for terabyte-scale industrial datasets.
- Build efficient APIs and services supporting thousands of concurrent users with sub-second response times.
- Optimize data storage and retrieval for time-series, sensor, and operational data.
- Implement advanced caching strategies leveraging Redis and in-memory data structures.
Distributed Processing Excellence
- Engineer Spark applications with a focus on the Catalyst optimizer, partitioning strategies, and performance tuning.
- Develop real-time streaming solutions processing millions of events per second with Kafka and Flink.
- Design efficient data lake architectures on S3/GCS with optimized partitioning and file formats (Parquet, ORC).
- Implement query optimization techniques for OLAP datastores such as ClickHouse, Pinot, or Druid.
Scalability and Performance
- Scale systems to handle 10K+ QPS while ensuring high availability and data consistency.
- Optimize JVM performance through advanced garbage collection tuning and memory management.
- Implement comprehensive monitoring using Prometheus, Grafana, and distributed tracing.
- Design fault-tolerant architectures incorporating circuit breakers and retry mechanisms.
Technical Innovation
- Contribute to open-source projects within the big data ecosystem (e.g., Spark, Kafka, Airflow).
- Research and prototype new technologies to address industrial data challenges.
- Collaborate with product teams to translate complex requirements into scalable technical solutions.
- Participate actively in architectural reviews and technical design discussions.
Requirements
Distributed Systems Experience (4-6 years)
- Production Spark expertise: Proven experience building and optimizing large-scale Spark applications with a deep understanding of internals.
- Streaming systems proficiency: Experience implementing real-time data processing using Kafka, Flink, or Spark Streaming.
- JVM Language expertise: Strong programming skills in Java, Scala, or Kotlin, with a focus on performance optimization.
Data Platform Foundations (3+ years)
- Big data storage systems: Hands-on experience with data lakes, columnar formats, and table formats (e.g., Iceberg, Delta Lake).
- OLAP query engines: Experience with Presto/Trino, ClickHouse, Pinot, or similar high-performance analytical databases.
- ETL/ELT pipeline development: Experience building robust data transformation pipelines using tools like dbt, Airflow, or custom frameworks.
Infrastructure and Operations
- Kubernetes production experience: Experience deploying and operating containerized applications in production environments.
- Cloud platform proficiency: Hands-on experience with AWS, Azure, or GCP data services.
- Monitoring and observability: Experience implementing comprehensive logging, metrics, and alerting for data systems.
Technical Depth Indicators
- Performance Engineering: Proven system optimization experience, delivering measurable performance improvements (e.g., 2x+ throughput gains).
- Resource efficiency: Experience optimizing systems for cost while meeting performance requirements.
- Concurrency expertise: Experience designing thread-safe, high-concurrency data processing systems.
Data Engineering Best Practices
- Data quality frameworks: Experience implementing validation, testing, and monitoring for data pipelines.
- Schema evolution: Experience managing backward-compatible schema changes in production systems.
- Data modeling expertise: Experience designing efficient schemas for analytical workloads.
Collaboration and Growth
- Technical Collaboration: Ability to partner effectively with product managers, ML engineers, and data scientists.
- Code review excellence: Commitment to providing thoughtful technical feedback and maintaining high code quality.
- Documentation and knowledge sharing: Experience creating technical documentation and facilitating knowledge transfer.
- Continuous Learning: Aptitude for quickly learning and applying new technologies.
- Industry awareness: Staying current with big data ecosystem developments and best practices.
- Problem-solving approach: Demonstrating a systematic approach to debugging complex distributed system issues.
Startup Mindset
- Execution Excellence: Proven ability for rapid delivery of high-quality features.
- Technical pragmatism: Skill in making informed trade-offs between technical debt, velocity, and reliability.
- End-to-end ownership: Taking responsibility for features from design through production and monitoring.
- Ambiguity comfort: Thriving in environments with evolving requirements.
- Technology flexibility: Adaptability to new tools and frameworks.
- Customer focus: Understanding the impact of technical decisions on user experience and business metrics.
Bonus Points
- Open-source contributions to major Apache projects in the data space (e.g., Apache Spark or Kafka).
- Conference speaking or technical blog writing experience.
- Industrial domain knowledge: Prior experience with IoT, manufacturing, or operational technology systems.
Primary Technologies (Technical Stack)
- Languages: Kotlin, Scala, Python, Java.
- Big Data: Apache Spark, Apache Flink, Apache Kafka.
- Storage: PostgreSQL, ClickHouse, Elasticsearch, S3-compatible systems.
- Infrastructure: Kubernetes, Docker, Terraform.
Technologies You May Work With
- Table Formats: Apache Iceberg, Delta Lake, Apache Hudi.
- Query Engines: Trino/Presto, Apache Pinot, DuckDB.
- Orchestration: Apache Airflow, Dagster.
- Monitoring: Prometheus, Grafana, Jaeger, ELK Stack.
Company
Cognite
Cognite: Digitalizing the Industrial World
Cognite is a leading global industrial Software-as-a-Service (SaaS) provider dedicated to the digital transformation of asset-intensive industries. We develop...