- Job Type: Full-Time
- Function: IT
- Industry: Fintech
- Post Date: 05/18/2026
- Website: www.zeta.tech/us
- Company Address: San Francisco, California, US
About Zeta
Founded in 2015, Zeta is a provider of a next-gen credit card processing platform. Zeta’s cloud-native and fully API-enabled stack offers a comprehensive range of capabilities, including processing, issuing, lending, core banking, fraud detection, and loyalty programs.
Job Description
About the Role:
As a Senior Data Reliability Engineer, you will be responsible for architecting, scaling, and optimizing enterprise-grade data platforms, including large-scale data lakes and data warehouses built from multiple disparate data sources. This role requires deep expertise in cloud databases, data infrastructure reliability, observability, and automation, with a strong focus on operational excellence, performance, and resilience.
Responsibilities:
- Own the reliability, availability, scalability, and performance of PostgreSQL RDS environments across production and non-production systems.
- Lead proactive monitoring and observability initiatives for PostgreSQL RDS instances, leveraging tools such as CloudWatch, Prometheus, Grafana, and other enterprise monitoring platforms.
- Drive advanced PostgreSQL performance tuning, including query optimization, indexing strategies, parameter tuning, and capacity planning.
- Architect and optimize database backup, disaster recovery, and failover strategies to ensure business continuity and minimal downtime.
- Own the reliability and operational excellence of Debezium and Kafka Connect ecosystems, ensuring robust real-time data ingestion and delivery.
- Lead troubleshooting and optimization of ETL workflows and data pipelines, ensuring scalability, reliability, and fault tolerance across data platforms.
- Oversee Apache Airflow workflow orchestration, ensuring high reliability, SLA adherence, and operational efficiency of production DAGs.
- Design and implement Infrastructure as Code (IaC) solutions using tools such as Terraform, Crossplane, and automation frameworks to streamline deployments and operational tasks.
- Lead incident response, root cause analysis, and post-incident reviews for critical production issues.
- Define and enforce database security standards, including access controls, encryption policies, compliance adherence, and periodic security audits.
- Partner closely with engineering, DevOps, and data platform teams to optimize data architecture and improve overall platform reliability.
- Mentor junior engineers and drive best practices across database reliability engineering and cloud data operations.
- Identify and lead continuous improvement initiatives focused on reliability, automation, scalability, and operational maturity.
Skills:
- Deep expertise in PostgreSQL administration and performance tuning, preferably in AWS RDS environments.
- Strong experience with Debezium, Kafka Connect, ETL frameworks/tools, and enterprise-grade data pipeline architectures.
- Strong hands-on experience with Amazon Redshift, S3, and cloud-native data platforms.
- Expertise in Apache Airflow workflow orchestration and operational management.
- Experience with Apache Spark and large-scale distributed data processing.
- Strong scripting and automation experience using Python, Bash, or similar languages.
- Strong experience in Infrastructure as Code (IaC) using Terraform, Crossplane, or equivalent tools.
- Hands-on experience with monitoring and observability tools such as CloudWatch, Prometheus, and Grafana.
- Strong understanding of cloud database security, compliance, and governance frameworks (e.g., GDPR, HIPAA).
- Experience designing highly available, fault-tolerant, and scalable cloud database systems.
Experience and Qualifications:
- Bachelor’s degree in Computer Science, Information Technology, or a related field (master’s preferred).
- 10–12 years of overall experience in database engineering, cloud data infrastructure, or reliability engineering.
- Minimum of 5 years of hands-on experience with PostgreSQL, including AWS RDS administration.
- Strong experience in cloud-native data platforms and enterprise-scale production environments.
- AWS Certified Database - Specialty or relevant cloud certifications preferred.