
Database Administrator
Experience Level: Mid Level
Full Job Description
We are seeking a Cloud & Platform Database Administrator/Database Reliability Engineer to manage, optimize, and secure both managed and self-managed databases that power high-traffic e-commerce websites. This role will primarily focus on AWS and GCP environments, with support for a smaller on-premises and hosted database estate. Your mission will be to ensure high availability, optimal performance, robust security and compliance, and cost efficiency, especially during peak operational periods.
Key Responsibilities:
Operating Managed Databases (Cloud):
- Manage AWS RDS/Aurora (MySQL/PostgreSQL), Google Cloud SQL, DynamoDB, MongoDB, DB2, and Oracle.
- Perform indexing, query plan analysis, and schema design for OLTP workloads, and tune connection pools and concurrency.
Building and Supporting Self-Managed Databases (Hosted/On-Prem/VMs):
- Administer Oracle 19c, IBM DB2 LUW 9.5–11.5, MySQL/MariaDB, PostgreSQL, and MongoDB on Linux/Unix/Windows.
- Handle installation, patching, upgrades, parameter/kernel tuning, storage layout (e.g., ASM/LVM), and backup/restore operations.
Ensuring High Availability, Resilience, and Disaster Recovery:
- Implement cloud-based HA/DR solutions including Multi-AZ, read replicas, Aurora Global Database, cross-region replication, RDS Proxy, and PITR/snapshots, with tested RTO/RPO targets.
- Utilize self-managed HA/DR solutions like Oracle Data Guard (and/or RAC), DB2 HADR, MySQL replication/Group Replication/XtraDB Cluster, and MongoDB replica sets/sharding, along with backup utilities such as RMAN, XtraBackup, and db2 backup.
- Support production and non-production environments, including data backup, restore, and obfuscation of sensitive data.
- Develop SQL and PL/SQL code.
Automation and Change Management:
- Utilize Terraform/CloudFormation for cloud infrastructure and Ansible for self-managed environments, implementing GitOps workflows.
- Implement CI/CD pipelines for database changes using tools like Flyway/Liquibase, establish standards, perform drift detection, and automate patching and failover.
Observability and Performance Engineering:
- Implement and utilize monitoring tools such as CloudWatch/Cloud Monitoring, RDS Performance Insights, pg_stat_statements, MySQL Performance Schema, Oracle AWR/ASH/Statspack/OEM, DB2 MON/Explain, and Datadog/New Relic/Prometheus/Grafana.
- Conduct capacity planning and load/soak testing for peak events (e.g., Black Friday/Cyber Monday), define SLOs for latency/availability/throughput, and perform post-incident reviews.
Security, Compliance, and Governance:
- Implement IAM least privilege, secrets management (AWS Secrets Manager/HashiCorp Vault), and audit logging.
- Ensure compliance with PCI-DSS and GDPR controls, manage data retention/masking, and oversee change control and approvals.
Cost and Capacity Stewardship:
- Optimize cloud costs through right-sizing instances/storage/IOPS, managing reserved vs. on-demand/serverless choices, DynamoDB capacity modes, and cost anomaly detection.
- Develop capacity forecasting and scaling plans for self-managed databases, with basic awareness of enterprise platform licensing.
Architecture Collaboration:
- Collaborate on schema evolution, query patterns, caching strategies (Redis/Memcached), and read/write segregation.
- Provide guidance on ORM usage, connection pools, and patterns for microservices and event-driven systems (Kafka/Kinesis/Pub/Sub).
Operations and Ways of Working:
- Partner effectively with development, support, and architecture teams, maintaining clear runbooks and documentation.
- Participate in a 24/7 on-call rotation, leading incident response and implementing preventative actions.
- Plan and support migrations between self-managed and managed services, with validation and rollback strategies in place.
Required Skills and Experience:
- Hands-on production experience with managed relational databases in AWS or GCP (RDS/Aurora or Cloud SQL) using MySQL or PostgreSQL.
- Experience administering at least one self-managed enterprise RDBMS in Linux/Unix environments (Oracle or IBM DB2 LUW), or the ability to quickly acquire such skills.
- Strong performance tuning skills, including execution plan analysis, indexing strategy, slow query analysis, and concurrency/connection management.
- Expertise in High Availability (HA) and Disaster Recovery (DR), including replication, failover/runbooks, backups, and point-in-time recovery, with a proven ability to set and meet RTO/RPO targets.
- An automation mindset, with experience in Terraform/CloudFormation, Ansible, CI/CD for schema changes (Flyway/Liquibase), Git, and scripting in Python/Bash.
- Solid understanding of security and compliance fundamentals, including IAM, encryption (in transit/at rest), auditing, and familiarity with PCI-DSS and GDPR.
- Excellent communication and collaboration skills, with the ability to mentor teams and document best practices.
Desirable Skills:
- Deep knowledge of Oracle (RAC, Data Guard), DB2 HADR, or advanced PostgreSQL/MySQL internals.
- Experience with MongoDB Atlas and on-prem MongoDB (Ops Manager/backup, replica sets/sharding).
- Familiarity with Redis/Memcached (ElastiCache/Memorystore), Kafka/Kinesis/Pub/Sub, or Kubernetes-based application stacks.
- Exposure to analytics platforms (BigQuery/Snowflake) for operational reporting.
- Proven experience migrating legacy database estates to managed services with minimal downtime.
- Ability to participate in an on-call/out-of-hours rotation with appropriate compensation.
This role offers a hybrid working model, with regular collaboration expected at our Watford office and occasional client site visits.
Company
VML
VML, a part of WPP, is a globally recognized creative company. We specialize in integrating brand experience, customer experience, and commerce to build connected brands that drive growth. Our award-w...