Edward Oo

DevOps / Site Reliability Engineer (SRE)
📍Taipei, Taiwan

About Me

I'm Edward Oo, a passionate DevOps and Site Reliability Engineer based in Taipei, Taiwan.
I specialize in modernizing infrastructure, migrating legacy systems, and streamlining operations through CI/CD pipelines, with a focus on high availability, reliability, and performance.
My strengths include workflow automation, observability, and custom tool development, reflecting a commitment to operational excellence and innovation.
I excel at solving complex challenges and collaborating with cross-functional teams, including developers, operations, and project managers, to deliver resilient and efficient systems.

Projects

  • BlahDNS

    Adblock secure DNS resolver

    460

  • Stacks

    Languages: Bash, JavaScript
    Monitoring: Amazon CloudWatch, Grafana, Prometheus
    Clouds: AWS
    DevOps: GitLab, GitHub, AWS CloudFormation, Terraform, AWS CDK, SonarCloud, PagerDuty, Zapier

    Work Experience

    • KKCompany, KKStream (BlendVision) Senior Site Reliability Engineer (SRE) | Full time

      Sep 2022 - Present

      • DevOps to strengthen service reliability
        • Revamped CI/CD flows to enable seamless deployments during large-scale migrations, minimizing downtime and disruptions.
        • Upgraded legacy PHP 5.4 applications and Ubuntu 14.04/18.04 hosts, reducing P0 alarms related to outdated systems and dependencies.
        • Built and maintained Golden Images, standardizing environments and accelerating CI environment setup.
        • Migrated the Postfix mail server to Amazon SES, improving email reliability and scalability.
        • Transitioned from Logstash to Fluent Bit, aligning development and operations logging workflows for improved observability.
        • Developed and maintained Slack notification tools, enhancing operational visibility and incident communication.
        • Automated weekly and monthly service latency and SLA reports, reducing manual overhead.
        • Implemented Akamai CDN usage monitoring, optimizing content delivery performance.
        • Migrated legacy CloudFormation stacks and Chef cookbooks to Terraform, streamlining infrastructure management.
      • Enhanced System Reliability and Migration Success
        • Transitioned from Classic Load Balancer (CLB) to Application Load Balancer (ALB), increasing service resilience and observability.
        • Collaborated with backend teams to migrate OpsWorks EC2 stacks to ECS Fargate, reducing false alarms by 20% and cutting P0 incident recovery time by 50%.
        • Refactored SLA reporting workflows with Lambda and CloudWatch, reducing manual effort and improving reporting accuracy.
        • Monitored AWS RDS and Redis cluster usage, maintaining high availability and performance.
      • Infrastructure Optimization
        • Developed maintenance mode to prevent unexpected interruptions during deployments and upgrades.
        • Maintained AWS RDS and Redis clusters with regular maintenance and security updates.
        • Migrated infrastructure management from CloudFormation to Terraform, accelerating infrastructure deployment cycles.
        • Implemented pre-warm workflows for ECS Fargate services and RDS clusters, reducing cold start times and improving availability.
      • Observability
        • Maintained several monitoring stacks, including Prometheus and CloudWatch.
        • Gradually rolled out OpenTelemetry (OTEL) across all services, improving system observability.
        • Deployed node-level and pod-level observability for video encoder jobs, enhancing debugging and performance monitoring.
        • Implemented S3 object tagging and alarm monitoring for better resource management and cost tracking.
        • Used Prometheus servers and exporters with Amazon Managed Service for Prometheus and Alertmanager for central alerting and metrics storage across multiple projects, with visualization in Grafana and CloudWatch.
    • CoolbitX DevOps / Site Reliability Engineer | Full time

      Dec 2019 - Sep 2022

      • Logging, Observability and Monitoring
        • Developed in-house tools, including golden images, Slack bots, a changelog generator, semantic release automation, AWS CDK templates, linters, and custom resources, streamlining DevOps processes.
        • Managed Prometheus, Grafana, and Loki monitoring stacks with Terraform on Google Cloud Platform (GKE).
      • Performance and Security Improvements
        • Optimized the China site's browsing experience by doubling its speed through networking enhancements.
        • Deployed Cloudflare WAF and DDoS mitigation for robust API and site security.
      • CI/CD, Infrastructure, and Deployment Advancements
        • Designed and managed architectures using AWS CDK and Terraform, including ECS, RDS, Lambda, API Gateway, DynamoDB, and CI/CD pipelines.
        • Ensured High Availability (HA) and Disaster Recovery (DR) with Multi-AZ deployments.
        • Established Cloudflare WAF using best practices to safeguard API endpoints and websites.
        • Configured Cloudflare Workers for user traffic logging with Sentry, endpoint health checks, and HA load balancing.
        • Deployed services ensuring Multi-AZ deployments, High Availability (HA), and Disaster Recovery (DR).
        • Created in-house linter tools, a container image builder, Git hooks, and AWS CDK templates, constructs, and custom resources.
      • DevSecOps and Automation
        • Consistently applied best practices in permissions and architecture.
        • Built a Slack bot with AWS Lambda for pre-deployment checks, improving production readiness.
        • Fostered a DevSecOps culture within DevOps workflows, accelerating delivery and incorporating vulnerability scans, configuration checks, and container hardening.

    Education

    • Master's degree, Interactive Media Design, National Taipei University of Technology (NTUT) | GPA 3.75

      Sep 2015 - Jun 2019

      Communication Design
      Human-Computer Interaction
      MaxMSP with Myo Armband computable stage lighting performance system
    • University of Applied Sciences Potsdam, Germany

      2016 - 2017 | Exchange semester

      Interface Design
      Human-Computer Interaction (HCI)
    • Information Communication, Bachelor of Science, MingDao University, Taiwan

      Sep 2011 - Jun 2015

      President of Inline skate club
      Vice president of E-learning volunteer
      Class leader for 4 years

    Languages

    Chinese (Native speaker), English (Fluent)

    Last updated on May 9, 2025