Edward Oo

DevOps / Site Reliability Engineer (SRE)
📍Taipei, Taiwan | Open to full-time remote work or relocation to European countries.

About Me

Self-taught DevOps Engineer with over 5 years of experience in system operations and automation, specializing in AWS cloud architecture.
Recently focused on learning and implementing Kubernetes. Passionate about managing infrastructure through Infrastructure as Code (IaC), optimizing CI/CD pipelines, and enhancing system reliability.
Skilled in cross-functional collaboration, solving complex technical challenges, and driving efficient operational workflows.

Stacks

IaC
AWS CloudFormation, Terraform, AWS CDK
Languages
Bash, JavaScript, Python, YAML
Monitoring
AWS CloudWatch, Prometheus, Grafana, Loki, OpenTelemetry
Clouds
AWS, Cloudflare, Vultr, Hetzner
CI/CD
GitHub Actions, GitLab CI/CD, AWS CodeBuild
Containers
Docker, ECS Fargate, Kubernetes (EKS)
DevOps
Cloudflare WAF, Cloudflare Workers, SonarCloud, PagerDuty, Zapier, Jira, HetrixTools

Work Experience

  • Trend Micro Sr. DevOps Engineer | Full time

    Sep 2025 - Present

    • AIOps and FinOps Enablement
      • Focused on AIOps and FinOps practices to improve service reliability, capacity planning, and cloud cost visibility.
      • Built metrics-driven operational workflows to support faster incident triage and better budget control.
    • Platform Engineering and Delivery
      • Maintained CI/CD workflows with GitHub Actions for infrastructure and application delivery.
      • Maintained infrastructure as code with Terraform for repeatable provisioning and policy-aligned changes.
      • Supported operations of EKS multi-tenant clusters with Istio service mesh for traffic governance and tenant isolation.
    • Observability and Dashboarding
      • Maintained observability stacks with Prometheus, Thanos, Loki, and Grafana dashboards.
      • Maintained monitoring and alerting views to improve troubleshooting efficiency across teams.
  • KKCompany Senior Site Reliability Engineer (SRE) | Full time

    Sep 2022 - Aug 2025

    • DevOps for Service Reliability
      • Migrated CI/CD scripts to GitOps workflows, enabling seamless deployments during large-scale migrations and minimizing downtime and disruptions.
      • Upgraded systems running legacy PHP 5 and Ubuntu 14/18, reducing P0 alarms caused by outdated systems and dependencies.
      • Migrated legacy CloudFormation stacks and Chef cookbooks to Terraform, streamlining infrastructure management.
      • Built and maintained Golden Images, standardizing environments and accelerating CI environment setup.
      • Migrated the Postfix mail server to Amazon SES, improving email reliability and scalability.
      • Transitioned from Logstash to Fluent Bit, aligning development and operations logging workflows for improved observability.
      • Collaborated with backend teams to migrate OpsWorks EC2 stacks to ECS Fargate, reducing false alarms by 20% and cutting P0 incident recovery time by 30%.
    • Enhanced System Reliability
      • Maintained Slack notification tools, enhancing operational visibility and incident communication.
      • Automated weekly and monthly service latency and SLA reports, reducing manual overhead.
      • Implemented Akamai CDN usage monitoring, optimizing content delivery performance.
      • Transitioned from Classic Load Balancer (CLB) to Application Load Balancer (ALB), increasing service resilience and observability.
      • High-Traffic Pre-Warming Mechanism Design: Collaborated with developers and PMs to design pre-warming and auto-scaling mechanisms during peak events (e.g., sports livestreams, baseball games). Dynamically allocated instance types and counts based on traffic forecasts to ensure system availability.
      • Automated a maintenance mode that activates IP whitelisting and redirects users to a static maintenance page, allowing system upgrades, maintenance, and testing without unplanned service disruption.
      • Refactored SLA reporting workflows with Lambda and CloudWatch, reducing manual effort and improving reporting accuracy.
      • Monitored and maintained AWS RDS and Redis clusters, ensuring high availability and performance.
      • Kubernetes Operations: Responsible for routine upgrades and version testing of EKS clusters, as well as maintenance of Prometheus Server and Amazon Managed Prometheus (AMP).
      • Alerting and Notification Management: Managed Alertmanager configurations and routing strategies; integrated AWS SNS topics for alert delivery and coordinated with development teams to enforce SLOs and ensure timely incident reporting.
    • Observability
      • Ran a proof of concept (PoC) of OpenTelemetry (OTel) with the LGTM stack to improve system observability.
      • Implemented S3 object tagging and alarm monitoring for better resource management and cost tracking.
  • CoolbitX DevOps / Site Reliability Engineer | Full time

    Dec 2019 - Sep 2022

    • Logging, Observability and Monitoring
      • Developed in-house tools, including golden images, Slack bots, a changelog generator, semantic release, AWS CDK templates, linters, and custom resources, streamlining DevOps processes.
      • Maintained Prometheus, Grafana, and Loki monitoring stacks on Google Kubernetes Engine (GKE).
    • Performance and Security Improvements
      • Doubled the China site's browsing speed through networking enhancements.
      • Deployed Cloudflare WAF and DDoS mitigation for robust API and site security.
    • CI/CD, Infrastructure, and Deployment Advancements
      • Designed and managed architectures using AWS CDK and Terraform, including ECS, RDS, Lambda, API Gateway, DynamoDB, and CI/CD pipelines.
      • Ensured High Availability (HA) and Disaster Recovery (DR) with Multi-AZ deployments.
      • Established Cloudflare WAF using best practices to safeguard API endpoints and websites.
      • Configured Cloudflare Workers for user traffic logging with Sentry, endpoint health checks, and HA load balancing.
      • Created in-house linter tools, container image builder, Git hooks, AWS CDK templates, constructs, and custom resources.
    • DevSecOps and Automation
      • Consistently implemented best practices in permissions and architecture.
      • Built a Slack bot with AWS Lambda for pre-deployment checks, improving production readiness.
      • Fostered a DevSecOps culture within DevOps workflows, accelerating delivery and incorporating vulnerability scans, configuration checks, and container hardening.

Projects

  • BlahDNS Adblock secure DNS resolver 473

    Datetime Datetime API

    IP Resolve IP Resolve API

    Text Compare Text Compare

Education

    • Master's degree, Interactive Media Design, National Taipei University of Technology (NTUT) | GPA 3.75

      Sep 2015 - Jun 2019

      Communication Design
      Human-Computer Interaction
      MaxMSP with Myo Armband computable stage lighting performance system
    • University of Applied Sciences Potsdam, Germany

      2016 - 2017 | Exchange semester

      Interface Design
      Human-Computer Interaction (HCI)
    • Bachelor of Science, Information Communication, MingDao University, Taiwan

      Sep 2011 - Jun 2015

      President of the inline skating club
      Vice president of the e-learning volunteer team
      Class leader for 4 years

    Languages

    Chinese (native speaker), English (fluent)

    Last updated on Feb 19, 2026