Edward Oo Résumé

About Me

Self-taught DevOps Engineer with over 5 years of experience in system operations and automation, specializing in AWS cloud architecture.
Recently focused on learning and implementing Kubernetes. Passionate about managing infrastructure through Infrastructure as Code (IaC), optimizing CI/CD pipelines, and enhancing system reliability.
Skilled in cross-functional collaboration, solving complex technical challenges, and driving efficient operational workflows.

Stacks

IaC

AWS CloudFormation, Terraform, AWS CDK

Languages

Bash, JavaScript, Python, YAML

Monitoring

AWS CloudWatch, Prometheus, Grafana, Loki, OpenTelemetry

Clouds

AWS, Cloudflare, Vultr, Hetzner

CI/CD

GitHub Actions, GitLab CI/CD, AWS CodeBuild

Containers

Docker, ECS Fargate, Kubernetes (EKS)

DevOps

Cloudflare WAF, Cloudflare Workers, SonarCloud, PagerDuty, Zapier, Jira, HetrixTools

Work Experience

Trend Micro Sr. DevOps Engineer | Full time

Sep 2025 - Present
- AIOps and FinOps Enablement
  - Focused on AIOps and FinOps practices to improve service reliability, capacity planning, and cloud cost visibility.
  - Built metrics-driven operational workflows to support faster incident triage and better budget control.
- Platform Engineering and Delivery
  - Maintained CI/CD workflows with GitHub Actions for infrastructure and application delivery.
  - Maintained infrastructure as code with Terraform for repeatable provisioning and policy-aligned changes.
  - Supported operations of EKS multi-tenant clusters with Istio service mesh for traffic governance and tenant isolation.
- Observability and Dashboarding
  - Maintained observability stacks with Prometheus, Thanos, Loki, and Grafana dashboards.
  - Maintained monitoring and alerting views to improve troubleshooting efficiency across teams.
KKCompany Senior Site Reliability Engineer (SRE) | Full time

Sep 2022 - Aug 2025
- DevOps to strengthen service reliability
  - Converted CI/CD scripts into GitOps flows, enabling seamless deployments during large-scale migrations and minimizing downtime and disruptions.
  - Upgraded legacy PHP 5 runtimes and Ubuntu 14.04/18.04 systems, reducing P0 alarms caused by outdated systems and dependencies.
  - Migrated legacy CloudFormation stacks and Chef cookbooks to Terraform, streamlining infrastructure management.
  - Built and maintained golden images, standardizing environments and accelerating CI environment setup.
  - Migrated the Postfix mail server to Amazon SES, improving email reliability and scalability.
  - Transitioned from Logstash to Fluent Bit, aligning development and operations logging workflows for improved observability.
  - Collaborated with backend teams to migrate OpsWorks EC2 stacks to ECS Fargate, reducing false alarms by 20% and cutting P0 incident recovery time by 30%.
- Enhanced System Reliability
  - Maintained Slack notification tools, enhancing operational visibility and incident communication.
  - Automated weekly and monthly service latency and SLA reports, reducing manual overhead.
  - Implemented Akamai CDN usage monitoring, optimizing content delivery performance.
  - Transitioned from Classic Load Balancer (CLB) to Application Load Balancer (ALB), increasing service resilience and observability.
  - High-Traffic Pre-Warming Mechanism Design: Collaborated with developers and PMs to design pre-warming and auto-scaling mechanisms during peak events (e.g., sports livestreams, baseball games). Dynamically allocated instance types and counts based on traffic forecasts to ensure system availability.
  - Automated a maintenance mode that activates IP whitelisting and redirects users to a static maintenance page, enabling system upgrades, maintenance, and testing without service disruption.
  - Refactored SLA reporting workflows with Lambda and CloudWatch, reducing manual effort and improving reporting accuracy.
  - Monitored usage of Amazon RDS and Redis clusters and maintained them for high availability and performance.
  - Kubernetes Operations: Responsible for routine upgrades and version testing of EKS clusters, as well as maintenance of Prometheus Server and Amazon Managed Service for Prometheus (AMP).
  - Alerting and Notification Management: Managed Alertmanager configurations and routing strategies; integrated Amazon SNS topics for alert delivery and coordinated with development teams to enforce SLOs and ensure timely incident reporting.
- Observability
  - Ran a proof of concept (PoC) of OpenTelemetry (OTel) with the LGTM stack to improve system observability.
  - Implemented S3 object tagging and alarm monitoring for better resource management and cost tracking.
CoolbitX DevOps / Site Reliability Engineer | Full time

Dec 2019 - Sep 2022
- Logging, Observability and Monitoring
  - Developed in-house tools, including golden images, Slack bots, a changelog generator, semantic release, AWS CDK templates, linters, and custom resources, streamlining DevOps processes.
  - Maintained Prometheus, Grafana, and Loki monitoring stacks on Google Cloud Platform (GKE).
- Performance and Security Improvements
  - Doubled the China site's browsing speed through networking enhancements.
  - Deployed Cloudflare WAF and DDoS mitigation for robust API and site security.
- CI/CD, Infrastructure, and Deployment Advancements
  - Designed and managed architectures using AWS CDK and Terraform, including ECS, RDS, Lambda, API Gateway, DynamoDB, and CI/CD pipelines.
  - Ensured High Availability (HA) and Disaster Recovery (DR) with Multi-AZ deployments.
  - Established Cloudflare WAF using best practices to safeguard API endpoints and websites.
  - Configured Cloudflare Workers for user traffic logging with Sentry, endpoint health checks, and HA load balancing.
  - Deployed services ensuring Multi-AZ deployments, High Availability (HA), and Disaster Recovery (DR).
  - Created in-house linter tools, container image builder, Git hooks, AWS CDK templates, constructs, and custom resources.
- DevSecOps and Automation
  - Consistently implemented best practices in permissions and architecture.
  - Built a Slack bot with AWS Lambda for pre-deployment checks, improving production readiness.
  - Fostered a DevSecOps culture within DevOps workflows, accelerating delivery and incorporating vulnerability scans, configuration checks, and container hardening.

Projects

BlahDNS: Adblock secure DNS resolver 473

Datetime: Datetime API

IP Resolve: IP Resolve API

Text Compare: Text Compare

Education

Master's degree, Interactive Media Design, National Taipei University of Technology (NTUT) | GPA 3.75
Sep 2015 - Jun 2019
Communication Design

Human-Computer Interaction

MaxMSP with Myo Armband computable stage lighting performance system
University of Applied Sciences Potsdam, Germany
2016 - 2017 | Exchange semester
Interface Design

Human-Computer Interaction (HCI)
Information Communication, Bachelor of Science, MingDao University, Taiwan
Sep 2011 - Jun 2015
President of Inline skate club

Vice president of E-learning volunteer

Class leader for 4 years

Talks

How to build AWS like Loadbalancer with Cilium + BGP + HAProxy + WireGuard | CloudNative Taiwan User Group (CNTUG)
Jun 2025, Event page
AppWorks School SRE tutor
Dec 2023 - Apr 2024, Event page
GraphQL with Redis for DNS RPZ, GraphQL Taipei Meetup
Dec 14, 2023, Event page, LinkedIn Post
How we implement Semantic Changelog for our team, DevOps Taiwan Community
Apr 21, 2021, Event page
Talked about semantic release and the benefits of generating a changelog.
Automate a CI/CD pipeline with SlackBot Interaction with AWS CDK, Taiwan CDK Meetup
Aug 12, 2020, Event page
Talked about managing a CI/CD flow via Slack with an approval feature, implemented with AWS CDK.

Volunteer Service

AWS Community Day, Taiwan
Sept 2025
Kubernetes Community Days Taipei 2025

July 2025
Contributor - KCD Taipei 2025

2025
DevOpsDays Taipei 2025

June 2025
AWS Community Day, Taiwan
Sept 2024
Taiwan Digital Nomad Conference 台灣數位遊牧者大會 2024

August 03, 2024

DevOpsDays Taipei 2024
July 10-11, 2024
WordCamp Asia 2024
Archive
March 07-09, 2024
WordCamp Taiwan 2023
Archive
October 2023
AWS Community Day, Taiwan | Archive 1
August 2023
World Design Capital Taipei, Taiwan
2016
TEDxDadun, Taichung City, Taiwan
Oct 2015
E-service volunteer in Taiwan Aboriginal Tribe
July 2012 - May 2015
YMCA volunteer service in Quanzhou, Fujian, China
July 2013

Community

Winter Wizardamigos code camp, Host / Speaker.
Jan 2018 - Feb 2018
Wizardamigos JavaScript Weekly Meetup, Host / Speaker.
2017 - 2020

Certifications

KCNA: Kubernetes and Cloud Native Associate
Expires Jan 6, 2028
KCSA: Kubernetes and Cloud Native Security Associate
Expires Jan 6, 2028
AWS Certified Developer – Associate, Amazon Web Services
Aug 2021 - Aug 2024

Languages

Chinese (Native speaker), English (Fluent)

Last updated on Feb 19, 2026