Monitoring and Cost Optimization

Cost Optimization Framework

Level 1: Basic Monitoring (Days 1-7)

Basic setup:

  1. AWS Budgets

    • Create budgets so that AWS usage stays within safe limits and overspending can be handled promptly:
      • Budget 1: $50/month (Alert at 80%)
      • Budget 2: $25/month (Alert at 50%)
      • Budget 3: $10/day (Alert at 100%)
  2. Billing Alerts

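    • Create CloudWatch alarms on estimated charges (billing metrics are published in us-east-1 and must first be enabled in the billing preferences):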
      • $25 threshold → Email warning
      • $50 threshold → Email + SMS
      • $75 threshold → Email + SMS + Slack
  3. Cost Explorer

    • Enable daily cost reports
    • Set up service-level breakdown
    • Monitor top 5 cost drivers
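
The three budgets listed under AWS Budgets above could also be created from code rather than in the console. The following is a minimal sketch, assuming the boto3 Budgets client and email subscribers; the budget names, the account ID, and the email address are hypothetical placeholders, not values from this guide.

```python
def build_budget_request(account_id, name, amount_usd, time_unit, alert_percent, email):
    """Build keyword arguments for the boto3 Budgets `create_budget` call."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": name,
            "BudgetLimit": {"Amount": str(amount_usd), "Unit": "USD"},
            "TimeUnit": time_unit,  # "DAILY" or "MONTHLY"
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": float(alert_percent),
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
            }
        ],
    }


# (hypothetical name, limit in USD, time unit, alert threshold in percent)
BUDGETS = [
    ("budget-1-monthly-50", 50, "MONTHLY", 80),
    ("budget-2-monthly-25", 25, "MONTHLY", 50),
    ("budget-3-daily-10", 10, "DAILY", 100),
]


def create_all_budgets(account_id, email):
    """Create every budget in BUDGETS; requires AWS credentials."""
    import boto3  # imported lazily so the builder above works without AWS access

    client = boto3.client("budgets")
    for name, amount, unit, percent in BUDGETS:
        client.create_budget(
            **build_budget_request(account_id, name, amount, unit, percent, email)
        )
```

Calling `create_all_budgets("123456789012", "owner@example.com")` would submit all three requests. Keeping the request builder separate from the API call makes the thresholds easy to review before anything is created.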

Level 2: Advanced Analytics (Weeks 2-4)

Deep Dive Analysis:

  1. Custom CloudWatch Metrics

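    from datetime import datetime, timedelta, timezone

    import boto3

    def publish_cost_metrics():
        ce = boto3.client('ce')
        cw = boto3.client('cloudwatch')

        # Query yesterday's cost (UTC); the End date is exclusive
        today = datetime.now(timezone.utc).date()
        yesterday = today - timedelta(days=1)
        response = ce.get_cost_and_usage(
            TimePeriod={
                'Start': yesterday.isoformat(),
                'End': today.isoformat()
            },
            Granularity='DAILY',
            Metrics=['BlendedCost']
        )

        cost = float(response['ResultsByTime'][0]['Total']['BlendedCost']['Amount'])

        # Publish to CloudWatch under a custom namespace
        # (the "AWS/" prefix is reserved for AWS service namespaces)
        cw.put_metric_data(
            Namespace='Custom/Billing',
            MetricData=[
                {
                    'MetricName': 'DailyCost',
                    'Value': cost,
                    'Unit': 'None'
                }
            ]
        )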
    
  2. Resource Tagging Strategy

    Mandatory Tags:
    - Project: project-name
    - Environment: dev/staging/prod
    - Owner: email@domain.com
    - CostCenter: department
    - AutoShutdown: true/false
    - CreatedDate: YYYY-MM-DD
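
A tagging strategy only helps cost allocation if it is enforced. As a rough sketch, the mandatory tags above could be checked with a small validator like the one below. The function name `validate_tags` and the exact validation rules (a simple email pattern, lowercase `true`/`false`) are assumptions made for illustration, not part of the original strategy.

```python
import re
from datetime import datetime

MANDATORY_TAGS = ["Project", "Environment", "Owner", "CostCenter", "AutoShutdown", "CreatedDate"]

# Tags whose values are restricted to a fixed set
ALLOWED_VALUES = {
    "Environment": {"dev", "staging", "prod"},
    "AutoShutdown": {"true", "false"},
}


def validate_tags(tags):
    """Return a list of problems in a {key: value} tag dict (empty if compliant)."""
    problems = []
    for key in MANDATORY_TAGS:
        value = tags.get(key, "")
        if not value:
            problems.append(f"missing tag: {key}")
        elif key in ALLOWED_VALUES and value not in ALLOWED_VALUES[key]:
            problems.append(f"invalid {key}: {value}")

    # Owner must look like an email address (deliberately loose pattern)
    owner = tags.get("Owner", "")
    if owner and not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", owner):
        problems.append(f"invalid Owner: {owner}")

    # CreatedDate must be a real calendar date written as YYYY-MM-DD
    created = tags.get("CreatedDate", "")
    if created:
        try:
            if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", created):
                raise ValueError(created)
            datetime.strptime(created, "%Y-%m-%d")
        except ValueError:
            problems.append(f"invalid CreatedDate: {created}")
    return problems
```

A check like this could run in a scheduled job over the tags returned by the AWS APIs, flagging resources whose result list is not empty.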
    

Emergency Cost Control

Critical Actions (>$150 spent)

  1. Immediate Resource Audit

    # Break down January 2025 cost by service (the End date is exclusive)
    aws ce get-cost-and-usage \
      --time-period Start=2025-01-01,End=2025-02-01 \
      --granularity MONTHLY \
      --metrics BlendedCost \
      --group-by Type=DIMENSION,Key=SERVICE
    
  2. Emergency Shutdown Protocol

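    # Stop all running instances tagged Critical=false
    ids=$(aws ec2 describe-instances \
      --filters "Name=tag:Critical,Values=false" \
                "Name=instance-state-name,Values=running" \
      --query 'Reservations[].Instances[].InstanceId' \
      --output text)
    # Skip the call when nothing matches; stop-instances fails on an empty ID list
    [ -n "$ids" ] && aws ec2 stop-instances --instance-ids $ids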
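
The same shutdown protocol can be sketched in Python, which makes it easier to add a dry run before anything is actually stopped. This is a hedged sketch rather than a definitive procedure: the helper names `collect_instance_ids` and `stop_non_critical` are hypothetical, and it assumes instances carry the `Critical` tag used in the shell command above.

```python
def collect_instance_ids(pages):
    """Collect instance IDs from an iterable of describe_instances response pages."""
    return [
        instance["InstanceId"]
        for page in pages
        for reservation in page.get("Reservations", [])
        for instance in reservation.get("Instances", [])
    ]


def stop_non_critical(dry_run=True):
    """Stop running instances tagged Critical=false; returns the matched IDs."""
    import boto3  # imported lazily so the helper above works without AWS access
    from botocore.exceptions import ClientError

    ec2 = boto3.client("ec2")
    pages = ec2.get_paginator("describe_instances").paginate(
        Filters=[
            {"Name": "tag:Critical", "Values": ["false"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    ids = collect_instance_ids(pages)
    if not ids:
        return []
    try:
        ec2.stop_instances(InstanceIds=ids, DryRun=dry_run)
    except ClientError as err:
        # With DryRun=True, EC2 reports "would have succeeded" via this error code
        if err.response["Error"]["Code"] != "DryRunOperation":
            raise
    return ids
```

Running `stop_non_critical()` with the default `dry_run=True` only checks permissions and lists what would be stopped; pass `dry_run=False` once the list has been reviewed.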