Troubleshooting for Daily Operations

Database

MySQL

1.0. MySQL: Create User With Password

Problem

note

In MySQL engine version 8.0.x and above, the IDENTIFIED BY PASSWORD option (deprecated in MySQL 5.7) is no longer accepted, so a statement that uses it fails with an error, for example on an AWS Aurora MySQL cluster. A new user account with a password has to be created with the CREATE USER statement.

Solution

tip

To resolve this issue, create the new user account and its password with the CREATE USER statement, then GRANT the necessary privileges to the user.

CREATE USER 'foo_username'@'%' IDENTIFIED BY 'foo_password';

GRANT CREATE, DROP, INDEX, ALTER, CREATE TEMPORARY TABLES, CREATE VIEW, EVENT, TRIGGER, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, EXECUTE ON *.* TO 'foo_username'@'%';

FLUSH PRIVILEGES;

Reference

cyberciti.biz

1.1. MySQL: Permission Denied When Restoring Database

Problem

note

An error occurs when attempting to restore the database or tables from a snapshot or backup in the AWS Aurora Cluster with MySQL engine version 5.7.

Error Message

danger

Access denied; you need (at least one of) the SUPER or SYSTEM_VARIABLES_ADMIN privilege(s) for this operation

Solution

tip

To resolve this issue, dump the database or tables again with the --set-gtid-purged=OFF option, so that the dump does not contain the SET @@GLOBAL.GTID_PURGED statement, which requires the SUPER privilege.

/scripts/dump_mysql.sh
{
  export DB_BACKUP_DATE=$(date "+%Y%m%d%H%M%S")
  export DB_HOST='database.xxxx.us-east-1.rds.amazonaws.com'
  export DB_NAME='db_name'
  export DB_PASSWORD='db_password'
  export DB_USERNAME='db_username'

  # --set-gtid-purged=OFF keeps GTID statements, which need SUPER, out of the dump
  mysqldump \
    -h "$DB_HOST" \
    -u "$DB_USERNAME" \
    -p"${DB_PASSWORD}" \
    --column-statistics=0 \
    --set-gtid-purged=OFF \
    --databases "$DB_NAME" \
    > "${DB_NAME}.backup.${DB_BACKUP_DATE}.sql"
}
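
Before restoring an existing dump, it can help to check whether it still contains a GTID_PURGED statement, since that is what triggers the SUPER privilege error. The following is a minimal sketch; dump_sets_gtid_purged is a hypothetical helper name, not part of any MySQL tooling, and it only looks for the text GTID_PURGED in the file.

```shell
# dump_sets_gtid_purged FILE
# Succeeds (exit 0) when the dump file contains a GTID_PURGED assignment,
# which means it was probably created without --set-gtid-purged=OFF.
# Hypothetical helper: it only searches the file for the text GTID_PURGED.
dump_sets_gtid_purged() {
  grep -q 'GTID_PURGED' "$1"
}
```

For example, `dump_sets_gtid_purged db_name.backup.sql && echo "dump again with --set-gtid-purged=OFF"` prints the hint only when the statement is present (the file name here is a placeholder).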

Reference

dba.stackexchange.com

1.2. MySQL: Permission Denied When Executing Default Procedures

Problem

note

An error occurs when executing a default (built-in) procedure on a database in the AWS Aurora Cluster with MySQL engine version 5.7. The current user foo_user_ro has only the SELECT privilege on all tables of the database foo_database.

+-----------------------------------------------------------------------------+
| Grants for foo_user_ro@%                                                    |
+-----------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'foo_user_ro'@'%'                                     |
| GRANT SELECT ON `foo_database`.* TO 'foo_user_ro'@'%'                       |
+-----------------------------------------------------------------------------+

Error Message

danger

mysql error: execute command denied to user 'foo_user_ro'@'%' for routine 'foo_database.STDEV_SAMP'

Solution

tip

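To resolve this issue, grant the EXECUTE privilege on the database to the user. Note that the routine named in the error message, STDEV_SAMP, is not the spelling of the built-in STDDEV_SAMP; MySQL treats an unrecognized function name as a stored routine in the current database, so the spelling of the call is also worth checking.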

GRANT EXECUTE ON foo_database.* TO 'foo_user_ro'@'%';
FLUSH PRIVILEGES;


-- Example query with default (built-in) aggregate functions
SELECT table_name, STDDEV_SAMP(age), STDDEV(age), STD(age), STDDEV_POP(age), COUNT(*)
FROM tmp_table_age
GROUP BY table_name;

PostgreSQL

2.1. PostgreSQL: Cannot Drop Database Because It Is Currently In Use

Problem

note

DROP DATABASE fails with one of these errors: "cannot drop database foo_database because it is currently in use" or "cannot drop the currently open database".

Solution

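-- Run these statements while connected to a different database (for example postgres).

-- Stop new connections to the database first.
ALTER DATABASE foo_database CONNECTION LIMIT 0;

-- Then terminate the sessions that are still connected to it.
SELECT
    pg_terminate_backend(pid)
FROM
    pg_stat_activity
WHERE
    pid <> pg_backend_pid()
    AND datname = 'foo_database';

DROP DATABASE foo_database;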

Kubernetes

1.1. Namespace Is Stuck In "Terminating"

Problem

note

When a namespace is deleted, all components inside it are deleted as well. However, sometimes the namespace gets stuck in the "Terminating" state and cannot be deleted.

Solution

{
  export STUCK_NAMESPACE='foo'

  # Empty the finalizers list, then submit the result to the namespace's finalize endpoint
  kubectl get namespace "$STUCK_NAMESPACE" -o json \
    | tr -d "\n" | sed "s/\"finalizers\": \[[^]]\+\]/\"finalizers\": []/" \
    | kubectl replace --raw "/api/v1/namespaces/$STUCK_NAMESPACE/finalize" -f -
}
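
The text transformation in the pipeline above can be tried on a sample document before anything is sent to the API server. This is a minimal sketch under a few assumptions: clear_finalizers is a hypothetical helper name, the sample JSON in the usage note is made up, and the pattern uses `*` instead of the GNU-specific `\+` so that it should also run with a non-GNU sed.

```shell
# clear_finalizers
# Reads a JSON document on stdin, joins it into a single line, and replaces
# the first "finalizers": [...] array with an empty array.
# Hypothetical helper that mirrors the tr/sed step of the command above.
clear_finalizers() {
  tr -d '\n' | sed 's/"finalizers": \[[^]]*\]/"finalizers": []/'
}
```

Piping `{"spec": {"finalizers": ["kubernetes"]}}` through clear_finalizers should give `{"spec": {"finalizers": []}}`, which is the shape of the document that the finalize endpoint receives.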

1.2. Get List Of IP Addresses Of Pods In A Namespace

Problem

note

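List the unique IP addresses of the pods in a namespace.

{
  export NAMESPACE='foo'
  # In "kubectl get po -o wide" the IP is the 6th column and the 7th is NODE,
  # so read the pod IP from the pod status instead of counting columns.
  kubectl -n "$NAMESPACE" get po \
    -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' | sort | uniq
}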

1.3. Pod Interrupted When New EC2 Node Group Is Scaled Down

Problem

note

The AWS EKS cluster uses Cluster-Autoscaler https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md to scale the EC2 Node Group when it detects pods that cannot be scheduled on any existing node. During a deployment rollout, the new pods are scheduled on a new EC2 Node Group. After the rollout completes, the new EC2 Node Group is scaled down, and at that point the new pods are terminated and moved to another, existing EC2 Node Group. The goal is to prevent the new pods from being interrupted or terminated when the new EC2 Node Group is scaled down.

Solution

tip

To keep new pods from being terminated when the new EC2 node group scales down after a deployment, you can combine several strategies:

1. Node Affinity and Taints/Tolerations:

- Node Affinity: Use node affinity to ensure that your new pods are scheduled only on the new node group. You can do this by adding labels to the new nodes and specifying node affinity in your pod specifications.
- Taints and Tolerations: Taint the new nodes to prevent other pods from being scheduled on them unless they tolerate the taint. This ensures that only the specific pods you want are scheduled on the new nodes.

2. Pod Disruption Budgets (PDBs):

Use PDBs to specify the minimum number of pods that must remain available during an eviction or node scale-down. This helps ensure that your application maintains its desired availability.

3. Cluster Autoscaler Configuration:

Adjust the --scale-down-unneeded-time and --scale-down-utilization-threshold settings to give your new pods more time to stabilize before the autoscaler considers the new nodes for scale-down.

4. Deployment Strategy:

- Use a rolling update strategy in your deployments to gradually replace old pods with new ones. This minimizes disruptions and ensures that new pods have a chance to become stable before old ones are terminated.
- Ensure that your deployment has a sufficient number of replicas to maintain availability during the rollout.
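
The node side of strategy 1 (labeling and tainting the new nodes) can be sketched as a small script. This is only an illustration under assumptions: the helper names run and dedicate_node, the DRY_RUN switch, and the dedicated=fooapp key and value are all hypothetical; only `kubectl label nodes` and `kubectl taint nodes` are real commands. The pod specification still needs a matching node affinity and toleration for the same key and value.

```shell
# run CMD...
# Prints the command when DRY_RUN is set to a non-empty value,
# otherwise executes it. Hypothetical helper for safe previews.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# dedicate_node NODE KEY VALUE
# Labels the node so pods can select it with node affinity, then taints it
# so that only pods tolerating KEY=VALUE:NoSchedule are scheduled on it.
dedicate_node() {
  run kubectl label nodes "$1" "$2=$3" --overwrite
  run kubectl taint nodes "$1" "$2=$3:NoSchedule" --overwrite
}
```

Setting `DRY_RUN=1` and calling `dedicate_node <node-name> dedicated fooapp` prints the two kubectl commands without running them, which makes it easy to review them before applying to a real node group.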

eks/staging/cluster_autoscale/cluster-autoscaler-autodiscover.yaml
command:
  - ./cluster-autoscaler
  - --v=4
  - --stderrthreshold=info
  - --cloud-provider=aws
  - --skip-nodes-with-local-storage=false
  - --expander=least-waste
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/ir-staging-eks
  - --balance-similar-node-groups
  - --skip-nodes-with-system-pods=false
  - --scale-down-unneeded-time=20m
  - --scale-down-utilization-threshold=0.5

profiles/staging/pod-disruption-budget.yml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: staging-fooapp
  namespace: staging
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: staging-fooapp
      app.kubernetes.io/name: talo6-dct-backend-app

Definition

The scale-down-utilization-threshold is a parameter used by the Kubernetes Cluster Autoscaler to determine when to scale down (remove) underutilized nodes. A node whose utilization is below this threshold is considered underutilized and eligible for scale-down.

Understanding scale-down-utilization-threshold

- Purpose: The scale-down-utilization-threshold helps prevent unnecessary node scaling and ensures efficient resource utilization within the cluster.
- Value Range: The value for this parameter is between 0 and 1 and represents a fraction of the node's allocatable resources (0.5 means 50%).

How It Works

- Threshold Value: According to the Cluster Autoscaler FAQ, utilization is measured from pod resource requests rather than live usage: the sum of CPU requests and the sum of memory requests of the pods on the node, each divided by the node's allocatable amount. The node is considered underutilized when both ratios are below the threshold, that is, when the higher of the two is below it.
- Eligible for Scale-Down: Nodes that stay underutilized for a specified period (defined by the scale-down-unneeded-time parameter) are eligible for scale-down.
- Example: If scale-down-utilization-threshold is set to 0.5 (50%), any node whose CPU and memory requests are both below 50% of its allocatable resources will be considered for scale-down.

Example Scenario

- Setting: --scale-down-utilization-threshold=0.5
- Node Utilization: Suppose the pods on a node request 40% of its allocatable CPU and 60% of its allocatable memory.
- Resulting Utilization: The higher ratio decides, so the utilization of this node is max(40%, 60%) = 60%.
- Scale-Down Decision: In this case, the node is above the threshold and is not considered underutilized. If the requests drop so that both ratios are below 50%, the node will be considered underutilized and may be scaled down if it remains below this threshold for the period specified by scale-down-unneeded-time. A node that is exactly at the threshold is not considered underutilized.

Adjusting the Threshold

- Higher Threshold: Setting a higher threshold (e.g., 0.6) means nodes need to be more utilized to remain in the cluster. This can lead to more aggressive scaling down and potentially more frequent rescheduling of pods.
- Lower Threshold: Setting a lower threshold (e.g., 0.3) means nodes can be less utilized and still remain in the cluster. This is less aggressive and might be better for workloads that have fluctuating or unpredictable resource usage.
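
The threshold check can be worked through with a short calculation. This is a minimal sketch, not the autoscaler's own code. It assumes, based on the Cluster Autoscaler FAQ linked above, that utilization is computed from pod requests divided by allocatable resources, that the higher of the CPU and memory ratios is compared with the threshold, and that the comparison is strict. The function names and the sample numbers (millicores and MiB) are hypothetical.

```shell
# node_utilization CPU_REQUESTED CPU_ALLOCATABLE MEM_REQUESTED MEM_ALLOCATABLE
# Prints the higher of the CPU and memory request ratios, to two decimals.
# Assumption: the higher ratio is the one compared with the threshold.
node_utilization() {
  awk -v cr="$1" -v ca="$2" -v mr="$3" -v ma="$4" 'BEGIN {
    cpu = cr / ca
    mem = mr / ma
    printf "%.2f\n", (cpu > mem ? cpu : mem)
  }'
}

# is_scale_down_candidate UTILIZATION THRESHOLD
# Succeeds (exit 0) when the utilization is strictly below the threshold.
is_scale_down_candidate() {
  awk -v u="$1" -v t="$2" 'BEGIN { exit !(u < t) }'
}
```

With 800 of 2000 millicores and 4800 of 8000 MiB requested, `node_utilization 800 2000 4800 8000` prints 0.60, and `is_scale_down_candidate 0.60 0.5` fails, so under these assumptions the node would stay in the cluster with --scale-down-utilization-threshold=0.5.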
