What is HA Policy? #

HA Policy is a mechanism that keeps your business running stably when VM instances are stopped, whether as scheduled or unexpectedly, or when they fail because of errors in the compute, network, or storage resources associated with them. By enabling this feature, you can customize VM HA policies to ensure business continuity and stability.

Concepts #

The HA Policy feature involves the following key concepts:

  • HA mode: Specifies whether to auto restart VM instances that are stopped as scheduled or unexpectedly, or that fail because of errors in the compute, network, or storage resources associated with them. Two modes are supported, None and NeverStop:
    • None: VM instances scheduled to be stopped or unexpectedly stopped are not auto restarted.
    • NeverStop:
      • VM instances scheduled to be stopped are auto restarted.
      • Unexpectedly stopped VM instances are auto restarted on another host depending on the failover strategy you configure for them.
  • VM Failover Strategy: Specifies whether to migrate a VM instance to another host if errors occur in the compute, storage, or network resources associated with the VM instance. The VM failover mechanism inspects the status of the following resources:
    • Management Network Connectivity Status:
      • Management network connectivity status indicates the status of the network that connects the management node and the host where VM instances reside.
      • This status may turn Abnormal if errors occur to the management node or to the management network.
    • Storage Network Connectivity Status:
      • Detects the connectivity status of the network that VM instances use to access the primary storage where the root volumes of these VM instances reside.
      • This status may turn Abnormal if errors occur to the primary storage or to the storage network.
    • Business NIC Status:
      • Business NIC status may turn Abnormal if errors occur to the host business NIC or the switch port directly connecting to the host business NIC that is associated with the L2 network of VM instances.
    Based on the resource status inspection, the Cloud provides the following truth table for configuring VM failover strategies:

    Management Network Connectivity Status | Storage Network Connectivity Status | Business NIC Status | Fail Over
    Normal | Normal | Abnormal | Yes/No
    Normal | Abnormal | Normal | Yes/No
    Normal | Abnormal | Abnormal | Yes/No
    Abnormal | Normal | Normal | No

Fundamentals #

ZStack Cloud HA Policy has the following mechanisms:

  • The Cloud polls the running status of VM instances. If a VM instance is stopped, as scheduled or unexpectedly, its HA mode is checked. If the HA mode of the VM instance is NeverStop, the VM instance is restarted on the current host or on another host.

    Figure 1. VM HA Started After Unexpectedly Stopped

  • The Cloud polls the status of the hosts where VM instances reside. If any of the management network connectivity status, storage network connectivity status, or business NIC status of a host turns Abnormal, the corresponding VM failover strategy and VM HA mode are checked. If the failover strategy is Yes and the VM HA mode is NeverStop, the related VM instances are migrated to another host.

    Figure 2. VM HA Started After Host Business NIC Turns Down
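
The two mechanisms above can be sketched as a pair of small decision functions. This is only an illustration of the described logic, not ZStack Cloud code: the names HaMode, decide_vm_stopped, and decide_host_abnormal are hypothetical, and the returned strings are placeholders for the actual actions.

```python
from enum import Enum


class HaMode(Enum):
    """The two HA modes described in this document."""
    NONE = "None"
    NEVER_STOP = "NeverStop"


def decide_vm_stopped(ha_mode: HaMode) -> str:
    """Mechanism 1: polling finds a VM instance stopped."""
    # Only NeverStop VM instances are restarted
    # (on the current host or on another host).
    return "restart" if ha_mode is HaMode.NEVER_STOP else "none"


def decide_host_abnormal(ha_mode: HaMode, failover_enabled: bool) -> str:
    """Mechanism 2: polling finds a host status turned Abnormal."""
    # Migration requires both a Yes failover strategy and NeverStop mode.
    if failover_enabled and ha_mode is HaMode.NEVER_STOP:
        return "migrate"
    return "none"
```

Note that under this reading a Yes failover strategy has no effect on a VM instance whose HA mode is None.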

Characteristics #

HA Policy has the following characteristics:

  • Comprehensive & Powerful: Covers all mainstream HA scenarios, including various failures, and ensures the stability and continuity of your business.
  • Flexible & Visualized: Provides a simple table that allows you to configure VM failover strategies with one click. This table works together with the HA mode, which can be configured globally or on individual VM instances, thus greatly improving the flexibility of your business HA configuration.

Scenarios #

The following describes the scenarios of the HA Policy feature.

  • Host Business NIC Turns Down: If a host business NIC goes down, all VM instances associated with this NIC are expected to migrate to other hosts to keep the business highly available.
    • For example, your business VM instances run a MySQL database service that must be highly available. In this case, you can set the HA mode of these VM instances to NeverStop and turn on the switch corresponding to Abnormal Business NIC Status. Then, as long as host resources are sufficient, if a host business NIC associated with these VM instances goes down, these VM instances are auto started on other hosts.
  • VM Unexpectedly Stops: If a VM instance is unexpectedly stopped, it is expected to be auto restarted by HA.
    • For example, your VM instances run important business applications. To ensure that the business auto recovers when VM instances stop for reasons such as host power-offs or business overloads, you can set the HA mode of these VM instances to NeverStop. Then, if these VM instances are stopped, they are auto started.

Manage HA Policy #

On the main menu of ZStack Cloud, choose Settings > Platform Setting > HA Policy. Then, the HA Policy page is displayed.

HA Policy supports the following actions:

Action | Description
Enable HA Policy | Enables the HA Policy feature.
Disable HA Policy | Disables the HA Policy feature. Note: If you disable HA Policy, VM instances will not be auto restarted if they are stopped. This may cause business interruptions. Proceed with caution.

HA Policy|Failover Policy #

On the Enable HA Policy page or the Overview page of HA Policy, you can modify the following truth table to configure failover policies for VM instances.

Management Network Connectivity Status | Storage Network Connectivity Status | Business NIC Status | Fail Over
Normal | Normal | Abnormal | Yes/No
Normal | Abnormal | Normal | Yes/No. Note: If the storage type is SharedBlock and this status is Abnormal, VM instances will auto fail over regardless of this configuration.
Normal | Abnormal | Abnormal | Yes/No. Note: The failover policy of this scenario follows the preceding two failover policies of this table. If you set both the preceding two policies to No, then this failover policy is set to No. If you set either of the two to Yes, then this failover policy is set to Yes.
Abnormal | Normal | Normal | No. Note: If the management network is in Abnormal status, you cannot set this failover policy.

Note:

  • For Storage Network Connectivity Status, only shared storage is detected. Local storage is not supported.
  • If an L2 network of a VM instance is of the VXLAN type or the L2 network applies the SR-IOV or Smart NIC, and errors occur to the host business NIC associated with this L2 network or occur to the switch port directly connecting to the host business NIC, this VM instance will not fail over.
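
The table above can be read as a lookup function. The sketch below is one hedged interpretation, not the product's implementation: should_fail_over and its parameters are hypothetical names, the SharedBlock override is applied only to the row that carries that note, and the VXLAN, SR-IOV, and Smart NIC exceptions from the notes are not modeled. Status combinations that the table does not list return None.

```python
from typing import Optional


def should_fail_over(
    management_ok: bool,
    storage_ok: bool,
    business_nic_ok: bool,
    nic_policy: bool,       # Yes/No chosen for the Abnormal Business NIC row
    storage_policy: bool,   # Yes/No chosen for the Abnormal Storage Network row
    storage_type: str = "",
) -> Optional[bool]:
    """Return the Fail Over value for a status combination, or None if unlisted."""
    if not management_ok:
        # Row 4: only the management network is Abnormal; fixed to No.
        return False if (storage_ok and business_nic_ok) else None
    if storage_ok and not business_nic_ok:
        return nic_policy  # Row 1
    if not storage_ok and business_nic_ok:
        # Row 2: SharedBlock storage fails over regardless of the setting.
        return True if storage_type == "SharedBlock" else storage_policy
    if not storage_ok and not business_nic_ok:
        # Row 3: Yes if either of the two preceding policies is Yes.
        return nic_policy or storage_policy
    return None  # Everything Normal: not a row of the table.
```

For example, with both configurable policies set to No, should_fail_over(True, False, False, False, False) evaluates to False, matching the note on the third row.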

On the Enable HA Policy page or the Overview page of HA Policy, you can adjust the following host error inspection settings to change how often the statuses used by the preceding failover policy are inspected.

Name | Description
Host Self-Inspection Interval | The interval at which a host inspects its own status. Default: 5. Unit: second.
Maximum Host Self-Inspection Attempts | The maximum number of attempts that a host makes to inspect its own status. If the self-inspection of a host fails for the maximum number of attempts, the host is determined to have network errors. Default: 6.
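
As a rough guide to how these two settings interact, the sketch below estimates how long a host needs to conclude that it has network errors. It assumes that consecutive failed attempts are spaced exactly one interval apart and that the duration of each check is negligible; the function name is hypothetical and the actual timing in the product may differ.

```python
def self_inspection_window(interval_seconds: int = 5, max_attempts: int = 6) -> int:
    """Approximate seconds of consecutive failures before errors are declared."""
    # Assumption: one failed attempt per interval, checks take no time.
    return interval_seconds * max_attempts
```

With the defaults (5 seconds, 6 attempts), this estimate is 30 seconds.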

HA Policy|Advanced Settings #

On the Enable HA Policy page or the Overview page of HA Policy, you can modify the advanced settings of HA Policy. They can be classified into the following two categories:

Category | Name | Description
VM Instance | VM Cross-Cluster HA | Specifies whether to enable VM migration across clusters to achieve high availability. Default: false. If set to true, hosts across clusters can be detected to achieve VM high availability. Note: Before you enable this feature, make sure that clusters are well connected.
VM Instance | Maximum GC Retry Interval of NeverStop VM | The maximum interval of garbage collection (GC) attempts to start up NeverStop VM instances that are stopped unexpectedly. Default: 300. Unit: second.
VM Instance | Delay of NeverStop VM Startup Attempt | The delay before another attempt to start up a NeverStop VM instance after the last startup attempt fails. Default: 60. Unit: second.
VM Instance | NeverStop VM Scanning Interval | The interval of scanning for NeverStop VM instances that fail to start up. Default: 60. Unit: second.
VM Instance | Sync Speed of HA VM State Update | The synchronization speed of the state of highly available VM instances on the UI. Default: 1. Valid values: -1 to 5, integer. A higher value indicates a lower synchronization speed. However, a higher value lowers system loads because outdated status update notifications are ignored. The value -1 indicates that the state of HA VM instances on the UI does not automatically change.
VM Instance | VM HA Mode | Specifies whether to auto restart VM instances that are stopped as scheduled or unexpectedly, or that fail because of errors in the compute, network, or storage resources associated with them. Valid values: None and NeverStop. If you set HA mode to None, VM instances stopped as scheduled or unexpectedly are not auto restarted. If you set HA mode to NeverStop, VM instances stopped as scheduled or unexpectedly are auto restarted, and if errors occur in compute, network, or storage resources, the associated VM instances are auto restarted on another host depending on the HA policy you configure for them. Note: You can set the VM HA mode specifically for a VM instance. If you do, this global setting does not take effect on that VM instance.
Host | Abnormal Host Check Interval | The interval at which the management node pings abnormal hosts. Default: 5. Unit: second.
Host | Maximum Attempts to Determine Host Disconnection | The maximum number of failed connections that are required to determine that a host is disconnected. Default: 12.
Host | Host Successful Connection Period | The time period of a successful connection to a host. Default: 5. Unit: second. If a connection request receives a response within the specified time, the connection succeeds.
Host | Host Successful Connection Possibility | The proportion of successful connections relative to failed connections that determines whether a host is successfully connected. Default: 50. Unit: %.
Host | Minimum Attempts to Determine Successful Host Connection | The minimum number of successful connections that are required to determine that a host is successfully connected. Default: 5.
Host | Timeout Period of Primary Storage Inspection by Host | The timeout period for a host to check its connection with primary storages. Default: 5. Unit: second.
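
For planning or reviewing changes, the documented defaults can be captured in a small structure. This is a sketch only: the class and field names are hypothetical and are not ZStack Cloud configuration keys. VM HA Mode is omitted because the table states no default for it, and only the documented range of the sync speed setting (-1 to 5) is validated.

```python
from dataclasses import dataclass


@dataclass
class HaAdvancedSettings:
    """Defaults copied from the table above; names are illustrative."""
    # VM instance settings
    vm_cross_cluster_ha: bool = False
    neverstop_gc_max_retry_interval_s: int = 300
    neverstop_startup_retry_delay_s: int = 60
    neverstop_scan_interval_s: int = 60
    ha_vm_state_sync_speed: int = 1
    # Host settings
    abnormal_host_check_interval_s: int = 5
    max_attempts_host_disconnection: int = 12
    host_success_connection_period_s: int = 5
    host_success_connection_percent: int = 50
    min_attempts_host_connection: int = 5
    primary_storage_check_timeout_s: int = 5

    def __post_init__(self) -> None:
        # The only value range the table documents is -1 to 5 for sync speed.
        if not -1 <= self.ha_vm_state_sync_speed <= 5:
            raise ValueError("ha_vm_state_sync_speed must be between -1 and 5")
```

Constructing HaAdvancedSettings(ha_vm_state_sync_speed=6) raises ValueError, while -1 is accepted and corresponds to the UI state not changing automatically.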

HA Log #

On the main menu of ZStack Cloud, choose Settings > Platform Setting > HA Policy. Then, the HA Policy page is displayed. If HA policy is enabled and the HA mechanism is triggered, then HA logs are generated.

This page displays all VM HA logs in the Cloud. You can view the log information such as task result, VM name, VM owner, host information, and start and end time. These logs can be applied in O&M and audit.

  • You can select a time span to view HA logs. Available time spans: recent 7 days and recent 1 month. By default, logs generated in recent 7 days are displayed.
  • You can customize a time span to view the HA logs in the specified time span.
  • You can search for HA logs by VM name or VM owner.
  • You can filter HA logs by task result. The task results include succeeded and failed.
  • You can sort HA logs by creation or completion time.
  • You can export the HA logs in CSV format.
  • You can adjust the number of HA logs displayed on each page. Optional values: 10, 20, 50, and 100.
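
An exported CSV file can also be filtered offline, for example to list only failed HA tasks. The sketch below uses Python's standard csv module. The column names "VM Name" and "Task Result" are assumptions made for illustration; the actual header row of the exported file is not documented here and should be checked first.

```python
import csv
import io
from typing import Dict, List


def filter_by_result(
    csv_text: str, result: str, column: str = "Task Result"
) -> List[Dict[str, str]]:
    """Return the rows whose result column matches `result`, ignoring case."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Rows with a missing result cell are treated as non-matching.
    return [
        row for row in reader
        if (row.get(column) or "").strip().lower() == result.lower()
    ]


# Hypothetical sample data standing in for an exported file.
sample = "VM Name,Task Result\nvm-1,Succeeded\nvm-2,Failed\n"
failed = filter_by_result(sample, "failed")
```

To process a real export, read the file contents into a string and pass the actual name of the result column as `column`.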
