Pivotal GemFire® v8.0

Configure Pivotal GemFire to Handle Network Partitioning


This section lists the configuration steps for network partition detection.

The system uses a combination of locators and system members, designated as lead members, to detect and resolve network partitioning problems.
  1. Use locators for member discovery. See Configuring Peer-to-Peer Discovery. In addition, use multiple locators.
  2. Enable partition detection in the locators and in all system members by setting this in their gemfire.properties file:

    enable-network-partition-detection=true

    All system members should have the same setting for enable-network-partition-detection. If they don’t, the system throws a GemFireConfigException upon startup.

  3. Configure the regions you want to protect from network partitioning with DISTRIBUTED_ACK or GLOBAL scope. Do not use DISTRIBUTED_NO_ACK scope. The region configurations provided in the region shortcut settings use DISTRIBUTED_ACK scope. This setting prevents operations from being performed throughout the distributed system before a network partition is detected.
    Note: GemFire issues an alert if it detects distributed-no-ack regions when network partition detection is enabled:
    Region {0} is being created with scope {1} but enable-network-partition-detection is enabled in the distributed system. 
    This can lead to cache inconsistencies if there is a network failure.
  4. You must set enable-network-partition-detection to true if you are using persistent partitioned regions.
  5. These other configuration parameters affect or interact with network partitioning detection. Check whether they are appropriate for your installation and modify as needed.
    • If you have network partition detection enabled, the threshold percentage value for allowed membership weight loss is automatically configured to 51. You cannot modify this value. (Note: The weight loss calculation uses standard rounding. Therefore, a weight loss of 50.51 percent is rounded to 51 percent and causes a network partition.)
    • Failure detection is initiated if a member's ack-wait-threshold (default 15 seconds) and ack-severe-alert-threshold (15 seconds) elapse before the member receives a response to a message. If you modify the ack-wait-threshold configuration value, modify ack-severe-alert-threshold to match it.
    • If clients connect to the system, set the <cache> <pool> read-timeout in the clients' cache.xml to at least three times the member-timeout setting in the server's gemfire.properties file. The default <cache> <pool> read-timeout setting is 10000 milliseconds.
    • You can adjust the default weights of members by specifying the system property gemfire.member-weight upon startup. For example, if you have some VMs that host a needed service, you could assign them a higher weight upon startup.
    • By default, members that are forced out of the distributed system by a network partition event automatically restart and attempt to reconnect. Data members attempt to reinitialize the cache. See Handling Forced Cache Disconnection. You can set the number of reconnection attempts with the max-num-reconnect-tries GemFire property, and the time a member waits before attempting to reconnect with max-wait-time-reconnect.
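Steps 2 and 3 can be checked before deployment. The sketch below is a hypothetical helper, not a GemFire API: it reads each member's gemfire.properties as a plain java.util.Properties object, confirms that every member uses the same enable-network-partition-detection value, and lists regions whose scope would trigger the distributed-no-ack alert. It assumes a missing entry means false (assumed here to be the 8.0 default), and the region names and the scope map are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical pre-deployment check; it reads plain Properties objects and
// does not call any GemFire API.
class PartitionConfigCheck {
    static final String KEY = "enable-network-partition-detection";

    // Assumption: a missing entry means the GemFire 8.0 default of false.
    static boolean enabled(Properties p) {
        return Boolean.parseBoolean(p.getProperty(KEY, "false").trim());
    }

    // True if every member, locators included, uses the same setting.
    // GemFire itself refuses mismatches at startup with GemFireConfigException.
    static boolean consistent(List<Properties> members) {
        for (Properties p : members) {
            if (enabled(p) != enabled(members.get(0))) return false;
        }
        return true;
    }

    // Names of regions that would trigger the distributed-no-ack alert on this member.
    static List<String> noAckRegions(Properties member, Map<String, String> regionScopes) {
        List<String> offenders = new ArrayList<>();
        if (!enabled(member)) return offenders;
        for (Map.Entry<String, String> e : regionScopes.entrySet()) {
            if ("DISTRIBUTED_NO_ACK".equals(e.getValue())) offenders.add(e.getKey());
        }
        return offenders;
    }

    public static void main(String[] args) {
        Properties locator = new Properties();
        locator.setProperty(KEY, "true");
        Properties server = new Properties();
        server.setProperty(KEY, "true");
        Properties forgotten = new Properties(); // property left out

        System.out.println(consistent(Arrays.asList(locator, server)));            // true
        System.out.println(consistent(Arrays.asList(locator, server, forgotten))); // false

        Map<String, String> scopes = new LinkedHashMap<>();
        scopes.put("orders", "DISTRIBUTED_ACK");      // hypothetical region names
        scopes.put("sessions", "DISTRIBUTED_NO_ACK");
        System.out.println(noAckRegions(server, scopes)); // [sessions]
    }
}
```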
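The fixed 51 percent threshold and its rounding rule can be worked through with a small model. This sketch assumes the loss is the summed weight of the members that became unreachable, expressed as a percentage of the total membership weight before the failure; the class and method names and the example weights are hypothetical. In a real system, per-member weights come from GemFire's defaults or from the gemfire.member-weight system property.

```java
// Hypothetical model of the membership-weight-loss rule; not GemFire code.
class WeightLossCheck {
    // Fixed when partition detection is enabled; not configurable.
    static final long THRESHOLD_PERCENT = 51;

    // Standard rounding: Math.round rounds half up for positive values,
    // so 50.51 becomes 51 and 50.49 becomes 50.
    static boolean exceedsThreshold(double lossPercent) {
        return Math.round(lossPercent) >= THRESHOLD_PERCENT;
    }

    // Assumption: loss is measured against the total weight before the failure.
    static double lossPercent(int[] lostWeights, int totalWeightBefore) {
        int lost = 0;
        for (int w : lostWeights) lost += w;
        return 100.0 * lost / totalWeightBefore;
    }

    public static void main(String[] args) {
        System.out.println(exceedsThreshold(50.51)); // true: rounds to 51
        System.out.println(exceedsThreshold(50.49)); // false: rounds to 50
        // Hypothetical weights: losing 101 of 200 units is 50.5 percent.
        double pct = lossPercent(new int[] {50, 51}, 200);
        System.out.println(pct + " -> " + exceedsThreshold(pct)); // 50.5 -> true
    }
}
```

Because the comparison happens after rounding, a loss just above 50.5 percent is already enough to declare a partition.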
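The timeout guidance in step 5 reduces to two comparisons. The following is a hypothetical sanity check, not a GemFire API; it assumes the ack thresholds are given in seconds and member-timeout and the client read-timeout in milliseconds. The member-timeout values in the usage lines are illustrative, while 10000 ms is the documented read-timeout default.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sanity check for the timeout guidance; not a GemFire API.
// Ack thresholds are in seconds; member-timeout and read-timeout in milliseconds.
class TimeoutCheck {
    static List<String> warnings(int ackWaitThresholdSec, int ackSevereAlertThresholdSec,
                                 long memberTimeoutMs, long clientReadTimeoutMs) {
        List<String> out = new ArrayList<>();
        // The two ack thresholds should be kept equal.
        if (ackSevereAlertThresholdSec != ackWaitThresholdSec) {
            out.add("ack-severe-alert-threshold (" + ackSevereAlertThresholdSec
                    + " s) does not match ack-wait-threshold (" + ackWaitThresholdSec + " s)");
        }
        // Client read-timeout should be at least three times the server's member-timeout.
        long minReadTimeoutMs = 3 * memberTimeoutMs;
        if (clientReadTimeoutMs < minReadTimeoutMs) {
            out.add("client read-timeout " + clientReadTimeoutMs
                    + " ms is below 3 x member-timeout = " + minReadTimeoutMs + " ms");
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical member-timeout values paired with the default read-timeout.
        System.out.println(warnings(15, 15, 3000, 10000));        // []
        System.out.println(warnings(15, 15, 5000, 10000).size()); // 1
    }
}
```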
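The two reconnect settings bound how long a forced-out member keeps trying to rejoin. The sketch below is a hypothetical model of such a bounded retry loop, not GemFire's implementation: the attemptReconnect callback stands in for rejoining the distributed system (and, for data members, reinitializing the cache), and the assumption that the wait happens before every attempt follows the wording of the last bullet above.

```java
import java.util.function.BooleanSupplier;

// Hypothetical model of a bounded reconnect loop; not GemFire's implementation.
class ReconnectLoop {
    // Returns the 1-based attempt that succeeded, or -1 if every attempt failed
    // or the thread was interrupted while waiting.
    static int reconnect(int maxNumReconnectTries, long maxWaitTimeReconnectMs,
                         BooleanSupplier attemptReconnect) {
        for (int attempt = 1; attempt <= maxNumReconnectTries; attempt++) {
            try {
                Thread.sleep(maxWaitTimeReconnectMs); // wait before each attempt
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve interrupt status
                return -1;
            }
            if (attemptReconnect.getAsBoolean()) return attempt;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical: the network heals before the third attempt.
        System.out.println(reconnect(3, 100, () -> ++calls[0] == 3)); // 3
        System.out.println(reconnect(2, 100, () -> false));           // -1
    }
}
```

Raising max-num-reconnect-tries or max-wait-time-reconnect in this model lengthens the window during which a member keeps trying before giving up.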