Fixed Issues in GemFire 8.0.0

Last updated: August 26, 2014

Id Title Description Workaround for earlier GemFire versions
#50952 Inconsistent bucket copies after restoring from online backup When using asynchronous disk writes, an entry may still be in the queue when the next operation arrives. A race condition, triggered by switchOplog being called during a backup, can cause the previous entry to be written to disk with the new version. This issue also exists in 7.0.
#50898 'start data-browser' is no longer available in Gfsh as of GemFire 8.0 GemFire Data Browser (DB) has become part of Pulse, and as a result is no longer available to be launched from Gfsh using the 'start data-browser' command. Instead, a user can launch Pulse from Gfsh using 'start pulse' and then navigate to the GemFire Data Browser screen.
#50686 Stat sampler thread may not record samples on Windows when system is under heavy I/O load On Windows, if the disk is under load, stat samples may not get recorded. Workaround: (1) provision higher disk and CPU capacity when running on Windows; alternatively, (2) turn off virus scans and disk defragmentation/disk scans on server machines during operational hours.
#50596 PdxInitializationException encountered after creating AsyncEventQueue PdxInitializationException occurs after creating an AsyncEventQueue when adding a member to an existing distributed system. The error message indicates that PDX persistence is incompatible even though the existing members have PDX persistence set to false. This has been fixed in 8.0. Workaround: set PDX persistence to true.
#50544 Slow CQ Performance when many of the same queries are registered CQs with matching queries were not being matched on data nodes, leading to performance degradation when many identical queries were executed/registered. This was because the server re-evaluated every CQ even though the queries were the same and only one evaluation was necessary. None; fixed with the latest patch.
#50512 Gateway sender defaults to asynchronous persistence By default, gateway sender persistence should be synchronous; internally, the default value of isDiskSynchronous should be DEFAULT_DISK_SYNCHRONOUS.
#50475 Query returns duplicate results when hashindex is recreated after region.clear When a HashIndex is repopulated after clearing the region, a query may return duplicate results. Index stats also do not get reset in this case. This is now fixed in 7.0.2 and 8.0. No workaround for previous versions.
#50424 Socket read timeout set using any of the public APIs has no effect for GatewaySenders and Gateways Setting the socket read timeout using socket-read-timeout in XML or setSocketReadTimeout in the Java API has no effect for GatewaySenders and Gateways. Instead it logs a warning like: [warning 2014/07/03 12:37:55.471 PDT <main> tid=0x1] Setting the socket read timeout on a gateway is currently disabled. Please contact VMware support for assistance. This bug affects 7.0, 7.0.1.x and 7.0.2.x for GatewaySenders and those versions plus 6.6.x for Gateways. To set the socket read timeout in versions prior to 8.0: For Gateways, use the Java system property gemfire.cache.gateway.default-socket-read-timeout like: -Dgemfire.cache.gateway.default-socket-read-timeout=60000 For GatewaySenders, use the Java system property gemfire.GatewaySender.GATEWAY_CONNECTION_READ_TIMEOUT like: -Dgemfire.GatewaySender.GATEWAY_CONNECTION_READ_TIMEOUT=60000 Note: These properties are in milliseconds.
#50291 PutAllPartialResultException thrown when putAll failed for another reason A failed putAll was hard-coded to throw PutAllPartialResultException even when it failed due to another exception, such as NotAuthorizedException. Now the real exception is thrown instead.
#50099 Gfsh 'start locator' command with --bind-address option is ignored, caused by incorrect parse logic in the LocatorLauncher API class, which also affects LocatorLauncher used stand-alone. Currently, the LocatorLauncher class, whether used standalone from the OS shell or invoked indirectly from Gfsh by issuing the following command... gfsh>start locator --name=X --bind-address=10.214.116.24 ... ignores the bind-address setting provided by the user. This fix addresses the following use cases... 1. Invoking LocatorLauncher from the OS shell using the Java launcher. 2. Starting a Locator from Gfsh using the 'start locator' command with the --bind-address command-line option. While the GemFire system property gemfire.bind-address probably should have worked in this case, it does not, given how the "internal" logic supporting Locators in GemFire references and passes the distribution configuration settings. As such, no current workaround exists.
#50049 toData failed on DataSerializable class com.gemstone.gemfire.internal.cache.versions.RegionVersionHolder due to ArrayIndexOutOfBoundsException Methods in RegionVersionHolder need to be synchronized.
#49909 Client silently ignores updates from restarted servers A client cache may silently reject updates from a server if servers are restarted and lose their original data.
#49864 Gfsh 'get' command does not invoke Region CacheLoader on Cache misses. This fix resolves the issue where a registered CacheLoader is not invoked when a 'get' command executed in Gfsh on a Region results in a Cache miss (i.e. the key is not currently present in memory, in the Region). The issue was caused by a Region.containsKey(missing-key) conditional check wrapping the Region.get(for-missing-key) call, which contains the logic that delegates to the registered CacheLoader on Cache misses. This fix introduces a --load-on-cache-miss command-line option to the 'get' command (default is true) that enables the registered CacheLoader on the Region to be invoked when a Cache miss occurs. The user can disable the CacheLoader, suppressing the default behavior, by setting this option explicitly to false. Otherwise 'get' now functions exactly the same as the GemFire public API. No workaround exists.
#49653 The GatewayEventFilter afterAcknowledgement is invoked on all events retrieved from the queue even if they have been filtered by beforeTransmit A GatewayQueueEvent filtered from a batch of events by beforeTransmit is still delivered to the afterAcknowledgement callback even though it wasn't sent across the WAN or acknowledged. This bug affects 7.0 and 7.0.1 prior to 7.0.1.16. There is no workaround other than to handle events that have been filtered by beforeTransmit and to realize that these events have not been sent across the WAN.
#49185 Pulse fails to start for multiple users on the same machine Trying to start the GemFire XD Pulse application may fail if another user has previously started Pulse on the same machine. This occurs because necessary folders in the /tmp directory remain and are owned by the previous user who started Pulse. The problem appears as the following exception: [severe 2013/10/04 15:42:57.255 IST tid=0x35] (tid=11 msgId=4) LifecycleException java.io.IOException: Failed to create destination directory to copy resources at org.apache.catalina.loader.WebappLoader.setRepositories(WebappLoader.java:891) n/a
#49106 Gfsh 'start locator' and 'start server' commands log a warning rather than fail fast for missing XML and property configuration files. Currently, the Gfsh 'start server' command will only log a warning in the shell when non-existing cache.xml, gemfire.properties or gemfire-security.properties files are specified with the '--cache-xml-file', '--properties' and '--security-properties' command-line options, respectively. Likewise, the Gfsh 'start locator' command will only log a warning in the shell when non-existing gemfire.properties or gemfire-security.properties files are specified with the '--properties' and '--security-properties' command-line options, respectively. This ticket changes the behavior to fail to start either a Locator or Server when any of these options are used and the files are missing. There is no workaround for failing fast. The user is required to ensure the files exist using external means.
#48823 Querying a region with objects having empty collections returns wrong results If a query is executed on a region whose objects have an empty collection as a field and the where clause has multiple conditions, the query returns incorrect results. Use an empty array of objects instead of an empty collection.
#48415 HashIndex not correctly handling keySet and entries index expressions HashIndex was not correctly handling certain index expressions such as keySet and entries. Use a functional index; if the in-place modification flag can safely be turned off, it will offer some space savings.
#48235 OQL index update could be missed during index creation During index creation if a region entry is updated, in a very rare scenario, the index may not get updated. This is now fixed in 8.0.
#48122 ConcurrentMap operations do not operate as expected GemFire Region API documentation for ConcurrentMap operations needs to be clarified. The semantics of the ConcurrentMap methods on a partitioned region are consistent with those expected on a ConcurrentMap. In particular, multiple writers in different JVMs updating the same key in the same partition will be applied atomically. The same is true for a region with GLOBAL scope: all operations will be done atomically, since the dlock is held while the operation is performed. The same is true for a region with LOCAL scope: all operations will be done atomically, since the underlying map is a concurrent hash map and no distribution is involved. For peer REPLICATE and PRELOADED regions, atomicity is limited to threads in the JVM in which the operation starts. For peer EMPTY and NORMAL regions, the ConcurrentMap methods will throw an UnsupportedOperationException. For client/server regions, atomicity is determined by the scope and data policy of the server region as described above. The operation is actually performed on the server as described above. Clients always send the ConcurrentMap operation to the server, and the result returned by the ConcurrentMap method in the client reflects what was done on the server. The same goes for any CacheListener called on the client. Any local state on the client will be updated to be consistent with the state change made on the server. Note that if a CacheWriter exists on the client, it will not be called; a CacheWriter is only called on the server.
#47907 'jmx-manager-http-port' GemFire DistributionConfig property not properly applied when bind-address is also configured A bug was discovered in the GemFire Manager codebase involving the embedded HTTP service using Tomcat, where the 'jmx-manager-http-port' GemFire DistributionConfig property was not being properly applied and used (due to an incorrect usage of the Tomcat API). If the bind-address for the GemFire Manager is specified in addition to the port the HTTP ServerSocket will listen on, then the port value from the DistributionConfig properties is ignored and Tomcat defaults to port 8080. However, if a bind-address is not specified, then the 'jmx-manager-http-port' DistributionConfig property is properly applied and used by Tomcat on start of the embedded HTTP service. The only known workaround is not using a bind-address in conjunction with the HTTP port DistributionConfig property (jmx-manager-http-port).
#47877 CustomExpiry may deadlock on overflow regions If you install a CustomExpiry callback on a region that also does overflow disk eviction then you may see a deadlock.
#47856 start server command ignores --disable-default-server when processing cache.xml The GFSH command start server will ignore --disable-default-server and start a server endpoint for clients using the default port 40404 when processing cache.xml. Use the deprecated script $GEMFIRE/bin/cacheserver to start a cache server with a disabled default server and a cache.xml file.
#47815 Select star queries for pdx objects return PreferBytesCachedDeserializables Select star queries for pdx objects return PreferBytesCachedDeserializables when read-serialized="true". This is fixed in 7.1.
#47807 GemFire server runs in a hot loop when a memcached client disconnects After a memcached client disconnects, the GemFire server may run in a hot loop logging the following: [info 2013/04/19 16:45:03.551 JST gemcachedserver01 <pool-1-thread-2> tid=0x41] Unknown command. ensure client protocol is ASCII. For the binary protocol it logs: Exception in thread "pool-1-thread-9" java.lang.IllegalStateException: Not a valid request, magic byte incorrect 103 b:67 u:0x67.
#47746 Query execution may fail if executed in a transaction Query execution may fail if executed within a transaction, with an exception saying a transaction is already in progress. This is fixed in 6.6.4 and 8.0. No workaround.
#47744 Starting a gemfire process using a gemfire.jar located in the root drive on Windows throws NullPointerException Starting a gemfire process using a gemfire.jar located in the root drive on Windows throws NullPointerException. Example: C:\gemfire.jar Exception in thread "main" java.lang.NullPointerException at com.gemstone.gemfire.internal.SystemAdmin.getHiddenDir(SystemAdmin.java:1937) at com.gemstone.gemfire.internal.licensing.LicenseChecker.getHiddenSerialFile(LicenseChecker.java:1312) at com.gemstone.gemfire.internal.licensing.LicenseChecker.<init>(LicenseChecker.java:313) Place the gemfire.jar into a subfolder with a non-root parent folder. Example: C:\parent\sub\gemfire.jar. A typical installation of the gemfire product places the gemfire.jar in C:\gemfire\lib\gemfire.jar.
#47731 If one site is connected to another site with multiple parallel GatewaySenders, and the remote site has CQs, events may not be delivered to those CQs If a single client thread does a put into a region connected to one sender followed by a put into a region connected to the other sender, those events are queued in different queues and may arrive out of order in the remote site. If that happens, the remote site's client queue will drop the out of order event, and it won't be delivered to the CQ (or register interest) client.
#47667 Client hangs in tx commit when server does not define all regions When a GemFire server does not define all the regions involved in a transaction, a GemFire client may hang while committing a transaction. This has been fixed in 7.1 and 8.0. A server must define all regions participating in a tx or none at all.
#47666 Query results may contain UNDEFINED when using index but not when not using index If a query uses an index for a where clause with a NOT EQUALS condition, UNDEFINED may occur in the results when an attribute of a null-valued attribute is accessed. The same query executed without the index will not contain UNDEFINED in the results. This is fixed in 8.0. No workaround if using the index.
#47665 start server command does not immediately fail if the specified port is unavailable The GFSH start server command does not immediately fail if the specified port is unavailable. If the port is unavailable and it is specified with the port argument or within cache.xml, then it may fail after connecting to the cluster when it finally uses the port to create a server listener. The port in use message (involving a Java BindException) is only found in the launching server's log but not printed to the output of the command within GFSH. The user must wait until the server fails to start and then analyze its log file to determine if the cause was the server port being unavailable.
#47664 start server command ignores port argument if cache server is defined in cache.xml The GFSH start server command will ignore the port argument if a cache server is defined in cache.xml. If no port is specified in cache.xml, it will use the default port 40404. If a port is specified in cache.xml, then that port will be used. In either case the port specified in the GFSH command will be ignored unless a cache server is not specified in cache.xml. If a cache server is not specified in cache.xml then a cache server will be created using the port argument (if present) unless disable-default-server is specified. Either specify the port in cache.xml or do not define the cache server in cache.xml. In the latter case, a default cache server will be created and will use the port argument as expected.
#47614 TypeMismatchException during query execution on non existing field in an object During query execution using index on a field, if a Region Entry is updated and the value object is replaced by an object that does not have the field, a TypeMismatchException could occur.
#47601 Queries using alias may throw UnsupportedOperationException Queries may throw an UnsupportedOperationException in a very rare scenario if the alias is not used in the where clause of the query. This is now fixed.
#47582 Loss of pdx disk store in gateway receiver may result in IllegalStateException "Unknown pdx type" If a distributed system that has a gateway receiver loses its pdx disk store, then when it is restarted the existing gateway senders will not resend the pdx type definitions, so you may see an IllegalStateException saying "Unknown pdx type". If possible, do not delete the disk store used by pdx. You can configure a specific disk store that only stores pdx, which allows you to delete the disk stores that contain region data while preserving your pdx metadata. Otherwise you need to also restart the distributed systems that have gateway senders in them.
#47556 Query with OR condition may not return results When an OR clause is combined with an AND condition in an equi-join query, and the region has three indexes that are used for all conditions except the OR clause, the query returns no results, unlike when it is run without indexes. This is now fixed.
#47523 Monitoring region and notificationRegion emit warnings on startup with enable-network-partition-detection set to true Warnings like below are now emitted whenever a node comes up with enable-network-partition-detection. {{{ [warning 2013/03/15 02:41:11.448 IST <main> tid=0x1] Region _monitoringRegion_10.112.204.9(11302)<ec>54088 is being created with scope DISTRIBUTED_NO_ACK but enable-network-partition-detection is enabled in the distributed system. This can lead to cache inconsistencies if there is a network failure. [info 2013/03/15 02:41:11.460 IST <main> tid=0x1] Initializing region _monitoringRegion_10.112.204.9(11302)<ec>54088 [warning 2013/03/15 02:41:11.469 IST <main> tid=0x1] Region _notificationRegion_10.112.204.9(11302)<ec>54088 is being created with scope DISTRIBUTED_NO_ACK but enable-network-partition-detection is enabled in the distributed system. This can lead to cache inconsistencies if there is a network failure. }}} You can ignore these warning messages.
#47506 locators start and form separate distributed systems If two locators start simultaneously, it is possible that they will end up connected with no knowledge of each other. Ensure that locators are configured to know about one another in their gemfire properties and stagger the starting of locators.
#47431 Duplicate events can be dispatched when conflation is enabled There is a scenario where an event that has been conflated by the primary doesn't get conflated on the secondary because the queue on the secondary node is initializing. In this case, the event can be dispatched again when the secondary becomes primary.
#47394 Conflation of latest event in WAN queue can cause inconsistency Inconsistency can occur in WAN sites when conflation is enabled. If the latest event is being conflated in the WAN queue during concurrent operations, it can result in inconsistency between the WAN sites. Disable conflation to resolve this issue.
#47375 Limiting the query fetch result set to 1000 affects queries containing aggregate functions To avoid unintended load from accidental querying (which causes iteration over a large number of rows), the gfsh query command limits the fetch size to the value of the gfsh-controlled variable APP_FETCH_SIZE (default 1000). If a query contains aggregation functions, APP_FETCH_SIZE should be explicitly set to a value greater than or equal to the expected number of entries in the region. Set the gfsh APP_FETCH_SIZE variable to a very high value when a query has aggregate functions in it to avoid aggregating the result over a limited set of rows.
#47366 Query using keys throws EntryDestroyedException The query engine can throw EntryDestroyedException while queries on region key sets are being executed. This happens when a region entry destroy operation is in progress, and the query is concurrently doing a non-index scan and trying to retrieve a value from the entry. Create a relevant index that will make the query iterate through index entries. This will allow you to avoid this problem.
#47338 Client PDX metadata is out of synch with server when client runs longer than server When a client is still running and the server is restarted, the client's PDX type metadata may go out of synch with the server's PDX type metadata. This can cause the client's get operations and queries to throw exceptions during deserialization. The issue can be addressed by persisting PDX metadata to disk. See the "Data Serialization" section of the vFabric GemFire User's Guide for more details.
#47303 Hang after creating a partitioned region in one member while doing a global destroy of the region If a partitioned region is globally destroyed using Region.destroyRegion() at the same instant that another member is creating the partitioned region, it's possible the new member may fail to see the destroy. If the same partitioned region is then recreated on other members, it's possible that operations that try to create buckets could hang in the member that created the new region.
#47243 EventFilter can receive PdxType events PdxType events are internal events that must be sent to remote sites for PDX to work. Currently, they are handed to the filter callback before queuing, which could lead to them not being queued to the GatewaySenderQueue. Users should not filter PdxType events.
#47205 PdxSerializer may be passed a CopyOnWriteArraySet At least one internal GemFire class (CacheProfile) sometimes serializes a CopyOnWriteArraySet, which may not work if you implement and install a PdxSerializer. In your PdxSerializer toData method, check whether the incoming object is an instance of CopyOnWriteArraySet; if so, return false.
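A minimal sketch of this check, assuming a hypothetical application serializer named MyPdxSerializer; returning false from toData tells GemFire to serialize the object itself rather than through the PdxSerializer:
{{{
import java.util.concurrent.CopyOnWriteArraySet;
import com.gemstone.gemfire.pdx.PdxReader;
import com.gemstone.gemfire.pdx.PdxSerializer;
import com.gemstone.gemfire.pdx.PdxWriter;

public class MyPdxSerializer implements PdxSerializer {

  @Override
  public boolean toData(Object obj, PdxWriter writer) {
    // Internal GemFire classes (such as CacheProfile) may hand this
    // serializer a CopyOnWriteArraySet; decline it so GemFire serializes
    // the set with its own mechanism.
    if (obj instanceof CopyOnWriteArraySet) {
      return false;
    }
    // ... serialize application domain classes here ...
    return false; // decline anything this serializer does not recognize
  }

  @Override
  public Object fromData(Class<?> clazz, PdxReader reader) {
    // ... deserialize application domain classes here ...
    return null;
  }
}
}}}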
#47181 Combination of mcast-port=0, enforce-unique-host=true and redundancy-zone=x causes UnsupportedOperationException The combination of these properties causes an UnsupportedOperationException: mcast-port=0, enforce-unique-host=true, redundancy-zone=x. This bug is in 7.x versions. Either set the mcast-port to non-zero or remove the enforce-unique-host and redundancy-zone properties.
#47097 NullPointerException reading cache.xml if loss-action or resumption-action not specified If loss-action or resumption-action is not specified in the membership-attributes section, a NullPointerException may result. Specify loss-action and resumption-action if using required roles.
#47088 GemFire Locator and Server process dependencies on the JDK tools.jar. It has been observed that a GemFire Locator and Server depend on the tools.jar from the JDK. In particular, it is the new Launcher classes (LocatorLauncher and ServerLauncher in the com.gemstone.gemfire.distributed package) that use the Attach API, which is bundled in the tools.jar for Oracle Java distributions. The Attach API is used, and will continue to be required for 7.0.1, to preserve the behavior and functionality of the new Launcher classes. The Attach API gives the Launcher classes the ability to properly manage the life cycle of GemFire members in order to ensure the proper functioning of the cluster. By using the Attach API, we can more reliably ensure GemFire members are managed properly, which prior to 7.0.1 was accomplished by the use of the error-prone status file. Post 7.0.1, an alternative approach may be considered in order to remove the dependency on the Attach API bundled in the tools.jar. Include the JDK tools.jar on the classpath.
#46955 Cache writer beforeUpdate invoked for an event with Operation.CREATE If an accessor updates a cache entry and that entry already exists in the distributed system a cache writer will be notified of the event with beforeUpdate rather than beforeCreate. This change in behavior was introduced in the 7.0.0 release. The expected behavior is that the notification will reflect what happened in the JVM that initiated the operation, so for an empty accessor this should be a beforeCreate notification.
#46876 hang during startup waiting for a message reply that never arrives During startup GemFire sends Region creation messages to other members. A race condition in UDP datagram messaging may cause delivery of one of these messages to fail, causing a hang.
#46784 QueryInvocationTargetException may be thrown instead of QueryExecutionLowMemoryException While executing a query on a partitioned region when memory is running low, a QueryInvocationTargetException may be thrown instead of the expected QueryExecutionLowMemoryException. Catch the QueryInvocationTargetException and check the cause.
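A minimal sketch of that workaround, assuming the query runs through the QueryService API and that the low-memory condition surfaces as a LowMemoryException cause (the class name LowMemoryAwareQuery is illustrative):
{{{
import com.gemstone.gemfire.cache.LowMemoryException;
import com.gemstone.gemfire.cache.query.Query;
import com.gemstone.gemfire.cache.query.QueryInvocationTargetException;
import com.gemstone.gemfire.cache.query.QueryService;

public class LowMemoryAwareQuery {

  public static Object execute(QueryService queryService, String oql) throws Exception {
    Query query = queryService.newQuery(oql);
    try {
      return query.execute();
    } catch (QueryInvocationTargetException e) {
      // The low-memory condition may be reported as the cause of this
      // exception (assumed here to be a LowMemoryException) instead of
      // a QueryExecutionLowMemoryException.
      if (e.getCause() instanceof LowMemoryException) {
        // back off, free heap, or retry later
        return null;
      }
      throw e;
    }
  }
}
}}}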
#46769 Region with Hyphen Any region with a hyphen in its name is not properly queried from GFSH. This is due to special character handling in the ObjectName of regions, which is used by the query command.
#46660 Wakeup delay reported in log entries by statistics sampler is in a wrong time unit When the statistics sampler logs a wakeup delay, the time unit used for the value is ms, which is incorrect. The actual value reported is correct, but the time unit should be ns (nanoseconds).
#46602 StatMonitorNotifier Thread may log NullPointerException warning during Cache closure The StatMonitorNotifier Thread may log a NullPointerException warning originating from com.gemstone.gemfire.management.internal.beans.stats.MBeanStatsMonitor$MBeanLevelStatisticsListener.handleNotification(MBeanStatsMonitor.java:86). This is an unexpected but harmless log message which may occur during Cache closure. It should be ignored and will not adversely affect GemFire, its Management service, or the application. Ignore this warning.
#46590 putIfAbsent may appear to fail in a client cache but actually succeeded in servers and was passed to other clients If a putIfAbsent() operation is retried by a client cache due to server failure, it is possible that other servers have already seen the operation, applied it to their caches and forwarded it to other clients. The cache that retries the putIfAbsent() then finds that the server handling the reattempt rejects the operation because it already has the change. If this happens, the value returned by putIfAbsent will equal the value being put into the cache. This leaves the cache of the client that performed the operation without the entry, while other servers and clients do have the entry.
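A minimal sketch of how a client might detect this case by comparing the returned value with the value it tried to put; note that this heuristic cannot distinguish a retried operation from another client having stored an identical value, and the class and method names are illustrative:
{{{
import com.gemstone.gemfire.cache.Region;

public class PutIfAbsentRetryCheck {

  /** Returns true if the entry was created (possibly by an earlier retried attempt). */
  public static boolean createEntry(Region<String, String> region, String key, String value) {
    String existing = region.putIfAbsent(key, value);
    if (existing == null) {
      return true; // normal success
    }
    if (existing.equals(value)) {
      // A retried putIfAbsent may already have succeeded on the servers;
      // the returned value equals the value we tried to put. Treat this
      // as success and re-read so the local client cache holds the entry.
      region.get(key);
      return true;
    }
    return false; // a genuinely different value already exists
  }
}
}}}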
#46503 "query" command from gfsh should not be used on data which has domain objects with cyclic references. "query" command from gfsh should not be used on data which has domain objects with cyclic references. It results in StackOverflowError.
#46499 Some read-only operations on MBeans are not authorized in the default monitor role Some read-only operations on MBeans might not be available with the default monitor role. The role definition for the Monitor role needs to include those operations that are not available.
#46419 Revoking a non-existent disk store should fail, but instead succeeds with no indication of an error. Disk stores that are considered missing (as reported by the "show missing-disk-store" command) may be removed from the system with the "revoke missing-disk-store" command. If, while entering the revoke command, the user inadvertently enters an ID that doesn't match an existing disk store, a result of success will be returned instead of an error. This could lead the user to believe the disk store has been successfully removed when it has not. After using the "revoke missing-disk-store" command, the user should again run the "show missing-disk-store" command to confirm that the disk store was successfully removed.
#46391 The MBean for a newly created region may not be available immediately. The GemFire Manager has a federated view of the MBeans for the entities on GemFire members. There is a delay before updates reach the Manager. Therefore, after creating a region using the "create region" command, the MBean for that region is not available on the member immediately. If subsequent commands perform an operation on that region, you should add a slight delay using the "sleep" command. Add an explicit delay of 1.5 to 3.0 seconds. You can use the Thread.sleep() method or the gfsh command 'sleep'.
#46105 Incorrect formatting of tabular gfsh output When running gfsh commands, the formatting for displaying Tabular Results can become distorted when the data contains long values in one or more columns. no workaround
#46020 GemFire fails to start if the install directory name contains non-ASCII characters Under Windows, when attempting to launch GemFire scripts (gemfire.bat, cacheserver.bat or gfsh.bat) GemFire may fail to launch if the name of the installed directory contains non-ASCII characters. This issue may also occur if GemFire is launched with the -dir option pointing to a directory whose name contains non-ASCII characters. 1. Relocate the GemFire product tree to a directory structure whose name only includes ASCII characters. 2. Delete the *.dll shared libraries from the lib directory. These libraries provide native startup support but are not required in most cases.
#45931 If the customer did not specify a classpath for their instantiators, offline compaction (including the conversion) will lose the instantiators. This bug is inherited from 6.5.
#45685 Some javadocs on Region methods say they will throw UnsupportedOperationException but they do not The following methods on Region say they will throw an UnsupportedOperationException if the region is a partitioned one but this is not true: invalidateRegion() getSubregion(String) containsValue(Object) Ignore the javadocs in these cases. These operations are supported on partitioned regions.
#45620 Region.size() may be incorrect during the time when the region is being destroyed or cleared When a non-partitioned region is being destroyed or cleared it may report a negative size.
#45600 Delay while getting MBean and its data Users may experience latency in getting a new MBean and its data. The system takes 1.5 seconds for federation, so users will see a new MBean and its data after at least 1.5 seconds, depending also on network traffic. For an existing MBean, the delay applies only to updates, not to the MBean itself. Users should develop code or scripts keeping this delay in mind.
#45409 SystemFailure class will leave two threads running after gemfire closes The SystemFailure class will leave two threads running after the cache has been closed. The threads are named "SystemFailure WatchDog" and "SystemFailure Proctor". Both threads are daemon threads. No workaround.
#45314 Starting up gateway hubs before persistent member recovery results in hang In rare cases, starting persistent members while a gateway is receiving events from a remote WAN site can result in a hang. This bug only affects GemFire 6.5.
#45116 Shutdown all or cache close may cause spurious NotSerializableException If you are using a PdxSerializer, you may see a NotSerializableException during a cache close or a shutdown-all. This is because the PdxSerializer is uninstalled during the cache close and, in some rare cases, GemFire may still attempt to do a serialization even after the PdxSerializer is no longer installed. In that case it will attempt to use standard Java I/O serialization, and if your object does not support standard serialization you will then see a NotSerializableException. Ignore the NotSerializableException if the cache is being closed or shut down.
#45081 IOException can be thrown with Sun JDK 1.6 on Windows Observed "java.io.IOException: An operation was attempted on something that is not a socket" on Windows with Sun JDK 1.6. This could be because of an NIO bug in the JVM that is fixed in JDK 1.7 [http://support.microsoft.com/kb/817571]. Use JDK 1.7 on Windows machines.
#44961 Multiple cache creations may fail In some cases, a cache creation request may fail when it should not. Because the cache is a singleton, you can only create one per JVM. If a subsequent cache creation attempts to create an identical cache, the existing cache should be returned. If you have pdx attributes configured in your cache.xml file, and not on the cache factory, then subsequent creation requests fail with an IllegalStateException saying that the pdx attributes differ. Configure pdx using cache factory methods, or change your code to fetch the singleton cache instead of creating it.
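A minimal sketch of the cache-factory approach, with placeholder PDX settings and package pattern (com.example.*); your actual values should mirror what was previously in cache.xml:
{{{
import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.CacheFactory;
import com.gemstone.gemfire.pdx.ReflectionBasedAutoSerializer;

public class PdxCacheBootstrap {

  public static Cache createOrGetCache() {
    // Configure PDX on the factory instead of in cache.xml so that a
    // second create() call with the same attributes returns the existing
    // singleton cache rather than failing with IllegalStateException.
    return new CacheFactory()
        .setPdxPersistent(true)
        .setPdxReadSerialized(false)
        .setPdxSerializer(new ReflectionBasedAutoSerializer("com.example.*"))
        .create();
  }
}
}}}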
#44709 InternalFunctionInvocationTargetException is observed in gemfire log files During function execution, if a member departs, an InternalFunctionInvocationTargetException is logged as a warning.
#44636 Locator log file name cannot be changed Using the gemfire start-locator command, the locator log is always called locator.log and is located in the working directory. There is no way to change this. If the log-file property is defined in the gemfire.properties file, it is ignored. Use -Dgemfire.log-file=logfilepath at the end of the gemfire start-locator command to set the log file name.
#44576 Query acts as if a non-existent field exists A query may act as if a non-existent field exists and has the default value for the field type. This can only happen if the class that contains the field has been PDX serialized and you have at least two versions of your class: one without the field ("V1") and one with the field ("V2"). The problem will only happen in the JVM in which the query execution originated, when that JVM has read-serialized set to false. Under these conditions, the query thread will deserialize the PDX back into a domain class. If the serialized data represents "V1" but the query thread deserializes the PDX into an instance of "V2", then the query will act as if the non-existent field exists. However, if read-serialized is set to true or the query originated in a remote JVM, then the query will just use a PdxInstance for "V1"'s serialized data and will act as if the non-existent field does not exist. If your code invokes queries on objects that may have different versions, set read-serialized to true.
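A minimal sketch of the workaround, assuming the cache is created programmatically (the equivalent cache.xml setting is the read-serialized attribute of the pdx element):
{{{
import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.CacheFactory;

public class ReadSerializedCache {

  public static Cache create() {
    // With read-serialized=true the query engine works on PdxInstances
    // instead of deserializing into one particular version of the domain
    // class, so a "V1" object is not mistaken for a "V2" object.
    return new CacheFactory()
        .setPdxReadSerialized(true)
        .create();
  }
}
}}}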
#44558 Gateway.stop() does not cleanup/destroy the region for the Gateway Event Queue Manually stopping a gateway using the API doesn't close the region backing the queue. This will cause unnecessary event replication to the JVM containing the stopped gateway. The region is internal but it can be retrieved and closed manually. The region is named gatewayHubId + "_" + gatewayId + "_EVENT_QUEUE". For example: {{{ String gatewayRegionName = gatewayHubId + "_" + gatewayId + "_EVENT_QUEUE"; Region region = cache.getRegion(gatewayRegionName); region.close(); }}} The region should just be closed and not destroyed, so that any persistent data is not deleted.
#43784 How can I tell that a region was closed because I ran out of disk space? We see that when a peer runs out of disk space, the regions that failed to persist close automatically. Is there a way to detect this condition? For example, to detect a network split or other network issues we add a listener to one of our Regions, watch for regionDestroy, and check whether the operation is FORCED_DISCONNECT; if the whole Cache is closed then we run our own logic and exit the JVM. But when a Region runs out of disk space, each individual region is closed and the Cache does not seem to be closed, so we are not able to detect this condition at all. Is there any way we can detect this in the listeners? The CacheListener afterRegionDestroy is called in this case. The operation will be REGION_CLOSE. We do log a message at "error" level that says: A DiskAccessException has occurred while writing to the disk for region XXX. The region will be closed. If you are able to access the log in your CacheListener then you could check for this log message.
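A minimal sketch of such a listener, assuming it is installed on the persistent regions of interest; the class name and the handling inside the check are illustrative:
{{{
import com.gemstone.gemfire.cache.Operation;
import com.gemstone.gemfire.cache.RegionEvent;
import com.gemstone.gemfire.cache.util.CacheListenerAdapter;

public class DiskFailureRegionListener extends CacheListenerAdapter<Object, Object> {

  @Override
  public void afterRegionDestroy(RegionEvent<Object, Object> event) {
    // A region that is closed because persistence failed (for example,
    // out of disk space) reports the REGION_CLOSE operation here.
    if (event.getOperation() == Operation.REGION_CLOSE) {
      // Application-specific handling, e.g. alerting or shutting down the JVM.
      System.err.println("Region " + event.getRegion().getFullPath()
          + " was closed; check the logs for a DiskAccessException.");
    }
  }
}
}}}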
#43690 Hang with PDX serialization and conserve sockets true If PDX serialization is used in a system with conserve-sockets=true (the default), there is a possibility of a hang occurring when a type is serialized for the first time. This will happen only if objects are accessed directly in peer VMs or cache servers (not clients), and the peer VMs have a different version of the class on their classpath.
#43193 GemFire clients fail when valueConstraint is set on server and client update violates it. When a client updates a server with a value that violates the valueConstraint set for the region's value on the server, the client shuts down the pool and no longer uses the server (even though the server is just fine). Remove the value constraint on the server if your clients cannot honor it.
#42324 Data type in Querying A query from a client to a server on a Long field does not return correct results if 'L' is not used with the value in the WHERE condition(s), like "longID=10". The character 'L' must be used with the value in the where condition for client-server queries, for example "longID=10L".
#42245 Clients that cycle many threads can cause a server memory leak Long-lived subscription-enabled clients that cycle many threads cause a memory leak on the server that contains their primary queue. Reuse threads on the client if possible.
#41578 PutAll partial results when socket broken at persist server The remaining issue is not fixable in 6.5 (or even in 7.0). Be aware of this limitation: if the socket suddenly breaks while keys are being applied at a server with persistence, those keys will not be applied at the calling client. If the server restarts and recovers the keys from disk, a key mismatch will exist between the client and server.
#41487 Executing write operations on a cache inside a function can cause inconsistency when redundant copies is greater than 1 This issue occurs when the primary on which the function has partially executed is killed and it has done the following: 1> it has distributed the operation to one of the two secondaries; 2> the secondary that received the operation becomes the primary; 3> re-execution of the function happens on the new primary. The product executes the function on a thread pool. When a cache operation, such as a destroy, is done in the function body, it goes through the normal process of generating an event id based on the member, thread and sequence id on that node. When the retry comes in, it happens on a different node, on a different thread, and has a different sequence id, so there is no way to detect this as a re-execution of a previous function. This is actually no different from the case where we put data into a region on a peer which is the primary and kill it; the redundant nodes will be inconsistent and there is nothing that can be done. Prudent practices: 1> use a redundancy level of 1 for the partitioned region; 2> if 1 is not feasible, use a transaction if you need all-or-nothing behavior for cache operations running inside a function. Please see Bugs 41402, 43556.
#40837 Failures in application provided DataSerializable and Delta interface calls can cause data inconsistency If there are bugs/failures in application-provided callbacks pertaining to data serialization, it can result in distribution failures that leave two replicas in an inconsistent state. This is because the local cache is updated before the data is serialized out to the remote cache. If the remote cache encounters a failure in the deserialization code (fromData/fromDelta), then the two caches will be inconsistent with respect to each other.
#40671 GemFire clients can hang in socket.close in the event of a snipped wire between client and server In versions prior to 6.5, the client server socket could hang in socket.close if a client's network connection to a server was forcefully disconnected (network was dropped). This was true for socket.connect as well. In 6.5 both issues have been addressed and this issue has been resolved.
#40489 Unexpected RegionDestroyedException encountered when trying to perform an operation on a server region Failures on the server cluster that cause a region to be shut down can cause ongoing client initiated operations to see this error. Very rare, usually points to lack of proper provisioning.
#40481 Retrying a function which is not implicitly set to be re-attempted for execution by GemFire can cause exceptions GemFire allows a function to be specified as highly available. If the function's isHA method returns true, then GemFire will automatically retry the execution on nodes on which it has not tried the function before, automatically excluding the nodes on which the execution failed previously. If application code tries to re-execute a function explicitly, it could end up retrying it on a node until that member is no longer in the membership view. We recommend using the GemFire Function HA mechanism for any function that can be re-executed after a failure is encountered in the execution attempt.
#40456 Uniformity in heap LRU eviction in PR Heap LRU eviction on a partitioned region is not uniform across the bucket regions.
#39302 Processing queries on strings containing quotes. When a CQ query is applied to String objects containing quotes, the query engine's results are unpredictable, resulting in CQ events even though the object does not satisfy the CQ query condition.
#39139 Lease expiration causes locking to hang Lease expiration can cause all other lock requests on the DistributedLockService to hang. Global Region operations may hang for the same reasons. Use -1 for lock-lease to prevent lease expiration
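A minimal sketch of applying that workaround programmatically, assuming the property is passed to the CacheFactory rather than placed in gemfire.properties; the class name is illustrative, and -1 is the value given in the workaround above:
{{{
import java.util.Properties;
import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.CacheFactory;

public class NoLeaseExpirationCache {

  public static Cache create() {
    Properties props = new Properties();
    // Prevent distributed lock leases from expiring, avoiding the hang
    // described above, at the cost of leases never being reclaimed from
    // a hung lock holder.
    props.setProperty("lock-lease", "-1");
    return new CacheFactory(props).create();
  }
}
}}}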
#38836 multicast-enabled is not inherited from a region-attributes template If you define a region-attributes template in cache.xml regions created with that template will not properly pick up the multicast-enabled setting.
#35644 Server-originated events not reporting their origin as remote to the client This bug pertains to cache modifications that come from the server to the client. The isOriginRemote method returns false for all related events except entry updates. The origin should be reported as remote for all of these server-originated events.
#34400 InternalGemFireException: Could not process initial view: caused by NegativeArraySizeException (thrown from Distribution Manager.startThread()) This error indicates that a process running one version of GemFire Enterprise has tried to join a distributed system running with a different version of GemFire Enterprise. All processes in a single distributed system must run the same version of GemFire Enterprise.
#34269 Region creation returns RegionExistsException even if Region.isDestroyed is true This problem occurs when distributed region destruction and region creation happen very close together. If a region is destroyed by one VM and created right away in another VM, the creation may return a RegionExistsException even if Region.isDestroyed has just returned true for the region.