From 9e1f377c4f21b899d16f4c62450c68691f4b42a8 Mon Sep 17 00:00:00 2001
From: Ludovic Poitou <ludovic.poitou@forgerock.com>
Date: Thu, 20 Jun 2013 15:02:35 +0000
Subject: [PATCH] Fix for OPENDJ-846, Intermittent Replication failure.

The issue was triggered by the mix of AssuredReplication and bad network
conditions, which resulted in a deadlock between 2 RS, as both were blocked
on writing to the TCP socket and not reading (because waiting on the write
lock).
The solution (more of a workaround) is to have another thread for sending
data to the socket and have the reader and writer posting data to send to a
queue that this new thread is polling.
There are still potential deadlocks but they will occur much later, if the
sendQueue gets full. The code needs more work post 2.6 to be fully non
blocking, but the changes are enough for now to resolve the customer
deadlock case.
---
 opends/src/messages/messages/replication.properties | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/opends/src/messages/messages/replication.properties b/opends/src/messages/messages/replication.properties
index 066a26f..69af0e6 100644
--- a/opends/src/messages/messages/replication.properties
+++ b/opends/src/messages/messages/replication.properties
@@ -542,4 +542,5 @@
 SEVERE_WARN_INVALID_SYNC_HIST_VALUE_214=The attribute value '%s' is not a valid \
  synchronization history value
 SEVERE_ERR_REPLICATIONDB_CANNOT_PROCESS_CHANGE_RECORD_215=Replication server RS(%d) \
- failed to parse change record with changenumber %s from the database. Error: %s
\ No newline at end of file
+ failed to parse change record with changenumber %s from the database. Error: %s
+SEVERE_ERR_SESSION_STARTUP_INTERRUPTED_216=%s was interrupted in the startup phase
-- 
Gitblit v1.10.0
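
The commit message describes the workaround only in words: reader and writer threads post outgoing data to a queue, and a dedicated thread polls that queue and performs the socket write. The sketch below illustrates that pattern only. It is not the code from this change (the hunk shown here touches only the properties file), and every name in it (`QueuedSender`, `publish`, `drain`, the in-memory `wire` list standing in for the TCP socket) is hypothetical. As the commit message notes, a bounded queue only postpones the problem: `publish` still blocks once the queue is full.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a queue-backed sender; not the OpenDJ session class. */
public class QueuedSender {
    // Bounded queue that reader and writer threads post outgoing messages to.
    private final BlockingQueue<byte[]> sendQueue;
    // Stand-in for the TCP socket so the sketch runs without a network.
    private final List<byte[]> wire = new CopyOnWriteArrayList<>();
    private final Thread sender;
    private volatile boolean running = true;

    public QueuedSender(int capacity) {
        sendQueue = new ArrayBlockingQueue<>(capacity);
        sender = new Thread(this::drain, "sender-sketch");
        sender.setDaemon(true);
        sender.start();
    }

    /** Called by any thread that has data to send; blocks only while the queue is full. */
    public void publish(byte[] msg) {
        try {
            sendQueue.put(msg);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while queueing", e);
        }
    }

    /** Sender thread loop: the only place that touches the (simulated) socket. */
    private void drain() {
        try {
            while (running || !sendQueue.isEmpty()) {
                byte[] msg = sendQueue.poll(100, TimeUnit.MILLISECONDS);
                if (msg != null) {
                    wire.add(msg); // a real session would write to the socket here
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops the sender after it has flushed whatever is still queued. */
    public void close() {
        running = false;
        try {
            sender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while closing", e);
        }
    }

    public int sentCount() {
        return wire.size();
    }
}
```

Because only the sender thread writes, a thread that needs to read from the socket is no longer stuck waiting on a write lock held by a blocked writer, which is the deadlock the commit message describes.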