mirror of https://github.com/OpenIdentityPlatform/OpenDJ.git

author gbellato <gbellato@localhost>
Wednesday, November 18, 2009 17:55 +0100
committer gbellato <gbellato@localhost>
Wednesday, November 18, 2009 17:55 +0100
commit d19acb303c4ff90e48fd98ce2d7ba739ca9ea2db
tree 68709e87ca4df678518e486adaf3f19b31ca3d65
parent 20173f7897427f51f1e2f4412b21ed371dc2ad58
Fix for Issue 4300: stopping a replication server causes OutOfMemoryError

This problem happens in the following conditions :
- use Directory Servers that do not have Replication Servers in the same JVM
- use only 2 Replication Servers
- apply a heavy load of updates on one Directory Server
- stop the first Replication Server
- wait long enough for millions of changes to be performed
- restart the first Replication Server, which will therefore have millions of
changes to retrieve from the second
- quickly stop the second Replication Server (before it has time to replicate
the missing changes to the first RS)

In that case, the DS will connect to the first RS, see that the RS is missing many
changes, and attempt to regenerate them from the historical information
in the database. Unfortunately, this process needs to fetch all the changes
into memory because it must send them to the RS in ChangeNumber order,
and it therefore currently sorts them in memory before sending them.

This change fixes the problem by searching for changes by interval. This avoids the memory
problem because only a limited number of changes need to be sorted at a time, and
these fit in memory.
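
The interval approach can be sketched as follows. This is only an illustration, not the code from this commit: change numbers are simplified to plain non-negative long values, and IntervalReplaySketch, HistoricalStore and replay are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: replay missing changes one interval at a time. */
final class IntervalReplaySketch {

  /** Stand-in for the historical data; returns the changes in [from, to) in any order. */
  interface HistoricalStore {
    List<Long> search(long fromInclusive, long toExclusive);
  }

  /**
   * Sends every change in [start, end) in ascending order, sorting only one
   * interval at a time. Returns the size of the largest batch that was sorted.
   */
  static int replay(HistoricalStore store, long start, long end,
                    long intervalSize, Consumer<Long> sender) {
    if (intervalSize <= 0) {
      throw new IllegalArgumentException("intervalSize must be positive");
    }
    int largestBatch = 0;
    long from = start;
    while (from < end) {
      // Clamp the upper bound of the last interval to 'end'.
      long to = (end - from > intervalSize) ? from + intervalSize : end;
      List<Long> batch = new ArrayList<>(store.search(from, to));
      Collections.sort(batch); // only this interval is held and sorted in memory
      largestBatch = Math.max(largestBatch, batch.size());
      for (Long changeNumber : batch) {
        sender.accept(changeNumber);
      }
      from = to; // intervals are consecutive, so the global order is preserved
    }
    return largestBatch;
  }
}
```

Because the intervals are consecutive and each one is sorted before sending, the overall order is still ascending, while memory use is bounded by the largest interval rather than by the total number of missing changes.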

However, this fix is not enough, because the whole process runs in the replication listener thread, which is also responsible for managing the replication protocol window.
While this thread is busy sending a lot of changes to the RS, it cannot also manage the window, and the process can therefore deadlock.
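
To illustrate why this blocks, here is a minimal model of a send window, assuming a simple credit scheme. SendWindowSketch and its methods are hypothetical and are not the ReplicationBroker API; the sketch uses tryAcquire so that an exhausted window is visible as a false return value instead of a hang.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical model of a send window based on credits. */
final class SendWindowSketch {
  private final Semaphore credits;

  SendWindowSketch(int windowSize) {
    this.credits = new Semaphore(windowSize);
  }

  /** Called by the sending thread; false means the window is exhausted. */
  boolean trySend() {
    return credits.tryAcquire();
  }

  /** Called when the peer grants more credit (the listener's job in this model). */
  void onWindowUpdate(int grantedCredits) {
    credits.release(grantedCredits);
  }
}
```

In this model, a sender that waits for credit instead of returning false can only make progress if some other thread calls onWindowUpdate. If the sender and the thread that handles window updates are the same thread, nothing ever releases the credit.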

So a second level of changes is necessary: the code is moved into a separate new thread that is
created only when necessary.
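
A rough sketch of creating such a thread on demand is shown below. It is an assumption about the general shape only, not the code from LDAPReplicationDomain; OnDemandCatchUpSketch, startIfNeeded and awaitCompletion are hypothetical names.

```java
/** Hypothetical sketch: start a catch-up thread only when changes are missing. */
final class OnDemandCatchUpSketch {
  private Thread catchUpThread; // guarded by 'this'

  /**
   * Starts the catch-up task in its own thread if changes are missing and no
   * catch-up thread is already running. Returns true if a thread was started.
   */
  synchronized boolean startIfNeeded(boolean changesMissing, Runnable catchUpTask) {
    if (!changesMissing) {
      return false; // nothing to regenerate, so no thread is created
    }
    if (catchUpThread != null && catchUpThread.isAlive()) {
      return false; // a catch-up is already in progress
    }
    catchUpThread = new Thread(catchUpTask, "catch-up-sketch");
    catchUpThread.setDaemon(true);
    catchUpThread.start();
    return true;
  }

  /** Waits for the current catch-up thread, if any, without holding the lock. */
  void awaitCompletion() {
    Thread t;
    synchronized (this) {
      t = catchUpThread;
    }
    if (t == null) {
      return;
    }
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
    }
  }
}
```

The caller returns immediately after startIfNeeded, so a listener thread using this pattern stays free to handle window updates while the catch-up runs.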

This led to the last problem I encountered: creating this new thread caused some concurrency
problems, which I had to fix by introducing synchronization between the new thread, the listener thread and the worker thread.

7 files modified, 873 lines changed
opends/src/server/org/opends/server/replication/plugin/LDAPReplicationDomain.java 372
opends/src/server/org/opends/server/replication/plugin/PendingChanges.java 52
opends/src/server/org/opends/server/replication/plugin/PersistentServerState.java 12
opends/src/server/org/opends/server/replication/service/ReplicationBroker.java 72
opends/src/server/org/opends/server/replication/service/ReplicationDomain.java 12
opends/tests/unit-tests-testng/src/server/org/opends/server/TestCaseUtils.java 18
opends/tests/unit-tests-testng/src/server/org/opends/server/replication/plugin/HistoricalCsnOrderingTest.java 335