From 18d271a358e25ff92166875e5e5e8f759f10eb18 Mon Sep 17 00:00:00 2001 From: Mark Craig <mark.craig@forgerock.com> Date: Thu, 30 May 2013 16:31:09 +0000 Subject: [PATCH] CR-1752 Fix for OPENDJ-869: Add docs describing replication failover --- opendj3/src/main/docbkx/admin-guide/chap-replication.xml | 267 ++++++++++++++++++++++++++++++++++++++--------------- 1 files changed, 190 insertions(+), 77 deletions(-) diff --git a/opendj3/src/main/docbkx/admin-guide/chap-replication.xml b/opendj3/src/main/docbkx/admin-guide/chap-replication.xml index b346f20..235f80e 100644 --- a/opendj3/src/main/docbkx/admin-guide/chap-replication.xml +++ b/opendj3/src/main/docbkx/admin-guide/chap-replication.xml @@ -102,7 +102,7 @@ </mediaobject> </section> - + <section xml:id="about-repl"> <title>About Replication</title> <indexterm> @@ -113,86 +113,199 @@ <para>Before you take replication further than setting up replication in the setup wizard, read this section to learn more about how OpenDJ replication works.</para> - - <para>Replication is the process of copying updates between OpenDJ - directory servers such that all servers converge on identical copies of - directory data. Replication is designed to let convergence happen over - time by default. <footnote><para>Assured replication can require, however, - that the convergence happen before the client application is notified that - the operation was successful.</para></footnote> Letting convergence - happen over time means that different replicas can be momentarily out of - sync, but it also means that if you lose an individual server or even an - entire data center, your directory service can keep on running, and then - get back in sync when the servers are restarted or the network is - repaired.</para> - - <para>Replication is specific to the OpenDJ directory service. 
Replication - uses a specific protocol that replays update operations quickly, storing - enough historical information about the updates to resolve most conflicts - automatically. For example, if two client applications separately update - a user entry to change the phone number, replication can work out which - was the latest change, and apply that change across servers. The historical - information needed to resolve these issues is periodically purged to avoid - growing larger and larger forever. As a directory administrator, you must - ensure that you do not purge the historical information more often than you - backup your directory data.</para> - - <para>The primary unit of replication is the suffix, specified by a - base DN such as <literal>dc=example,dc=com</literal>.<footnote><para>When - you configure partial and fractional replication, however, you can replicate - only part of a suffix, or only certain attributes on entries. Also, - if you split your suffix across multiple backends, then you need to set up - replication separately for each part of suffix in a different backend.</para> - </footnote> Replication also depends on the directory schema, defined on - <literal>cn=schema</literal>, and the <literal>cn=admin data</literal> - suffix with administrative identities and certificates for protecting - communications. Thus that content gets replicated as well.</para> - - <para>The set of OpenDJ servers replicating data for a given suffix is - called a replication topology. You can have more than one replication - topology. For example, one topology could be devoted to - <literal>dc=example,dc=com</literal>, and another to - <literal>dc=example,dc=org</literal>. OpenDJ servers are capable of - serving more than one suffix. 
They are also capable of participating in - more than one replication topology.</para> - <mediaobject xml:id="figure-replication-topologies-right"> - <alt>Three replication topologies set up correctly</alt> - <imageobject> - <imagedata fileref="images/repl-topologies-right.png" format="PNG" /> - </imageobject> - <textobject> - <para>In this figure, all OpenDJ servers serve the replicated suffix - <literal>dc=example,dc=com</literal>. Only servers A and B serve - <literal>dc=example,dc=org</literal>. Only server C and D serve - <literal>dc=example,dc=net</literal>.</para> - </textobject> - </mediaobject> + <section xml:id="repl-what-it-is"> + <title>What Replication Is</title> - <para>Within a replication topology, the suffixes being replicated are - identified to the replication servers by their DN. As all the replication - servers are fully connected in a topology, a consequence is that it is - impossible to have multiple "sub-topologies" within the overall set of - servers as illustrated in the following diagram.</para> + <para>Replication is the process of copying updates between OpenDJ + directory servers such that all servers converge on identical copies of + directory data. Replication is designed to let convergence happen over + time by default. 
<footnote><para>Assured replication can require, however, + that the convergence happen before the client application is notified that + the operation was successful.</para></footnote> Letting convergence + happen over time means that different replicas can be momentarily out of + sync, but it also means that if you lose an individual server or even an + entire data center, your directory service can keep on running, and then + get back in sync when the servers are restarted or the network is + repaired.</para> - <mediaobject xml:id="figure-replication-topologies-wrong"> - <alt>Two replication topologies, one of which does not work</alt> - <imageobject> - <imagedata fileref="images/repl-topologies-wrong.png" format="PNG" /> - </imageobject> - <textobject> - <para>You cannot have all servers replicating both - <literal>dc=example,dc=com</literal> and also - <literal>dc=example,dc=org</literal>, but with all servers connected for - <literal>dc=example,dc=com</literal> and only some of the servers - connected for <literal>dc=example,dc=org</literal>.</para> - </textobject> - </mediaobject> + <para>Replication is specific to the OpenDJ directory service. Replication + uses a specific protocol that replays update operations quickly, storing + enough historical information about the updates to resolve most conflicts + automatically. For example, if two client applications separately update + a user entry to change the phone number, replication can work out which + was the latest change, and apply that change across servers. The historical + information needed to resolve these issues is periodically purged to avoid + growing larger and larger forever. As a directory administrator, you must + ensure that you do not purge the historical information more often than you + back up your directory data.</para> - <para>Keep server clocks synchronized for your topology. You can use NTP for - example. 
Keeping server clocks synchronized helps prevent issues with SSL - connections and with replication itself. Keeping server clocks synchronized - also makes it easier to compare timestamps from multiple servers.</para> + <para>Keep server clocks synchronized for your topology. You can use NTP, for + example. Keeping server clocks synchronized helps prevent issues with SSL + connections and with replication itself. Keeping server clocks synchronized + also makes it easier to compare timestamps from multiple servers.</para> + </section> + + <section xml:id="repl-per-suffix"> + <title>Replication Per Suffix</title> + + <para>The primary unit of replication is the suffix, specified by a + base DN such as <literal>dc=example,dc=com</literal>.<footnote><para>When + you configure partial and fractional replication, however, you can replicate + only part of a suffix, or only certain attributes on entries. Also, + if you split your suffix across multiple backends, then you need to set up + replication separately for each part of the suffix in a different backend.</para> + </footnote> Replication also depends on the directory schema, defined on + <literal>cn=schema</literal>, and the <literal>cn=admin data</literal> + suffix with administrative identities and certificates for protecting + communications. Thus, that content is replicated as well.</para> + + <para>The set of OpenDJ servers replicating data for a given suffix is + called a replication topology. You can have more than one replication + topology. For example, one topology could be devoted to + <literal>dc=example,dc=com</literal>, and another to + <literal>dc=example,dc=org</literal>. OpenDJ servers are capable of + serving more than one suffix. 
They are also capable of participating in + more than one replication topology.</para> + + <mediaobject xml:id="figure-replication-topologies-right"> + <alt>Three replication topologies set up correctly</alt> + <imageobject> + <imagedata fileref="images/repl-topologies-right.png" format="PNG" /> + </imageobject> + <textobject> + <para>In this figure, all OpenDJ servers serve the replicated suffix + <literal>dc=example,dc=com</literal>. Only servers A and B serve + <literal>dc=example,dc=org</literal>. Only servers C and D serve + <literal>dc=example,dc=net</literal>.</para> + </textobject> + </mediaobject> + + <para>Within a replication topology, the suffixes being replicated are + identified to the replication servers by their DN. Because all the replication + servers in a topology are fully connected, it is + impossible to have multiple "sub-topologies" within the overall set of + servers, as illustrated in the following diagram.</para> + + <mediaobject xml:id="figure-replication-topologies-wrong"> + <alt>Two replication topologies, one of which does not work</alt> + <imageobject> + <imagedata fileref="images/repl-topologies-wrong.png" format="PNG" /> + </imageobject> + <textobject> + <para>You cannot have all servers replicating both + <literal>dc=example,dc=com</literal> and also + <literal>dc=example,dc=org</literal>, but with all servers connected for + <literal>dc=example,dc=com</literal> and only some of the servers + connected for <literal>dc=example,dc=org</literal>.</para> + </textobject> + </mediaobject> + </section> + + <section xml:id="repl-connection-selection"> + <title>Replication Connection Selection</title> + + <para>To understand what happens when individual servers stop + responding due to a network partition or a crash, note that OpenDJ can + offer both directory service and replication service. The two + services are not the same, even if they can run alongside each other in + the same OpenDJ server in the 
same Java Virtual Machine.</para> + + <para>Replication relies on the replication service provided by OpenDJ + replication servers, where OpenDJ directory servers publish changes made + to their data, and subscribe to changes published by other OpenDJ directory + servers. A replication server manages replication data only, handling + replication traffic with directory servers and with other replication + servers, receiving, sending, and storing only changes to directory data + rather than directory data itself. Once a replication server is connected + to a replication topology, it maintains connections to all other + replication servers in that topology.</para> + + <para>A directory server handles directory data. It responds to requests and + stores directory data and historical information. For each replicated + suffix, such as <literal>dc=example,dc=com</literal>, + <literal>cn=schema</literal>, and <literal>cn=admin data</literal>, the + directory server publishes changes to a replication server, and subscribes + to changes from that replication server. (Directory servers do not publish + changes to other directory servers.) A directory server also resolves any + conflicts that arise when reconciling changes from other directory servers, + using the historical information about changes to resolve the conflicts. + (Conflict resolution is the responsibility of the directory server rather + than the replication server.)</para> + + <para>Once a directory server is connected to a replication topology for a + particular suffix, it connects to one replication server at a time for that + suffix. The replication server provides the directory server with a list of + all replication servers for that suffix. 
Given the list of possible + replication servers to which it can connect, the directory server can + determine which replication server to connect to when starting up, or when + the current connection is lost or becomes unresponsive.</para> + + <orderedlist> + <para>For each replicated suffix, a directory server prefers to connect to + a replication server:</para> + + <listitem> + <para>In the same group as the directory server</para> + </listitem> + + <listitem> + <para>Having the same initial data for the suffix as the directory + server</para> + </listitem> + + <listitem> + <para>If initial data were the same, having all the latest changes from + the directory server</para> + </listitem> + + <listitem> + <para>Running in the same Java Virtual Machine as the directory + server</para> + </listitem> + + <listitem> + <para>Having the most available capacity relative to other eligible + replication servers</para> + + <para>Available capacity depends on how many directory servers in the + topology are already connected to a replication server, and what + proportion of all directory servers in the topology ought to be connected + to the replication server.</para> + + <para>To determine what proportion of the total number of directory + servers should be connected to a replication server, OpenDJ uses + replication server weight. When configuring a replication server, you + can assign it a weight (default: 1). The weight property takes an integer + that indicates capacity to provide replication service relative to other + servers. For example, a weight of 2 would indicate a replication server + that can handle twice as many connected servers as a replication server + with weight 1.</para> + + <para>The proportion of directory servers in a topology that should be + connected to a given replication server is equal to (replication server + weight)/(sum of replication server weights). 
For example, if there are + 4 replication servers in a topology, each with the default weight, the + proportion for each replication server is 1/4.</para> + </listitem> + </orderedlist> + + <para>Consider a situation where 7 directory servers are connected to + replication servers A, B, C, and D for <literal>dc=example,dc=com</literal> + data. Suppose 2 directory servers are connected to each of A, B, and C, and 1 + directory server is connected to replication server D. If all four replication servers have the same weight, replication server D + is therefore the server with the most available capacity relative to other + replication servers in the topology. All other criteria being equal, + replication server D is the server to connect to when an 8th directory + server joins the topology.</para> + + <para>The directory server regularly updates the list of replication servers + in case it must reconnect. As available capacity of replication servers for + each replication topology can change dynamically, a directory server can + potentially reconnect to another replication server to balance the + replication load in the topology. For this reason, a directory server can also end + up connected to different replication servers for different suffixes.</para> + </section> </section> <section xml:id="configure-repl"> -- Gitblit v1.10.0