From a46f615af32542cbb07fa35ef9a987e4a3f65ee6 Mon Sep 17 00:00:00 2001
From: Mark Craig <mark.craig@forgerock.com>
Date: Mon, 27 Jun 2011 16:16:00 +0000
Subject: [PATCH] First draft troubleshooting chapter. Of course, a troubleshooting chapter is never really done, but this one aims to hit some of the highlights.
---
opendj3/src/main/docbkx/admin-guide/chap-troubleshooting.xml | 222 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 222 insertions(+), 0 deletions(-)
diff --git a/opendj3/src/main/docbkx/admin-guide/chap-troubleshooting.xml b/opendj3/src/main/docbkx/admin-guide/chap-troubleshooting.xml
index d186596..659ae6e 100644
--- a/opendj3/src/main/docbkx/admin-guide/chap-troubleshooting.xml
+++ b/opendj3/src/main/docbkx/admin-guide/chap-troubleshooting.xml
@@ -34,5 +34,227 @@
<para>This chapter describes how to troubleshoot common server problems,
and how to collect information necessary when seeking support help.</para>
+
+ <section>
+ <title>Identifying the Problem</title>
+
+ <para>To solve your problem methodically, and to save time, define the
+ problem clearly up front. In a replicated environment with multiple directory
+ servers and many client applications, it can be particularly important to
+ pin down not only the problem (the difference between observed behavior and
+ expected behavior), but also the circumstances and steps that lead to the
+ problem.</para>
+
+ <itemizedlist>
+ <para>Answer the following questions.</para>
+
+ <listitem>
+ <para>How do you reproduce the problem?</para>
+ </listitem>
+
+ <listitem>
+ <para>What exactly is the problem? In other words, what is the behavior
+ you expected? What is the behavior you observed?</para>
+ </listitem>
+
+ <listitem>
+ <para>When did the problem start occurring? Under similar circumstances,
+ when does the problem not occur?</para>
+ </listitem>
+
+ <listitem>
+ <para>Is the problem permanent? Intermittent? Is it getting worse?
+ Getting better? Staying the same?</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>Pinpointing the problem can sometimes indicate where you should
+ start looking for solutions.</para>
+ </section>
+
+ <section>
+ <title>Troubleshooting Installation &amp; Upgrade</title>
+
+ <para>Installation and upgrade procedures result in a log file tracing
+ the operation. The log location differs by operating system, but look for
+ lines in the command output of the following form.</para>
+
+ <literallayout>See /var/....log for a detailed log of this operation.</literallayout>
+ </section>
+
+ <section>
+ <title>Troubleshooting LDIF Import</title>
+
+ <para>By default OpenDJ requires that LDIF data you import respect standards.
+ In particular, OpenDJ is set to check that entries to import match the
+ schema defined for the server. You can temporarily bypass this check by using
+ the <option>--skipSchemaValidation</option> option with the
+ <command>import-ldif</command> command.</para>
+
+ <para>OpenDJ also ensures by default that entries have only one inheritance
+ of structural object classes. You can relax this behavior by using the
+ advanced global configuration property,
+ <literal>single-structural-objectclass-behavior</literal>. This can be useful
+ when importing data exported from Sun Directory Server. For example, to
+ warn when entries have more than one structural object class instead of
+ rejecting such entries, set the property as follows.</para>
+
+ <screen width="80">$ dsconfig -p 4444 -h `hostname` -D "cn=Directory Manager" -w password \
+> set-global-configuration-prop \
+> --set single-structural-objectclass-behavior:warn -X -n</screen>
+
+ <para>By default, OpenDJ also checks syntax for a number of attribute types.
+ You can relax this behavior as well by using the <command>dsconfig
+ set-attribute-syntax-prop</command> command. See the list of attribute
+ syntaxes and use the <option>--help</option> option for further
+ information.</para>
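+
+ <para>As a sketch, the following command relaxes the Directory String
+ syntax so that it accepts empty values. The syntax name and the
+ <literal>allow-zero-length-values</literal> property are shown only as an
+ illustration. Check the <option>--help</option> output for the syntaxes and
+ properties available in your version.</para>
+
+ <screen width="80">$ dsconfig -p 4444 -h `hostname` -D "cn=Directory Manager" -w password \
+> set-attribute-syntax-prop --syntax-name "Directory String" \
+> --set allow-zero-length-values:true -X -n</screen>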
+
+ <para>When running <command>import-ldif</command>, you can use the <option>-R
+ <replaceable>rejectFile</replaceable></option> option to capture entries that
+ could not be imported, and the <option>--countRejects</option> option to
+ return the number of rejected entries as the <command>import-ldif</command>
+ exit code.</para>
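+
+ <para>For instance, an offline import that records rejected entries might
+ look like the following sketch. The backend name
+ <literal>userRoot</literal> and the file paths are assumptions. Adapt them
+ to your deployment.</para>
+
+ <screen width="80">$ import-ldif -n userRoot -l /path/to/data.ldif \
+> -R /path/to/rejects.ldif --countRejects
+$ echo $?</screen>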
+
+ <para>Once you work through the issues with your LDIF data, reinstate the
+ default behavior to ensure automated checking.</para>
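+
+ <para>For example, assuming you changed the
+ <literal>single-structural-objectclass-behavior</literal> property as shown
+ earlier in this section, one way to return it to its default value is to
+ reset it. The connection options here repeat those of the earlier example
+ and may differ in your deployment.</para>
+
+ <screen width="80">$ dsconfig -p 4444 -h `hostname` -D "cn=Directory Manager" -w password \
+> set-global-configuration-prop \
+> --reset single-structural-objectclass-behavior -X -n</screen>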
+ </section>
+
+ <section>
+ <title>Troubleshooting TLS/SSL Connections</title>
+
+ <para>In order to trust the server certificate, client applications usually
+ compare the signature on certificates with those of the Certificate
+ Authorities (CAs) whose certificates are distributed with the client
+ software. For example, the Java environment is distributed with a key store
+ holding many CA certificates.</para>
+
+ <screen width="80">$ keytool -list -keystore $JAVA_HOME/lib/security/cacerts -storepass changeit \
+> | wc -l
+ 334</screen>
+
+ <para>The self-signed server certificates that can be configured during
+ OpenDJ setup are not recognized as being signed by any CAs. Client software
+ therefore does not trust the self-signed certificates by
+ default. You must either configure the client applications to accept the
+ self-signed certificates, or else use certificates signed by recognized
+ CAs.</para>
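+
+ <para>For a Java client application, one way to accept a self-signed
+ certificate is to import it into the trust store that the client uses. The
+ following sketch assumes you have already exported the server certificate
+ to a file. The alias, file name, trust store path, and password are
+ hypothetical.</para>
+
+ <screen width="80">$ keytool -import -trustcacerts -alias opendj-server \
+> -file /path/to/server-cert.pem -keystore /path/to/truststore \
+> -storepass changeit</screen>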
+
+ <para>You can further debug the network traffic by collecting debug traces.
+ To see the traffic going over TLS/SSL in debug mode, configure OpenDJ to dump
+ debug traces from <literal>javax.net.debug</literal> into the
+ <filename>logs/server.out</filename> file.</para>
+
+ <screen width="80">$ OPENDJ_JAVA_ARGS="-Djavax.net.debug=all" start-ds</screen>
+ </section>
+
+ <section>
+ <title>Troubleshooting Client Operations</title>
+
+ <para>By default OpenDJ logs information about all client operations in
+ <filename>logs/access</filename>. The following lines are wrapped for
+ readability, showing a search for the entry with
+ <literal>uid=bjensen</literal> as traced in the access log. In the access
+ log itself, each line starts with a time stamp.</para>
+
+ <screen width="80">[27/Jun/2011:17:23:00 +0200] CONNECT conn=19 from=127.0.0.1:56641
+ to=127.0.0.1:1389 protocol=LDAP
+[27/Jun/2011:17:23:00 +0200] SEARCH REQ conn=19 op=0 msgID=1
+ base="dc=example,dc=com" scope=wholeSubtree filter="(uid=bjensen)" attrs="ALL"
+[27/Jun/2011:17:23:00 +0200] SEARCH RES conn=19 op=0 msgID=1
+ result=0 nentries=1 etime=3
+[27/Jun/2011:17:23:00 +0200] UNBIND REQ conn=19 op=1 msgID=2
+[27/Jun/2011:17:23:00 +0200] DISCONNECT conn=19 reason="Client Unbind"</screen>
+
+ <para>As you see, each client connection and set of LDAP operations are
+ traced, starting with a time stamp and information about the operation
+ performed, then including information about the connection, the operation
+ number for the sequence of operations performed by the client, a message
+ identification number, and additional information about the operation.</para>
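+
+ <para>Because each record occupies a single line in the log itself, standard
+ text tools can filter it. As a minimal sketch, assuming you run the command
+ from the server installation directory, the following lists responses
+ having a non-zero result code, which often, though not always, indicates a
+ failed operation.</para>
+
+ <screen width="80">$ grep -E ' RES .*result=[1-9]' logs/access</screen>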
+
+ <para>To help diagnose errors due to access permissions, OpenDJ supports the
+ get effective rights control. The control OID,
+ <literal>1.3.6.1.4.1.42.2.27.9.5.2</literal>, is not allowed by the default
+ global ACIs. You must therefore add access to use the get effective rights
+ control when not using it as Directory Manager.</para>
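+
+ <para>As an illustration, the following search asks what rights
+ <literal>uid=bjensen</literal> has on her own entry. The port number, bind
+ credentials, and the <literal>ou=People</literal> container are assumptions
+ in line with the other examples in this chapter.</para>
+
+ <screen width="80">$ ldapsearch -p 1389 -D "cn=Directory Manager" -w password \
+> -b dc=example,dc=com \
+> -g "dn:uid=bjensen,ou=People,dc=example,dc=com" "(uid=bjensen)" aclRights</screen>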
+ </section>
+
+ <section>
+ <title>Troubleshooting Replication</title>
+
+ <para>Replication can generally recover from conflicts and transient issues.
+ Replication does, however, require that update operations be copied
+ from server to server. It is therefore possible to experience temporary
+ delays while replicas converge, especially when the write operation load is
+ heavy. OpenDJ's tolerance for temporary divergence between replicas is what
+ allows OpenDJ to remain available to serve client applications even when
+ networks linking the replicas go down.</para>
+
+ <para>In other words, the fact that directory services are loosely convergent
+ rather than transactional is a feature, not a bug.</para>
+
+ <para>That said, you may encounter errors. Replication uses its own error log
+ file, <filename>logs/replication</filename>. Error messages in the log file
+ have <literal>category=SYNC</literal>. The messages have the following form.
+ Here the line is folded for readability.</para>
+
+ <screen width="80">
+[27/Jun/2011:14:37:48 +0200] category=SYNC severity=INFORMATION msgID=14680169
+ msg=Replication server accepted a connection from 10.10.0.10/10.10.0.10:52859
+ to local address 0.0.0.0/0.0.0.0:8989 but the SSL handshake failed. This is
+ probably benign, but may indicate a transient network outage or a
+ misconfigured client application connecting to this replication server.
+ The error was: Remote host closed connection during handshake</screen>
+
+ <para>Replicas can become irrevocably out of sync if, for example, you restore
+ a replica from a backup archive older than the last time
+ historical information for replication was purged from the system. If this
+ happens to you, disable the replica, and then reinitialize it with newer
+ data.</para>
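+
+ <para>Reinitialization from an up-to-date replica can be sketched with the
+ <command>dsreplication initialize</command> command. The host names, port
+ numbers, and administrator credentials below are hypothetical.</para>
+
+ <screen width="80">$ dsreplication initialize --baseDN dc=example,dc=com \
+> --adminUID admin --adminPassword password \
+> --hostSource opendj.example.com --portSource 4444 \
+> --hostDestination opendj2.example.com --portDestination 4444 -X -n</screen>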
+ </section>
+
+ <section>
+ <title>Asking For Help</title>
+
+ <para>When you cannot resolve a problem yourself, and want to ask for help,
+ clearly identify the problem and how you reproduce it, and also the version
+ of OpenDJ you use to reproduce the problem. The version includes both a
+ version number and also a build time stamp.</para>
+
+ <screen width="80">$ dsconfig --version
+OpenDJ <?eval ${project.version}?>
+Build <replaceable>yyyymmddhhmmss</replaceable>Z</screen>
+
+ <itemizedlist>
+
+ <para>Be ready to provide additional information, too.</para>
+
+ <listitem>
+ <para>The output from the <command>java -version</command> command.</para>
+ </listitem>
+
+ <listitem>
+ <para><filename>access</filename> and <filename>errors</filename> logs
+ showing what the server was doing when the problem started occurring.</para>
+ </listitem>
+
+ <listitem>
+ <para>A copy of the server configuration file,
+ <filename>config/config.ldif</filename>, in use when the problem started
+ occurring.</para>
+ </listitem>
+
+ <listitem>
+ <para>Other relevant logs or output, such as those from client applications
+ experiencing the problem.</para>
+ </listitem>
+
+ <listitem>
+ <para>A description of the environment where OpenDJ is running, including
+ system characteristics, host names, IP addresses, Java versions, storage
+ characteristics, and network characteristics. This helps to understand
+ the logs, and other information.</para>
+ </listitem>
+ </itemizedlist>
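+
+ <para>One way to gather the files mentioned in the list is sketched here,
+ assuming you run the commands from the server installation directory. The
+ archive and output file names are hypothetical.</para>
+
+ <screen width="80">$ java -version 2> java-version.txt
+$ tar czf opendj-support.tar.gz java-version.txt config/config.ldif \
+> logs/access logs/errors</screen>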
+ </section>
</chapter>
--
Gitblit v1.10.0