<?xml version="1.0" encoding="UTF-8"?>
|
<!--
|
! CCPL HEADER START
|
!
|
! This work is licensed under the Creative Commons
|
! Attribution-NonCommercial-NoDerivs 3.0 Unported License.
|
! To view a copy of this license, visit
|
! http://creativecommons.org/licenses/by-nc-nd/3.0/
|
! or send a letter to Creative Commons, 444 Castro Street,
|
! Suite 900, Mountain View, California, 94041, USA.
|
!
|
! You can also obtain a copy of the license at
|
! trunk/opendj3/legal-notices/CC-BY-NC-ND.txt.
|
! See the License for the specific language governing permissions
|
! and limitations under the License.
|
!
|
! If applicable, add the following below this CCPL HEADER, with the fields
|
! enclosed by brackets "[]" replaced with your own identifying information:
|
! Portions Copyright [yyyy] [name of copyright owner]
|
!
|
! CCPL HEADER END
|
!
|
! Copyright 2011-2012 ForgeRock AS
|
!
|
-->
|
<chapter xml:id='chap-troubleshooting'
|
xmlns='http://docbook.org/ns/docbook'
|
version='5.0' xml:lang='en'
|
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
|
xsi:schemaLocation='http://docbook.org/ns/docbook http://docbook.org/xml/5.0/xsd/docbook.xsd'
|
xmlns:xlink='http://www.w3.org/1999/xlink'
|
xmlns:xinclude='http://www.w3.org/2001/XInclude'>
|
<title>Troubleshooting Server Problems</title>
|
<indexterm><primary>Troubleshooting</primary></indexterm>
|
|
<para>This chapter describes how to troubleshoot common server problems,
|
and how to collect information necessary when seeking support help.</para>
|
|
<section xml:id="troubleshoot-identify-problem">
|
<title>Identifying the Problem</title>
|
|
<para>To solve your problem methodically and save time, define the
|
problem clearly up front. In a replicated environment with multiple directory
|
servers and many client applications, it can be particularly important to
|
pin down not only the problem (difference in observed behavior compared to
|
expected behavior), but also the circumstances and steps that lead to the
|
problem occurring.</para>
|
|
<itemizedlist>
|
<para>Answer the following questions.</para>
|
|
<listitem>
|
<para>How do you reproduce the problem?</para>
|
</listitem>
|
|
<listitem>
|
<para>What exactly is the problem? In other words, what is the behavior
|
you expected? What is the behavior you observed?</para>
|
</listitem>
|
|
<listitem>
|
<para>When did the problem start occurring? Under similar circumstances,
|
when does the problem not occur?</para>
|
</listitem>
|
|
<listitem>
|
<para>Is the problem permanent? Intermittent? Is it getting worse?
|
Getting better? Staying the same?</para>
|
</listitem>
|
</itemizedlist>
|
|
<para>Pinpointing the problem can sometimes indicate where you should
|
start looking for solutions.</para>
|
</section>
|
|
<section xml:id="troubleshoot-installation">
|
<title>Troubleshooting Installation &amp; Upgrade</title>
|
|
<para>Installation and upgrade procedures result in a log file tracing
|
the operation. The log location differs by operating system, but look for
|
lines in the command output of the following form.</para>
|
|
<literallayout class="monospaced">See /var/....log for a detailed log of this operation.</literallayout>
|
</section>
|
|
<section xml:id="troubleshoot-import">
|
<title>Troubleshooting LDIF Import</title>
|
|
<para>By default, OpenDJ requires that LDIF data you import respect standards.
|
In particular, OpenDJ is set to check that entries to import match the
|
schema defined for the server. You can temporarily bypass this check by using
|
the <option>--skipSchemaValidation</option> option with the
|
<command>import-ldif</command> command.</para>
|
|
<para>OpenDJ also ensures by default that the structural object classes of an
entry belong to a single inheritance chain. You can relax this behavior by using the
|
advanced global configuration property,
|
<literal>single-structural-objectclass-behavior</literal>. This can be useful
|
when importing data exported from Sun Directory Server. For example, to
|
warn when entries have more than one structural object class instead of
|
rejecting such entries, set the property as follows.</para>
|
|
<screen>$ dsconfig
|
set-global-configuration-prop
|
--port 4444
|
--hostname `hostname`
|
--bindDN "cn=Directory Manager"
|
--bindPassword password
|
--set single-structural-objectclass-behavior:warn
|
--trustAll
|
--no-prompt</screen>
|
|
<para>By default, OpenDJ also checks syntax for a number of attribute types.
|
You can relax this behavior as well by using the <command>dsconfig
|
set-attribute-syntax-prop</command> command. See the list of attribute
|
syntaxes and use the <option>--help</option> option for further
|
information.</para>
|
|
<para>When running <command>import-ldif</command>, you can use the <option>-R
|
<replaceable>rejectFile</replaceable></option> option to capture entries that
|
could not be imported, and the <option>--countRejects</option> option to
|
return the number of rejected entries as the <command>import-ldif</command>
|
exit code.</para>
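
<para>As a sketch of how these options fit together, an offline import along
the following lines captures rejected entries in a file and returns their
count as the exit code. The backend ID <literal>userRoot</literal> and the
file paths are assumptions for illustration only. Adjust them for your
deployment, and stop the server before running an offline import.</para>

<screen>$ import-ldif \
 --backendID userRoot \
 --ldifFile /path/to/data.ldif \
 --rejectFile /path/to/rejects.ldif \
 --countRejects
$ echo $?</screen>

<para>The <command>echo $?</command> command prints the exit code of the
previous command, which in this case reflects the number of rejected
entries.</para>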
|
|
<para>Once you work through the issues with your LDIF data, reinstate the
|
default behavior to ensure automated checking.</para>
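
<para>For example, if you relaxed the structural object class check as shown
earlier in this section, a command along these lines should return the
property to its default value. This is a sketch that reuses the connection
settings from the earlier example, and relies on the
<option>--reset</option> option of <command>dsconfig</command> rather than
naming the default value explicitly.</para>

<screen>$ dsconfig \
 set-global-configuration-prop \
 --port 4444 \
 --hostname `hostname` \
 --bindDN "cn=Directory Manager" \
 --bindPassword password \
 --reset single-structural-objectclass-behavior \
 --trustAll \
 --no-prompt</screen>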
|
</section>
|
|
<section xml:id="troubleshoot-secure-connections">
|
<title>Troubleshooting TLS/SSL Connections</title>
|
|
<para>In order to trust the server certificate, client applications usually
|
compare the signature on certificates with those of the Certificate
|
Authorities (CAs) whose certificates are distributed with the client
|
software. For example, the Java environment is distributed with a key store
|
holding many CA certificates.</para>
|
|
<screen>$ keytool -list -keystore $JAVA_HOME/lib/security/cacerts -storepass changeit
|
| wc -l
|
334</screen>
|
|
<para>The self-signed server certificates that can be configured during
|
OpenDJ setup are not recognized as being signed by any CAs. Client software
therefore does not trust the self-signed certificates by
|
default. You must either configure the client applications to accept the
|
self-signed certificates, or else use certificates signed by recognized
|
CAs.</para>
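
<para>One way to have a Java client accept the self-signed certificate is to
export it from the server key store, and then import it into a trust store
that the client uses. The following is a sketch only. It assumes that the
server certificate is stored in <filename>config/keystore</filename> under the
alias <literal>server-cert</literal>, and that the key store password is in
<filename>config/keystore.pin</filename>. The output file name, the client
trust store path, and the trust store password are hypothetical.</para>

<screen>$ keytool -exportcert -rfc -alias server-cert \
 -keystore config/keystore \
 -storepass `cat config/keystore.pin` \
 -file server-cert.pem
$ keytool -importcert -alias server-cert \
 -file server-cert.pem \
 -keystore /path/to/client-truststore \
 -storepass changeit</screen>

<para>The second command displays the certificate and asks for confirmation
before trusting it, which gives you a chance to check the owner and
fingerprints.</para>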
|
|
<para>You can further debug the network traffic by collecting debug traces.
|
To see the traffic going over TLS/SSL in debug mode, configure OpenDJ to dump
|
debug traces from <literal>javax.net.debug</literal> into the
|
<filename>logs/server.out</filename> file.</para>
|
|
<screen>$ OPENDJ_JAVA_ARGS="-Djavax.net.debug=all" start-ds</screen>
|
|
<section xml:id="troubleshoot-certificate-authentication">
|
<title>Troubleshooting Certificates &amp; SSL Authentication</title>
|
|
<para>Replication uses SSL to protect directory data on the network.
|
In some configurations, replicas can fail to connect to each other due
|
to SSL handshake errors. This leads to error log messages such as the
|
following.</para>
|
|
<screen>[21/Nov/2011:13:03:20 -0600] category=SYNC severity=NOTICE
|
msgID=15138921 msg=SSL connection attempt from myserver (123.456.789.012)
|
failed: Remote host closed connection during handshake</screen>
|
|
<itemizedlist>
|
<para>Notice these problem characteristics in the message above.</para>
|
<listitem>
|
<para>The host name, <literal>myserver</literal>, is not fully
|
qualified.</para>
|
<para>You should not see host names in the error logs that are not fully
qualified. Such host names are a sign that an OpenDJ server has not
been configured properly.</para>
|
<para>Always install and configure OpenDJ using fully qualified host names.
|
The OpenDJ administration connector, which is used by the
|
<command>dsconfig</command> command, and also replication depend upon SSL
|
and, more specifically, self-signed certificates for establishing SSL
|
connections. If the host name used for connection establishment does not
|
correspond to the host name stored in the SSL certificate, then the SSL
|
handshake can fail. For the purposes of establishing the SSL connection,
|
a host name like <literal>myserver</literal> does not match
|
<literal>myserver.example.com</literal>, and vice versa.</para>
|
</listitem>
|
<listitem>
|
<para>The connection succeeded, but the SSL handshake failed, suggesting
|
a problem with authentication or with the cipher or protocol negotiation.
|
As most deployments use the same Java Virtual Machine, and the same JVM
|
configuration for each replica, the problem is likely not related to SSL
|
cipher or protocol negotiation, but instead lies with authentication.</para>
|
</listitem>
|
</itemizedlist>
|
|
<orderedlist>
|
<para>Follow these steps on each OpenDJ server to check whether the problem
|
lies with the host name configuration.</para>
|
<listitem>
|
<para>Make sure each OpenDJ server uses only fully qualified host names in
|
the replication configuration. You can obtain a quick summary by running
|
the following command against each server's configuration.</para>
|
<screen>$ grep ds-cfg-replication-server: config/config.ldif | sort | uniq</screen>
|
</listitem>
|
<listitem>
|
<para>Make sure that the host names in OpenDJ certificates also contain
|
fully qualified host names, and correspond to the host names found in the
|
previous step.</para>
|
<screen># Examine the certificates used for the administration connector.
|
$ keytool -list -v -keystore config/admin-truststore
|
-storepass `cat config/admin-keystore.pin` |grep "^Owner:"
|
|
# Examine the certificates used for replication.
|
$ keytool -list -v -keystore config/ads-truststore
|
-storepass `cat config/ads-truststore.pin`| grep "^Owner:"
|
</screen>
|
</listitem>
|
</orderedlist>
|
|
<para>Sample output for a server on host <literal>opendj.example.com</literal>
|
follows.</para>
|
<screen>$ grep ds-cfg-replication-server: config/config.ldif |sort | uniq
|
ds-cfg-replication-server: opendj.example.com:8989
|
ds-cfg-replication-server: opendj.example.com:9989
|
|
$ keytool -list -v -keystore config/admin-truststore
|
-storepass `cat config/admin-keystore.pin` | grep "^Owner:"
|
Owner: CN=opendj.example.com, O=Administration Connector Self-Signed Certificate
|
|
$ keytool -list -v -keystore config/ads-truststore
|
-storepass `cat config/ads-truststore.pin`| grep "^Owner:"
|
Owner: CN=opendj.example.com, O=OpenDJ Certificate
|
Owner: CN=opendj.example.com, O=OpenDJ Certificate
|
Owner: CN=opendj.example.com, O=OpenDJ Certificate</screen>
|
|
<itemizedlist>
|
<para>Unfortunately there is no easy solution to badly configured host
|
names. It is often easier and quicker simply to reinstall your OpenDJ
|
servers, remembering to use fully qualified host names everywhere.</para>
|
<listitem>
|
<para>When using the <command>setup</command> tool to install and
|
configure a server, ensure that the <option>-h</option> option is
|
included, and that it specifies the fully qualified host name. Make sure
|
you include this option even if you are not enabling SSL/StartTLS LDAP
|
connections (see <link
|
xlink:href="https://bugster.forgerock.org/jira/browse/OPENDJ-363"
|
>OPENDJ-363</link>).</para>
|
<para>If you are using the GUI installer, then make sure you specify the
|
fully qualified host name on the first page of the wizard.</para>
|
</listitem>
|
<listitem>
|
<para>When using the <command>dsreplication</command> tool to enable
|
replication, make sure that any <option>--host</option> options include the
|
fully qualified host name.</para>
|
</listitem>
|
</itemizedlist>
|
|
<orderedlist>
|
<para>If you cannot reinstall the server, follow these steps.</para>
|
<listitem>
|
<para>Disable replication in each replica.</para>
|
<screen>$ dsreplication
|
disable
|
--disableAll
|
--port <replaceable>adminPort</replaceable>
|
--hostname <replaceable>hostName</replaceable>
|
--bindDN "cn=Directory Manager"
|
--adminPassword <replaceable>password</replaceable>
|
--trustAll
|
--no-prompt</screen>
|
</listitem>
|
<listitem>
|
<para>Stop and restart each server in order to clear the in-memory ADS
|
trust store backend.</para>
|
</listitem>
|
<listitem>
|
<para>Enable replication, making certain that fully qualified host names
are used throughout.</para>
|
<screen>$ dsreplication
|
enable
|
--adminUID admin
|
--adminPassword <replaceable>password</replaceable>
|
--baseDN dc=example,dc=com
|
--host1 <replaceable>hostName1</replaceable>
|
--port1 <replaceable>adminPort1</replaceable>
|
--bindDN1 "cn=Directory Manager"
|
--bindPassword1 <replaceable>password</replaceable>
|
--replicationPort1 <replaceable>replPort1</replaceable>
|
--host2 <replaceable>hostName2</replaceable>
|
--port2 <replaceable>adminPort2</replaceable>
|
--bindDN2 "cn=Directory Manager"
|
--bindPassword2 <replaceable>password</replaceable>
|
--replicationPort2 <replaceable>replPort2</replaceable>
|
--trustAll
|
--no-prompt</screen>
|
</listitem>
|
<listitem>
|
<para>Repeat the previous step for each remaining replica. In other words,
|
host1 with host2, host1 with host3, host1 with host4, ..., host1 with
|
hostN.</para>
|
</listitem>
|
<listitem>
|
<para>Initialize all remaining replicas with the data from host1.</para>
|
<screen>$ dsreplication
|
initialize-all
|
--adminUID admin
|
--adminPassword password
|
--baseDN dc=example,dc=com
|
--hostname <replaceable>hostName1</replaceable>
|
--port 4444
|
--trustAll
|
--no-prompt</screen>
|
</listitem>
|
<listitem>
|
<para>Check that the host names are correct in the configuration and in
|
the key stores by following the steps you used to check for host name
|
problems. The only broken host name remaining should be in the key and
|
trust stores for the administration connector.</para>
|
<screen>$ keytool -list -v -keystore config/admin-truststore
|
-storepass `cat config/admin-keystore.pin` |grep "^Owner:"</screen>
|
</listitem>
|
<listitem>
|
<para>Stop each server, and then fix the remaining admin connector
|
certificate as described in the procedure about <link
|
xlink:href="admin-guide#change-server-certificates"
|
xlink:role="http://docbook.org/xlink/role/olink">changing server
|
certificates</link>.</para>
|
</listitem>
|
</orderedlist>
|
</section>
|
</section>
|
|
<section xml:id="troubleshoot-connections">
|
<title>Troubleshooting Client Operations</title>
|
|
<para>By default, OpenDJ logs information about all client operations in
|
<filename>logs/access</filename>. The following lines are wrapped for
|
readability, showing a search for the entry with
|
<literal>uid=bjensen</literal> as traced in the access log. In the access
|
log itself, each line starts with a time stamp.</para>
|
|
<screen>[27/Jun/2011:17:23:00 +0200] CONNECT conn=19 from=127.0.0.1:56641
|
to=127.0.0.1:1389 protocol=LDAP
|
[27/Jun/2011:17:23:00 +0200] SEARCH REQ conn=19 op=0 msgID=1
|
base="dc=example,dc=com" scope=wholeSubtree filter="(uid=bjensen)" attrs="ALL"
|
[27/Jun/2011:17:23:00 +0200] SEARCH RES conn=19 op=0 msgID=1
|
result=0 nentries=1 etime=3
|
[27/Jun/2011:17:23:00 +0200] UNBIND REQ conn=19 op=1 msgID=2
|
[27/Jun/2011:17:23:00 +0200] DISCONNECT conn=19 reason="Client Unbind"</screen>
|
|
<para>As you see, each client connection and set of LDAP operations are
|
traced, starting with a time stamp and information about the operation
|
performed, then including information about the connection, the operation
|
number for the sequence of operations performed by the client, a message
|
identification number, and additional information about the operation.</para>
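
<para>Because every record for a connection carries the same
<literal>conn</literal> value, you can follow a single client through the log
with ordinary text tools. For instance, a command such as the following lists
the records for connection 19 from the example above. The trailing space in
the pattern avoids also matching connection numbers that merely start with
19, such as 190.</para>

<screen>$ grep "conn=19 " logs/access</screen>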
|
|
<para>To help diagnose errors due to access permissions, OpenDJ supports the
|
get effective rights control. The control OID,
|
<literal>1.3.6.1.4.1.42.2.27.9.5.2</literal>, is not allowed by the default
|
global ACIs. You must therefore add access to use the get effective rights
|
control when not using it as Directory Manager.</para>
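
<para>As an illustration, a search as Directory Manager that requests the
effective rights of another user might look like the following sketch. The
port number is taken from the access log example above. The entry
<literal>uid=bjensen,ou=People,dc=example,dc=com</literal> is an assumption
for illustration, as is the choice to request only the
<literal>aclRights</literal> attribute.</para>

<screen>$ ldapsearch \
 --port 1389 \
 --bindDN "cn=Directory Manager" \
 --bindPassword password \
 --baseDN dc=example,dc=com \
 --getEffectiveRightsAuthzid "dn:uid=bjensen,ou=People,dc=example,dc=com" \
 "(uid=bjensen)" \
 aclRights</screen>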
|
|
<section xml:id="troubleshooting-simple-paged-results">
|
<title>Clients Need Simple Paged Results Control</title>
|
|
<para>With Solaris and some versions of Linux, you might see a message in
|
the OpenDJ access logs such as the following.</para>
|
|
<literallayout class="monospaced">
|
The request control with Object Identifier (OID) "1.2.840.113556.1.4.319"
|
cannot be used due to insufficient access rights</literallayout>
|
|
<para>This message means clients are trying to use the <link xlink:show="new"
|
xlink:href="http://tools.ietf.org/html/rfc2696">simple paged results
|
control</link> without authenticating. By default, OpenDJ includes a global
|
ACI to allow only authenticated users to use the control.</para>
|
|
<screen>$ dsconfig
|
--port 4444
|
--hostname opendj.example.com
|
--bindDN "cn=Directory Manager"
|
--bindPassword "password"
|
get-access-control-handler-prop
|
|
Property : Value(s)
|
-----------:-------------------------------------------------------------------
|
enabled : true
|
global-aci : (extop="1.3.6.1.4.1.26027.1.6.1 || 1.3.6.1.4.1.26027.1.6.3 ||
|
...
|
: (targetcontrol="1.3.6.1.1.12 || 1.3.6.1.1.13.1 || 1.3.6.1.1.13.2
|
: || <emphasis role="strong">1.2.840.113556.1.4.319</emphasis> || 1.2.826.0.1.3344810.2.3 ||
|
: 2.16.840.1.113730.3.4.18 || 2.16.840.1.113730.3.4.9 ||
|
: 1.2.840.113556.1.4.473 || 1.3.6.1.4.1.42.2.27.9.5.9") (version
|
: 3.0; acl "Authenticated users control access"; allow(read)
|
: userdn="ldap:///all";), (targetcontrol="2.16.840.1.113730.3.4.2 ||
|
: 2.16.840.1.113730.3.4.17 || 2.16.840.1.113730.3.4.19 ||
|
: 1.3.6.1.4.1.4203.1.10.2 || 1.3.6.1.4.1.42.2.27.8.5.1 ||
|
: 2.16.840.1.113730.3.4.16") (version 3.0; acl "Anonymous control
|
: access"; allow(read) userdn="ldap:///anyone";)</screen>
|
|
<para>To grant anonymous (unauthenticated) user access to the control,
|
add the OID for the simple paged results control to the list of those in
|
the <literal>Anonymous control access</literal> global ACI.</para>
|
|
<screen>$ dsconfig
|
--port 4444
|
--hostname opendj.example.com
|
--bindDN "cn=Directory Manager"
|
--bindPassword "password"
|
set-access-control-handler-prop
|
--remove global-aci:"(targetcontrol=\"2.16.840.1.113730.3.4.2 ||
|
2.16.840.1.113730.3.4.17 || 2.16.840.1.113730.3.4.19 ||
|
1.3.6.1.4.1.4203.1.10.2 || 1.3.6.1.4.1.42.2.27.8.5.1 ||
|
2.16.840.1.113730.3.4.16\") (version 3.0; acl \"Anonymous control access\";
|
allow(read) userdn=\"ldap:///anyone\";)"
|
--add global-aci:"(targetcontrol=\"2.16.840.1.113730.3.4.2 ||
|
2.16.840.1.113730.3.4.17 || 2.16.840.1.113730.3.4.19 ||
|
1.3.6.1.4.1.4203.1.10.2 || 1.3.6.1.4.1.42.2.27.8.5.1 ||
|
2.16.840.1.113730.3.4.16 || <emphasis role="strong">1.2.840.113556.1.4.319</emphasis>\")
|
(version 3.0; acl \"Anonymous control access\"; allow(read)
|
userdn=\"ldap:///anyone\";)"
|
--no-prompt</screen>
|
|
<para>Alternatively, stop OpenDJ, edit the corresponding ACI carefully in
|
<filename>/path/to/OpenDJ/config/config.ldif</filename>, and restart OpenDJ.
|
<footnote><para>Unlike the <command>dsconfig</command> command, the
|
<filename>config.ldif</filename> file is not a public interface. In this
|
particular case, however, the <command>dsconfig</command> command is
cumbersome enough that you might find it easier to edit the LDIF instead
(unless you are doing this in production).</para></footnote></para>
|
</section>
|
</section>
|
|
<section xml:id="troubleshoot-repl">
|
<title>Troubleshooting Replication</title>
|
<indexterm>
|
<primary>Replication</primary>
|
<secondary>Troubleshooting</secondary>
|
</indexterm>
|
|
<para>Replication can generally recover from conflicts and transient issues.
|
Replication does, however, require that update operations be copied
|
from server to server. It is therefore possible to experience temporary
|
delays while replicas converge, especially when the write operation load is
|
heavy. OpenDJ's tolerance for temporary divergence between replicas is what
|
allows OpenDJ to remain available to serve client applications even when
|
networks linking the replicas go down.</para>
|
|
<para>In other words, the fact that directory services are loosely convergent
|
rather than transactional is a feature, not a bug.</para>
|
|
<para>That said, you may encounter errors. Replication uses its own error log
|
file, <filename>logs/replication</filename>. Error messages in the log file
|
have <literal>category=SYNC</literal>. The messages have the following form.
|
Here the line is folded for readability.</para>
|
|
<screen>[27/Jun/2011:14:37:48 +0200] category=SYNC severity=INFORMATION msgID=14680169
|
msg=Replication server accepted a connection from 10.10.0.10/10.10.0.10:52859
|
to local address 0.0.0.0/0.0.0.0:8989 but the SSL handshake failed. This is
|
probably benign, but may indicate a transient network outage or a
|
misconfigured client application connecting to this replication server.
|
The error was: Remote host closed connection during handshake</screen>
|
|
<para>OpenDJ maintains historical information about changes in order to
|
bring replicas up to date, and to resolve replication conflicts. To prevent
|
historical information from growing without limit, OpenDJ purges historical
|
information after a configurable delay
|
(<literal>replication-purge-delay</literal>, default: 3 days). A replica
|
can become irrevocably out of sync if you restore it from a backup archive
|
older than the purge delay, or if you stop it for longer than the purge
|
delay. If this happens to you, disable the replica, and then reinitialize it
|
from a recent backup or from a server that is up to date.</para>
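
<para>To check the purge delay currently in effect on a replication server,
you can read the property with <command>dsconfig</command>. The following
sketch assumes the default synchronization provider name,
<literal>Multimaster Synchronization</literal>, and reuses the host name and
administration port from earlier examples in this chapter.</para>

<screen>$ dsconfig \
 get-replication-server-prop \
 --port 4444 \
 --hostname opendj.example.com \
 --bindDN "cn=Directory Manager" \
 --bindPassword password \
 --provider-name "Multimaster Synchronization" \
 --property replication-purge-delay \
 --trustAll \
 --no-prompt</screen>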
|
</section>
|
|
<section xml:id="troubleshoot-get-help">
|
<title>Asking For Help</title>
|
|
<para>When you cannot resolve a problem yourself, and want to ask for help,
|
clearly identify the problem and how you reproduce it, and also the version
|
of OpenDJ you use to reproduce the problem. The version includes both a
|
version number and also a build time stamp.</para>
|
|
<screen>$ dsconfig --version
|
OpenDJ <?eval ${docTargetVersion}?>
|
Build <replaceable>yyyymmddhhmmss</replaceable>Z</screen>
|
|
<itemizedlist>
|
|
<para>Be ready to provide additional information, too.</para>
|
|
<listitem>
|
<para>The output from the <command>java -version</command> command.</para>
|
</listitem>
|
|
<listitem>
|
<para><filename>access</filename> and <filename>errors</filename> logs
|
showing what the server was doing when the problem started occurring.</para>
|
</listitem>
|
|
<listitem>
|
<para>A copy of the server configuration file,
|
<filename>config/config.ldif</filename>, in use when the problem started
|
occurring.</para>
|
</listitem>
|
|
<listitem>
|
<para>Other relevant logs or output, such as those from client applications
|
experiencing the problem.</para>
|
</listitem>
|
|
<listitem>
|
<para>A description of the environment where OpenDJ is running, including
|
system characteristics, host names, IP addresses, Java versions, storage
|
characteristics, and network characteristics. This helps to understand
|
the logs, and other information.</para>
|
</listitem>
|
</itemizedlist>
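
<para>Some of the items listed above can be gathered into a single archive
with standard tools. The following is a minimal sketch, not a supported
procedure. The installation path and the output file names are hypothetical,
and the command assumes the default log file locations. Note that
<command>java -version</command> writes to standard error, hence the
<literal>2&gt;</literal> redirection.</para>

<screen>$ cd /path/to/OpenDJ
$ java -version 2> java-version.txt
$ tar czf opendj-support.tar.gz \
 config/config.ldif logs/access logs/errors java-version.txt</screen>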
|
</section>
|
</chapter>
|