Troubleshooting Server Problems

This chapter describes how to troubleshoot common server problems, and how to collect the information you need when seeking support.

Identifying the Problem

To solve a problem methodically and save time, define the problem clearly up front. In a replicated environment with multiple directory servers and many client applications, it can be particularly important to pin down not only the problem (the difference between observed and expected behavior), but also the circumstances and steps that lead to the problem occurring.

Answer the following questions.

How do you reproduce the problem?
What exactly is the problem? In other words, what behavior did you expect, and what behavior did you observe?
When did the problem start occurring? Under similar circumstances, when does the problem not occur?
Is the problem permanent or intermittent? Is it getting worse, getting better, or staying the same?

Pinpointing the problem can sometimes indicate where you should start looking for solutions.

Troubleshooting Installation & Upgrade

Installation and upgrade procedures each write a log file tracing the operation. The log location differs by operating system, but look for a line of the following form in the command output.

See /var/....log for a detailed log of this operation.

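As a rough sketch of how to pull that log path out of saved command output, the following shell function filters its standard input for a line of the form shown above. The function name find_operation_log and the file name setup-output.txt are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: print the path named in a line of the form
# "See <path> for a detailed log of this operation."
find_operation_log() {
  sed -n 's/^.*See \(.*\) for a detailed log of this operation\..*$/\1/p'
}

# Example usage, assuming you saved the command output to setup-output.txt:
#   find_operation_log < setup-output.txt
```

If the function prints nothing, the output did not contain such a line, and you need to look for the log by hand.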
Troubleshooting LDIF Import

By default OpenDJ requires that LDIF data you import respect standards. In particular, OpenDJ is set to check that entries to import match the schema defined for the server. You can temporarily bypass this check by using the --skipSchemaValidation option with the import-ldif command.

OpenDJ also ensures by default that each entry has exactly one structural object class inheritance chain. You can relax this behavior by using the advanced global configuration property, single-structural-objectclass-behavior. This can be useful when importing data exported from Sun Directory Server. For example, to warn when entries have more than one structural object class instead of rejecting such entries, set the property as follows.

$ dsconfig -p 4444 -h `hostname` -D "cn=Directory Manager" -w password \
 set-global-configuration-prop \
 --set single-structural-objectclass-behavior:warn -X -n

By default, OpenDJ also checks syntax for a number of attribute types. You can relax this behavior as well by using the dsconfig set-attribute-syntax-prop command. See the list of attribute syntaxes and use the --help option for further information.

When running import-ldif, you can use the --rejectFile option to capture entries that could not be imported, and the --countRejects option to return the number of rejected entries as the import-ldif exit code.

Once you work through the issues with your LDIF data, reinstate the default behavior to ensure automated checking.

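The reject file and reject count can be combined in a small wrapper. The following is a minimal sketch, not a definitive procedure: the function name import_with_rejects, the IMPORT_LDIF_CMD variable, the backend ID userRoot, and the file names are all assumptions made for illustration, so adjust them to your deployment. Keep in mind that a non-zero exit status can also mean the import failed for another reason, and that shell exit statuses cannot exceed 255.

```shell
# Assumption: import-ldif is on the PATH; override IMPORT_LDIF_CMD otherwise.
IMPORT_LDIF_CMD=${IMPORT_LDIF_CMD:-import-ldif}

# Hypothetical wrapper: import an LDIF file, capture rejected entries in a
# reject file, and report the exit status that --countRejects produces.
import_with_rejects() {
  ldif_file=$1
  reject_file=$2
  # Assumption: the target backend is named userRoot.
  "$IMPORT_LDIF_CMD" --backendID userRoot --ldifFile "$ldif_file" \
    --rejectFile "$reject_file" --countRejects
  status=$?
  if [ "$status" -eq 0 ]; then
    echo "no entries rejected"
  else
    echo "exit status $status: check $reject_file"
  fi
  return "$status"
}

# Example usage with placeholder file names:
#   import_with_rejects data.ldif rejects.ldif
```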
Troubleshooting TLS/SSL Connections

To decide whether to trust the server certificate, client applications usually check whether it was signed by one of the Certificate Authorities (CAs) whose certificates are distributed with the client software. For example, the Java environment is distributed with a key store holding many CA certificates.

$ keytool -list -keystore $JAVA_HOME/lib/security/cacerts -storepass changeit | wc -l
334

The self-signed server certificates that can be configured during OpenDJ setup are not recognized as being signed by any CA. Client software therefore does not trust the self-signed certificates by default. You must either configure the client applications to accept the self-signed certificates, or else use certificates signed by recognized CAs.

You can further debug the network traffic by collecting debug traces. To see the traffic going over TLS/SSL in debug mode, configure OpenDJ to dump debug traces from javax.net.debug into the logs/server.out file.

$ OPENDJ_JAVA_ARGS="-Djavax.net.debug=all" start-ds

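Before changing client configuration, it can help to confirm that the certificate the server presents really is self-signed. The sketch below assumes the OpenSSL command-line tool is installed on the client machine; the host name opendj.example.com, the port 1636, and the function name is_self_signed are placeholders. Comparing subject and issuer is only a heuristic: matching names suggest a self-signed certificate but do not prove it.

```shell
# Hypothetical helper: read "subject=..." and "issuer=..." lines on stdin
# (as printed by `openssl x509 -noout -subject -issuer`) and succeed when
# both names are present and identical.
is_self_signed() {
  cert_info=$(cat)
  subject=$(printf '%s\n' "$cert_info" | sed -n 's/^subject= *//p')
  issuer=$(printf '%s\n' "$cert_info" | sed -n 's/^issuer= *//p')
  [ -n "$subject" ] && [ "$subject" = "$issuer" ]
}

# Example usage; host and port are placeholders for your LDAPS listener:
#   openssl s_client -connect opendj.example.com:1636 </dev/null 2>/dev/null |
#     openssl x509 -noout -subject -issuer | is_self_signed &&
#     echo "certificate appears to be self-signed"
```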
Troubleshooting Client Operations

By default OpenDJ logs information about all client operations in logs/access. The following lines are wrapped for readability, showing a search for the entry with uid=bjensen as traced in the access log. In the access log itself, each line starts with a time stamp.

[27/Jun/2011:17:23:00 +0200] CONNECT conn=19 from=127.0.0.1:56641
 to=127.0.0.1:1389 protocol=LDAP
[27/Jun/2011:17:23:00 +0200] SEARCH REQ conn=19 op=0 msgID=1
 base="dc=example,dc=com" scope=wholeSubtree filter="(uid=bjensen)" attrs="ALL"
[27/Jun/2011:17:23:00 +0200] SEARCH RES conn=19 op=0 msgID=1
 result=0 nentries=1 etime=3
[27/Jun/2011:17:23:00 +0200] UNBIND REQ conn=19 op=1 msgID=2
[27/Jun/2011:17:23:00 +0200] DISCONNECT conn=19 reason="Client Unbind"

As you can see, OpenDJ traces each client connection and each LDAP operation. Every line starts with a time stamp and the type of operation, followed by the connection number, the operation number within the sequence of operations performed by the client, a message identification number, and additional information about the operation.

To help diagnose errors due to access permissions, OpenDJ supports the get effective rights control. The control OID, 1.3.6.1.4.1.42.2.27.9.5.2, is not allowed by the default global ACIs. You must therefore add access to use the get effective rights control when not using it as Directory Manager.

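Because every access log line carries the connection number and, for results, an etime value, ordinary text tools are often enough to follow one client or to spot slow operations. The two functions below are a minimal sketch based only on the log format shown above; the names trace_connection and slow_operations are hypothetical, and the etime threshold is expressed in whatever unit your access log uses.

```shell
# Hypothetical helper: print every access log line for one connection ID.
# The trailing "( |$)" keeps conn=19 from also matching conn=190.
trace_connection() {
  grep -E "conn=$1( |\$)"
}

# Hypothetical helper: print lines whose etime is at least the given value.
slow_operations() {
  awk -v min="$1" '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^etime=[0-9]+$/ && substr($i, 7) + 0 >= min) { print; break }
    }
  }'
}

# Example usage against the default access log:
#   trace_connection 19 < logs/access
#   slow_operations 100 < logs/access
```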
Troubleshooting Replication

Replication can generally recover from conflicts and transient issues. Replication does, however, require that update operations be copied from server to server. It is therefore possible to experience temporary delays while replicas converge, especially when the write operation load is heavy. OpenDJ's tolerance for temporary divergence between replicas is what allows OpenDJ to remain available to serve client applications even when networks linking the replicas go down. In other words, the fact that directory services are loosely convergent rather than transactional is a feature, not a bug.

That said, you may encounter errors. Replication uses its own error log file, logs/replication. Messages in the log file have category=SYNC, and take the following form. Here the line is folded for readability.

[27/Jun/2011:14:37:48 +0200] category=SYNC severity=INFORMATION msgID=14680169
 msg=Replication server accepted a connection from 10.10.0.10/10.10.0.10:52859
 to local address 0.0.0.0/0.0.0.0:8989 but the SSL handshake failed. This is
 probably benign, but may indicate a transient network outage or a
 misconfigured client application connecting to this replication server.
 The error was: Remote host closed connection during handshake

Replicas can become irrevocably out of sync if, for example, you restore a replica from a backup archive that is older than the last purge of historical information for replication. If this happens to you, disable the replica, and then reinitialize it with newer data.

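Informational messages such as the one above can crowd out the messages that matter. As a minimal sketch, the following function keeps only category=SYNC lines whose severity is something other than INFORMATION. The function name sync_problems is hypothetical, and the filter relies only on the message format shown above.

```shell
# Hypothetical helper: print replication messages that are not merely
# informational. Exits non-zero when no such message is found.
sync_problems() {
  grep 'category=SYNC' | grep -v 'severity=INFORMATION'
}

# Example usage against the replication log:
#   sync_problems < logs/replication
```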
Asking For Help

When you cannot resolve a problem yourself and want to ask for help, clearly identify the problem, how you reproduce it, and the version of OpenDJ you use to reproduce it. The version includes both a version number and a build time stamp.

$ dsconfig --version
OpenDJ Build yyyymmddhhmmssZ

Be ready to provide additional information, too.

The output from the java -version command
The access and errors logs showing what the server was doing when the problem started occurring
A copy of the server configuration file, config/config.ldif, in use when the problem started occurring
Other relevant logs or output, such as those from client applications experiencing the problem
A description of the environment where OpenDJ is running, including system characteristics, host names, IP addresses, Java versions, storage characteristics, and network characteristics. This description helps others understand the logs and other information.
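Gathering the files listed above can be scripted. The function below is a minimal sketch, assuming the logs and configuration sit under the instance directory at logs/access, logs/errors, and config/config.ldif as named in this chapter. The function name collect_support_info and the example paths are hypothetical. Review the copied files, config.ldif in particular, for sensitive data before sharing them.

```shell
# Hypothetical helper: copy the logs and configuration named in this chapter
# from an OpenDJ instance directory into an output directory, and record the
# Java version when a java command is available.
collect_support_info() {
  instance_dir=$1
  out_dir=$2
  mkdir -p "$out_dir" || return 1
  for f in logs/access logs/errors config/config.ldif; do
    if [ -f "$instance_dir/$f" ]; then
      cp "$instance_dir/$f" "$out_dir/$(basename "$f")"
    else
      echo "missing: $instance_dir/$f" >&2
    fi
  done
  if command -v java >/dev/null 2>&1; then
    # java -version writes to standard error, so capture both streams.
    java -version >"$out_dir/java-version.txt" 2>&1 || true
  fi
  return 0
}

# Example usage with placeholder paths:
#   collect_support_info /path/to/opendj /tmp/opendj-support
```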