Troubleshooting Server Problems

This chapter describes how to troubleshoot common server problems, and how to collect the information you need when seeking support.

Identifying the Problem

To solve your problem methodically and save time, define the problem clearly up front. In a replicated environment with multiple directory servers and many client applications, it can be particularly important to pin down not only the problem (the difference between observed behavior and expected behavior), but also the circumstances and steps that lead to the problem occurring.

Answer the following questions.

- How do you reproduce the problem?
- What exactly is the problem? In other words, what is the behavior you expected? What is the behavior you observed?
- When did the problem start occurring? Under similar circumstances, when does the problem not occur?
- Is the problem permanent? Intermittent? Is it getting worse? Getting better? Staying the same?

Pinpointing the problem can sometimes indicate where you should start looking for solutions.

Troubleshooting Installation & Upgrade

Installation and upgrade procedures result in a log file tracing the operation. The log location differs by operating system, but look for lines in the command output of the following form.

    See /var/....log for a detailed log of this operation.

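If you save the installer or upgrade output to a file, a short filter can pull the log path out of it. The following is a minimal sketch, assuming the output contains a line of the form shown above. The function name extract_log_path and the file name setup.out are hypothetical, chosen only for illustration.

```shell
# Print the log file path mentioned in saved installer or upgrade output.
# Reads the command output on standard input and prints one path per
# matching line. Lines that do not have the expected form are ignored.
extract_log_path() {
    sed -n 's/^.*See \(.*\) for a detailed log of this operation.*$/\1/p'
}
```

For example, after saving the output to setup.out, run `extract_log_path < setup.out` and open the file it prints.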
Resetting Administrator Passwords

This section describes what to do if you forgot the password for Directory Manager or for the global (replication) administrator.

Resetting the Directory Manager's Password

OpenDJ directory server stores the entry for Directory Manager in the LDIF representation of its configuration. You must be able to edit directory server files in order to reset Directory Manager's password.

1. Generate the encoded version of the new password using the OpenDJ encode-password command.

    $ cd /path/to/OpenDJ/bin/
    $ ./encode-password --storageScheme SSHA512 --clearPassword password
    Encoded Password: "{SSHA512}yWqHnYV4a5llPvE7WHLe5jzK27oZQWLIlVcs9gySu4TyZJMgNQNRtnR/Xx2xces1wu1dVLI9jVVtl1W4BVsmOKjyjr0rWrHt"

2. Stop OpenDJ directory server while you edit the configuration.

    $ ./stop-ds

3. Find Directory Manager's entry, which has DN cn=Directory Manager,cn=Root DNs,cn=config, in /path/to/OpenDJ/config/config.ldif, and carefully replace the userpassword attribute value with the encoded version of the new password, taking care not to leave any whitespace at the end of the line.

    dn: cn=Directory Manager,cn=Root DNs,cn=config
    objectClass: person
    objectClass: inetOrgPerson
    objectClass: organizationalPerson
    objectClass: ds-cfg-root-dn-user
    objectClass: top
    userpassword: {SSHA512}yWqHnYV4a5llPvE7WHLe5jzK27oZQWLIlVcs9gySu4TyZJMg
     NQNRtnR/Xx2xces1wu1dVLI9jVVtl1W4BVsmOKjyjr0rWrHt
    givenName: Directory
    cn: Directory Manager
    ds-cfg-alternate-bind-dn: cn=Directory Manager
    sn: Manager
    ds-pwp-password-policy-dn: cn=Root Password Policy,cn=Password Policies
     ,cn=config
    ds-rlim-time-limit: 0
    ds-rlim-lookthrough-limit: 0
    ds-rlim-idle-time-limit: 0
    ds-rlim-size-limit: 0

4. Start OpenDJ directory server again.

    $ ./start-ds

5. Verify that you can administer the server as Directory Manager using the new password.

    $ ./dsconfig -p 4444 -h `hostname` -D "cn=Directory Manager" -w password

    >>>> OpenDJ configuration console main menu

    What do you want to configure?

    ...

    Enter choice: q

To Reset the Global Administrator's Password

When you enable replication, part of the process involves creating a global administrator and setting that user's password. This user is present on all replicas. If you chose default values, this user has DN cn=admin,cn=Administrators,cn=admin data. You reset the password as you would for any other user, though you do so as Directory Manager.

1. Use the ldappasswordmodify command to reset the global administrator's password.

    $ cd /path/to/OpenDJ/bin/
    $ ./ldappasswordmodify --useStartTLS --port 1389 --hostname opendj.example.com \
        --bindDN "cn=Directory Manager" --bindPassword password \
        --authzID "cn=admin,cn=Administrators,cn=admin data" --newPassword password
    The LDAP password modify operation was successful

2. Let replication copy the password change to other replicas.

Preventing Access While You Fix Issues

Misconfiguration can potentially put OpenDJ in a state where you must intervene, and where you need to prevent users and applications from accessing the directory until you are done fixing the problem.

OpenDJ provides a lockdown mode that allows connections only on the loopback address, and allows only operations requested by root users, such as cn=Directory Manager. You can use lockdown mode to prevent all but administrative access to OpenDJ in order to repair the server.

To put OpenDJ into lockdown mode, the server must be running. You cause the server to enter lockdown mode by using a task. Notice that the modify operation is performed over the loopback address (accessing OpenDJ on the local host).

    $ ldapmodify --port 1389 --bindDN "cn=Directory Manager" --bindPassword password --defaultAdd
    dn: ds-task-id=Enter Lockdown Mode,cn=Scheduled Tasks,cn=tasks
    objectClass: top
    objectClass: ds-task
    ds-task-id: Enter Lockdown Mode
    ds-task-class-name: org.opends.server.tasks.EnterLockdownModeTask

    Processing ADD request for ds-task-id=Enter Lockdown Mode,cn=Scheduled Tasks,cn=tasks
    ADD operation successful for DN ds-task-id=Enter Lockdown Mode,cn=Scheduled Tasks,cn=tasks

OpenDJ logs a notice message in logs/errors when lockdown mode takes effect.

    [30/Jan/2012:17:04:32 +0100] category=BACKEND severity=NOTICE msgID=9896350
     msg=Lockdown task Enter Lockdown Mode finished execution

Client applications that request operations get a message concerning lockdown mode.

    $ ldapsearch --port 1389 --baseDN "" --searchScope base "(objectclass=*)" +
    SEARCH operation failed
    Result Code:  53 (Unwilling to Perform)
    Additional Information:  Rejecting the requested operation because the server is in lockdown mode and will only accept requests from root users over loopback connections

You also leave lockdown mode by using a task.

    $ ldapmodify --port 1389 --bindDN "cn=Directory Manager" --bindPassword password --defaultAdd
    dn: ds-task-id=Leave Lockdown Mode,cn=Scheduled Tasks,cn=tasks
    objectClass: top
    objectClass: ds-task
    ds-task-id: Leave Lockdown Mode
    ds-task-class-name: org.opends.server.tasks.LeaveLockdownModeTask

    Processing ADD request for ds-task-id=Leave Lockdown Mode,cn=Scheduled Tasks,cn=tasks
    ADD operation successful for DN ds-task-id=Leave Lockdown Mode,cn=Scheduled Tasks,cn=tasks

OpenDJ also logs a notice message when leaving lockdown.

    [30/Jan/2012:17:13:05 +0100] category=BACKEND severity=NOTICE msgID=9896350
     msg=Leave Lockdown task Leave Lockdown Mode finished execution

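To double-check which lockdown task ran most recently, you can scan logs/errors for the two notice messages shown above. The following is a minimal sketch, assuming the message text matches those samples exactly. The function name lockdown_state is hypothetical, and the result reflects only what the log records, not the live state of the server.

```shell
# Report the most recent lockdown transition recorded in an errors log.
# $1: path to the errors log (logs/errors in the examples above)
# Prints "locked down", "normal", or "unknown" when neither message appears.
lockdown_state() {
    awk '
        # Notice logged when the Enter Lockdown Mode task finishes.
        /msg=Lockdown task .* finished execution/       { state = "locked down" }
        # Notice logged when the Leave Lockdown Mode task finishes.
        /msg=Leave Lockdown task .* finished execution/ { state = "normal" }
        # The last matching line wins, because the log is chronological.
        END { print (state == "" ? "unknown" : state) }
    ' "$1"
}
```

For example, `lockdown_state /path/to/OpenDJ/logs/errors` prints the state implied by the last matching notice in that file.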
Troubleshooting LDIF Import

By default OpenDJ requires that LDIF data you import respect standards. In particular, OpenDJ is set to check that entries to import match the schema defined for the server. You can temporarily bypass this check by using the appropriate option with the import-ldif command.

OpenDJ also ensures by default that entries have only one structural object class. You can relax this behavior by using the advanced global configuration property, single-structural-objectclass-behavior. This can be useful when importing data exported from Sun Directory Server. For example, to warn when entries have more than one structural object class instead of rejecting such entries, set single-structural-objectclass-behavior:warn as follows.

    $ dsconfig set-global-configuration-prop --port 4444 --hostname `hostname` \
        --bindDN "cn=Directory Manager" --bindPassword password \
        --set single-structural-objectclass-behavior:warn --trustAll --no-prompt

By default, OpenDJ also checks syntax for a number of attribute types. You can relax this behavior as well by using the dsconfig set-attribute-syntax-prop command. See the list of attribute syntaxes and the command's usage information for further details.

When running import-ldif, you can use command options to capture entries that could not be imported, and to return the number of rejected entries as the import-ldif exit code.

Once you work through the issues with your LDIF data, reinstate the default behavior to ensure automated checking.

Troubleshooting TLS/SSL Connections

In order to trust the server certificate, client applications usually compare the signature on certificates with those of the Certificate Authorities (CAs) whose certificates are distributed with the client software. For example, the Java environment is distributed with a key store holding many CA certificates.

    $ keytool -list -keystore $JAVA_HOME/lib/security/cacerts -storepass changeit | wc -l
    334

The self-signed server certificates that can be configured during OpenDJ setup are not recognized as being signed by any CAs. Client software is therefore not configured to trust the self-signed certificates by default. You must either configure the client applications to accept the self-signed certificates, or else use certificates signed by recognized CAs.

You can further debug the network traffic by collecting debug traces. To see the traffic going over TLS/SSL in debug mode, configure OpenDJ to dump debug traces from javax.net.debug into the logs/server.out file.

    $ OPENDJ_JAVA_ARGS="-Djavax.net.debug=all" start-ds

Troubleshooting Certificates & SSL Authentication

Replication uses SSL to protect directory data on the network. In some configurations, replicas can fail to connect to each other due to SSL handshake errors. This leads to error log messages such as the following.

    [21/Nov/2011:13:03:20 -0600] category=SYNC severity=NOTICE msgID=15138921
     msg=SSL connection attempt from myserver (123.456.789.012) failed:
     Remote host closed connection during handshake

Notice these problem characteristics in the message above.

- The host name, myserver, is not fully qualified.

  You should not see host names that are not fully qualified in the error logs. Such host names are a sign that an OpenDJ server has not been configured properly.

  Always install and configure OpenDJ using fully qualified host names. The OpenDJ administration connector, which is used by the dsconfig command, and also replication depend upon SSL and, more specifically, self-signed certificates for establishing SSL connections. If the host name used for connection establishment does not correspond to the host name stored in the SSL certificate, then the SSL handshake can fail. For the purposes of establishing the SSL connection, a host name like myserver does not match myserver.example.com, and vice versa.

- The connection succeeded, but the SSL handshake failed, suggesting a problem with authentication or with the cipher or protocol negotiation. As most deployments use the same Java Virtual Machine, and the same JVM configuration for each replica, the problem is likely not related to SSL cipher or protocol negotiation, but instead lies with authentication.

Follow these steps on each OpenDJ server to check whether the problem lies with the host name configuration.

1. Make sure each OpenDJ server uses only fully qualified host names in the replication configuration. You can obtain a quick summary by running the following command against each server's configuration.

    $ grep ds-cfg-replication-server: config/config.ldif | sort | uniq

2. Make sure that the host names in OpenDJ certificates also contain fully qualified host names, and correspond to the host names found in the previous step.

    # Examine the certificates used for the administration connector.
    $ keytool -list -v -keystore config/admin-truststore -storepass `cat config/admin-keystore.pin` | grep "^Owner:"

    # Examine the certificates used for replication.
    $ keytool -list -v -keystore config/ads-truststore -storepass `cat config/ads-truststore.pin` | grep "^Owner:"

Sample output for a server on host opendj.example.com follows.

    $ grep ds-cfg-replication-server: config/config.ldif | sort | uniq
    ds-cfg-replication-server: opendj.example.com:8989
    ds-cfg-replication-server: opendj.example.com:9989

    $ keytool -list -v -keystore config/admin-truststore -storepass `cat config/admin-keystore.pin` | grep "^Owner:"
    Owner: CN=opendj.example.com, O=Administration Connector Self-Signed Certificate

    $ keytool -list -v -keystore config/ads-truststore -storepass `cat config/ads-truststore.pin` | grep "^Owner:"
    Owner: CN=opendj.example.com, O=OpenDJ Certificate
    Owner: CN=opendj.example.com, O=OpenDJ Certificate
    Owner: CN=opendj.example.com, O=OpenDJ Certificate

Unfortunately there is no easy solution to badly configured host names. It is often easier and quicker simply to reinstall your OpenDJ servers, remembering to use fully qualified host names everywhere.

- When using the setup tool to install and configure a server, ensure that the host name option is included, and that it specifies the fully qualified host name. Make sure you include this option even if you are not enabling SSL/StartTLS LDAP connections (see OPENDJ-363).
- If you are using the GUI installer, then make sure you specify the fully qualified host name on the first page of the wizard.
- When using the dsreplication tool to enable replication, make sure that any host name options include the fully qualified host name.

If you cannot reinstall the server, follow these steps.

1. Disable replication in each replica.

    $ dsreplication disable --disableAll --port adminPort --hostname hostName \
        --bindDN "cn=Directory Manager" --adminPassword password --trustAll --no-prompt

2. Stop and restart each server in order to clear the in-memory ADS trust store backend.

3. Enable replication, making certain that fully qualified host names are used throughout.

    $ dsreplication enable --adminUID admin --adminPassword password --baseDN dc=example,dc=com \
        --host1 hostName1 --port1 adminPort1 --bindDN1 "cn=Directory Manager" \
        --bindPassword1 password --replicationPort1 replPort1 \
        --host2 hostName2 --port2 adminPort2 --bindDN2 "cn=Directory Manager" \
        --bindPassword2 password --replicationPort2 replPort2 --trustAll --no-prompt

4. Repeat the previous step for each remaining replica. In other words, host1 with host2, host1 with host3, host1 with host4, ..., host1 with hostN.

5. Initialize all remaining replicas with the data from host1.

    $ dsreplication initialize-all --adminUID admin --adminPassword password \
        --baseDN dc=example,dc=com --hostname hostName1 --port 4444 --trustAll --no-prompt

6. Check that the host names are correct in the configuration and in the key stores by following the steps you used to check for host name problems. The only broken host name remaining should be in the key and trust stores for the administration connector.

    $ keytool -list -v -keystore config/admin-truststore -storepass `cat config/admin-keystore.pin` | grep "^Owner:"

7. Stop each server, and then fix the remaining admin connector certificate as described in the procedure To Replace a Server Key Pair.

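The host name check on the replication configuration can be partly automated. The following is a minimal sketch, assuming replication server values have the host:port form shown in the sample output above. The function name list_unqualified_hosts is hypothetical. The test is deliberately crude: any host value without a dot is reported, and IP addresses, which contain dots, are not flagged.

```shell
# List replication server host names that do not look fully qualified.
# Reads config.ldif content on standard input and prints each suspect
# host name once. Prints nothing when every host name contains a dot.
list_unqualified_hosts() {
    awk '
        /^ds-cfg-replication-server:/ {
            host = $0
            # Drop the attribute name, then the trailing :port.
            sub(/^ds-cfg-replication-server:[ \t]*/, "", host)
            sub(/:[0-9]+[ \t]*$/, "", host)
            # A name without any dot cannot be fully qualified.
            if (host !~ /\./ && !(host in seen)) {
                seen[host] = 1
                print host
            }
        }
    '
}
```

For example, `list_unqualified_hosts < config/config.ldif` prints nothing for the sample configuration above, and would print myserver for a configuration that refers to myserver:8989.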
Troubleshooting Client Operations

By default OpenDJ logs information about all client operations in logs/access. The following lines are wrapped for readability, showing a search for the entry with uid=bjensen as traced in the access log. In the access log itself, each line starts with a time stamp.

    [27/Jun/2011:17:23:00 +0200] CONNECT conn=19 from=127.0.0.1:56641
     to=127.0.0.1:1389 protocol=LDAP
    [27/Jun/2011:17:23:00 +0200] SEARCH REQ conn=19 op=0 msgID=1
     base="dc=example,dc=com" scope=wholeSubtree filter="(uid=bjensen)" attrs="ALL"
    [27/Jun/2011:17:23:00 +0200] SEARCH RES conn=19 op=0 msgID=1
     result=0 nentries=1 etime=3
    [27/Jun/2011:17:23:00 +0200] UNBIND REQ conn=19 op=1 msgID=2
    [27/Jun/2011:17:23:00 +0200] DISCONNECT conn=19 reason="Client Unbind"

As you see, each client connection and each LDAP operation is traced. A trace starts with a time stamp and the type of operation performed, followed by information about the connection, the operation number in the sequence of operations performed by the client, a message identification number, and additional information about the operation.

To help diagnose errors due to access permissions, OpenDJ supports the get effective rights control. The control OID, 1.3.6.1.4.1.42.2.27.9.5.2, is not allowed by the default global ACIs. You must therefore add access to use the get effective rights control when not using it as Directory Manager.

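Because each access log record carries the same key=value fields, you can filter the log with standard text tools. The following is a minimal sketch that prints response records whose etime exceeds a threshold, assuming each record sits on a single line as in the real log (not wrapped as in the sample above). The function name slow_operations is hypothetical, and the threshold is compared in whatever unit the server writes for etime.

```shell
# Print response (RES) records whose etime is greater than a threshold.
# $1: threshold, in the unit the server logs for etime
# $2: path to the access log (logs/access in the example above)
slow_operations() {
    awk -v limit="$1" '
        / RES / {
            # Look for the etime=N field anywhere in the record.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^etime=[0-9]+$/) {
                    split($i, kv, "=")
                    if (kv[2] + 0 > limit + 0) {
                        print
                        next
                    }
                }
            }
        }
    ' "$2"
}
```

For example, `slow_operations 100 logs/access` lists the responses that took longer than 100 units. The conn= and op= values on each reported line let you find the matching request record.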
Clients Need Simple Paged Results Control

For Solaris and some versions of Linux, you might see a message in the OpenDJ access logs such as the following.

    The request control with Object Identifier (OID) "1.2.840.113556.1.4.319"
    cannot be used due to insufficient access rights

This message means clients are trying to use the simple paged results control without authenticating. By default, OpenDJ includes a global ACI to allow only authenticated users to use the control.

    $ dsconfig --port 4444 --hostname opendj.example.com --bindDN "cn=Directory Manager" --bindPassword "password" get-access-control-handler-prop
    Property   : Value(s)
    -----------:-------------------------------------------------------------------
    enabled    : true
    global-aci : (extop="1.3.6.1.4.1.26027.1.6.1 || 1.3.6.1.4.1.26027.1.6.3 || ...
               : (targetcontrol="1.3.6.1.1.12 || 1.3.6.1.1.13.1 || 1.3.6.1.1.13.2
               : || 1.2.840.113556.1.4.319 || 1.2.826.0.1.3344810.2.3 ||
               : 2.16.840.1.113730.3.4.18 || 2.16.840.1.113730.3.4.9 ||
               : 1.2.840.113556.1.4.473 || 1.3.6.1.4.1.42.2.27.9.5.9") (version
               : 3.0; acl "Authenticated users control access"; allow(read)
               : userdn="ldap:///all";), (targetcontrol="2.16.840.1.113730.3.4.2 ||
               : 2.16.840.1.113730.3.4.17 || 2.16.840.1.113730.3.4.19 ||
               : 1.3.6.1.4.1.4203.1.10.2 || 1.3.6.1.4.1.42.2.27.8.5.1 ||
               : 2.16.840.1.113730.3.4.16") (version 3.0; acl "Anonymous control
               : access"; allow(read) userdn="ldap:///anyone";)

To grant anonymous (unauthenticated) user access to the control, add the OID for the simple paged results control to the list of those in the Anonymous control access global ACI.

    $ dsconfig --port 4444 --hostname opendj.example.com \
        --bindDN "cn=Directory Manager" --bindPassword "password" \
        set-access-control-handler-prop \
        --remove global-aci:"(targetcontrol=\"2.16.840.1.113730.3.4.2 || 2.16.840.1.113730.3.4.17 || 2.16.840.1.113730.3.4.19 || 1.3.6.1.4.1.4203.1.10.2 || 1.3.6.1.4.1.42.2.27.8.5.1 || 2.16.840.1.113730.3.4.16\") (version 3.0; acl \"Anonymous control access\"; allow(read) userdn=\"ldap:///anyone\";)" \
        --add global-aci:"(targetcontrol=\"2.16.840.1.113730.3.4.2 || 2.16.840.1.113730.3.4.17 || 2.16.840.1.113730.3.4.19 || 1.3.6.1.4.1.4203.1.10.2 || 1.3.6.1.4.1.42.2.27.8.5.1 || 2.16.840.1.113730.3.4.16 || 1.2.840.113556.1.4.319\") (version 3.0; acl \"Anonymous control access\"; allow(read) userdn=\"ldap:///anyone\";)" \
        --no-prompt

Alternatively, stop OpenDJ, edit the corresponding ACI carefully in /path/to/OpenDJ/config/config.ldif, and restart OpenDJ. Unlike the dsconfig command, the config.ldif file is not a public interface, so this alternative should not be used in production.

Troubleshooting Replication

Replication can generally recover from conflicts and transient issues. Replication does, however, require that update operations be copied from server to server. It is therefore possible to experience temporary delays while replicas converge, especially when the write operation load is heavy. OpenDJ's tolerance for temporary divergence between replicas is what allows OpenDJ to remain available to serve client applications even when networks linking the replicas go down.

In other words, the fact that directory services are loosely convergent rather than transactional is a feature, not a bug.

That said, you may encounter errors. Replication uses its own error log file, logs/replication. Error messages in the log file have category=SYNC. The messages have the following form. Here the line is folded for readability.

    [27/Jun/2011:14:37:48 +0200] category=SYNC severity=INFORMATION msgID=14680169
     msg=Replication server accepted a connection from 10.10.0.10/10.10.0.10:52859
     to local address 0.0.0.0/0.0.0.0:8989 but the SSL handshake failed. This is
     probably benign, but may indicate a transient network outage or a
     misconfigured client application connecting to this replication server.
     The error was: Remote host closed connection during handshake

OpenDJ maintains historical information about changes in order to bring replicas up to date, and to resolve replication conflicts. To prevent historical information from growing without limit, OpenDJ purges historical information after a configurable delay (replication-purge-delay, default: 3 days). A replica can become irrevocably out of sync if you restore it from a backup archive older than the purge delay, or if you stop it for longer than the purge delay. If this happens to you, disable the replica, and then reinitialize it from a recent backup or from a server that is up to date.

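Before restoring a replica from a backup archive, it can help to compare the age of the archive with the purge delay. The following is a minimal sketch of that arithmetic, assuming you supply both time stamps as seconds since the epoch. The function name older_than_purge_delay is hypothetical, and the default of 3 days matches the default purge delay mentioned above. Pass your own value if you changed replication-purge-delay.

```shell
# Succeed (exit status 0) when a backup is older than the purge delay.
# $1: time the backup was taken, in seconds since the epoch
# $2: current time, in seconds since the epoch
# $3: purge delay in days (optional, defaults to 3)
older_than_purge_delay() (
    # The body runs in a subshell so the variables do not leak.
    delay_days=${3:-3}
    age=$(( $2 - $1 ))
    [ "$age" -gt $(( delay_days * 86400 )) ]
)
```

For example, `older_than_purge_delay "$backup_time" "$(date +%s)" && echo "reinitialize instead of restoring"` prints the warning only when the backup is too old, where backup_time is a variable you set to the time the archive was created.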
Asking For Help

When you cannot resolve a problem yourself, and want to ask for help, clearly identify the problem and how you reproduce it, and also the version of OpenDJ you use to reproduce the problem. The version includes both a version number and also a build time stamp.

    $ dsconfig --version
    OpenDJ
    Build yyyymmddhhmmssZ

Be ready to provide additional information, too.

- The output from the java -version command.
- The access and errors logs showing what the server was doing when the problem started occurring.
- A copy of the server configuration file, config/config.ldif, in use when the problem started occurring.
- Other relevant logs or output, such as those from client applications experiencing the problem.
- A description of the environment where OpenDJ is running, including system characteristics, host names, IP addresses, Java versions, storage characteristics, and network characteristics. This helps to understand the logs and other information.
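Gathering the server files from the list above can be scripted. The following is a minimal sketch, assuming the default file locations named in this chapter (config/config.ldif, logs/access, and logs/errors under the server directory). The function name collect_support_files and the archive name are hypothetical.

```shell
# Bundle the server files listed above into a tar archive.
# $1: server directory (for example /path/to/OpenDJ)
# $2: archive file to create
collect_support_files() {
    # Run tar from inside the server directory so that archive members
    # keep their relative names (config/config.ldif, logs/access, ...).
    # The redirection is resolved in the caller's directory, not in $1.
    ( cd "$1" && tar -cf - config/config.ldif logs/access logs/errors ) > "$2"
}
```

For example, `collect_support_files /path/to/OpenDJ opendj-support.tar` creates the archive in the current directory. To capture the Java version alongside it, run `java -version > java-version.txt 2>&1`, since java writes that output to standard error. Review config.ldif before sharing it, because it holds encoded passwords such as the Directory Manager's.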