| | |
| | | xmlns:xlink='http://www.w3.org/1999/xlink' |
| | | xmlns:xinclude='http://www.w3.org/2001/XInclude'> |
| | | <title>Tuning Servers For Performance</title> |
| | | |
| | | |
| | | <para>Server tuning refers to the art of adjusting server, JVM, and system |
| | | configuration to meet the service level performance requirements of directory |
| | | clients. In the optimal case you achieve service level performance |
| | |
| | | This chapter therefore aims to provide suggestions on how to measure and |
| | | to improve directory service performance for better trade-offs.</para> |
| | | |
| | | <!-- TODO: Demonstrate measuring directory service throughput and response |
| | | times using authrate, modrate, and searchrate. --> |
| | | |
| | | <section> |
| | | <title>Defining Performance Requirements &amp; Constraints</title> |
| | | |
| | | <para>Your key performance requirement is most likely to satisfy your |
| | | users or customers with the resources available to you. Before you can |
| | | solve potential performance problems, define what those users or customers |
| | | expect, and determine what resources you will have to satisfy their |
| | | expectations.</para> |
| | | |
| | | <section> |
| | | <title>Service-Level Agreements</title> |
| | | |
| | | <para>Service-level agreement (SLA) is a formal name for what directory |
| | | client applications and the people who run them expect from your service in |
| | | terms of performance.</para> |
| | | |
| | | <para>SLAs might cover many aspects of the directory service. Whether or not |
| | | your SLA is formally defined, you ought to know what is expected, or at least |
| | | what you provide, in the following four areas.</para> |
| | | |
| | | <itemizedlist> |
| | | <listitem> |
| | | <para>Directory service <firstterm>response times</firstterm></para> |
| | | |
| | | <para>Directory service response times range from less than a |
| | | millisecond on average across a low latency connection on the same |
| | | network to however long it takes your network to deliver the response. |
| | | More important than average or best response times is the response time |
| | | distribution, because applications set timeouts based on worst case |
| | | scenarios. For example, a response time performance requirement might |
| | | be defined as, "Directory response times must average less than 10 |
| | | milliseconds for all operations except searches returning more than 10 |
| | | entries, with 99.9% of response times under 40 milliseconds."</para> |
| | | </listitem> |
| | | <listitem> |
| | | <para>Directory service <firstterm>throughput</firstterm></para> |
| | | |
| | | <para>Directory service throughput can range up to many thousands of |
| | | operations per second. In fact there is no upper limit for read operations |
| | | such as searches, because only write operations must be replicated. To |
| | | increase read throughput, simply add additional replicas. More important |
| | | than average throughput is peak throughput. You might have peak write |
| | | throughput in the middle of the night when batch jobs update entries in |
| | | bulk, and peak binds for a special event or first thing Monday morning. |
| | | For example, a throughput performance requirement might be expressed as, |
| | | "The directory service must sustain a mix of 5,000 operations per second |
| | | made up of 70% reads, 25% modifies, 3% adds, and 2% deletes."</para> |
| | | |
| | | <para>Even better is to mimic the behavior of key operations for |
| | | performance testing, so that you understand the patterns of operations |
| | | in the throughput you need to provide.</para> |
| | | </listitem> |
| | | <listitem> |
| | | <para>Directory service <firstterm>availability</firstterm></para> |
| | | |
| | | <para>OpenDJ is designed to let you build directory services that are |
| | | basically available, including during maintenance and even upgrade of |
| | | individual servers. Yet, in order to reach very high levels of |
| | | availability, you must make sure not only that the software is |
| | | designed for availability, but also that your operations execute in |
| | | such a way as to preserve availability. Availability requirements |
| | | can be as lax as best effort, or as stringent as 99.999% or more |
| | | uptime.</para> |
| | | |
| | | <para>Replication is the OpenDJ feature that allows you to build a |
| | | highly available directory service.</para> |
| | | </listitem> |
| | | <listitem> |
| | | <para>Directory service administrative support</para> |
| | | |
| | | <para>Do not forget to make sure you understand and set expectations |
| | | about how you support your users when they run into trouble. Directory |
| | | services can perhaps help you turn password management into a self-service |
| | | visit to a web site, but some users no doubt still need to know what they |
| | | can expect if they need your help.</para> |
| | | </listitem> |
| | | </itemizedlist> |
| | | |
| | | <para>Writing down the SLA, even if your first version consists of |
| | | guesses, helps you reduce performance tuning from an open-ended project |
| | | to a clear set of measurable goals for a manageable project with a definite |
| | | outcome.</para> |
| | | </section> |
| | | |
| | | <section> |
| | | <title>Available Resources</title> |
| | | |
| | | <para>With your SLA in hand, take inventory of the servers, networks, |
| | | storage, people, and other resources at your disposal. Now is the time to |
| | | estimate whether it is possible to meet the requirements at all.</para> |
| | | |
| | | <para>If for example you are expected to serve more throughput than the |
| | | network can transfer, maintain high availability with only one physical |
| | | machine, store 100 GB of backups on a 50 GB partition, or provide 24/7 |
| | | support all alone, no amount of tweaking available resources is likely to |
| | | fix the problem.</para> |
| | | |
| | | <para>When checking that the resources you have at least theoretically |
| | | suffice to meet your requirements, do not forget that high availability in |
| | | particular requires at least two of everything to avoid single points |
| | | of failure. Be sure to list the resources you expect to have, when and how |
| | | long you expect to have them, and why you need them. Also make note of |
| | | what is missing and why.</para> |
| | | |
| | | <section> |
| | | <title>Server Hardware Recommendations</title> |
| | | |
| | | <para>Concerning server hardware, OpenDJ runs on systems with Java support, |
| | | and is therefore quite portable. That said, OpenDJ tends to perform best on |
| | | single-board, x86 systems due to low memory latency.</para> |
| | | </section> |
| | | |
| | | <section> |
| | | <title>Storage Recommendations</title> |
| | | |
| | | <para>OpenDJ is designed to work with local storage for the database, |
| | | not with network file systems such as NFS.</para> |
| | | |
| | | <para>High performance storage is essential if you need to handle high |
| | | write throughput.</para> |
| | | |
| | | <para>Berkeley DB Java Edition works well with traditional disks as |
| | | long as the database cache size allows the DB to stay fully cached in |
| | | memory. This is the case because the database transaction log is |
| | | append-only, so writes to disk are sequential. When the DB is too big to |
| | | stay cached in memory, however, cache misses lead to random disk access, |
| | | slowing OpenDJ performance.</para> |
| | | |
| | | <para>You might mitigate this effect by using solid-state disks for |
| | | persistent storage, or for file system cache.</para> |
| | | |
| | | <para>Regarding database size on disk, if you have sustained write traffic |
| | | then the database grows to about twice its initial size on disk. This is |
| | | normal, and due to the way the database manages its logs. The size on disk |
| | | does not impact the DB cache size requirements.</para> |
| | | </section> |
| | | </section> |
| | | </section> |
| | | |
| | | <section> |
| | | <title>Testing Performance</title> |
| | | |
| | | <para>Even if you do not need high availability, you still need two of |
| | | everything, because your test environment needs to mimic your production |
| | | environment as closely as possible if you want to avoid nasty |
| | | surprises.</para> |
| | | |
| | | <para>In your test environment, you set up OpenDJ as you will later in |
| | | production, and then conduct experiments to determine how best to meet |
| | | the requirements defined in the SLA.</para> |
| | | |
| | | <para>Use <command>make-ldif</command> to generate sample data that match |
| | | what you expect to find in production.</para> |
| | | |
| | | <para>The OpenDJ LDAP Toolkit provides three command-line tools to help |
| | | with basic performance testing.</para> |
| | | |
| | | <itemizedlist> |
| | | <listitem> |
| | | <para>The <command>authrate</command> command measures bind throughput and |
| | | response time.</para> |
| | | </listitem> |
| | | <listitem> |
| | | <para>The <command>modrate</command> command measures modification |
| | | throughput and response time.</para> |
| | | </listitem> |
| | | <listitem> |
| | | <para>The <command>searchrate</command> command measures search throughput |
| | | and response time.</para> |
| | | </listitem> |
| | | </itemizedlist> |
| | | |
| | | <para>All three commands show you information about the response time |
| | | distributions, and allow you to perform tests at specific levels of |
| | | throughput.</para> |
| | | |
| | | <para>For more extensive testing, try the <link |
| | | xlink:href="http://slamd.com/">SLAMD Distributed Load Generation |
| | | Engine</link>. SLAMD is built to test more than just directory services, but |
| | | is particularly well suited to testing directory service performance, is |
| | | well documented, and is available under the Sun Public License. SLAMD is |
| | | designed both to offer an easy-to-use web-based interface, and also to |
| | | allow you to customize jobs to match the access patterns you expect from |
| | | client applications.</para> |
| | | </section> |
| | | |
| | | <section> |
| | | <title>Tweaking OpenDJ Performance</title> |
| | | |
| | | <para>When your tests show that OpenDJ performance is lacking even though |
| | | you have the right underlying network, hardware, storage, and system |
| | | resources in place, you can tweak OpenDJ performance in a number of ways. |
| | | This section mentions the most common tweaks.</para> |
| | | |
| | | <section> |
| | | <title>Java Settings</title> |
| | | |
| | | <para>Default Java settings let you evaluate OpenDJ using limited system |
| | | resources. If you need high performance for a production system, test with |
| | | the following JVM options. These apply to the Sun/Oracle JVM.</para> |
| | | |
| | | <tip> |
| | | <para>To apply JVM settings for your server, edit |
| | | <filename>config/java.properties</filename>, and apply the changes with the |
| | | <command>dsjavaproperties</command> command.</para> |
| | | </tip> |
| | | |
| | | <variablelist> |
| | | <varlistentry> |
| | | <term><option>-server</option></term> |
| | | <listitem> |
| | | <para>Use the C2 compiler and optimizer.</para> |
| | | </listitem> |
| | | </varlistentry> |
| | | <varlistentry> |
| | | <term><option>-d64</option></term> |
| | | <listitem> |
| | | <para>To use a heap larger than about 3.5 GB on a 64-bit system, use |
| | | this option.</para> |
| | | </listitem> |
| | | </varlistentry> |
| | | <varlistentry> |
| | | <term><option>-Xms, -Xmx</option></term> |
| | | <listitem> |
| | | <para>Set both minimum and maximum heap size to the same value to avoid |
| | | resizing. Leave space for the entire DB cache and more.</para> |
| | | </listitem> |
| | | </varlistentry> |
| | | <varlistentry> |
| | | <term><option>-Xmn</option></term> |
| | | <listitem> |
| | | <para>Set the new generation size to between 1 and 4 GB for high |
| | | throughput deployments, but leave enough overall JVM heap to avoid |
| | | overlaps with the space used for DB cache.</para> |
| | | </listitem> |
| | | </varlistentry> |
| | | <varlistentry> |
| | | <term><option>-XX:MaxTenuringThreshold=1</option></term> |
| | | <listitem> |
| | | <para>OpenDJ does not create medium-lifetime objects, only transient |
| | | objects and long-lived objects, so there is little benefit in keeping |
| | | surviving objects in the new generation for long.</para> |
| | | </listitem> |
| | | </varlistentry> |
| | | <varlistentry> |
| | | <term><option>-XX:+UseConcMarkSweepGC</option></term> |
| | | <listitem> |
| | | <para>The CMS garbage collector tends to give the best performance |
| | | characteristics. You might also consider the G1 garbage collector.</para> |
| | | </listitem> |
| | | </varlistentry> |
| | | <varlistentry> |
| | | <term><option>-XX:+PrintGCDetails</option></term> |
| | | <term><option>-XX:+PrintGCTimeStamps</option></term> |
| | | <listitem> |
| | | <para>Use these when diagnosing JVM tuning problems. You can turn them |
| | | off when everything is running smoothly.</para> |
| | | </listitem> |
| | | </varlistentry> |
| | | </variablelist> |
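| | | |
| | | <para>As a rough sketch, the options above might be combined in |
| | | <filename>config/java.properties</filename> as follows. The heap and new |
| | | generation sizes are placeholders rather than recommendations, and the |
| | | <literal>start-ds.java-args</literal> key is assumed to match the one in |
| | | your default file, so check your own copy before editing. The diagnostic |
| | | options are left out here, as you would add them only while investigating |
| | | garbage collection behavior.</para> |
| | | |
| | | <screen width="80"># Illustrative values only: size the heap to hold the DB cache and more. |
| | | start-ds.java-args=-server -Xms4g -Xmx4g -Xmn1g -XX:MaxTenuringThreshold=1 -XX:+UseConcMarkSweepGC</screen> |
| | | |
| | | <para>After editing the file, run <command>dsjavaproperties</command> and |
| | | restart the server so that it picks up the new settings.</para> |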
| | | </section> |
| | | |
| | | <section> |
| | | <title>Data Storage Settings</title> |
| | | |
| | | <para>By default, OpenDJ uses compact data encoding, reducing the space |
| | | used by attribute type and object class strings. However, OpenDJ does not |
| | | compress entries by default. You can potentially gain space by setting the |
| | | backend property <literal>entries-compressed</literal> to |
| | | <literal>true</literal> before you (re-)import data from LDIF. OpenDJ |
| | | compresses entries before writing them to the database, but does not |
| | | proactively rewrite all entries in the database after you change the |
| | | setting, so to force OpenDJ to compress existing entries, import the data |
| | | from LDIF.</para> |
| | | |
| | | <screen width="80">$ dsconfig -p 4444 -h `hostname` -D "cn=Directory Manager" -w password \ |
| | | > set-backend-prop --backend-name userRoot --set entries-compressed:true -X -n |
| | | $ import-ldif -p 4444 -h `hostname` -D "cn=Directory Manager" -w password \ |
| | | > -l /path/to/Example.ldif -n userRoot -b dc=example,dc=com -t 0 |
| | | Import task 20110627101758486 scheduled to start Jun 27, 2011 10:17:58 AM CEST</screen> |
| | | </section> |
| | | |
| | | <section> |
| | | <title>LDIF Import Settings</title> |
| | | |
| | | <para>You can tweak OpenDJ to speed up import of large LDIF files.</para> |
| | | |
| | | <para>By default, the temporary directory used for scratch files is |
| | | <filename>import-tmp</filename> under the directory where you installed |
| | | OpenDJ. Use <command>import-ldif</command> with the |
| | | <option>--tmpdirectory</option> option to set this directory to a |
| | | <literal>tmpfs</literal> file system, such as |
| | | <filename>/tmp</filename>.</para> |
| | | |
| | | <para>In some cases, you can improve performance by using the |
| | | <option>--threadCount</option> option with the |
| | | <command>import-ldif</command> command to set the thread count larger than |
| | | the default, which is twice the number of CPUs.</para> |
| | | |
| | | <para>If you are certain your LDIF contains only valid entries with |
| | | correct syntax, because the LDIF was exported from OpenDJ with all checks |
| | | active for example, you can skip schema and DN validation. Use the |
| | | <option>--skipSchemaValidation</option> and |
| | | <option>--skipDNValidation</option> options with the |
| | | <command>import-ldif</command> command to skip validation.</para> |
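| | | |
| | | <para>The following sketch combines the options mentioned in this section |
| | | in a single import. It assumes <filename>/tmp</filename> is a |
| | | <literal>tmpfs</literal> file system with enough space for the scratch |
| | | files, that the host has 8 CPUs so a thread count of 32 is larger than the |
| | | default, and that the LDIF is already known to be valid. The connection |
| | | settings, backend, and file path are the same example values used |
| | | elsewhere in this chapter.</para> |
| | | |
| | | <screen width="80">$ import-ldif -p 4444 -h `hostname` -D "cn=Directory Manager" -w password \ |
| | | > -l /path/to/Example.ldif -n userRoot -b dc=example,dc=com -t 0 \ |
| | | > --tmpdirectory /tmp --threadCount 32 \ |
| | | > --skipSchemaValidation --skipDNValidation</screen> |
| | | |
| | | <para>Measure import time with and without each option on your own data, |
| | | as the gain depends on your hardware and on the shape of the entries.</para> |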
| | | </section> |
| | | |
| | | <section> |
| | | <title>Database Cache Settings</title> |
| | | |
| | | <para>Database cache size is, by default, set as a percentage of the JVM |
| | | heap, using the backend property <literal>db-cache-percent</literal>. |
| | | Alternatively, you can use the backend property |
| | | <literal>db-cache-size</literal> to set the size explicitly.</para> |
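| | | |
| | | <para>For illustration, the following command sets the database cache for |
| | | the <literal>userRoot</literal> backend to 60% of the JVM heap. The value is |
| | | an arbitrary example. With an 8 GB heap, for instance, it would allow up to |
| | | about 4.8 GB of database cache, leaving the remainder for everything else |
| | | the server keeps in memory.</para> |
| | | |
| | | <screen width="80">$ dsconfig -p 4444 -h `hostname` -D "cn=Directory Manager" -w password \ |
| | | > set-backend-prop --backend-name userRoot --set db-cache-percent:60 -X -n</screen> |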
| | | |
| | | <para>Depending on the size of your database, you have a choice to make |
| | | about database cache settings.</para> |
| | | |
| | | <para>By caching the entire database in the JVM heap, you can get more |
| | | deterministic response times and limit disk I/O. Yet, caching the whole |
| | | DB can require a very large JVM, which you must pre-load on startup, and |
| | | which can result in long garbage collections and a difficult-to-manage |
| | | JVM. Test database pre-load on startup by setting the |
| | | <literal>preload-time-limit</literal> for the backend.</para> |
| | | |
| | | <screen width="80">$ dsconfig -p 4444 -h `hostname` -D "cn=Directory Manager" -w password \ |
| | | > set-backend-prop --backend-name userRoot --set preload-time-limit:30m -X -n</screen> |
| | | |
| | | <para>Database pre-load is single-threaded, and loads databases one |
| | | at a time.</para> |
| | | |
| | | <para>By allowing file system cache to hold the portion of database that |
| | | does not fit in DB cache, you trade less deterministic and slightly slower |
| | | response times for not having to pre-load the DB and not having garbage |
| | | collection pauses with large JVMs. How you configure the file system cache |
| | | depends on your operating system.</para> |
| | | </section> |
| | | |
| | | <section> |
| | | <title>Logging Settings</title> |
| | | |
| | | <para>Debug logs trace the internal workings of OpenDJ, and therefore |
| | | generally should be used sparingly, especially in high performance |
| | | deployments.</para> |
| | | |
| | | <para>In general leave other logs active for production environments to |
| | | help troubleshoot any issues that arise.</para> |
| | | |
| | | <para>For OpenDJ servers handling very high throughput, however, such as |
| | | 100,000 operations per second or more, the access log can constitute a |
| | | performance bottleneck, as each client request results in multiple access |
| | | log messages. Consider disabling the access log in such cases.</para> |
| | | |
| | | <screen width="80">$ dsconfig -p 4444 -h `hostname` -D "cn=Directory Manager" -w password \ |
| | | > set-log-publisher-prop --publisher-name "File-Based Access Logger" \ |
| | | > --set enabled:false -X -n</screen> |
| | | </section> |
| | | </section> |
| | | </chapter> |