Kernel comparison: Web serving on 2.4 and 2.6
New features add up to faster, more reliable Web performance
Li Ge, Staff Software Engineer
Linux Technology Center, IBM
10 Feb 2004
Many improvements have been made in the Linux 2.6 kernel to favor enterprise applications. This article presents results from the IBM Linux Technology Center's Web serving testing efforts, comparing various aspects of the Linux 2.4 and 2.6 kernels. Highlighted here are the key enhancements in the 2.6 kernel, the test methodologies, and the results of the tests themselves. Bottom line: the 2.6 kernel is much faster than 2.4 for serving Web pages, with no loss in reliability.
The objective of the Linux Web serving testing effort in the IBM Linux Technology Center (LTC) is to uncover Linux kernel defects. The emphasis is on workloads relevant to real-world enterprise user environments using Web servers/application servers, and on improving Linux kernel stability, scalability, and compatibility with Web servers/application servers. Identification of defects of Web servers and application servers is not the primary focus.
Overview of testing
Two categories of Web serving testing
Two major kinds of servers are used for Web serving: the Web server and the application server. In this article, I use the term Web serving to cover both.
We selected quite a few Web servers and application servers for our Linux kernel testing environment, including Apache, Jakarta-Tomcat, IBM® WebSphere® Application Server, and Jboss. Most of these are open source projects and can be downloaded for free (see Resources for links to more information on these servers).
Testing differences between the 2.4 and 2.6 kernels
The testing effort on the 2.5/2.6 kernel using Web servers and application servers as the test workload has been much more extensive than on the 2.4 kernel. In testing the 2.4 kernel, Apache and WebSphere Application Server were the only two servers used as part of integration-testing scenarios. The Web Performance Tool (WPT) was the major Web test tool used. The Web serving tests were executed when there was a major change in the kernel or as requested for software verification on a random basis.
In the 2.5/2.6 kernel testing, we developed much more solid and complete test plans (see Resources for links to the 2.5 test plan and execution plans on SourceForge). The test scope, test methods, and test timeline are well defined in the plans. Web servers and application servers are widely used as test workloads in integration tests, focus tests, and user simulation tests.
In addition to using more servers, we used several different Web client test tools, including WPT, Hammerhead, Httperf, and Pagepoker, to simulate different types of user environments. All of the server and client tools were run for different durations (24 hours and 96 hours) against the latest available kernel on a regular basis.
Moreover, the test hardware was not limited to an Intel-based single-processor system. The tests were done on 1-way, 4-way, and 8-way IBM eServer™ xSeries® machines, as well as on a 64-bit IBM PowerPC® system. Kernel-related defects were opened in the Linux kernel bug tracking system.
Key enhancements in the 2.6 kernel
Web serving plays a major role in the enterprise world. Significant improvements and changes have been made on the 2.6 kernel to favor enterprise applications. New hardware support, software support, and internal kernel improvements give the 2.6 kernel better scalability and stability. The 2.6 kernel performs much better than the 2.4 kernel under heavy load across a number of CPUs and a large amount of memory. Some of the key features in 2.6 that will benefit enterprise applications include:
New hardware support
Linux supports a wide range of hardware platforms. The 2.6 kernel supports new architectures, such as the 64-bit PowerPC, the 64-bit AMD Opteron, and embedded processors.
Hyper-threading
Hyper-threading, an innovation from Intel, is a major hardware enhancement supported by the 2.6 kernel. Basically, hyper-threading can create multiple virtual processors based on a single physical processor using simultaneous multi-threading technology (SMT); multiple application threads can be run simultaneously on one processor. To take full advantage of it, applications need to be multithreaded.
Hyper-threading offers many benefits to Web servers and application servers. It can increase the number of transactions that can be processed, provide faster server response time, and enable servers to handle larger workloads and more user requests. Currently, Intel Pentium 4 Xeon processors have hyper-threading hardware built-in.
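To make the idea of virtual processors concrete, here is a minimal sketch that compares the number of logical processors with the number of physical cores described in /proc/cpuinfo-style text. It is an illustration only and was not part of the LTC tests. The field names (processor, physical id, core id) are those reported by current x86 Linux kernels and are assumed to be present; the sample text is made up to resemble a single hyper-threaded processor.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical processor."""
    cpus = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields:
            cpus.append(fields)
    return cpus


def count_processors(text):
    """Return (logical_processors, physical_cores)."""
    cpus = parse_cpuinfo(text)
    # Logical processors that share a package and a core are SMT siblings.
    cores = {(cpu.get("physical id"), cpu.get("core id")) for cpu in cpus}
    return len(cpus), len(cores)


# Hypothetical sample: one physical core exposing two logical processors.
SAMPLE = (
    "processor : 0\n"
    "physical id : 0\n"
    "core id : 0\n"
    "\n"
    "processor : 1\n"
    "physical id : 0\n"
    "core id : 0\n"
)

logical, physical = count_processors(SAMPLE)
smt_active = logical > physical
```

On a Linux machine, the same function can be applied to the contents of /proc/cpuinfo; more logical processors than physical cores suggests that SMT is enabled.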
NUMA (Non-Uniform Memory Access)
NUMA is another major feature that has been added in the Linux 2.6 kernel to improve system performance. In the traditional model for multiprocessor support (symmetric multiprocessing, or SMP), each processor has equal access to memory and I/O. As processors are added, contention for the shared processor bus becomes a performance bottleneck. The NUMA architecture can increase processor speed without increasing the load on the processor bus. In NUMA systems, each processor is close to some parts of memory and farther from others. Processors are arranged in smaller regions called nodes. Each node has its own processors and memory; the nodes can talk to each other. It is quicker for processors to gain access to memory in the local node than in a remote node. Minimizing inter-node communication can improve system performance.
To support NUMA hardware, the Linux kernel adopts a series of enhancements in several areas, including the scheduler, multi-path I/O, a user-level API to let a user understand the allocation of processor and memory resources to be used, and internal kernel APIs to let the kernel subsystems understand NUMA topology. NEC Azusa, IBM x440, and IBM NUMA-Q are examples of NUMA machines.
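The node concept can be sketched in a few lines. The example below is an illustration under stated assumptions, not kernel code: it takes per-node CPU lists in the range format that current Linux kernels expose under /sys/devices/system/node/nodeN/cpulist (for example "0-3,8-11"), builds a CPU-to-node map, and checks whether a memory access would be local. The two-node topology is hypothetical.

```python
def parse_cpulist(spec):
    """Expand a CPU list such as "0-3,8-11" into a list of CPU numbers."""
    cpus = []
    for part in spec.strip().split(","):
        if not part:
            continue
        first, sep, last = part.partition("-")
        if sep:
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(first))
    return cpus


def cpu_to_node(node_cpulists):
    """Map every CPU number to the node that owns it."""
    mapping = {}
    for node, spec in node_cpulists.items():
        for cpu in parse_cpulist(spec):
            mapping[cpu] = node
    return mapping


def is_local(cpu, memory_node, mapping):
    """True if memory on memory_node is in the same node as cpu."""
    return mapping[cpu] == memory_node


# Hypothetical 8-way machine split into two 4-CPU nodes.
TOPOLOGY = {0: "0-3", 1: "4-7"}
NODE_OF = cpu_to_node(TOPOLOGY)
```

A NUMA-aware scheduler or allocator tries to keep is_local true as often as possible, which is the "minimize inter-node communication" goal described above.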
Expanded device support
The 2.6 kernel supports more types of devices. It also raises the limit on major device numbers from 255 to 4095 and allows more than one million subdevices per type. This should give high-end enterprise systems sufficient support.
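These limits follow from how a device number is split into bits. The sketch below assumes the 2.6 layout of a 32-bit device number with 12 bits for the major number and 20 bits for the minor (subdevice) number, packed in the simple major-above-minor order; the function names are hypothetical and the encoding seen by user-space programs may differ.

```python
MAJOR_BITS = 12
MINOR_BITS = 20

MAX_MAJOR = (1 << MAJOR_BITS) - 1   # 4095
MINORS_PER_MAJOR = 1 << MINOR_BITS  # 1,048,576 subdevices per major number


def make_dev(major, minor):
    """Pack a (major, minor) pair into a single device number."""
    if not 0 <= major <= MAX_MAJOR:
        raise ValueError("major number out of range")
    if not 0 <= minor < MINORS_PER_MAJOR:
        raise ValueError("minor number out of range")
    return (major << MINOR_BITS) | minor


def split_dev(dev):
    """Unpack a device number into its (major, minor) pair."""
    return dev >> MINOR_BITS, dev & (MINORS_PER_MAJOR - 1)
```

With 12 major bits the highest major number is 4095, and 20 minor bits give 1,048,576 subdevices, which matches the "more than one million" figure.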
Threading improvements
The 2.6 kernel adopts the new thread library, Native POSIX Thread Library (NPTL). This new library is based on a 1:1 model (one kernel thread per user thread) and is fully POSIX compliant. A test done by Red Hat indicates that on an old IA-32 dual 450MHz PII Xeon system, 100,000 threads could be created and destroyed in 2.3 seconds (with up to 50 threads running at any one time) using NPTL.
NPTL gives the kernel a major performance boost for multi-threaded applications in an SMP environment. It is especially valuable for heavily multi-threaded enterprise-level applications, such as Java® applications, as well as Web server and application server applications.
Another threading improvement in the 2.6 kernel is that the number of PIDs that can be allocated has increased from 32,000 to 1 billion. This change improves application startup performance on heavily loaded systems. The 2.4 kernel sometimes struggles when applications request large numbers of PIDs, because of its low PID limit.
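The shape of a thread creation and destruction test like the one cited above can be sketched as follows. This is not the Red Hat test itself: it is a small Python stand-in with a hypothetical function name, and interpreter overhead means its timings are not comparable to a native C program using NPTL. It creates a given number of short-lived threads while allowing roughly 50 to be in flight at any one time.

```python
import threading
import time


def churn_threads(total, max_running=50):
    """Create and destroy `total` threads, about `max_running` at a time.

    Returns (threads_completed, elapsed_seconds).
    """
    slots = threading.BoundedSemaphore(max_running)
    lock = threading.Lock()
    completed = 0

    def worker():
        nonlocal completed
        try:
            with lock:
                completed += 1
        finally:
            # Free the slot so the main thread can start another worker.
            slots.release()

    start = time.perf_counter()
    threads = []
    for _ in range(total):
        slots.acquire()
        thread = threading.Thread(target=worker)
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return completed, time.perf_counter() - start
```

Calling churn_threads(100000) mirrors the thread count in the cited test; a smaller value is enough to see how the time scales on a given system.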
O(1) scheduler
The O(1) scheduler was accepted into the official Linux 2.5 kernel tree in 2002. The O(1) scheduler increases Linux scalability and overall performance by improving throughput with large numbers of processes, especially on large SMP. O(1) scales well with a large number of tasks and CPUs and has strong affinity, to avoid tasks bouncing between the CPUs. The O(1) scheduler also allows for load-balancing across CPUs and NUMA-aware load-balancing.
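The name O(1) refers to the fact that choosing the next task takes constant time regardless of how many tasks are runnable. The following is a simplified sketch of the idea, not kernel code: one queue per priority level, a bitmap that records which queues are non-empty, and a pair of active and expired arrays that are swapped when the active array is drained. The class and method names are hypothetical; the real scheduler keeps one such run queue per CPU.

```python
from collections import deque

NUM_PRIORITIES = 140  # lower number means higher priority


class PriorityArray:
    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]
        self.bitmap = 0  # bit p is set when queues[p] is non-empty
        self.count = 0

    def enqueue(self, task, prio):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio
        self.count += 1

    def dequeue_highest(self):
        """Remove and return (task, prio) for the best runnable task."""
        if not self.bitmap:
            return None
        # Index of the lowest set bit: a fixed-size search, independent
        # of the number of queued tasks.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        queue = self.queues[prio]
        task = queue.popleft()
        if not queue:
            self.bitmap &= ~(1 << prio)
        self.count -= 1
        return task, prio


class RunQueue:
    def __init__(self):
        self.active = PriorityArray()
        self.expired = PriorityArray()

    def add(self, task, prio):
        self.active.enqueue(task, prio)

    def expire(self, task, prio):
        """Park a task whose time slice is used up until the next swap."""
        self.expired.enqueue(task, prio)

    def pick_next(self):
        if self.active.count == 0:
            # Swapping two references is also a constant-time operation.
            self.active, self.expired = self.expired, self.active
        return self.active.dequeue_highest()
```

Because every step is a bitmap lookup, a queue operation, or a reference swap, the cost of pick_next does not grow with the number of tasks, which is what lets the scheduler scale to heavily loaded systems.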
I/O improvements
Block I/O Layer
The Block I/O Layer in the 2.6 kernel has been rewritten to improve kernel scalability and performance. The global I/O request lock in 2.4 has been removed. The new block I/O structure (bio) in the 2.6 kernel allows I/O requests larger than PAGE_SIZE. Most of the problems caused by the use of the buffer head and kiobuf in 2.4 are addressed in the new layer. The I/O scheduler was completely rewritten. Major improvements have also been made to SCSI support.
Asynchronous I/O
Asynchronous I/O is new in the 2.6 kernel. It provides ways for enterprise applications such as Web servers and databases to scale up without resorting to complex internal pooling mechanisms for network connections.
Other improvements
In addition to these enhancements, there are some other remarkable changes and new features worth mentioning. For example, the 2.6 kernel provides support for several new file systems, including JFS, XFS, NFS v4, and the Andrew File System (AFS). New networking protocols and features such as Stream Control Transmission Protocol (SCTP), Internet Protocol Security (IPSec), improved IPv6 support, and IP Payload Compression (IPComp) provide Linux 2.6 kernel users better network security and transmission quality.
Not all of the enhancements provided by the 2.6 kernel will apply to each enterprise application. Some of them do have specific hardware and software requirements. However, most of the enhancements listed here are general kernel improvements that will help Linux break the enterprise barrier.
Test infrastructure
In this section, I will discuss how the Web serving tests were done, including the hardware environment, selected Web servers/application servers and Web test tools, and the testing strategy with typical test scenarios. The following discussion is based on the 2.6 kernel.
Web serving servers
There were four Web serving servers used in Linux 2.6 kernel testing. Two were Web servers (Apache and Jakarta-Tomcat), and the other two were application servers (WebSphere Application Server and Jboss).
Apache is the market leader of Web servers. The Netcraft Web Server Survey found that more than 64% of the Web sites on the Internet are using Apache. It is an open source project.
Jakarta-Tomcat is an open source servlet container with a JSP environment available under the Apache license. Jakarta-Tomcat has a built-in Web server and can also be used with other Web servers in a production environment.
The WebSphere Application Server is an enterprise-level application server for dynamic e-business applications. The J2EE technology and Web services are the foundation of the server. The IBM WebSphere Application Server provides high performance and an extremely scalable transaction engine across most of the operating systems. More and more WebSphere applications are being migrated from a traditional UNIX operating system to Linux for lower costs with similar performance.
The Jboss Application Server is also an open source application server with a full J2EE personality. Started as an open source EJB container, Jboss is now targeted to become an enterprise-ready application server.
Web test tools
Quite a few Web test tools and benchmarks are available online. The four open source tools we mainly used to simulate Web-client stress in our 2.6 kernel test environment were WPT, Hammerhead, Httperf, and Pagepoker (see Resources for links to more information on these).
In addition to the previous tools discussed for Web serving testing, IBM has a tool called Trade3, which is the WebSphere end-to-end benchmark and performance sample application. The Trade3 benchmark models an online stock brokerage application and provides a real-world workload driving WebSphere performance components and features.
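To give a feel for what a Web-client stress tool measures, here is a minimal sketch of a load client written with the Python standard library. It is not WPT or any of the tools named above, and all function names are hypothetical. It issues a fixed number of requests from several concurrent clients and reports the same kinds of figures that appear in the results later in this article: pages served, pages per second, mean processing time, and unsuccessful connections. The demo function starts a throwaway local server so the sketch can run without an external Web server.

```python
import http.client
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Ignore proxy settings so requests go straight to the target server.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, timeout=5.0):
    """Request one page; return (succeeded, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            response.read()
            ok = response.status == 200
    except (OSError, http.client.HTTPException):
        ok = False
    return ok, time.perf_counter() - start


def run_load(url, total_requests, clients):
    """Issue total_requests GETs using `clients` concurrent workers."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        results = list(pool.map(fetch, [url] * total_requests))
    wall = time.perf_counter() - start
    times = [elapsed for ok, elapsed in results if ok]
    served = len(times)
    return {
        "pages_served": served,
        "pages_per_second": served / wall if wall > 0 else 0.0,
        "mean_time_ms": 1000 * sum(times) / served if served else 0.0,
        "unsuccessful": total_requests - served,
    }


class _DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><body>ok</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def demo(total_requests=50, clients=5):
    """Run the load client against a temporary server on localhost."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), _DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        return run_load(url, total_requests, clients)
    finally:
        server.shutdown()
        server.server_close()
```

Real tools add features this sketch lacks, such as configurable request mixes, think time, and runs that last for hours or days.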
Test strategy
The Web serving tests attempted to create user scenarios that viewed the system as a whole. The test duration started with 24 hours for the first run. The second run increased the time to 96 hours, with the third and fourth runs lasting seven days and 14 days, respectively. All the scenarios based on different combinations of server and client tools were executed on up to 8-way IBM xSeries and pSeries® servers. System utility monitoring tools were used to record the kernel stress level.
Figure 1 shows how several different test tools went to different Web servers or application servers. The different test tools tried to simulate different types of user environments.
Figure 1. Test environment
Figure 2 shows the stress test environment using the IBM WebSphere product and benchmark tool, Trade3, which simulates an online stock brokerage environment.
Test results summary
The following sections present a snapshot of the Web serving testing with typical scenarios we used on the 2.4 and 2.6 kernels. A typical Apache/WPT test on an 8-way SMP IBM xSeries system demonstrates the dramatically improved performance of the 2.6 kernel, with no loss of service quality.
Test environment
Observations
Table 1. Results comparison
Kernel | Average CPU utilization | Average memory utilization | Average swap utilization | Total Web pages served | Pages served per second | Mean processing time (ms) | Unsuccessful connections
2.4.18-smp | 100% (user: 7.38%, system: 92.62%) | 6.41% | 0% | 8,845,147 | 102.37 | 294.44 | 0
2.6.0-test5 | 99.42% (user: 39.35%, system: 60.07%) | 35.96% | 0% | 53,827,939 | 623.00 | 57.71 | 0
Figure 3. Web pages served vs. time
Conclusion
We've shown that, using a typical test scenario (Apache/WPT on an 8-way SMP IBM xSeries system), the Apache server has better scalability and performance on the 2.6 kernel than on the 2.4 kernel. On the same system under the same workload, the Apache server with the 2.6.0-test5 kernel used system resources more effectively and served about six times as many Web pages as it did with the 2.4.18 kernel. This real data demonstrates that a variety of features and changes have helped the 2.6 kernel offer better scalability and performance and become more mature for enterprise-level applications.
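The "six times" figure can be checked directly from the numbers in Table 1. The short calculation below uses only values from the table; the implied test duration is derived by dividing total pages by pages per second and is not stated in the table itself.

```python
# Values copied from Table 1.
RESULTS = {
    "2.4.18-smp": {"pages": 8_845_147, "pages_per_second": 102.37, "mean_ms": 294.44},
    "2.6.0-test5": {"pages": 53_827_939, "pages_per_second": 623.00, "mean_ms": 57.71},
}


def ratio(metric):
    """How many times larger the 2.6 value is than the 2.4 value."""
    return RESULTS["2.6.0-test5"][metric] / RESULTS["2.4.18-smp"][metric]


def implied_hours(kernel):
    """Run length implied by total pages and the pages-per-second rate."""
    row = RESULTS[kernel]
    return row["pages"] / row["pages_per_second"] / 3600


pages_ratio = ratio("pages")                  # about 6.09
throughput_ratio = ratio("pages_per_second")  # about 6.09
speedup_mean_time = 1 / ratio("mean_ms")      # about 5.1 times faster per page
```

Both runs work out to roughly 24 hours, which is consistent with the 24-hour first run described in the test strategy, and the page-count and throughput ratios agree with each other at just over six.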
Resources
About the author
Li Ge is a Staff Software Engineer in the IBM Linux Technology Center. She graduated from New Mexico State University with an MS in Computer Science in 2001. She has been working on Linux for three years and is currently working on Linux kernel validation and Linux reliability measurement.
Copyright 2004