Archive for the ‘Performance’ Category

Presentation for October Rules Fest

Thursday, September 18th, 2008

orf-logo.png

I am busy working on my presentation for October Rules Fest this week — assembling the slides and double checking details with UBS. The abstract is below and the presentation is shaping up very nicely. Hope to see you there!

Distributed Data Processing with ILOG JRules

Abstract

UBS Bank operates in over 50 countries and employs more than 80,000 people. Learn how UBS uses ILOG JRules and a distributed grid architecture to generate internal and regulatory reports for all the regions around the world off a single dataset. In order to achieve this, UBS is required to process 2 billion records every night, with over 30 million records passing through the rule engine. Performance objectives are in place to ensure the bank meets its regulatory and financial reporting requirements before the trading day starts.

JRules in the Real-World

Friday, September 12th, 2008

800px-Motherhood_and_apple_pie.jpg

I just got back from an on-site visit with one of our customers in the USA, a major insurance company. The customer is using JRules 6.6.x for commercial property insurance underwriting. The trip was very interesting as we were able to get hands-on with their large rule project for 3 days — approximately 40,000 rules. We gained some fascinating insights into their development challenges and in exchange we performed an audit to ensure they were getting the most out of JRules. I came back with a couple of bugs to fix and a headful of ideas for enhancements.

One of their major challenges is that they simultaneously work on 3 versions of their platform — a 3 month major release (2.0 say), a 1 month minor release (1.1) and a weekly “patch” release (1.0.5). The patch release is edited in Rule Team Server by business users while the other two releases are edited in Rule Studio by Java developers with JRules training. Due to this rolling release schedule they version the rules in a source code control system (using branches) and have to perform regular merges of the changes coming from RTS into RS.

The RTS repository does not support branches. This was a conscious limitation when we designed RTS as we believed that branching and merging would be problematic for a business user. The development process we therefore recommended is to treat RTS as a satellite user of the source code control system.

  1. For each branch in the source code control system an RTS instance is created, which could be just a table space in a database rather than a new application server install.
  2. The business users collaboratively edit the rules within RTS. When they are ready they request that the changes be moved into the system of record — the source code control system (SCCS).
  3. An RS user updates their workspace from SCCS and then synchronizes with the appropriate RTS instance. The RS user resolves any conflicting changes and commits everything to SCCS.
  4. The RS user then uses the SCCS merge tools to merge the contents of the code branch with the next branch. For example, version 1.0.5 might be merged into the branch for version 1.1.
  5. At a later checkpoint the 1.1 branch will be merged with the branch for version 2.0.

As you can see one of the RS users essentially “proxies” the changes coming from RTS into her local workspace.

Due to the textual nature of BAL rules (if-then-else) merging them is relatively straightforward. Decision Tables and Trees are more problematic as they are persisted as complex XML documents. We spent some time discussing the APIs that we provide that might enable a graphical merge capability for Decision Tables.

We also spent quite a bit of time optimizing the RS build time for these large projects. Here is a checklist:

  • Check RS JVM heap size. For large projects allocate as much heap as you can, typically around 1.2 GB on 32-bit Windows.
  • Use the latest patch level of the JVM and JRules. Both are constantly evolving and usually getting faster!
  • Switch off as many semantic checks as you can to save time during the build. The settings can be found on the Build preferences page. The checks are useful of course but some may be time-consuming, particularly static rule analysis.

    rs-build-prefs.png

  • If you are using Decision Tables switch off the semantic checks for each table. For large decision tables this can result in significant time savings during builds. You may want to leave the checks enabled while you edit the table and switch them off just before committing your changes.

    rs-dt-prefs.png

  • Keep your vocabulary “lean and mean”, as a bloated vocabulary makes the rules slower to parse. Only verbalize BOM elements that you plan on using in your rules.

I hope these tips will help as your projects get larger. Rest assured that we take real-world feedback sessions such as this one very seriously and we are constantly trying to stay one step ahead in the race for better performance.

When Does Complex Event Processing (CEP) Complement a BRMS?

Tuesday, July 22nd, 2008

It’s my pleasure to introduce Pierre-Henri Clouin, an ILOG colleague, for this post on Complex Event Processing (CEP). CEP has emerged as one of the “hot topics” in the rules space over the past 12 months. Pierre-Henri is based in Sunnyvale, California and has spent several months looking at the technical capabilities of the CEP products as well as how they are positioned. His first post provides a nice introduction to the subject matter.

Perso_pix_web_ILOG.JPG

Pierre-Henri Clouin, ILOG

As interest grows in CEP, we have started receiving inquiries about how CEP and BRMS compete with or complement each other. After discussing with customers, prospects, and vendors, and reviewing a wide range of use cases, a few patterns have emerged.

CEP shines when:

  • event data rates are very high, typically hundreds of thousands of events per second, often across multiple event streams;
  • required latency is low, typically in the millisecond range;
  • the data model is flat, with simple data types;
  • a few stable rules/statements/queries (a few dozen at most) are deployed for filtering, joins or aggregate computation.

These core capabilities are well documented. For additional details, Mark Tsimelzon’s CEP Complexity Scorecard summarizes them very effectively.

On the other hand, a BRMS addresses three critical needs:

  1. rich rulesets, typically ranging from a few dozen to tens of thousands of business rules;
  2. a complete lifecycle management environment for business rules, empowering technical and business users to author, manage, simulate and retire business rules;
  3. extreme agility with the ability to update business rules in as little as a few minutes.

ILOG BRMS does not compromise on performance either, as benchmarks and actual deployments with demanding customers, such as some of the largest websites, payment networks, underwriters, and telecom operators, have shown.

brms+cep.png

The map above sums it up: a CEP engine complements a BRMS for use cases with large data rates, low latency, and rich decision automation and management. The CEP engine pares down the volume of events and only passes interesting events on to the BRMS to perform a rich decision process. Examples abound, notably in fraud management and national security.
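To make the division of labour concrete, here is a minimal sketch in plain Java (Java 8 or later) of the "filter first, decide later" pattern. It is not a CEP engine and uses no ILOG API: `PaymentEvent`, `EventPreFilter` and the amount threshold are hypothetical names and values invented for illustration. A cheap predicate runs on every event, and only the events that pass are handed to a downstream consumer standing in for the BRMS decision service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical event type; the fields are illustrative only. */
final class PaymentEvent {
    final String account;
    final double amount;

    PaymentEvent(String account, double amount) {
        this.account = account;
        this.amount = amount;
    }
}

/** Stand-in for the CEP stage: test every event cheaply, forward the few that pass. */
final class EventPreFilter {
    private final Predicate<PaymentEvent> interesting;
    private final Consumer<PaymentEvent> decisionService; // stand-in for the BRMS call
    private long seen;
    private long forwarded;

    EventPreFilter(Predicate<PaymentEvent> interesting, Consumer<PaymentEvent> decisionService) {
        this.interesting = interesting;
        this.decisionService = decisionService;
    }

    void onEvent(PaymentEvent event) {
        seen++;
        if (interesting.test(event)) {
            forwarded++;
            decisionService.accept(event);
        }
    }

    long seen() { return seen; }

    long forwarded() { return forwarded; }

    public static void main(String[] args) {
        List<PaymentEvent> escalated = new ArrayList<>();
        // Assumed filter: only large payments are worth a full decision process.
        EventPreFilter filter = new EventPreFilter(e -> e.amount >= 10_000.0, escalated::add);
        for (int i = 0; i < 1_000; i++) {
            filter.onEvent(new PaymentEvent("acct-" + (i % 10), i * 20.0));
        }
        System.out.println(filter.forwarded() + " of " + filter.seen()
                + " events reached the decision service");
    }
}
```

A real CEP engine would add windowing, joins across streams and aggregate computation, but the contract with the BRMS is the same: a small, interesting subset of the event stream.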

Conversely, CEP overlaps with BRMS at the low end of data rates, latency requirements and rulesets. This is the area where we’ve seen some confusing accounts and claims and where a CEP engine provides limited value on top of a BRMS.

In upcoming posts, we will continue to explore and discuss best practices surrounding BRMS and CEP. We encourage you to reach out to us with related experience and questions.

JRules Rule Execution Server Memory Usage

Tuesday, July 8th, 2008

Albin sent me some more benchmark results, an update to the results I published a couple of weeks ago. The initial results were intriguing because of the low heap usage on IBM z/OS, while we only ran tests with the Rule Execution Server using the J2SE and POJO rule sessions. For the second benchmark run Albin therefore tested WebSphere Application Server running on Windows, to provide a point of comparison with the same application server running on z/OS. He also expanded the benchmark to test ruleset execution using IlrContext, providing a measure of the memory overhead of the Rule Execution Server.

JVM Heap Size by Number of Rules (Windows)

The chart below shows the memory requirements for ruleset execution based on 4 scenarios:

  1. Command line application using IlrContext rule engine API
  2. Command line application using Rule Execution Server with J2SE rule session provider
  3. Servlet deployed to IBM WebSphere Application Server using RES with J2SE rule session
  4. Servlet deployed to IBM WebSphere Application Server using RES with POJO rule session

win-heap.png

JVM Heap Size by Number of Rules (z/OS)

The four scenarios described above were also run on z/OS.

zos-heap.png

Comparing Windows vs z/OS Heap Usage

win-zos-main-heap.png win-zos-servlet-heap.png

Conclusions

  1. JVM heap usage reported on z/OS is consistently less than on Windows. For example, using a Servlet deployed to WAS with Rule Execution Server with POJO provider, the memory usage was 17%-21% less than the same configuration running on Windows.
  2. The Rule Execution Server has a heap memory overhead compared to IlrContext of between 3% and 15% depending on configuration and the size of the ruleset.
  3. There is very little difference in heap usage between the Rule Execution Server with J2SE provider and with POJO rule session provider on either Windows or z/OS.

The product overall showed predictable memory usage patterns on both Windows and z/OS with modest heap memory requirements. The highest heap memory reported was 32 MB for a Servlet deployed to WebSphere Application Server using the Rule Execution Server J2SE provider, executing a ruleset containing 528 rules.

Memory Requirements for JRules on IBM zSeries

Thursday, June 26th, 2008

This week Albin Carpentier and I helped to run JRules memory consumption benchmarks on IBM zSeries. Within R&D we have 24×7 remote access to a machine we rent from IBM in Dallas, Texas. A large bank in Europe is evaluating deploying JRules on the mainframe and Albin was able to quickly deploy the Rule Execution Server to WebSphere and DB2 running on z/OS so we could gather memory consumption statistics to help inform their decision.

The configuration tested was: z/OS 1.9, DB2 8.1, WebSphere Application Server 6.1.0.16 and IBM JDK 5.

We used a proof of concept ruleset containing representative rules that were cloned to create rulesets with 33, 66, 132, 264 and 528 rules. Memory consumption was measured using the JDK 5 MemoryMXBean and MemoryPoolMXBean. We ran three scenarios:

  1. Invoking the engine within a JVM (no WebSphere) using IlrContext,
  2. Rule Execution Server on WAS using J2SE provider,
  3. Rule Execution Server on WAS using POJO provider.
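For readers who want to take similar measurements, the following is a minimal sketch of reading heap figures through the standard `java.lang.management` beans. It is not the harness we used: the class name `HeapProbe` is invented for this example, and the byte array allocation is only a stand-in for loading and executing a ruleset.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

final class HeapProbe {

    /** Requests a GC, then returns the heap bytes currently in use. */
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // a request only; the JVM decides how thoroughly to collect
        return memory.getHeapMemoryUsage().getUsed();
    }

    /**
     * Sums the peak usage of every heap pool. Pools can peak at different
     * moments, so treat the result as an upper bound on the true peak.
     */
    static long peakHeapPoolBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                MemoryUsage peak = pool.getPeakUsage();
                if (peak != null) {
                    total += peak.getUsed();
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        byte[][] payload = new byte[16][1024 * 1024]; // stand-in for the workload under test
        long after = usedHeapBytes();
        System.out.printf("heap delta: %d KB, peak across heap pools: %d KB%n",
                (after - before) / 1024, peakHeapPoolBytes() / 1024);
        if (payload.length == 0) { // keeps the payload reachable until after the measurement
            throw new IllegalStateException();
        }
    }
}
```

Heap readings are noisy, so repeat each measurement several times and compare like with like (same JVM, same GC settings) before drawing conclusions.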

The product performed very well and we saw linear scalability in all three scenarios. The highest heap memory usage we saw was a very modest 35 MB on WebSphere with the Rule Execution Server.

res-zseries-memory.png

The Good, The Bad and The Ugly - Rule Engine Benchmarks

Thursday, June 19th, 2008

Goodbaduglydvd

Last night I watched “Unforgiven” so I thought I’d continue the spaghetti Western theme today, looking at the challenge of developing, running and interpreting benchmarks for Business Rule Management Systems. I recently read the excellent article “Behind the benchmarks: SPEC, GFLOPS, MIPS et al” by Jon “Hannibal” Stokes. Although Jon discusses CPU benchmarking, many of his points are relevant to rule engine benchmarking as well. For example, here is how he describes the various techniques used to measure CPU performance:

  • Real-world Applications: one of the most popular, and best, ways to benchmark a system is by running a real program on it and see how long it takes the program to complete a task.
  • Kernels: a small, CPU-intensive portion of a real program that’s intended to be run on its own or as part of a benchmarking suite.
  • Toy Applications: small programs like Quicksort or the Sieve of Eratosthenes that people cook up to compile and run on any computer.
  • Synthetic Benchmarks: try to determine the average instruction mix of a typical program so that it can replicate that mix and run those types of instructions on the CPU.

The current “Academic Benchmarks” for BRMS fall somewhere within the definition of a “Kernel” or a “Toy Application”, probably closer to the latter. Although they can be useful for implementors of rule engines, I’d argue they are of no use to evaluators of rule engine technology and may be a significant distraction. ILOG has been guilty of publishing academic benchmark numbers in the past of course, so I am not putting us on a pedestal!

Academic Benchmarks

The issues with the academic benchmarks for BRMS are fairly well documented, and there seems to be a general consensus that they should not be used for reliable product performance comparison. However, they keep cropping up! Even in Ruby!

Here are some of the problems:

  • Very small number of rules (fewer than 50)
  • Test worst-case RETE inference and agenda usage, which is rarely encountered with well designed applications
  • Implement solutions to combinatorial search problems, typically better implemented using a CP Engine rather than a RETE engine
  • The solutions are not verified, meaning that correctness may be inadvertently (?) sacrificed for speed. This is particularly the case for Waltz and WaltzDB for which I’ve never seen any verification of the results.
  • The benchmarks are easy to “game” because every vendor ports from the original OPS5 source code into their own rule language, where they take advantage of their product’s features

For reference, I have included the descriptions for the three most famous academic benchmarks below (paraphrased from the paper “Effects of Database Size on Rule System Performance: Five Case Studies” published by Daniel Miranker et al). How representative do they sound of your application!?

Manners

Manners is based on a depth-first search solution to the problem of finding an acceptable seating arrangement for guests at a dinner party. The seating protocol ensures that each guest is seated next to someone of the opposite sex who shares at least one hobby.
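Since unverified results are one of my complaints about these benchmarks, here is a minimal sketch of how a Manners seating could be checked independently of any rule engine. It assumes the guests are seated in a single row and that the constraint applies to each pair of adjacent seats; `Guest` and `SeatingVerifier` are names invented for this example and do not come from the original benchmark sources.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical guest record for the check below. */
final class Guest {
    final String name;
    final char sex; // 'm' or 'f'
    final Set<String> hobbies;

    Guest(String name, char sex, String... hobbies) {
        this.name = name;
        this.sex = sex;
        this.hobbies = new HashSet<>(Arrays.asList(hobbies));
    }
}

final class SeatingVerifier {

    /** Two neighbours are compatible if they differ in sex and share a hobby. */
    static boolean compatible(Guest a, Guest b) {
        return a.sex != b.sex && !Collections.disjoint(a.hobbies, b.hobbies);
    }

    /** Checks every adjacent pair in the row (assumption: a row, not a round table). */
    static boolean isAcceptable(List<Guest> row) {
        for (int i = 0; i + 1 < row.size(); i++) {
            if (!compatible(row.get(i), row.get(i + 1))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Guest> row = Arrays.asList(
                new Guest("Ann", 'f', "chess", "golf"),
                new Guest("Bob", 'm', "golf"),
                new Guest("Cat", 'f', "golf", "sailing"));
        System.out.println("acceptable seating: " + isAcceptable(row));
    }
}
```

A complete verifier would also confirm that every guest is seated exactly once, but even a check this small would catch an engine that trades correctness for speed.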

Waltz and Waltzdb

Waltz was developed at Columbia University. It is an expert system designed to aid in the 3-dimensional interpretation of a 2-dimensional line drawing. It does so by labeling all lines in the scene based on constraint propagation. Only scenes containing junctions composed of two and three lines are permitted. The knowledge that Waltz uses is embedded in the rules. The constraint propagation consists of 17 rules that irrevocably assign labels to lines based on the labels that already exist. Additionally, there are 4 rules that establish the initial labels.

Waltzdb was developed at the University of Texas at Austin. It is a more general version of the Waltz program. Waltzdb is designed so that it can be easily adapted to support junctions of 4, 5, and 6 lines. The method used in solving the labeling problem is a version of the algorithm described by Winston [Winston, 1984]. The key difference between the problem solving technique used in Waltz and Waltzdb is that Waltzdb uses a database of legal line labels that are applied to the junctions in a constrained manner. In Waltz the constraints are enforced by constant tests within the rules. The input data for Waltz is a set of lines defined by Cartesian coordinate pairs.

Conclusions

If performance is one of your major buying criteria then I would strongly encourage you to build a proof-of-concept set of rules and data and verify rule engine performance in your own environment. It is impossible to meaningfully extrapolate from published academic benchmark results to your running application, with your rules and data, deployed to your OS and hardware. In addition this will also allow you to evaluate the BRMS from as many angles as possible, spanning ease-of-use, business user accessibility, support and professional services, performance, deployment platforms, scalability etc.

AMD CodeSleuth

Tuesday, March 18th, 2008

I saw a cool demo of a new (Open Source) profiler from AMD this morning. The profiler uses the hardware performance counters in the CPU (it supports both AMD and Intel chips) and profiles at the machine instruction set level. The CodeSleuth Eclipse plug-in can relate the machine instructions back to Java source code statements. If you want a deep view into the instructions generated by the JVM’s JIT, and how they relate to your Java application, this could be very useful. They also claim that the overhead is very low because the profiler does not need to perform bytecode injection to implement counters, for example, and instead uses counters built into the CPU cores.

Profiling Tips and Tricks

Friday, January 11th, 2008

Performance-Trick

For many people business rules is a new technology, and they don’t yet have an innate sense of the performance profile of an application that uses a rule engine. For example, is invoking a set of 1,000 rules faster or slower than a database call? Of course the answer is “it depends”, so in this post I’d like to outline some best practices when dealing with performance issues in business rules applications.

There is a lot of great material out there on performance testing, tuning, continuous performance and software performance engineering. I won’t rehash it here, but if you need general guidance on when to test or monitor performance, the tools available, as well as performance modeling I recommend the following links as a jumpstart:

For applications with stringent performance targets or Service Level Agreements I believe that you should seriously consider continuous performance monitoring. By investing a small daily amount in performance monitoring you can avoid expensive refactoring due to performance issues found at the end of your development cycle. This is particularly important for business rules applications where the relationship between the rules being authored and the code being invoked at runtime is not as obvious as with a traditional single-programming-language procedural software system.

First, let’s take a look at the basic sequence of actions required to invoke a ruleset using JRules:

  1. Ruleset archive is parsed, creating an IlrRuleset
  2. Instantiation of an IlrContext for the IlrRuleset
  3. Creation of input data (IN, INOUT parameters) for the ruleset
  4. Optionally add objects to Working Memory
  5. Execute the ruleset
  6. Retrieve INOUT and OUT parameters
  7. Optionally retrieve objects from Working Memory
  8. Reset IlrContext for subsequent execution

Operation (1) is costly, and for a large ruleset containing 10K+ rules it can take several minutes. However, it is a one-time cost, as the resulting IlrRuleset should be pooled for subsequent reuse. Rule Execution Server provides out-of-the-box caching of IlrRuleset and IlrContext.
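If you call the engine directly rather than through Rule Execution Server, the "parse once, reuse many times" idea can be sketched as below. This is a generic illustration in plain Java (Java 8 or later) and deliberately avoids the JRules API: `ParseOnceCache` is a hypothetical name, and the `parser` function stands in for whatever code turns a ruleset archive into an executable ruleset.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Caches the result of an expensive parse, keyed by archive path. */
final class ParseOnceCache<R> {
    private final ConcurrentMap<String, R> parsed = new ConcurrentHashMap<>();
    private final Function<String, R> parser; // stand-in for the real, slow parse step
    private final AtomicInteger parseCount = new AtomicInteger();

    ParseOnceCache(Function<String, R> parser) {
        this.parser = parser;
    }

    /** Parses on first request; later requests for the same path reuse the result. */
    R get(String archivePath) {
        return parsed.computeIfAbsent(archivePath, path -> {
            parseCount.incrementAndGet();
            return parser.apply(path);
        });
    }

    int parseCount() {
        return parseCount.get();
    }

    public static void main(String[] args) {
        // Hypothetical parser: a real one would read and compile the archive.
        ParseOnceCache<String> cache = new ParseOnceCache<>(path -> "parsed:" + path);
        for (int i = 0; i < 3; i++) {
            cache.get("underwriting-rules.jar");
        }
        System.out.println("parses performed: " + cache.parseCount());
    }
}
```

`ConcurrentHashMap.computeIfAbsent` runs the parse at most once per key even when several threads ask for the same ruleset at the same time, which matters when the parse takes minutes.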

Operation (3) is often slow as this is where your code hits various expensive backend systems such as databases to retrieve the data required by the rule engine. Caching within your code may be a useful optimization here.

The time required for operation (5) depends on the details of the rules, ruleflows and rule tasks within your ruleset. The rules typically invoke methods in your Java Executable Object Model (XOM) - so it is very important to determine the amount of time spent within your code versus the time spent within the rule engine or within the rules.
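A profiler is the right tool for this breakdown, but as a rough first cut you can also count calls and accumulate time yourself. The sketch below uses the JDK's dynamic proxies and assumes the XOM object is reached through an interface, which will not be true of every XOM. `XomTimer` and `RateTable` are hypothetical names invented for this example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical XOM interface used only for this illustration. */
interface RateTable {
    double rateFor(String vehicleClass);
}

/** Wraps an object behind a proxy that records call counts and elapsed time per method. */
final class XomTimer implements InvocationHandler {
    private final Object target;
    private final Map<String, AtomicLong> nanos = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> calls = new ConcurrentHashMap<>();

    XomTimer(Object target) {
        this.target = target;
    }

    /** Returns a proxy for the given interface that delegates to the wrapped target. */
    <T> T proxy(Class<T> iface) {
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, this));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        long start = System.nanoTime();
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the exception raised by the wrapped method
        } finally {
            long elapsed = System.nanoTime() - start;
            nanos.computeIfAbsent(method.getName(), k -> new AtomicLong()).addAndGet(elapsed);
            calls.computeIfAbsent(method.getName(), k -> new AtomicLong()).incrementAndGet();
        }
    }

    long calls(String methodName) {
        AtomicLong count = calls.get(methodName);
        return count == null ? 0 : count.get();
    }

    long nanos(String methodName) {
        AtomicLong total = nanos.get(methodName);
        return total == null ? 0 : total.get();
    }

    public static void main(String[] args) {
        RateTable real = vehicleClass -> "SUV".equals(vehicleClass) ? 79.0 : 49.0;
        XomTimer timer = new XomTimer(real);
        RateTable timed = timer.proxy(RateTable.class);
        for (int i = 0; i < 1_000; i++) {
            timed.rateFor(i % 2 == 0 ? "SUV" : "Compact");
        }
        System.out.println("rateFor: " + timer.calls("rateFor") + " calls, "
                + timer.nanos("rateFor") / 1_000 + " microseconds in total");
    }
}
```

The proxy itself adds overhead through reflection, so use the call counts with confidence and the timings only as a relative guide.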

Rule Engine Performance Cost Breakdown

The performance cost for a JRules application may look something like the figure below.

Performance-Breakdown-1

The vast majority of application time is typically spent within the Executable Object Model (XOM) code invoked by the rules. It is therefore critical to get a good understanding of which methods within your XOM code are going to be invoked by rules and roughly how frequently. This may also include synchronization considerations, because if you have 10 rule engines concurrently calling a synchronized method within your XOM your throughput is going to suffer!
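To see the effect of a contended lock for yourself, here is a small sketch that runs 10 threads against a synchronized method and against a lock-free alternative. It is an illustration, not a rigorous benchmark: `QuoteCounter` is a hypothetical XOM class, the 10 threads stand in for 10 concurrent rule engines, and `LongAdder` (Java 8 or later) is just one possible replacement for a synchronized counter.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical XOM class with a contended method and a lock-free alternative. */
final class QuoteCounter {
    private long lockedCount;
    private final LongAdder lockFreeCount = new LongAdder();

    synchronized void recordLocked() { lockedCount++; }

    void recordLockFree() { lockFreeCount.increment(); }

    synchronized long lockedTotal() { return lockedCount; }

    long lockFreeTotal() { return lockFreeCount.sum(); }

    /** Runs the call from several threads and returns the wall-clock time in milliseconds. */
    static long timeMillis(int threads, int callsPerThread, Runnable call) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    call.run();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        QuoteCounter counter = new QuoteCounter();
        long locked = timeMillis(10, 1_000_000, counter::recordLocked);
        long lockFree = timeMillis(10, 1_000_000, counter::recordLockFree);
        System.out.println("synchronized: " + locked + " ms, lock-free: " + lockFree + " ms");
    }
}
```

The absolute numbers depend heavily on your hardware and JVM, so run it on the machine you care about. The point is simply that every engine thread queues behind the same monitor.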

Profiling

If you have an execution performance problem the first thing to do is to execute your application with a Java profiler. There are many good profilers, however my current favorite is JProfiler, as I find it easy to use and it has good platform support (including 64-bit platforms). It also has good Eclipse and hence Rule Studio integration.

To write this post I imported the debug-rules-answer example into Rule Studio using the Import Wizard. I then launched JProfiler and used its Integration Wizard to add JProfiler to the Eclipse instance used by Rule Studio. I then created a new “Java Project for Rules” that invokes the debug-rules-answer Rule Project. I made a small customization of the generated code to invoke the ruleset’s ilrmain function using engine.executeMain(null). Once these steps are complete you can launch JProfiler to profile the Java Project for Rules from within Rule Studio.

The first test is simply to launch the application with the profiler to get a sense for performance. Using the default JProfiler settings should give you something like this:

Profiler-Cpu-All

For a short run the CPU Hot Spots view will be dominated by ruleset parsing. As the ruleset parsing is implemented by JRules and is a one-time cost, it is very useful to be able to ignore ruleset parsing when you are profiling your rules and XOM. The easiest way is to start the profiler with CPU and Allocation tracking disabled on startup and then enable both using a Method Trigger.

Using Method Triggers: Controlling the Profiler

The diagram below shows the Method Trigger I added to enable CPU and Allocation profiling once the IlrContext.executeMain method is invoked.

Profile-Trigger

The resulting CPU Hot Spots view is a lot more interesting as it is limited to the execution of the rules:

Profiler-Cpu2

As you can see, the code in ilog.rules package makes up a small fraction of the overall time, with the majority spent in java.util.Calendar methods, invoked by the ruleset’s XOM, rules and functions.

Using Filters: Profiling the XOM

It is very useful to be able to perform targeted profiling of a package - particularly to profile the XOM in isolation. You can use the Filter Settings for the profiler to specify that you want to exclude the ilog.rules package and include the carrental (XOM) package for example:

Profiler-Filters

This will give you a CPU Hot Spots view something like the diagram below, showing that we spend about 400 ms performing data setup in the populateBranches method.

Profiler-Xom

Using Filters: Profiling JRules

Of course, within ILOG we spend most of our time profiling JRules, not the XOM! By reversing your filters to include the ilog.rules package you get a rule engine centric profiling view:

Profile-Engine

Many of the rule engine method names are internal and obfuscated so this view is not terribly useful outside ILOG, but it could be used for troubleshooting.

GC Overhead: Profiling Allocations

Sometimes GC overhead for an application may become an issue, particularly if objects are being created faster than they can be reclaimed, forcing the JVM to halt execution threads periodically so that the GC thread can catch up. You can use the Allocation Hot Spots view to understand which methods in your XOM are responsible for object allocations. Again, java.util.Calendar is the culprit for this sample:

Profile-Allocation
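One common way to cut this kind of allocation is to capture "today" once per execution and reuse it, rather than calling Calendar.getInstance() every time a rule needs a date. The sketch below shows the idea for an age calculation. `AgeCalculator` is a hypothetical helper, not the code from the sample XOM, and months are 0-based to match `Calendar.MONTH`.

```java
import java.util.Calendar;

/** Computes ages against a "today" that is captured once and then reused. */
final class AgeCalculator {
    private final int year;
    private final int month; // 0-based, as in Calendar.MONTH
    private final int day;

    AgeCalculator(Calendar today) {
        this.year = today.get(Calendar.YEAR);
        this.month = today.get(Calendar.MONTH);
        this.day = today.get(Calendar.DAY_OF_MONTH);
    }

    /** Age in whole years; no Calendar is allocated on this path. */
    int ageInYears(int birthYear, int birthMonth, int birthDay) {
        int age = year - birthYear;
        boolean birthdayStillToCome =
                month < birthMonth || (month == birthMonth && day < birthDay);
        return birthdayStillToCome ? age - 1 : age;
    }

    public static void main(String[] args) {
        // One Calendar for the whole ruleset execution instead of one per rule condition.
        AgeCalculator calculator = new AgeCalculator(Calendar.getInstance());
        System.out.println("age today: " + calculator.ageInYears(1980, Calendar.MARCH, 15));
    }
}
```

Whether this is safe depends on your rules: if an execution can span midnight and the exact date matters, capture the date at a well-defined point and document it.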

Profiling and the Rule Engine JIT

The rule engine JITs all rule tasks that use the Sequential or Fastpath algorithms to JVM byte-code. It also JITs BOM-to-XOM (B2X) methods and optionally method calls in conditions for rule tasks using the RETE algorithm. This makes profiling these calls very easy using a profiler, as you will see the JITted entities as normal Java methods in the profiler.

The image below shows that the rule engine generated a Java method for the RentalAgreement.getCustomerAge BOM method and that this method is invoked by 5 eligibility rules. In turn the BOM method calls 8 XOM methods, with the majority of time spent in the java.util.Calendar.getInstance method.

Graph-Sequential-1

Recommendations

My recommendations for avoiding performance issues in your rule projects are:

  • Understand, document and communicate your performance objectives
  • Consider putting in place continuous performance measurement to provide you with a safety-net as you modify your XOM and rules. It is much easier to see the impact of adding a synchronized method call to the XOM, or adding a rule with a complex join if you are receiving performance/throughput reports every morning.
  • Performance test with realistic data. There is no point optimizing for a dummy data set that bears no resemblance to what you will see in production.
  • Understand the frequency with which the methods in your XOM are invoked by rules - particularly if you are doing lots of inferencing using the RETE algorithm
  • If you are performing combinatorial/complex joins across attributes in your rules try to do a back-of-the-envelope performance model. Are you expecting O(n) performance, O(n²) or worse?
  • Spend time optimizing your XOM where profitable
  • Spend time getting to know the features of your profiler
  • BEWARE java.util.Date and java.util.Calendar! If you need to perform lots of Date/Time manipulation in rules consider using another library, such as Joda Time
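As an example of the back-of-the-envelope modeling suggested above, the sketch below counts worst-case condition evaluations for a rule that joins two sets of objects. It is a deliberately pessimistic model: a RETE network can share and index tests, so real engines often do better. `JoinCostModel` and the cost per test are assumptions chosen for illustration, not measured values.

```java
/** Worst-case arithmetic for rules that join objects in working memory. */
final class JoinCostModel {

    /** A rule joining every left object with every right object: n * m tests. */
    static long pairwiseTests(long leftObjects, long rightObjects) {
        return Math.multiplyExact(leftObjects, rightObjects);
    }

    /** A rule comparing distinct unordered pairs drawn from one set: n * (n - 1) / 2 tests. */
    static long selfJoinTests(long objects) {
        return Math.multiplyExact(objects, objects - 1) / 2;
    }

    /** Converts a test count into seconds for an assumed cost per test. */
    static double estimatedSeconds(long tests, double nanosPerTest) {
        return tests * nanosPerTest / 1e9;
    }

    public static void main(String[] args) {
        double assumedNanosPerTest = 100.0; // assumption: measure your own XOM getters instead
        for (long n : new long[] { 1_000, 10_000, 100_000 }) {
            long tests = selfJoinTests(n);
            System.out.printf("n = %d: %d tests, about %.2f s%n",
                    n, tests, estimatedSeconds(tests, assumedNanosPerTest));
        }
    }
}
```

Growing the data set by a factor of 10 multiplies the work by roughly 100, which is exactly the kind of surprise you want to find on paper rather than in production.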

Academic Benchmark Performance

Monday, October 22nd, 2007

We are lucky enough to employ some of the smartest rule engine minds in the industry, so last week I was not too surprised to hear that in JRules 6.x the rule engine team have raised the performance bar yet again. We run many performance tests, but the ones people seem to get most excited/obsessed about are the so-called “academic benchmarks”. Frankly I don’t like these benchmarks very much because they are not representative of the types of problems customers are trying to solve using business rules. Anyone doing N-dimensional graph labeling using rules? Assigning guest seating for a dinner party? Calculating the Fibonacci series? I didn’t think so…! To compound matters the results of some of the academic benchmarks are difficult to verify, leading to performance comparisons between buggy implementations or inadvertent over-optimization.

As an aside, I think it would be much more useful to take a simplified customer problem as the basis for a benchmark. The functional characteristics of the problem would be described, and then all the vendors could provide a solution. Solutions would be graded not purely on performance (my assembler is faster than your Java!), but also on subjective criteria such as the business expressiveness of the rules, the flexibility of the solution to evolution and change etc. In some cases this would also allow vendors to showcase their specialized execution algorithms, such as Sequential or Fastpath from ILOG. I will return to this subject in a future blog post.

To make the results more interesting (and to provide some basis for comparison) today I ran the Waltz and Manners academic benchmarks against Drools 4.0.1 as well as JRules 6.6.1 Trial. As you can see in the table and charts below the results are very encouraging for JRules users, however I’d also like to tip my hat to Mark Proctor and Co. as Drools 4 is certainly moving in the right direction performance-wise. The Drools team is doing a good job at evangelizing business rules to Java developers and we hope to also do our part with the newly launched BRMS Resource Center. We welcome you to recreate these results in your own environment by downloading a trial version of JRules from the BRMS Resource Center. Of course, if you have optimization tips for improving either the Drools or JRules results I’d love to hear from you.

Jrules-Perf-Manners

The Manners results show broadly the same performance profile for both products, with JRules between 50% and 150% faster than Drools. Note that for reasons I don’t yet understand the Drools engine fires slightly more rules than JRules for Manners 512, while the two engines fire the same number of rules for the other Manners benchmarks.

Jrules-Perf-Waltz

The results for Waltz are very different, with JRules showing a clear algorithmic advantage for this benchmark. Again, the two engines do not agree on the number of rules that should be fired.

Jrules-Perf-Table-1

Naturally our Rules for .NET customers will also experience improved rule engine performance as the vast majority of the rule engine code is shared between JRules and Rules for .NET.

An Excel spreadsheet with the results I recorded is available here. After you’ve downloaded the JRules Trial you can run these benchmarks by downloading the ZIP file here.

Need for Speed? Upgrade your JDK!

Friday, August 3rd, 2007

Caveats

I feel compelled to remind you that in this post I am presenting results from benchmarks. Benchmarks are not applications and will not reflect the performance profile of the application you are designing. Below I am also only reporting average rule engine execution time. There are many other factors that may be important to you, such as ruleset parsing time or JVM memory usage.

Benchmarks

I spent today re-running some old performance tests against JRules 5.1.0 (yes, that old thing!). There have been various reports of performance improvements in JDK 6, so I wanted to try to quantify them for JRules. I ran four benchmarks I have used in the past:

  • ManyObjects: a very working memory intensive benchmark that inserts 1M objects into working memory and processes them using 7 rules
  • PatternMatching: uses advanced Rete constructs to correlate 100 accounts and 400 account events using 15 rules
  • Sequential: executes 5959 sequential rules against a single input parameter, a single rule fires per execution
  • XmlBinding: processes two XML documents using ILOG XML binding and 401 rules

I tested using the JDKs I had at hand (Sun, IBM and BEA) using a single machine configuration (my Dell D620 core-duo laptop running Windows XP). All benchmarks were run on untuned JVMs, just setting 512 MB as the maximum heap size.

Summary

Jrules51 Summary-1

The chart above shows the average execution time over 10 runs, with the first run discarded. The vertical axis is in milliseconds (smaller is better). You can clearly see that IBM JDK 1.4.2 and Sun JDK 1.5.0_08 both have significant performance problems with these benchmarks. By just upgrading your Sun JDK from patch level 8 to patch level 11 you will see a huge performance improvement (with these benchmarks). In general terms however you can see the JVM vendors tuning and optimizing their implementations and IBM, BEA and Sun are all playing in the same ballpark once we get to JDK 5/6.
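For anyone wanting to reproduce this style of measurement, here is a minimal sketch of timing a workload several times and averaging with the first (cold) run discarded. It is not the harness I used: `WarmRunAverage` is a hypothetical name, the loop in `main` is only a placeholder workload, and I am assuming the average is taken over the runs that remain after the first one is dropped.

```java
/** Times a workload repeatedly and averages the runs after the first. */
final class WarmRunAverage {

    /** Mean of all elapsed times except the first, which includes JIT and class-loading warm-up. */
    static double averageDiscardingFirst(long[] elapsedMillis) {
        if (elapsedMillis.length < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        long sum = 0;
        for (int i = 1; i < elapsedMillis.length; i++) {
            sum += elapsedMillis[i];
        }
        return (double) sum / (elapsedMillis.length - 1);
    }

    /** Runs the workload the given number of times and records each elapsed time in milliseconds. */
    static long[] time(Runnable workload, int runs) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            workload.run();
            elapsed[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] sink = new long[1]; // keeps the placeholder loop from being optimized away
        long[] elapsed = time(() -> {
            long total = 0;
            for (int i = 0; i < 50_000_000; i++) {
                total += i % 7;
            }
            sink[0] = total;
        }, 10);
        System.out.printf("average over warm runs: %.1f ms%n", averageDiscardingFirst(elapsed));
    }
}
```

Discarding the first run removes most class-loading and JIT compilation cost from the average, which is why a single cold run can make a JVM look far slower than it is in steady state.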

So, impressive work by Sun on JDK 6:

  • +49% for Sequential
  • +20% for PatternMatching
  • +17% for ManyObjects
  • +24% for XmlBinding

JRules 6.7 will officially support JDK 6.

Details

If you look at the individual benchmark results you can see how the different JVM vendors have optimized for different use cases.

Jrules51 Many-1

The ManyObjects benchmark performs very well on JRockit. JRockit must be optimized for memory allocation and have an efficient default garbage collector for this benchmark.

Jrules51 Pattern-1

The PatternMatching benchmark also performs well on BEA JRockit.

Jrules51 Seq-1

The Sequential benchmark is very interesting because it shows that JRockit is (still) poor at handling dynamically generated classes. In comparison the Sun JDK performs very well with this sequential mode benchmark.

Jrules51 Xml-1

IBM JDK 5 performed very well with ILOG XML binding.