Profiling Tips and Tricks

For many people business rules is a new technology, and they don’t yet have an innate sense of the performance profile of an application that uses a rule engine. For example, is invoking a set of 1,000 rules faster or slower than a database call? Of course the answer is “it depends”, so in this post I’d like to outline some best practices when dealing with performance issues in business rules applications.
There is a lot of great material out there on performance testing, tuning, continuous performance and software performance engineering. I won’t rehash it here, but if you need general guidance on when to test or monitor performance, the tools available, as well as performance modeling I recommend the following links as a jumpstart:
- Continuous Performance by Cody Menard, particularly the associated discussion thread
- Matt Secoske’s presentation from the Open Source Convention
- Kirk Pepperdine’s article on Iterative, Continuous and Automated Performance
For applications with stringent performance targets or Service Level Agreements I believe that you should seriously consider continuous performance monitoring. By investing a small daily amount in performance monitoring you can avoid expensive refactoring due to performance issues found at the end of your development cycle. This is particularly important for business rules applications where the relationship between the rules being authored and the code being invoked at runtime is not as obvious as with a traditional single-programming-language procedural software system.
First, let’s take a look at the basic sequence of actions required to invoke a ruleset using JRules:
- Ruleset archive is parsed, creating an IlrRuleset
- Instantiation of an IlrContext for the IlrRuleset
- Creation of input data (IN,INOUT parameters) for the ruleset
- Optionally add objects to Working Memory
- Execute the ruleset
- Retrieve INOUT and OUT parameters
- Optionally retrieve objects from Working Memory
- Reset IlrContext for subsequent execution
Operation (1) is costly, and for a large ruleset containing 10K+ rules can take several minutes, however it is a one time cost as the resulting IlrRuleset should be pooled for subsequent reuse. Rule Execution Server provides out of the box caching of IlrRuleset and IlrContext.
Operation (3) is often slow as this is where your code hits various expensive backend systems such as databases to retrieve the data required by the rule engine. Caching within your code may be a useful optimization here.
The time required for operation (5) depends on the details of the rules, ruleflows and rule tasks within your ruleset. The rules typically invoke methods in your Java Executable Object Model (XOM) - so it is very important to determine the amount of time spent within your code versus the time spent within the rule engine or within the rules.
Rule Engine Performance Cost Breakdown
The performance cost for a JRules application may looks something like the figure below.
The vast majority of application time is typically spent within the Executable Object Model (XOM) code invoked by the rules. It is therefore critical to get a good understanding of which methods within your XOM code are going to be invoked by rules and roughly how frequently. This may also include synchronization considerations, because if you have 10 rule engines concurrently calling a synchronized method within your XOM your throughput is going to suffer!
Profiling
If you have an execution performance problem the first thing to do is to execute your application with a Java profiler. There are many good profilers, however my current favorite is JProfiler, as I find it easy to use and it has good platform support (including 64-bit platforms). It also has good Eclipse and hence Rule Studio integration.
To write this post I imported the debug-rules-answer example into Rule Studio using the Import Wizard. I then launched JProfiler and used its Integration Wizard to add JProfiler to the Eclipse instance used by Rule Studio. I then created a new “Java Project for Rules” that invokes the debug-rules-answer Rule Project. I made a small customization of the generated code to invoke the ruleset’s ilrmain function using engine.executeMain(null). Once these steps are complete you can launch JProfiler to profile the Java Project for Rules from within Rule Studio.
The first test is simply to launch the application with the profiler to get a sense for performance. Using the default JProfiler settings should give you something like this:
For a short run the CPU Hot Spots view will be dominated by ruleset parsing. As the ruleset parsing is implemented by JRules and is a one-time cost, it is very useful to be able to ignore ruleset parsing when you are profiling your rules and XOM. The easiest way is to start the profiler with CPU and Allocation tracking disabled on startup and then enable both using a Method Trigger.
Using Method Triggers: Controlling the Profiler
The diagram below shows the Method Trigger I added to enable CPU and Allocation profiling once the IlrContext.executeMain method is invoked.
The resulting CPU Hot Spots view is a lot more interesting as it is limited to the execution of the rules:
As you can see, the code in ilog.rules package makes up a small fraction of the overall time, with the majority spent in java.util.Calendar methods, invoked by the ruleset’s XOM, rules and functions.
Using Filters: Profiling the XOM
It is very useful to be able to perform targeted profiling of a package - particularly to profile the XOM in isolation. You can use the Filter Settings for the profiler to specify that you want to exclude the ilog.rules package and include the carrental (XOM) package for example:
This will give your a CPU Hot Spots view something like the diagram below, showing that we spend about 400 ms performing data setup in the populateBranches method.
Using Filters: Profiling JRules
Of course, within ILOG we spend most of our time profiling JRules, not the XOM! By reversing your filters to include the ilog.rules package you get a rule engine centric profiling view:
Many of the rule engine method names are internal and obfuscated so this view is not terribly useful outside ILOG, but it could be used for troublshooting.
GC Overhead: Profiling Allocations
Sometimes GC overhead for an application may become an issue, particularly if objects are being created faster than they can be reclaimed, forcing the JVM to halt execution threads periodically so that the GC thread can catch up. You can use the Allocation Hot Spots view to understand which methods in your XOM are responsible for object allocations. Again, java.util.Calendar is the culprit for this sample:
Profiling and the Rule Engine JIT
The rule engine JITs all rule tasks that use the Sequential or Fastpath algorithms to JVM byte-code. It also JITs BOM-to-XOM (B2X) methods and optionally method calls in conditions for rule tasks using the RETE algorithm. This makes profiling these calls very easy using a profiler, as you will see the JITted entities as normal Java methods in the profiler.
The image below shows that the rule engine generated a Java method for the RetalAgreement.getCustomerAge BOM method and that this method is invoked by 5 eligibility rules. In turn the BOM method calls 8 XOM methods, with the majority of time spent in the java.util.Calendar.getInstance method.
Recommendations
My recommendations for avoiding performance issues in your rule projects are:
- Understand, document and communicate your performance objectives
- Consider putting in place continuous performance measurement to provide you with a safety-net as you modify your XOM and rules. It is much easier to see the impact of adding a synchronized method call to the XOM, or adding a rule with a complex join if you are receiving performance/throughput reports every morning.
- Performance test with realistic data. There is no point optimizing for a dummy data set that bears no resemblance to what you will see in production.
- Understand the frequency with which the methods in your XOM are invoked by rules - particularly if you are doing lots of inferencing using the RETE algorithm
- If you are performing combinatorial/complex joins across attributes in your rules try to do a back of an envelope performance model. Are you expecting O(n) performance, O(n2) or worse?
- Spend time optimizing your XOM where profitable
- Spend time getting to know the features of your profiler
- BEWARE java.util.Date and java.util.Calendar! If you need to perform lots of Date/Time manipulation in rules consider using another library, such as Joda Time















