RegelSpraak for Business Rules: Experiences in building a Business Rules Compiler for the Dutch Tax Administration
The Dutch Tax Administration tried to reduce the complexity of its IT systems during 2008 and 2009. That initial project failed because the solutions themselves became too complex. However, some members of the project team continued in their spare time, because they were convinced that the Rule-based Approach could help the Dutch Tax Administration manage its legacy knowledge and reduce the complexity of its systems. The Specifications Team resulted from that effort.
In 2008, we used FICO's Blaze Advisor product as a rule engine, which was our introduction to the Rule-based Approach. All of our Income Tax rules were written in SRL (Structured Rule Language). We now know that it is important to write the rules in a language that the business can understand. Because lawyers can read such rules, they can also validate them and even write them! In addition, we wanted this language to be as strict as SRL and interpretable in only one way, so that a machine can read the rules. The language we developed is called RegelSpraak.
The Specifications Team
In order to succeed on the next attempt, we came up with the idea of forming a team with various skills (disciplines). We started with a three-person team. Over time, people came and people went.
At this time we have a very interesting collection of skills/expertise:
- In the area of repositories: RuleXpress,
- Laws and law texts: fiscalists,
- Language and patterns: rules, natural language,
- Helpdesk: damage and requirements,
- Software engineering: database, compilers, user interfaces,
- Artificial intelligence: user experience,
- Domain experts: income tax, VAT, customs (etc.),
- Metadata: business and technical.
RegelSpraak is the Dutch translation of the word 'RuleSpeak', and RuleSpeak was the starting point from which we developed RegelSpraak.
When we began, we already had a lot of rules in SRL and we started by rewriting most of the SRL rules as RuleSpeak statements. In parallel we did some code mining. We retrieved the conditions (rules) from existing legacy control tables, Microsoft-Word documents, and program code, and we transformed them into RuleSpeak. We ended up with about 3000 rules and 4500 terms.
The Dutch Tax Administration uses a lot of different systems, and every single system has its own terms. A major focus of this work involves synonyms and homonyms. From all the different existing terms we defined a unique set of terms and we use these terms in our rules. The synonyms are linked with these terms so that the terms used in the systems can be retrieved. At this time our rule maintenance application is RuleXpress; in this tool we manage the rules and terms.
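The link between system-specific names and the unique set of terms can be pictured as a two-way lookup. The following is a minimal sketch, not how RuleXpress stores its data: the class name `TermGlossary`, its methods, and the example system and term names are all assumptions made for illustration. Keying on the system as well as the local name is one way to keep homonyms (the same word meaning different things in two systems) apart.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical glossary: links the name a system uses to one canonical term.
final class TermGlossary {
    // Key is "system:localName", so a homonym can map to a different term per system.
    private final Map<String, String> canonicalBySystemName = new HashMap<>();

    void link(String system, String localName, String canonicalTerm) {
        canonicalBySystemName.put(key(system, localName), canonicalTerm);
    }

    // Forward lookup: which canonical term does this system-specific name stand for?
    Optional<String> resolve(String system, String localName) {
        return Optional.ofNullable(canonicalBySystemName.get(key(system, localName)));
    }

    // Reverse lookup: every "system:localName" linked to a canonical term, sorted.
    List<String> systemNamesFor(String canonicalTerm) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> entry : canonicalBySystemName.entrySet()) {
            if (entry.getValue().equals(canonicalTerm)) {
                names.add(entry.getKey());
            }
        }
        Collections.sort(names);
        return names;
    }

    private static String key(String system, String localName) {
        return system.toLowerCase(Locale.ROOT) + ":" + localName.toLowerCase(Locale.ROOT);
    }
}
```

With such a structure, a rule written against the canonical term can still be traced back to the name each legacy system expects.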
The terms and rules need to be validated and changes must be applied. And here a problem occurs. The people able to validate the rules are the same people who maintain the existing legacy systems. A choice has to be made.
Option One is that these people do their normal jobs and deliver the old control tables. In this case the Specifications Team has to synchronize new and changed rules with the repository and the validation cannot be done. The advantage is that we have no interference with the existing legacy systems that are delivered in a release/version/patchlevel approach. However, the lack of time and capacity makes this choice a bad one.
Option Two is that these people directly write changed and new rules into RuleSpeak with RuleXpress. In the existing working method it is necessary to copy the complete system every year; this is a lot of work. When working with rules that have a valid begin and/or end date, it is easy to produce deltas instead. The advantage of this option is that the rules can be validated and the Dutch Tax Administration benefits directly from the Rule-based Approach. The challenge is to transform our RuleSpeak rules in a way that can feed our existing legacy systems.
Here RegelSpraak was born. We sorted out our installed base of rules and added some extra metadata. Our rules conform to certain patterns. For each pattern a pattern description has been made. Only rules conforming to these patterns are allowed to be written — we force ourselves to do so. The characteristic of RegelSpraak is that the rules follow the patterns very strictly, but the rules are still in a natural language.
As soon as you write rules conforming to a pattern (in general), you are able to do a kind of syntax checking. Rules that are syntactically correct can be compiled. Compilable rules can be interpreted or executed. In this way we can transform our RegelSpraak rules into any format we want and feed our existing legacy systems. So, with RegelSpraak we have executable rules!
When we picked Option Two we realized that we had to look for a compiler. A RegelSpraak compiler did not exist so we developed our own. We rewrote the patterns as formal grammars. For each domain we developed a so-called G-file (grammar file). Grammar files can be compiled into a compiler with the correct set of tools.
For the construction and development of a grammar file we use the ANTLRWorks GUI workbench. In an iterative way we develop a grammar for each of our domains. These grammars check if the RegelSpraak is syntactically correct. When each possible RegelSpraak statement can be recognized, a compiler can be generated. But because we want to use the compiler for more than syntax checking, we also write so-called production rules into certain recognition phases. As a result each RegelSpraak compiler can compile into "some" output format.
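The combination of recognition and production rules can be illustrated with a toy example. This is only a sketch under heavy assumptions: the sentence pattern below is invented for illustration and is not an actual RegelSpraak pattern, a regular expression stands in for the lexer and parser that ANTLR would generate from a grammar file, and the semicolon-separated output format is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy stand-in for one grammar rule plus its production rule (not ANTLR itself).
final class PatternCompilerSketch {
    // Invented pattern: "<term> must be at least <number>".
    private static final Pattern MINIMUM_RULE =
            Pattern.compile("(?<term>[a-z][a-z ]*?) must be at least (?<min>\\d+)");

    // Returns the emitted output line, or empty if the statement does not fit the pattern.
    static Optional<String> compile(String statement) {
        Matcher matcher = MINIMUM_RULE.matcher(statement.trim());
        if (!matcher.matches()) {
            return Optional.empty(); // syntax check failed: not a recognized statement
        }
        // "Production rule": emit a line in a hypothetical control-table format.
        return Optional.of(matcher.group("term") + ";>=;" + matcher.group("min"));
    }
}
```

A statement that fits the pattern yields an output line; anything else is rejected, which is the syntax-checking half of the job. Swapping the emit step changes the output format without touching the recognition step.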
We have already built such compilers for different legacy systems. For example, the "fiscale voorcontrole" system does a check on fiscal correctness of data provided by the taxpayer. The RegelSpraak statements, which are combined in a community called "fiscale voorcontrole", are compiled into a Microsoft-Excel Workbook having several Sheets. This spreadsheet is a control table for the existing legacy.
Another example is the "rekenserver", which is a calculation server. The RegelSpraak statements are compiled into technical design documents as PDF files. Here the existing legacy is served as well. Currently, the PDF files are transformed into COBOL/CICS and C code by hand, but it is also possible to compile COBOL/CICS or C code directly from RegelSpraak. The pre-condition and post-condition rules could be written as RegelSpraak statements as well. Developing the corresponding patterns, and from there the corresponding grammars, would give us a compiler that is able to give us both the formal construction description of the calculation server and the programming language code (COBOL, C, Java, Blaze)!
By doing all this we have shown that we are able to write rules in a natural language that can be read by the business. At the same time these rules are strict enough to be read by a compiler. This shows us that it is possible to transform rules in RegelSpraak into any form.
To compile all the statements for a given community in bulk (batch processing), we use the ANTLR toolset in the form of Eclipse plugins in the Galileo version of Eclipse. A grammar file is transformed into a lexical analyzer and a parser. RegelSpraak statements are recognized by the analyzer, turned into tokens, and syntax checked by the parser.
RuleXpress has a filter mechanism helping us to export an XML report file for a community, containing both the terms (and as such the metadata) and the rules. The rule itself is technically exported as a <statement> tag. This tag contains either a <text> tag for regular text (represented as black in the GUI) or a <termRef> tag (represented as blue underlined in the GUI).
However, simply knowing if some piece of text is regular text or a term reference is not enough for our compilers so we added profiling attributes to the metadata. A StatementReader Java class reads the XML file, extracting the <statement> and combining it with the metadata, resulting in a RegelSpraak statement. This statement "knows" if a term is a regular (default) term, a role, an object, a constant, a function, or an enumeration (for example).
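Reading one exported statement and combining it with term metadata might look roughly like the sketch below. It is not the actual StatementReader class: only the `<statement>`, `<text>`, and `<termRef>` tag names come from the export described above, while the class name `StatementReaderSketch`, the `Part` type, and the idea of passing the profiling metadata as a simple term-to-kind map are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader: turns one <statement> element into typed parts.
final class StatementReaderSketch {
    // One piece of a statement: its text plus "text" or the term's kind (role, constant, ...).
    static final class Part {
        final String content;
        final String kind;

        Part(String content, String kind) {
            this.content = content;
            this.kind = kind;
        }
    }

    // termKinds is an assumed stand-in for the profiling metadata: term name -> kind.
    static List<Part> read(String statementXml, Map<String, String> termKinds) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(statementXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed statement XML", e);
        }
        List<Part> parts = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace between tags
            }
            String content = node.getTextContent();
            if ("text".equals(node.getNodeName())) {
                parts.add(new Part(content, "text"));
            } else if ("termRef".equals(node.getNodeName())) {
                // Terms without profiling metadata fall back to the regular (default) kind.
                parts.add(new Part(content, termKinds.getOrDefault(content, "term")));
            }
        }
        return parts;
    }
}
```

The resulting list carries what the compiler needs: for every fragment, whether it is plain text or a term, and for terms, which kind of term it is.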
A SpreadSheet Java class turns parsed statements into a Microsoft-Excel Spreadsheet. A RekenServer Java class turns parsed statements into PDF documents. And so on.
At this time, most IT systems are built to conform to the version/release/patchlevel approach. In practice, this means copying the latest (tagged) release and starting development from there. When working in parallel, branches are made — and merged afterward. Conflicts are solved by hand.
We want to adopt the Rule-based Approach, which would free us from versions and releases. By using a mechanism for validity of a rule — some begin date/time value and a corresponding end date/time value — one then has only to deal with deltas.
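The idea of deriving deltas from validity periods, rather than from copied releases, can be sketched as follows. The class and field names are hypothetical, the sketch uses dates only (no time of day), and it assumes the end date is exclusive with `null` meaning "no end date yet".

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: rules carry a validity period instead of a release number.
final class RuleValidity {
    static final class Rule {
        final String id;
        final LocalDate validFrom; // inclusive
        final LocalDate validTo;   // exclusive; null means open-ended (assumption)

        Rule(String id, LocalDate validFrom, LocalDate validTo) {
            this.id = id;
            this.validFrom = validFrom;
            this.validTo = validTo;
        }

        boolean isValidOn(LocalDate day) {
            return !day.isBefore(validFrom) && (validTo == null || day.isBefore(validTo));
        }
    }

    // Delta between two reference dates: "+id" for rules that became valid, "-id" for rules that ended.
    static List<String> delta(List<Rule> rules, LocalDate from, LocalDate to) {
        List<String> changes = new ArrayList<>();
        for (Rule rule : rules) {
            boolean validBefore = rule.isValidOn(from);
            boolean validAfter = rule.isValidOn(to);
            if (!validBefore && validAfter) {
                changes.add("+" + rule.id);
            } else if (validBefore && !validAfter) {
                changes.add("-" + rule.id);
            }
        }
        return changes;
    }
}
```

Rules that are valid on both dates do not appear in the result at all, which is the point: only what changed needs attention.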
Lessons learned during our projects in 2008 told us to beware of very complex situations when working with time! The use of bi-temporal data can be an elegant solution. Bi-temporality is not an easy subject but, used correctly, it frees you from a lot of complexity. The advantage of bi-temporality is that it really is possible to travel in time. It is possible to point to a version that is valid at a certain period of time and also to a former version that (accidentally) was wrong and valid for the same period of time.
Bi-temporal databases are rare so we built one of our own, conforming to the guidelines of Richard T. Snodgrass. We create our bi-temporal tables in a MySQL relational database because (1) it is open source, (2) it supports JDBC, (3) it comes with GlassFish, and (4) we know it.
There are several approaches to building bi-temporal tables; we chose a variant with one primary key being a GUID. We had to write the framework on our own, so we did that as well. All tables have the same structure, which is an id, some bi-temporal attributes (2 dates and 2 timestamps), and an XML string. The XML string is, in fact, the representation of an object. The XML objects are processed with a Java DOM parser. Referential integrity and typed values are, in fact, rules and so (in our case) are not written in DDL (!) but instead in RegelSpraak.
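A row of such a table, and the "time travel" lookup it enables, can be sketched in memory as shown below. This is an illustration under assumptions, not our framework: the field names are invented, the two dates are read as the valid-time period and the two timestamps as the transaction-time (recorded) period, both end bounds are taken as exclusive, and the list passed to `asOf` is assumed to hold all versions of one logical object.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

// Hypothetical in-memory picture of a bi-temporal row and an "as of" query.
final class BiTemporalSketch {
    static final class Row {
        final UUID id;              // GUID primary key
        final LocalDate validFrom;  // valid time: when the content applies (inclusive)
        final LocalDate validTo;    // exclusive; LocalDate.MAX for open-ended
        final Instant recordedFrom; // transaction time: when this version was stored (inclusive)
        final Instant recordedTo;   // exclusive; Instant.MAX while still current
        final String xml;           // the object itself, as an XML string

        Row(UUID id, LocalDate validFrom, LocalDate validTo,
            Instant recordedFrom, Instant recordedTo, String xml) {
            this.id = id;
            this.validFrom = validFrom;
            this.validTo = validTo;
            this.recordedFrom = recordedFrom;
            this.recordedTo = recordedTo;
            this.xml = xml;
        }

        boolean covers(LocalDate validOn, Instant knownAt) {
            return !validOn.isBefore(validFrom) && validOn.isBefore(validTo)
                    && !knownAt.isBefore(recordedFrom) && knownAt.isBefore(recordedTo);
        }
    }

    // Which version applied on validOn, according to what the database knew at knownAt?
    static Optional<Row> asOf(List<Row> versions, LocalDate validOn, Instant knownAt) {
        return versions.stream().filter(row -> row.covers(validOn, knownAt)).findFirst();
    }
}
```

A correction never overwrites the wrong version; it closes that version's recorded period and adds a new row with the same valid period. Asking with an earlier `knownAt` then returns the version that was (accidentally) wrong, and asking with a later one returns the corrected version.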
To summarize, here is a list of what we needed and used in our project — our building blocks:
- A rules / terms editor: RuleXpress,
- A rule language: RegelSpraak,
- For compiler generation: ANTLR,
- For Java development and ANTLR integration: Eclipse,
- A web-server: SUN GlassFish,
- A (bi-temporal) database: MySQL,
- An editor for XML (and so on): Notepad++,
- An Eclipse dynamic class loader (here some work has to be done for bi-temporality): OSGi,
- A paradigm and discussion platform: ReguloParolo.
Looking to the Future
ReguloParolo is the Esperanto translation of the word 'RegelSpraak'.
During our pilot phase, using RuleXpress as a tool, about 100 requirements came up. A rule management tool that supports all these requirements does not exist on the market today.
As a team we always speak about ReguloParolo as the ultimate tool to help us maintain rules and terms. Among the most important features is the support of time-traveling options and the integration of history (traceability), as well as the absence of version/release/patchlevel numbers. These mechanisms are all integrated in the tool and are (of course) Rule-based. ReguloParolo could be built commercially, as an open source development, or both. We started an asset named "reguloparolo" on the http://www.osor.eu site, as a possible collaboration product.