Collecting Knowledge for Rule-Based Applications
By Ernest J. Friedman-Hill
Imagine a different way to program in which you specify rules and facts instead of the usual linear set of instructions. That's the idea behind rule-based programming. A rule engine automatically decides how to apply the rules to your facts, and hands you the result. Rule-based systems are growing in popularity. Rule engines are ubiquitous in the enterprise, and rule-based systems control everything from web sites to factories.
The first step in developing any rule-based system is collecting the knowledge the system will embody, and in this article - an excerpt from the new Manning book Jess in Action - you'll learn how this is done.
The Tax Forms Advisor
Imagine that you're developing a simple rule-based application that recommends United States income tax forms. The application asks the user a series of questions and, based on the answers, tells the user which paper Internal Revenue Service forms she will likely need. You will populate the application with enough data to make it realistic, although you won't try to make it exhaustive. Your application might be used in an information kiosk at the post office. In this article, we'll use this application for our examples.
Introduction to Knowledge Engineering
Every rule-based system is concerned with some subset of all the world's collected knowledge. This subset is called the domain of the system. The process of collecting information about a domain for use in a rule-based system is called knowledge engineering, and people who do this for a living are called knowledge engineers. On small projects, the programmers themselves might do all the knowledge engineering, whereas very large projects might include a team of dedicated knowledge engineers.
Professional knowledge engineers may have degrees in a range of disciplines: obvious ones like computer science or psychology, and domain-related ones like physics, chemistry, or mathematics. Obviously, it helps if the knowledge engineer knows a lot about rule-based systems, although she doesn't have to be a programmer.
A good knowledge engineer has to be a jack of all trades, because knowledge engineering is really just learning: the knowledge engineer must learn a lot about the domain in which the proposed system will operate. A knowledge engineer doesn't need to become an expert, although that sometimes happens. But the knowledge engineer does have to learn something about the topic. In general, this information will include:
The knowledge engineer can use many potential sources of information to research these points. Broadly, though, there are two: interviews and desk research. In the rest of this section, we'll look at techniques for mining each of these information sources to gather the four categories of information we just listed.
Where Do You Start?
When you're starting on a new knowledge engineering endeavor, it can be difficult to decide what to do first. Knowledge engineering is an iterative process. You usually can't make a road map in advance; instead you feel your way along, adjusting your course as you go. As the saying goes, though, a journey of a thousand miles begins with a single step, and taking that first step can be hard.
With most projects, you should first talk to the customers, the people who are paying you to write the system. Find out what their needs are and what resources they can make available. This isn't knowledge engineering per se, but requirements engineering, which is part of planning any software project. But the customer might point you to particular sources of technical information and help you plan your approach to knowledge engineering. After talking to the customers, you should have a rough idea of what the system should do and how long development is expected to take.
Next, it's best to seek out general resources you can use to learn about the fundamentals of the domain and do a bit of self-study. Being at least vaguely familiar with the jargon and fundamental concepts in the domain will let you avoid wasting the time of people you interview later. You should learn enough about the fundamentals to have a rough idea of what kinds of knowledge the system will need to have.
Once you've developed an understanding of the basics, you're ready to begin the iterative process. Based on your initial research, write down a list of questions about the domain, which, if answered, would provide knowledge in the areas you previously identified. Seek out a cooperative subject-matter expert, briefly explain the project to him, and ask him the questions (often the customer will provide the expert; otherwise the customer should pay the expert a consulting fee to work with you). Usually the answers will lead to more questions.
After the initial interview, you can try to organize the information you've gathered into some kind of structure, perhaps a written outline or a flow chart. As you do this, you can begin to look for what might turn out to be individual rules. For the income tax forms advisor, an individual rule you might encounter early in the process would be:
IF
filing status is "single" and
user made less than $50000
THEN
recommend the user file Form 1040EZ
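To make the shape of such a rule concrete, the sketch below encodes it in plain Python rather than in a rule language. It is only an illustration under stated assumptions: the names Rule, RULES, recommend, filing_status, and income are hypothetical, and a real rule engine would do the matching for you instead of the simple loop shown here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Facts are modeled as a plain dictionary; the keys are hypothetical names.
Facts = Dict[str, Any]


@dataclass
class Rule:
    name: str
    condition: Callable[[Facts], bool]  # the "if" part of the card
    recommendation: str                 # the "then" part of the card


# Illustrative encoding of the single rule shown above.
RULES: List[Rule] = [
    Rule(
        name="single-low-income",
        condition=lambda f: f["filing_status"] == "single" and f["income"] < 50000,
        recommendation="Form 1040EZ",
    ),
]


def recommend(facts: Facts) -> List[str]:
    """Return the recommendation of every rule whose condition holds."""
    return [rule.recommendation for rule in RULES if rule.condition(facts)]


if __name__ == "__main__":
    print(recommend({"filing_status": "single", "income": 32000}))
```

Keeping the condition and the recommendation as separate fields mirrors the way the rule is written on the card, so each new rule collected during interviews becomes one more entry in the list.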
Buy a stack of white index cards and write each potential rule on one side of an individual card. Use pencil so you can make changes easily. The cards are useful because they let you group the rules according to function, required inputs, or other criteria. When you have a stack of 100 cards or more, the utility becomes obvious. You can use the reverse sides of the cards to record issues regarding each rule. This stack of cards might be the final product of knowledge engineering, or the cards' contents might be turned into a report. The cards themselves are often the most useful format, though.
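If you prefer to keep the cards in electronic form, the same idea can be sketched as a small record type. This is a hypothetical illustration, not part of the method described here: the RuleCard fields, the grouping labels, and the second card are made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RuleCard:
    front: str         # the rule as written on the front of the card
    function: str      # hypothetical label for what the rule is for
    inputs: List[str]  # information the rule needs from the user
    issues: List[str] = field(default_factory=list)  # notes on the reverse side


def group_by_function(cards: List[RuleCard]) -> Dict[str, List[RuleCard]]:
    """Sort the cards into piles, one pile per function label."""
    piles: Dict[str, List[RuleCard]] = defaultdict(list)
    for card in cards:
        piles[card.function].append(card)
    return dict(piles)


def group_by_input(cards: List[RuleCard]) -> Dict[str, List[RuleCard]]:
    """One pile per required input; a card can land in several piles."""
    piles: Dict[str, List[RuleCard]] = defaultdict(list)
    for card in cards:
        for name in card.inputs:
            piles[name].append(card)
    return dict(piles)


# The first card is the rule from the text; the second is a made-up example.
cards = [
    RuleCard(
        front='IF filing status is "single" and user made less than $50000 '
              "THEN recommend Form 1040EZ",
        function="choose main form",
        inputs=["filing status", "income"],
        issues=["Confirm the income limit with the expert"],
    ),
    RuleCard(
        front="IF user is self-employed THEN recommend a business income schedule",
        function="choose schedules",
        inputs=["employment type"],
    ),
]

if __name__ == "__main__":
    for label, pile in group_by_function(cards).items():
        print(label, len(pile))
```

Regrouping the same cards by a different criterion is then a single function call, which is the electronic equivalent of re-sorting the physical stack.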
After organizing the new knowledge on index cards, you may see obvious gaps that require additional information. Develop a new set of interview questions and meet with the expert again. The appropriate number of iterations depends on the complexity of the system.
Knowledge engineering doesn't necessarily end when development begins. After an initial version of a system is available, the expert should try it out as a user and offer advice to correct its performance. If possible, a prototype of the system should be presented to the expert at every interview, except perhaps the first one.
Likewise, development needn't be deferred until knowledge engineering is complete. For many small projects, the knowledge engineer is one of the developers, and in this case you may be able to dispense with the cards and simply encode the knowledge you collect directly into a prototype system.
People are the best source of information about the requirements for a system. Many projects have requirements documents: written descriptions of how a proposed system should behave. Despite the best intentions, such documents rarely capture the expectations for a system in enough detail to allow the system to be implemented. Often, you can get the missing details only by talking to stakeholders: the customers and potential users of the system.
People can also direct you to books, web sites, and other people who will help you learn about the problem domain. These days it's common to suffer from information overload when you try to research a topic: there are so many conflicting resources available that it's hard to know what information to believe. The stakeholders in the system can tell you which resources they trust and which ones they don't.
If you find conflicting information among otherwise trustworthy references during your research, or hear conflicting statements during interviews, don't be afraid to ask for clarification. You'll need a strategy for resolving conflicts that hinge on matters of opinion. Sometimes you can do this by picking a specific person as the ultimate arbiter. Other times, especially on larger projects, it's appropriate to hold meetings to get the stakeholders to make decisions in a group setting.
Created: March 27, 2003
Revised: August 6, 2003