Navigation:  Articles > Oct-1998 >

Case Study: Configuring Data

Previous pageReturn to chapter overviewNext page

199810_pv1 Peter Vogel          
 
Sometimes the way that data is stored makes it difficult to realize that a SQL solution is possible. Here's a case where using SQL decreased the size of the program and cut its run time by 12,000 percent.
 
As I discussed in last month's column ("Case Study: Researching Data"), I've found that most Access developers seem to know all the SQL that they need. What often seems to happen, though, is that developers don't recognize that a SQL solution is possible and don't take advantage of SQL's power. In this case study, I'm going to look at a manufacturing company's order-taking system. The application used 500 lines of code to process one part of the order. By using SQL, I was able to replace those 500 lines with just three lines of code and cut the run time of the routine from 60 seconds to five. While this article will focus on how I solved this particular problem, the solution is, in fact, typical of a wide variety of data-processing situations.
 
The company I was working with made heavy equipment that's used in road construction: road graders, pavers, compacters, and the like. Basically, they made all the equipment you creep past when traffic is held up because the road is being repaired. The company's order entry system allowed customers to select which options they wanted on their equipment. The options available ranged from the kind of engine on the equipment to the size of the cab that the driver would sit in.
 
The routine in the order entry system that was giving everyone grief was known as the Configurator. The Configurator took a list of sales options and converted it into a list of parts. In other words, after the customer said, "I want the low-profile cab with air conditioning and the CD player," the Configurator would figure out that, in order to build that cab, the company would need parts 5321, 6X8YZ, and so on. Since the typical piece of equipment had over 13,000 parts, this wasn't a trivial task.
 
You might feel that this problem is a rather peculiar situation and not one that you're likely to run across in your career. However, what I'm actually trying to do is find which parts have a set of criteria that match a set of sales options. This very common problem might be expressed as "Find me all of these items that match all of these other criteria." Other examples are "Find me all of the customers that match these sales criteria," "Classify all of our vendors according to these different performance classifications," "Assign these classrooms to these classes based on subject and class size," and so on. More importantly, handling this problem involves understanding how SQL works in a fundamental way.
 
Rules, rules, rules
The data that the Configurator worked with was stored in the most poorly designed table I think I've ever seen. This table contained information on parts, sales options, prices, weights of parts, and just about everything else. The first thing I did was apply the rules of normalization to break the table down into five separate tables (see "The Trouble With Normal" and "Getting Normal" in the January and February 1998 issues of Smart Access).
 
One of the tables that resulted from this process is shown in Table 1. This is the Configurator's rules table, and it defines the rules that control when a part is to be added to an order. The first column in the table gives the part name, while the second column gives the rules under which the part is to be used. A rule consists of a series of sales options that must be present (or absent) in order for a part to be used to build the piece of equipment.
 
As an example, look at the first row in the table, which describes when part 8678Z is to be used. In the peculiar notation that the developers created for writing the rules, the row says that the part is to be used when the customer selects sales options 2-36 and 2-40 without option 2-36A, or when the customer selects options 3-51 and 3-52. In the rule (2-36&2-40~2-36A)|(3-51&3-52), the special characters "&", "~", and "|" have these meanings:

& means "and" -- The customer must have ordered all of these options to get this part.
~ means "and not" -- The customer must not have ordered this option to get this part.
| means "or" -- The customer would get the part if they ordered any one of these parts.

 
Table 1. The tblParts table.

Part

Rule

8678Z

(2-36&2-40~2-36A)|(3-51&3-52)

70821

(2-40)

KJB40C

1-10|1-20|1-50

 
 
The problem in this case study was the way that the rules were stored. By placing the rules in a string with a set of special characters, the original developers were forced to write several routines to parse out the rule and then to check whether the sales options were actually being ordered. But that was only one of the problems that this data structure created. The code had to process every row in the table, as there was no way to tell which parts had rules that would cause them to be required by a particular sales order. There was also no way to search the table. Unless you did a wildcard search using the Like operator, you couldn't create a query that said "which parts are used by sales option 2-36" or "which sales options are affected by part 70821." And you certainly couldn't join the rules table to a table of sales options using the data buried in the rules.
 
Solutions, solutions, solutions
The first step in solving this problem was to restructure the data so that the multiple sales options jammed together in a rule were broken out into separate rows. This is really an example of the first rule of normalization, which states that you can only store one data value in a field. The Rule field really stored several pieces of data in one Text field.
 
For part 8678Z, there were two separate rules stored in the table: the rule 2-36&2-40~2-36A and the rule 3-51&3-52. I broke these rules up and assigned them numbers (in this case, they became rule 0001 and rule 0002). The new tblParts table just lists each part number along with the rules that they were tied to (see Table 2).
 
Table 2. The new tblParts.

Part

Rule number

8678Z

0001

8678Z

0002

70821

0003

KJB40C

0004

KJB40C

0005

KJB40C

0006

 
 
While tblParts no longer stored multiple values in a single field, the rules themselves still contained multiple sales options. I created a tblRules table to hold the rules by breaking up the rules into their individual components. In this table, each sales option that made up the rule went into a separate row. I used the rule number to tie the components for one rule together. I also added a field called "Negative" to indicate whether the option was part of an "and not." The result is shown in Table 3.
 
Table 3. The tblRules table.

Rule number

Sales option

Negative

0001

2-36

No

0001

2-40

No

0001

2-36A

Yes

0002

3-51

No

0002

3-52

No

0003

2-40

No

0004

1-10

No

0005

1-20

No

0006

1-50

No

 
 
The last table I need to introduce is the tblSalesDetails table. This table listed, for every sales order, the sales options that had been selected for it (see Table 4).
 
Table 4. The tblSalesDetails table.

Sales order number

Sales option

18221

2-36

18221

2-40

18221

1-10

 
 
Given these tables -- now correctly designed according to the rules of normalization -- I was ready to generate the SQL statements that would perform the equivalent of the VBA code. The only problem was that, while I was sure a SQL solution existed, I wasn't sure what it was.
 
SQL processing
The first thing I tried was to generate a SQL statement that joined the tblPart, tblRules, and tblSalesDetail together. Initially, I didn't even try to handle the components with negative options. I figured that if I could determine the SQL for the solution where the sales options were present, then I could work out the solution where the parts didn't exist later. My first SQL statement looked like this (it's called qryRule1 in the sample database):
 

SELECT tblPart.Part, tblRule.RuleNumber

FROM (tblSalesDetail INNER JOIN tblRules

 ON tblSalesDetail.SalesOption =

        tblRules.SalesOption)

 INNER JOIN tblPart

   ON tblRules.RuleNumber = tblPart.RuleNumber

WHERE tblRules.Negative=No;

 
 
With the sample data that I've shown, the results of running this query appear in Table 5. While very interesting, this query didn't really seem to solve my problem. That is, until I realized that I had one entry in the table for every component of a rule whose sales option was present in the order. I thought of these as satisfied components. If I could determine the number of components that made up a rule, then I could determine which rules had all of their components met. All I would need to do is compare the number of satisfied components in a rule with the total number of components in a rule. If the two numbers matched, the part that had that rule should be added to the order.
 
Table 5. The results of the first query.

Part

Rule number

8678Z

0001

8678Z

0001

70821

0003

KJB40C

0004

 
 
Without really realizing it, I'd stumbled upon the secret of developing solutions in SQL. SQL is based on the branch of mathematics known as set theory. SQL is always working with sets of records. Sometimes the set has only one record, sometimes it has none -- but it's always a set. What I was doing was creating a set of satisfied rule components and comparing it to the set of components in the rule. And the simplest comparison you can make between sets is whether or not they have the same number of members. Once I realized this, the SQL solution was easy.
 
To get to the SQL solution, the first thing I needed to do was enhance the query so that it would handle those negative sales options. To do that, I turned the join between tblSalesDetail and tblRules into an outer join. This meant that all of the components of a rule would be forced into the result even if they didn't have a matching sales option in the tblSalesDetail table. In the result of this query, I was interested in two kinds of rows:

the rows where the Negative field was No and there was a sales option
the rows where the Negative field held a Yes and there wasn't a sales option

 
The resulting query looks like this (and its results are shown in Table 6):
 

SELECT tblPart.Part, tblRules.RuleNumber

FROM (tblSalesDetail RIGHT JOIN tblRules

 ON tblSalesDetail.SalesOption =

        tblRules.SalesOption)

 INNER JOIN tblPart

   ON tblRules.RuleNumber = tblPart.RuleNumber

WHERE (tblSalesDetail.SalesOrderNumber Is Not Null

     AND (tblRules.Negative=No)

  OR (tblSalesDetail.SalesOrderNumber Is Null)

     AND tblRules.Negative=Yes);

 
 
Table 6. The results of the second query.

Part

Rule number

8678Z

0001

8678Z

0001

8678Z

0001

70821

0003

KJB40C

0004

 
 
The query is probably easier to understand if you look at it in the query design window, as shown in Figure 1. Since I was only interested in the count of rules, I converted the query to a Totals query that returned the number of components satisfied for each rule (and called it qryCountComponents):

199810_pv1
 

SELECT tblPart.Part, tblRules.RuleNumber,

  Count(tblPart.Part) AS CountOfPart

FROM (tblSalesDetail RIGHT JOIN tblRules

   ON tblSalesDetail.SalesOption =

               tblRules.SalesOption)

 INNER JOIN tblPart ON tblRules.RuleNumber =

                tblPart.RuleNumber

WHERE (tblSalesDetail.SalesOrderNumber Is Not Null

  AND tblRules.Negative=No)

OR (tblSalesDetail.SalesOrderNumber Is Null

  AND tblRules.Negative=Yes)

GROUP BY tblPart.Part, tblRules.RuleNumber;

 
 
The query that counts the components for each rule is simplicity itself and is called qryCountRules in the sample database:
 

Select RuleNumber, Count(RuleNumber)

From tblRules

 
 
I now had the two parts of my solutions: a count of the number of satisfied components for each part's rule and a count of the number of components in each rule. The query that combines these two queries and selects the parts is relatively straightforward. The query is called qryConfigure and matches parts from qryCountRules and qryCountComponents to select the rows where the number of satisfied components matches the number of components in a rule. I converted the query into a Make Table query so that it generated a table of parts:
 

SELECT qryCountComponents.Part

INTO tblParts

FROM qryCountRules INNER JOIN qryCountComponents

 ON qryCountRules.RuleNumber =

        qryCountComponents.RuleNumber

WHERE qryCountRules.CountOfRuleNumber =

       qryCountComponents.CountOfRuleNumber

 
 
With my solution in hand, I returned to the program, removed the original code, and added the code to run the query. The code that replaced the 500 lines of original code (and ran in less than five seconds) was:
 

Dim dbs As Database

Set dbs = CurrentDB()

dbs.querydefs("qryConfigure").Execute

 
 
Structure, structure, structure
While the SQL in this article is probably interesting, the real key to resolving this problem was restructuring the tables and recognizing the set nature of the solution. The key to restructuring the tables was recognizing that the original format for the rules violated the rules of normalization. Once the data was restructured, it was possible to replace complicated VBA code with a shorter, faster, and easier-to-maintain SQL solution. In addition, restructuring the table allowed the data in it to be used for better analysis and control.
 
When I originally restructured the tables, I was confident that cleaning up the data would lead to a better solution. I'll admit that I wasn't clear what that solution was, but I knew that getting the data design right was the first step. In the end, it produced a solution that not only gave users a faster program, it also gave them more insight into the data stored in their application.
 
Read about the download WSQLCS2.ZIP on this page
 
 
 
 
Sidebar: Foundation Concepts
SQL is the standard language used to retrieve and update data in relational databases like Access. In Access, when you create a query using the Access Query window, a SQL statement is prepared behind the scenes and used to execute your query. By selecting SQL view when building a query, you can view the SQL statement for a query or enter a SQL statement from scratch.The normalization rules for SQL tables describe a set of criteria that a well-designed relational table should meet.