ONE of the benefits of editing a journal about Access is that I get to have conversations with the very clever people who write articles for us. Typically, in the process I get smarter. Russell Sinclair and I had one of those discussions where I learned (once more) that I’m perfectly capable of missing the most obvious solution to a problem.
In Russell’s article in this month’s issue, he describes two of the reasons that you might end up with a Null value in a field. In his article, Russell points out that the SpousesName field in an employee record could contain a Null value. Russell identifies two reasons for Null values: the employee might not have a spouse, or, at the time the data was entered, the spouse’s name wasn’t known.
Nulls give most database developers a lot of trouble. You can’t compare Nulls to anything, for instance. If you have 31 employee records and one of them has Null in the SpouseName field, almost any Where clause that uses the SpouseName field will fail. If, for instance, you run a query that asks for all the rows with the SpouseName field beginning with letters less than “M,” equal to “M,” or greater than “M,” you’ll never find more than 30 records. The single record with Null in the SpouseName field won’t appear in any of the sets. Only if the SpouseName field is tested with the “Is Null” operator will the record be found.
Numeric fields with Nulls in them can drive developers crazy. A numeric field with Null in it won’t be less than, equal to, or greater than any other number. This seems foolish since you’d think that a Null has to be either greater than or less than zero (assuming that Null isn’t equal to zero). If you add a numeric field with Null in it to another field with an actual numeric value, the result is... Null. However, if you use the built-in function Sum to total up the rows (including the Null value), the result is... the total of all the non-Null fields. If you use the Avg function, the result is the average of all the nonNull values.
It gets worse. If you use the Count function with an asterisk (for instance, Count(*)),you get the number of rows in the table returned. If, on the other hand, you use the Count function with a field name (for example, Count(BDate)), you get the number of rows that don’t have a Null value in that field. By the way, Count(*) returns the number of rows in the table even if all the fields in the row have Nulls in them (something that shouldn’t happen because the table should have a primary key, and the fields in a primary key aren’t allowed to hold Nulls). The SQL guru C. J. Date once commented that when discussing Null you can either be consistent or logical, but not both.
I restrict Null to occasions when I know that a value exists but don’t know what it is. If you use Null exclusively to mean “value exists but is currently unknown,” then much of the behavior that I’ve described makes sense. When you add two numbers together and one of them is Null, the result (of course) is Null. After all, if you add two numbers together and the value of one of them is unknown, the value of the addition will also be unknown. Since you don’t know the value of the number, you don’t know whether it’s less than, greater than, or equal to zero. The rule forbidding Nulls in primary keys also becomes clear if Nulls represent “value exists but is currently unknown.” Since the primary key is supposed to provide a unique identifier for each row, the values in the primary key must be known. If the values in the primary key fields aren’t known, they might match the values for some existing row and no longer be unique.
Unfortunately, the result of the various summary functions (Sum, Average, Count) still don’t make any sense, as they seem to just arbitrarily ignore the rows with Null values in the relevant fields.
So now I was left with a question: How do you handle Russell’s other case, the employee who doesn’t have a spouse? I puzzled over this and decided that I’d have to put a zero-length string in the field—a dumb answer. I was smart enough to go on to consider what to do if the spouse’s age was also recorded in the employee record. With numerics, there’s no equivalent to the zerolength string. I thought the case was interesting enough to send my ruminations to Russell.
Fortunately, Russell was smarter than I. I got an e-mail back pointing out that the spousal information really belonged in a separate table. With the spouse’s name and age in another table, an employee who didn’t have a spouse wouldn’t have a record in that table. Where the name of the spouse (or the age) wasn’t known, the appropriate field could be set to Null (which meant that neither field could be part of the primary key). In fact, as Russell pointed out, this design would even support those cultures where an employee could have multiple spouses. So, while I started with the wrong idea, getting to the right design not only solved my problem; it opened up new possibilities as well.
Other Pages on the Site You May Like to Read