Sunday, October 17, 2010

The Serial Comma

This morning I was wandering through Wikipedia when I stumbled on "The Oxford Comma". I looked it up and realized I had read about it before as the serial comma: the final comma in a list, placed just before the "and" or "or", whose use is controversial. In reading the arguments for and against, I realized that one argument for its use was left out, and that much of the ambiguity blamed on using it or not using it seems to come from inconsistent noun usage. Let me explain.

The argument for the serial comma that I would like to add is on behalf of the people who translate our human languages into computer languages. It is much clearer what is wanted of a computer when every comma is specified. For example: XYQ report needs names, jobs and titles. Does that mean two distinct fields (name and job/title) or three distinct fields (name, job, and title)? The programmer has to ask for clarification, which makes the original request less useful than it could have been.

One of the examples of ambiguity in the Wikipedia article was the following: They went to Oregon with Betty, a maid, and a cook. To a programmer, this statement mixes up different fields. The origin of the issue is not the serial comma but the mixing of Names with Jobs. In a database, there would be two fields: Names and Jobs. No one would set up a database that mixes the two in one field, because it would cause confusion and conflict (e.g., if you find someone's name but the field is already occupied by their title, where do you put the name?), and language should follow the clarity of such a database structure.

That is to say that the original statement should use all names, all jobs, or both. For example: They went to Oregon with Betty, a nurse; Jenny, a maid; and Jim, a cook. Or: They went to Oregon with Betty, a maid, and Jim, a cook. Or: They went to Oregon with Betty, who is both the maid and cook. These statements are clearly about three people, two people, and one person, respectively. The original statement was confusing because of the mix of descriptions: Names and Jobs.
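
To make the database analogy concrete, here is a minimal sketch in Python of the three rewrites as records with separate name and job fields. The Person record type and the count_people helper are my own inventions for illustration, not part of any real schema.

```python
from collections import namedtuple

# Hypothetical record type: one field for the name, one for the job.
Person = namedtuple("Person", ["name", "job"])

# "Betty, a nurse; Jenny, a maid; and Jim, a cook"
three_people = [
    Person("Betty", "nurse"),
    Person("Jenny", "maid"),
    Person("Jim", "cook"),
]

# "Betty, a maid, and Jim, a cook"
two_people = [Person("Betty", "maid"), Person("Jim", "cook")]

# "Betty, who is both the maid and cook": one name, two job rows
one_person = [Person("Betty", "maid"), Person("Betty", "cook")]


def count_people(records):
    """Count distinct names, since one person may hold several jobs."""
    return len({record.name for record in records})


for records in (three_people, two_people, one_person):
    print(count_people(records), "people:", records)
```

Once every value sits in its own field, the question of how many people made the trip answers itself.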

Back to the first part: In actual code, it is necessary to separate the items in a list with commas every time. (Of course, there usually isn't an "and" separating the items...) I wrote a query in SAS using spaces to separate items and then passed it to Oracle, but Oracle generated an error. Oracle required the commas, whereas SAS accepts spaces alone as separators. For example: I need the names of people who do the following jobs: journalist, lawyer, and accountant; so I would write the following criterion (let's ignore the possible issue of case-sensitivity):

where job in ('Journalist', 'Lawyer', 'Accountant')
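
I no longer have the SAS and Oracle code at hand, so here is a rough sketch of the same behavior using Python's built-in sqlite3 module as a stand-in for Oracle. The people table and its rows are made up for illustration; the point is only that this SQL parser accepts the comma-separated list and rejects the space-separated one.

```python
import sqlite3

# Hypothetical table and rows, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, job TEXT)")
conn.executemany(
    "INSERT INTO people (name, job) VALUES (?, ?)",
    [
        ("Ann", "Journalist"),
        ("Bob", "Lawyer"),
        ("Cy", "Accountant"),
        ("Di", "Cook"),
    ],
)


def names_for(where_clause):
    """Run the query with the given WHERE clause and return sorted names."""
    rows = conn.execute("SELECT name FROM people " + where_clause).fetchall()
    return sorted(name for (name,) in rows)


# Comma-separated list: parses and matches three of the four rows.
with_commas = names_for("where job in ('Journalist', 'Lawyer', 'Accountant')")

# Space-separated list: SQLite refuses to parse it.
try:
    names_for("where job in ('Journalist' 'Lawyer' 'Accountant')")
    spaces_rejected = False
except sqlite3.OperationalError:
    spaces_rejected = True

print(with_commas, spaces_rejected)
```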

Each item requires the comma separator. Now, had the report required the names of people who eat eggs, toast and jam, I would be confused. Would that be 'eggs', 'toast and jam' or 'eggs', 'toast', 'jam'? A better way to say it, whichever was meant, would be: eggs, toast, and jam; or eggs, and toast with jam. I would still have to figure out whether the database recorded 'toast with jam' or 'toast and jam' or some other variant ('jammed toast'?), but either revision would be much clearer.
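
Here is a small sketch of how a program could read such lists if the writer always used the serial comma. The split_serial_list function is hypothetical, and it assumes that every item is separated by a comma, so an "and" with no comma before it belongs to the item itself.

```python
import re


def split_serial_list(text):
    """Split a list written under a strict serial-comma convention.

    Assumption: commas alone separate items, so "toast and jam" with no
    comma inside it is treated as a single item.
    """
    items = [part.strip() for part in text.split(",")]
    # Drop the conjunction that introduces the final item, if present.
    items[-1] = re.sub(r"^(and|or)\s+", "", items[-1])
    return [item for item in items if item]


for request in (
    "eggs, toast and jam",
    "eggs, toast, and jam",
    "eggs, and toast with jam",
):
    print(request, "->", split_serial_list(request))
```

Under that convention the original wording can only mean two items, and each revision has exactly one reading. Without the convention, the program (like the programmer) has to guess.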

In the end, if the serial comma creates confusion, revise the statement altogether; otherwise, use the serial comma!

1 comment:

  1. Here's an excellent example of the need for the serial comma:

    "...the following terms are used interchangeably: table and data set, record, row, and observation, field and variable, and key and by variable."

    From "When Your PROC SQL and Oracle Joins Are Crushed by Data Monsters" (2007) by Frederick Houliang Li, MD.

    There's no better way to complete that list without using a semicolon.
