Monday, December 5, 2011

A SAS Macro to Convert Text to a Persistent ID

I was working on a project in which we usually carried forward a code and a description for every category to make it easier on coders to select a desired category using an ID rather than the long and sometimes-changing description. I found a particular category in which no ID existed, and there were no similar categories that I could use as the basis for a new ID. So what could I do?

In a previous position, I had the same problem crop up, and I solved the problem by doing the following:
  1. Extract one character at a time from the original categorical variable (e.g., 'A').
  2. Convert the character to its binary representation (e.g., '01000001').
  3. Convert the binary representation to the numeric value (e.g., 65).
  4. Sum the final value of each character (e.g., 'ABC' = 65 + 66 + 67 = 198).
Combined, these steps only take up 3 lines of code (excluding the data step code):

do _i_=1 to length(Category);
  ID + input(put(substr(Category, _i_, 1), $binary8.), binary8.);

And the process can be run as a macro program:

Note: Run the code in test mode (e.g., %TextID(data, var, test=Y) ) to see each step broken out into separate columns.

The result is a unique ID that will never change for the particular configuration of characters. Keep in mind, however, that the ID is not only character-sensitive, but case sensitive, so "Category" is not the same as "category". To avoid this problem, force the category into upper- or lower-case using the UPPER or LOWER functions.

Using this process, we were able to add a constant, unchanging ID for each category. The ID will not vary depending on the number of categories, the time at which the code is run, or any other variable (other than user-error, if the categories are typed by hand!).

If anyone uses this process, please let me know what you used it for! Oh, and it would be nice if you gave me credit, too!