Monday, December 5, 2011

A SAS Macro to Convert Text to a Persistent ID

I was working on a project in which we usually carried forward a code and a description for every category to make it easier on coders to select a desired category using an ID rather than the long and sometimes-changing description. I found a particular category in which no ID existed, and there were no similar categories that I could use as the basis for a new ID. So what could I do?

In a previous position, I had the same problem crop up, and I solved the problem by doing the following:
  1. Extract one character at a time from the original categorical variable (e.g., 'A').
  2. Convert the character to its binary representation (e.g., '01000001').
  3. Convert the binary representation to the numeric value (e.g., 65).
  4. Sum the final value of each character (e.g., 'ABC' = 65 + 66 + 67 = 198).
Combined, these steps take up only three lines of code (excluding the surrounding DATA step code):

do _i_=1 to length(Category);
  ID + input(put(substr(Category, _i_, 1), $binary8.), binary8.);
end;

And the process can be run as a macro program:

Note: Run the code in test mode (e.g., %TextID(data, var, test=Y) ) to see each step broken out into separate columns.

The result is a persistent ID that will never change for a particular sequence of characters. Keep in mind, however, that the ID is not only character-sensitive but case-sensitive, so "Category" is not the same as "category". To avoid this problem, force the category into upper- or lowercase using the UPCASE or LOWCASE functions. Also note that because the ID is a simple sum, two different strings can produce the same value (e.g., "ABC" and "CBA"), so verify that the generated IDs are unique across your categories.
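For readers outside of SAS, the character-sum logic can be illustrated in a few lines of Python. This is a sketch of the same idea, not the macro itself; Python's ord gives the numeric character code directly, collapsing steps 2 and 3:

```python
def text_id(category: str) -> int:
    """Sum the numeric code of each character, mirroring the
    SAS $binary8./binary8. round trip (e.g., 'A' -> 65)."""
    return sum(ord(ch) for ch in category)

print(text_id("ABC"))  # 65 + 66 + 67 = 198
```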

Using this process, we were able to add a constant, unchanging ID for each category. The ID will not vary depending on the number of categories, the time at which the code is run, or any other variable (other than user-error, if the categories are typed by hand!).

If anyone uses this process, please let me know what you used it for! Oh, and it would be nice if you gave me credit, too!

Saturday, November 26, 2011

64bit GNU/Linux and Webcams (Logitech Quickcam Express)

I've been trying to get Skype to work on my home PC, which has 64-bit Debian GNU/Linux installed. I came across the following article, which, with a slight modification, worked perfectly.

Here are the instructions with the modification:
  1. As a superuser...
  2. For each 32-bit application, create a script in /usr/bin consisting of the following code.
  3. Name the code APP-camera-fix, where APP is the software to patch (e.g., skype-camera-fix).
  4. Mark the code as executable (I do this in Nautilus via the properties menu).
  5. Change each shortcut to the application to the script you just made (e.g., change the Skype shortcut to /usr/bin/skype-camera-fix).
  6. Test it out by opening Skype and testing the camera in the Options menu.

#!/bin/sh
LD_PRELOAD=/usr/lib32/libv4l/v4l1compat.so skype

The modification is to simply add "32" to the /usr/lib path. Works perfectly!

Wednesday, November 23, 2011

Check the SAS Log from a Windows Context Menu

Have you ever wanted to check a SAS log from the Windows context menu? Well, now you can with a new script I developed using my CheckLog macro. Here's how:
  1. Download the CheckLog SAS macro and save it to a permanent location.
  2. Download the CheckLogSendTo.txt file.
  3. Open the file and replace the value of the variable CheckLog with the location of the CheckLog SAS macro.
  4. Save the file.
  5. Change the extension of the file to "vbs".
  6. Create a shortcut for the CheckLogSendTo.vbs file. Rename it as "CheckLog".
  7. Copy the shortcut to the user's SendTo files here: C:\Documents and Settings\%USERNAME%\SendTo, replacing the %USERNAME% variable with the username in question.
  8. Right click on a file or directory to check. Go to Send To > CheckLog.
  9. Select Yes, No, or Cancel when asked whether you would like to add additional arguments.
    1. Yes = Add arguments
    2. No = Use the default arguments
    3. Cancel = Cancel the check
  10. If you said Yes above, enter the additional arguments you would like to use. See the Arguments page for more detail.
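Whatever the scripting language, the heart of a log check like this is a scan for suspicious message prefixes. Here is a minimal Python sketch of the idea; it is not the CheckLog macro itself, and the prefix list is illustrative (CheckLog checks for more messages than these):

```python
def check_log(text: str, prefixes=("ERROR", "WARNING")) -> list:
    """Return (line number, line) pairs for log lines that begin
    with any of the given message prefixes."""
    issues = []
    for num, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().upper().startswith(prefixes):
            issues.append((num, line))
    return issues
```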
Here are some screenshots of the script in action:

Note: VBScript is required, which is unfortunately only available on Microsoft Windows. I hope to develop a platform-independent script sometime. I chose VBScript because it required no additional installation or administrator rights on the platform I currently work on.

This script should save me a lot of time! Normally I would copy the directory and filename of the log into a SAS session and paste it into a CheckLog call. But now it's more automatic, and I get the results faster.

Please feel free to send me feedback about the script!

Tuesday, November 15, 2011

SAS and Ampersands

I realized I'm a SAS geek today. On my way to work, near the UW and VA hospitals, I saw a street sign that displayed "UW&VA" and I wondered, "What does &VA resolve to?"

Road sign: 'UW&VA Hospitals Parking Right Lane'
What does &VA resolve to?

So you know you're a SAS geek when you see a sign with an ampersand immediately followed by characters and wonder, "What does that resolve to?" Bad joke, go to your work library!

Seriously, though, this brings up a good topic: How do you resolve macro variables with special characters in the value? For example, suppose the macro variable parking_sign holds the value "UW&VA", assigned automatically in a DATA step using the CALL SYMPUT routine or in SQL using the INTO clause. Neither assignment generates an error in the log, but the first time the macro variable is resolved, SAS warns that it cannot resolve the macro variable VA.

There are a couple of solutions to solve this and avoid an error in the log:
  1. Define the macro variable VA as "VA" (i.e., %let va=VA;). This is a quick fix that allows the process to continue by creating a value for the macro variable that the code is mistakenly trying to look up. But it does not solve the problem completely, since new strings with ampersands or even percent signs could be introduced into your source data. This process could be automated to generate multiple macro variables for each string with an ampersand found, but it would not work for macro programs (which use percent signs).
  2. Edit the source data and mask the characters after the ampersand (or percent sign) with the %STR (or related) function. This solution would work, but it would be rather awkward, and your source data would have a strange-looking value (i.e. "UW&%str(V)A"), which might not be a good solution if you have to use the data in a different way later, in which the %STR function would not execute, leaving the odd-looking raw value. It gets even more complicated when percent signs are involved.
  3. Instead of resolving the macro variable with an ampersand, resolve it with the %SUPERQ function. This is a better solution, since it avoids any special-character problems and does not edit the source. Basically, it tells SAS to resolve only the first level of the macro variable, and no further. However, it is not as easy to implement, since it requires replacing all references to the macro variable (e.g., &PARKING_SIGN) with different code (i.e., %SUPERQ(PARKING_SIGN)).
  4. Use the %SUPERQ function with an additional step that redefines the PARKING_SIGN macro variable (see below). This is the best solution, since the rest of the code does not have to be re-written. The %LET statement is used to immediately redefine the macro variable using the %SUPERQ function, which will not resolve the values any further. Better yet, this is only one line of very simple code, and it fixes the problem forever!
/* Dummy data */
data test;
    format parking_sign $15.;
    infile datalines;
    input parking_sign $;
datalines;
UW&VA
;
run;

/* Generate macro variable */
data _null_;
    set test;
    call symputx('parking_sign', parking_sign);
run;

/* Use macro variable by writing it to the log. */
/* Note: Generates a (w)arning on the first run. */
%put &parking_sign;

/* Solution: Redefine macro variable */
%let parking_sign=%superq(PARKING_SIGN);

/* Use macro variable by writing it to the log. */
/* Note: This time there is no (w)arning. */
%put &parking_sign;

This last solution is by far the simplest and requires the least work to fix the problem.

For more on macro variable quoting (the %STR and %SUPERQ functions), see the SAS website:

Saturday, November 12, 2011

A New SAS Tool for Checking the Log

SAS just released a new tool for checking the SAS log:

SAS Log Error Checking Tool

It only works on Windows, requiring the .NET framework, and it appears to be a separate executable from the usual SAS program. Further, it seems to only check for ERROR and WARNING messages.

As I've written before, I have a macro, CheckLog, that checks the log, too, and it runs on any operating system, can be used within a current interactive SAS session, and it appears to check for more messages. (See the CheckLog About page for more documentation.)

If anyone tries out SAS's tool, I'd love to hear a comparison of the capabilities!

Thursday, October 20, 2011

The Name of This Blog

Today I received the latest sascom magazine and found an article about retaining analytical talent. One of the items under finding talent is "The skill set involves both sides of the brain". That's what I'm trying to write all about here!

Monday, October 17, 2011

Debian GNU/Linux and LibreOffice

I've upgraded my personal computer from Windows XP to Debian GNU/Linux, a modern, 64-bit operating system. The experience has been excellent, and you know you've done the right thing when you realize that over half of the software on your old system was there only because the OS either did something incompetently, didn't do it at all, or left a huge security hole if you went without the software.

Anyway, it's been great, but one thing has been driving me crazy. When I go to shutdown, I can't. The Gnome interface seems to just ignore me. I finally found out what the issue was, and I wanted to share it with everyone: The LibreOffice quickstarter is interfering with the ability to shutdown or logout. Disable it, and it should be fixed.

So there. Now if you have no idea what I've been talking about regarding Debian, go download a CD to try it out for free. If you've ever tried to keep a Windows OS up-to-date, you'll know how much of a relief Debian GNU/Linux is. Package management - you're my hero!

Monday, September 12, 2011

SAS Passwords Suck

There, I said it. Apparently anyone else who uses SAS passwords hasn't had the guts to say that. Here's why I say so:
  • SAS data set passwords must adhere to the rules for SAS names. These rules include:
    • The first character must be a letter or the underscore (1)
    • All other characters can be letters, numbers, or the underscore (1)
    • The length cannot exceed 8 characters (2)
  • Using PROC SQL, programmers can enter longer passwords than are acceptable, without any notification (unlike in the DATA step), and these passwords become truncated. This lets programmers think they are applying good passwords. Worse, if anyone other than the programmer who created the password has to use it, they will find it does not work and will not be able to continue with their work.
  • There is an option to encode passwords, but as the SAS documentation notes (3): "With encoding, one character set is translated to another character set through some form of table lookup. An encoded password is intended to prevent casual, non-malicious viewing of passwords. You should not depend on encoded passwords for all your data security needs; a determined and knowledgeable attacker can decode the encoded passwords." So I'm not quite sure what the point is of such an option, since it's easily hacked.
  • SAS data set passwords are typically stored as plain text within SAS code. For anyone who wants to get the password, find the source code. Usually the directory is not secured in the OS.
  • If the password is incorrect, a pop-up appears, which can be turned off (4). So a hacker could turn off the option and keep cycling through passwords until the correct one is reached.
  • Requiring passwords on data sets implies that other, standard systems of securing and backing up data are not trusted; furthermore, it conveys a level of distrust for the SAS programmers who are using the data sets. Other than human error, what is a well-meaning SAS programmer going to do with the data?
Someone out there might say, why not use data set generations (5)? (Generations are basically automatic copies of SAS data sets that have a limit on the number of copies.) There are several reasons against this:
  1. It's a waste of disk space, especially with large data sets, and especially if you have a good backup system.
  2. The generations are eventually deleted. Once an item reaches the maximum number, it's deleted. If that data set was important, why would you want to delete it, ever?
  3. It's confusing. There is no explicit explanation of what a generation may mean. It could be that one value was off in one generation so the data had to be re-run. It could be that the whole data set was the wrong one. Why would you want to keep a history of that? It's simply garbage.
So what's a SAS administrator to do? First, use the operating system to manage rights by folder access. If only certain people should access the SAS data sets and code, lock the files for only those users. Second, implement good OS password policies. Those passwords can be far better than SAS's! Third, back up hard drives automatically. Fourth, set a coding standard for making backups/copies in SAS before overwriting important data sets, which simply makes it easier to restore from a bad run (instead of finding the actual backup). (The DATASETS procedure is great for bulk copying and modifying SAS data sets.) Fifth, trust your programmers. If you can't trust your programmers, why are they working for you? Why do you give them important work? Do you really want to convey that to them? What does it do for their morale and self-worth?

I hope that by posting this, SAS realizes that their password mechanism is essentially broken and useless. Even a novice hacker could figure out how to break into such a system. Other options in the system don't really help, which leads back to the OS itself. And the OS probably has much better tools for managing access and passwords than SAS does.


Sunday, August 7, 2011

Writing a Portable SAS Macro

One aspect of writing macros for SAS is to try to make them portable. There are two aspects of this: First, writing the process in a manner that is generalizable so that there's a need for the macro to be portable; second, writing the program so it works on multiple platforms of SAS and operating systems (OS).

The first task I've always seemed to excel at. With almost 90 general-use macros hosted on my site, I have a macro for many useful, reusable processes. I've written code with these macros where most of the code is simply calls to these macros. They save a lot of time and repetitious code.

The second task, however, has always been a bit elusive, especially since I am unable to run my macros on different OSs. A SAS license would be required on each OS to do so, even if I had UNIX, z/OS, OpenVMS, and Windows installed where I could access them. This creates some significant problems with this last task. As a result, I have come up with a few strategies to get around the issue.

The first and most powerful is to re-write any OS-specific function or process in the native SAS language. This can be tricky since a simple process like obtaining a list of files in a directory can be a simple, one-line piece of code for the OS, but several steps (and sometimes more limited functionality) for SAS.

Another step is to simply modify the macro to run several different versions of the same thing depending on the OS. SAS has two automatic macro variables that define the operating system, SYSSCP and SYSSCPL, that can be used to run conditional code that will work in the OS. This is still tricky, since I have no way to test the code to see if it works. It helps to review help documentation, examples online, and forums for those OSs, but I'm still never 100% certain that it will work as expected.
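The SYSSCP/SYSSCPL branching described above has direct parallels in other languages; for comparison, here is a small Python sketch of the same pattern (the commands chosen are illustrative):

```python
import platform

def list_dir_command() -> str:
    """Choose a directory-listing shell command for the host OS,
    analogous to branching on &SYSSCP in a SAS macro."""
    system = platform.system()  # e.g., 'Windows', 'Linux', 'Darwin'
    return "dir" if system == "Windows" else "ls -l"
```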

For this last part, I rely on the users out there. I received several rounds of feedback recently on my CheckLog macro, which reviews the SAS log, an external log (even non-SAS), and an entire directory of logs, with several additional features to tweak how it runs and how it notifies the user about issues. (See the documentation for more detail.) These users were able to provide specific feedback about what didn't work, and they were able to test new versions on their system quickly. Through this process, I was able to modify the macro significantly, and now it runs on all SAS-supported OSs (UNIX, z/OS, OpenVMS, and Windows). And the users have a better utility that fits their needs with insignificant cost.

Using this model for developing macros, it could be possible to add to the standard SAS library with new and very useful processes. So if you're a SAS user, join in and contribute to the community by publishing general-use macros or by testing and providing feedback to others who have done so!

Tuesday, July 12, 2011

News and Advice

Last week, I started a new position with the Health Innovation Program in the School of Medicine and Public Health at the University of Wisconsin-Madison. In this position, I hope to be more involved in research that could potentially affect patient care. Ever since my days doing academic and educational research, I have missed being involved in the process.

Along with this transition, it appears I've started to become a help desk of sorts for SAS. I received two sets of emails from different users of my macros, and I thought I would start sharing those questions and answers here:

Question 1:

Is there a way to limit the number of observations read by PROC IMPORT? I tried using OPTIONS OBS=2 before proc import and it worked but still the reduction in time was not significant.

Answer 1:

It depends on the file, so you'd have to look at the IMPORT procedure documentation depending on what you're importing to see if that feature is available.

Where did you put the option? It must have been just above the output. If so, then yes, it wouldn't decrease the time, since SAS is still pulling in all of the records. Depending on the location of the option, SAS may only keep the specified number of records on its way out of the import process. Either way, SAS may need to read the entire file, which would limit any time savings.

What if you tried just pulling in the header record and delimiting it yourself? I'm guessing you're looking at comma-delimited, space-delimited, or tab-delimited data, and you should be able to figure out how to read the data in as one record, then split the variable names into multiple variables or even records. Then you could see if there are any new records.
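The suggestion above — pull in the header record and split the fields yourself, stopping before the rest of the file is scanned — can be sketched in Python (the comma delimiter and record count are assumptions):

```python
from itertools import islice

def peek_delimited(lines, delimiter=",", nrows=2):
    """Split the header plus only the first nrows records from an
    iterable of delimited lines (a list, or an open file handle),
    without consuming the rest of the input."""
    it = iter(lines)
    header = next(it).rstrip("\n").split(delimiter)
    rows = [line.rstrip("\n").split(delimiter) for line in islice(it, nrows)]
    return header, rows
```

Because an open file iterates line by line, passing a file handle here reads only the first nrows + 1 lines from disk.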

Question 2:

Kindly help me to create the folders using SAS codes in local drive or on server without using X-command.

Answer 2:

You can use either CALL SYSTEM within a data step or %SYSEXEC. You'll want to tinker with the XSYNC and XWAIT options as you see fit. Here are two examples:

data _null_;
    call system('md c:\TestDir');
run;

%sysexec md c:\TestDir;
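For comparison, the equivalent task in other languages often avoids shelling out entirely; a Python sketch:

```python
import os

def ensure_dir(path: str) -> None:
    """Create a directory (including any missing parents), doing
    nothing if it already exists -- no shell command involved."""
    os.makedirs(path, exist_ok=True)
```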

Monday, June 13, 2011

Unusual SAS Error Message

Today I encountered the following error messages in SAS after creating a table using SQL based on a SASHelp table:

ERROR: Floating Point Zero Divide.
ERROR: Termination due to Floating Point Exception

This strange set of errors is not well documented and, as I eventually found out, has nothing to do with my original query:

proc sql;
create table _test_ as
select * from sashelp.vtable
where upcase(libname)="&SPECIFIC_LIBNAME";
quit;
Where "SPECIFIC_LIBNAME" was a macro variable containing the libname I wanted information about.

The solution to this problem was to clear a particular library that consisted only of views. This library's members were generated prior to the above step, and may have somehow contributed to the problem within the SASHelp metadata. In general, it may be a good practice to review the libraries and their sizes when encountering problems with the SASHelp library, which contains a great deal of metadata on SAS libraries.

For this set of errors, however, this solution will probably not always work. It would appear that the problem is related to extensions of SAS that are dependent on external sources. In this case, it may have been the SQL procedure. This does seem strange, since SQL is a widely-used standard. Clearing the previously-created library may have cleared up an exception caused by a large amount of data in the SASHelp table that was not accounted for in the manner in which SAS implemented the SQL standard or the proprietary extensions of it.

If you encounter this error, please let me know! Especially if you know how to fix it!

Articles and Discussions Regarding these Errors:

Sunday, May 8, 2011

What About Gas Prices?

Recently there's been a lot of hubbub about gas prices. While high, gas prices are not as high as many other things people buy, especially some things people buy on a daily basis, like coffee. For my family, I can demonstrate that the cost of gas is not as big a concern and not as big a budget item. Using an online money-management tool, my credit and debit cards, my wife's credit and debit cards, and many other financial accounts are linked and the transactions pooled together and categorized. Using this tool, I graphed the entire year of spending in 2010 in seconds:

Note: "Everything Else" includes Gifts and Donations, Health and Fitness, Uncategorized (cash and some checks), Travel, Business Services, Personal Care, Pets, Entertainment, and Fees and Charges. I also separated "Gas and Fuel" from "Auto and Transport" to demonstrate my point. What's left in "Auto and Transport" includes Parking, Services and Parts, Auto Insurance, and "Other" (state transportation fees). Restaurants, coffee shops, and alcohol were included in Food and Dining, but I split those out into "Restaurants, etc.".

Our family probably does not spend like other families. We're generally fiscally conservative. We do like to buy good, healthy, and (when we can) local food, so our food prices are high. We don't splurge on electronics, entertainment, fancy phone or cable services, or personal care. We also minimize driving, which keeps auto costs lower. Both our drives to work are within 15 minutes. On the other hand we have a lot of educational debt. Even so, that's still a relatively small percentage of expenses. It should be no surprise that the top of the list is housing. That's probably true for just about anyone at any income level.

So what concerns me more? The cost of fuel or something else? Even as the cost of gas has risen, I'm not as concerned as I am about the cost of food, which is partly related to the cost of gas. However, there are more insidious factors in the cost of food: Wall Street.

As quoted by BoingBoing author Mark Frauenfelder, Frederick Kaufman wrote, in an article titled How Goldman Sachs Created the Food Crisis:
[T]he boom in new speculative opportunities in global grain, edible oil, and livestock markets has created a vicious cycle. The more the price of food commodities increases, the more money pours into the sector, and the higher prices rise. Indeed, from 2003 to 2008, the volume of index fund speculation increased by 1,900 percent. "What we are experiencing is a demand shock coming from a new category of participant in the commodities futures markets," hedge fund manager Michael Masters testified before Congress in the midst of the 2008 food crisis.

The result of Wall Street's venture into grain and feed and livestock has been a shock to the global food production and delivery system. Not only does the world's food supply have to contend with constricted supply and increased demand for real grain, but investment bankers have engineered an artificial upward pull on the price of grain futures. The result: Imaginary wheat dominates the price of real wheat, as speculators (traditionally one-fifth of the market) now outnumber bona-fide hedgers four-to-one.

Today, bankers and traders sit at the top of the food chain -- the carnivores of the system, devouring everyone and everything below. Near the bottom toils the farmer. For him, the rising price of grain should have been a windfall, but speculation has also created spikes in everything the farmer must buy to grow his grain -- from seed to fertilizer to diesel fuel. At the very bottom lies the consumer. The average American, who spends roughly 8 to 12 percent of her weekly paycheck on food, did not immediately feel the crunch of rising costs. But for the roughly 2-billion people across the world who spend more than 50 percent of their income on food, the effects have been staggering: 250 million people joined the ranks of the hungry in 2008, bringing the total of the world's "food insecure" to a peak of 1 billion -- a number never seen before.

As you can see in my spending graph, "Food and Dining" consumes 13% of our budget, higher than the quoted 8-12%. This category originally included restaurants, coffee shops, and alcohol, which I moved into "Restaurants, etc.", and was at 18.5% of spending. With these items removed, it seems much more reasonable, but still high. About 4 times higher than gas prices!

Given that our expenditure is already higher than the estimate in the article, and that food prices are likely to increase due to this investment scheme as well as increased transportation costs due to gas prices, I am much more concerned about the cost of food. Occasionally I joke that my son eats so much I might as well buy a farm to feed him, and that's not that far from the truth. We do partake in local farmer's markets and a community farm, but even so, those prices will probably be affected too.

So take a look at your own expenses and critically examine what's really costly. And once you figure that out, take action to reduce the costs however you can. For me, I'll choose to vote for politicians who push for financial reform and continue to buy as much local, organic food as I can.

Friday, April 29, 2011

Sound Advice for LinkedIn

The Job Doc has some good advice for how to behave on LinkedIn:
Avoid these 4 mistakes with LinkedIn

1) Never inviting anyone.

Hmm, this one's tempting. No one likes rejection; it forces you to relive high school. ... Last week, I invited 20 people to LinkedIn. Most accepted, a few ignored me. Nothing terrible happened.

2) Bad timing.

... Timing matters. Are your LinkedIn invitations too late, or too early? "Don't invite within two hours of meeting," says Rod Hughes, Director of Communications, Oxford Communications. "I typically wait till the next day," advises Rod. "Anything sooner seems stalker-esque."

3) Inviting everyone.

... "You need a policy," says Thom Singer, author of several networking books. "My policy," says Thom, "is The Coffee, Meal or Beer Rule, which means not accepting links unless I've had a real conversation."

4) Bad invitation.

At LinkedIn, the default invite is, "I'd like to add you to my professional network." But that's robotic. ... Tip: For better results, deliver a better invitation.

How do you go about inviting people into your network? Could you improve the way you go about it? I know I've used the default invitation far too often: usually because I'm in a hurry. Even a more personal default would be better than the LinkedIn default.

LinkedIn isn't Facebook either, where you can invite anyone on a whim. I've received a number of unwanted or unexpected invitations, some of which I probably should not have accepted given the "coffee, meal, or beer" rule. Those contacts I put in a "monitor" category, since I do not know these people very well and I'm not sure I want the association. Should their behavior be questionable, I remove the association with the contact.

What about you? How do you go about adding contacts?

Friday, April 8, 2011

Recipe Database

I've been working on a recipe database for my wife, to help her plan meals by being able to easily peruse recipes (especially favorite recipes) and associate those recipes with days and meals. I also wanted to be able to easily add recipes, ingredients, and grocery items. Here is the most recent relationship view of the database:

Each recipe is associated with a list of directions and a list of ingredients. Since ingredients are variations on specific items, they are not distinct grocery list items. For example, one recipe may call for 2 tomatoes and another may call for 3 tomatoes, and each would be a separate ingredient, but only one grocery item with a combined total of 5 units. Grocery items also have alternatives in a self-referential link (ALT_GROC_ID), which would be appropriate for recipes with "or"s in the ingredient list or legitimate alternatives (e.g., sea salt for salt).

Then there are the weekly plans, called "Lists". I realized this term may cause some confusion between grocery lists and plans, so I may change it in the future. For now, I'm focusing on completing the functionality of the database and not improvements to the delivery. Each list is associated with a beginning and end date ("from" and "thru" on the Lists table). Each list has several detail lines in the Lists_detail table that associates days with meals for the days between the beginning and end dates, inclusively. Each meal is then related to a recipe.

Currently the database exists in a Microsoft Access database, and eventually I would like to port it over to SQLite or something similar, with HTML forms. For now, there are several Access forms for data entry, and the initial "Start" form that opens with the database provides access to many of those forms.
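Since the eventual goal is a port to SQLite, the relationships described above can be sketched as a schema. Everything here except ALT_GROC_ID (the only column named in the text) is a guessed name, not the actual Access design:

```python
import sqlite3

# Illustrative schema for the recipe planner described above;
# table and column names other than ALT_GROC_ID are assumptions.
SCHEMA = """
CREATE TABLE grocery_items (
    groc_id      INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    alt_groc_id  INTEGER REFERENCES grocery_items(groc_id)  -- self-referential alternative
);
CREATE TABLE recipes (
    recipe_id  INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE ingredients (
    recipe_id  INTEGER REFERENCES recipes(recipe_id),
    groc_id    INTEGER REFERENCES grocery_items(groc_id),
    quantity   TEXT  -- e.g. '2 tomatoes'; a variation on one grocery item
);
CREATE TABLE lists (
    list_id    INTEGER PRIMARY KEY,
    date_from  TEXT,  -- "from" on the Lists table
    date_thru  TEXT   -- "thru" on the Lists table
);
CREATE TABLE lists_detail (
    list_id    INTEGER REFERENCES lists(list_id),
    day        TEXT,
    meal       TEXT,
    recipe_id  INTEGER REFERENCES recipes(recipe_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```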

The top section of buttons appends new plans to the Lists table with different start dates. For a typical week, a user might start planning on Saturday for the next week Sunday - Saturday and the following week Sunday - Saturday, and would thus start two new lists using the Tomorrow and "Week from Tomorrow" buttons. The two new lists would be associated with today's date, which is displayed in the upper right hand corner.

Once the new lists are initiated, the user can open them by using the "Today's Plans" button in the "Open..." section. The user can also open recipes for viewing and the grocery list based on today's plans. Eventually I'm going to switch the grocery list from a view-only report to a form based on an append query, which would keep track of the lists through time and build an inventory of the kitchen. (This is where the confusion with "lists" would come into play.) Also from the start form, the user can add or edit recipes, add or edit grocery items, or delete plans.

The main function of the database is to help the user plan for a week or two of meals by associating days and meals with recipes and ultimately generating a list of grocery items to use while shopping. Here is the plan entry form (click for a larger view) that opens when the "Today's Plans" button is used:

The list date range is the first section, with arrows allowing the user to switch lists. Which lists are navigable depends on how the user opened the form from the buttons in the start form. The user then uses the drop-down boxes to select days and meals. Next, the user can use the bottom part of the form to look up recipes. Say the user wants a rice dish, so the user clicks on the binoculars and searches the Name field for "rice". Currently there are fields for descriptions, authors, origins, and URLs for recipes, which could also be added to the form to help the user find a particular recipe.

The ingredients and directions are listed to help the user figure out what would be appropriate to prepare for any given day's plan. Once the recipe is selected, the user enters the recipe ID next to the meal. Alternatively, the user can enter something other than a recipe for a meal, for example, "go out to eat with friends" or "order pizza". Once the weekly plan(s) are complete, the user can return to the start form and open the grocery list report, which is organized by the location the items are found in the store, like produce, meats, aisles, etc.:

There is a lot of work still to do. As I mentioned, I have to figure out how to append the grocery lists to a list table to maintain a history. Then, I need to enter a lot of recipes. The recipes currently in the database were entered using a SAS program, borrowing my employer's SAS server to process it. The next recipes will be entered using the new form as a test run of how well it works. I expect I will revise the form quite a bit to make it flow better. I also have to figure out how to merge disparate measurements, e.g., cups and ounces, and display only a total quantity on the grocery list. Some merges will not be possible, as the above image shows for kosher salt: 7.5 tsp and "to taste". (Note the odd "40 taste" - this is a conversion of the ingredient "8 dashes of hot sauce", which is unquantifiable, and the dummy records I entered had 5 of the same recipe, thus "40 taste".) Then there's the conversion into a "better" database tool and using web forms instead of these proprietary-format forms.
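As a rough sketch of the measurement-merging logic I have in mind (a hypothetical function and conversion table, not code from the database): convert each quantity to a common base unit where a conversion exists, sum the results, and keep unconvertible units (like "taste") as separate line items:

```python
# Hypothetical sketch: merge ingredient quantities onto a common base unit.
# Conversion factors to teaspoons; units without an entry (e.g., "taste")
# cannot be merged and are totaled separately.
TO_TSP = {"tsp": 1.0, "tbsp": 3.0, "cup": 48.0, "oz": 6.0}

def merge_quantities(entries):
    """entries: list of (quantity, unit) pairs; returns {unit: total}."""
    totals = {}
    for qty, unit in entries:
        if unit in TO_TSP:
            totals["tsp"] = totals.get("tsp", 0.0) + qty * TO_TSP[unit]
        else:
            totals[unit] = totals.get(unit, 0.0) + qty
    return totals

# The kosher salt example: 1 tbsp + 4.5 tsp merges to 7.5 tsp,
# while "to taste" stays its own line item.
print(merge_quantities([(1, "tbsp"), (4.5, "tsp"), (1, "taste")]))
```

A fuller version would need per-ingredient conversion tables (a "cup" of salt and a "cup" of flour weigh very differently), which is part of why this is still on the to-do list.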

For now, though, I am very happy that the database, although not fully populated with recipes, is usable and can save the user (just my wife, for now) a lot of time.

Friday, March 18, 2011

Wednesday, March 16, 2011

SAS Macros versus Python Functions, Part 1

I recently started reading Dive Into Python, a primer to the programming language Python (which I converted to MOBI and loaded onto my Kindle - available here), and I was delightfully surprised at how similar Python functions are to SAS macros. Here are some examples:

First, the very first line of a function or macro, the line that defines it as such, is similar in the two languages. In Python, one would write:

def buildConnectionString(params):

In SAS, it is nearly identical:

%macro buildConnectionString(params);

The terms def and %macro begin the code by stating what follows: a function definition or a macro definition. "buildConnectionString" is the name of the function/macro, and enclosed in parentheses are its parameters (arguments), separated by commas.
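To round out the Python side, here is what a complete definition and call might look like. The body below (joining dictionary entries into a "key=value" string, as the name suggests) is my own sketch of the book's example, not necessarily its exact code; on the SAS side, the macro body would similarly be closed with a %mend statement.

```python
def buildConnectionString(params):
    """Build a connection string from a dictionary of parameters.

    Returns a string."""
    # Join each key/value pair as "key=value", separated by semicolons.
    return ";".join("%s=%s" % (k, v) for k, v in params.items())

myParams = {"server": "mpilgrim", "database": "master"}
print(buildConnectionString(myParams))
```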

Second, it is often a good idea to have some documentation on the function/macro. In Python, the documentation is stored in a doc string; in SAS, a very short and not as useful description string is available:

def buildConnectionString(params):
    """Build a connection string from a dictionary of parameters.

    Returns a string."""


%macro buildConnectionString(params) / des="Build connection string. Returns string.";

UPDATE: I'm not sure where I learned the 40-character limit mentioned below, but I just read in the SAS help documentation that the limit is 256 characters. Nonetheless, reading the description is still limited by screen space, so the alternatives I mentioned here are still very useful.

Unfortunately, SAS macros are limited to 256 characters in the description (not 40, as I originally wrote; see the update above). This can be worked around by accepting HELP as a valid value for the first argument, which then outputs additional information about the macro. My macros are documented in a spreadsheet, and I have a macro (ArgList) that reads in the spreadsheet and outputs additional details about the arguments when I need them. One day, I would like to convert all of my macros to use a standard header that describes the arguments, the macro's purpose, and other details in a consistent way, so I can modify the ArgList macro to read the macro itself instead of external documentation. In Python, though, the doc string can be viewed on the fly whenever you have a reference to the function. That's incredibly useful, and much more flexible than SAS.
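To illustrate the Python side: the doc string is attached to the function object itself, so it can be inspected at any time through the __doc__ attribute or the built-in help() function, with no external documentation needed:

```python
def buildConnectionString(params):
    """Build a connection string from a dictionary of parameters.

    Returns a string."""
    # Body omitted for brevity; only the doc string matters here.
    pass

# Inspect the doc string on the fly, without opening the source file:
print(buildConnectionString.__doc__)
# help(buildConnectionString) pretty-prints the same information,
# along with the function's signature.
```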

One area of difference is the syntax and the meaning of whitespace: in Python, indentation is significant, whereas in SAS it generally is not. Even so, I typically indent my SAS code just as Python requires (in most cases; I also try to maintain readability).
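A small illustration of significant whitespace (my own example, not from the book): in Python, the indentation alone determines where a block ends, with no explicit end statement like SAS's %end or %mend:

```python
def classify(n):
    # The indented lines belong to the if/else blocks;
    # dedenting back out is what ends each block.
    if n % 2 == 0:
        result = "even"
    else:
        result = "odd"
    return result  # dedented: no longer inside the if/else

print(classify(4))  # even
print(classify(7))  # odd
```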

Once I learn more about Python, I'll post more comparisons!

Thursday, February 3, 2011

The Powerful SAS Macro

In SAS, there are three main types of code: the DATA step, procedures (PROC), and the macro language. The macro language is incredibly powerful. It allows users to create dynamic programs, either by conditional logic, iterations, or a combination. A macro program can be as simple as running the same exact DATA step the same way every time, or as dynamic as generating custom code based on the input. (See here and here for more information on the macro language. For some examples of general-use macros, see my macro site.)

As a visual demonstration of the power of the SAS macro language, here is a comparison of SAS code and log output, printed on paper with a similar format:

SAS code (33 pages), including project-specific macros (13 pages)
(Courier New font, size 10; 1.5 line spacing; .5" margins)

SAS log from above code (391 pages)
(Courier New font, size 10; original line spacing; .5" margins)

(Note: The SAS log is a record of what SAS has done. When code is run, it documents input and output record counts, issues like warnings and errors, run time, and other information. See this page for more information.)

The code was formatted with 1.5 line spacing, although the line spacing in the log was left as-is. Additionally, the code called general-use macros (macros not specific to this project) whose definitions were not included in the printed code, but whose generated output does appear in the printed log. For example, the DUPCHECK macro, used for identifying and resolving duplicates, was used often. Called as a single line of code, it generates, in the simplest case, 42 lines of code. (For an example of its use, see code and log.)

Even so, the ability to do so much work with so few lines of code is incredibly powerful and efficient. I am able to save time and energy, make processing more efficient and effective, simplify and standardize code, and share processes with other users. If you are a SAS user and you do not know the macro language, I encourage you to start learning!

Saturday, January 22, 2011

Kindle Review from a Newbie

I received a Kindle for my birthday last month, and ever since, I have had it with me quite a bit. I've read a number of books, online articles, downloaded articles/books (hello Gutenberg project!), personal documents, PDFs, and alternate ebook formats (e.g., text, Mobi). I've also viewed images, listened to music, played games, and listened to articles with the text-to-speech feature.

My first impression? AWESOME. This is one of the most useful devices I own, and it has helped me face a mounting task: to read that which I "should" read, including articles for work, research articles, random pages online, Wikipedia entries, classics I should have read in school, and simply books I've been meaning to read for a long time.

Further, the device is extremely useful for work. I don't have to print out long documents for meetings: I simply load them onto the Kindle. (I don't have the 3G version, which is fine by me. I don't want to pay extra fees for content delivery, and the USB cable is easy enough to tote to work, where there's no WiFi.) I maintain a project list as a text file with current items on the first page and completed items on the second page. Once it fills up, I plan to archive it for review purposes.

The capability to search, easily navigate a document, and have a document anywhere has been immensely useful, especially during critical moments in a meeting when someone says, "I think that's documented in that XYZ document that I've either printed out a 150-page version of and will take me 15 minutes to find or I did not print it or I left it at my desk, so none of us will know the answer until later." Just yesterday we were fumbling around trying to figure out the next date we should meet, when we should run a process, and when we should communicate by email or phone. I opened up a PDF calendar I loaded and it made planning much easier.

Of course, this could be done with just about any e-reader, right? Yes, but the Kindle has that iPod-like cool and ease-of-use. The ease-of-use is really the big advantage. The Kindle loads quickly, turns pages quickly (except in some PDFs, especially those with too many images), has an excellent organization method ("collections", where a book can be located in multiple collections), a "newspaper" feature with auto-archiving of old "editions", and all the other features I mentioned above. It's also comfortable to hold and it has the potential for more capabilities. For example, it has a microphone with no known current use, but I imagine that voice commands (e.g., "turn page") would be an excellent addition, especially for people with disabilities.

People complain that the Kindle has a locked-down ebook format, but so do the other e-readers, including the Nook, which claims it doesn't. Barnes and Noble uses a more open standard, but it still locks the books down with DRM, making them practically inaccessible to other e-readers. And frankly, there are ways around these limitations, as any savvy computer geek can find a way to break DRM. (Hopefully, those people are being honest and not selling or distributing copies.)

Have I had any hiccups? Yes. First and foremost: DO NOT BUY THE NON-LIGHTED COVER BY AMAZON!!! Due to its construction, it causes the Kindle to short and restart, seemingly at random. It is not the Kindle. It is the cover! The lighted cover is fine, in fact, I exchanged the non-lighted cover for the lighted cover, and I haven't had any problems with it restarting since. (Caution: The lighted cover costs a bit more, almost half of the cost of the Kindle!)

Additionally, I've had some odd problems with PDFs. Sometimes viewing PDFs with a lot of images makes the page turns very slow. PDFs maintain the format, so they are often hard to read in the default orientation. Rotating the orientation to landscape helps with that problem. It's possible to load images onto the Kindle, which is a hidden, experimental feature. The images look similar to the screen savers, and are a bit slow to navigate as well. Simply put, it's not a device built for handling images (yet). So really not too many drawbacks directly related to the Kindle!

Overall, it's a great device. Easy to use, flexible, and quite frankly, beautiful. My wife says, "It's not beautiful - it's sexy." Please note: she said that, not me.