Monday, September 12, 2011

SAS Passwords Suck

There, I said it. Apparently anyone else who uses SAS passwords hasn't had the guts to say that. Here's why I say so:
  • SAS data set passwords must adhere to the rules for SAS names. These rules include:
    • The first character must be a letter or the underscore (1)
    • All other characters can be letters, numbers, or the underscore (1)
    • The length cannot exceed 8 characters (2)
  • Using PROC SQL, programmers can enter passwords longer than the maximum without any notification (unlike in the DATA step), and those passwords are silently truncated. This leads the programmer to believe they have applied a strong password. Worse, anyone other than the creating programmer who later tries the full password will find it fails and will be unable to continue their work.
  • There is an option to encode passwords, but as the SAS documentation notes (3): "With encoding, one character set is translated to another character set through some form of table lookup. An encoded password is intended to prevent casual, non-malicious viewing of passwords. You should not depend on encoded passwords for all your data security needs; a determined and knowledgeable attacker can decode the encoded passwords." So I'm not quite sure what the point is of such an option, since it's easily hacked.
  • SAS data set passwords are typically stored as plain text within SAS code. Anyone who wants a password need only find the source code, and the directory containing it is usually not secured at the OS level.
  • If the password is incorrect, a dialog box appears, and that prompt can be turned off (4). So a hacker could disable the option and simply cycle through candidate passwords until the correct one is found.
  • Requiring passwords on data sets implies that other, standard systems of securing and backing up data are not trusted; furthermore, it conveys a level of distrust for the SAS programmers who are using the data sets. Other than human error, what is a well-meaning SAS programmer going to do with the data?
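To illustrate the truncation issue, here is a minimal sketch (the table and password are hypothetical); PROC SQL accepts the long password without complaint and silently keeps only the first 8 characters:

```sas
/* Assign a password longer than 8 characters -- no warning is issued */
proc sql;
    create table work.secure (alter=LongPassword123) as
    select * from sashelp.class;
quit;

/* The full password no longer works; only the truncated
   first 8 characters ("LongPass") unlock the table */
proc datasets library=work nolist;
    delete secure (alter=LongPass);
quit;
```

Anyone handed the "real" password LongPassword123 would be locked out until they guessed that only the first 8 characters count.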
Someone out there might say, why not use data set generations (5)? (Generations are basically automatic copies of SAS data sets that have a limit on the number of copies.) There are several reasons against this:
  1. It's a waste of disk space, especially with large data sets, and especially if you have a good backup system.
  2. The generations are eventually deleted. Once an item reaches the maximum number, it's deleted. If that data set was important, why would you want to delete it, ever?
  3. It's confusing. There is no explicit explanation of what a given generation means. It could be that one value was off in one generation so the data had to be re-run, or that the whole data set was the wrong one. Why would you want to keep a history of that? It's simply garbage.
So what's a SAS administrator to do? First, use the operating system to manage rights by folder access. If only certain people should access the SAS data sets and code, lock the files to only those users. Second, implement good OS password policies. Those passwords can be far better than SAS's! Third, back up hard drives automatically. Fourth, set a coding standard for making backups/copies in SAS before overwriting important data sets, which simply makes it easier to restore from a bad run (instead of finding the actual backup). (The DATASETS procedure is great for bulk copying and modifying SAS data sets.) Fifth, trust your programmers. If you can't trust your programmers, why are they working for you? Why do you give them important work? Do you really want to convey that to them? What does it do for their morale and self worth?
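As a sketch of that fourth point, the DATASETS procedure can copy data sets in bulk before a risky run (the library and member names here are hypothetical, and a BACKUP libref is assumed to exist):

```sas
/* Copy selected data sets to a backup library before overwriting them */
proc datasets library=mylib nolist;
    copy out=backup;               /* destination libref for the copies  */
    select important_data;         /* omit SELECT to copy every member   */
quit;
```

Restoring from a bad run is then just a copy in the other direction.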

I hope that by posting this, SAS realizes that their password mechanism is essentially broken and useless. Even a novice hacker could figure out how to break into such a system. Other options in the system don't really help, which leads back to the OS itself. And the OS probably has much better tools for managing access and passwords than SAS does.

References
  1. http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001028606.htm
  2. http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000998953.htm#a000998960
  3. http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a003166704.htm
  4. http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000995314.htm
  5. http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000243174.htm

Sunday, August 7, 2011

Writing a Portable SAS Macro

One goal in writing macros for SAS is to make them portable. There are two aspects to this: first, writing the process in a manner general enough that the macro is worth porting at all; second, writing the program so it works across SAS platforms and operating systems (OSs).

The first task I've always seemed to excel at. With almost 90 general-use macros hosted on my site, I have a macro for many useful, reusable processes. I've written code with these macros where most of the code is simply calls to these macros. They save a lot of time and repetitious code.

The second task, however, has always been a bit elusive, since I am unable to run my macros on different OSs: even if I had UNIX, z/OS, OpenVMS, and Windows machines available, a SAS license would be required on each one. As a result, I have come up with a few strategies to work around the issue.

The first and most powerful is to re-write any OS-specific function or process in native SAS. This can be tricky, since a simple task like obtaining a list of files in a directory may be a one-line command in the OS but several steps (with sometimes more limited functionality) in SAS.
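For example, here is a rough sketch of listing the files in a directory using only portable SAS functions, rather than shelling out to dir or ls (the path is hypothetical):

```sas
data work.filelist;
    length filename $256;
    drop rc did i;
    rc  = filename('mydir', '/some/path'); /* point a fileref at the directory */
    did = dopen('mydir');                  /* open the directory               */
    if did > 0 then do;
        do i = 1 to dnum(did);
            filename = dread(did, i);      /* read each member name            */
            output;
        end;
        rc = dclose(did);
    end;
run;
```

Because DOPEN, DNUM, DREAD, and DCLOSE are part of the SAS language itself, this works the same way on every OS that SAS supports.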

Another approach is to have the macro run a different version of the same step depending on the OS. SAS provides two automatic macro variables that identify the operating system, SYSSCP and SYSSCPL, which can be used to run conditional, OS-specific code. This is still tricky, since I have no way to test that code myself. It helps to review the documentation, online examples, and forums for those OSs, but I'm still never 100% certain that it will work as expected.
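As a sketch, conditional code keyed off SYSSCP might look like the following (the macro name and file argument are illustrative, and the XCMD system option is assumed to be in effect):

```sas
%macro os_delete(file);
    /* Branch on the automatic OS macro variable */
    %if &sysscp. = WIN %then %do;
        %sysexec del /q "&file.";   /* Windows command */
    %end;
    %else %do;
        %sysexec rm -f "&file.";    /* UNIX-style command */
    %end;
%mend os_delete;
```

Each branch is easy enough to write; the hard part, as noted above, is verifying the branches you cannot run yourself.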

For this last part, I rely on the users out there. I recently received several rounds of feedback on my CheckLog macro, which reviews the SAS log, an external log (even non-SAS), or an entire directory of logs, with several additional features to tweak how it runs and how it notifies the user about issues. (See the documentation for more detail.) These users provided specific feedback about what didn't work, and they were able to test new versions on their systems quickly. Through this process, I was able to modify the macro significantly, and now it runs on all SAS-supported OSs (UNIX, z/OS, OpenVMS, and Windows). And the users have a better utility that fits their needs, at insignificant cost.

Using this model for developing macros, it could be possible to add to the standard SAS library with new and very useful processes. So if you're a SAS user, join in and contribute to the community by publishing general-use macros or by testing and providing feedback to others who have done so!

Tuesday, July 12, 2011

News and Advice

Last week, I started a new position with the Health Innovation Program in the School of Medicine and Public Health at the University of Wisconsin-Madison. In this position, I hope to be more involved in research that could potentially affect patient care. Ever since my days doing academic and educational research, I have missed being involved in the process.

Along with this transition, it appears I've started to become a help desk of sorts for SAS questions. I received two sets of emails from different users of my macros, and I thought I would start sharing those questions and answers here:

Question 1:

Is there a way to limit the number of observations read by PROC IMPORT? I tried using OPTIONS OBS=2 before proc import and it worked but still the reduction in time was not significant.

Answer 1:

It depends on the file, so you'd have to look at the IMPORT procedure documentation depending on what you're importing to see if that feature is available.

Where did you put the option? It must have been just above the output. If so, then yes, it wouldn't decrease the time much, since SAS is still pulling in all of the records; depending on the location of the option, SAS may only limit the number of records kept on the way out of the import process. Either way, SAS may need to read the entire file, which limits any reduction in time.

What if you tried just pulling in the header record and delimiting it yourself? I'm guessing you're looking at comma-delimited, space-delimited, or tab-delimited data, so you should be able to read the header in as one record, then split the variable names into multiple variables or even records. Then you could see whether there are any new records before importing the whole file.
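For reference, the option placement I had in mind looks like this (the file name is hypothetical, and how much the OBS= option helps depends on the file type):

```sas
options obs=2;   /* set BEFORE the import so it applies to incoming records */
proc import datafile='c:\data\big_file.csv'
    out=work.sample
    dbms=csv
    replace;
    getnames=yes;
run;
options obs=max; /* reset so later steps process all records */
```

Forgetting the final reset is a classic mistake: every subsequent step would also be limited to 2 observations.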

Question 2:

Kindly help me to create the folders using SAS codes in local drive or on server without using X-command.

Answer 2:

You can use either CALL SYSTEM within a DATA step or the %SYSEXEC macro statement (both require that the XCMD system option be in effect). You'll want to tinker with the XSYNC and XWAIT options as you see fit. Here are two examples:

data _null_;
    call system('md c:\TestDir');
run;

OR

%sysexec md c:\TestDir;
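If OS commands are disabled on your system (NOXCMD), the DCREATE function is a portable, pure-SAS alternative; this is only a sketch, and the directory names are examples:

```sas
data _null_;
    /* DCREATE(new-directory-name, parent-directory) returns the
       full path of the new directory, or blank on failure */
    new_dir = dcreate('TestDir', 'c:\');
    put new_dir=;
run;
```

Because DCREATE is a SAS function rather than an OS command, the same code works on any SAS-supported OS once the parent path is adjusted.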

Monday, June 13, 2011

Unusual SAS Error Message

Today I encountered the following error messages in SAS after creating a table using SQL based on a SASHelp table:

ERROR: Floating Point Zero Divide.
ERROR: Termination due to Floating Point Exception

This strange set of errors is not well documented and, as I eventually found out, has nothing to do with my original query:

proc sql;
create table _test_ as
select * from sashelp.vtable
where upcase(libname)="SPECIFIC_LIBNAME"
;
quit;

Here, "SPECIFIC_LIBNAME" stands in for a macro variable reference containing the libname I wanted information about.

The solution to this problem was to clear a particular library that consisted only of views. This library's members were generated prior to the above step, and may have somehow contributed to the problem within the SASHelp metadata. In general, it may be a good practice to review the libraries and their sizes when encountering problems with the SASHelp library, which contains a great deal of metadata on SAS libraries.
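For what it's worth, clearing every member of a library can be done with the DATASETS procedure's KILL option; this is a sketch with a hypothetical libref, and note that KILL deletes every member without prompting:

```sas
/* Delete all members (including views) of the VIEWLIB library */
proc datasets library=viewlib kill nolist;
quit;
```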

For this set of errors, however, this solution will probably not always work. The problem appears to be related to extensions of SAS that depend on external sources; in this case, it may have been the SQL procedure. That does seem strange, since SQL is a widely used standard. Clearing the previously created library may have cleared up an exception caused by a large amount of data in the SASHelp table that was not accounted for in the way SAS implemented the SQL standard or its proprietary extensions.

If you encounter this error, please let me know! Especially if you know how to fix it!

Sunday, May 8, 2011

What About Gas Prices?

Recently there's been a lot of hubbub about gas prices. While high, gas prices are not as high as many other things people buy, especially some things people buy on a daily basis, like coffee. For my family, I can demonstrate that the cost of gas is not as big a concern and not as big a budget item. Using Mint.com, my credit and debit cards, my wife's credit and debit cards, and many other financial accounts are linked and the transactions pooled together and categorized. Using this tool, I graphed the entire year of spending in 2010 in seconds:

[Chart: 2010 spending by category, generated from Mint.com]

Note: "Everything Else" includes Gifts and Donations, Health and Fitness, Uncategorized (cash and some checks), Travel, Business Services, Personal Care, Pets, Entertainment, and Fees and Charges. I also separated "Gas and Fuel" from "Auto and Transport" to demonstrate my point. What's left in "Auto and Transport" includes Parking, Services and Parts, Auto Insurance, and "Other" (state transportation fees). Restaurants, coffee shops, and alcohol were included in Food and Dining, but I split those out into "Restaurants, etc.".

Our family probably does not spend like other families. We're generally fiscally conservative. We do like to buy good, healthy, and (when we can) local food, so our food prices are high. We don't splurge on electronics, entertainment, fancy phone or cable services, or personal care. We also minimize driving, which keeps auto costs lower. Both our drives to work are within 15 minutes. On the other hand, we have a lot of educational debt. Even so, that's still a relatively small percentage of expenses. It should be no surprise that the top of the list is housing. That's probably true for just about anyone at any income level.

So what concerns me more? The cost of fuel or something else? Even as the cost of gas has risen, I'm not as concerned as I am about the cost of food, which is partly related to the cost of gas. However, there are more insidious factors in the cost of food: Wall Street.

As quoted by BoingBoing author Mark Frauenfelder, Frederick Kaufman wrote, in an article titled How Goldman Sachs Created the Food Crisis:
[T]he boom in new speculative opportunities in global grain, edible oil, and livestock markets has created a vicious cycle. The more the price of food commodities increases, the more money pours into the sector, and the higher prices rise. Indeed, from 2003 to 2008, the volume of index fund speculation increased by 1,900 percent. "What we are experiencing is a demand shock coming from a new category of participant in the commodities futures markets," hedge fund manager Michael Masters testified before Congress in the midst of the 2008 food crisis.

The result of Wall Street's venture into grain and feed and livestock has been a shock to the global food production and delivery system. Not only does the world's food supply have to contend with constricted supply and increased demand for real grain, but investment bankers have engineered an artificial upward pull on the price of grain futures. The result: Imaginary wheat dominates the price of real wheat, as speculators (traditionally one-fifth of the market) now outnumber bona-fide hedgers four-to-one.

Today, bankers and traders sit at the top of the food chain -- the carnivores of the system, devouring everyone and everything below. Near the bottom toils the farmer. For him, the rising price of grain should have been a windfall, but speculation has also created spikes in everything the farmer must buy to grow his grain -- from seed to fertilizer to diesel fuel. At the very bottom lies the consumer. The average American, who spends roughly 8 to 12 percent of her weekly paycheck on food, did not immediately feel the crunch of rising costs. But for the roughly 2 billion people across the world who spend more than 50 percent of their income on food, the effects have been staggering: 250 million people joined the ranks of the hungry in 2008, bringing the total of the world's "food insecure" to a peak of 1 billion -- a number never seen before.


As you can see in my spending graph, "Food and Dining" consumes 13% of our budget, higher than the quoted 8 to 12 percent. This category originally included restaurants, coffee shops, and alcohol, which I moved into "Restaurants, etc."; with those included, it was 18.5% of spending. With those items removed, it seems much more reasonable, but it's still high: about four times what we spend on gas!

Given that our expenditure is already higher than the estimate in the article, and that food prices are likely to increase due to this investment scheme as well as increased transportation costs due to gas prices, I am much more concerned about the cost of food. Occasionally I joke that my son eats so much I might as well buy a farm to feed him, and that's not that far from the truth. We do partake in local farmers' markets and a community farm, but even so, those prices will probably be affected too.

So take a look at your own expenses and critically examine what's really costly. And once you figure that out, take action to reduce the costs however you can. For me, I'll choose to vote for politicians who push for financial reform and continue to buy as much local, organic food as I can.