Tuesday, January 10, 2012

SCAPROC Bug in SAS

In my current position, we have a macro set up to run other programs, and it outputs a copy of the code, an MPRINT file, and the log. It also re-runs a project-specific autoexec to keep a copy in the log, analyzes the log for errors and warnings (using something other than my CheckLog macro), and it collects and outputs metadata. The part that finds and outputs metadata is called PROC SCAPROC. Here is how SAS describes it:
The SCAPROC procedure implements the SAS Code Analyzer, which captures information about input, output, and the use of macro symbols from a SAS job while it is running. The SAS Code Analyzer can write this information and the information that is in the original SAS file to a file that you specify.
Sounds nifty, right? Well, I've noticed some odd issues with it over time, and only recently was I able to put my finger on one of its problems.

The arrangement of code that results in the issue is rather odd and easily avoided. Nonetheless, in case anyone has encountered the problem, I thought I would post code to replicate the problem here. First, run this part in a new SAS session:

/* Set the base directory */
%let dir=%sysfunc(pathname(work));

/* Create a new folder */
%sysexec mkdir "&DIR.\lock";

/* Assign a library that will hold the locked data set */
libname lock "&DIR.\lock";

/* Create a locked data set */
data lock.locked(pw='asdf1234');
    x=1;
run;

So we've created a new library, in a folder under our WORK directory, that contains a password-protected data set. Now, try out SCAPROC with the following code and see what happens:

/* Copy the CARS data set */
data cars;
    set sashelp.cars;
run;

/* Create a view */
proc sql;
    create view badview as
    select type
    from cars
    ;
quit;

/* Remove type from the data set */
/* This makes the view (i)nvalid */
data cars;
    set cars;
    drop type;
run;

/* Start PROC SCAPROC */
proc scaproc;
    record "&DIR.\lock\scaproc_out.txt" attr;
run;

/* Copy data out of SASHELP */
data test;
    set sashelp.vcolumn(where=(libname="WORK"));
run;

/* Write out SCAPROC */
proc scaproc;
    write;
run;

We've now created a view on a copy of a SASHELP table and made it invalid by dropping the column that the view's definition selects. We then started SCAPROC and read SASHELP.VCOLUMN, a SASHELP view that contains metadata about all of the columns available to SAS. Did you get a pop-up for a password at this point? If so, that's part of the bug. Next, SCAPROC ends. Take a look at the log: are there some error messages? There should be at least one regarding the invalid view. Now take a look at the output of SCAPROC. It's a bit inflated for such a small program, especially if you have a large number of libraries assigned. Assign a bunch of libraries and try the code again, and the output will grow even larger.

The issue here has a few parts: First, we need an invalid view and a fully password-protected data set. Second, we need to run SCAPROC; and finally, we need to run a DATA step that reads SASHELP.VCOLUMN. The invalid view causes SCAPROC, upon encountering SASHELP.VCOLUMN, to consider every data set listed in it as input to the program. Interestingly, if you replace the DATA step with SQL, the issue disappears, except for the error regarding the invalid view. Dropping the use of SCAPROC also removes the problem (obviously).
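
One way to write the SQL replacement mentioned above, as a rough sketch. It assumes the query goes straight to DICTIONARY.COLUMNS, the dictionary table behind SASHELP.VCOLUMN; treat it as an illustration rather than the exact code from the original run.

```sas
/* Sketch: SQL replacement for the DATA step on SASHELP.VCOLUMN */
/* Assumption: query DICTIONARY.COLUMNS, the table behind that view */
proc sql;
    create table test as
    select *
    from dictionary.columns
    where libname="WORK"
    ;
quit;
```

This goes between the two PROC SCAPROC calls, in place of the DATA step that creates TEST.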

Why did the password pop-up appear? Since SCAPROC considered all data sets as input, it attempted to collect metadata on the locked data set. One sub-bug of the main bug is that SCAPROC queries locked data sets at all; it shouldn't, since it won't know the password. If you were to run it in batch mode, there wouldn't be anyone around to enter a password. Another sub-bug is that the DATA step somehow tells SCAPROC that all available data is an input.

As a consequence of this issue, we had a program run where the SCAPROC metadata containing variable names ended up as a SAS data set with 5.1 million rows at 972 MB. The SCAPROC output file itself was also huge, at 751 MB. And it's just a text file - no SAS header, column structure, anything!

I submitted the set of bugs to SAS, but in the meantime, if anyone encounters this issue, the easiest workaround is to fix the broken views, replace the DATA step with SQL, or both. Good luck, and please let me know if you encounter the issue yourself, or any related issues!
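
For the replication example, fixing the broken view could look something like the sketch below. Redefining the view on the MAKE column is only an assumption for illustration (MAKE is still in the CARS copy after TYPE is dropped); a real view would need whatever columns its users expect.

```sas
/* Drop the broken view */
proc sql;
    drop view badview;
quit;

/* Optionally recreate it on a column that still exists */
/* Assumption: MAKE is an acceptable stand-in for TYPE */
proc sql;
    create view badview as
    select make
    from cars
    ;
quit;
```

Run this before starting PROC SCAPROC so that no invalid view is left in the session.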
