Wednesday, December 12, 2012

An Amended Quote

Today on The Writer's Almanac, Garrison Keillor read a quote by Gustave Flaubert:
"I spent the morning putting in a comma and the afternoon removing it."
I thought this could use some amending for programmers:
I spent the morning putting in a parenthesis and the afternoon removing it.
Or for you SAS programmers:
I spent the morning putting in a semicolon and the afternoon removing it.
How about SQL:
I spent the morning putting in a column and the afternoon removing it.
Or if you write HTML:
I spent the morning putting in an escaped ampersand and the afternoon removing it.
Why not Python:
I spent the morning importing a library and the afternoon removing it.
Essentially, though, these all boil down to:
I spent the morning putting in (an) arbitrary character(s) and the afternoon removing it.
For all you programmers during this holiday season: May your parentheses be matched (your quotes too!), special characters escaped, and code executable on the first run.

Saturday, September 22, 2012

Two Special Notes

A few weeks ago we were packing up all our stuff to move to our new house. During this process, I found two notes from former coworkers, delivered on my last day of work at different jobs. Here's the first:





Here's the transcription in case the image does not appear:
Dear Chris —
     I have enjoyed watching you learn and grow over the last couple of years. You have become and excellent analyst — UWMF is lucky to have you. I will miss your superb work, smiling face and Luca stories! Best of luck to you on this exciting step in your career! I wish you, Kyra and Luca much happiness!
     Sincerely,
     [name]

And here's the second:





with the text:
Chris
     I really appreciate the time and dedication and professionalism that you offered to the [organization's] projects and my RA project in particular. You pressed on to make this all possible, and I am grateful. Regrets for the frustrations along the way, but know that your contribution will be remembered.
    All the best with your next personal + professional pursuits! Congrats on the well deserved Scientist position!
    Thank you,
    [name]

I kept both notes because they are very meaningful to me, and I think this highlights how important it is to write hand-written notes of thanks to you colleagues: It can be touching, powerful, and memorable for the recipient, and the writer has earned a relatively permanent place in memory. It's a perfect way to not only show genuine gratitude, but also earn a positive place in the recipient's mind, whether for the purpose of networking, friendship, or simple kindness.

Friday, September 7, 2012

New House

I haven't been posting much lately because we've been looking for, buying, and moving to a new house! Hopefully once things settle down I'll be back to posting as usual. Stay tuned!

Tuesday, June 19, 2012

An Attempt to Answer a Quiz Question

Kyra heard the NPR Weekend Edition Sunday puzzle on the radio the other day: "Think of a common French word that everyone knows. Add a 'v' (as in 'violin') to the beginning and an 'e' at the end. The result will be the English-language equivalent of the French word. What is it?"

I thought, well, I don't know French, but I do know how to write a Python program that can look up the answer. Unfortunately, the puzzle appears to be a bit of a trick. I thought as much as I was writing the code. I was thinking, "I bet some people are going to be mad about the answer, because the question was worded in such a way as to be tricky."

At any rate, the code definitely does not result in the right answer, but it was a good exercise for me. First, I had to find two dictionaries robust enough to have a lot of words in English and French. Then I had to find dictionaries: I located the Debian operating system dictionary on my local computer, did a search in the package manager Synaptic for the French equivalent and any other English dictionaries that might help, and downloaded both. (The English alternative is labeled "insane" for its size, which "possibly contains invalid words (as well as words that are very uncommon)." [1])

Next I had to convert certain French characters into their English equivalent. To answer the question, I looked up all English words that start with "V" and end with "E", then removed those two letters, looking up the "word" in the modified French dictionary. It didn't work, as can be told from its output:

a
a
ah
aire
alu
an
ange
ares
as
er
erg
es
es
il
o
t

Yep, those are definitely not the answers. Some, when you add V and E back, aren't  even really English words. "Vte"? What is that? Oh, and I think A is listed twice due to the missing diacritics.

For anyone who's interested, I posted the code online. (It's not pretty posting a lot of code on Blogger, so here it is in a document.) Keep in mind I'm a new programmer, so I might not have done this task very elegantly. It is a short piece of code, though, and it could have worked.

One other note: If you attempt to run it, it may not work on your system. It's designed for Debian GNU/Linux, as it relies on dictionaries in a Debian system and it uses the "/" as the directory delimiter. Additionally, the "insane" dictionary makes it run a bit slow, and I did nothing in the code to help with that, like buffering the file, so you could run out of RAM or peak the CPU. It works on my system, which is all that mattered to me at the time.

Of course, a more robust program should be more careful and actually take user input in some fashion. Perhaps look up words that with other characters or use different languages. As it is, it's more of a "script" that does one thing and non-interactively. In any case, it doesn't even return the "right" answer!

[1] http://packages.debian.org/squeeze/wamerican-insane

Sunday, May 27, 2012

Update in May 2012

Last year around this time I had decided to take a risk and leave my position at the University of Wisconsin Medical Foundation for a position at the UW School of Medicine, where I was hoping to be more involved with research. This endeavor hasn't turned out as I expected, so I've decided to move on. Starting later this month, I'll be working with CPM HealthGrades, a marketing firm that is offering more and more data services to their customers.

I accepted the position at CPM HealthGrades as a Data Scientist. This is a pretty cool opportunity in a relatively new field of data analysis, using some of the same technologies behind IBM's Watson computer, the computer that bested two top-rank Jeopardy winners. I'll be learning the programming languages Python and R, and I'm really excited to put these skills to commercial and research use.

As you may know, I attended the SAS Global Forum conference last month, and I plan to keep up with the SAS community as well, by writing papers, presenting, staying in touch with user groups, and maybe even writing a book. So I plan on keeping my SAS site up-to-date; however, I may not be developing much new. Perhaps I'll come out with a Python or R site to go along with it, with translations of code back and forth between the languages. Heck, maybe I'll write a SAS macro to write Python, or a Python module to write SAS. We'll see! Stay tuned for more details!

Wednesday, April 18, 2012

SAS and Keyboard Shortcuts

Base SAS has an easy way to add keyboard shortcuts that not only do X typical keyboard shortcut task, but also Y SAS program task (e.g., execute a macro). This can be a bit tricky to get to work right, but I've done it with CheckLog, PrintContents, OpenTable, and other interactive macros. However, I can't seem to figure out how to do some of these same things in Enterprise Guide.

Sure, it's easier in Enterprise Guide to open a table that's been run, but what about that library with 10,000 tables where I want to open only 1? It's like searching for a needle in a haystack, not to mention the load time just to see the tables in the library (very slow at 10k tables). It's so much easier to use OpenTable, and it's even easier to use OpenTable via keyboard shortcut. I have another keyboard command to open the properties of the table (a window that is quite severely limited in SAS EG, so perhaps that wouldn't be quite as useful anyway).

And the windowing commands seem to be gone, too. I can't edit a keyboard shortcut to jump from the program to the log: In Base SAS, I use CTRL + L for the log, CTRL + J for the program ("J" only because it's easy to reach and on the same keyboard line as L), and CTRL + K for executing the CheckLog macro ("K" for checK, I guess--again, it's on the same keyboard line as J and L). Just to note: I also have the CAPS key remapped to CTRL, so that's even easier to reach for my lazy efficient hands.

If anyone has experience with doing these things in Enterprise Guide, I'd love to hear feedback on how you get things like this to work - or if you just gave up and went with the new flow. My searches online have been fruitless up to this point, so I wanted to have this post available, at least for a point of contact in case someone else is wondering about the same things I am.

Monday, April 9, 2012

Another Odd SAS Error Message

I encountered the following error message last week:

"ERROR: Invalid value for width specified - width out of range"

The code that generated it was using SQL and the PUT function to convert an ID using a format, like so:

compress(put(id, idfmt.), '. ')

Where the IDFMT format originated as a user-created format, stored in a data set. The quick and dirty solution is to add the "?" modifier to the PUT function, which is oddly documented on the INPUT function but not the PUT function:

compress(put(id, ? idfmt.), '. ')

Now why wouldn't that be documented on the PUT function page?

And what about my original problem? I've tried varying the lengths of the input column, the format, removing the compress statement, and switching to TRANWRD or PRXCHANGE. I also tried to identify the records involved, but with 35.6 million records, that was far to slow. I know that the issue occurred in the second half of the data, but beyond that is too time consuming to look for the needle in the haystack.

Finally, I figured out a solution: The original format is based on an input data set, using the CNTLIN= option on PROC FORMAT to generate the format on the fly. The data set does not have the HLO (High/Low/Other) flag to indicate what non-matching (Other) starting values would be labeled as. I added it like so:

data custom_format;
    /* Output the original records */
    set custom_format end=end;
    output;

    /* Output an additional record */
    if end then do;
        start=.;
        label=.;
        fmtname='idfmt';
        hlo='O';
        output;
    end;
run;

Then the code using the format worked, and to boot it no longer needed the COMPRESS function to remove the blanks and numeric missing values (".").

Hopefully anyone else with a similar issue will find that one of the above solutions fits their need.

Friday, April 6, 2012

SAS Global Forum Papers Online

The SAS Global Forum papers are already posted online!

http://support.sas.com/resources/papers/proceedings12/index.html

Here are my papers:

Standardized Macro Programs for Macro Variable Manipulation

An Advanced, Multi-Featured Macro Program for Reviewing Logs

Feel free to read ahead if you're attending. It might help!

Tuesday, April 3, 2012

SAS Enterprise Guide and the Log

I'm presenting on the CheckLog macro that I developed to review a log (or any file or list of files) for errors, warning, and other messages in SAS related to issues. To make sure everything is ready to go, I've been reviewing it to see if there's anything I can improve, especially in regard to SAS Enterprise Guide, the new(er) platform for using SAS.

I encountered a few issues between the two. First, the display manager statements (DM) no longer work in Enterprise Guide. The fix was easy, although a bit tricky: For Windows users (Enterprise Guide is currently only on Windows), I switched from using the DM POSTMESSAGE statement to a Windows API called MessageBox. The API allows greater flexibility than the POSTMESSAGE statement, e.g., I can set an icon to go along with the message. Now Windows users can tell what happened not only by the message, but the icon, too.

However, the DM statement that attempts to export the current log is still intact. I was unable to find a way to replace this, or much less export the Enterprise Guide log in a simple, programmatic fashion. Essentially, it is not possible to use CheckLog to review the "current log" as defined in the Base SAS sense. However, Enterprise Guide users can still check the "current log", but they have to make it an external log in one of two ways.

The first way is to export the log using PROC PRINTTO before the program executes. The code for this isn't too bad, and it can be easily added to any existing program:

/* Set a directory and filename for the log */
%let log=%sysfunc(pathname(work))\work.log;

/* Print the log to the specified location */
proc printto log="&LOG";
run;

/* Run some SAS code... */
proc sql;
create table test as
select *
from sashelp.vtable
;
quit;

/* Turn off printing to the file */
proc printto;
run;

/* Check the external log */
/* Note: The shadow is set to "N" (No) since it is not necessary to copy the log */
%checklog(&LOG, shadow=N);

This method works fairly well. However, not all users may want to use PROC PRINTTO, or they may be using it in a fashion that would conflict with the above suggestion.

The second option involves the project flow in Enterprise Guide. This solution is quite a bit more convoluted, and I outlined it, along with screenshots, on the CheckLog Enterprise Guide page. Essentially, the user has to export the log as a step in the project flow and run CheckLog in a small program. From there, the user can either programmatically determine what to do or set other steps in Enterprise Guide to run based on conditions, e.g., using the ISSUES macro variable to determine what to do next.

Overall, this is quite a bit more convoluted, and it would be great if there were some programmatic way to export the Enterprise Guide log. If anyone knows of a simple way to do that, please let me know!

Monday, January 23, 2012

SQL Remerging

One of the issues that my CheckLog macro detects is the following statement:

"The query requires remerging summary statistics back with the original data."

What's going on here? Here's an example of code that generates this statement:

proc sql;
    create table patient_bp as
    select
          a.patient_id
        , a.year
        , b.exam_date
        , avg(b.bp_systolic) as bp_systolic_avg
        , avg(b.bp_diastolic) as bp_diastolic_avg
    from patient a
    left join vitals b
    on a.patient_id=b.patient_id
    where year(b.exam_date) >= (a.year-2)
    group by a.patient_id, a.exam_date
    ;
quit;

Two tables are joined, one that appears to be at the level of patient and year and another at patient and exam date. The table seeks to find any exam date greater than two years prior to the year in the patient table. The GROUP BY statement includes patient ID and exam date. However, the variable year is specified in the select statement.

Given that an extra grouper is in the data, the above query must first calculate:

    select a.patient_id, b.exam_date, avg(b.bp_systolic), avg(b.bp_diastolic)

Since the level of data, though, includes year, it must then do the following:

    select a.patient_id, a.year, b.exam_date, bp_systolic_avg, bp_diastolic_avg

Essentially, a mini-query needs to execute to create the summary variables (averages), then another mini-query to add that summary data to the main data.

So what's the issue? This seems useful. Well, the "remerging" note can obscure data issues within the data that the coder either ignored or did not expect, since only one of these statements can be generated for a SQL query. For example, the query above actually generates duplicates at the patient, year, and exam date levels, and the remerging note hints at this issue. Normally, this does not happen with "clean" data, but we rarely have clean data. Either the patient table had more than one record per patient and year, or the vitals table had more than one record per patient and exam date, which could easily occur in our system and is valid.

In the end, there are two ways to fix this and avoid the message in the log: Either split the query, like below, or add year to the GROUP BY statement (if that is acceptable - here it is, but that's not always the case). The split query is a little longer, but I've never noticed that such queries take much longer.

proc sql;
    create table _temp_ as
    select
          a.patient_id
        , b.exam_date
        , avg(b.bp_systolic) as bp_systolic_avg
        , avg(b.bp_diastolic) as bp_diastolic_avg
    from patient a
    left join vitals b
    on a.patient_id=b.patient_id
    where year(b.exam_date) >= (a.year-2)
    group by a.patient_id, a.exam_date
    ;

    create table patient_bp as
    select
          a.patient_id
        , a.year
        , b.exam_date
        , b.bp_systolic_avg
        , b.bp_diastolic_avg
    from patient a
    left join _temp_ b
    on a.patient_id=b.patient_id
    ;

    drop table _temp_;
quit;


Note that this is essentially replicating what SQL is doing in the back-end, but since it is split out, it is obvious that the programmer knew about the potential issue and purposefully coded it that way to avoid it.

Someone suggested adding "distinct" to the select statement, but the message regarding remerging does not go away when this solution is used. Thus, any other unexpected data issues may continue to be masked by this statement, since only one can be generated for each SQL step.

Any time you're doing data transformation, it's always best to test your assumptions. A good macro for testing the level of the data (input and output) is the DupCheck macro. Adding this one line of code would have told the programmer something was amiss:

    %DupCheck(patient_bp, patient_id exam_date);

The DupCheck macro and its documentation is available online.

Good luck, and be wary of remerging!

Tuesday, January 10, 2012

SCAPROC Bug in SAS

In my current position, we have a macro set up to run other programs, and it outputs a copy of the code, an MPRINT file, and the log. It also re-runs a project-specific autoexec to keep a copy in the log, analyzes the log for errors and warnings (using something other than my CheckLog macro), and it collects and outputs metadata. The part that finds and outputs metadata is called PROC SCAPROC. Here is how SAS describes it:
The SCAPROC procedure implements the SAS Code Analyzer, which captures information about input, output, and the use of macro symbols from a SAS job while it is running. The SAS Code Analyzer can write this information and the information that is in the original SAS file to a file that you specify.
Sounds nifty, right? Well, I've noticed some odd issues with it over time, and only recently was I able to put my finger on one of its problems.

The arrangement of code that results in the issue is rather odd and easily avoided. Nonetheless, in case anyone has encountered the problem, I thought I would post code to replicate the problem here. First, run this part in a new SAS session:

/* Set the base directory */
%let dir=%sysfunc(pathname(work));

/* Create a new folder */
%sysexec mkdir "&DIR.\lock";

/* Assign a library with a locked data set */
libname lock "&DIR.\lock";

/* Create a locked data set */
data lock.locked(pw='asdf1234');
    x=1;
run;

So we've created a new library with a locked data set based on our work library. Now, try out SCAPROC with the following code and see what happens:

/* Copy the CARS data set */
data cars;
    set sashelp.cars;
run;

/* Create a view */
proc sql;
    create view badview as
    select type
    from cars
    ;
quit;

/* Remove type from the data set */
/* This makes the view (i)nvalid */
data cars;
    set cars;
    drop type;
run;

/* Start PROC SCAPROC */
proc scaproc;
    record "&DIR.\lock\scaproc_out.txt" attr;
run;

/* Copy data out of SASHELP */
data test;
    set sashelp.vcolumn(where=(libname="WORK"));
run;

/* Write out SCAPROC */
proc scaproc;
    write;
run;

And now we've created a view from a SASHELP table and made it invalid by removing the column that the view uses in its definition. We started SCAPROC and attempted to use another SASHELP table that contains metadata about all of the columns available to SAS. Did you get a pop-up for a password at this point? If so, that's part of the bug. Next, SCAPROC ends. Take a look at the log: are there some error messages? There should be at least one regarding the invalid view. Now take a look at the output of SCAPROC. It's a bit inflated for such a small program, especially if you have a large number of libraries assigned. Assign a bunch of libraries and try the code again, and it will explode.

The issue here has a few parts: First, we need an invalid view and a fully password-protected data set. Second, we need to run SCAPROC; and finally, we need to run a DATA step against SASHELP. The invalid view causes SCAPROC, upon encountering the SASHELP.Vcolumn data set, to consider all data sets listed in the table as input to the program. Interestingly, if you replace the DATA step with SQL, the issue disappears, except for the error regarding the invalid view. Dropping the use of SCAPROC also removes the problem (obviously).

Why did the password pop-up appear? Since SCAPROC considered all data sets as input, it attempted to collect metadata on the locked data set. A sub-bug of the main bug is that SCAPROC should not query locked data sets, since it won't know the password. If you were to run it in batch mode, there wouldn't be anyone around to enter a password. Another sub-bug is that the DATA step somehow tells SCAPROC that all available data is an input.

As a consequence of this issue, we had a program run where the metadata output of SCAPROC that contained variable names was a SAS data set with 5.1 million rows at 972 MB. That was huge, and the SCAPROC output file was also huge, at 751 MB. And it's just a text file - no SAS header, column structure, anything!

I submitted the bug set to SAS, but in the meantime if anyone encounters this issue, the easiest way to fix it is to either fix the broken views, replace the DATA step with SQL, or both. Good luck, and please let me know if you encounter the issue yourself, or any related issues!

Wednesday, January 4, 2012

SAS Global Forum

I'm a SAS programmer, and I've always wanted to attend one of those big SAS conferences. So this year, when I heard about the SAS Global Forum, I applied for a grant from the University of Wisconsin and submitted two papers to present at the conference.

A few weeks ago I heard that I won the grant, and just last week I found out that both of my papers were accepted by the conference organizers. I'll be presenting both papers in the Coder's Corner section, with the following titles:

An Advanced, Multi-Featured Macro Program for Reviewing Logs
Standardized Macro Programs for Macro Variable Manipulation

If you plan on attending and you happen to follow this blog, please feel free to stop by and say hello! If not, I'll post the links to the papers as soon as their available from the conference organizers. For more information about the conference, go to http://support.sas.com/events/sasglobalforum/2012/index.html.

Hope to see you there!