Saturday, March 13, 2021

Swenson's Law

The other day we were going through agile training, and one of my colleagues was struggling with the concept of assigning unit-less numbers to work effort / difficulty instead of hours or a 1-10 rating.

I tried to convey that when we estimate tasks in hours or dollars it is often very wrong anyway, and gave a couple of examples. 

Let's say you estimate $10,000 for a project. On day one, you find out that the software you expected to be available is not, and you have to pay $2,000 for it. So now your estimate is off by $2,000 on the very first day.

Instead of estimating the project with a specific dollar amount and being immediately wrong, the agile approach encourages figuring out the meaning of work effort through experience: you assign somewhat arbitrary values to tasks, reflect on those estimates afterward, and, ideally, improve your ability to estimate tasks in your own terms.

This led me back to an earlier realization I had while doing home projects:

Swenson's Law: It's never just "one thing".

Let's say you want to wash the exterior of your spouse's car. You go to pull it out of the garage / parking space and realize that it's filthy inside. It really needs to be cleaned inside, too. So you go to get the vacuum cleaner, but you find out that has a broken part. So you go to order the replacement part online, but you get an email from your tax accountant noting a missing document. So you go to scan the document and send it. 

If this were a joke:

Spouse: I thought you were going to wash my car!

Me, sitting at a computer: I am, dammit!!!

I originally wrote a note about this back in 2015, before my daughter was born, but forgot to publish it:

I went into the kitchen to throw away a tissue. I noticed the trash was full. I thought: "Oh, I'll just take out the garbage." However, I then noticed that the top of the trash can was rather dirty, so I went to get a wet paper towel to clean it off. I turned to find that the paper towels were out.

So then I went downstairs to get some more paper towels. I remembered that we had laundry to put away, and that later today, when we usually do laundry, we would be out to the hospital for a tour. We had to do laundry earlier, likely about right now.

I got the paper towels, came back upstairs, cleaned the trash can, and while I was taking out the trash, noticed the recycling bin was also full. So that had to be taken care of, too.

This situation, along with many others (e.g., fixing anything in the house; replacing a light switch ended up taking half a day), resulted in a conclusion: It's never just "one thing".

Saturday, March 21, 2020

Home Projects

Stuck at home? Bored? Here are a few ideas:

1. Learn a new skill.

I highly encourage everyone to start learning a new skill online. Find a good service that is cheap or free, with a good online learning system. Be sure to schedule time on it every day, even if it's just 15 minutes. Learn to use Excel, program in SQL or Python, or learn a new language. These skills will always be useful.

Don't be limited by online learning. Has that piano, guitar, or flute been collecting dust? Clean it up first, show it some care that you don't usually have time for, and give it a go. If you're already good at it, consider teaching someone else.

That said, see if someone around you (physically or digitally) is interested in the same skill, and try to learn it together. If you learn a new skill together, you can help each other, encourage each other, and ensure you're committing to it every day.

On the flip side, if you're good at teaching, consider tutoring online or creating content for one of those online learning systems.

2. Check your living space for expired or out of date things.

Check your light bulbs for any incandescent bulbs. Replace any you find, if you have extra bulbs. Consider ordering LEDs, if it doesn't interfere with other deliveries. Initially I bought daylight LEDs, but we found them too obnoxiously bright in the evening and at night, so I recommend soft yellow lights.

Similarly, check your fire extinguishers. You may not be able to get them officially checked, but you can make a list of things to do post-quarantine. Expired fire extinguishers should be replaced if they have plastic handles; otherwise, they can be recharged by a certified specialist.

There are lots of other things in your house to check. Furnace filter, water filter, fridge filter, humidifier filters. Ok, so lots of filters.

3. Organize your storage items.

The best idea that I've had regarding storage is to NOT label boxes with words. Instead, use a number and keep a list with these columns: Box Number, Contents, and Location. If you're living in a small space, you may not need a Location; but even in small spaces, boxes are easier to locate when the location is noted. Using this method of numbering boxes helps to avoid covering up old labels or leaving misleading labels.

To identify the location of a box when they're on shelves, I use a location name with a letter / number coordinate just like spreadsheet software (Spreadsheet Cell Reference). Letters for columns (groups of boxes going up and down) and numbers for rows (groups of boxes going left and right). For example:

Number: 5, Contents: Pictures, Location: Storage Shelf B2
      A      B
1 |______|______|
2 |______|_Box5_|
3 |______|______|
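
The same list can also be kept in code. Here is a minimal sketch in Python, assuming made-up field names, sample boxes, and helper functions (none of these are a real tool; a spreadsheet works just as well). It searches the list by contents and splits a location like "Storage Shelf B2" into its shelf name, column letter, and row number:

```python
import re

# Hypothetical inventory: one record per numbered box.
BOXES = [
    {"number": 4, "contents": "Holiday decorations", "location": "Storage Shelf A1"},
    {"number": 5, "contents": "Pictures", "location": "Storage Shelf B2"},
]


def find_boxes(boxes, keyword):
    """Return the boxes whose contents mention the keyword (case-insensitive)."""
    return [box for box in boxes if keyword.lower() in box["contents"].lower()]


def split_location(location):
    """Split 'Storage Shelf B2' into ('Storage Shelf', 'B', 2)."""
    match = re.fullmatch(r"(.*)\s+([A-Z]+)(\d+)", location)
    if match is None:
        # No shelf coordinate, for example a box that sits on the floor.
        return location, None, None
    name, column, row = match.groups()
    return name, column, int(row)


for box in find_boxes(BOXES, "pictures"):
    print(box["number"], split_location(box["location"]))
```

Because the box itself only carries a number, moving or repacking a box means editing one record instead of relabeling cardboard.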

Don't forget your digital storage! Organize your photos and videos, too. I use year/month/day format for pictures and videos, and I group pictures and videos by year/month/day folders. Sometimes if there aren't enough pictures to justify an entire folder, I lump some together.

For example, if there are only 5 pictures in Feb 2019, I put them all in a folder named 20190201 with picture names like 2019-02-01_0950.png or 2019-02-25_1735.png. Using the year/month/day format at the beginning, the pictures will sort correctly even if you edit them at a later date. For major events, like birthdays with lots of photos, I add an event name after the folder year/month/day, like so: 20190525_Anniversary_Party.
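
If you would rather generate these names than type them, here is a minimal Python sketch. The function names are hypothetical, and it assumes you already know when each picture was taken (reading that from the file is a separate problem):

```python
from datetime import datetime


def photo_file_name(taken, extension=".png"):
    """Build a name like 2019-02-01_0950.png from the time the photo was taken."""
    return taken.strftime("%Y-%m-%d_%H%M") + extension


def folder_name(day, event=None):
    """Build a folder name like 20190201, or 20190525_Anniversary_Party for an event."""
    name = day.strftime("%Y%m%d")
    if event:
        name += "_" + event.replace(" ", "_")
    return name


print(photo_file_name(datetime(2019, 2, 25, 17, 35)))
print(folder_name(datetime(2019, 5, 25), "Anniversary Party"))
```

Zero-padded year, month, and day at the front is what makes a plain alphabetical sort match the chronological order.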

4. Clean frequently touched items or replace them with automated items.

The CDC recommends cleaning frequently touched things every day. What I find quite obnoxious about this advice is that they don't tell you how to do it. I use a mix of water and bleach for light switches, doorknobs, and keyboards, but what about your phone? What are you supposed to use when it shouldn't get wet? I have a product that says it cleans phones, but there's little evidence that it does. Perhaps getting a slightly damp cloth and wiping down your phone and then immediately drying it is the best you can do.

Consider replacing frequently used light switches with automatic switches so you don't have to touch them. I bought one automatic switch for the main floor bathroom. This is likely not feasible for everyone, especially if you only have 1 bathroom. I suppose a voice-command light would be better so it doesn't turn on at night, but likely much more expensive than a simple motion detection switch. While you're at it, if you have any outlets that are loose, figure out the breaker, turn it off, and tighten them.

Personally, I don't like smart voice-command items like Alexa, but right now they seem like a smart choice since you can do so much without touching anything! No need to touch the speaker to play music, use your keyboard to search and order something, or get a reminder on your phone that you have to swipe to view. I might have just convinced myself to buy one...

5. Write about your experience.

One thing that has tripped me up over the years regarding my health is that I'll do something or something will happen, and months later I will have forgotten the solution or details of what occurred. If I had written a health journal, it would have helped me remember. This may be a great idea during this time of health crisis, especially if you aren't able to communicate your past condition.

It may also be cathartic to write about your experience, to write about how you feel, or to translate your experience and feelings into a work of fiction. My wife writes fiction as sort of a therapeutic practice. She doesn't let anyone read it (so far), but it really helps her work out emotions and life events in a different way.

One thing that I've forgotten to do over the years is to send an "update" email to people I care about, personally and professionally, updating them about my life. I usually do this by email, but I treat the writing of it like a long-form letter. I write it as if I won't get a response, like a one-way communication method. I usually get quite a few responses, and I think people appreciate hearing about changes in my life. I take the time to thank people, too, for helping me get where I am. And I always ask, "what's new with you?" It's been a great way to keep in touch.

Whatever you do in large-group emails, DO NOT put everyone's email in the TO or CC field! Put everyone in the BCC field and put your own address in the TO field. Your contacts may not want their email shared with 10s or 100s of others, and if any of those addresses are compromised by a hacker, you didn't share any email address with that hacker except your own.

6. Listen to music and stories the old-fashioned way.

How often do we sit back on the couch and just listen to music? No phones or tablets or books. People used to do that! Just listen. Share your music interests with those around you. Nowadays music preferences are so private, who knows what you like? Does anyone know you like to listen to death metal during your workout? Or J-POP on the way home from work? Who knows, someone you share with may really like it too.

I heard about a deal on a certain website that sells audio books. Maybe it would be fun to gather the family around the "radio" and listen to a chapter from a good book! Make some popcorn! Make it a weekly event.

I recently pulled out some old tapes I made as a kid, and my son and I listened to the goofy tapes for a good 30 minutes.

7. Share your ideas about what to do while stuck at home!

Of course we should all exercise more, and we need to be rather creative about it stuck at home. How do you manage? Any other ideas about what to do at home?

Wednesday, September 13, 2017

Tough Year

It's been a tough year. I covered the beginning of the year in another post regarding the Calculus course I took in preparation for applying to a master's program in applied statistics. I had planned on studying for the GRE this summer and taking it this fall as well as applying for the master's program, then working through the master's program for the next 3-5 years. My employer, HDMS (a subsidiary of Aetna), was going to help pay for the degree. However, those plans were going to get thrown off course.

I had just finished submitting my coursework for reimbursement when I was sent an out-of-place meeting request. At the meeting, I found out HDMS was letting me go. At first it appeared it was just me, but as I found out later in the day, they were laying off about 10 other employees and closing about 10 other open positions. For two hours, I was trying to figure out what I did wrong - but it had nothing to do with my performance. I was the most recent hire on the team, and there were others getting laid off too.

I hit the ground running. The severance package included a career consulting service - I looked it up and scheduled time to review my resume with a consultant. The paper / PDF version has undergone quite a few revisions over the past few weeks. I also started contacting my network, browsing through online job boards, and all the usual job-hunting tasks. I did find quite a few roles through my network, and a couple of them resulted in offers.

I quickly found a role as a consultant data analyst with Great Wolf Resorts. I stayed there for a few weeks, working on a single project integrating credit card transactions with the reservations. Essentially, if a guest uses a credit card for something on site (e.g., restaurant) and does not charge it back to the room, the transaction is not connected to the reservation. In order to connect the two, I had to merge transactions with reservations based on guest name and, if available, the last four digits of the credit card number. It was messy, and I was able to get about 57% of the transactions matched. I believe the best possible rate was somewhere around 65%, but it would have required a lot of exception handling, manual matching, and/or time-intensive matching processes (e.g., matching text within another text field). The company analyst and I decided the additional matches weren't worth the expense.
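
To give a feel for the matching, here is a minimal Python sketch. It is not the code from that project: the field names, the sample records, and the rule of falling back to a name-only match when exactly one reservation carries that name are all assumptions made for illustration.

```python
def normalize(name):
    """Lowercase and collapse whitespace so 'Ann  Lee' and 'ann lee' compare equal."""
    return " ".join(name.lower().split())


def match_transactions(transactions, reservations):
    """Map transaction IDs to reservation IDs by guest name and card last four."""
    by_name_and_card = {}
    by_name = {}
    for res in reservations:
        name = normalize(res["guest_name"])
        by_name.setdefault(name, []).append(res["reservation_id"])
        if res.get("card_last4"):
            by_name_and_card[(name, res["card_last4"])] = res["reservation_id"]

    matches = {}
    for txn in transactions:
        name = normalize(txn["guest_name"])
        last4 = txn.get("card_last4")
        if last4 and (name, last4) in by_name_and_card:
            # Strongest match: same name and same last four digits.
            matches[txn["txn_id"]] = by_name_and_card[(name, last4)]
        elif len(by_name.get(name, [])) == 1:
            # Assumed fallback: the name alone is enough if it is unambiguous.
            matches[txn["txn_id"]] = by_name[name][0]
    return matches


# Made-up sample data.
reservations = [
    {"reservation_id": "R1", "guest_name": "Ann Lee", "card_last4": "1234"},
    {"reservation_id": "R2", "guest_name": "Bob Ray", "card_last4": None},
]
transactions = [
    {"txn_id": "T1", "guest_name": "ann  lee", "card_last4": "1234"},
    {"txn_id": "T2", "guest_name": "Bob Ray", "card_last4": "9999"},
    {"txn_id": "T3", "guest_name": "Cy Doe", "card_last4": "1111"},
]

matched = match_transactions(transactions, reservations)
print(f"Matched {len(matched)} of {len(transactions)} transactions")
```

Everything this exact-match approach misses (misspelled names, a name buried inside a longer text field) is the exception handling and manual matching that would have pushed the rate higher at a much greater cost.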

The position was a good fit for my skills, and I enjoyed working with the people there, but as a consultant role, the benefits were very expensive and of course it could have ended at any time. So, I kept looking for permanent roles while I was there. I need something more permanent right now, but I can definitely see myself as a successful consultant. In my short time there, I think I demonstrated a lot of value with my skills and the process and analysis I left behind.

Recently, I found a new role as a Senior Healthcare Analyst at SSM Health, a non-profit healthcare organization with hospitals from Wisconsin to Missouri. They also own Dean Health Plan, where I worked a few years ago. I still know a few people there, so it will be good to reconnect with them. I'll be analyzing healthcare data for a particular region of the system, starting in a few days. I feel very good about the team and the leader, so I'm looking forward to getting started. Luckily, I'll be working from home again, so I'll get to use my treadmill again.

Here's hoping quarter four is quite a bit less turbulent!

Tuesday, August 1, 2017

Calculus III

In the last year or so, I decided to apply for a master's program in applied statistics, but I was missing one of the prerequisite mathematics courses: Calculus III. I had taken calculus courses in high school and college, but those courses were more focused on applications. Furthermore, I hadn't covered any of Calculus II in those courses.

Instead of taking the entire series, which would have taken quite a bit of time and money, I decided to do something rather daunting: I took Calculus III online and used Khan Academy and other sources to catch up on Calculus I and II. I read reviews that the first few weeks were tough even if you had taken Calculus I and II just before III. Undeterred, I started the course earlier this year, and the first few weeks were indeed tough.

The online program I used - NetMath - used an online math tool for running code and submitting homework. Each student is assigned to a mentor who grades assignments, answers questions, and ensures each student is on schedule. Students receive feedback on their homework and are able to re-submit corrections a couple of times. The two midterms and final must be taken in person with a proctor.

My first mentor was not very responsive. On week 2, a critical week in the program, my mentor did not respond to emails or grade my assignments in a timely fashion (within 3 days, as noted in the program handbook). I notified the program administrators and they assigned me a new mentor. She had quite a bit of catching up to do, but she did her best and eventually graded the outstanding assignments and responded to my questions. Honestly, she was amazing, and I'd write her a letter of recommendation if she asked.

Lesson 2 is quite difficult. It's really the first lesson on the topic of the course, where lesson 1 was review of parametric equations and other necessary concepts, and it's there in case anyone missed or forgot these topics. With the combination of difficult content and slow responses from my mentor, it took me 2 weeks to finish lesson 2. In addition, I got sick for a couple of days and there was a death in the family, which put me behind another 2 weeks or so. Fortunately, the program offers a two-month extension, and I planned on using it if needed.

However, there were additional, serious problems with the course. One of the most grievous was incomplete or incorrect content. There were often no terms given to ideas, preventing students who have taken this course from communicating the concepts effectively. For example, vector projection was just called "vector push on another vector". It took me quite some time to find the right term to be able to research this concept online.

The course also neglected saddle points and claimed that whenever a gradient was {0, 0} (or more zeros depending on the number of dimensions), that the point was a minimum or maximum of the function. This is blatantly not true when a saddle point is present, and it would be terrible for students to internalize this falsity since it is profoundly meaningful in calculating predictive models with machine learning, specifically neural networks. You can't assume you've optimized a function when the gradient is {0, 0} without looking around it to see if you've found a saddle point.
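
A quick way to see this is the function x^2 - y^2: its gradient at the origin is {0, 0}, yet the origin is neither a minimum nor a maximum. Below is a minimal Python sketch of the standard second-derivative test for functions of two variables, using finite differences to estimate the second partial derivatives. The function name, step size, and tolerance are my own choices for illustration, not anything from the course.

```python
def classify_critical_point(f, x, y, h=1e-4, tol=1e-6):
    """Classify a critical point of f(x, y) with the second-derivative test."""
    # Central finite-difference estimates of the second partial derivatives.
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    det = fxx * fyy - fxy**2  # determinant of the Hessian matrix
    if abs(det) < tol:
        return "inconclusive"
    if det < 0:
        return "saddle point"
    return "minimum" if fxx > 0 else "maximum"


# All three functions have a zero gradient at the origin.
print(classify_critical_point(lambda x, y: x**2 + y**2, 0, 0))
print(classify_critical_point(lambda x, y: -x**2 - y**2, 0, 0))
print(classify_critical_point(lambda x, y: x**2 - y**2, 0, 0))
```

The test assumes the point you pass in already has a zero gradient; it only looks at the curvature around that point.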

All told, I was quite unhappy with this course. Not only was I spending a lot of time catching up, but I was spending time trying to learn the material through other sources since the course material was incomplete or inaccurate. Nearing the end of the course, I was able to catch up to the point where it looked like I could finish the course if I just had another week or two. I emailed the administrators for a course extension, noting the reasons I had for the delay and the issues I had with the course, and they offered a shorter extension so I could finish it without rushing and without taking the full extension. (My schedule to finish without an extension would have been very demanding for my mentor to complete all the grading in time.)

Despite all the delays and issues, I finished all the material within the original time frame, and I just needed to take the final. I studied for a few extra days and took the final about 2 weeks after the course originally ended. Since the final was comprehensive and included the last three lessons, I was very nervous about it. I had aced the midterms, but there was just so much to remember (e.g., the curl of a 3D field is difficult to remember). To my great surprise and delight, I not only aced the final, I got an A+ in the course. I was relieved!

Now I just have to re-take the GRE and apply for the master's program.

Friday, January 8, 2016

Gazetteer Database for Geographic Analysis

A couple of years ago, I had a tricky problem to solve. I inherited a tool a group of analysts were using to allocate website search based on ZIP code and location name (e.g., city, most commonly) for clients based on their own locations. The tool used the output of a predictive model for website search activity and inputs from the client, including addresses, for configuring the search locations that would be allocated for the client.

In addition to setting up relevant geographies based on the client's locations, the tool attempted to collect additional nearby locations that were likely relevant to the client (a "market"). The problem was that it did not find good matches for cities, towns, and other locations people were using on the website. As a result, the analysts were doing quite a bit of work to correct the output by removing and adding locations by hand. It was very time consuming, and I had to do something about it.

EDIT: I updated the following paragraph after I remembered how the algorithm was originally working. Initially I wrote that it calculated distances between locations, but it did not.

I reviewed the process and the data used to obtain location names. The algorithm used a simple lookup from ZIP code to location name, usually city or town. It did not attempt to look up nearby location names. The data did include latitude and longitude for the locations, so I thought I'd try adding code to look up nearby locations with this data. I asked around in the software development area and found that they were using a fuzzy distance calculation based on a globe. When I tried it out using the existing location data, I found several problems. Some of the latitude/longitude coordinates were in the wrong state or in the middle of nowhere. Additionally, the data was missing quite a few relevant locations, like alternative names for cities and towns, as well as neighborhood names, parks, and a variety of other place names people use in web searches. I discovered it was several years out of date, and there was no chance it would be updated. So I decided the data was simply junk. I had to find a new source.

I began searching online for government sources of location information. After all, the US government establishes ZIP codes, city and town designations, and executes the census every once in a while. The US government also has to release this data publicly, according to law. (This doesn't mean it's free, or easy to obtain.) So there must be publicly-available data regarding locations. Luckily, I ended up finding a free online source: the US Gazetteer Files (see "Places" and "ZIP Code Tabulation Areas" sections).

What's a "gazetteer"? A gazetteer is a list of information about the locations on a map. In this case, the US Gazetteer data includes latitude and longitude, useful for geographic analysis.

As I used the data, I found a few gaps, so I searched again and found the US Board on Geographic Names (see "Populated Places" under "Topical Gazetteers"). By integrating these two data sets, I had a rather comprehensive listing of all sorts of places around the US.

Next, I had to get the new location data working with the search configuration tool. The tool was written with a web front-end for the inputs, SQL to collect the data and apply the inputs, and Excel as the output data. So I had to do a bit of ETL (actually, I did ELT, loading before transforming) to get the new location data working with the tool. I ended up designing the model pictured here:

The main data is in gz_place and gz_zip, storing locations and ZIP code data, respectively. On the right of gz_place are some lookup tables, including a table with alternative names (gz_name_xwalk - "xwalk" meaning crosswalk). The ZIP table references a master list of potential ZIP codes (see the prior post about creating that table), a list of invalid ZIP codes that showed up in the prior location data, and a list of ZIP codes I determined were "inside" other ZIP codes (the algorithm for that is another discussion entirely).

The data on the left is a bit more interesting. There are some metadata tables not really connected to the rest (gz_metadata, gz_source), documenting quick facts about the data and where I found the data. Two reference tables also float off on their own, with a list of raw location names (gz_name) and a list of states (gz_state_51 - 51 to include DC), each including associated information.

Now I didn't want the tool to calculate distances between everything and everything else each time an analyst ran the tool, so I decided to precompute the distances and store only those within a certain proximity. I decided there were 3 types of distances required: ZIP to ZIP, location to location, and location to ZIP (and it could be used vice versa). To limit processing, I used a mapping of states and their neighbor states to connect the initial set of ZIPs and locations to use. This helped to decrease the run time. At the same time, I calculated the distances between each set of latitudes and longitudes, and retained only those within a certain number of miles. The final, filtered results are stored in gz_distance, with a lookup table describing the distance types (gz_distance_type).
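
The original work was done in SQL, so the following is only a minimal Python sketch of the idea. It assumes the haversine great-circle formula for distance (I'm not claiming that is the exact formula the tool used), and the place records, neighbor-state mapping, and function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def precompute_distances(places, neighbors, max_miles):
    """Return (id_a, id_b, miles) for pairs in the same or neighboring states."""
    rows = []
    for i, a in enumerate(places):
        # Only compare against places in this state or its listed neighbors.
        allowed_states = neighbors.get(a["state"], set()) | {a["state"]}
        for b in places[i + 1:]:
            if b["state"] not in allowed_states:
                continue
            miles = haversine_miles(a["lat"], a["lon"], b["lat"], b["lon"])
            if miles <= max_miles:
                rows.append((a["id"], b["id"], round(miles, 1)))
    return rows


# Hypothetical sample data with approximate coordinates.
PLACES = [
    {"id": "madison", "state": "WI", "lat": 43.0731, "lon": -89.4012},
    {"id": "milwaukee", "state": "WI", "lat": 43.0389, "lon": -87.9065},
    {"id": "chicago", "state": "IL", "lat": 41.8781, "lon": -87.6298},
    {"id": "los_angeles", "state": "CA", "lat": 34.0522, "lon": -118.2437},
]
NEIGHBORS = {"WI": {"IL", "MN", "IA", "MI"}, "IL": {"WI", "IN", "IA", "MO", "KY"}}

for row in precompute_distances(PLACES, NEIGHBORS, max_miles=100):
    print(row)
```

The state filter skips the distance calculation entirely for far-apart pairs, and the mileage cutoff keeps the stored table small, which are the two savings described above.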

Finally, I could get the better location data into the tool. I replaced the original code with new code that uses the new location data, doing a simple lookup of the locations specified by the client (ZIP codes) and filtering for an appropriate distance. I created a few new inputs to help the analyst tweak the distance that the tool would use to filter the crosswalk, with the idea that clients in rural areas may find a larger area more relevant, and clients in dense urban areas may find a smaller area more relevant.
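
The lookup itself is simple once the distances are stored. Here is a minimal Python sketch, assuming hypothetical ZIP-to-location rows with made-up mileages and a hypothetical function name (the real tool did this in SQL against gz_distance):

```python
def market_locations(client_zips, distance_rows, max_miles):
    """Return locations within max_miles of any client ZIP, with the nearest distance."""
    nearest = {}
    for zip_code, location, miles in distance_rows:
        if zip_code in client_zips and miles <= max_miles:
            nearest[location] = min(miles, nearest.get(location, miles))
    return nearest


# Hypothetical precomputed rows: (ZIP code, location name, miles). Mileages are made up.
DISTANCE_ROWS = [
    ("53703", "Madison", 0.5),
    ("53703", "Middleton", 6.2),
    ("53703", "Sun Prairie", 11.8),
    ("53562", "Middleton", 1.0),
]

# The analyst's distance input: a tight radius for a dense area, a wide one for a rural area.
print(market_locations({"53703"}, DISTANCE_ROWS, max_miles=5))
print(market_locations({"53703"}, DISTANCE_ROWS, max_miles=15))
```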

The results were excellent. The analysts praised the new process for being more accurate, less time consuming, and easy to use. There were some manual aspects to the process, for example, correcting spelling errors entered by users on the website, but these would become less of an issue as time went by. (Especially the spelling errors. The website administrators were switching from one vendor data set to another, which had better location suggestions/requirements based on the user's input.) Overall, it was almost completely automated and only required updates once in a while when new locations were added.

This was one of those projects where I really enjoyed the autonomy I was given. I was simply given a task (make this tool work better), and given free rein over how to do that. I worked with many people to get their feedback and help, especially from the database maintainer and a few users for testing the new inputs on the tool. (One interesting thing I did with the database was to partition the gz_distance table based on distance type. I got help from the database maintainer on the best way to do that.) And best of all, I really enjoyed the project.