Tuesday, June 20, 2023

A COVID-19 Modeling Story

This is a retrospective post that I wrote some time ago, but never finalized.

Early in 2020, I was working with a healthcare organization that owned hospitals, clinics, and a health insurance company. A colleague asked me whether I thought this new disease coming out of China, called COVID-19, was a concern. At the time, I said that the flu season seemed more of a concern, since it had been rather bad. My own experience supported this view and gave me a personal bias: I had a very bad upper respiratory infection in December 2019 that lasted for weeks. Like many people, especially in the United States, I was very wrong about COVID-19.

Once it became apparent that the spread of the virus SARS-CoV-2 was out of control in the US, I was pulled into a small team dedicated to providing the organization with predictive models of COVID-19 hospital admissions, ICU beds, and ventilator usage. The team consisted of a statistician (PhD), an epidemiologist (PhD), an operations researcher / industrial engineer (PhD and master's, respectively), and me (bachelor's degree in psychology). The statistician was the only person in the entire organization with the title "Data Scientist". My title at the time was "Senior Healthcare Analyst", and I barely had any qualifications to be included on this team. If I had to guess why I was selected, I would say: 1) I'm a creative problem solver and pretty good at automation; 2) I'm familiar with a variety of technologies, data sources, and analytical methods (including a bit of statistics and machine learning); and 3) there wasn't anyone else who could help.

We quickly started evaluating different methods of estimating COVID-19 hospital admissions. The leaders brainstormed with other organizations in the region and found a few tools to evaluate. The first was an SIR model, which divides a population into three groups: Susceptible (S), Infected (I), and Removed (R). (R often stands for "Recovered", but death is also a possibility, so "Removed" is more appropriate, although people could also get reinfected.) This model moves people from one group to the next over time and produces an upward trend, a single peak, then a downward trend. Graphs based on these models resemble bell curves.
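To make the S to I to R flow concrete, here is a minimal Python sketch of a basic SIR model stepped forward one day at a time. The function name, the parameters `beta` and `gamma`, and the values used (a population of 1,000,000, 10 initial infections) are illustrative assumptions, not the inputs our team used.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Step a basic SIR model forward one day at a time (simple Euler steps).

    beta:  average new infections caused per infected person per day
           in a fully susceptible population
    gamma: fraction of infected people removed (recovered or died) per day
    Returns a list of (susceptible, infected, removed) tuples, one per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population  # S -> I
        new_removals = gamma * i                     # I -> R
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((s, i, r))
    return history


# Hypothetical parameters chosen only for illustration.
history = simulate_sir(population=1_000_000, initial_infected=10,
                       beta=0.32, gamma=0.1, days=300)
infected = [i for _, i, _ in history]
peak_day = infected.index(max(infected))
print(f"Single peak on day {peak_day}: about {max(infected):,.0f} people infected")
```

Because people only move from S to I to R and the susceptible group only shrinks, the infected curve in this basic form rises to one peak and then falls for good.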

In order to tell this story, I have masked the data and simply labeled the volumes according to their subjective magnitude. The models all project hospital "census", meaning the number of beds occupied by COVID-19 patients each day. Here is the projection from Model 1.

Model 1. SIR Model with Admissions, ICUs, and Ventilators

Model 1 foretold absolute disaster. The projected volume was many times the available capacity. In essence, it did not matter what the y-axis displayed: the numbers were so large that no one would be able to handle the volume of hospital admissions. Model 1 assumed no interventions from anyone and was based on initial measurements of the reproductive number at about 3.2. The reproductive number can be interpreted as the average number of people infected by each person who is already infected. The value at the start of an outbreak, before immunity or interventions take effect, is called the "basic reproductive number", R-naught, or R0; values measured later are called the "effective reproductive number", Re, or Rt.
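A rough way to see why a reproductive number of about 3.2 looked so alarming is to follow infections generation by generation. This sketch assumes every infected person infects exactly R others and ignores immunity and interventions, so it only illustrates the arithmetic; it is not one of the models we used. (In the basic SIR model, the basic reproductive number equals the transmission rate divided by the removal rate, R0 = β/γ.)

```python
def infections_by_generation(reproductive_number, generations, initial_cases=1.0):
    """Expected new infections in each generation of transmission.

    Assumes each infected person infects `reproductive_number` others on average,
    with no immunity build-up and no interventions (an illustrative simplification).
    """
    return [initial_cases * reproductive_number ** g for g in range(generations + 1)]


for r in (3.2, 1.0, 0.8):
    counts = infections_by_generation(r, generations=5)
    print(f"R = {r}: " + ", ".join(f"{c:,.1f}" for c in counts))
```

With R above 1, each generation is larger than the last; with R below 1, each generation is smaller and the outbreak shrinks.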

Interventions quickly began at many levels: national, state, and county. As a result, the initial model was no longer accurate. At this point, we still did not have actual admissions data to compare with the models to assess accuracy, so we were still "flying blind".

Model 2, based on a logistic model fit to data from Italy, was much less dire. However, the model soon stopped making sense: logistic curves belong to a family of S-shaped curves called sigmoid functions, which are monotonic. They never decrease (or, flipped around, never increase), so the projected census could only climb toward a plateau and never come back down. The model was therefore only useful for the next couple of weeks, but it gave us a more reasonable number to work with. We still didn't have actual data, but we did have bed capacity information.
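The monotonic behavior is easy to check numerically. This sketch evaluates a logistic curve with made-up parameters (`capacity`, `growth_rate`, and `midpoint` are illustrative, not the values fit to the Italian data) and confirms the projected values never go down.

```python
import math


def logistic(t, capacity, growth_rate, midpoint):
    """Logistic (sigmoid) curve: starts near 0, rises fastest at `midpoint`,
    and levels off at `capacity`. With a positive growth_rate it never decreases."""
    return capacity / (1 + math.exp(-growth_rate * (t - midpoint)))


# Hypothetical parameters, for illustration only.
projection = [logistic(day, capacity=500, growth_rate=0.2, midpoint=30) for day in range(120)]

never_decreases = all(later >= earlier for earlier, later in zip(projection, projection[1:]))
print(f"Day 30: {projection[30]:.0f}, day 119: {projection[119]:.1f}, never decreases: {never_decreases}")
```

Used as a daily census projection, a curve like this can only approach its plateau and stay there; it never shows the census falling as patients are discharged.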

Model 2. Logistic Model with Bed Capacity

Model 3, attempting to solve the problems with Model 2, was based on new studies from New York and other areas of the world, and was fit with a polynomial regression. Again, not a great model, but it did provide a more palatable near- and far-future state. The peaks of both Model 2 and Model 3 were very similar. We continued to compare to bed capacity.
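Polynomial regression chooses coefficients by least squares. This sketch fits a quadratic by solving the normal equations in plain Python and reads off the peak of the fitted curve. The degree used for Model 3 isn't stated, so degree 2 is an assumption, and the census values are synthetic, generated from a known parabola so the fit can be checked.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c).

    Solves the 3x3 normal equations with Gaussian elimination.
    """
    power_sums = [sum(x ** k for x in xs) for k in range(5)]                   # sum of x^k
    moment_sums = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]  # sum of x^k * y
    # Augmented matrix for the unknowns (a, b, c).
    m = [
        [power_sums[4], power_sums[3], power_sums[2], moment_sums[2]],
        [power_sums[3], power_sums[2], power_sums[1], moment_sums[1]],
        [power_sums[2], power_sums[1], power_sums[0], moment_sums[0]],
    ]
    for col in range(3):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for j in range(col, 4):
                m[row][j] -= factor * m[col][j]
    # Back substitution.
    coeffs = [0.0, 0.0, 0.0]
    for row in range(2, -1, -1):
        known = sum(m[row][j] * coeffs[j] for j in range(row + 1, 3))
        coeffs[row] = (m[row][3] - known) / m[row][row]
    return tuple(coeffs)


# Synthetic daily census generated from a known parabola (peak of 300 on day 20).
days = list(range(30))
census = [300 - 0.5 * (d - 20) ** 2 for d in days]

a, b, c = fit_quadratic(days, census)
peak_day = -b / (2 * a)  # vertex of the parabola
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}, peak on day {peak_day:.1f}")
```

Far outside the observed days, a downward-opening parabola eventually projects a negative census, so a fit like this is only credible near the data it was fit to.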

Model 3. Polynomial Model with Bed Capacity

Model 4 was developed at about the same time that we received actual hospital admissions data. Within 24 hours, I worked closely with an electronic medical record subject matter expert ("EMR SME", I suppose) and collected admissions, ICU stays, and ventilator usage for COVID-19 patients. Finally, we had actual data to compare with our models.

Model 4. New SIR Model with Actual Hospitalizations

This model was another SIR model, based on a publicly available algorithm; however, unlike other SIR models, it incorporated the effects of interventions by adjusting the reproductive number over time. Instead of assuming that cases would keep spreading at a constant rate, it adjusted the spread based on what was observed. Like the prior SIR model, though, it could only model a single peak and then descended monotonically from that peak; that is, it never increased again.
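Here is a minimal sketch of the general idea of adjusting the reproductive number over time: the same day-by-day SIR update, but with the reproductive number switching on hypothetical intervention dates. This is not the publicly available algorithm Model 4 used; the schedule, the `gamma` value, and the function name are assumptions, and it relies on the standard basic-SIR relationship beta = R × gamma.

```python
def simulate_sir_with_schedule(population, initial_infected, gamma, r_schedule, days):
    """Basic SIR model whose reproductive number changes on given days.

    r_schedule: (start_day, reproductive_number) pairs sorted by start_day, with
                the first pair starting on day 0. Each value applies until the next
                start day and sets the transmission rate beta = R * gamma.
    Returns the infected count for each day, starting with the initial count.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    infected = [i]
    for day in range(days):
        # Most recent reproductive number whose start day has been reached.
        r_now = [r for start, r in r_schedule if start <= day][-1]
        beta = r_now * gamma
        new_infections = beta * s * i / population
        s -= new_infections
        i += new_infections - gamma * i
        infected.append(i)
    return infected


# Hypothetical: no interventions at first, then two rounds of interventions.
schedule = [(0, 3.2), (20, 1.1), (45, 0.8)]
infected = simulate_sir_with_schedule(1_000_000, 10, gamma=0.1, r_schedule=schedule, days=150)
peak_day = infected.index(max(infected))
print(f"Peak on day {peak_day} at about {max(infected):,.0f} infected")
```

With this schedule, growth slows sharply after day 20 and reverses after day 45, giving a far lower peak than leaving the reproductive number at 3.2.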

This worked very well for a few months. 

As the actual data and other public data sources demonstrated, cases can rise and fall chaotically over time, depending on how strict interventions are and whether people adhere to them. It was also possible that SARS-CoV-2 had mutated, maybe a few times, and become more infectious as a result. In some regions we observed a "third peak", sometimes higher than the first, and at the time it was still rising.

Model 5 attempted to overcome the issues with SIR models, allowing for chaotic increases and decreases while still incorporating changes in the reproductive number. So far in this discussion, I have not explicitly named the sources the models were based on, for a variety of reasons. This model, though, is based on the LEMMA tool (Local Epidemic Modeling for Management & Action). Initially, this tool was built in R.

Model 5. LEMMA (R)

The LEMMA tool was "designed to provide regional... projections of the SARS-CoV-2 (COVID-19) epidemic under various scenarios" (from the tool's website). The model accepted inputs for dates, population, observed cases (admissions, ICU stays, and deaths), various parameters for PUI (patients under investigation) estimates, and more. It was incredibly flexible and, at the time, well received.

At a certain point, the LEMMA developers switched the foundation of the algorithm from the R-based models to Stan, a probabilistic programming language implemented in C++ and used primarily for Bayesian statistical modeling. As noted on the LEMMA website, "[The LEMMA] Stan implementation is based on the 'Santa Cruz County COVID-19 Model' (https://github.com/jpmattern/seir-covid19) by Jann Paul Mattern (UC Santa Cruz) and Mikala Caton (Santa Cruz County Health Services Agency)".

This version was a significant improvement, providing more intervention dates and input parameters. Model 6, the last model we used, was based on the LEMMA Stan package:

Model 6. LEMMA (Stan)

In this variant of the model, we were able to extract percentiles across the simulations and treat them as upper and lower bounds around the main prediction, specifically the 95th and 5th percentiles, respectively.
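Turning many simulated trajectories into a band is a per-day percentile calculation. This sketch assumes the simulations are available as a list of daily census paths (a hypothetical layout, not LEMMA's actual output format) and uses the median as the main prediction, which is also an assumption. The `percentile` helper interpolates linearly between sorted values, the same convention as NumPy's default.

```python
def percentile(values, pct):
    """Percentile (0-100) with linear interpolation between sorted values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def projection_band(simulations, low_pct=5, high_pct=95):
    """Per-day (lower, median, upper) census from many simulated paths.

    simulations: list of paths; each path is a list with one census value per day.
    """
    n_days = len(simulations[0])
    lower, median, upper = [], [], []
    for day in range(n_days):
        values = [path[day] for path in simulations]
        lower.append(percentile(values, low_pct))
        median.append(percentile(values, 50))
        upper.append(percentile(values, high_pct))
    return lower, median, upper


# Hypothetical example: 101 simulated paths over 3 days.
simulations = [[100 + k, 120 + k, 150 + 2 * k] for k in range(101)]
low, mid, high = projection_band(simulations)
print(low, mid, high)
```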

Note in the graphs, when comparing Models 5 and 6, that the two "bumps" in the early stages of the pandemic are modeled more accurately by Model 6. The first LEMMA model almost looked as if two SIR models were hiding underneath, with the larger of their two predictions taken at each point. Model 5 descended monotonically from the first peak until a specific point where it changed direction. Model 6, on the other hand, could change direction many times, which made its results more trustworthy.

We ran this model once a week. Interventions, social behaviors, and reproductive numbers changed rapidly within each of the geographic regions where we projected COVID-19 admissions, so we needed to rerun the model regularly with the latest information.

Initially in Model 6, we guessed at the impact of various interventions in each region. At the same time, we were calculating the reproductive number in a separate process and reporting on it internally. My idea was to merge these two analyses in order to calculate the LEMMA model "intervention" percentage based on the actual observed reproductive number. I calculated the percent change from week to week and used that figure instead of guessing at the impact of various changes.
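The week-over-week calculation is a simple percent change. This sketch uses made-up weekly reproductive-number values and a hypothetical function name; it does not show how the resulting figures were entered as LEMMA's "intervention" input.

```python
def weekly_percent_change(weekly_r):
    """Percent change in the observed reproductive number from one week to the next.

    weekly_r: list of (week_label, reproductive_number) pairs in chronological order.
    Returns (week_label, percent_change) for every week after the first.
    """
    changes = []
    for (_, previous), (week, current) in zip(weekly_r, weekly_r[1:]):
        changes.append((week, (current - previous) / previous * 100))
    return changes


# Hypothetical observed values, for illustration only.
observed = [("week 1", 1.20), ("week 2", 1.08), ("week 3", 1.35)]
for week, change in weekly_percent_change(observed):
    print(f"{week}: {change:+.1f}%")
```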

It wasn't perfect, but it worked rather well. I left the organization, and I heard from my former colleagues that the model continued to be used, and continued to take longer and longer to run. I suggested a few times that they simply cut off the model at a certain date and restart it, then merge the two results. Fortunately, by the time of this publication, it is no longer being used at all.
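The cut-off-and-restart suggestion amounts to splicing two runs together at a cutoff date. A minimal sketch, assuming each run's output is a dictionary mapping dates to projected census (a hypothetical format), with dates before the cutoff taken from the original run and everything from the cutoff onward taken from the restarted run:

```python
from datetime import date, timedelta


def splice_runs(original_run, restarted_run, cutoff):
    """Merge two projection runs keyed by date.

    Dates before `cutoff` come from the original run; dates on or after
    `cutoff` come from the run that was restarted at the cutoff.
    """
    merged = {day: value for day, value in original_run.items() if day < cutoff}
    merged.update({day: value for day, value in restarted_run.items() if day >= cutoff})
    return dict(sorted(merged.items()))


# Hypothetical runs covering overlapping date ranges.
start = date(2021, 1, 1)
original = {start + timedelta(days=d): 100 + d for d in range(10)}
restarted = {start + timedelta(days=d): 90 + d for d in range(5, 15)}
combined = splice_runs(original, restarted, cutoff=start + timedelta(days=5))
print(len(combined), combined[start], combined[start + timedelta(days=5)])
```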

The models served their purpose: to provide foresight for planning (and maybe a bit of fear), to help with monitoring, and to show the impact of interventions. Fortunately, we're out of the pandemic now (in my opinion). I'm finally posting this, nearly two years after writing it, as a sort of professional horror story, a warning to others in the future, and a reminder of what we need to do better.

Saturday, March 13, 2021

Swenson's Law

The other day we were going through agile training, and one of my colleagues was struggling with the concept of assigning unit-less numbers to work effort / difficulty instead of hours or a 1-10 rating.

I tried to convey that estimates in hours or dollars are often very wrong anyway, and gave a couple of examples.

Let's say you estimate $10,000 for a project. On day one, you find out that the software you expected to be available is not, and you have to pay $2,000 for it immediately. Your estimate is already off by $2,000 on day one.

Instead of estimating the project with a specific dollar amount and being immediately wrong, the agile approach encourages figuring out the meaning of work effort through experience: you assign somewhat arbitrary values to projects and reflect on those estimates, ideally improving your ability to estimate tasks in your own terms.

This led me back to an earlier realization I had while doing home projects:

Swenson's Law: It's never just "one thing".

Let's say you want to wash the exterior of your spouse's car. You go to pull it out of the garage / parking space and realize that it's filthy inside. It really needs to be cleaned inside, too. So you go to get the vacuum cleaner, but you find out that it has a broken part. So you go to order the replacement part online, but you get an email from your tax accountant noting a missing document. So you go to scan the document and send it.

If this were a joke:

Spouse: I thought you were going to wash my car!

Me, sitting at a computer: I am, dammit!!!

I originally wrote a note about this back in 2015, before my daughter was born, but forgot to publish it:

I went into the kitchen to throw away a tissue. I noticed the trash was full. I thought: "Oh, I'll just take out the garbage." However, I then noticed that the top of the trash can was rather dirty, so I went to get a wet paper towel to clean it off. I turned to find that the paper towels were out.

So then I went downstairs to get more paper towels. I remembered that we had laundry to put away, and that later today, when we usually do laundry, we would be out at the hospital for a tour. We would have to do laundry earlier, probably right about now.

I got the paper towels, came back upstairs, cleaned the trash can, and while I was taking out the trash, noticed the recycling bin was also full. So that had to be taken care of, too.

This situation, along with many others (e.g., fixing anything in the house; replacing a light switch ended up taking half a day), resulted in a conclusion: It's never just "one thing".

Saturday, March 21, 2020

Home Projects

Stuck at home? Bored? Here are a few ideas:

1. Learn a new skill.

I highly encourage everyone to start learning a new skill online. Find a good service that is cheap or free, with a good online learning system. Be sure to schedule time on it every day, even if it's just 15 minutes. Learn to use Excel, program in SQL or Python, or learn a new language. These skills will always be useful.

Don't limit yourself to online learning. Has that piano, guitar, or flute been collecting dust? Clean it up first, show it some care that you don't usually have time for, and give it a go. If you're already good at it, consider teaching someone else.

Also, see if someone around you (physically or digitally) is interested in the same skill, and try to learn it together. If you learn a new skill together, you can help each other, encourage each other, and make sure you're both committing to it every day.

On the flip side, if you're good at teaching, consider tutoring online or creating content for one of those online learning systems.

2. Check your living space for expired or out-of-date things.

Check your light bulbs for any incandescent bulbs. Replace any you find, if you have extra bulbs. Consider ordering LEDs, if it doesn't interfere with other deliveries. Initially I bought daylight LEDs, but we found them too obnoxiously bright in the evening and at night, so I recommend soft yellow lights.

Similarly, check your fire extinguishers. You may not be able to get them officially checked, but you can make a list of things to do post-quarantine. Expired fire extinguishers should be replaced if they have plastic handles; otherwise, they can be recharged by a certified specialist.

There are lots of other things in your house to check: furnace filter, water filter, fridge filter, humidifier filters. OK, so lots of filters.

3. Organize your storage items.

The best idea that I've had regarding storage is to NOT label boxes with words. Instead, use a number and keep a list with these columns: Box Number, Contents, and Location. If you're living in a small space, you may not need a Location column; but even in a small space, boxes are easier to find when the location is noted. Numbering boxes this way helps you avoid covering up old labels or leaving misleading ones.

To identify the location of a box on shelves, I use a location name plus a letter / number coordinate, just like a cell reference in spreadsheet software. Letters identify columns (stacks of boxes going up and down) and numbers identify rows (lines of boxes going left and right). For example:

Number: 5, Contents: Pictures, Location: Storage Shelf B2
      A      B
1 |______|______|
2 |______|_Box5_|
3 |______|______|
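If you keep the list in a spreadsheet or a small script, looking up a box and reading its shelf coordinate is straightforward. Here is a sketch in Python; apart from Box 5, the inventory rows, the helper names, and the exact location format (area name, then column letter and row number) are hypothetical.

```python
import re

# Box Number, Contents, Location (only Box 5 comes from the example above).
inventory = [
    {"box": 5, "contents": "Pictures", "location": "Storage Shelf B2"},
    {"box": 6, "contents": "Winter clothes", "location": "Storage Shelf A1"},
    {"box": 7, "contents": "Holiday decorations", "location": "Garage Shelf A3"},
]


def find_boxes(inventory, keyword):
    """Return every row whose contents mention the keyword (case-insensitive)."""
    return [row for row in inventory if keyword.lower() in row["contents"].lower()]


def parse_location(location):
    """Split 'Storage Shelf B2' into ('Storage Shelf', 'B', 2):
    area name, column letter(s), and row number."""
    match = re.fullmatch(r"(.*\S)\s+([A-Z]+)(\d+)", location)
    if match is None:
        raise ValueError(f"Unrecognized location: {location!r}")
    area, column, row = match.groups()
    return area, column, int(row)


for row in find_boxes(inventory, "pictures"):
    area, column, shelf_row = parse_location(row["location"])
    print(f"Box {row['box']}: {area}, column {column}, row {shelf_row}")
```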

Don't forget your digital storage! Organize your photos and videos, too. I name pictures and videos in year/month/day format and group them into year/month/day folders. If there aren't enough pictures to justify an entire folder, I sometimes lump several days together.

For example, if there are only 5 pictures in Feb 2019, I put them all in a folder named 20190201 with picture names like 2019-02-01_0950.png or 2019-02-25_1735.png. Because the year/month/day date comes first in the name, the pictures sort correctly by name even if you edit them later and their modified dates change. For major events with lots of photos, like birthdays, I add an event name after the folder's year/month/day, like so: 20190525_Anniversary_Party.
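The naming scheme can be generated from each photo's timestamp. Here is a sketch using Python's `datetime` formatting; the helper names are hypothetical, and choosing which date a folder is named after (the first photo's, say) is left to the caller.

```python
from datetime import datetime


def photo_filename(taken, extension="png"):
    """File name like 2019-02-25_1735.png (year-month-day_hourminute)."""
    return f"{taken:%Y-%m-%d_%H%M}.{extension}"


def folder_name(day, event=None):
    """Folder name like 20190201, or 20190525_Anniversary_Party for an event."""
    name = f"{day:%Y%m%d}"
    if event:
        name += "_" + event.replace(" ", "_")
    return name


# Photos listed out of order; sorting the names puts them in chronological order.
taken_times = [datetime(2019, 2, 25, 17, 35), datetime(2019, 2, 1, 9, 50)]
names = sorted(photo_filename(t) for t in taken_times)
print(folder_name(datetime(2019, 2, 1)), names)
print(folder_name(datetime(2019, 5, 25), "Anniversary Party"))
```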

4. Clean frequently touched items or replace them with automated items.

The CDC recommends cleaning frequently touched things every day. What I find quite obnoxious about this advice is that they don't tell you how to do it. I use a mix of water and bleach for light switches, doorknobs, and keyboards, but what about your phone? What are you supposed to use when it shouldn't get wet? I have a product that says it cleans phones, but there's little evidence that it does. Perhaps wiping down your phone with a slightly damp cloth and then immediately drying it is the best you can do.

Consider replacing frequently used light switches with automatic switches so you don't have to touch them. I bought one automatic switch for the main floor bathroom. This is likely not feasible for everyone, especially if you only have 1 bathroom. I suppose a voice-command light would be better so it doesn't turn on at night, but it's likely much more expensive than a simple motion detection switch. While you're at it, if you have any loose outlets, find the right breaker, turn it off, and tighten them.

Personally, I don't like smart voice-command items like Alexa, but right now they seem like a smart choice since you can do so much without touching anything! No need to touch the speaker to play music, use your keyboard to search and order something, or get a reminder on your phone that you have to swipe to view. I might have just convinced myself to buy one...

5. Write about your experience.

One thing that has tripped me up over the years regarding my health is that I'll do something or something will happen, and months later I will have forgotten the solution or details of what occurred. If I had written a health journal, it would have helped me remember. This may be a great idea during this time of health crisis, especially if you aren't able to communicate your past condition.

It may also be cathartic to write about your experience, to write about how you feel, or to translate your experience and feelings into a work of fiction. My wife writes fiction as sort of a therapeutic practice. She doesn't let anyone read it (so far), but it really helps her work out emotions and life events in a different way.

One thing that I've forgotten to do over the years is to send an "update" email to people I care about, personally and professionally, updating them about my life. I usually do this by email, but I treat the writing of it like a long-form letter. I write it as if I won't get a response, like a one-way communication method. I usually get quite a few responses, and I think people appreciate hearing about changes in my life. I take the time to thank people, too, for helping me get where I am. And I always ask, "what's new with you?" It's been a great way to keep in touch.

Whatever you do in large-group emails, DO NOT put everyone's email address in the TO or CC field! Put everyone in the BCC field and put your own address in the TO field. Your contacts may not want their addresses shared with tens or hundreds of others, and if any recipient's account is compromised by a hacker, the only address that hacker gets from your email is your own.

6. Listen to music and stories the old-fashioned way.

How often do we sit back on the couch and just listen to music? No phones or tablets or books. People used to do that! Just listen. Share your music interests with those around you. Nowadays music preferences are so private; who knows what you like? Does anyone know you like to listen to death metal during your workout? Or J-POP on the way home from work? Who knows, someone you share with may really like it too.

I heard about a deal on a certain website that sells audio books. Maybe it would be fun to gather the family around the "radio" and listen to a chapter from a good book! Make some popcorn! Make it a weekly event.

I recently pulled out some old tapes I made as a kid, and my son and I listened to the goofy tapes for a good 30 minutes.

7. Share your ideas about what to do while stuck at home!

Of course we should all exercise more, and we need to be rather creative about it stuck at home. How do you manage? Any other ideas about what to do at home?

Wednesday, September 13, 2017

Tough Year

It's been a tough year. I covered the beginning of the year in another post regarding the Calculus course I took in preparation for applying to a master's program in applied statistics. I had planned on studying for the GRE this summer and taking it this fall as well as applying for the master's program, then working through the master's program for the next 3-5 years. My employer, HDMS (a subsidiary of Aetna), was going to help pay for the degree. However, those plans were about to be thrown off course.

I had just finished submitting my coursework for reimbursement when I received an unexpected meeting request. At the meeting, I found out HDMS was letting me go. At first it appeared it was just me, but as I found out later in the day, they were laying off about 10 other employees and closing about 10 other open positions. For two hours, I tried to figure out what I did wrong, but it had nothing to do with my performance. I was the most recent hire on the team, and others were getting laid off too.

I hit the ground running. The severance package included a career consulting service, so I looked it up and scheduled time to review my resume with a consultant. The paper / PDF version of my resume has undergone quite a few revisions over the past few weeks. I also started contacting my network, browsing online job boards, and doing all the usual job-hunting tasks. I found quite a few roles through my network, and a couple of them resulted in offers.

I quickly found a role as a consultant data analyst with Great Wolf Resorts. I stayed there for a few weeks, working on a single project integrating credit card transactions with the reservations. Essentially, if a guest uses a credit card for something on site (e.g., restaurant) and does not charge it back to the room, the transaction is not connected to the reservation. In order to connect the two, I had to merge transactions with reservations based on guest name and, if available, the last four digits of the credit card number. It was messy, and I was able to get about 57% of the transactions matched. I believe the best possible rate was somewhere around 65%, but it would have required a lot of exception handling, manual matching, and/or time-intensive matching processes (e.g., matching text within another text field). The company analyst and I decided the additional matches weren't worth the expense.
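The matching logic can be sketched in a few lines. This is a simplified illustration, not the process used on that project: the field names (`guest_name`, `last4`, `id`) and records are hypothetical, names are normalized by lowercasing, dropping punctuation, and sorting the words (so "SMITH, JOHN" matches "John Smith"), reservations with conflicting card digits are excluded, and ambiguous matches are left unmatched rather than guessed.

```python
import string


def normalize_name(name):
    """Lowercase, drop punctuation, and sort the words so word order doesn't matter."""
    cleaned = name.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(cleaned.split()))


def match_transactions(transactions, reservations):
    """Match card transactions to reservations by guest name, using the last four
    card digits (when the transaction has them) to exclude conflicts and break ties.

    Returns (matches, match_rate) where matches maps transaction id -> reservation id.
    """
    matches = {}
    for txn in transactions:
        name = normalize_name(txn["guest_name"])
        candidates = [res for res in reservations if normalize_name(res["guest_name"]) == name]
        if txn.get("last4"):
            # Drop reservations whose card digits are known and different.
            candidates = [res for res in candidates if res.get("last4") in (None, txn["last4"])]
            same_card = [res for res in candidates if res.get("last4") == txn["last4"]]
            if same_card:
                candidates = same_card
        if len(candidates) == 1:  # leave ambiguous or missing matches unmatched
            matches[txn["id"]] = candidates[0]["id"]
    match_rate = len(matches) / len(transactions) if transactions else 0.0
    return matches, match_rate


# Hypothetical records, for illustration only.
reservations = [
    {"id": "R1", "guest_name": "John Smith", "last4": "1234"},
    {"id": "R2", "guest_name": "Ana Lopez", "last4": None},
    {"id": "R3", "guest_name": "Ana Lopez", "last4": "9876"},
]
transactions = [
    {"id": "T1", "guest_name": "SMITH, JOHN", "last4": "1234"},
    {"id": "T2", "guest_name": "Ana Lopez", "last4": "9876"},
    {"id": "T3", "guest_name": "Ana Lopez", "last4": None},
    {"id": "T4", "guest_name": "Jon Smith", "last4": "1234"},
]
matches, rate = match_transactions(transactions, reservations)
print(matches, f"{rate:.0%}")
```

The unmatched cases here (an ambiguous name, a misspelled name) are the kind that would need the exception handling and manual matching described above.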

The position was a good fit for my skills, and I enjoyed working with the people there, but as a consultant role, the benefits were very expensive and of course it could have ended at any time. So, I kept looking for permanent roles while I was there. I need something more permanent right now, but I can definitely see myself as a successful consultant. In my short time there, I think I demonstrated a lot of value with my skills and the process and analysis I left behind.

Recently, I found a new role as a Senior Healthcare Analyst at SSM Health, a non-profit healthcare organization with hospitals from Wisconsin to Missouri. They also own Dean Health Plan, where I worked a few years ago. I still know a few people there, so it will be good to reconnect with them. I'll be analyzing healthcare data for a particular region of the system, starting in a few days. I feel very good about the team and the leader, so I'm looking forward to getting started. Luckily, I'll be working from home again, so I'll get to use my treadmill again.

Here's hoping quarter four is quite a bit less turbulent!

Tuesday, August 1, 2017

Calculus III

In the last year or so, I decided to apply for a master's program in applied statistics, but I was missing one of the prerequisite mathematics courses: Calculus III. I had taken calculus courses in high school and college, but those courses were more focused on applications. Furthermore, I hadn't covered any of Calculus II in those courses.

Instead of taking the entire series, which would have taken quite a bit of time and money, I decided to do something rather daunting: I took Calculus III online and used Khan Academy and other sources to catch up on Calculus I and II. I had read reviews saying that the first few weeks were tough even for students who had taken Calculus I and II just before III. Undeterred, I started the course earlier this year, and the first few weeks were indeed tough.

The online program I used, NetMath, relied on an online math tool for running code and submitting homework. Each student is assigned a mentor who grades assignments, answers questions, and ensures each student is on schedule. Students receive feedback on their homework and can resubmit corrections a couple of times. The two midterms and the final must be taken in person with a proctor.

My first mentor was not very responsive. During week 2, a critical week in the program, my mentor did not respond to emails or grade my assignments in a timely fashion (within 3 days, as stated in the program handbook). I notified the program administrators, and they assigned me a new mentor. She had quite a bit of catching up to do, but she did her best and eventually graded the outstanding assignments and responded to my questions. Honestly, she was amazing, and I'd write her a letter of recommendation if she asked.

Lesson 2 is quite difficult. It's really the first lesson on the course's main topic, whereas lesson 1 reviews parametric equations and other prerequisite concepts in case anyone missed or forgot them. With the combination of difficult content and slow responses from my mentor, it took me 2 weeks to finish lesson 2. In addition, I got sick for a couple of days and there was a death in the family, which put me behind another 2 weeks or so. Fortunately, the program offers a two-month extension, and I planned on using it if needed.

However, there were additional, serious problems with the course. One of the most grievous was incomplete or incorrect content. Concepts often weren't given their standard names, which leaves students who take this course unable to communicate about them effectively. For example, vector projection was just called "vector push on another vector". It took me quite some time to find the right term so I could research the concept online.
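For reference, the standard formula for the projection of a vector a onto a nonzero vector b is proj_b(a) = (a · b / b · b) b. A short Python sketch (the helper names are illustrative):

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def project(a, b):
    """Vector projection of a onto b: (a . b / b . b) * b. b must be nonzero."""
    scale = dot(a, b) / dot(b, b)
    return [scale * component for component in b]


a, b = [3.0, 4.0], [1.0, 1.0]
p = project(a, b)
residual = [x - y for x, y in zip(a, p)]
# The leftover part of a (a minus its projection) is perpendicular to b.
print(p, dot(residual, b))
```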

The course also neglected saddle points and claimed that whenever the gradient was {0, 0} (or more zeros, depending on the number of dimensions), the point was a minimum or maximum of the function. That is simply false at a saddle point, and it would be terrible for students to internalize this falsehood, since it matters a great deal when training predictive machine learning models, specifically neural networks. You can't assume you've optimized a function just because the gradient is {0, 0}; you have to examine the function around that point to see whether you've found a saddle point instead.
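A classic example is f(x, y) = x^2 - y^2 at the origin: the gradient (2x, -2y) is (0, 0) there, yet the function curves up along x and down along y, so the origin is neither a minimum nor a maximum. For functions of two variables, the second derivative test distinguishes the cases; here is a small sketch of it (function names are illustrative):

```python
def gradient(x, y):
    """Gradient of f(x, y) = x**2 - y**2."""
    return (2 * x, -2 * y)


def classify_critical_point(fxx, fyy, fxy):
    """Second derivative test at a point where the gradient is (0, 0).

    fxx, fyy, fxy are the second partial derivatives evaluated at that point.
    """
    discriminant = fxx * fyy - fxy ** 2
    if discriminant > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    if discriminant < 0:
        return "saddle point"
    return "inconclusive"


# For f(x, y) = x**2 - y**2: fxx = 2, fyy = -2, fxy = 0 everywhere.
print(gradient(0, 0), classify_critical_point(2, -2, 0))
```

In more dimensions the same idea uses the eigenvalues of the Hessian matrix: eigenvalues of mixed sign indicate a saddle point.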

All told, I was quite unhappy with this course. Not only was I spending a lot of time catching up, but I was also spending time learning the material from other sources, since the course material was incomplete or inaccurate. Nearing the end of the course, I had caught up to the point where it looked like I could finish if I just had another week or two. I emailed the administrators for a course extension, noting the reasons for the delay and the issues I had with the course, and they offered a shorter extension so I could finish without rushing and without taking the full extension. (Finishing without an extension would have made it very difficult for my mentor to complete all the grading in time.)

Despite all the delays and issues, I finished all the material within the original time frame, and I just needed to take the final. I studied for a few extra days and took the final about 2 weeks after the course originally ended. Since the final was comprehensive and included the last three lessons, I was very nervous about it. I had aced the midterms, but there was just so much to remember (e.g., the curl of a 3D field is difficult to remember). To my great surprise and delight, I not only aced the final, I got an A+ in the course. I was relieved!

Now I just have to re-take the GRE and apply for the master's program.