Tests by Grand Strategy

Posted on December 23, 2023 by Tylor Kobierski

A couple of years back, I had to replace a legacy product with a modernized version that had the exact same feature set.

The old version of this application was running on a deprecated system that had two critical problems:

  1. The particular application I was replacing was very resource intensive.
  2. The deprecated system it ran on was the primary platform that everything ran on.

In most cases, the volume of data moving through this application meant that its resource-intensive nature wasn’t too much of a problem. But when we hit the peak of traffic volume and the entire system ran hot, this application would start to tank the throughput of everything around it.

Clever engineers pulled lots of clever tricks to keep everything running smoothly on the architecture we had, but as the company grew and grew, so too did the base load on the system, and so too did the efficacy of those tricks shrink. Some of them became entirely untenable to pull as certain customers grew more reliant on the output of this resource-intensive application.

At some point it became very clear that we needed to take this application out of that legacy system, because if nobody did anything it was bound to bring that system to its knees one year or another. This was easier said than done, though. The application was built over many years; many of the hands that touched it were long gone, and for the ones that stayed, their contribution was but a faded memory. To make it worse, the nature of the problem it was solving meant it was difficult to test, and so it was primarily tested in production. And given the age of this application, documentation of why it was built the way it was had long been lost to time.

How do you deliver an application with full feature parity, as quickly as possible, in conditions like this?

The Flowchart

The only reliable documentation in this case was the code. However, the code was written in an obscure programming language that few people had experience with; it was difficult for a human to parse, lacked even the simple facility of a for-loop (everything had to be written as a while), and lived on a platform that was on life support (it turns out the IDE was maybe about two macOS versions away from being unrunnable outside of a VM!). We had to take it out of that format in some way and make it more comprehensible.

The first thing we did was build flowcharts and step-by-step documents of the application. Our first iteration of the application was then built off of these flowcharts and diagrams.

This got us quite close to a workable application, but it ended up being a bit error-prone: the flowchart revealed to us the general shape of the application we needed to build, but with the byzantine amount of detail it contained, it was difficult to properly track all of the subtle things that the application did.

You don’t know what you don’t know, so sadly the first run of this replacement did not go as smoothly as we expected, and we regrettably had to do a swift rollback.

Time was unfortunately running out. There wasn’t much time to refine things before peak season hit. At the same time, as a result of that rollback, we wanted to tread very carefully, and make sure that we hit parity. The lead engineer of the project assigned me the task of getting those features exactly right.

The Inspiration

At the time we were building this, I was an avid player of Europa Universalis IV. EU4 is a grand strategy game played on a world map centered around the years 1444-1821. It’s a kind of alternate history simulator, where you pick a country and let it grow. It’s as addictive as Civilization and scratches a similar history nerd itch, but with far more detail and depth at the cost of only focusing on a particular fraction of history.

A prominent feature in EU4 is its concept of a “National Decision”: a kind of declaration or law that you can enact based on conditions that you meet. For example, if you were playing as the Duchy of Milan and you conquered all the major population centers of Italy from Sicily to Savoy, you could pick a national decision to declare yourself King of Italy and gain prestige and entirely new bonuses. Or, if you were England and had bad relations with the Pope, you could declare a Statute in Restraint of Appeals, burning your bridges with the Catholic Church but consolidating your authority.

To activate a national decision, you have to meet a strict set of criteria that are clearly labelled. For example, the game’s decision panel spells out the exact requirements to form Germany.

Germany is pretty simple to form: it mostly needs you to be an independent Germanic nation that has collected specific provinces. Other decisions can have more variable criteria (must be this religion or that religion, a certain country must not exist, must have completed at least one specific idea group, etc.).

The night I got the assignment, I was playing a game as the Duchy of Friesland with the ultimate goal of forming the Netherlands. As I was staring at the national decision list, figuring out which neighboring duchy I needed to conquer next, the idea hit me. I quit the game, opened up Emacs, and built out the initial bits of an org-mode file describing:

  • Each and every output that the program can make.
  • The input requirements to achieve that output.

I went through both the code and the flowchart I had made previously. The result was a file that looked a little bit like this:

* Input-Output Analysis: My Microservice
** To Process a Notification of acceptance
*** All of the following must be true:
 - [ ] There must be an incoming document:
   - [ ] Have a primary ID
   - [ ] Have a secondary ID
 - [ ] In the database:
   - [ ] A routing record MUST exist:
     - [ ] Primary ID must match document
     - [ ] Secondary ID must match document
   - [ ] A communication channel record MUST exist:
     - [ ] Communication channel must have a defined destination.
*** Then:
  - [ ] A record in the notification table
    - [ ] The notification record's status must be "Accepted"
  - [ ] A queue message is written to the dispatcher containing:
    - [ ] record_id
    - [ ] destination_address

... and so on

The result of this analysis was a clean series of conditions and rules that could then be translated into a clear and useful set of integration tests.

We then did so, labelling the tests with each pathway the document described:

@SpringBootTest
public class DocumentTest {
  @Autowired
  DocumentProcessor unit;

  @Autowired
  AcceptanceDAO dao;

  @EndpointInject("mock:endpoint")
  MockEndpoint mockEndpoint;

  @Test
  @DisplayName("The test document can be accepted.")
  public void testDocumentAcceptance() throws Exception {
    // This test is backed by a database, which gets stood up at the
    // beginning of the test and contains sample test data.
    var testDocument = TestDocumentBuilder.newDocument()
      .setId("test-identifier")
      .setSecondaryId("test-specifier")
      .build();

    // Camel mock endpoints take their expectations up front...
    mockEndpoint.expectedBodiesReceived(testDocument);
    mockEndpoint.expectedHeaderReceived("destination", "my-destination");

    unit.process(testDocument);

    var processedDocument = dao.getAcceptanceRecord("test-identifier", "test-specifier");
    assertEquals(DocumentStatus.ACCEPTED, processedDocument.getAcceptanceStatus());

    // ...and are verified after the unit under test has run.
    mockEndpoint.assertIsSatisfied();
  }

  // ... and so on
}

With this strategy, each item in the list became a piece in a checklist that we could validate on two levels: that we had a test covering the case, and that the case passed. We ended up finding several holes in the original implementation of the rewrite, which we easily filled, then went ahead with a corrected re-release.
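The nice thing about keeping the analysis as checkboxes is that "how much is left?" becomes a number you can compute. As a minimal sketch (the class and method names here are my own, not from the original project), you can tally the open and done boxes in the org-mode file:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: count unchecked "[ ]" vs. checked "[X]" boxes
// in the input/output analysis file, so coverage is a number we can
// watch go to zero rather than a judgment call.
public class ChecklistTally {
  private static final Pattern BOX = Pattern.compile("\\[( |X)\\]");

  // Returns {unchecked, checked} counts for the given org-mode lines.
  public static int[] tally(List<String> orgLines) {
    int unchecked = 0, checked = 0;
    for (String line : orgLines) {
      Matcher m = BOX.matcher(line);
      while (m.find()) {
        if (m.group(1).equals("X")) checked++;
        else unchecked++;
      }
    }
    return new int[] { unchecked, checked };
  }

  public static void main(String[] args) {
    var sample = List.of(
      "*** All of the following must be true:",
      " - [X] There must be an incoming document:",
      "   - [X] Have a primary ID",
      "   - [ ] Have a secondary ID");
    int[] counts = tally(sample);
    System.out.println(counts[0] + " open, " + counts[1] + " done"); // prints "1 open, 2 done"
  }
}
```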

The Deployment

We couldn’t just have the application immediately replace the original process, of course. The deploy had to be done slowly and carefully.

The input/output analysis document became a launch checklist, which we used to verify as much behavior as we could.

To control the potential rate of failures against production, we trickled in very small samples of data that we could easily verify against the checklist, then quickly reprocessed that data with the legacy system if there was indeed an error.

As we verified the most common and important pathways, we trickled larger and larger samples into the new process to trigger increasingly obscure code pathways, until we were running production at full throttle.
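A rollout like this boils down to a percentage gate that decides which records the new process handles. Here's a minimal sketch under my own assumptions (the original routing code isn't described in detail): the gate is deterministic per record, so a record that fails on the new side can be found and reprocessed on the legacy path.

```java
// Hypothetical ramp-up gate, not the production code: route a fixed
// fraction of records to the new service, leaving the rest on the
// legacy path. The fraction is raised as pathways are verified.
public class RolloutGate {
  private volatile int percent; // 0..100

  public RolloutGate(int percent) {
    this.percent = percent;
  }

  public void rampTo(int percent) {
    this.percent = percent;
  }

  // Deterministic by record ID: the same record always lands on the
  // same side, which makes verification and reprocessing tractable.
  public boolean useNewPipeline(String recordId) {
    int bucket = Math.floorMod(recordId.hashCode(), 100);
    return bucket < percent;
  }
}
```

Ramping is then just a matter of calling `rampTo` with larger values as each batch checks out against the launch checklist.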

After the Full Release

Eventually it became evident that certain pathways were either exercised extremely infrequently or no longer used at all. We marked those pathways with metrics counters, monitored them for an extended period of time to see if they were used, and, if not, eliminated them from the application.
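The instrumentation itself can be very small. As a sketch (assuming a plain in-process counter; in practice this would feed whatever metrics system you already have), each suspect pathway gets a named counter incremented at its entry point:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Minimal "is this pathway dead?" instrumentation, with names taken
// from the input/output document. A pathway whose counter stays at
// zero over a long observation window is a removal candidate.
public class PathwayCounters {
  private static final Map<String, LongAdder> HITS = new ConcurrentHashMap<>();

  // Call at the entry point of a suspect code pathway.
  public static void hit(String pathway) {
    HITS.computeIfAbsent(pathway, k -> new LongAdder()).increment();
  }

  // Read back the observed count; zero means never exercised.
  public static long count(String pathway) {
    LongAdder adder = HITS.get(pathway);
    return adder == null ? 0 : adder.sum();
  }
}
```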

While we no longer touched the input/output documents after we successfully got the application running, the tests generated from that initial analysis became incredibly useful. The initial problem was solved, but the application could always be improved or extended based on customer needs, and those tests ensured that we avoided adding regressions when we created new features or changed existing functionality.

Finally, without a time limit to work against, I eventually did build a fully functional end-to-end test, taking the integration tests I had made and transforming them into a real working document that we could use to validate the application before we went to production, sealing the last major testing gap the application had.

Overall, the input/output analysis was a wonderful success. The replacement application did exactly what we needed it to do: nobody had to worry about the legacy application wreaking havoc on all our throughput anymore. And the process we used to release this application became the template for all others, resulting in many clean releases in the future.

I also got a cool t-shirt out of it!

Follow me on the fediverse from your favorite instance: @tmk@social.lugal.io