Handling slowly changing datasets using text files
posted 2022.05.26 by Clark Wilkins

Recently, we've been working on a backend project for our Healthica platform, and I wanted to share some ideas and approaches to slowly changing datasets. I look at data in three contexts: static, semi-static, and dynamic.

Static data

Static data covers things like a condition-codes list in an inventory platform (tested, untested, bad, etc.). This sort of information almost never changes, but it is accessed a lot.

Since we almost never have to change it, but we do have to contend with it being used in lots of places, we encode this in JSON text files and read them wherever we need a values list (array or object). A simple call into a larger collection file of these static items works well:

const { someArray } = require( "[path]/static.json" );

I think this has simplicity and elegance advantages over an fs.readFile call. The array or object is maintained in one larger file of static data, and it can be used as a starting point for all manner of operations, without any database overhead.
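
For concreteness, such a collection file might look something like this (the conditionCodes key and its entries are just an illustration of the pattern, not our actual file):

{
  "conditionCodes": [
    { "name": "Tested", "value": "tested" },
    { "name": "Untested", "value": "untested" },
    { "name": "Bad", "value": "bad" }
  ],
  "someArray": [ ... ]
}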

Dynamic data

On the other end of the spectrum is all of the actual customer-facing data, which is bespoke and needs to be persisted to a datastore. Here we clearly need a database-interface API and, quite often, some backend processing before the data is passed along to a GUI component.

Semi-static data

What we were working on recently was data that (a) is accessed in a lot of places, and (b) doesn't change that often, but can (and will) update on an intermittent basis. In the case of Healthica, we worked on manufacturers and modalities — two evolving datasets, used in lots of places, that we expand as new items suitable for either set are discovered. (For example, we are asked to start supporting a new manufacturer that's not “in our system”.)

We don't want the overhead of a lot of database calls to get a manufacturer, but we don't want to manually upload a revised .json file either. What's called for here is an API that enforces our data-structure policies (no duplicates, properly sorted, etc.) while still persisting to a text file for fast, low-overhead access.

APIs for maintaining semi-static data

Here's the approach we took:

(1) Store the object in a single member JSON file.
{
  "manufacturers": [
    { "name": ..., "value": ... },
    ...
    { "name": ..., "value": ... }
  ]
}
(2) Get the object with a standard call.
let { manufacturers } = require( "[path]/manufacturers.json" );
(3) Manipulate the object.
In our case, we just want to add a new manufacturer, but the same principle works for editing a member or removing one. We also needed sorting by the name attribute, and since value is what's stored in actual database records, an existing value:name pairing cannot be modified. We created a new object, sorted by name after the new manufacturer pair was added (see the sketch after this list).
(4) Persist the object back to the original JSON file.
For this we use fs.writeFile, as discussed in a bit more detail below.
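
As a concrete illustration of step (3), here is a minimal sketch. The addManufacturer name, the duplicate check, and the error it throws are our own choices for this example, not a fixed part of the platform:

// Add a { name, value } pair, reject duplicates, and return a new array sorted by name.
const addManufacturer = ( manufacturers, newEntry ) => {
  // reject duplicates on either attribute, since an existing value:name pairing is immutable
  const exists = manufacturers.some(
    ( m ) => m.value === newEntry.value || m.name === newEntry.name
  );
  if ( exists ) throw new Error( "duplicate manufacturer" );
  // build a fresh array and sort it, rather than mutating the require()'d copy in place
  return [ ...manufacturers, newEntry ].sort( ( a, b ) => a.name.localeCompare( b.name ) );
};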

For the actual write, we have to take two things into account (both show up in the sketch after this list):

  • Turn the in-memory object back into a JSON string for the write; the file has to contain JSON text, not a JavaScript object.
  • Remember that fs.writeFile does not resolve its path relative to the route file, but relative to the root of the JS API (not ../[path] as referenced from api/routes/thisRoute.js, but ./[path] as referenced from the API root).
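
A minimal sketch of step (4), taking both points into account. The persistManufacturers name and the exact file location are illustrative, and anchoring the path with __dirname is just one way to avoid the relative-path surprise:

const fs = require( "fs" );
const path = require( "path" );

// Wrap the sorted array back under its single key, serialize it to a JSON
// string, and overwrite the original file. path.join( __dirname, ... ) pins
// the location to this module rather than the process's working directory.
const persistManufacturers = ( manufacturers, callback ) => {
  const filePath = path.join( __dirname, "manufacturers.json" );
  const json = JSON.stringify( { manufacturers }, null, 2 );
  fs.writeFile( filePath, json, "utf8", callback );
};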

Conclusions

By setting up this updating route, we are able to enforce proper data structure on a list that doesn't change often, but is malleable and very heavily used. We can use a Postman-style API client, or even put a GUI together to work with it, but what we don't have to do is edit a JSON file by hand.
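
Tied together, the updating route itself can stay very small. This sketch assumes an Express-style router and reuses the two helpers from the sketches above; the URL and request shape are ours, not a fixed part of Healthica:

const express = require( "express" );
const router = express.Router();
router.use( express.json() );

let { manufacturers } = require( "[path]/manufacturers.json" );

// POST a { name, value } pair; reject duplicates, re-sort, and persist.
router.post( "/manufacturers", ( req, res ) => {
  try {
    manufacturers = addManufacturer( manufacturers, req.body );   // step (3) sketch
    persistManufacturers( manufacturers, ( err ) => {             // step (4) sketch
      if ( err ) return res.status( 500 ).send( err.message );
      res.json( manufacturers );
    } );
  } catch ( err ) {
    res.status( 400 ).send( err.message );                        // e.g. duplicate manufacturer
  }
} );

module.exports = router;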

We maintained the overhead advantages of a quick file read, and the (apparent) ease of API-based manipulation.