Schema flexibility for power users

In software, the thing I’m most excited about at the moment is schema flexibility. (I first saw that term in a tweet by Emily Ann Kolvitz.) I think we’re losing a lot of valuable metadata, and business value, because the software we keep our structured data in makes it so hard to change the data model.

Example #1: Your system stores each customer’s e-mail address. Now you want to extend this to allow multiple addresses per customer, each with a label (“work e-mail”, “personal e-mail”, etc.).

Example #2: Your archival system knows the publication date for each of your newspaper articles. Now you want to archive Web articles as well, but their publication date includes the time of day whereas print articles only have the day.

Example #3: Users can already add simple custom fields (say, “Photographer name”), but sometimes they really need to add custom structures and relations (i.e. a separate “Photographer” record with its own fields, and links to these records).

Sounds simple? Well, you’ll need a developer and database administrator for all of the above. And it might be a lot of work for them.
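To make Example #1 concrete, here is a minimal sketch of what that change could look like in a relational database, using Python's built-in sqlite3 module. The table and column names (customer, customer_email, label, address) and the sample data are assumptions made up for illustration, not taken from any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original model: exactly one e-mail address per customer.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO customer (name, email) VALUES ('Ada', 'ada@example.com')")

# New model: any number of labeled addresses, which needs a new table.
conn.execute(
    """CREATE TABLE customer_email (
           customer_id INTEGER NOT NULL REFERENCES customer(id),
           label TEXT NOT NULL,
           address TEXT NOT NULL
       )"""
)

# Existing data has to be migrated; labeling it "work e-mail" is a guess.
conn.execute(
    "INSERT INTO customer_email (customer_id, label, address) "
    "SELECT id, 'work e-mail', email FROM customer WHERE email IS NOT NULL"
)
conn.execute(
    "INSERT INTO customer_email VALUES (1, 'personal e-mail', 'ada@example.org')"
)

# Every query that used to read customer.email now needs a join.
rows = conn.execute(
    "SELECT c.name, e.label, e.address FROM customer c "
    "JOIN customer_email e ON e.customer_id = c.id ORDER BY e.label"
).fetchall()
```

Even in this toy version there is a schema change, a data migration, and a rewritten query. In a real application, the forms, reports and integrations that read the old email column would have to change as well.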

Most structured data still lives in relational (SQL) databases. They’re wonderful, but they make it especially hard to change your data model. Demian Hess illustrates this in the first part of his excellent DAM and the Need for Flexible Metadata Models series: “As new asset types are discovered, you need to restructure the database by adding new tables or new columns. Database restructuring requires expensive and disruptive changes in queries and application-layer logic. […] The fundamental flaw is that we are attempting to define all the attributes for every type of digital asset in our data model in advance. In other words, we are imposing an inflexible data model.”

This rigidity is one reason for the current wave of NoSQL databases. There are document databases like MongoDB, which are far more flexible, but they “tend to suffer in supporting relationships between documents” (Demian Hess – DAM and Flexible Data Models Using Document Databases). Graph databases and RDF triple stores like BrightstarDB also fall into the NoSQL category. I don’t like their data model, but they do give you schema flexibility.
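As a rough illustration of the document-style approach, here is Example #2 sketched with plain Python dictionaries standing in for documents. This is not the API of MongoDB or any other product, and the field names and the helper published_day are hypothetical.

```python
from datetime import date, datetime

# Each "document" carries whatever fields it needs; no shared schema is declared.
articles = [
    {"title": "Print story", "published": date(2014, 11, 3)},
    {"title": "Web story", "published": datetime(2014, 11, 3, 14, 30)},
    {"title": "Photo", "photographer": "Jane Doe"},  # custom field on one record only
]


def published_day(doc):
    """Return the publication day, whatever precision the document stored."""
    value = doc.get("published")
    if value is None:
        return None
    # datetime is a subclass of date, so it has to be checked first.
    return value.date() if isinstance(value, datetime) else value


days = [published_day(a) for a in articles]
```

Storing the new kind of record takes no migration at all. The price is that the code reading the data has to cope with every shape it might meet.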

To be exact, these NoSQL products give your developers schema flexibility… In my opinion, the real game-changer is when power users can extend the data model. Of course this isn’t for everyone. But why can’t the librarian, a skilled user in marketing or sales, or your IT support staff enhance the database schema? And not just with a simplistic custom field, but any structure that makes sense? Having to wait for your developer (or worse, for a vendor) costs time and money, and kills many sensible ideas. Yes, developers may be needed to add polish or use the new data in integrations with other software. But power users should be able to model the data exactly as your business needs it.

This vision is why I’ve started to experiment with a user-friendly Topic Maps engine, TopicCards. It’s in a very early stage right now, but I’ll have something for you to play with sometime in 2015 🙂

P.S.: See what I mean in the Sourcefabric Superdesk description: “co-ordinated, managed and configured by journalists to suit their normal workflow — and for them to change that on the fly to cope with events needing a non-standard workflow.”

P.P.S.: Loosening your database schema has its disadvantages, of course. See Martin Fowler’s slide deck on Schemaless Data Structures. But I’m siding with one of his conclusions: “Custom fields and non-uniform types are both good reasons to use a schemaless approach.”