Dreaming of a shared content store

All the content-based software I know (WCMS, DAM and editorial systems) is built the same way: it stashes its data (content, metadata, workflow definitions, permissions) in a private, jealously guarded database. That is great for control, consistency, performance and simpler development. But when you’re running multiple systems – each of which is an isolated data silo – what are the drawbacks of this approach?

First, you’ve got to copy data back and forth between systems all the time. We’re doing that for our DAM customers, and it’s painful: copying newspaper articles from the editorial system into the DAM, then copying them from the DAM into the WCMS, and WCMS data back into the DAM. Developers say “the truth is in the database”, but there are lots of databases, and they are slightly out of sync most of the time.

You’re also stuck with the user interfaces offered by each vendor. There’s no way you can use the nice WordPress editor to edit articles that are stored inside your DAM. You’d first have to copy the data over, then back again. User interface, application logic and the content store are tightly coupled.

And your precious content suffers from data lock-in: Want to switch to another product? Good luck migrating your data from one silo into the other without losing any of it (or spending too much time and money)! Few vendors care about your freedom to leave.

I don’t believe in a “central content repository” in the sense of one application which all other systems just read from and write to (that’s how I understand CaaS = Content as a Service). No single piece of software is versatile enough to fulfill all other applications’ needs. If we really want to share content (unstructured and structured) between applications without having to copy it, we need a layer that isn’t owned by any application: a shared content store. Think of it like a file system: the file system is a layer that applications can build on top of, and (if they want to) use to share directories and files with other software.
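To make the file system analogy a bit more concrete, here is a minimal sketch of what such a layer could look like from an application’s point of view. Everything in it is an assumption for illustration: the names `ContentItem`, `SharedContentStore` and `annotate` are hypothetical, the store is just an in-memory dictionary, and per-application metadata namespaces are only one possible way to keep tools from overwriting each other.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """One piece of content plus metadata, owned by no single application."""
    item_id: str
    body: str
    # Metadata is namespaced per application (hypothetical design choice).
    metadata: dict[str, dict[str, str]] = field(default_factory=dict)


class SharedContentStore:
    """Hypothetical in-memory stand-in for a store shared by several applications."""

    def __init__(self) -> None:
        self._items: dict[str, ContentItem] = {}

    def put(self, item: ContentItem) -> None:
        self._items[item.item_id] = item

    def get(self, item_id: str) -> ContentItem:
        return self._items[item_id]

    def annotate(self, item_id: str, app: str, key: str, value: str) -> None:
        # Each application writes only into its own metadata namespace.
        self._items[item_id].metadata.setdefault(app, {})[key] = value


store = SharedContentStore()

# The editorial system writes the article once.
store.put(ContentItem("article-1", "Draft text"))
store.annotate("article-1", "editorial", "status", "draft")

# The DAM adds its own metadata to the very same item; nothing is copied.
store.annotate("article-1", "dam", "archive_box", "2016-11")

# A WCMS-style editor changes the body in place; every other tool sees the change.
store.get("article-1").body = "Final text"
```

The point of the sketch is only that there is one item and several applications working on it, so there is nothing to keep in sync. A real shared store would also need permissions, versioning and a schema that vendors can agree on, which is exactly the hard part.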

Of course, content (media files and text) and metadata are an order of magnitude more complex than hierarchical folders and named files. I’m not sure a generally useful “content layer” can be built in such a way that software developers and vendors start adopting it. Maybe this is just a dream. But at least in part, that’s what the Semantic Web folks are trying to do with Linked Data: Sharing machine-readable data without having to copy it.

P.S.: You don’t want to boil the ocean? For fellow developers, maybe I can frame it differently: Why should the UI that displays search results care where the displayed content items are stored? (Google’s search engine certainly doesn’t.) The assumption that all your data lives in the same local (MySQL / Oracle / NoSQL) database is the enemy of a true service-oriented architecture. Split your code and data structures into self-contained, standalone services that can co-exist in a common database but can be moved out at the flip of a switch. Then open up these data structures to third-party data, and try to get other software developers to make use of them. If you can replace one of your microservices with someone else’s better one (more mature, more broadly adopted), do so. (We got rid of our USERS table and built on LDAP instead.) How about that?
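For the search results example, a rough sketch of the idea might look like this. All names (`ContentSource`, `LocalTableSource`, `ExternalServiceSource`, `render_results`) are made up for illustration, the “local database” is a plain list, and the “external service” is a function standing in for a network call; it is not how any particular product does it.

```python
from typing import Callable, Protocol

Item = dict[str, str]


class ContentSource(Protocol):
    """Anything that can answer a search query with content items."""

    def search(self, query: str) -> list[Item]: ...


class LocalTableSource:
    """Items that still live in the application's own database (simulated by a list)."""

    def __init__(self, rows: list[Item]) -> None:
        self._rows = rows

    def search(self, query: str) -> list[Item]:
        return [row for row in self._rows if query.lower() in row["title"].lower()]


class ExternalServiceSource:
    """Items owned by another service; `fetch` stands in for a network call."""

    def __init__(self, fetch: Callable[[str], list[Item]]) -> None:
        self._fetch = fetch

    def search(self, query: str) -> list[Item]:
        return self._fetch(query)


def render_results(sources: list[ContentSource], query: str) -> list[str]:
    """The 'UI': formats hits without knowing where any of them is stored."""
    lines = []
    for source in sources:
        for item in source.search(query):
            lines.append(f"{item['title']} ({item['origin']})")
    return lines


local = LocalTableSource([{"title": "Election results", "origin": "dam"}])
external = ExternalServiceSource(
    lambda q: [{"title": f"Web page about {q}", "origin": "wcms"}]
)

results = render_results([local, external], "election")
```

Because `render_results` depends only on the `search` method, moving a data structure out of the common database means swapping one source object for another; the UI code does not change.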

Update (2017-01-31): The “headless CMS” fits my “shared content store” vision pretty well, and it is finally entering the hype cycle – see Greg Luciano’s CMSWire piece “What’s Next for Headless CMS in 2017?”

Update (2025-11-07): The Solid Project’s “Pods” are designed to achieve something similar for individual people’s data.