System architecture: Splitting a DAM into Self-Contained Systems

While we’re gathering ideas for the next generation of our DAM product’s user interface, we’re taking the opportunity to reflect on the system architecture of our DAM software. It’s currently a monolithic architecture: all of the user interface and back-end features are implemented within a single software system and based on one large database. This traditional approach to software development has been criticized for a while, starting with SOA (service-oriented architecture), and more recently with microservices and Self-Contained Systems proposed as alternatives. That’s because with multiple smaller systems, each one should be simpler, have fewer bugs and be easier to extend. (Of course, there’s added complexity in making these systems interoperable, and in maintaining multiple systems.) Stefan Tilkov’s Breaking the Monolith analyzes the problem well.

Coming from a different angle, Ralph Windsor and Naresh Sarwan – back in 2013 – wrote in depth about Digital Asset Management Value Chains on DAM News. Their idea was to move DAM systems to a component-based architecture where “operators will review, select and assemble custom applications using a variety of component choices available”. I also blogged my thoughts on DAM value chains. Except for the Nuxeo platform, I haven’t heard of any DAM vendors implementing such an architecture, but of course this is a pretty far-reaching and bold move; maybe some vendors have added it to their long-term plans.

Now what would a DAM system look like if you split it up into several smaller, self-contained systems?

I started with the DAM Foundation’s Ten Core Characteristics of a DAM, pretty much the canonical definition of DAM system functionality. The characteristics are: ingest, secure, store, render / transform, enrich, relate, process, find, preview, produce / publish. Trying to map these to more fine-grained systems, I arrived at this list of 13 distinct services:

asset database (asset identifiers and metadata including metadata editing capabilities, and pointers to files)
file storage (locally or in the cloud; this is where image files etc. reside, and what the asset database points to)
image processing engine (creating preview images, and cropping images for export)
video conversion engine (taking stills from videos, and transcoding into Web-friendly formats)
file metadata extraction and embedding engine (for IPTC, EXIF, XMP metadata and office document text contents)
ingestion engine (combining the above engines to process files and move them into the asset database and file storage)
search engine (fulltext and metadata search across assets)
collections database (where users can organize assets in collections)
rights management database (usage rights and usage tracking for assets)
controlled vocabulary database (thesaurus, country and keyword lists etc., with editing capabilities)
workflow database (tracking which assets are in which stages of a workflow, in which status and assigned to which user – e.g. approval, rights clearance)
user roles and permissions database (defining which users have which permissions on assets, using ACLs)
asset publishing engine (adapters for sharing and publishing assets on Facebook etc.)
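
To make the "combining the above engines" part of the ingestion engine a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the AssetRecord fields, the ingest function and the lambda stand-ins are hypothetical, and in a real setup each callable would be an API call to a separate system (probably an asynchronous one) rather than a local function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class AssetRecord:
    """Hypothetical row in the asset database: identifier, metadata, pointers."""
    asset_id: str
    file_pointer: str
    metadata: dict = field(default_factory=dict)
    preview_pointer: str = ""


def ingest(
    path: str,
    extract_metadata: Callable[[str], dict],
    make_preview: Callable[[str], str],
    store_file: Callable[[str], str],
    asset_db: Dict[str, AssetRecord],
) -> AssetRecord:
    """Combine the individual engines to turn one incoming file into an asset."""
    # File storage keeps the original; the asset database only holds a pointer.
    file_pointer = store_file(path)
    record = AssetRecord(
        asset_id=f"asset-{len(asset_db) + 1}",
        file_pointer=file_pointer,
        metadata=extract_metadata(path),
        preview_pointer=make_preview(path),
    )
    asset_db[record.asset_id] = record
    return record


# Stand-ins for the metadata, image processing and file storage systems.
asset_db: Dict[str, AssetRecord] = {}
record = ingest(
    "incoming/harbour.jpg",
    extract_metadata=lambda p: {"filename": p.rsplit("/", 1)[-1]},
    make_preview=lambda p: f"previews/{p.rsplit('/', 1)[-1]}",
    store_file=lambda p: f"storage://originals/{p.rsplit('/', 1)[-1]}",
    asset_db=asset_db,
)
print(record)
```

The point of passing the engines in as parameters is that the ingestion engine only knows what each engine does, not how or where it runs.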

According to the definition of Self-Contained Systems, each of those is a standalone software stack, with its own database, business logic, Web user interface and API. Interactions between systems are supposed to be minimized, with API calls done asynchronously, and the various Web UIs only connected via links. To be honest, never having developed software this way, I have a hard time imagining what the end result would look like. But it seems worth exploring.
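
Since I find this hard to picture, here is a rough sketch of what "asynchronous, minimized interaction" might look like between two of the systems. It is only an illustration under my own assumptions: the AssetIngested event, the SearchSystem class and the example URLs are made up, and an in-memory asyncio queue stands in for whatever message broker or feed real systems would use.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AssetIngested:
    """Hypothetical event published by the ingestion engine."""
    asset_id: str
    title: str
    ui_url: str  # link into the asset database's own Web UI


@dataclass
class SearchSystem:
    """Stand-in for the search engine; it keeps its own private index."""
    index: dict = field(default_factory=dict)

    async def consume(self, events: asyncio.Queue) -> None:
        # Process events until the None sentinel arrives.
        while True:
            event = await events.get()
            if event is None:
                break
            self.index[event.asset_id] = event

    def find(self, term: str) -> list:
        # Results are plain links: the UIs are connected only via URLs.
        term = term.lower()
        return [e.ui_url for e in self.index.values() if term in e.title.lower()]


async def demo() -> list:
    events: asyncio.Queue = asyncio.Queue()
    search = SearchSystem()
    consumer = asyncio.create_task(search.consume(events))

    # The ingestion engine publishes and moves on;
    # it never calls the search engine directly.
    await events.put(AssetIngested("a1", "Harbour at sunset", "https://dam.example/assets/a1"))
    await events.put(AssetIngested("a2", "Board meeting", "https://dam.example/assets/a2"))
    await events.put(None)
    await consumer
    return search.find("sunset")


results = asyncio.run(demo())
print(results)
```

In this sketch the search system copies the few fields it needs into its own index instead of querying the asset database at search time, which is one way to keep the two systems independent of each other.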

One beneficial side effect of this rigid “separation of concerns” is that it makes it much easier to swap a component out for a cloud service. How about using a cloud video conversion engine or rights management database?
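
As a small sketch of why the swap becomes easy: if the rest of the DAM only depends on a narrow interface, a local engine and a cloud service are interchangeable. The VideoConverter interface, both classes and the endpoint URL below are hypothetical, and the cloud class only fakes the result instead of calling a real provider’s API.

```python
from typing import Protocol


class VideoConverter(Protocol):
    """Hypothetical contract that every video conversion engine fulfils."""

    def transcode(self, source_uri: str, target_format: str) -> str:
        """Return the URI of the converted file."""
        ...


class LocalVideoConverter:
    """Stand-in for an engine running on our own servers."""

    def transcode(self, source_uri: str, target_format: str) -> str:
        base = source_uri.rsplit(".", 1)[0]
        return f"{base}.{target_format}"


class CloudVideoConverter:
    """Stand-in for a cloud service; a real one would submit a job over HTTP."""

    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint

    def transcode(self, source_uri: str, target_format: str) -> str:
        name = source_uri.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        return f"{self.endpoint}/outputs/{name}.{target_format}"


def ingest_video(converter: VideoConverter, source_uri: str) -> str:
    # The caller neither knows nor cares which implementation it was given.
    return converter.transcode(source_uri, "mp4")


local_result = ingest_video(LocalVideoConverter(), "file:///media/clip.mov")
cloud_result = ingest_video(CloudVideoConverter("https://video.example"), "file:///media/clip.mov")
print(local_result)
print(cloud_result)
```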

Another major win is that your new mini-systems are available for reuse. For example, many of the subsystems of a DAM system are equally useful for a Web Content Management system. It’s probably hard to find a CMS vendor who supports this system architecture, but it would make a whole lot of sense. Why have separate search engines for DAM and CMS? Why can’t collections and workflows span both systems, unifying DAM assets and Web articles? Why does every system have its own user roles and permissions management?

The Self-Contained Systems definition doesn’t say much about how to best connect the systems. I suspect that Semantic Web technology might be especially well-suited for such a zoo of disparate systems. And UI integration is another important and interesting topic. I will keep you posted about what I’m learning!