2014-12-22

Hunting for well-known Semantic Web vocabularies and terms

As a Semantic Web / Linked Data newbie, I’m struggling with finding the right URIs for properties and values.

Say I have a screenshot as a PNG image file.

If I were to describe it in the Atom feed format, I’d make an “entry” for it, write the file size into the “link/@length” attribute, the “image/png” MIME type into the “link/@type” attribute, and a short textual description into “content” (with “@xml:lang” set to “en”). Very easy for me to produce, and the semantics would be clear to everyone reading the Atom standard.
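Concretely, such an Atom entry is easy to generate. Here’s a minimal sketch in Python (standard library only); the URL, file size, and text are made up for illustration:

    import xml.etree.ElementTree as ET

    ATOM = "http://www.w3.org/2005/Atom"
    XML_NS = "http://www.w3.org/XML/1998/namespace"
    ET.register_namespace("", ATOM)

    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = "Login page screenshot"
    ET.SubElement(entry, f"{{{ATOM}}}link", {
        "href": "http://example.com/screenshot.png",  # hypothetical URL
        "type": "image/png",   # MIME type
        "length": "123456",    # file size in bytes
    })
    content = ET.SubElement(entry, f"{{{ATOM}}}content")
    content.set(f"{{{XML_NS}}}lang", "en")
    content.text = "A screenshot of our login page."

    print(ET.tostring(entry, encoding="unicode"))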

Now I want to take part in the “SemWeb” and describe my screenshot in RDFa instead. (In order to allow highly extensible data exchange between different vendors’ Digital Asset Management systems, for example.) But suddenly life is hard: For each property (“file size”, “MIME type”, “description”) and some values (“type: file”, “MIME type: image/png”, “language: English”) I’ve got to provide a URL (or URI).
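For illustration, here is what such a description could look like as RDF triples, using the rdflib library and Dublin Core terms (my own vocabulary picks, not a recommendation from any standard). Note that this sketch dodges part of the problem by using plain literals, e.g. for “image/png”, where a URI might be more idiomatic; all values are invented:

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DCTERMS, XSD

    screenshot = URIRef("http://example.com/screenshot.png")  # hypothetical

    g = Graph()
    # MIME type as a plain literal; a URI value might be preferred:
    g.add((screenshot, DCTERMS.format, Literal("image/png")))
    # File size in bytes:
    g.add((screenshot, DCTERMS.extent, Literal(123456, datatype=XSD.integer)))
    # Short description with a language tag:
    g.add((screenshot, DCTERMS.description,
           Literal("A screenshot of our login page.", lang="en")))

    print(g.serialize(format="turtle"))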

Read the full article…

Mon, 22 Dec 2014 10:38:13 +0000
2014-12-10

Dreaming of a shared content store

All the content-based software I know (WCMS, DAM and editorial systems) is built the same way: It stashes its data (content, metadata, workflow definitions, permissions) in a private, jealously guarded database. That’s great for control, consistency, performance, and simpler development. But when you’re running multiple systems – each of which is an isolated data silo – what are the drawbacks of this approach?

First, you’ve got to copy data back and forth between systems all the time. We’re doing that for our DAM customers, and it’s painful: Copying newspaper articles from the editorial system into the DAM. Then copying them from the DAM into the WCMS, and WCMS data back into the DAM. Developers say “the truth is in the database”, but there are lots of databases that are slightly out of sync most of the time.

You’re also stuck with the user interfaces offered by each vendor. There’s no way you can use the nice WordPress editor to edit articles that are stored inside your DAM. You’d first have to copy the data over, then back again. User interface, application logic and the content store are tightly coupled.

And your precious content suffers from data lock-in: Want to switch to another product? Good luck migrating your data from one silo into the other without losing any of it (and spending too much time and money)! Few vendors care about your freedom to leave.

I don’t believe in a “central content repository” in the sense of one application which all other systems just read from and write to (that’s how I understand CaaS = Content as a Service). No single software is versatile enough to fulfill all other applications’ needs. If we really want to share content (unstructured and structured) between applications without having to copy it, we need a layer that isn’t owned by any application: a shared content store. Think of it like a file system: The file system is a layer that applications build on top of, and (if they want to) use to share directories and files with other software.

Of course, content (media files and text) and metadata are an order of magnitude more complex than hierarchical folders and named files. I’m not sure a generally useful “content layer” can be built in such a way that software developers and vendors start adopting it. Maybe this is just a dream. But at least in part, that’s what the Semantic Web folks are trying to do with Linked Data: Sharing machine-readable data without having to copy it.

P.S.: You don’t want to boil the ocean? For fellow developers, maybe I can frame it differently: Why should the UI that displays search results care where the displayed content items are stored? (Google’s search engine certainly doesn’t.) The assumption that all your data lives in the same local (MySQL / Oracle / NoSQL) database is the enemy of a true service-oriented architecture. Split your code and data structures into self-contained, standalone services that can co-exist in a common database but can be moved out at the flip of a switch. Then open up these data structures to third party data, and try to get other software developers to make use of them. If you can replace one of your microservices with someone else’s better one (more mature, broadly adopted), do so. (We got rid of our USERS table and built on LDAP instead.) How about that?
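In code, that split can be as small as a narrow interface between the UI and the content it displays. A sketch of the idea (all names here are hypothetical):

    from dataclasses import dataclass
    from typing import Iterable, Protocol

    @dataclass
    class ContentItem:
        title: str
        url: str

    class ContentSource(Protocol):
        def search(self, query: str) -> Iterable[ContentItem]: ...

    def render_results(source: ContentSource, query: str) -> None:
        # The UI neither knows nor cares whether 'source' is backed by
        # a local SQL table, a DAM's API, or a third-party service.
        for item in source.search(query):
            print(f"{item.title} -> {item.url}")

Any concrete implementation, whether backed by MySQL, a DAM’s REST API, or someone else’s microservice, can then be swapped in without the UI noticing.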

Related posts: Web of information vs DAM, DM, CM, KM silos · Cloud software, local files: A hybrid DAM approach · Linked Data for better image search on the Web.

Update (2017-01-31): The “headless CMS” fits my “shared content store” vision pretty well, and it finally starts entering the hype cycle – see Greg Luciano’s CMSWire piece What’s Next for Headless CMS in 2017?.

Wed, 10 Dec 2014 11:40:16 +0000
2014-12-04

Deborah Fanslow: Information Professionals: A Field Guide

Deborah Fanslow – Who Needs a DAM Librarian? Part II: Information Professionals: A Field Guide

“Information professional specimens often manifest the following dispositions: perpetual curiosity, creativity, technical fluency, a compulsive need to create order out of chaos, and an intense passion for connecting people with information.

[…] Originating around the turn of the 19th century (and known initially as the field of “documentation”), information science research was initially focused on scientific, technical, and medical information due to its base of practitioners within science and industry who were looking for ways to manage large amounts of data and resources.”

Wonderful in-depth article. Great to see the “documentation” roots included; my German university degree is “Diplom-Dokumentar (FH)” – and no-one understands what that means. Now I can point people to Deb’s explanation!

Thu, 04 Dec 2014 08:21:48 +0000
2014-12-02

Schema flexibility for power users

In software, the thing I’m most excited about at the moment is schema flexibility. (I first saw that term in a tweet by Emily Ann Kolvitz.) I think we’re losing a lot of valuable metadata, and business value, because the software we keep our structured data in makes it so hard to change the data model.

Example #1: Your system stores each customer’s e-mail address. Now you want to extend this to allow multiple addresses per customer, each with a label (“work e-mail”, “personal e-mail”, etc.).

Example #2: Your archival system knows the publication date for each of your newspaper articles. Now you want to archive Web articles as well, but their publication date includes the time of day whereas print articles only have the day.

Example #3: Users can already add simple custom fields (say, “Photographer name”), but sometimes they really need to add custom structures and relations (i.e. a separate “Photographer” record with its own fields, and links to these records).
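On paper, Example #1 is a tiny change. Here are the old and new record shapes as plain Python data, with the migration between them (field names invented for illustration):

    old_customer = {"name": "Ada Example", "email": "ada@example.com"}

    def upgrade(customer: dict) -> dict:
        """Turn the single 'email' field into a list of labelled addresses."""
        upgraded = dict(customer)
        email = upgraded.pop("email", None)
        upgraded["emails"] = (
            [{"label": "work e-mail", "address": email}] if email else []
        )
        return upgraded

    new_customer = upgrade(old_customer)
    # {'name': 'Ada Example',
    #  'emails': [{'label': 'work e-mail', 'address': 'ada@example.com'}]}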

Sounds simple? Well, you’ll need a developer and database administrator for all of the above. And it might be a lot of work for them.

Most structured data still lives in relational (SQL) databases. They’re wonderful, but they make it especially hard to change your data model. Demian Hess illustrates this in the first part of his excellent DAM and the Need for Flexible Metadata Models series: “As new asset types are discovered, you need to restructure the database by adding new tables or new columns. Database restructuring requires expensive and disruptive changes in queries and application-layer logic. […] The fundamental flaw is that we are attempting to define all the attributes for every type of digital asset in our data model in advance. In other words, we are imposing an inflexible data model.”

This rigidity is one reason for the current wave of NoSQL databases. There are document databases like MongoDB, which are way more flexible but “tend to suffer in supporting relationships between documents” (Demian Hess – DAM and Flexible Data Models Using Document Databases). Graph databases and RDF triple stores like BrightstarDB also fall into the NoSQL category. I don’t like their data model, but they do give you schema flexibility.
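To make that flexibility concrete: in a document database, old-style and new-style records can simply coexist. A sketch using pymongo against a local MongoDB (connection details and field names assumed):

    from pymongo import MongoClient

    customers = MongoClient("mongodb://localhost:27017").demo.customers

    # Both record shapes live in the same collection -- no ALTER TABLE:
    customers.insert_one({"name": "Ada", "email": "ada@example.com"})
    customers.insert_one({"name": "Bob", "emails": [
        {"label": "work e-mail", "address": "bob@corp.example"},
        {"label": "personal e-mail", "address": "bob@example.com"},
    ]})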

To be exact, these NoSQL products give your developers schema flexibility… In my opinion, the real game-changer is when power users can extend the data model. Of course this isn’t for everyone. But why can’t the librarian, a skilled user in marketing or sales, or your IT support staff enhance the database schema? And not just with a simplistic custom field, but with any structure that makes sense? Having to wait for your developer (or worse, for a vendor) costs time and money, and kills many sensible ideas. Yes, developers may be needed to add polish or use the new data in integrations with other software. But power users should be able to model the data exactly as your business needs it.
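What might power-user schema editing look like? One conceivable shape, sketched here for Example #3 above, is a schema that is itself just editable data (an entirely hypothetical design, not how any particular product works):

    # The data model as data: record types, their fields, and relations.
    schema = {
        "asset": {
            "fields": {"title": "string", "publication_date": "datetime"},
            "links": {},
        },
    }

    # A power user adds a "Photographer" record type and a relation to
    # it -- no code change, no database migration:
    schema["photographer"] = {
        "fields": {"name": "string", "agency": "string"},
        "links": {},
    }
    schema["asset"]["links"]["photographer"] = "photographer"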

This vision is why I’ve started to experiment with a user-friendly Topic Maps engine, TopicCards. It’s in a very early stage right now, but I’ll have something for you to play with sometime in 2015 :-)

P.S.: See what I mean in the Sourcefabric Superdesk description: “co-ordinated, managed and configured by journalists to suit their normal workflow — and for them to change that on the fly to cope with events needing a non-standard workflow.”

P.P.S.: Loosening your database schema has its disadvantages, of course. See Martin Fowler’s slide deck on Schemaless Data Structures. But I’m siding with one of his conclusions: “Custom fields and non-uniform types are both good reasons to use a schemaless approach.” 

Tue, 02 Dec 2014 20:17:23 +0000