Web of information vs DAM, DM, CM, KM silos

I have spent years of my life making our software work with other software, and I think we have a problem: The “enterprise” is managing overlapping information in disparate systems that don’t interoperate well. There are lots of system flavors: DAM (interesting stuff like photos, videos, articles). DM (boring stuff like forms, business letters, emails). CM for publishing on the Web. KM holds experts’ contact info and instructions. CRM, employee directories, project management tools, file sharing, document collaboration… Each one has a different focus, but the data overlaps.

Now one system’s asset metadata can be another system’s core asset… Take the Contact Info fields from the IPTC Photo Metadata standard, for instance: When a photographer’s phone number changes, will you update it in your DAM system? How many places will you have to update it in the DAM – is it stored in a single place, or has it been copied into each photo? You’ll probably just update your address book and ignore the DAM. A DAM system simply isn’t a good tool for managing contact information. But it still makes sense for it to display it…
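To make the “single place, or copied into each photo” question concrete, here is a minimal Python sketch. All class names, the contact ID and the phone numbers are made up for illustration; this is not how any particular DAM system or the IPTC standard stores contact info.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    name: str
    phone: str


@dataclass
class PhotoWithCopy:
    # Contact details were copied into the photo's metadata at import time.
    filename: str
    creator_name: str
    creator_phone: str


@dataclass
class PhotoWithLink:
    # Only a reference to the contact is stored; it is resolved on display.
    filename: str
    creator_id: str


# Hypothetical address book, keyed by a made-up contact ID.
address_book = {"c1": Contact(name="Jane Doe", phone="555-0100")}

copied = [PhotoWithCopy(f"photo{i}.jpg", "Jane Doe", "555-0100") for i in range(3)]
linked = [PhotoWithLink(f"photo{i}.jpg", "c1") for i in range(3)]

# The photographer's phone number changes: one update in the address book.
address_book["c1"].phone = "555-0199"


def stale_copies(photos, book, contact_id):
    """Count photos whose copied phone number no longer matches the address book."""
    current = book[contact_id].phone
    return sum(1 for photo in photos if photo.creator_phone != current)


def display_phone(photo, book):
    """Look up the phone number at display time instead of storing a copy."""
    return book[photo.creator_id].phone
```

With copies, every photo is out of date after the change; with links, the DAM only displays what the address book manages.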

For a more complex example, here’s a typical scenario from our customers: A freelance journalist submits a newspaper article with a photo. It’ll be published in print and online, copied into the newspaper archive, and the journalist is going to get paid. Now when an editor sees that nice photo in her Web CMS and wants to reuse it, can she click on it to see 1) the name of the editor who placed it in the print edition, 2) the photo usage rights, 3) the amount paid to the journalist for the current use, and 4) the journalist’s phone number? No, she can’t. The data for 1) is stored in the print editorial system, 2) in the DAM (rights) and the DM system (contracts), 3) in the SAP accounting system, and 4) in the employee directory.

Of course, all of this can be made to work since each system has some sort of API, but only with one-off interoperability hacks, for which you need a programmer who’s familiar with the systems involved! Incompatible information silos are hurting the business and wasting a lot of developer time. This is a known problem, and the subject of two more acronyms: II = Information Integration, and MDM = Master Data Management. As a software developer, I see two possible solutions:

Going back to a monolithic system that does everything at once is not one of them. Neither its user interface nor its backend implementation would be well-suited to the host of different tasks that users need software for.

But we could find a clever, generic way to link information from various systems together so that we can “surf” it in any direction. Linked data in the form of HTML+RDFa is a great way to do this; see my post Publish your data, don’t build APIs. (And Lars Marius Garshol on Semantic integration in practice.)

Or a much more complicated (but fascinating) solution: Product developers stop rolling their own databases and assume they’re going to operate on a shared datastore that is created and managed by someone else. Their software accesses it through a configurable data access layer. Imagine running WordPress and Drupal simultaneously on top of the same MySQL database, working on the same content! A shared datastore would allow for centralized business rules and permissions. But for practical reasons (performance!), this is likely not going to happen. (A baby step in the right direction: Use LDAP instead of creating your own users and groups database tables. We’ve done this and it works great.) 

In real life, information doesn’t stand alone – it lives inside a web of interlinked data. Until our systems can handle this reality, we’ve got to break it down, remodel and copy it for each siloed system. Let’s try to improve on that!

Update: See also Ralph Windsor – Digital Asset Management And The Politics Of Metadata Integration. Related blog post by me: Dreaming of a shared content store.

Tue, 25 Feb 2014 19:17:42 +0000

“DC” has moved out of the old chocolate factory

Today, our company Digital Collections has moved from the office in Hamburg, Wendenstr. 130 into another part of Hamburg, Hindenburgstr. 49.

Here are some pictures I took in the old office over the last few years. The new office is more modern and practical, but I’ll miss the lovely historical building, a former chocolate factory built in 1908. (That Web site – shouldn’t every building have one? – has much nicer photos of the location.)

Update: First pictures of the new offices.

Sat, 22 Feb 2014 21:29:45 +0000

A simple JavaScript component architecture (first draft)

Reusable user interface components for Web development are trending. I’ve been thinking about them for quite some time now (see JavaScript components from almost a year ago). Now that we’re building a new UI, we need to finally decide on something and start using it!

While we’ve done single page Web apps in the past, the UI we’re building now will start out as a more traditional Web site with distinct pages, plus some dynamic elements within the page. We want to keep things as simple as possible for version one, so we’ll render most of the HTML on the server (no need to generate HTML in JavaScript). Elements that are refreshed via Ajax will have to redraw themselves, but their HTML can be sent from the server for now. Still, the component architecture we’re choosing now must work for single page apps as well, so that we can switch if needed.

We’re doing a lot of customization in our projects. We want a set of configurable UI widgets that can be freely combined when building custom pages, and partners need to be able to add their own widgets. The UI will be based on the Bootstrap framework, and we want to be able to integrate widgets from libraries like jQuery UI.

The MVC (model / view / controller) approach seems to make sense, maybe as implemented in the separable model architecture from Java Swing components: A component can manage its own data, or be configured to share data with other components. Our UI components should be “loosely coupled”, exclusively communicating through events in order to avoid breakage if a component is missing or not initialized (and to make replacing components easier). The Twitter Flight framework has been a wonderful inspiration; make sure to read about it! We’ve extended their event approach a little bit: Events can collect and return responses using event.result in jQuery custom events (with promises/Deferred for asynchronous results).
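The idea of components that only talk through events, and of events that collect responses, can be sketched in a few lines. This is a language-neutral illustration written in Python, not our actual JavaScript code and not jQuery’s API; the `EventBus` and `AddressField` classes and the event names are made up, and asynchronous results are left out.

```python
from collections import defaultdict


class EventBus:
    """Tiny publish/subscribe hub; components only know event names."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_name, handler):
        """Register a handler for an event name."""
        self._handlers[event_name].append(handler)

    def trigger(self, event_name, data=None):
        """Call every handler and collect the non-None return values."""
        results = []
        for handler in self._handlers.get(event_name, []):
            result = handler(data or {})
            if result is not None:
                results.append(result)
        return results


class AddressField:
    """Hypothetical component that listens for a map event."""

    def __init__(self, bus):
        self.value = ""
        bus.on("map:center-changed", self.update)

    def update(self, data):
        self.value = data["address"]
        return "address-field updated"


bus = EventBus()
field = AddressField(bus)

# A map component would trigger this without knowing who listens.
responses = bus.trigger("map:center-changed", {"address": "Hamburg"})

# Triggering an event nobody handles is harmless: no listeners, no breakage.
unheard = bus.trigger("nobody:listens")
```

The sender never references the receiver directly, so a missing or replaced component only changes which responses come back.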

There are dozens of JavaScript frameworks out there, but we’re not too eager to rely on one of those. We improve and support our software products for years; it’s bad for us if a framework dies or changes its direction. That’s why we’ll probably go for frameworkless JavaScript – or rather, we’ll build our own small framework. (Relying on jQuery as the only hard dependency is hopefully okay.)

Small is important to us. The simpler, the better – we need to find a clever, powerful, extensible, future-proof architecture with minimal lines of code. (Good luck with that, I know.)

This demo page (JavaScript code on Github) contains the first draft of that mini-framework: Two embedded Google Maps that, when you drag them, send the address in the map center to a text input field. It won’t make sense to you if you aren’t a Web developer interested in JavaScript :-) But if you are, I’d love to hear your feedback!

Wed, 19 Feb 2014 13:54:11 +0000

Short links (2014-02-19)

Wed, 19 Feb 2014 22:35:35 +0000

DC-X in action: Public video archive for the German Federal Archives (Bundesarchiv), Transit Film

Last Friday, a project I’ve been involved with was officially launched: filmothek.bundesarchiv.de. It’s a Web site showing contemporary history videos from the German Federal Archives (Bundesarchiv), distributed by their partner Transit Film, implemented by our company Digital Collections (based on our Digital Asset Management system DC-X), with Web design by our partner Pier2Port.

Content is king – the most interesting thing about this is the amazing videos from post-WW2 Germany and beyond (most of them in German): The surrender of Nazi Germany (1945), John F. Kennedy in Germany (1963), the fall of the Berlin Wall (1989). I love that it’s original, unadulterated material. The videos can be viewed freely, no registration required. (Film producers can buy licenses, of course.)

At the moment, there are about 2,300 videos in the archives. In the back end, there’s a standard DC-X installation that holds the video files (in MP4 and WebM format) and the video metadata. Most of the customization is in the importers and metadata schema. Our customers can edit metadata in the back end, which is then replicated to a front end server.

The front end, the actual Web site you’re seeing, is a fully customized HTML UI built on DC-X APIs and its Solr search integration. I had a lot of fun developing it… We intentionally kept the architecture simple, doing most of the work on the server side in PHP, with just a little bit of JavaScript where needed. This results in a search engine friendly site that also performs nicely on tablets (it’s not yet designed for smaller screens, though).

There are lots more features to come, and even more videos will be made available – so make sure to come back to the site in a few months!

Tue, 18 Feb 2014 08:22:55 +0000

Questions to ask before building a DAM importer

Our DC-X DAM systems are often the central content hub at large publishers, with lots of data flowing in from photographers, news agencies, editorial systems, Web CMSes. These provide data (articles, photos, graphics, ads, pages) in a host of different formats, which means we’re building “importers” all the time to ingest content into the DAM system.

As a developer, I’m often told to estimate how long building an importer will take. I can be sure that there’s some information missing, so here’s my checklist of things I need to know before I can give a rough estimate of the development time:

  • Is the data copied into a local “hotfolder” (DC-X default), or does the importer have to fetch it (via FTP, an RSS feed etc.)?
  • Which file format does the data come in (XML, HTML, CSV, JPEG, PDF, …)? Can it be in different formats?
  • Can you provide the data in a format that the DAM system already supports? (Then we’re done.)
  • How large are the files (typically, and maximum)? How many files are expected per hour/day/week?
  • Is there a naming convention for directories and files? What should the importer do if files don’t follow that convention?
  • Should metadata be read from the file and directory name? If so, which metadata exactly?
  • If some data arrives as a set of multiple files (e.g., a PDF file with an accompanying XML file): When starting with one of the files, how can the importer find the other files in the set (naming convention, file name given in the XML etc.)? Will they arrive roughly at the same time? If the set is incomplete, how long should the importer wait for the missing files to arrive? Should it import anyway when files are missing, or report an error?
  • How about duplicate files coming in? Can they simply be rejected by the importer (DC-X default), or is there a need to update or replace data from previous imports? How can the importer detect duplicates? (DC-X default: A checksum on the file’s contents.)
  • Should preview images be rendered? (DC-X will do this by default.) Or are preview images provided? Any special requirements when rendering preview images (like adding a watermark)?
  • When rendering preview images from graphical file formats, is a colorspace or ICC profile conversion needed? (By default, DC-X will detect CMYK and create RGB previews.)
  • Should text be extracted from textual files (PDF, EPS, Word)? (DC-X will do this by default, details depending on file format specifics.)
  • Are there special requirements for reading file metadata (EXIF, IPTC, XMP etc.)? (DC-X reads and imports common metadata by default.)
  • Have you provided representative samples of the input files?
  • What exactly do your XML / CSV files contain? Have you provided a textual description? (It’s great if you’re using a standardized format, but please describe how exactly you’re using that format – most standards leave room for interpretation or extensions.) What metadata fields should the XML tags be mapped into on import?
  • Are the files linked in some way? How can the importer find out what links where, and must the files be imported in a certain order to be able to establish these links?
  • Does the new data fit in with the existing metadata schema, or will we have to define new fields? Any special expectations regarding searching the new data?
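As an illustration of the duplicate question above, here is a minimal Python sketch of rejecting files whose contents have already been imported. It is not DC-X code: the use of SHA-256, the in-memory set and all names are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Return a SHA-256 hex digest of the file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class DuplicateFilter:
    """Remembers checksums of imported files (in memory only, for the sketch)."""

    def __init__(self):
        self._seen = set()

    def accept(self, path):
        """Return True for new content, False if the same bytes were seen before."""
        checksum = file_checksum(path)
        if checksum in self._seen:
            return False
        self._seen.add(checksum)
        return True


# Usage with three hypothetical files; two of them have identical contents.
with tempfile.TemporaryDirectory() as tmp:
    hotfolder = Path(tmp)
    (hotfolder / "a.jpg").write_bytes(b"same bytes")
    (hotfolder / "copy-of-a.jpg").write_bytes(b"same bytes")
    (hotfolder / "b.jpg").write_bytes(b"other bytes")

    dup_filter = DuplicateFilter()
    results = [
        dup_filter.accept(hotfolder / name)
        for name in ("a.jpg", "copy-of-a.jpg", "b.jpg")
    ]
```

A content checksum catches renamed copies, but it treats a file with even one changed byte as new, which is why the “update or replace” question still has to be asked.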

I’m sure this list is incomplete – please let me know what I’m missing!

Tue, 18 Feb 2014 14:05:00 +0000

Simpler DAM UI: Main navigation (3)

Here’s an update to our thoughts on the main navigation for our new, simpler DAM UI:

The filter column on the left has been removed in favor of Google-style dropdown lists between search box and results. This saves space, and I hope it will encourage filter usage because the filters now appear where the user is actually looking.

The search section indicator (“Bilder” in the screenshot) to the left of the search box has a brighter background; in the old draft you couldn’t really see that it belonged to the search input field.

A nice detail is that the search box now expands once focused. To make space for the larger input field, the links to the right of it switch from icon + text to icon-only while the search box is expanded.

(Please excuse the German screenshot. I took it a few days ago and cannot produce an English one because my development environment is messed up and ugly right now – we’ve switched from frontend to backend experiments for a while.)

If you’re interested in this stuff, you should read the brand new FogBugz Visits the Head(er) Shrinker post by Adam Wishneusky. Looks a bit similar, and also has a search box that grows when you type in it!

Update: The Nielsen Norman Group says we shouldn’t hide the available search sections in a mega dropdown… Jennifer Cardello and Kathryn Whitenton – Killing Off the Global Navigation: One Trend to Avoid: “Even if the global navigation is difficult to design and hard to maintain, most sites will still be better off showing top-level categories to users right away. It's simply one of the most effective ways of helping users quickly understand what the site is about.”

Fri, 07 Feb 2014 14:56:59 +0000

Raph Koster: Self-promotion for game developers

Raph Koster – Self-promotion for game developers:

“If you do not take your field seriously enough to study it, and try to know everything about it, and try to add new knowledge and understanding to the field, then you probably shouldn’t be self-promoting.

[…] You will earn respect for being honest enough to admit mistakes. It will not harm your standing at all. […] You will learn more about those mistakes from writing about them, and that will make your own work better.

[…] Odds are very good that well over half your career will be “dark matter” – stuff that will not be seen by the public. So those parts that are seen matter more than you think.

[…] Say “we” not “I.” Because it’s almost always the truth.

[…] Have your own website, and have a portfolio of some sort on it. Ideally, the website’s domain is your name. […] Slideshare and its widgets will be the detritus of history in fifteen years. Post/host copies of everything you can on your own site.

[…] Get comfortable with public speaking. Develop a sense of humor if you haven’t got one. Be very good at demoing. […] Your marketing dept will start asking for you because devs with these skills are rare and valuable.”

(Via Patrick Durusau.)

Fri, 07 Feb 2014 21:35:15 +0000

Short links (2014-02-05)

Wed, 05 Feb 2014 21:15:34 +0000

James Rourke: DAM for Beginners: User Interface & experience

James Rourke – DAM for Beginners: User Interface & experience:

“A note to vendors: don’t underestimate the value of how your system looks; you want to wow your client in a demo. A well-functioning system that looks dated or too technical might miss out to a less well-functioning system that looks nicer and easier to use.

[…] This technical UI can be used at the ‘back-end’ of a DAM system, where administrative functions and other complex actions are carried out, whilst the ‘front-end’ remains a user-friendly portal allowing for more basic actions. In this case only a limited number of well-trained, technically-aware users would operate on the ‘database’ UI.”

Exactly what we’re building right now: A friendlier, simpler UI for the casual user, complementing our complex, fully-featured UI.

Tue, 04 Feb 2014 10:51:01 +0000