Tim's Weblog
Tim Strehle’s links and thoughts on Web apps, software development and Digital Asset Management, since 2002.
2014-05-04

How I archive Web pages in the DAM (screencast)

Since July 2011, I’ve been archiving interesting Web pages in my personal instance of DC-X (the Digital Asset Management system our company is building). My archive contains 12,300 pages already and is growing daily.

I’m totally in love with this feature: It’s my “private file and library” (a quote from Vannevar Bush’s 1945 As We May Think) – a highly relevant, searchable pool of content I might want to revisit or read later. In an instant, I get back to that great or helpful article when I need it. It’s also a tool for curating the links I’m publishing here. And finally, a backup for the day when these articles vanish from the Web or the links to them break (sooner or later, this happens to most of them).

The alternatives don’t cut it for me: Browser bookmarks and Safari’s “reading list” don’t scale well to 10,000 pages, and have very limited search/browse functionality. Services like Delicious or Pinterest can’t be trusted with an archive that I expect to last for decades. And software that does the archiving from a server process doesn’t see the page exactly as I’m seeing it, and fails at sites that require authentication.

I couldn’t build up this archive if the process weren’t quick and easy (no metadata entry required). It relies on a small Firefox add-on that I custom-built for myself (no customers are using this feature yet). The add-on takes a screenshot of the currently displayed page and posts it, along with the HTML source code, to the DAM in a new browser tab. The DC-X DAM asks me to log in (only once per day), creates an import job and waits for its completion. Then I’m redirected to the details page of the “archived Web page” document that was just created. Here’s a screencast:
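
For readers who prefer code to video, here is a rough TypeScript sketch of the posting step, not the actual add-on. It assumes the capture is sent as an ordinary HTML form opened in a new tab; the field names (`url`, `title`, `html`, `screenshot`), the `CapturedPage` type and the endpoint are all hypothetical, since the real add-on code and the DC-X import API aren’t shown here.

```typescript
// Hypothetical shape of what the add-on captures; not the real DC-X format.
interface CapturedPage {
  url: string;
  title: string;
  html: string;              // serialized HTML source of the page
  screenshotDataUrl: string; // e.g. "data:image/png;base64,..."
}

// Escape a value so it is safe inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build the markup of a form that posts the capture to the archive.
// target="_blank" makes the submission open in a new browser tab.
function buildArchiveForm(endpoint: string, page: CapturedPage): string {
  const fields: Array<[string, string]> = [
    ["url", page.url],
    ["title", page.title],
    ["html", page.html],
    ["screenshot", page.screenshotDataUrl],
  ];
  const inputs = fields
    .map(([name, value]) =>
      `<input type="hidden" name="${name}" value="${escapeAttr(value)}">`)
    .join("\n  ");
  return `<form method="post" action="${escapeAttr(endpoint)}" target="_blank">\n  ${inputs}\n</form>`;
}
```

An add-on could inject this form into a page and submit it. Because the request comes from the browser itself, the server can ask for a login in that new tab and then see exactly the HTML and screenshot I was looking at, which is the point of doing this client-side.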

How are you keeping track of important Web pages? What’s your personal digital archiving workflow?