Steve Pepper started a (lengthy) discussion in the LinkedIn Topic Maps Community group – If we were to redesign Topic Maps based on what we have learnt in the last decade, what would we do differently?
Since LinkedIn groups are closed, I’m posting my comment here as well, in response to proposals to replace occurrences (properties) with associations:
“In my opinion, the strength of topic maps is that they’re quite intuitive and easy to explain.
When I’m modeling data, I’m thinking of topics (or subjects/objects/things) with names, classes, identifiers and arbitrary, repeatable properties (often literal values). And then about relations between topics, with relation types and roles. SQL database design made some of this rather hard, so I was very happy when I discovered topic maps back in the day.
Reducing that model to “everything is an association” seems counter-intuitive to me. It reminds me of the RDF “everything is a triple” approach, which dumbs down data structures so much that they become harder to understand; see my blog. “Everything is a row in a database table” sits on the same, not really helpful level. And as a programmer, I don’t look forward to such a change: simple property look-ups are faster and easier to implement when they don’t have to go through the whole association machinery.
I’d like datatype support in topic maps (literals that are annotated with a datatype of “datetime”, “nonNegativeInteger” etc.). And maybe associations could extend topics, i.e. inherit all topic functionality without optional reification?”
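To make that datatype wish a bit more concrete, here is a minimal sketch in Python of what datatype-annotated occurrences could look like. All class, field and property names are hypothetical, not part of any Topic Maps standard:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: an occurrence (property) carrying an XSD-style
# datatype annotation, as wished for above.
@dataclass
class Occurrence:
    type: str                      # e.g. "date-of-birth"
    value: str                     # literal value, stored as a string
    datatype: str = "xsd:string"   # e.g. "xsd:date", "xsd:nonNegativeInteger"

@dataclass
class Topic:
    identifiers: list
    names: list
    occurrences: list = field(default_factory=list)

t = Topic(
    identifiers=["http://example.com/topic/douglas-adams"],
    names=["Douglas Adams"],
    occurrences=[Occurrence("date-of-birth", "1952-03-11", "xsd:date")],
)

# A typed look-up stays a simple property access, no association machinery:
dob = next(o for o in t.occurrences if o.type == "date-of-birth")
print(dob.value, dob.datatype)
```

The point being: a consumer can validate or sort the literal by its datatype while the look-up itself stays trivial.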
Mon, 08 Jul 2013 14:25:44 +0200
Ralph Windsor discusses my previous blog post on DAM News – Applying Linked Data Concepts To Derive A Global Image Search Protocol. He finds better words than I did, rephrasing my suggestion as “a universal protocol where images get described like web pages (HTML) so you can crawl them using search engine techniques”.
Ralph points out that large commercial image sellers might not want to participate in an open network: “Allowing their media out into the open for some third party to index – who they probably regard with wary suspicion (e.g. Google) is likely to be a step too far.” Maybe. Although they’ll go where the customers are – a Google Images search for “airport hamburg 92980935” turns up Getty Images image #92980935, so I assume that Getty Images wants Google to crawl their database. If an open image network emerges on the public Web, the commercial platforms will want to become a part of it once it reaches critical mass. What’s more, one of them could even embrace the change and start building the best image search engine that crawls the Web! (A bit like the Getty Images Flickr cooperation but without the need to copy the images over into their database.)
But “out in the open” is an important point: Many images (and other content types) will always be restricted to limited groups of users. Still, this is no reason to invent a complicated API for accessing them: In intranets, lots of non-public documents are available as HTML, allowing users and internal search engines to easily access them. You can do the same for image metadata – restrict access to the local network, require username and password (or API key, authorization token etc.) as you see fit, but serve it to authenticated search engines (and users) as HTML + RDFa anyway.
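The access model is simple enough to sketch in a few lines: restriction happens at the HTTP layer, while the metadata page itself is the same HTML + RDFa a public page would serve. Credentials, URLs and markup below are made up for illustration:

```python
import base64

# Toy sketch: a Basic-auth protected metadata page. Only authenticated
# users (or search engines) get the HTML+RDFa; everyone else gets a 401.
USERS = {"searchbot": "s3cret"}  # hypothetical credentials

RDFA_PAGE = """<div resource="http://example.com/data/doc123"
  typeof="schema:ImageObject">
  <span property="schema:name">Internal product shot</span>
</div>"""

def handle_request(auth_header):
    """Return (status, body) for the protected metadata page."""
    if auth_header and auth_header.startswith("Basic "):
        decoded = base64.b64decode(auth_header[6:]).decode()
        user, _, password = decoded.partition(":")
        if USERS.get(user) == password:
            # Same HTML+RDFa an unrestricted page would serve
            return 200, RDFA_PAGE
    return 401, "Authentication required"

token = base64.b64encode(b"searchbot:s3cret").decode()
status, body = handle_request("Basic " + token)
print(status)
```

No custom API surface to learn; an internal search engine just crawls the page with its credentials.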
A Web of images (to paraphrase Mike Eisenberg) with rich metadata that’s easy to read for machines and humans? I have no idea whether we’ll actually get there in the near future, but that’s what we should aim for!
Wed, 05 Jun 2013 23:21:42 +0200
Richard Wallis – Putting Linked Data on the Map:
“Linked Data is just there – without the need for an API the raw data (described in RDF) is ‘just there to consume’. With only standard [http] web protocols, you can get the data for an entity in their dataset by just doing a http GET request on the identifier.
[…] So why is this often missed? Maybe it is because there is nothing to learn, no API documentation required, you can see and use it by just entering a URI into your web browser – too simple to be interesting perhaps.”
Mon, 27 May 2013 10:12:50 +0200
Before you start thinking about common metadata for your images (creator, date created, caption, license), first consider what I think is the most important piece of metadata: A unique identifier for your image. And please make it a URL. Why?
First, you want to avoid duplicates in search engine results. You’ll be using the same image on different Web pages, possibly with slight variations: Different sizes, file formats, or cropping. Which means that the URL to the image file is not the same. A unique identifier makes sure others can find out these are just renditions or variations of the same image. (Current image search engines often show lots of duplicates. If they don’t make use of our nice identifiers once we add them, we can always roll our own search engine… ☺ Yes, I’m serious.)
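Once the identifier is embedded, that de-duplication is trivial; a toy sketch (all URLs and identifiers made up):

```python
# Toy sketch: collapse search results that share an embedded identifier.
renditions = [
    {"file_url": "http://example.com/img/photo_800.jpg",
     "identifier": "http://example.com/data/doc123"},
    {"file_url": "http://example.com/img/photo_1600.jpg",
     "identifier": "http://example.com/data/doc123"},
    {"file_url": "http://example.com/img/other.png",
     "identifier": "http://example.com/data/doc456"},
]

unique = {}
for r in renditions:
    # All renditions sharing an identifier collapse into one result
    unique.setdefault(r["identifier"], []).append(r["file_url"])

print(len(unique))  # two distinct images, not three file URLs
```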
Second reason: A well-groomed image will have lots of metadata. Temporal, geographical, creator and licensor related, subject descriptions, licensing terms. You don’t want to add all this baggage to each Web page the image is used on, so you need a separate place to publish all the metadata for that image. And once you have it, it makes perfect sense to use that place as the permanent home for your image and use its URL as the image’s unique identifier.
Suppose that you’re using that URL/identifier whenever you publish or distribute the image: You put it into your HTML, embed it into the image files, and make sure it doesn’t get lost if you register the image with a registry like PLUS or distribute it through third parties like Flickr or Getty Images. What have you just gained? Well, now you can remain the authoritative source of your image’s metadata! You can fix mistakes, add renditions or links or legal notes and change licensing terms at will because you’re in control of that URL. (Third parties probably won’t recognize your self-hosted metadata yet, but let’s move in that direction.)
To practice what I preach, I have added an RDFa resource attribute to the HTML div containing the blog post’s photo (you might want to view the HTML source code of the previous post). An example:
<div resource="http://www.strehle.de/tim/data/document/doc69wpi6bms01kix6470q" typeof="schema:ImageObject">
  <img src="/device_strehle/dev1/2013/05-02/72/65/file69wpi6cfox11c7cgw70q.jpg" />
</div>
With this HTML markup, I’m also telling search engines that the referenced URL is about an image, using the schema.org ImageObject type. (I’m a newbie re schema.org and RDFa, suggestions for improvement are welcome!)
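A crawler can read that annotation with nothing but the standard library. A minimal sketch (not a full RDFa parser, just enough to pull the `resource` attribute out of markup like the snippet above):

```python
from html.parser import HTMLParser

# Minimal sketch: extract the RDFa "resource" identifier of elements
# typed as schema:ImageObject. A real crawler would use an RDFa library.
class ResourceExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "resource" in a and a.get("typeof") == "schema:ImageObject":
            self.resources.append(a["resource"])

html = '''<div resource="http://www.strehle.de/tim/data/document/doc69wpi6bms01kix6470q"
  typeof="schema:ImageObject">
<img src="/device_strehle/dev1/2013/05-02/72/65/file69wpi6cfox11c7cgw70q.jpg" />
</div>'''

p = ResourceExtractor()
p.feed(html)
print(p.resources)
```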
What if someone just downloads the image file, ignoring my lovingly-crafted HTML markup? I want them to see my URL as well. So I’m embedding it in the XMP-plus:ImageSupplierImageID metadata field of the JPEG file using ExifTool:
exiftool -XMP-plus:ImageSupplierImageID=http://www.strehle.de/tim/data/document/doc69wpi6bms01kix6470q IMG_1980.jpg
(This is just a first try, there’s probably other metadata fields I should write it to. I’m choosing this field for now because you can see and modify it in Photoshop: File / File Info… / IPTC Extension / Supplier’s Image ID.)
Note that the URL I’m pointing to doesn’t yet exist: I’ll create that page in the next step. For now, I have just added a unique identifier that looks like a URL (so the correct name is probably URI or IRI, can’t get used to that).
For reference, here’s a few other places that I don’t fully understand yet, but look like they should possibly also contain the URL/identifier if the image gets distributed in a suitable format:
- EXIF ImageUniqueID
- PLUS LDF Terms and Conditions URL / Licensor Image ID / Copyright Owner Image ID / Image Creator Image ID
- ODRL Asset uid
- schema.org url property
- IPTC NewsML G2 newsItem guid attribute / web (Web address) element
- PRISM url element
- XMP xmp:Identifier / xmpRights:WebStatement / xmpMM:DocumentID
- Dublin Core Metadata Element Set identifier
(I’m sure there’s more. Yes, this makes my head explode as well. Please tell me that it’s much simpler than that.)
What do you think? I’d love to hear your feedback (@tistre on Twitter; for e-mail addresses see my home page).
Wed, 08 May 2013 07:44:37 +0200
Tom Heath, Christian Bizer – Linked Data: Evolving the Web into a Global Data Space:
“This book gives an overview of the principles of Linked Data as well as the Web of Data that has emerged through the application of these principles. The book discusses patterns for publishing Linked Data, describes deployed Linked Data applications and examines their architecture.”
The Web page contains the whole book, for free. I still dislike RDF triples, but there’s heaps of useful information.
I especially like this one:
“Linked Data commits itself to a specific pattern of using the HTTP protocol. This agreement allows data sources to be accessed using generic data browsers and enables the complete data space to be crawled by search engines. In contrast, Web APIs are accessed using different proprietary interfaces.”
Each big corporation’s information silo uses their own API. That’s crazy. If you don’t want to be open to the public, prevent access by requiring authentication. But don’t force developers to reimplement simple data access (search, read). I’m currently in favor of HTML with semantic markup (probably RDFa)…
Mon, 29 Apr 2013 10:14:06 +0200
I enjoy modeling data. As students, we were taught the relational data model (as used by SQL databases) and hierarchical database structures. But the real eye-opener was when our professor started modeling a supposedly simple example: an address book. Very soon, we ran into lots of questions with no easy answers: How are persons and addresses, companies, and other persons actually related? How about several persons sharing the same address? What about the temporal dimension, would you want to keep former addresses or employers? We learned what questions to ask, that there’s no silver bullet for the perfect data model, and how to choose a good compromise.
I did a lot of SQL database modeling, which was fun and powerful and easy to code against, but still relatively limited and complicated. (Think multi-valued fields and the need for separate tables for m:n relations.) So when I first read the Topic Maps specification (XTM 1.0 back in the day) and the TAO of Topic Maps article, I was thrilled. The data structures immediately made sense to me: Every thing can have names, types, properties, and identifiers. Then there’s relations between two or more things, where each thing can play a certain role. Metadata can have its own metadata, and scopes help qualify it. That’s all.
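That mental model fits in a handful of plain data structures. A rough Python sketch, using the classic Puccini/Tosca example from the TAO article (this is illustrative data, not a real Topic Map engine):

```python
# Topics: names, types, properties and identifiers.
topics = {
    "puccini": {"names": ["Giacomo Puccini"], "types": ["composer"],
                "identifiers": ["http://example.com/topic/puccini"],
                "properties": {"date-of-birth": "1858-12-22"}},
    "tosca": {"names": ["Tosca"], "types": ["opera"],
              "identifiers": ["http://example.com/topic/tosca"],
              "properties": {}},
}

# Associations: typed relations where each member plays a role.
associations = [
    {"type": "composed-by",
     "roles": {"work": "tosca", "composer": "puccini"}},
]

# "Which works did Puccini compose?" - walk the associations by role:
works = [a["roles"]["work"] for a in associations
         if a["type"] == "composed-by"
         and a["roles"]["composer"] == "puccini"]
print(works)
```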
It took a few years before I could sneak a tiny Topic Map engine into our DAM software (see the blog posts). It still isn’t fully standard conformant but serves us very well: People started using it for simple lists of countries or keywords without even knowing anything about Topic Maps. (This works fine because almost every Topic Map feature is optional.) Some time later, they would notice how powerful and flexible it is: Whether hierarchical thesaurus structures, names in multiple languages, subsets of lists or custom metadata for a topic, it’s easy to think up and implement new stuff. And you don’t have to change database structures or throw away existing data.
When I learned about RDF, it totally didn’t “click” for me. Everything’s a triple? How is this better than “everything’s a row in a table”? Yes, I’m simplifying and probably not getting it – but I know that RDF doesn’t help me think. To me, it’s a low-level abstraction, too technical and too theoretical. There’s too many options for implementing basic use cases, which makes interoperability harder. Topic Maps provide me with a way to think about data structures that makes my work easier, that helps clarify my thinking and communicate it to others.
It’s a bit sad that Topic Maps have never been widely used or even known. In terms of adoption, RDF has certainly won (even though the Semantic Web is failing so far). And I love that RDFa allows embedding data structures into HTML: Now Web service APIs can be built in HTML, to be browsed by humans and still be machine readable (the ability to “view source” is a pillar of the Web). So I’ll go with keeping the data in a Topic Map, but will probably make it available through RDFa. (I hope these two can be made to play nicely together…)
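The "keep the data in a Topic Map, publish it as RDFa" idea can be sketched as a simple serializer. The vocabulary mapping below (schema.org property names for topic occurrences) is an assumption for illustration:

```python
from html import escape

# Hedged sketch: render one topic's properties as an RDFa-annotated
# HTML snippet. Identifier, typeof and property names are made up.
def topic_to_rdfa(identifier, typeof, properties):
    lines = [f'<div resource="{escape(identifier, quote=True)}" typeof="{typeof}">']
    for prop, value in properties.items():
        lines.append(f'  <span property="{prop}">{escape(value)}</span>')
    lines.append("</div>")
    return "\n".join(lines)

snippet = topic_to_rdfa(
    "http://example.com/data/doc123",
    "schema:ImageObject",
    {"schema:name": "Airport Hamburg", "schema:dateCreated": "2013-05-02"},
)
print(snippet)
```

The Topic Map stays the master; the RDFa view is just one generated representation of it.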
Fri, 08 Feb 2013 09:02:42 +0100
Thu, 09 Aug 2012 09:32:50 +0200
Dianne Kennedy – Finally — an XML Markup Solution for Design-Based Publishers: Introducing the PRISM Source Vocabulary:
“Until the tablet-publishing tsunami hit, design-based publications were able to justify their labor-intensive design-based publication process.
[…] We have come to believe the Source is the Solution. We must capture and store platform-agnostic content as early as possible.
[…] Source content must be semantically rich enough to enable the publisher to select content and automate layout and delivery to a wide variety of publishing platform in platform-native formats.
[…] In order to refine what we mean by the generic term article, the PRISM Content Type Controlled Vocabulary has been developed. […] Some content types that describe the unit-of-storage include an advertisement, article, blog entry, book chapter, cover, masthead, introduction and navigational aid.
[…] The Where Used metadata block allows for usage tracking. […] PSV allows for tracking the platform and even the device where the content was published. PSV also allows for tracking the section or page of the publication where the content appeared. Altogether, PSV offers nearly 40 optional fields to describe where content was used.
[…] The Usage Rights metadata block provides optional metadata fields that can be used by publishers to track usage rights of content in a repository. The 15 optional metadata fields in this block are based on the PRISM Usage Rights Metadata Specification.
[…] Unlike EPUB3, PSV makes no extensions to HTML5 and has no restrictions. PSV recommends that the new HTML5 <article> tag be used as the root element for any content unit.
[…] PSV recommends a number of PRISM semantic classes that you can use to qualify any HTML5 element. Examples include box, caption, dateline, credit, and pull quote.”
(Via Simon St. Laurent at O’Reilly Radar – Applying markup to complexity).
Thu, 09 Aug 2012 21:41:40 +0200
Evan Sandhaus at New York Times Open – rNews is here. And this is what it means.:
“On September 21, the IPTC and Schema.org officially announced their work together.
So by October 2011, we had a supported standard for embedding publishing specific metadata into HTML documents. Now all we had to do was actually implement rNews on nytimes.com.
And that’s what we did.
[…] all you have to do is view source on any nytimes.com article published on or after January, 23 2012. In the HTML you will see new attributes like ‘itemtype’, ‘itemprop’ and ‘itemid’. If you paste an article URL into the Google Rich Snippets tool, you can see a parse of the structured data now embedded into every nytimes.com article.”
Thu, 16 Feb 2012 22:32:55 +0100
Joel Spolsky – How Trello is different:
“The great horizontal killer applications are actually just fancy data structures.
Spreadsheets are not just tools for doing "what-if" analysis. They provide a specific data structure: a table. Most Excel users never enter a formula. […]
Word processors are not just tools for writing books, reports, and letters. They provide a specific data structure: lines of text which automatically wrap and split into pages.
PowerPoint is not just a tool for making boring meetings. It provides a specific data structure: an array of full-screen images.”
Mon, 09 Jan 2012 12:30:38 +0100
Stijn Debrouwere – Taxonomies don’t matter anymore:
“Automated recommendation engines are mainly useful as cute but non-essential pageview drivers and if your journalists are too lazy to add links.
[…] We don't come to topic pages for automatically aggregated sort-of-relevant content with no editorial guidance as to what's important and what's not. Sometimes, you just have to do things by hand, in prose.
[…] There is really no way to sidestep curation unless we don't care that we're annoying our users.
[…] Stepping away from mediocrity, for me, means putting power back in the hands of the newsroom. To make that happen, I'll be building prosthetics, not machines.”
Tue, 20 Dec 2011 22:26:35 +0100
W3C Candidate Recommendation Ontology for Media Resources 1.0 (July 2011):
“The intent of this vocabulary is to bridge the different descriptions of media resources, and provide a core set of descriptive properties. This document defines a core set of metadata properties for media resources, along with their mappings to elements from a set of existing metadata formats.”
Mapped metadata standards: CableLabs 1.1, DIG35, Dublin Core, EBUCore, EXIF 2.2, ID3, IPTC NewsML-G2, LOM 2.1, Media RSS, MPEG-7, OGG, QuickTime, DMS-1, TTML, TV-Anytime, TXFeed, XMP, YouTube
Example XML for most standards can be viewed in the testsuite.
(Via Johannes Schmidt.)
Wed, 23 Nov 2011 08:47:39 +0100
Tue, 20 Sep 2011 12:36:42 +0200
Stijn Debrouwere – Context is not a bolt-on:
"Topic pages, story trackers and Q&As fail because they’re never an integral part of a news website. They’re Google landing pages, designed to poach traffic from Wikipedia.
[…] What no newspapers, online or offline, seems to have perfected is how this broad, topical information stream should mesh with the daily news that’s presented on our front page.
If somebody clicks on a story and is dazzled by an array of unfamiliar names and places and events, how do we turn that experience around?"
Sat, 16 Apr 2011 00:00:05 +0200
Tony Russell-Rose – Interaction Models for Faceted Search:
"Note that the facet values examined in the two-stage examples above are disjunctive (multi-select OR), e.g. the selection of a value for a facet such Make & Model does not preclude the selection of another value from the same facet. In this case, selecting multiple independent facet values has the effect of widening the search. However, if the facet values are conjunctive (multi-select AND), then the choice of which interaction model to apply is quite different. […] In this case, the only meaningful interaction model is the instant update, as this is the only approach which will ensure that facet values and the current result set stay in sync."
(Via Patrick Durusau.)
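The disjunctive/conjunctive distinction quoted above fits in a few lines of code (toy data, hypothetical helper function):

```python
# Toy sketch of multi-select OR (widening) vs multi-select AND (narrowing).
cars = [
    {"make": "Audi", "color": "red"},
    {"make": "BMW", "color": "red"},
    {"make": "Audi", "color": "blue"},
]

def facet_filter(items, field, selected, conjunctive=False):
    if conjunctive:
        # multi-select AND: item must match every selected value -
        # only meaningful when items can carry multiple values per facet
        return [i for i in items if all(v == i[field] for v in selected)]
    # multi-select OR: any selected value matches - widens the result set
    return [i for i in items if i[field] in selected]

print(len(facet_filter(cars, "make", {"Audi"})))         # 2 results
print(len(facet_filter(cars, "make", {"Audi", "BMW"})))  # 3 - selection widened
```

With single-valued facets, the conjunctive mode quickly empties the result set, which is why the article argues it needs instant-update feedback.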
Fri, 15 Apr 2011 23:41:07 +0200
A List Apart – Faceted Navigation:
"The distinction between faceted navigation and parametric search is relevant. In parametric search applications, users specify their search parameters up front using a variety of controls such as checkboxes, pull-downs, and sliders to construct what effectively is an advanced Boolean query. Unfortunately, it’s hard for users to set several parameters at once, especially since many combinations will produce zero results. […] It’s a solution that’s hard on people but soft on hardware. In other words, it’s an unfortunate compromise that sacrifices immediate response to reduce the server load."
Fri, 08 Apr 2011 14:11:46 +0200
Patrick Durusau – A Blogging Lesson For Topic Maps?:
"An emphasis on giving users an immediate sense of accomplishment, with results they can use immediately could lead to a different adoption curve for topic maps."
Fri, 08 Apr 2011 13:49:51 +0200
Jon Udell – Pub/sub networking for enterprise awareness:
"In theory everyone talks to everyone and everything gets taken care of. In practice, as we know, not so much. Interpersonal messaging alone can’t create a resilient and discoverable web of connections. That’s why interpersonal messaging must be embedded in a pub/sub network where messages flow person-to-person, person-to-topic, topic-to-person, and topic-to-topic."
Mon, 28 Mar 2011 12:21:40 +0200
Kim Schroeder – Fixing Metadata (or Let’s Do it Right the First Time):
"The majority of people do not understand the work that goes into providing quality. In our current era of fast and cheap; people have lost the quality aspect almost completely. When they can not successfully execute an accurate search in their database, then they call us to fix it. I am absolutely happy to do so, but make no mistake, I wish for that collection to have done it right the first time; rather than to have called us after hundreds of hours of wasted work."
Thu, 10 Mar 2011 21:31:45 +0100
Stijn Debrouwere – Tags don't cut it:
"We need to re-engineer tags so that they’ll allow us to represent the rich relationships between our content and the things that content talks about. If we do that, newspapers can infuse the news with necessary context that allows readers to see the broader picture. Quite literally, too: relationship-infused content can easily be enriched with maps and timelines, which goes way beyond what tags have to offer.
Tags have a deceiving simplicity that hides their complexity as a taxonomic concept. Relationships are closer to the way journalists think about their writing. Relationships are a direct answer to the question “what is this story about?” Because they’re more intuitive than tags, they’re actually harder to mess up.
If we re-imagine tags as rich connections that relate content to the persons, organizations, locations, events and themes they talk about, hopefully magic will happen."
Fri, 04 Mar 2011 09:40:46 +0100
Richard Padley – Integrating taxonomies with search:
"Alongside a set of search results a search engine can provide a series of drill down categories which allow the user to refine their query and cut down the result set until they find the information they need. If properly structured faceted taxonomies have been used to tag the search documents then the terms from these taxonomies can be used to provide the drill-down categories for the search engine."
(Via all things cataloged.)
Sun, 20 Feb 2011 21:48:56 +0100
Fran – Serendipity and large video collections:
"Serendipity is rarely of use to the asset manager, who wants to find exactly what they expect to find, but is a delight for the consumer or leisure searcher. People sometimes cite serendipity as a being a reason to abandon classification, but in my experience classification often enhances serendipity and can be lost in simple online search systems.
For example, when browsing an alphabetically ordered collection in print, such as an encyclopedia or dictionary, you just can’t help noticing the entries that sit next to the one you were looking for."
(Via Digital Asset Management.)
Wed, 16 Feb 2011 10:08:16 +0100
Stijn Debrouwere – Looking for a co-conspirator:
"Drupal and WordPress are perfectly fine for publishing to the web. What we want to build is a content hub for managing the gloriously messy editorial process. A content hub that loves structured data and semantic annotations. A launch pad for pushing content to any platform you can think of."
Thu, 16 Dec 2010 10:19:59 +0100
William Kent back in 1988 – The Many Forms of a Single Fact:
"There is an underlying fallacy, namely the assumption that a simple binary fact (relationship or attribute) always maps simply into a pair of fields. While that is the foundation of current data design methodologies, there exist a troublesome number of exceptions in practice."
(Via Johannes Schmidt.)
Wed, 03 Nov 2010 10:17:47 +0100
all things cataloged – Data, not records:
"Cataloging huge amounts of 19th century material, I often wonder: what if users had a link to the year of publication (e.g. from Wikipedia) that could provide some background information about what happened that year and could assist them in understanding the historical situation and the context a book fits into? Same for place of publication – which state was Sarajevo part of in 1894?"
Wed, 22 Sep 2010 09:59:33 +0200
Stijn Debrouwere – We’re in the information business:
"The goal is to make our content management system like a miniature world in a snowglobe. Not just a system that publishes text, but a system that talks like we do: it knows that an interview implies one or more interviewees.
[…] An issue is more than just a number: it has a date of publication, a cover image, a chief editor, it might revolve around a special theme, it has a circulation, it has one or more cover stories. Don’t think too soon that something is just a number or merely a line of text.
[…] We need domain-specific ways of indicating, err, marking up a text. We need to start creating our own little Markdown-like languages for journalism.
[…] A well-architected news website leads to content that will keep on providing value, rather than leaving stories to wither away when their immediate news value has faded. Structured content is the stuff that makes a website malleable."
(Via Jayson Lorenzen.)
Wed, 01 Sep 2010 13:05:48 +0200
Patrick Durusau with Sam Hunting: "Our goal was to create something as simple, if not simpler than HTML 3.2 to allow users to create and annotate identifiers for entities. The result was Pretty Good Semantics."
Fri, 06 Aug 2010 09:34:37 +0200
Timothy M. O'Brien at O'Reilly Radar – Google Announces Support for Microformats and RDFa:
"On Tuesday, Google introduced a feature called Rich Snippets which provides users with a convenient summary of a search result at a glance. They have been experimenting with microformats and RDFa, and are officially introducing the feature and allowing more sites to participate. While the Google announcement makes it clear that this technology is being phased in over time making no guarantee that your site's RDFa or microformats will be parsed, Google has given us a glimpse of the future of indexing."
Wed, 13 May 2009 09:09:59 +0200
Dan McCreary at O'Reilly Broadcast – How Entity Extraction is Fueling the Semantic Web Fire:
"I have been very impressed at the scope and depth of some of the new OpenSource entity extraction tools as well as the robustness of commercial products. I thought I would discuss this since these technologies could start to move the semantic web (Web 3.0) up the hockey stick growth curve."
Wed, 25 Feb 2009 12:10:09 +0100
Simon St. Laurent – Web, meet Semantic Web:
"The key point of [Sam] Hunting's experience, which emphasized letting users do what they wanted to do, valid or not valid, was that "People really do care about tagging - they really do tag - when they get an immediate positive result." The key phrase there is "immediate positive result." Hunting showed examples of the kinds of features that users could add easily if they were willing to take the time to add some Topic Maps markup to their documents."
Thu, 14 Aug 2008 14:05:22 +0200
Josh Catone at ReadWriteWeb – New York Times API Coming:
"An API is a logical next step for newspapers. It will give developers access to their vast amounts of well-researched data, and allows the paper's brand to be spread easily across the web. More access to Times content and the ability to mash it up in new and interesting ways can only be a win for both readers and the paper.
[…] Says Aron Pilhofer, the paper's interactive news editor, the goal of an API is to "make the NYT programmable. Everything we produce should be organized data.""
Thu, 26 Jun 2008 12:13:50 +0200
Alex Iskold – Semantic Search: The Myth and Reality:
"Probably the most striking revelation about the semantic search space is User Interface. First, to go on the tangent, Powerset got it right by realizing that semantics needs to be surfaced in the UI. After a user searches Powerset, a contextual gadget, aware of the semantics of the results, helps the user complete the search experience."
Tue, 03 Jun 2008 22:29:37 +0200
Kurt Cagle – Drupal and The Future of News:
"The role of editor as arbiter and gate keeper is increasingly becoming automated because the taxonomy systems are becoming too complex for any one person to keep abreast of. However, this is also important because taxonomy is the new navigation, something which I believe Drupal does inordinately well. Most news sites have transcended the level where a human being can reasonably serve to build navigation, search engines face a problem of geometric expansion of content in the long term, and thus its likely that taxonomic navigation will be the dominant face of finding news moving forward.
Watch the space of stochastic taxonomic analyzers; I suspect it will be a significant growth industry in the comparatively near term. The irony of course is that in building the initial web, the metaphor most commonly used was that of the magazine, but as with any new technology, the metaphors that drove the initial adoption eventually fade away as the capabilities of the new technology shape the parameters of what can be done in that medium. Whether the existing news providers will in fact survive that transition remains to be seen."
Tue, 03 Jun 2008 22:24:27 +0200
Calais powered by Reuters - Frequently Asked Questions:
"From a user perspective it’s pretty simple: You hand the web service unstructured text (like news articles, blog postings, your term paper, etc) and it returns semantic metadata in RDF format. What’s happening in the background is a little more complicated.
Using natural language processing and machine learning techniques, the Calais web service looks inside your text and locates the entities (people, places, products, etc), facts (John Doe works for Acme Corp) and events (Jane Doe was appointed as a Board member of Acme Corp) in the text. Calais then processes the entities, facts and events extracted from the text and returns them to the caller in RDF format."
(via Slashdot - Semantic Web Getting Real)
Mon, 11 Feb 2008 10:12:01 +0100
Tim Berners-Lee - Giant Global Graph:
"In the long term vision, thinking in terms of the graph rather than the web is critical to us making best use of the mobile web, the zoo of wildy differing devices which will give us access to the system. Then, when I book a flight it is the flight that interests me. Not the flight page on the travel site, or the flight page on the airline site, but the URI (issued by the airlines) of the flight itself. That's what I will bookmark."
Fri, 23 Nov 2007 16:16:00 +0100
Jon Udell - Entity extraction everywhere:
"Gnosis [a Firefox extension] finds and highlights entities — that is, companies, people, products, and industry terms. Here’s an expanded view of the industry terms, products, and technologies it extracted.
I’d love to see this kind of entity extraction turn into a commodity service that we can wire into our existing email, blogging, social networking, and social bookmarking systems. Being able to easily express, in all those contexts, that twine refers to the company, or the product, not the strong kind of string, would be a huge win."
Fri, 26 Oct 2007 09:29:17 +0200
Tim O'Reilly at O'Reilly radar - Web2Summit: Radar Networks Unveils twine.com:
"Nova Spivack of Radar Networks plans to unveil the first application built on their semantic web platform, twine, a new kind of personal and group information manager. I've only seen a demo, and haven't had a chance to play with it hands-on or load in my own documents, but if it delivers what Nova promises, it could be revolutionary.
Underlying twine is Radar's semantic engine, trained to do what is called entity extraction from documents. Put in plain language, the semantic engine auto-tags each document, turning each entity into what looks like a web link as well as a tag in the sidebar. Type a note in twine, and it picks out all of the people, places, companies, books, and other types of information contained in the note, separating them out by type."
Fri, 19 Oct 2007 09:47:30 +0200
Scott Adams at The Dilbert Blog - Invent This Product:
"When the vacation is over, the scrapbook is 85% complete. You just have to check its assumptions and add/correct any descriptors you want.
You could run it as a slide show, with a little icon of a car traveling from location to location on the Google map, while the calendar date appears in the corner. When the icon reaches a destination from which there are photos, it displays them in a slide show. Optionally, the system could bring in pictures from other sources to beef up your scrapbook. For example, if you visited the Grand Canyon, it could bring in some stock pictures to round out your album. It could also capture a screen shot of the hotel or resort’s web site during the period you visited."
Tue, 28 Aug 2007 09:23:18 +0200
Rick Jelliffe at XML.com - The fall of the Desktop and the File and the rise of Topical Interfaces and Topical Documents:
"The rise of Topics represents a great challenge to operating system and desktop suite vendors. When we look at Windows, or Mac or Linux window managers, we see that they really interact with the user at the wrong level. They say that the topic the user is interested in is applications and files. But how many people nowadays start their computer interaction with a web browser pointed to Google? There are still people whose organizing topic of interest in their computer interaction is the file or application, of course, but they have been swamped by people who are interested in the topic."
Mon, 27 Aug 2007 22:15:41 +0200
Techquila - Thesaurii:
"There are two possible patterns for the representation of a thesaurus in a topic map [...]:
- Thesaurus Pattern 1: The Topic-Per-Term Pattern
- Thesaurus Pattern 2: The Topic-Per-Concept Pattern"
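To make the two patterns concrete, here is a small sketch of my own (not from the Techquila article): in the topic-per-term pattern, every term — preferred or not — becomes its own topic, and USE/UF becomes a relation between term topics; in the topic-per-concept pattern, non-preferred terms collapse into alternate names on a single concept topic.

```python
from dataclasses import dataclass, field

# Pattern 1: topic-per-term -- each term is its own topic; the USE/UF
# relationship is an association between two term topics.
@dataclass
class Term:
    label: str
    use: "Term | None" = None  # non-preferred terms point at the preferred one

# Pattern 2: topic-per-concept -- one topic per concept; non-preferred
# terms are merely alternate names of that single topic.
@dataclass
class Concept:
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)

# The same USE relationship, expressed both ways:
car = Term("car")
automobile = Term("automobile", use=car)      # pattern 1: two topics

car_concept = Concept("car", ["automobile"])  # pattern 2: one topic
```

Which pattern fits depends on whether the terms themselves (with their scope notes, history, etc.) are first-class subjects, or just labels.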
Wed, 04 Jul 2007 09:38:47 +0200
Cover Pages - Resource Description and Classification:
"Being a collection of references on matters of Subject Classification, Taxonomies, Ontologies, Indexing, Metadata, Metadata Registries, Controlled Vocabularies, Terminology, Thesauri, Business Semantics. A collection of references and survey based upon links and cribbings from various resources on the Internet."
Wed, 27 Jun 2007 14:32:57 +0200
Moritz Stefaner - Elastic lists:
"Elastic lists enhance traditional facet browsing approaches by
- visualizing relative proportions (weights) of metadata values by size
- visualizing unusualness of a metadata weight by brightness
- and animated filtering transitions."
(Via Ryan Eby.)
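The two visual cues — size for weight, brightness for unusualness — boil down to simple arithmetic over facet counts. A sketch of my own (function and field names invented, not from Stefaner's implementation):

```python
from collections import Counter

def elastic_list(items, facet, baseline):
    """For one facet, compute each value's display weight (its share of
    the current result set) and its 'unusualness' (that share relative
    to a global baseline share) -- the two cues elastic lists render
    as box size and brightness."""
    counts = Counter(item[facet] for item in items)
    total = sum(counts.values())
    out = {}
    for value, n in counts.items():
        share = n / total
        out[value] = {
            "weight": share,                         # drives box size
            "unusualness": share / baseline[value],  # >1 = over-represented
        }
    return out

# Toy filtered result set, against baseline shares in the full corpus:
items = [{"lang": "en"}, {"lang": "en"}, {"lang": "de"}]
baseline = {"en": 0.5, "de": 0.5}
result = elastic_list(items, "lang", baseline)
```

Here "en" gets weight 2/3 and unusualness 4/3 — bigger and brighter than its corpus-wide average.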
Mon, 04 Jun 2007 13:17:23 +0200
Jon Udell - Tagging is declarative programming for everybody:
"Among other things, tagging may become to ordinary folks what attributes are becoming to programmers: a language that doesn’t just describe things, but also invokes and coordinates behaviors."
Mon, 07 May 2007 11:00:54 +0200
Jon Udell - Like a moth to the Freebase flame:
"I created my first user-defined Freebase type. Because the system is so new, there are some quite fundamental things that (so far as I can see) haven’t yet been defined. I wanted to create entries for some of my personal projects, such as LibraryLookup and elmcity.info, so I created a type called Project and added the properties Goal and Collaborators. That enabled me to add entries for my two personal projects, describe their goals, and associate myself with them as a collaborator."
Tue, 27 Mar 2007 00:54:21 +0200
Tim O'Reilly - Freebase Will Prove Addictive:
"But once you understand a bit about what metaweb is doing, you realize just how remarkable it is. Metaweb has slurped in the contents of several of the web's freely accessible databases, including much of wikipedia, and song tracks from musicbrainz. It then turns its users loose on not just adding more data items but making connections between them by filling out meta tags that categorize or otherwise connect the data items, using a typology that can be extended by users, wiki-style."
Fri, 09 Mar 2007 12:28:11 +0100
Bob DuCharme - Introducing RDFa:
"For a long time now, RDF has shown great promise as a flexible format for storing, aggregating, and using metadata. Maybe for too long—its most well-known syntax, RDF/XML, is messy enough to have scared many people away from RDF. The W3C is developing a new, simpler syntax called RDFa (originally called "RDF/a") that is easy enough to create and to use in applications that it may win back a lot of the people who were first scared off by the verbosity, striping, container complications, and other complexity issues that made RDF/XML look so ugly."
Wed, 14 Feb 2007 23:58:49 +0100
The NeoSmart Files - The Need for Creating Tag Standards:
"Basically, it’s too late for a tagging standard that will be used unanimously throughout the web. A truly semantic web most certainly won’t ever exist because of the reluctance to change and the unwillingness to compromise and accept defeat. A semantic web requires objective analysis of methods and data, culminating in honestly evaluated options, and immediate acceptance of the outcome. But that’s never going to happen."
Tue, 16 Jan 2007 00:14:27 +0100
Alex Faaborg - Microformats - Part 3: Introducing Operator:
"Today Mozilla Labs is releasing Operator, a microformat detection extension developed by Michael Kaply at IBM. Operator demonstrates the usefulness of semantic information on the Web, in real world scenarios."
Wed, 03 Jan 2007 22:33:34 +0100
Wolfgang Bartelme - Microformats Icons:
"As Microformats have gained much popularity over the last year we thought it was time to standardize the way they are represented on a website. So we created the Microformats Icon Set. The starter set contains icons for hCal, hResume, hCard, XFN and a generic TAG icon."
Wed, 06 Dec 2006 23:52:04 +0100
Jon Udell at InfoWorld - We need a universal canvas that doesn't suck:
"While e-mail dissolves barriers to the exchange of data, we need another solvent to dissolve the barriers to collaborative use of that data. Applied in the right ways, that solvent creates what I like to call the “universal canvas” -- an environment in which data and applications flow freely on the Web.
Here’s the best definition of the universal canvas: “Most people would prefer a single, unified environment that adapts to whichever environment they are working in, moves transparently between local and remote services and applications, and is largely device-independent -- a kind of universal canvas for the Internet Age.”
You might expect to find that definition in a Google white paper from 2006. Ironically, it comes from a Microsoft white paper from 2000, announcing a “Next Generation Internet” initiative called .Net."
Wed, 29 Nov 2006 15:01:50 +0100
Jenn Riley - More structured metadata:
"I often encounter people who see my job title (Metadata Librarian) and assume I have an agenda to do away with human cataloging entirely and rely solely on full-text searching and uncontrolled metadata generated by authors and publishers. That’s simply not true; I have no such goal. I am interested in exploring new means of description, not for their own sake, but for the retrieval possibilities they suggest for our users.
[...] I’m a big fan of faceted browsing. The ability to move seamlessly through a system, adding and removing features such as language, date, geography, topic, instrumentation (hey, I’m a musician…), and the like based on what I’m currently seeing in a result set is something I believe our users will be demanding more and more. But we can’t do this if that information isn’t explicitly coded."
Mon, 13 Nov 2006 13:01:40 +0100
Simon St. Laurent at XML.com - The Next Web?:
"Developers who craft smart APIs on their servers for use by AJAX-based web pages can then expose those APIs to other developers, getting the benefits of better interfaces for users who use web browsers to consume the data and for users who have their own custom programs consuming the data. Depending on how carefully the developer models AJAX transactions on traditional web HTTP transactions, these services even look a lot like the REST approach proposed earlier for web services."
Wed, 08 Nov 2006 16:32:00 +0100
W. Eliot Kimber - Topic Maps, Knowledge, and OpenCyc:
"I think that topic maps are useful and attractive as far as they go: for the general business problem of managing metadata and associating it with data objects, it's well suited and well thought out.
Why do I think that topic maps (and anything similar, such as RDF) is not suitable for knowledge representation?
For the simple reason that knowledge representation is much more sophisticated and subtle than just topics with associations. "
Thu, 12 Oct 2006 00:06:12 +0200
Erik Hatcher - Lucene Summit:
"I really found the Collex interface concept to be very interesting. Everything is a contraint or limit and you can easily add or invert the contraint. It’s also easy to add things to a personal collection and parts of the personal collection then become facets/contraints themselves. He’s really using all of the metadata (archive and user) to it’s full extent. He also has more plans including “exhibits” where people can “curate collections”. These collections themselves can then become objects in the index and so on. "
Fri, 22 Sep 2006 16:23:00 +0200
Jon Udell - Del.icio.us is a database:
"Although it's intuitively obvious to me, I suspect that most people don't yet appreciate how easily, and powerfully, tagging systems can work as databases for personal (yet shareable) information management.
Del.icio.us isn't simply backed by a database, it can function as a database to which you add (a lot of) queryable columns.
[...] It strikes me that there's a sweet spot somewhere between this shoestring approach and the likes of Dabble DB, an application that offers powerful web-based data management. Consider how dBase and later Access were overkill for most people's recipe lists and address books, and how 1-2-3 and Excel wound up meeting the need instead. Tag systems might turn out to be the spreadsheets of modern information management."
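The "queryable columns" idea comes from the key:value tagging convention (e.g. lang:python) that turns a flat tag set into name/value pairs. A minimal sketch of my own, not from del.icio.us itself:

```python
# Bookmarks with "key:value" tags behave like rows with columns.
bookmarks = [
    {"url": "http://a.example", "tags": {"lang:python", "topic:tagging"}},
    {"url": "http://b.example", "tags": {"lang:java", "topic:tagging"}},
    {"url": "http://c.example", "tags": {"lang:python", "topic:search"}},
]

def query(bookmarks, **columns):
    """Select bookmarks whose tags contain every given key:value pair."""
    wanted = {f"{k}:{v}" for k, v in columns.items()}
    return [b["url"] for b in bookmarks if wanted <= b["tags"]]

print(query(bookmarks, lang="python", topic="tagging"))
# -> ['http://a.example']
```

Intersecting tag sets is exactly an AND over columns — which is most of what a recipe list or address book ever needs.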
Wed, 23 Aug 2006 00:14:57 +0200
"Semantic MediaWiki introduces some additional markup into the wiki-text which allows users to add "semantic annotations" to the wiki. While this first appears to make things more complex, it can also greatly simplify the structure of the wiki, help users to find more information in less time, and improve the overall quality and consistency of the wiki. To illustrate this, we provide some examples from the daily business of Wikipedia:
[...] Inflationary use of categories. The need for better structuring becomes apparent by the enormous use of categories in Wikipedia. While this is generally helpful, it has also led to a number of categories that would be mere query results in SMW. For some examples consider the categories Rivers in Buckinghamshire, Asteroids named for people, and 1620s deaths, all of which could easily be replaced by simple queries that use just a handful of annotations. Indeed, in this example Category:Rivers, Relation:located in, Category:Asteroids, Category:People, Relation:named after, and Attribute:date of death would suffice to create thousands of similar listings on the fly, and to remove hundreds of Wikipedia categories."
Thu, 17 Aug 2006 16:47:53 +0200
Robert Cooper - Why I Hate Microformats:
"Yay, you have an iCal microformat in your page. You can use Trails, now to stick it right into your Google calendar. Neat.
The problem is, this is a serious abuse of HTML. The way you SHOULD have done this is:
Then present your iCal entry with CSS. Yes, we have waited years and years and years for Microsoft to get off their rears and implement CSS with namespaces, which everyone else has had for years. However, IE7 is around the proverbial corner, and we should finally get the option to embed actual real data into our HTML pages and style it. There is no reason to use semantically incorrect HTML and beat up on the class attribute."
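The inline code example from Cooper's post did not survive here. As I understand his argument, it amounts to something like this sketch (element and namespace names invented by me): embed the event as real data in its own XML namespace and style it directly with namespaced CSS, instead of overloading HTML's class attribute.

```
<!-- Illustrative sketch only: actual data in its own namespace ... -->
<div xmlns:cal="urn:example:icalendar">
  <cal:vevent>
    <cal:summary>Team meeting</cal:summary>
    <cal:dtstart>2006-08-01</cal:dtstart>
  </cal:vevent>
</div>

/* ... styled via CSS3 namespace support */
@namespace cal url(urn:example:icalendar);
cal|summary { font-weight: bold; display: block; }
```

Whether browsers of the day would actually render this consistently is, of course, exactly the IE-shaped problem he complains about.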
Tue, 18 Jul 2006 12:40:38 +0200
"SIMILE is focused on developing robust, open source tools based on Semantic Web technologies that improve access, management and reuse among digital assets."
Wed, 05 Jul 2006 22:12:52 +0200
Dan Zambonini at XML.com - The 7 (f)laws of the Semantic Web:
"Creating metadata and classifications is difficult (let’s not get started on Ontologies). People are biased (whether they mean to be or not), and fallible. Metadata, which the Semantic Web relies on, is not always going to be of great quality.
[...] My clients don’t want to create ontologies. They don’t want to map one set of data to another. They want to use something that’s out there and ready for them to use, and will give them the maximum benefit (so if the Imperial War Museum say that they have a tank from “World War One” and the Science Museum has a video of the firing mechanism from a gun from “World War One”, they can both use the same term/URI)."
Fri, 09 Jun 2006 23:03:47 +0200
"PHPTMAPI implements a PHP API for manipulating topic maps, based on the TMAPI project."
Tue, 30 May 2006 14:28:06 +0200
"Topincs is a Topic Map authoring tool, that allows groups to create Topic Maps in Firefox. Even though it is run in an ordinary browser window it feels like an application installed on your computer. [...] It consists of a client, for editing maps and a server, for storing them. [...] The Server requires Apache 2, PHP 4 and MySQL."
Sun, 14 May 2006 22:19:28 +0200
Jon Udell at InfoWorld - Accessing the web of databases:
"I’ve always regarded the Web as a programmable data source as well as a platform for the document/software hybrid that we call a Web page. Early on, programmable access to Web data entailed a lot of screen scraping. Nowadays it often still does, but it’s becoming common to find APIs that serve up the Web’s data.
[...] Free text search is an even more popular access API. Nearly every site provides that service, or outsources it to Google or another engine.
[...] What you can’t typically do, though, is create mashups by running ad hoc queries against remote Web data. There are good reasons to think that it’s just crazy to export open-ended query interfaces over the Web. No responsible enterprise DBA would permit such access to the crown jewels. But there are all kinds of data sources -- or what Idehen likes to call data spaces -- and a range of feasible and appropriate access modes."
Thu, 04 May 2006 17:05:43 +0200
Jon Udell at InfoWorld - Reinventing the intranet:
"Inside the enterprise, teams, tasks, products, and services define metadata vocabularies that the Internet search giants would kill for. Exploiting those vocabularies to deliver search results that are better than what’s available on the open Web is low-hanging fruit. As we roll out SOAs that route well-formed messages through a fabric of intermediaries, it’ll get even easier."
Wed, 05 Apr 2006 14:51:00 +0200
"Onlife is an application for the Mac OS X that observes your every interaction with apps such as Safari, Mail and iChat and then creates a personal shoebox of all the web pages you visit, emails you read, documents you write and much more. Onlife then indexes the contents of your shoebox, makes it searchable and displays all the interactions between you and your favorite apps over time."
Tue, 04 Apr 2006 23:41:43 +0200
W3C - Image Annotation on the Semantic Web:
"The goals of this document are (i) to explain what the advantages are of using Semantic Web languages and technologies for the creation, storage, manipulation, interchange and processing of image metadata, and (ii) to provide guidelines for doing so. The document gives a number of use cases that illustrate ways to exploit Semantic Web technologies for image annotation, an overview of RDF and OWL vocabularies developed for this task and an overview of relevant tools."
Sun, 26 Mar 2006 23:29:48 +0200
"IBM today announced a company-wide initiative that combines its software and industry consulting expertise to help clients better compete in the global economy through uninhibited access to accurate, reliable and trustworthy business information.
[...] Additionally, IBM is announcing six new solution portfolios and new software products to help clients transform their businesses from an outdated model in which data is managed as an afterthought from within applications, to an environment in which information is set free and managed as a strategic asset and to drive better decision making."
Fri, 17 Feb 2006 16:57:20 +0100
Semapedia.org - The Physical Wikipedia: "Our goal is to connect the virtual and physical world by bringing the best information from the internet to the relevant place in physical space. We do this by combining the physical annotation technology of Semacode with high quality information from Wikipedia."
Fri, 20 Jan 2006 00:18:49 +0100
Karl Vogel at ONLamp.com - Organizing Files:
"The problem: the filesystem on my Unix workstation was a mess. I couldn't find anything without grepping all over creation. About half the time, I'd actually find something useful. Usually I'd get no hits at all, or I'd match something like a compiled binary and end up hosing my display beyond belief.
[...] I went so far as to buy a copy of the Abridged Dewey Decimal Catalog, which is actually pretty nifty; if you're looking to organize your paper files, you could do a lot worse than use an existing classification scheme like this.
[...] My job as a system administrator doesn't change every day, but it's much easier to keep track of things via date rather than via subject. I tend to remember things in time order, so I finally stopped trying to change the way I work to fit some hierarchy. Instead, I made a directory structure on the machine to match my work habits."
Sat, 14 Jan 2006 23:20:31 +0100
“SKOS Core provides a model for expressing the basic structure and content of concept schemes such as thesauri, classification schemes, subject heading lists, taxonomies, ‘folksonomies’, other types of controlled vocabulary, and also concept schemes embedded in glossaries and terminologies.
The SKOS Core Vocabulary is an application of the Resource Description Framework (RDF), that can be used to express a concept scheme as an RDF graph.”
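A minimal sketch of what such a concept scheme looks like in Turtle (my own toy example, using the 2004 SKOS Core namespace):

```
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/scheme/> .

ex:animals a skos:ConceptScheme .

ex:mammal a skos:Concept ;
    skos:prefLabel "mammals"@en ;
    skos:inScheme ex:animals .

ex:cat a skos:Concept ;
    skos:prefLabel "cats"@en ;
    skos:altLabel "felines"@en ;
    skos:broader ex:mammal ;
    skos:inScheme ex:animals .
```

Notably, this is the topic-per-concept pattern: preferred and alternate labels hang off one concept, and broader/narrower links form the hierarchy.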
Wed, 30 Nov 2005 15:13:00 +0100
Simon Willison - Google Base is interesting:
“Base is a very interesting product for a whole bunch of reasons. The data model is surprisingly simple on the surface: all items have a title, description, (optional) external URL, a “type” and a set of labels (a.k.a. tags) and “attributes”. Attributes are something for tag enthusiasts to get excited by - they’re name/value pairs that are kind of like tags in that you can apply them to anything, but more structured and with a greater level of implied meaning.
[…] There’s definitely a trend towards this kind of loose data model at the moment. JotSpot allows all pages within a wiki to have as many extra name/value attribute pairs as you like (even the wiki body itself is internally implemented as a special attribute), and Ning works along similar lines.”
Thu, 17 Nov 2005 13:19:00 +0100
“Dabble combines the best of group spreadsheets, custom databases, and intranet web applications into a new way to manage and share your information online.”
Wed, 16 Nov 2005 23:38:00 +0100
“Google Base is a place where you can easily submit all types of online and offline content that we’ll host and make searchable online. You can describe any item you post with attributes, which will help people find it when they search Google Base. In fact, based on the relevance of your items, they may also be included in the main Google search index and other Google products like Froogle and Google Local.”
Wed, 16 Nov 2005 11:29:00 +0100
“Wikidata is a proposed wiki-like database for various types of content. This project as proposed here requires significant changes to the software (or possibly a completely new software) but has the potential to centrally store and manage data from all Wikimedia projects, and to radically expand the range of content that can be built using wiki principles.”
Mon, 07 Nov 2005 22:30:00 +0100
Adam Bosworth at ACM Queue - Learning from THE WEB:
“Successful systems on the Web are bottom-up. They don’t mandate much in a top-down way. Instead, they control themselves through tipping points. For example, Flickr doesn’t tell its users what tags to use for photos. Far from it. Any user can tag any photo with anything (well, I don’t think you can use spaces). But, and this is a key but, Flickr does provide feedback about the most popular tags, and people seeking attention for their photos, or photos that they like, quickly learn to use that lexicon if it makes sense. It turns out to be amazingly stable.
[…] It is time that the database vendors stepped up to the plate and started to support a native RSS 2.0/Atom protocol and wire format; a simple way to ask very general queries; a way to model data that encompasses trees and arbitrary graphs in ways that humans think about them; far more fluid schemas that don’t require complex joins to model variations on a theme about anything from products to people to places; and built-in linear scaling so that the database salespeople can tell their customers, in good conscience, for this class of queries you can scale arbitrarily with regard to throughput and extremely well even with regard to latency, as long as you limit yourself to the following types of queries. Then we will know that the database vendors have joined the 21st century.”
Tue, 01 Nov 2005 22:10:00 +0100
Jon Udell at InfoWorld - Managing metadata:
"Everyone knows the common definition: Metadata is data about data, a secondary thing that's separate in some way from the primary thing to which it refers. But that definition begs a series of questions. Is metadata something we derive from data, or assign to it? Does it classify things, or enable us to search for things, or govern the behavior of things? If data that is described by metadata also, in turn, refers to other data, does it then qualify as both data and metadata?
These questions can verge on the philosophical, but by working through some examples, we can define various types of metadata, list the benefits that we expect from using it, and identify the challenges associated with maintaining it. Programs, documents, messages, files, Web resources, and Web services are some of the IT constructs often described by metadata. Let's review the roles that metadata can play in these different scenarios."
Thu, 27 Oct 2005 16:40:00 +0200
David Weinberger at Wired - Point. Shoot. Kiss It Good-Bye.:
"As you pass the locked entrances to rooms - caverns, actually - that encompass entire patent-application warehouses and film libraries, you feel like you're navigating through the brain of a slumbering giant. And there, in one of its farthest recesses,is where the beast stores the 11 million photographs that constitute the Bettmann Archive, perhaps the best-known collection of photos in the world.
Although the photos are kept in one room, their sheer quantity means that locating any one of them requires an elaborate ritual. Suppose you want to find an image of President Coolidge talking with Native Americans. First, researcher Robinya Roberts looks up "Coolidge" in a central card catalog that looks like it's been transplanted from your local library to the Bat Cave. Yellowed and worn, the 3-by-5 cards contain surprisingly little information: only a caption, a brief description, and a reference number.
[…] This process of manual metadata tagging, subjective and labor-intensive, may work for Corbis, but it's a lot to ask of the rest of us. Even when software developers try to make it easy, it's not easy enough. For instance, Adobe Photoshop Album offers a similar type of drag-and-drop labeling. Right now, you have to enter keywords manually; presumably someday you'll be able to upload the names of people, places, and events from your address book and calendar so at least you can drag and drop familiar names. Still, mere mortals don't have a 60,000-term online taxonomy or twin screens. More to the point, we don't want to hire Nick Fraser to do the job."
Thu, 27 Oct 2005 15:03:00 +0200
Jon Udell at InfoWorld - WinFS and social information management:
“I saw my first demo of Microsoft’s Cairo OFS (Object File System) back in 1993. It was briefly unveiled at the Professional Developers Conference that year, and then shelved. This week I installed the beta version of its successor, WinFS.”
Thu, 08 Sep 2005 16:27:00 +0200
Peter Van Dijck at XML.com - Introduction to XFML:
"XFML is a simple XML format for exchanging metadata in the form of faceted hierarchies, sometimes called taxonomies. Its basic building blocks are topics, also called categories. XFML won't solve all your metadata needs. It's focused on interchanging faceted classification and indexing data."
Tue, 31 May 2005 14:24:13 +0200
At ONLamp.com, Daniel H. Steinberg summarizes Adam Bosworth's keynote at the MySQL Users Conference 2005:
"Adam Bosworth suggested that we "do for information what HTTP did for user interface." [...] As a result of a simple, sloppy, standards-based, scalable platform, we have information at our fingertips from Google, Amazon, eBay, and Salesforce. Bosworth's own company, Google, gets hundreds of millions of hard queries a day. He said they see it as putting Ph.Ds in tanks to drive through walls rather than around them.
In addition to the advantages in software, there have been great gains in hardware. Bosworth said that one million dollars buys you five hundred machines with 2TB of in-memory data, a PetaByte of on-disk data, and a reasonable throughput of fifty thousand requests per second. This amounts to one billion requests per day. Having this sort of power changes the way you think."
Sat, 23 Apr 2005 21:45:31 +0200
* a way of thinking about data
* design principles for formats
* adapted to current behaviors and usage patterns
* highly correlated with semantic xhtml, AKA the real world semantics, AKA lowercase semantic web, AKA lossless XHTML"
Take a look at the hCalendar example.
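An hCalendar event is just HTML with agreed-upon class names — here is a small sketch of my own (event details invented), using the abbr design pattern for the machine-readable date:

```
<div class="vevent">
  <a class="url" href="http://example.org/summit">
    <span class="summary">Example Summit</span>
  </a>
  <abbr class="dtstart" title="2005-04-15">April 15, 2005</abbr>
  at <span class="location">Example Hall</span>
</div>
```

The same markup renders as ordinary prose and parses as an iCalendar VEVENT.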
Fri, 15 Apr 2005 10:55:18 +0200
Robert Kaye - High order bits and Ontologies:
"Then later in the afternoon, Clay Shirky talked about the difference between ontologies and folksonomies in his "Ontology is Overrated: Links, Tags, and Post-hoc Metadata". With his usual flair Clay delivered a great overview of classic ontologies and all the issues that limit their usefulness on the Internet. [...]
Clay went on to outline the conditions under which classical ontologies can thrive:
* Domain: small corpus, formal categories, stable entities, restricted entities, clear edges
* Participants: Coordinated users, expert users, expert catalogers, authoritative sources
In a nutshell, ontologies work best in small and controlled environments where experts are using the system. Unfortunately, the Internet is the exact opposite of all of these. And thus, argues Clay, ontologies are not suited for the Internet. Fortunately, the Internet has brought us a solution to all these problems in the form of Folksonomies."
Thu, 17 Mar 2005 23:26:28 +0100
Nikita Ogievetsky's (Cogitech, Inc.) and Terry Badger's (Eastman Kodak Company) XML Europe 2003 presentation on Topic Map Solutions for Kodak Digital Camera Accessories:
"This presentation shows how Topic Map based solutions are used to build, organize and maintain Kodak digital cameras accessories web site. The chosen approach did not require software investment. Excel, an available and familiar spreadsheet software was used as an affordable and easy to use Topic Map GUI editor and repository. [...] All processing is done with XSLT scripts."
Thu, 10 Feb 2005 11:11:09 +0100
Graham Moore, Kal Ahmed: "Topic Map Relational Query Language [PDF] (TMRQL) has been designed in order to provide a sound foundation for querying topic maps. To this end it does not define an entire new language but instead presents a core set of abstract relational views. The relational model provides a firm foundation for the development of a topic map query language.
Development in this direction would lead to a more accessible and usable language by a greater number of developers than a new and bespoke language. Developers would be familiar with the concepts and their existing tools would work with the data structures returned. To them, the topic map data model would appear as just another schema or view. In order that the TMRQL language is not bound to a single implementation schema, nor even bound to a relational database implementation, we define a set of Relational Views that provide an abstract relational model of the topic map data model. This abstract data structure is independent of any particular implementation yet provides a foundation to use the full power of the SQL language and helps with portability of TMRQL queries."
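The appeal is that a topic map query stays plain SQL over those abstract views. A sketch of the flavor (view and column names are my invention, not TMRQL's actual definitions):

```
-- Hypothetical view names, illustrating the idea of abstract
-- relational views over the topic map data model:
SELECT t.topic_id, n.value AS name
FROM   topic t
JOIN   topic_name n      ON n.topic_id = t.topic_id
JOIN   association_role r ON r.player_id = t.topic_id
WHERE  r.association_type = 'written-by';
```

Any developer who knows SQL — and any reporting tool that speaks it — can work against the map without learning a bespoke query language.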
Tue, 01 Feb 2005 15:11:49 +0100
Vannevar Bush's legendary essay from 1945, As We May Think:
"Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and to coin one at random, "memex'' will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.
[...] It affords an immediate step, however, to associative indexing, the basic idea of which is a provision whereby any item may be caused at will to select immediately and automatically another. This is the essential feature of the memex. The process of tying two items together is the important thing.
When the user is building a trail, he names it, inserts the name in his code book, and taps it out on his keyboard. [...]
Thereafter, at any time, when one of these items is in view, the other can be instantly recalled merely by tapping a button below the corresponding code space. Moreover, when numerous items have been thus joined together to form a trail, they can be reviewed in turn, rapidly or slowly, by deflecting a lever like that used for turning the pages of a book. It is exactly as though the physical items had been gathered together to form a new book. It is more than this, for any item can be joined into numerous trails."
Fri, 14 Jan 2005 15:26:32 +0100
By Eric Freeman and David Gelernter back in 1997: "Lifestreams is built on a simple storage metaphor --- a time-ordered stream of documents combined with several powerful operators --- that replaces many conventional computer constructs (such as named files, directories, and explicit storage) and in the process provides a unified framework that subsumes many separate desktop applications to accomplish and handle personal communication, scheduling, and search and retrieval tasks. While our current prototype is tailored to managing personal information, a "lifestream" is also a natural framework for managing enterprise information and web sites; we are just beginning to explore such use."
Fri, 14 Jan 2005 15:16:25 +0100
"TMCore05 allows developers to take full advantage of the power of topic maps in their applications. The engine provides a robust store for multiple topic maps; an extensive API accessible via any language supported by the Microsoft CLR; and a high-level web services interface that allows both reading and updating of topic maps using SOAP-based web service calls.
The engine makes use of Microsoft SQLServer 2000 to provide scalable, persistent storage and is designed to allow multiple instances to access the same data store simultaneously using an optimistic locking strategy to minimize development overhead."
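TMCore05's actual API is not shown in the announcement; purely for orientation, here is a hypothetical minimal sketch of the topic map data model such an engine stores: topics with names and occurrences, and associations with typed roles (all identifiers and type names are made up):

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    identifier: str                                   # subject identifier, usually a URI
    names: list = field(default_factory=list)
    occurrences: dict = field(default_factory=dict)   # occurrence type -> literal value

@dataclass
class Association:
    assoc_type: str
    roles: dict                                       # role type -> Topic

puccini = Topic("http://example.org/puccini", names=["Giacomo Puccini"],
                occurrences={"born": "1858-12-22"})
lucca = Topic("http://example.org/lucca", names=["Lucca"])

born_in = Association("born-in", roles={"person": puccini, "place": lucca})

# Typed roles make the association readable from either end:
print(born_in.roles["person"].names[0], "was born in", born_in.roles["place"].names[0])
```

A persistent engine layers storage, locking, and a query API over essentially this structure.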
Mon, 10 Jan 2005 17:28:53 +0100
Sriram Krishnan - Tyranny of the geeks:
"Nowadays, it is the 'in'-thing to be CSS-aware. If you're dumb enough to use a table tag, you're branded as a clueless moron. However, no one really tells you why table tags are bad. In fact, the equivalent CSS for generating something like your standard sign-up form is downright scary. And with every browser (Opera, Firefox, IE) having a different idea on what 'right' CSS is, you're much safer with table tags. For those using CSS and divs and floats to build their tables, I ask them why. Why do something that is so un-intuitive? I could teach a kid about rows and columns.
[...] A year ago, I read up a lot on the Semantic Web and RDF. I have to admit that I didn't understand any of it. Any of it. Ontologies, RDF, OWL, what not. However, you see blogs and enclosures getting the same effect with only a fraction of the complexity. I don't need smart agents to find what I want - I just search in Google and it is usually smart enough to give me what I need. I don't have high hopes for the semantic web unless they simplify and do it real soon."
Mon, 06 Dec 2004 12:45:32 +0100
Adam Bosworth - ISCOC04 Talk:
"That software which is flexible, simple, sloppy, tolerant, and altogether forgiving of human foibles and weaknesses turns out to be actually the most steel cored, able to survive and grow while that software which is demanding, abstract, rich but systematized, turns out to collapse in on itself in a slow and grim implosion.
[...] What is more, in one of the unintended ironies of software history, HTML was intended to be used as a way to provide a truly malleable plastic layout language which never would be bound by 2 dimensional limitations, ironic because hordes of CSS fanatics have been trying to bind it with straight jackets ever since, bad mouthing tables and generations of tools have been layering pixel precise 2 dimensional layout on top of it. And yet, ask any gifted web author, like Jon Udell, and they will tell you that they often use it in the lazy sloppy intuitive human way that it was designed to work. They just pour in content. In 1996 I was at some of the initial XML meetings. The participants' anger at HTML for "corrupting" content with layout was intense. Some of the initial backers of XML were frustrated SGML folks who wanted a better cleaner world in which data was pristinely separated from presentation. In short, they disliked one of the great success stories of software history, one that succeeded because of its limitations, not despite them. I very much doubt that an HTML that had initially shipped as a clean layered set of content (XML, Layout rules - XSLT, and Formatting- CSS) would have had anything like the explosive uptake.
Now as it turns out I backed XML back in 1996, but as it turns out, I backed it for exactly the opposite reason. I wanted a flexible relaxed sloppy human way to share data between programs and compared to the RPC's and DCOM's and IIOP's of that day, XML was an incredibly flexible plastic easy going medium. It still is. And because it is, not despite it, it has rapidly become the most widely used way to exchange data between programs in the world. And slowly, but surely, we have seen the other older systems, collapse, crumple, and descend towards irrelevance.
Consider programming itself. There is an unacknowledged war that goes on every day in the world of programming. It is a war between the humans and the computer scientists. It is a war between those who want simple, sloppy, flexible, human ways to write code and those who want clean, crisp, clear, correct ways to write code. It is the war between PHP and C /Java. It used to be the war between C and dBase. Programmers at the level of those who attend Columbia University, programmers at the level of those who have made it through the gauntlet that is Google recruiting, programmers at the level of this audience are all people who love precise tools, abstraction, serried ranks of orderly propositions, and deduction. But most people writing code are more like my son. Code is just a hammer they use to do the job. PHP is an ideal language for them. It is easy. It is productive. It is flexible. Associative arrays are the backbone of this language and, like XML, is therefore flexible and self describing. They can easily write code which dynamically adapts to the information passed in and easily produces XML or HTML.
[...] I remember listening many years ago to someone saying contemptuously that HTML would never succeed because it was so primitive. It succeeded, of course, precisely because it was so primitive. Today, I listen to the same people at the same companies say that XML over HTTP can never succeed because it is so primitive. Only with SOAP and SCHEMA and so on can it succeed. But the real magic in XML is that it is self-describing. The RDF guys never got this because they were looking for something that has never been delivered, namely universal truth."
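Bosworth's point about associative arrays being "flexible and self-describing" like XML can be illustrated with a dict-to-XML conversion: the keys of whatever data is passed in become the element names, with no schema in sight. A minimal sketch using the Python standard library (the `person` record is invented for illustration):

```python
import xml.etree.ElementTree as ET

def dict_to_xml(tag, data):
    """Turn an associative array into self-describing XML: keys become element names."""
    elem = ET.Element(tag)
    for key, value in data.items():
        if isinstance(value, dict):
            elem.append(dict_to_xml(key, value))      # nested dicts become nested elements
        else:
            child = ET.SubElement(elem, key)
            child.text = str(value)
    return elem

# The code adapts to whatever keys are passed in; no declaration required up front.
record = {"name": "Ada", "address": {"city": "London"}}
print(ET.tostring(dict_to_xml("person", record), encoding="unicode"))
# <person><name>Ada</name><address><city>London</city></address></person>
```

This is the PHP style Bosworth praises, transplanted: the data structure describes itself, so producing XML or HTML from it is mechanical.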
Mon, 22 Nov 2004 14:56:47 +0100
Jamie Zawinski outlines a hypothetical program for dealing with vast volumes of email:
"There are other interesting data-visualization possibilities here as well; since really what we have is nodes and connections between them, tools like graphers and histogram charts might be applicable as well, to answer questions like
* show me a graph of the age-distribution of my unanswered mail, or,
* show me a graph of people who are known to have directly exchanged mail with each other so that I can see the "clumping" of my correspondents."
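Both questions fall out of Zawinski's "nodes and connections" framing once messages are reduced to who wrote to whom, how old the message is, and whether it was answered. A minimal sketch with invented message records:

```python
from collections import defaultdict

# Hypothetical message records: (sender, recipients, age_in_days, answered)
messages = [
    ("alice", ["bob"], 2, False),
    ("bob", ["alice", "carol"], 2, True),
    ("carol", ["alice"], 10, False),
]

# Nodes and connections: who has directly exchanged mail with whom.
graph = defaultdict(set)
for sender, recipients, _, _ in messages:
    for r in recipients:
        graph[sender].add(r)
        graph[r].add(sender)

# Age distribution of unanswered mail, as a days -> count histogram.
histogram = defaultdict(int)
for _, _, age, answered in messages:
    if not answered:
        histogram[age] += 1

print(sorted(graph["alice"]))   # alice's direct correspondents
print(dict(histogram))          # input to a histogram chart
```

The "clumping" Zawinski wants to see is just the connected clusters of `graph`; the age chart is a plot of `histogram`.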
Thu, 14 Oct 2004 14:44:48 +0200
David Sklar - Isaac Newton, sha1, and the Semantic Web:
"Which made me think: is the Semantic Web the 21st century equivalent of Diderot's Encyclopédie? What lessons have we learned (or not) from previous generations' attempts to taxonomify (and neologize? :) all information?"
Sun, 10 Oct 2004 07:50:27 +0200
Edd Dumbill - Drowned out by keywords:
"So here's a case for the semantic web. It's stupidly difficult to search for news of my hometown.
I live in the beautiful city of York, UK. In most search oriented applications I cannot search for my city. Why?
Because "New York" always matches a search for "York", too."
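Dumbill's complaint is concrete enough to demonstrate: naive substring matching cannot tell York from New York, while even crude token-level filtering can. A small sketch (the headlines are invented, and the "skip tokens preceded by New" rule is only a stand-in for real entity-level disambiguation):

```python
import re

titles = [
    "Flooding in York city centre",
    "New York marathon results",
    "York Minster restoration",
]

def substring_match(query, text):
    return query.lower() in text.lower()

def token_match(query, text):
    # Match whole tokens, and skip hits preceded by "new" -- a crude stand-in
    # for knowing that York (UK) and New York are different entities.
    tokens = re.findall(r"\w+", text.lower())
    q = query.lower()
    return any(t == q and (i == 0 or tokens[i - 1] != "new")
               for i, t in enumerate(tokens))

print([t for t in titles if substring_match("York", t)])  # all three match
print([t for t in titles if token_match("York", t)])      # "New York" excluded
```

The semantic web case is that the second function is still just string hackery; identifying the *city* rather than the *string* is what would make the search reliable.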
Thu, 30 Sep 2004 22:53:41 +0200
Interesting hypertext/hypermedia look-back by Randall H. Trigg, Xerox Palo Alto Research Center - Hypermedia as Integration: Recollections, Reflections and Exhortations:
"I had decided that hypertext links needed "types" (really "labels") that could distinguish in what way the link was serving either as a traversible connection, a structuring means, or an argument representation."
"The great thing about the digital library craze is how much we're learning from librarians, not just how much we can teach them about technology."
Mon, 06 Sep 2004 13:00:35 +0200
DiamondWiki's Faceted Navigation:
"Faceted navigation lets people browse a website by using FacetedClassification to automatically generate relevant hyperlinks. If you want to see an example of faceted navigation in action, go to the BrowseFacets page and start clicking, paying attention to the categories on the left-hand side. Notice how pages can have both an "Author" and a "Subject", and you can navigate by either one. This may seem obvious to you, but the point is that pages are not restricted to a single position in one hierarchy -- this is what faceted classification is all about. It's nothing earth-shattering.
An essential part of FacetedNavigation is that the interface lets you view items that are in more than one category. In other words you can intersect two sets of items. So for example, you can view "items about diamond wiki that are authored by kim burchett", instead of being restricted to viewing "items about diamond wiki" or "items authored by kim burchett". Most hierarchical categorization systems only let you view one hierarchy at a time."
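The set intersection at the heart of faceted navigation is simple to state in code: each facet value maps to a set of pages, and combining facets intersects those sets. A minimal sketch with invented data mirroring the DiamondWiki example:

```python
# Facet index: facet -> value -> set of page ids (illustrative data).
index = {
    "author": {"kim burchett": {1, 2, 5}, "alice": {3, 4}},
    "subject": {"diamond wiki": {2, 3, 5}, "gardening": {1, 4}},
}

def browse(**facets):
    """Intersect one value per facet, e.g. browse(author=..., subject=...)."""
    result = None
    for facet, value in facets.items():
        pages = index[facet][value]
        result = pages if result is None else result & pages
    return result

print(browse(author="kim burchett"))                          # {1, 2, 5}
print(browse(author="kim burchett", subject="diamond wiki"))  # {2, 5}
```

A strictly hierarchical system only ever offers the first kind of query; the second, cross-facet intersection is the part most hierarchies cannot express.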
Tue, 31 Aug 2004 00:50:01 +0200
Michael Denny - Ontology Tools Survey, Revisited:
"Reference to taxonomies and ontologies by vendors of mainstream enterprise-application-integration (EAI) solutions are becoming commonplace. Popularly tagged as semantic integration, vendors like Verity, Modulant, Unicorn, Semagix, and many more are offering platforms to interchange information among mutually heterogeneous resources including legacy databases, semi-structured repositories, industry-standard directories and vocabularies like ebXML, and streams of unstructured content as text and media. Ontologies, for example, are being used to guide the extraction of semantic content from collections of plain-text documents describing medical research, consumer products, and business topics."
Fri, 16 Jul 2004 01:00:04 +0200
Jon Udell - The Google PC generation:
"Job No. 1 for the Google PC would be to vacuum up all available sources of data. Job No. 2 would be to exploit that data to the hilt.
On the Google PC, you wouldn’t need third-party add-ons to index and search your local files, e-mail, and instant messages. It would just happen. The voracious spider wouldn’t stop there, though. The next piece of low-hanging fruit would be the Web pages you visit. These too would be stored, indexed, and made searchable. More ambitiously, the spider would record all your screen activity along with the underlying event streams.
[...] Instead of idly slacking most of the time, our PCs ought to be indexing, analyzing, correlating, and classifying."
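The first job Udell assigns the Google PC, indexing local files so search "just happens", is classic inverted-index territory. A minimal sketch with invented documents (real desktop indexers add crawling, ranking, and incremental updates on top of essentially this):

```python
import re
from collections import defaultdict

def build_index(documents):
    """Inverted index: word -> set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index

def search(index, query):
    """All documents containing every query word."""
    sets = [index.get(w, set()) for w in query.lower().split()]
    return set.intersection(*sets) if sets else set()

docs = {
    "mail.txt": "lunch with the search team tomorrow",
    "notes.txt": "indexing local files makes search instant",
}
index = build_index(docs)
print(search(index, "search indexing"))
# {'notes.txt'}
```

The "voracious spider" part is then just a matter of feeding more sources, visited web pages, chat logs, event streams, into `build_index`.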
Mon, 21 Jun 2004 10:42:45 +0200
Found Cory Doctorow's great piece on why the Semantic Web will not exist - Metacrap: Putting the torch to seven straw-men of the meta-utopia:
"2. The problems
- 2.1 People lie
- 2.2 People are lazy
- 2.3 People are stupid
- 2.4 Mission: Impossible -- know thyself
- 2.5 Schemas aren't neutral
- 2.6 Metrics influence results
- 2.7 There's more than one way to describe something"
Mon, 14 Jun 2004 10:26:33 +0200
"Rhizome is a Wiki-like content management and delivery system that exposes the entire site -- content, structure, and metadata as editable RDF. This means that instead of just creating a site with URLs that correspond to a page of HTML, with Rhizome you can create URLs that represent just about anything, such as:
- structural components of content (such as a bullet point or a definition).
- abstract entities that can be presented in different ways depending on the context.
- relationships between entities or content, such as annotations or categories."
Wed, 09 Jun 2004 12:36:32 +0200
Alexander Johannesen's essay "Here is a How to Topic Maps, Sir!":
"The truth about relational databases is that they really are Topic Maps that are trying to get out. Think about what your RDBMS is trying to do; you have a lot of tables with information bits, and you create relations between them to represent something vital to your business requirements, write SQL to mirror that and try your best at fixing a user interface on top to make it all work. The more relations you've got, the more complex your model is going to be. And for what? To create an application that both a computer and a human can handle well.
Where do you stop expanding your model and when? When it gets too complex? Too slow? Too unmaintainable? Too crazy to keep going? Too often you get bogged down in the design of models; what relations are hogs, which ones are necessary, which ones are not?"
Tue, 04 May 2004 13:42:51 +0200
The Flamenco Search Interface: "We are creating a search interface framework, called Flamenco, whose primary design goal is to allow users to move through large information spaces in a flexible manner without feeling lost. A key property of the interface is the explicit exposure of hierarchical faceted metadata, both to guide the user toward possible choices, and to organize the results of keyword searches. The interface uses metadata in a manner that allows users to both refine and expand the current query, while maintaining a consistent representation of the collection's structure."
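Flamenco's two design points, guiding users toward possible choices and letting them refine without getting lost, can be sketched over hierarchical facet paths: a preview counts the items under each child of the current facet node before the user commits to narrowing. A minimal illustrative model (items and facet paths are invented):

```python
# Items tagged with hierarchical facet paths (illustrative data).
items = {
    "img1": ("location/europe/spain", "era/20th-century"),
    "img2": ("location/europe/france", "era/20th-century"),
    "img3": ("location/asia/japan", "era/19th-century"),
}

def narrow(items, prefix):
    """Refine: keep items whose facet path sits under the given prefix."""
    return {k: v for k, v in items.items()
            if any(p == prefix or p.startswith(prefix + "/") for p in v)}

def preview(items, prefix):
    """Guide the user: count items under each child of the current facet node."""
    counts = {}
    for paths in items.values():
        for p in paths:
            if p.startswith(prefix + "/"):
                child = p[len(prefix) + 1:].split("/")[0]
                counts[child] = counts.get(child, 0) + 1
    return counts

print(preview(items, "location"))                # {'europe': 2, 'asia': 1}
print(sorted(narrow(items, "location/europe")))  # ['img1', 'img2']
```

Because every step shows the counts for the next level, the user never narrows into an empty result set, which is Flamenco's answer to "feeling lost". Expanding the query is just moving the prefix back up the path.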
Tue, 04 May 2004 00:11:03 +0200
Found an old piece written by Micah Dubinko - "The Brain Attic", in which he asks for better Personal Information Management software. (He has since written his own software, using plain text files: "It's the Data, Stupid".)
"What we really need is a better way for our computers to be our brain-attics, freeing us up to do whatever it is that we do best.
So, we need to be able to enter text, and shuffle existing content into the system. We also need to be able to store email and web pages and integrate with browser bookmarks. Contacts. Todo lists. Calendars. Anything that we're currently scribbling on yellow notes stuck on our monitors. And it needs to be searchable. Really quickly searchable, as in keystroke-at-a-time results.
Personal Information Managers (PIMs) have already been invented, right? Well, technically true, the late Lotus Agenda, Outlook, and Evolution being the top contenders. But something's still missing: despite these programs, people still have sticky notes, or worse, a physical desktop that looks like mine."
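Dubinko's "keystroke-at-a-time" requirement is less demanding than it sounds for a personal-sized corpus: a linear substring scan over a few thousand notes is already instant, so the search can simply be re-run on every keystroke. A toy sketch (not Dubinko's tool; class and note contents are invented):

```python
class BrainAttic:
    """Toy note store with incremental, keystroke-at-a-time search."""
    def __init__(self):
        self.notes = []

    def add(self, text):
        self.notes.append(text)

    def search(self, typed_so_far):
        """Cheap enough to re-run on every keystroke for a personal corpus."""
        q = typed_so_far.lower()
        return [n for n in self.notes if q in n.lower()]

attic = BrainAttic()
attic.add("Call dentist on Monday")
attic.add("Monitor stand order number 4711")
attic.add("Dent in the car door - garage quote")

for prefix in ["d", "de", "den", "dent"]:
    print(prefix, "->", attic.search(prefix))
```

Each added keystroke only shrinks the result list, which is exactly the narrowing-as-you-type feel that makes sticky notes lose their appeal.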
Mon, 03 May 2004 12:02:03 +0200