Year: 2004

  • The Sum of Ant

    Ken Arnold on The Sum of Ant: „Ant is nothing more than the sum of its parts. By this I mean that Ant has not learned the basic power of composition, building things out of smaller parts. This was the great insight that our Unix forebears bequeathed us toolsmiths, and it’s pretty sad to see…

  • Xapian

    „Xapian is an Open Source Probabilistic Information Retrieval library, released under the GPL. It’s written in C++, and bindings are under development to allow use from other languages (Perl, Python, and PHP are working; Java will be available shortly). Xapian is designed to be a highly adaptable toolkit to allow developers to easily add…

  • Why is Distributed so Hard?

    Dale Asberry on the importance of loose coupling, in Why is Distributed so Hard?: „In some ways, marriage vows do a disservice to the richer subtleties in intimate human interaction. Namely, two people don’t come together to become one, they come together to become three! There will always be the self and the other. The…

  • Amberfish

    „Amberfish is general purpose text retrieval software. Its distinguishing features are indexing/search of semi-structured text (i.e. both free text and multiply nested fields), built-in support for XML documents using the Xerces library, structured queries allowing generalized field/tag paths, hierarchical result sets (XML only), automatic searching across multiple databases (allowing modular indexing), and relatively low memory…

  • Bayesian classification using Rainbow

    Fascinating stuff: „Rainbow is a program that performs statistical text classification.“ It can use Bayesian classification to automatically categorize documents. Jon Udell tried it out last year: “There’s been some discussion in the blog world about using a Bayesian categorizer to enable a person to discriminate along various interest/non-interest axes. I took a run…
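Rainbow's own implementation isn't shown here. To illustrate what "Bayesian classification" of text roughly means, here is a minimal multinomial naive Bayes sketch with add-one (Laplace) smoothing. The class name `NaiveBayes`, the lowercase whitespace tokenizer and the toy "interesting"/"boring" labels are hypothetical. The labels were chosen to echo the interest/non-interest axes Udell mentions. This is not Rainbow's code or its command-line interface.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Hypothetical multinomial naive Bayes text classifier (illustration only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                     # add-alpha smoothing constant
        self.doc_counts = Counter()            # label -> number of training docs
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.vocab = set()                     # all words seen in training

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase and split on whitespace.
        return text.lower().split()

    def train(self, text, label):
        words = self.tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # Log prior P(label) plus the sum of log likelihoods P(word | label).
            score = math.log(n_docs / total_docs)
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            for w in self.tokenize(text):
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log((counts[w] + self.alpha) /
                                  (total_words + self.alpha * v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = NaiveBayes()
nb.train("xml schema parser namespaces", "interesting")
nb.train("xml query index search", "interesting")
nb.train("celebrity gossip football scores", "boring")
print(nb.classify("search xml documents"))  # interesting
print(nb.classify("football gossip"))       # boring
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied. The smoothing keeps a single unseen word from zeroing out a whole category.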

  • Here is a How to Topic Maps, Sir!

    Alexander Johannesen’s essay „Here is a How to Topic Maps, Sir!“: „The truth about relational databases is that they really are Topic Maps that are trying to get out. Think about what your RDBMS is trying to do; you have a lot of tables with information bits, and you create relations between them to represent…

  • Flamenco Search Interface

    The Flamenco Search Interface: „We are creating a search interface framework, called Flamenco, whose primary design goal is to allow users to move through large information spaces in a flexible manner without feeling lost. A key property of the interface is the explicit exposure of hierarchical faceted metadata, both to guide the user toward possible…

  • The Brain Attic

    I found an old piece by Micah Dubinko, „The Brain Attic“, in which he asks for Personal Information Management software. (He has since written his own, using plain text files: „It’s the Data, Stupid“.) „What we really need is a better way for our computers to be our brain-attics, freeing us up to do…

  • Do As They Need, Not As They Say

    Jeff Lowery at ONJava.com – Do As They Need, Not As They Say: „‚Do it the way we’ve always done it, except better.‘ This is the unstated initial requirement of any new system I’ve been asked to develop. Nobody really wants to change the way things are done, even though they recognize the problems. It’s…

  • Multibyte-character processing in J2EE

    There’s a lot to consider when dealing with multibyte characters in your programs – see this JavaWorld article: „Most J2EE servers can support multibyte-character languages (like Chinese and Japanese) very well, but different J2EE servers and browsers support them differently. When developers port some Chinese (or Japanese) localized applications from one server to another, they…
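The JavaWorld article is J2EE-specific, but the core failure it describes is language-independent: bytes are encoded in one charset and decoded as another. The Python sketch below assumes a browser that sends UTF-8 and a server that decodes the request as ISO-8859-1, which was the usual fallback when a request declared no charset. The helper names `decode_request_bytes` and `repair_latin1_mojibake` are hypothetical. The sketch also shows why the era's common workaround of re-encoding as ISO-8859-1 and decoding again recovers the text: Latin-1 maps every byte to exactly one character, so nothing is lost.

```python
def decode_request_bytes(raw: bytes, charset: str) -> str:
    """Decode raw request bytes with whatever charset the server assumes."""
    return raw.decode(charset)


def repair_latin1_mojibake(garbled: str, real_charset: str = "utf-8") -> str:
    """Undo a wrong ISO-8859-1 decode.

    Latin-1 maps bytes 0-255 one-to-one onto code points U+0000-U+00FF,
    so re-encoding with it yields the original bytes unchanged.
    """
    return garbled.encode("latin-1").decode(real_charset)


original = "中文"                     # two Chinese characters
raw = original.encode("utf-8")        # three bytes each in UTF-8
print(len(original), len(raw))        # 2 6

# Server wrongly assumes ISO-8859-1: one bogus character per byte.
garbled = decode_request_bytes(raw, "latin-1")
print(len(garbled))                   # 6

print(repair_latin1_mojibake(garbled) == original)  # True
```

On the Java side, the cleaner fix is to declare the encoding up front, for example by calling `request.setCharacterEncoding("UTF-8")` before reading any request parameters, rather than repairing strings afterwards.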