Tim’s Weblog
Tim Strehle’s links and thoughts on Web apps, managing software development and Digital Asset Management, since 2002.

Entity extraction everywhere

Jon Udell - Entity extraction everywhere:

"Gnosis [a Firefox extension] finds and highlights entities — that is, companies, people, products, and industry terms. Here’s an expanded view of the industry terms, products, and technologies it extracted.

I’d love to see this kind of entity extraction turn into a commodity service that we can wire into our existing email, blogging, social networking, and social bookmarking systems. Being able to easily express, in all those contexts, that twine refers to the company, or the product, not the strong kind of string, would be a huge win."

Fri, 26 Oct 2007 07:29:17 +0000

Using LDAP groups in a web application

Is there a standard way to integrate a web application with LDAP groups? Let's see what others are doing: 

  • Confluence supports both "static groups" (the group's LDAP entry lists user DNs or IDs in an attribute like "member" or "memberUid" - typical objectClasses are "posixGroup" and "groupOfNames") and "dynamic groups" (the user entry lists group DNs in an attribute like "member" or "memberOf"; Active Directory does the latter). Which (static) groups are being read can be defined with a custom LDAP query filter ("baseGroupNamespace" and "groupSearchAllDepths" configuration settings).
  • Trac seems to use just "static groups". What's interesting is that they can store permissions directly in LDAP, with "objectclass: trac" and "tracperm" attributes. They distinguish group DNs from user DNs internally by prefixing groups with an "@" character. They also filter which groups are being used ("group_rdn" configuration setting).
  • Drupal can work with both group types. They mention the problem with hierarchical group membership...
  • Typo3 I'm not sure about - the documented configuration settings sound like they only support "dynamic groups" ("use memberOf-Attribute", "build usergroup"), but at the bottom of the page they say: "Can I assign users to groups? Yes, currently standard implementations of AD, NDS and OpenLDAP are supported."
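The two directions described above can be sketched in a few lines of Python. This is a minimal sketch that works on in-memory dicts standing in for LDAP search results (no LDAP library is involved); the DNs, user IDs and function names are hypothetical, and "static"/"dynamic" follow the usage in this post.

```python
from typing import Dict, List, Set

# An LDAP entry as returned by a search: attribute name -> list of values.
Entry = Dict[str, List[str]]


def static_groups(user_dn: str, uid: str, groups: Dict[str, Entry]) -> Set[str]:
    """Groups whose own entry lists the user: "member" holds user DNs
    (groupOfNames), "memberUid" holds plain user IDs (posixGroup)."""
    found = set()
    for group_dn, attrs in groups.items():
        # Naive case-insensitive DN match; real DN comparison also
        # normalizes whitespace and attribute-value escaping.
        member_dns = {dn.lower() for dn in attrs.get("member", [])}
        if user_dn.lower() in member_dns or uid in attrs.get("memberUid", []):
            found.add(group_dn)
    return found


def dynamic_groups(user_entry: Entry) -> Set[str]:
    """Groups named on the user's own entry (e.g. Active Directory's "memberOf")."""
    return set(user_entry.get("memberOf", []))


# Hypothetical directory contents.
groups = {
    "cn=editors,ou=groups,dc=example,dc=com": {
        "objectClass": ["groupOfNames"],
        "member": ["uid=jdoe,ou=people,dc=example,dc=com"],
    },
    "cn=staff,ou=groups,dc=example,dc=com": {
        "objectClass": ["posixGroup"],
        "memberUid": ["jdoe", "asmith"],
    },
}
user = {"memberOf": ["cn=admins,ou=groups,dc=example,dc=com"]}

all_groups = static_groups("uid=jdoe,ou=people,dc=example,dc=com", "jdoe", groups) | dynamic_groups(user)
print(sorted(all_groups))
```

This does not follow nested groups (a group listed as a member of another group), which is the hierarchical-membership problem the Drupal page mentions.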

Update (2007-11-14):

  • Liferay has a detailed explanation of their LDAP integration. They've got a configuration setting "ldap.import.method" which is set to "user" or "group", depending on which side group membership is read from.

Thu, 25 Oct 2007 09:16:49 +0000

Why Enterprise Software Sucks

Jason Fried - Why Enterprise Software Sucks:

"The people who buy enterprise software aren’t the people who use enterprise software. That’s where the disconnect begins. And it pulls and pulls and pulls until the user experience is split from the buying experience so severely that the software vendors are building for the buyers, not the users. The experience takes a back seat to the feature list, future promises, and buzz words."

Wed, 24 Oct 2007 20:51:27 +0000

Operations is a competitive advantage...

Jesse Robbins at O'Reilly Radar - Operations is a competitive advantage... (Secret Sauce for Startups!):

"In my experience it takes about 80 hours to bootstrap a startup. This generally means installing and configuring an automated infrastructure management system (puppet), version control system (subversion), continuous build and test (frequently cruisecontrol.rb), software deployment (capistrano), monitoring (currently evaluating Hyperic, Zenoss, and Groundwork). Once this is done the "install time" is reduced to nearly zero and requires no specialized knowledge."

Tue, 23 Oct 2007 20:58:53 +0000

Radar Networks Unveils twine.com

Tim O'Reilly at O'Reilly Radar - Web2Summit: Radar Networks Unveils twine.com:

"Nova Spivack of Radar Networks plans to unveil the first application built on their semantic web platform, twine, a new kind of personal and group information manager. I've only seen a demo, and haven't had a chance to play with it hands-on or load in my own documents, but if it delivers what Nova promises, it could be revolutionary.

Underlying twine is Radar's semantic engine, trained to do what is called entity extraction from documents. Put in plain language, the semantic engine auto-tags each document, turning each entity into what looks like a web link as well as a tag in the sidebar. Type a note in twine, and it picks out all of the people, places, companies, books, and other types of information contained in the note, separating them out by type."
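Radar's engine is trained; the simplest stand-in for the idea is a lookup against a fixed list of known names. A minimal Python sketch, assuming a hand-made gazetteer (names, types and function name are hypothetical), of picking entities out of a note and separating them by type:

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical gazetteer: surface form -> entity type.
GAZETTEER = {
    "Nova Spivack": "person",
    "Radar Networks": "company",
    "San Francisco": "place",
    "twine": "product",
}


def extract_entities(text: str, gazetteer: Dict[str, str]) -> Dict[str, List[str]]:
    """Return the known entities found in text, grouped by type."""
    found = defaultdict(set)
    # Longest names first, so a longer name wins over a shorter one
    # starting at the same position.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    for match in pattern.finditer(text):
        name = match.group(1)
        found[gazetteer[name]].add(name)
    return {kind: sorted(ents) for kind, ents in found.items()}


note = "Nova Spivack of Radar Networks showed twine in San Francisco."
print(extract_entities(note, GAZETTEER))
```

A lookup like this cannot tell twine the product from twine the string; that kind of disambiguation is part of what a trained engine has to handle.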

Fri, 19 Oct 2007 07:47:30 +0000

What about research, interviews, and documentation?

Ryan Singer - Ask 37signals: What about research, interviews, and documentation?:

"It’s like a conversation. You don’t sit down at the cafe, listen to your friend for two hours straight, and then talk for two hours straight. You take turns, constantly going back and forth, and the discussion finds its way.

Of course, you might wonder how to start. We build products we need ourselves, so our initial research is made of our own wishes, itches, and frustrations. When it comes to client work, my best advice is to become friends. Spend time together and discuss what they do until you can see through their eyes a bit."

Fri, 19 Oct 2007 07:40:13 +0000


"The Squish for Web edition enables testing HTML-based Web and Web 2.0 (Ajax) applications in different web browsers running on different platforms.

Squish for Web is, unlike many available web testing tools, not restricted to a single web browser or platform. Squish for Web supports running and recording tests for web applications in Microsoft Internet Explorer, Mozilla, Firefox, Apple's Safari and KDE's Konqueror on Windows, Linux, Unix and Mac OS X."

Tue, 16 Oct 2007 11:38:22 +0000

LAMP and the Spread Toolkit

Jason R. Briggs at ONLamp.com - LAMP and the Spread Toolkit:

"I don't believe it makes a lot of sense to receive messages in a PHP app (which is not to say in certain circumstances it might not be necessary, just that I'd prefer otherwise). So, from a design perspective in a multi-language environment, while I might send messages from a PHP application, I would potentially look at using Python daemons to handle writing the responses to those messages into a database; perhaps using Ajax polling for live notification to clients, or--the lightest-weight approach--including notifications in a standard page response.

[...] That said, the PHP extension for Spread is, admittedly, not production-ready. There are some stability issues; in particular, if the Spread daemon restarts after the PHP extension has made a connection, a reconnection can cause a persistent crash (not Apache httpd, just the extension itself). Therefore, I would invest in some serious development time with a C-and-PHP guru before relying on the extension for a mission-critical system."
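The split Briggs prefers (PHP sends, a Python daemon writes responses into the database, clients pick them up by polling or on the next page load) can be sketched with the standard library alone. This assumes a `queue.Queue` as a stand-in for the Spread group and SQLite in place of the real database; no Spread API is used, and the table, message format and function names are hypothetical.

```python
import json
import queue
import sqlite3
import threading


def response_writer(db, inbox):
    """Daemon loop: write one response row per incoming message."""
    while True:
        raw = inbox.get()
        if raw is None:  # sentinel: shut the daemon down
            break
        msg = json.loads(raw)
        db.execute(
            "INSERT INTO responses (request_id, body) VALUES (?, ?)",
            (msg["id"], "processed: " + msg["payload"]),
        )
        db.commit()


def pending_notifications(db, since_id):
    """What an Ajax poll (or a normal page render) would read back."""
    rows = db.execute(
        "SELECT request_id, body FROM responses WHERE request_id > ? ORDER BY request_id",
        (since_id,),
    )
    return rows.fetchall()


# The connection is handed to the worker thread, so allow cross-thread use;
# the main thread only reads after the worker has finished.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE responses (request_id INTEGER PRIMARY KEY, body TEXT)")

inbox = queue.Queue()
worker = threading.Thread(target=response_writer, args=(db, inbox))
worker.start()

# The PHP side would send these; here we just enqueue JSON strings.
for i, payload in enumerate(["resize image", "send mail"], start=1):
    inbox.put(json.dumps({"id": i, "payload": payload}))
inbox.put(None)
worker.join()

print(pending_notifications(db, since_id=0))
```

Keeping the receiving side in a separate long-running process means the PHP request cycle only ever sends, which matches the caution about the extension's reconnect behavior.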

Sun, 14 Oct 2007 19:50:19 +0000

Yahoo! Susceptible to Cross Site Request Forgery (XSRF) Attacks

Nitesh Dhanjani - Yahoo! Susceptible to Cross Site Request Forgery (XSRF) Attacks:

"It is possible for malicious sites to add or delete arbitrary Yahoo! calendar entries. The following HTML on a malicious site will add a Task and Event to the victim’s Yahoo! calendar."
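The attack works because the browser sends the victim's cookies along with any request the malicious page triggers, so the server cannot tell a forged request from a real one. A common general mitigation (not necessarily what Yahoo! did) is a secret token that the application embeds in its own forms and checks on every state-changing request. A minimal Python sketch, assuming the session ID comes from the session cookie; `SERVER_KEY` and the function names are hypothetical:

```python
import hashlib
import hmac
import secrets

# Hypothetical per-deployment secret; never sent to the browser.
SERVER_KEY = secrets.token_bytes(32)


def csrf_token(session_id: str) -> str:
    """Token the app embeds in its own forms, derived from the session."""
    return hmac.new(SERVER_KEY, session_id.encode(), hashlib.sha256).hexdigest()


def is_request_allowed(session_id: str, submitted_token: str) -> bool:
    """A forged request carries the victim's cookie (valid session_id)
    but the attacking page cannot read the token, so it fails here."""
    expected = csrf_token(session_id).encode()
    # Constant-time comparison; encoding avoids errors on non-ASCII input.
    return hmac.compare_digest(expected, submitted_token.encode())


session = "abc123"
print(is_request_allowed(session, csrf_token(session)))  # genuine form post
print(is_request_allowed(session, ""))                   # forged post, no token
```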

Thu, 11 Oct 2007 13:40:11 +0000

Is it really the number of features that matter?

Jason Fried - Ask 37signals: Is it really the number of features that matter?:

"It’s not so much about consciously saying “we have three too many features here” it’s about saying “let’s solve most of this problem with less code and simpler design.” If we need to solve more of the problem later we can, but let’s solve most of it now—and quickly. And most of the time the partial solution is the plenty solution.

So remember: Good software is about balancing value and screen real estate and understanding and outcome."

Wed, 10 Oct 2007 14:22:23 +0000


"rBuilder is the first and only development tool that simplifies and automates the creation of software appliances. rBuilder combines powerful features with innovative packaging techniques to yield a repeatable appliance creation process."

Wed, 10 Oct 2007 12:09:44 +0000

Sphinx - Free open-source SQL full-text search engine

"Sphinx is a full-text search engine, distributed under GPL version 2.

[...] Generally, it's a standalone search engine, meant to provide fast, size-efficient and relevant fulltext search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. Currently built-in data sources support fetching data either via direct connection to MySQL or PostgreSQL, or using XML pipe mechanism (a pipe to indexer in special XML-based format which Sphinx recognizes)."

The largest installation has indexed over 1 billion records.
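As a rough illustration of what a full-text engine does with rows fetched from SQL (not Sphinx's actual implementation), here is a toy inverted index in Python. It assumes SQLite as a stand-in for MySQL/PostgreSQL and that search returns document IDs which the application then fetches from the database; the table and function names are hypothetical, and there is no relevance ranking.

```python
import re
import sqlite3
from collections import defaultdict


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_index(rows):
    """Inverted index: word -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in rows:
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Ids of documents containing every query word (boolean AND)."""
    words = tokenize(query)
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
db.executemany("INSERT INTO posts VALUES (?, ?)", [
    (1, "APC or Memcached"),
    (2, "Sphinx full-text search engine"),
    (3, "Full-text search with MySQL"),
])

index = build_index(db.execute("SELECT id, body FROM posts"))
ids = search(index, "full-text search")

# The engine returns ids; the rows themselves still come from SQL.
placeholders = ", ".join("?" for _ in ids)
bodies = [row[0] for row in db.execute(
    f"SELECT body FROM posts WHERE id IN ({placeholders}) ORDER BY id", ids)]
print(ids, bodies)
```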

Mon, 08 Oct 2007 14:37:14 +0000

APC or Memcached

Peter Zaitsev - APC or Memcached:

"APC Cache (Eaccelerator and other similar caches) is Fast but it is not distributed so you’re wasting cache and reducing possible hit rate by caching things locally if you have many web servers. MemcacheD is relatively slow but distributed and so you do not waste memory by caching same item in a few places, it is also faster to warmup as you need only one access to bring item into the cache, not access for each of web servers."
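The hit-rate and warm-up argument can be made concrete with a small simulation. A sketch, assuming unbounded caches and requests spread round-robin over the web servers; it counts only warm-up misses and does not model the per-access speed difference also mentioned. The function name and request stream are hypothetical.

```python
from itertools import cycle


def count_misses(requests, servers, shared):
    """Count cache misses for requests spread round-robin over servers.

    shared=True models one distributed cache (memcached-style);
    shared=False gives every web server its own local cache (APC-style).
    """
    caches = [set()] if shared else [set() for _ in range(servers)]
    misses = 0
    for key, server in zip(requests, cycle(range(servers))):
        cache = caches[0] if shared else caches[server]
        if key not in cache:
            misses += 1  # would hit the database, then populate the cache
            cache.add(key)
    return misses


requests = ["user:1", "user:2", "user:3"] * 4  # 12 requests, 3 distinct keys
print(count_misses(requests, servers=4, shared=False))  # local caches
print(count_misses(requests, servers=4, shared=True))   # one shared cache
```

Here every server ends up seeing every key, so the local caches miss once per key per server (3 × 4) and also store four copies of each item, while the shared cache misses once per key.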

Mon, 08 Oct 2007 13:10:29 +0000

War Criminal

Andrew Sullivan - War Criminal:

"The decision to allow one man - the decider - to pre-empt and knowingly distort the rule of law in order to detain and torture anyone he wants - is a function not of conservatism, but of fascism.

[...] There is no doubt - no doubt at all - that these tactics are torture and subject to prosecution as war crimes.

[...] We have war criminals in the White House. What are we going to do about it?"

Sun, 07 Oct 2007 19:27:03 +0000

Building and Blogging again

Adam Bosworth - Building and Blogging again:

"Some extremely clear-headed and smart people can work out everything abstractly in their heads and then just go and implement it. I’m not one of them. Watching me write code is like watching an indecisive sculptor work with clay. I shape it. I look. I wince. I reshape it. I play with it. I wince some more. I ask my friends, nurse my wounds, and then reshape it yet again. And so on. Constant iterative development."

Sat, 06 Oct 2007 20:31:53 +0000

Tagging and foldering

Jon Udell - Tagging and foldering:

"On the desktop as well as on the web, we’re in the midst of a long transition from container-based to query-based storage and retrieval. And really, transition is the wrong word, because the two approaches will coexist into the indefinite future.

Given that coexistence, how can we help people understand the relationship between these two approaches?"
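One way to picture the relationship: a folder is a single-valued attribute (each item lives in exactly one container), tags are a multi-valued one, and both can be served as queries over the same items. A hypothetical Python model with made-up item names:

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Item:
    name: str
    folder: str                                   # container: exactly one place
    tags: Set[str] = field(default_factory=set)   # facets: any number


def in_folder(folder):
    """A folder listing expressed as a query."""
    return lambda item: item.folder == folder


def tagged(*tags):
    """Items carrying all of the given tags."""
    return lambda item: set(tags) <= item.tags


def query(items: List[Item], predicate) -> List[str]:
    return [item.name for item in items if predicate(item)]


items = [
    Item("budget.xls", "Work", {"finance", "2007"}),
    Item("trip.jpg", "Photos", {"travel", "2007"}),
    Item("notes.txt", "Work", {"travel"}),
]

print(query(items, in_folder("Work")))
print(query(items, tagged("travel", "2007")))
```

Seen this way, the folder view stays available for people who think in containers, while tag queries cut across folders.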

Fri, 05 Oct 2007 11:05:24 +0000

Key + Data

Sam Ruby - Key + Data:

"What do dynamo, memcached, Berkeley DB, and couchdb have in common with each other, and in many ways with other structures like my hard drive or your mail or the www? Namely that everything is accessed by a primary key, and that metadata is either attached to, or embedded within, that data."
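The common shape described here, a primary key leading to data with metadata either attached or embedded, fits in a few lines. A minimal in-memory Python sketch; the class names, keys and metadata fields are hypothetical and not modeled on any of the systems listed.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Record:
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class KeyValueStore:
    """Every read and write goes through a primary key."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def put(self, key: str, data: bytes, **metadata: str) -> None:
        # Metadata is stored alongside the data, not inside it.
        self._records[key] = Record(data, dict(metadata))

    def get(self, key: str) -> Optional[Record]:
        return self._records.get(key)


store = KeyValueStore()
# Attached metadata: kept next to an opaque blob.
store.put("photos/beach.jpg", b"\xff\xd8...", content_type="image/jpeg")
# Embedded metadata: the value is a document that carries its own fields.
store.put("posts/key-plus-data", json.dumps({"title": "Key + Data", "body": "..."}).encode())

print(store.get("photos/beach.jpg").metadata["content_type"])
print(json.loads(store.get("posts/key-plus-data").data)["title"])
```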

Thu, 04 Oct 2007 10:01:18 +0000