Category: Tim’s Weblog
-
Congrats to Tika and Welcome to the Lucene Stack!
Grant Ingersoll – Congrats to Tika and Welcome to the Lucene Stack!: „Tika is a content extraction framework that wraps many other content extraction libraries such as PDFBox, POI, and others into a single, easy to use framework that makes it easy to add extracted content to Lucene, Solr and any other text application.“
-
The depths of OS X: SIPS
Jon Simpson – The depths of OS X: SIPS: „That solution is sips. The “scriptable image processing system” by self-description and a tool built right into Mac OS X, using all of the format support and output support available to the OS. sips -s format jpeg test.png --out test.jpg“ Here’s the Apple man page for…
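For scripting the conversion quoted above, a minimal Python sketch might look like the following. The helper names `build_sips_command` and `convert_image` are hypothetical and not part of sips; the only sips options used are the ones that appear in the quoted command (`-s format` and `--out`). The sketch assumes it may also be run on machines without sips, so it checks for the tool first.

```python
import shutil
import subprocess
from typing import List


def build_sips_command(source: str, target: str, image_format: str = "jpeg") -> List[str]:
    """Build the argument list for the sips call quoted in the post.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    return ["sips", "-s", "format", image_format, source, "--out", target]


def convert_image(source: str, target: str, image_format: str = "jpeg") -> bool:
    """Run the conversion if sips is on the PATH (Mac OS X only).

    Returns False when sips is not available and raises
    subprocess.CalledProcessError if sips exits with an error.
    """
    if shutil.which("sips") is None:
        return False
    subprocess.run(build_sips_command(source, target, image_format), check=True)
    return True
```

Passing the arguments as a list rather than as one shell string avoids quoting problems with file names that contain spaces.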
-
Business Software Needs a Revolution
Jim Kerstetter in a BusinessWeek commentary – Business Software Needs a Revolution: „Last year, the National Institute of Standards & Technology estimated that the annual cost of difficult-to-use or flat-out buggy software on the U.S. economy was $59.5 billion. Analysts estimate business-software customers spend $5 installing and fixing their software for every $1 they spend…
-
PDFBox
„PDFBox is an open source Java PDF library for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents. PDFBox also includes several command line utilities.“ It’s being used by Alfresco.
-
Where have all the filters gone?
Mark Bennett at New Idea Engineering – Where have all the filters gone?: „Through various mergers and acquisitions, the three main vendors for commercial document filters are now owned by companies who are already selling their own search products. […] The 3 top commercial filters being used now are: Stellent [Outside In Technology] – Now…
-
Sun Presenter Console extension is useful but undocumented
Bruce Byfield at Linux.com – Sun Presenter Console extension is useful but undocumented: „After installation, you will not see any sign of SPC except in the Extension Manager. To use it, open your slide show in OOo Impress and go to Slide Show -> Slide Show Settings in the menu. Under Presentation monitor, select the…
-
XRX and Context Delivery Architecture
Dan McCreary – XRX and Context Delivery Architecture: „When we create a form dynamically through auto-generation we usually know the id of person who is filling out the form and we can use an XQuery look-up service to see what roles that person has, and what departments, projects and groups they are in. We can…
-
Ad blocking with ad server hostnames and IP addresses
Peter Lowe provides a great list of ad servers for blocking ads: „So, to start blocking ads: find your hosts file; download the list of ad servers; copy the list of ad servers on the end of your hosts file (see Where’s my hosts file? if you don’t know where it is); restart your browser.“
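The "copy the list on the end of your hosts file" step can be sketched in Python as a pure text transformation. This is only an illustration under assumptions: the function name `append_ad_servers` is hypothetical, the ad servers are assumed to arrive as plain hostnames, and each one is pointed at 127.0.0.1. Reading and writing the real hosts file (which usually needs administrator rights) is left out on purpose.

```python
from typing import Iterable


def append_ad_servers(hosts_text: str, ad_servers: Iterable[str],
                      sink_ip: str = "127.0.0.1") -> str:
    """Return hosts_text with one 'sink_ip hostname' line per new ad server.

    Hypothetical helper: hostnames already present in the hosts file
    (or repeated in ad_servers) are skipped, so it is safe to run twice.
    """
    # Collect every hostname already mapped, ignoring comments.
    known = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        known.update(fields[1:])  # fields[0] is the IP address

    new_lines = []
    for host in ad_servers:
        host = host.strip()
        if host and host not in known:
            new_lines.append(f"{sink_ip} {host}")
            known.add(host)

    if not new_lines:
        return hosts_text

    # Make sure the existing content ends with a newline before appending.
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + "\n".join(new_lines) + "\n"
```

Keeping the function free of file access makes it easy to check the result before overwriting anything; the caller decides where the hosts file lives and how to write it back.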
-
What’s new with Apache Solr
Grant Ingersoll at IBM developerWorks – What’s new with Apache Solr: „With the 1.3 release, Solr adds in distributed search capabilities. The application splits up the documents across several machines, commonly referred to as shards by Solr (and others). Each shard contains its own self-contained index, and Solr can coordinate the querying of the indexes…
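As a rough illustration of querying several shards through one coordinating Solr instance, the sketch below builds a request URL that uses Solr's `shards` request parameter. The function name `distributed_query_url` and the host names are hypothetical, and the `/select` handler path is assumed to be the default one; the article itself is cut off before any such details.

```python
from typing import Sequence
from urllib.parse import urlencode


def distributed_query_url(coordinator: str, shards: Sequence[str], query: str) -> str:
    """Build a Solr query URL that fans out to the given shards.

    coordinator: base URL of the Solr instance that receives the request
                 (hypothetical example: "http://localhost:8983/solr").
    shards:      shard addresses in host:port/path form, without a scheme.
    """
    params = {
        "q": query,
        # Solr expects the shard list as one comma-separated value.
        "shards": ",".join(shards),
    }
    return f"{coordinator}/select?{urlencode(params)}"


url = distributed_query_url(
    "http://localhost:8983/solr",
    ["localhost:8983/solr", "localhost:7574/solr"],
    "tika",
)
```

The instance that receives the request sends the query to every listed shard and merges the results, so the client only ever talks to one URL.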