Grant Ingersoll – Congrats to Tika and Welcome to the Lucene Stack!:
“Tika is a content extraction framework that wraps many other content extraction libraries such as PDFBox, POI, and others into a single, easy to use framework that makes it easy to add extracted content to Lucene, Solr and any other text application.”