- Aperture
- Aperture这个Java框架能够从各种各样的资料系统(如:文件系统、Web站点、IMAP和Outlook邮箱)或存在这些系统中的文件(如:文档、图片)爬取和搜索其中的全文本内容与元数据。它当前支持的文件格式如下:
- Plain text
- HTML, XHTML
- XML
- PDF (Portable Document Format)
- RTF (Rich Text Format)
- Microsoft Office: Word, Excel, Powerpoint, Visio, Publisher
- Microsoft Works
- OpenOffice 1.x: Writer, Calc, Impress, Draw
- StarOffice 6.x - 7.x+: Writer, Calc, Impress, Draw
- OpenDocument (OpenOffice 2.x, StarOffice 8.x)
- Corel WordPerfect, Quattro, Presentations
- Emails (.eml files)
该项目主页:http://aperture.sourceforge.net/
-