Tech stuff: Devoxx 2009: Full Text Search for Hibernate

17/11/2009, University sessions, Emmanuel Bernard

Search solutions:

Plain SQL search limits:

Full-text search solutions:

Hibernate Search, general features:

LGPL
uses Hibernate core
uses Lucene under the hood
solves object vs text mismatch
convert object to text document (+reverse) → Hibernate application uses objects, not text documents
convention over configuration
heavily built on annotations
Optimize Lucene access:
- update Lucene docs on commit
- object graphs are consolidated to single Lucene docs to provide relevant searches
- avoid flooding Lucene indexer:
  - batch Lucene updates on commit
  - optionally trigger the Lucene indexer asynchronously
- support clustering (JMS)
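The "batch Lucene updates on commit" point can be pictured with a small sketch. This is not Hibernate Search code: the class `BatchingIndexQueue` and all of its methods are hypothetical names, and collapsing repeated changes to the same entity is an assumption made for the illustration. It only shows the idea that index work is queued during a transaction and handed to the indexer in one batch at commit time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from Hibernate Search.
class BatchingIndexQueue {
    // Index work queued while the transaction is still open.
    private final List<String> pending = new ArrayList<>();
    // Batches that have been handed over to the (imaginary) indexer.
    private final List<List<String>> flushed = new ArrayList<>();

    // Called whenever an indexed entity changes inside the transaction.
    void entityChanged(String documentId) {
        // Assumption: several changes to one entity collapse into one update.
        if (!pending.contains(documentId)) {
            pending.add(documentId);
        }
    }

    // Called once on commit: everything queued goes out as a single batch.
    void commit() {
        if (!pending.isEmpty()) {
            flushed.add(new ArrayList<>(pending));
            pending.clear();
        }
    }

    // Called on rollback: queued work is dropped and never reaches the index.
    void rollback() {
        pending.clear();
    }

    List<List<String>> flushedBatches() {
        return flushed;
    }

    public static void main(String[] args) {
        BatchingIndexQueue queue = new BatchingIndexQueue();
        queue.entityChanged("Book#1");
        queue.entityChanged("Book#1");
        queue.entityChanged("Author#7");
        queue.commit();
        // One batch holding the two distinct document ids.
        System.out.println(queue.flushedBatches());
    }
}
```

Nothing reaches the indexer before `commit()`, which is what keeps it from being flooded with one update per entity change.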

Hibernate Search Annotations:

@Indexed
@Field: how the value is converted to text is tunable with, among others, @FieldBridge. E.g. convert a number to a 0-padded number.
@IndexedEmbedded
@Boost: promote a particular field in the relevance score (can be at indexing time or at query time)
@Analyzer: e.g. anagram-support
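Why a number would be 0-padded: text in the index is compared as text, so "9" sorts after "10" unless both are padded to the same width. The sketch below shows such a two-way conversion in plain Java. The class `PaddedIntBridge` and its method names are illustrative only and are not the Hibernate Search @FieldBridge API; the width of 10 digits and the restriction to non-negative values are assumptions.

```java
import java.util.Locale;

// Hypothetical two-way bridge; names are illustrative,
// not the Hibernate Search FieldBridge API.
class PaddedIntBridge {
    // Assumed width: enough digits for any non-negative int.
    private static final int WIDTH = 10;

    // Object -> text: pad with leading zeros so text order matches numeric order.
    static String objectToString(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch only handles non-negative values");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }

    // Text -> object: the reverse conversion, used when reading a document back.
    static int stringToObject(String text) {
        return Integer.parseInt(text);
    }

    public static void main(String[] args) {
        // Unpadded, "9" sorts after "10" as text; padded, the order is numeric again.
        System.out.println("9".compareTo("10") > 0);
        System.out.println(objectToString(9).compareTo(objectToString(10)) < 0);
        System.out.println(stringToObject(objectToString(42)));
    }
}
```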

Lucene Index as used by Hibernate Search:

Query:

Advanced stuff:

tokenizer: split text into words, remove common words
complex searches: combination of indexing and querying
fuzzy search:
- “Levenshtein distance”: quantifies similarity
- “n-gram”: word is split into groups of 3 letters → the number of matching groups determines the score. (demo looked a bit hacky)
phonetic search (soundex-like): disappointing in practice
synonyms: use your application-specific list
stemming: → 'reduction'
- Porter Algorithm
- Snowball stemmer
filters: provide efficient and pluggable support for
- security, categories, temporal data, caching...
“explain” query result
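
The two fuzzy-search measures in the notes above can be sketched in plain Java. This is not how Lucene or Hibernate Search implements them: the class `FuzzyMatch` is a hypothetical name, and the n-gram score used here (shared 3-letter groups divided by all distinct groups of both words) is an assumed, simplified scoring rule. The Levenshtein part is the standard dynamic-programming edit distance.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, for illustration only.
class FuzzyMatch {
    // Edit distance: insertions, deletions and substitutions each cost 1.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // All distinct groups of 3 consecutive letters in a word.
    static Set<String> trigrams(String word) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= word.length(); i++) {
            grams.add(word.substring(i, i + 3));
        }
        return grams;
    }

    // Assumed scoring rule: shared trigrams / distinct trigrams of both words.
    static double trigramScore(String a, String b) {
        Set<String> union = new HashSet<>(trigrams(a));
        union.addAll(trigrams(b));
        if (union.isEmpty()) {
            return 0.0; // words shorter than 3 letters have no trigrams
        }
        Set<String> shared = new HashSet<>(trigrams(a));
        shared.retainAll(trigrams(b));
        return (double) shared.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));
        System.out.println(trigramScore("hibernate", "hibrenate"));
    }
}
```

A small edit distance or a high trigram score both mean "probably the same word", so either can be used to tolerate typos in a query.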

green tea said... (November 26, 2009 at 2:37 PM): Hibernate Search integrates transparently with Hibernate, the object/relational (O/R) mapping and persistence engine, with little to no configuration (past specifying what entities to index). With advanced features such as query filter and index sharding, Hibernate Search can be embedded into user applications.

2009/11/17