I've finished reading another book, this time not about web development but about search.
Title: Sphinx Search Beginner's Guide
Author: Abbas Ali
Sphinx is an open source search engine that, instead of searching the data in real time, builds indexes of the data and then runs very fast search operations against those indexes.
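To make the "index first, search later" idea concrete, here is a minimal Python sketch of an inverted index. It is only an illustration of the general technique and not Sphinx's actual implementation; the names `build_index` and `search` and the sample documents are made up for this example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data, indexed once up front.
docs = {
    1: "Sphinx is a search engine",
    2: "MySQL is a database",
    3: "Sphinx can index MySQL data",
}
index = build_index(docs)

# Queries only touch the index, never the original texts.
print(search(index, "sphinx mysql"))  # {3}
```

The indexing step pays the cost of reading all the data once, which is why new data is not searchable until it has been indexed.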
This book explains all the basics (including setup), then goes into detail about searching: modes, modifiers, attributes, filters (both basic and advanced/low-level), grouping, indexing and delta indexing.
It also covers how to modify the configuration, from sources to Sphinx API specific parameters, and how to change or extend the charset tables.
I have two complaints about this book:
- Of its 244 pages, around 100 are taken up by two PHP examples. Having a "full PHP website example" is OK, but two of them look more like "page filling" than a real interest in explaining concepts (both could simply be combined).
- More importantly, the author seems to deliberately avoid complex topics like partial word matches. For example, how does Sphinx match "Ser", "Serg" or "Sergi" if we have indexed "Sergio"? Do they all get the same weight in the results? What happens with each different SPH_MATCH_xxx matching mode?
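To illustrate what the partial-match question is about, here is a toy Python sketch of prefix indexing, where every prefix of a word (above a minimum length) is stored in the index. Sphinx has a `min_prefix_len` index setting for prefix matching, but this snippet does not reproduce its behavior or its ranking; `build_prefix_index`, `MIN_PREFIX_LEN` and the sample documents are hypothetical.

```python
from collections import defaultdict

MIN_PREFIX_LEN = 3  # hypothetical threshold chosen for this sketch


def build_prefix_index(docs, min_len=MIN_PREFIX_LEN):
    """Index each word under all of its prefixes of at least min_len chars."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            # Words shorter than min_len are indexed whole.
            start = min(min_len, len(word))
            for end in range(start, len(word) + 1):
                index[word[:end]].add(doc_id)
    return index


# Hypothetical sample data.
docs = {
    1: "Sergio wrote this review",
    2: "A serious search engine",
}
index = build_prefix_index(docs)

for query in ("ser", "serg", "sergi", "sergio"):
    print(query, sorted(index.get(query, set())))
```

In this toy version every prefix is a plain hit with no notion of weight, which is exactly the kind of detail (does "Ser" rank lower than "Sergio"?) I would have liked the book to explain for Sphinx.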
To compensate a bit for those complaints, the book does explain a few interesting, non-basic concepts:
- How to set up distributed indexes (distributed among multiple Sphinx servers).
- How morphology works, how to use it for stemming (reducing a word to its stem) and how the morphology processor works in Sphinx. Morphology is not enabled by default, so covering it is a good addition. The book even mentions "wordforms", mappings of words used to handle synonyms.
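As a rough picture of what stemming and wordforms do to a word before it is indexed or searched, here is a small Python sketch. It uses a naive suffix-stripping rule, not a real stemmer and not Sphinx's morphology processor; the `WORDFORMS` table, the suffix list and the order of the two steps are assumptions made for the example.

```python
# Hypothetical synonym table: every key is mapped to one canonical word.
WORDFORMS = {
    "automobile": "car",
    "autos": "car",
}

# Naive suffix list for illustration only; real stemmers are far more careful.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def normalize(word):
    """Lowercase, apply the wordforms mapping if any, otherwise stem."""
    word = word.lower()
    if word in WORDFORMS:
        return WORDFORMS[word]
    return stem(word)


for word in ("Searching", "searched", "indexes", "Automobile", "cars"):
    print(word, "->", normalize(word))
```

Because both the indexed text and the query go through the same normalization, a search for "searching" can match a document that only contains "searched".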
The book ends with a brief introduction to SphinxQL, which lets you query Sphinx from a MySQL client if you feel more comfortable writing SQL queries.
So, overall, you get a really good view of how this search engine works, including some advanced topics. The book would just be better with more depth on some of the subjects it covers and fewer (trivial) examples.