Advanced Sphinx Search in a PHP Application (Part 2)
In part one of this series I discussed the basics of getting a Sphinx search up and running in a custom PHP application. However, Sphinx has so many features and options that the previous article really only scratched the surface of what can be done, so in this post we'll get into some of the more advanced tricks and techniques that can be employed to customize your search results. Specifically, we'll cover how to handle pagination and custom weighting.
With the project that I described in part 1, the client had a custom identification field that was frequently used by the employees to reference certain content. Building on the previous example, let's say that the document name field was one that we wanted to weight more heavily than other fields. Marking that field as a weighted field begins in the creation of the view. For reference, here's the view from Part 1:
CREATE OR REPLACE VIEW search AS SELECT UUID_SHORT() AS sphinxid, users.uid AS itemid, CONCAT_WS(' ', users.fname, users.lname, users.email) AS data, 1 AS datatype FROM users *UNION* /* Remove asterisks, there's a bug in how Drupal stores this data */ SELECT UUID_SHORT() AS sphinxid, documents.did AS itemid, CONCAT_WS(' ', documents.name, documents.filename) AS data, 2 AS datatype FROM documents
Since we're going to weight the document name field we need to return that field as a specific alias, in this case we'll simply call it data_weighted. But as we're dealing with a UNION we also need to return a data_weighted field from the users table too -- but we'll just return a NULL value since there is no document name involved there.
CREATE OR REPLACE VIEW search AS SELECT UUID_SHORT() AS sphinxid, users.uid AS itemid, CONCAT_WS(' ', users.fname, users.lname, users.email) AS data, 1 AS datatype, NULL AS data_weighted FROM users *UNION* /* Remove asterisks, there's a bug in how Drupal stores this data */ SELECT UUID_SHORT() AS sphinxid, documents.did AS itemid, documents.filename AS data, 2 AS datatype, documents.name AS data_weighted FROM documents
So again, we just took documents.name out of the data concatenation and aliased it specifically as data_weighted. As always, make sure to re-run the indexer once you've make any changes to the MySQL view:
indexer --config /etc/sphinx/sphinx.conf --all --rotate
Back on the PHP side of things, we only need a single line to effect the change in the search results:
include '/PATH_TO_INCLUDE/sphinxapi.inc'; // Build search query $cl = new SphinxClient(); $cl->SetServer( "localhost", 3312 ); $cl->SetMatchMode( SPH_MATCH_EXTENDED ); $cl->SetRankingMode ( SPH_RANK_SPH04 ); // Set weighting, weighted field x 50, item name x 10 $cl->SetFieldWeights( array('data_weighted' => 50, 'data' => 10)); // Execute the query $q = '"' . $cl->EscapeString($_GET['q']) . '"/1'; $searchresults = $cl->Query($q, 'search' );
The data_weighted field has its value multiplied by 50 and the data field (most often just an item name, in this case) multiplied by 10, with everything else utilizing a 1x multiplier.
Pagination and Sorting
Now that you're returning properly weighted results, it's time to make those results easier to manage by adding in paging and sorting. While a full pagination system is outside of the scope of this article, Sphinx provides some useful paging methods in the PHP API and here's how you interact with them. First of all, you need some way to determine which page you want to display, and that's typically done through a query string (in the $_GET collection.) For example:
(isset($_GET['page']) && int_check($_GET['page']) && $_GET['page'] > 0) ? $page = $_GET['page'] - 1 : $page = 0; $cl->SetLimits($page, 20); // Execute the query $q = '"' . $cl->EscapeString($_GET['q']) . '"/1'; $searchresults = $cl->Query($q, 'search' );
The first line looks for a query string parameter called page and sets the $page variable accordingly. Since Sphinx expects 0-based paging I've subtracted 1 from the actual page number. The second line then uses the Sphinx API to display the records that would be found on that page (the first parameter) based on the number of records per page (second parameter, 20, in my case.) Your paging system will likely need to know the total number of results in order to calculate the number of pages that can be displayed, and again, Sphinx makes this easy:
// Execute the query $q = '"' . $cl->EscapeString($_GET['q']) . '"/1'; $searchresults = $cl->Query($q, 'search' ); $totalcount = $searchresults['total'];
At this point you now have the total number of records stored in the $totalcount variable which you can pass along to your pagination system for proper display.