Advanced Sphinx Search in a PHP Application (Part 2)

In part one of this series I discussed the basics of getting a Sphinx search up and running in a custom PHP application. However, Sphinx has so many features and options that the previous article really only scratched the surface of what can be done, so in this post we'll get into some of the more advanced tricks and techniques that can be employed to customize your search results. Specifically, we'll cover how to handle pagination and custom weighting.

Custom Weighting

With the project that I described in part 1, the client had a custom identification field that was frequently used by the employees to reference certain content. Building on the previous example, let's say that the document name field was one that we wanted to weight more heavily than other fields. Marking that field as a weighted field begins in the creation of the view. For reference, here's the view from Part 1:

CREATE OR REPLACE VIEW search AS
  SELECT
    UUID_SHORT() AS sphinxid,
    users.uid AS itemid,
    CONCAT_WS(' ', users.fname, users.lname, users.email) AS data,
    1 AS datatype
  FROM users
  *UNION* /* Remove asterisks, there's a bug in how Drupal stores this data */
  SELECT
    UUID_SHORT() AS sphinxid,
    documents.did AS itemid,
    CONCAT_WS(' ', documents.name, documents.filename) AS data,
    2 AS datatype
  FROM documents

Since we're going to weight the document name field we need to return that field as a specific alias, in this case we'll simply call it data_weighted. But as we're dealing with a UNION we also need to return a data_weighted field from the users table too -- but we'll just return a NULL value since there is no document name involved there.

CREATE OR REPLACE VIEW search AS
  SELECT
    UUID_SHORT() AS sphinxid,
    users.uid AS itemid,
    CONCAT_WS(' ', users.fname, users.lname, users.email) AS data,
    1 AS datatype,
    NULL AS data_weighted
  FROM users
  *UNION* /* Remove asterisks, there's a bug in how Drupal stores this data */
  SELECT
    UUID_SHORT() AS sphinxid,
    documents.did AS itemid,
    documents.filename AS data,
    2 AS datatype,
    documents.name AS data_weighted
  FROM documents

So again, we just took documents.name out of the data concatenation and aliased it specifically as data_weighted. As always, make sure to re-run the indexer once you've make any changes to the MySQL view:

indexer --config /etc/sphinx/sphinx.conf --all --rotate

Back on the PHP side of things, we only need a single line to effect the change in the search results:

include '/PATH_TO_INCLUDE/sphinxapi.inc';
// Build search query
$cl = new SphinxClient();
$cl->SetServer( "localhost", 3312 );
$cl->SetMatchMode( SPH_MATCH_EXTENDED  );
$cl->SetRankingMode ( SPH_RANK_SPH04 );
// Set weighting, weighted field x 50, item name x 10
$cl->SetFieldWeights( array('data_weighted' => 50, 'data' => 10));
// Execute the query
$q = '"' . $cl->EscapeString($_GET['q']) . '"/1';
$searchresults = $cl->Query($q, 'search' );

The data_weighted field has its value multiplied by 50 and the data field (most often just an item name, in this case) multiplied by 10, with everything else utilizing a 1x multiplier.

Pagination and Sorting

Now that you're returning properly weighted results, it's time to make those results easier to manage by adding in paging and sorting. While a full pagination system is outside of the scope of this article, Sphinx provides some useful paging methods in the PHP API and here's how you interact with them. First of all, you need some way to determine which page you want to display, and that's typically done through a query string (in the $_GET collection.) For example:

(isset($_GET['page']) && int_check($_GET['page']) && $_GET['page'] > 0) ? $page = $_GET['page'] - 1 : $page = 0;
$cl->SetLimits($page, 20);
// Execute the query
$q = '"' . $cl->EscapeString($_GET['q']) . '"/1';
$searchresults = $cl->Query($q, 'search' );

The first line looks for a query string parameter called page and sets the $page variable accordingly. Since Sphinx expects 0-based paging I've subtracted 1 from the actual page number. The second line then uses the Sphinx API to display the records that would be found on that page (the first parameter) based on the number of records per page (second parameter, 20, in my case.) Your paging system will likely need to know the total number of results in order to calculate the number of pages that can be displayed, and again, Sphinx makes this easy:

// Execute the query
$q = '"' . $cl->EscapeString($_GET['q']) . '"/1';
$searchresults = $cl->Query($q, 'search' );
$totalcount = $searchresults['total'];

At this point you now have the total number of records stored in the $totalcount variable which you can pass along to your pagination system for proper display.

Comments

loved the post!

Great blog here! Additionally your website lots up very fast! What host are you the usage of? Can I get your associate hyperlink for your host? I wish my website loaded up as fast as yours lol

Thank you -- we currently use AN Hosting/Midphase for our shared accounts and LiquidWeb for VPS accounts for higher-traffic websites.

What's Taking place i am new to this, I stumbled upon this I have found It positively useful and it has helped me out loads. I'm hoping to contribute & help other customers like its aided me. Good job.

What's up to all, as I am in fact keen of reading this web site's post to be updated daily. It consists of good material.

Great content compared to some of the similar posts I've discovered. Carry on the good work.

This really demonstrates that there are still people who care about the things they submit on the web. I actually liked reading the comments.

Definitely is good to finally find decent postings like this. I actually liked it.

Add new comment

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.

Questions?

Do you have a question about something you saw here in our blog? Are you interested in a custom website for your business? If so, please contact us here.