Integrating Sphinx Search into a PHP Application (Part 1)

We recently re-developed an application that was originally built using ASP Classic and MS SQL Server and converted it into PHP/MySQL and along the way, one of our goals was to simplify the search system. The original application had individual searches for specific content types -- for example, if you wanted to search for documents you would have to browse to that area and click the search drop-down. The fields listed there would be things like document title, date posted, document type, etc. But if you were searching for a user you would go to the Users area and search using different fields. While that method worked it became unwieldy to manage dozens of different search queries, not to mention that forcing users to browse to a specific area before searching was a less-than-stellar user experience.

Enter the Sphinx

With the redesign, we decided to simplify things by creating a single search text field (ala Google) that would query all content and display the most relevant results. We decided on Sphinx to handle the task for a few reasons, but particularly because it is an open source system that has thousands of people using it, working on it, and providing support for it. Installing Sphinx will depend largely on your website hosting configuration -- you will need to have command-line access to the server in order to do so, which means that it will likely not be available on shared accounts. Fortunately there are plenty of good articles on how to install Sphinx on a VPS or dedicated environment so I won't bother with that part here.

The tricky part that we ran into was how to combine disparate content types into a unified search system, and that begins with creating a MySQL view which includes a unique ID in order for Sphinx to return the proper results:

CREATE OR REPLACE VIEW search AS
  SELECT UUID_SHORT() AS sphinxid

Note that this unique ID is different from your content ID -- for example, let's say you're want to search for both users and documents. It's possible that you have a user with an ID of 100 as well as a document with an ID of 100. So you need UUID_SHORT() to create a globally unique ID for Sphinx to use when sorting and processing the results So now you just need to add the other field(s) that you want to include in your search query:

CREATE OR REPLACE VIEW search AS
  SELECT
    UUID_SHORT() AS sphinxid,
    users.uid AS itemid,
    CONCAT_WS(' ', users.fname, users.lname, users.email) AS data,
    1 AS datatype
  FROM users

At this point we've added in the uid field aliased to itemid and a concatenation of fname, lname, and email into a single field that we're aliasing as data. Since we're not using data in any display capacity we're not concerned with formatting -- only that the data is available for indexing and searching. The itemid field will be used when we display the results to link to the proper data along with the datatype field, which can be anything at this point. Now we have to UNION the other table to combine its data:

CREATE OR REPLACE VIEW search AS
  SELECT
    UUID_SHORT() AS sphinxid,
    users.uid AS itemid,
    CONCAT_WS(' ', users.fname, users.lname, users.email) AS data,
    1 AS datatype
  FROM users
  *UNION* /* Remove asterisks, there's a bug in how Drupal stores this data */
  SELECT
    UUID_SHORT() AS sphinxid,
    documents.did AS itemid,
    documents.name AS data,
    2 AS datatype
  FROM documents

Create a Sphinx Indexer

With the previous view we've combined the data from the users and documents tables and specifically it's the data field that will contain the information to be indexed and searched. To do so, we now need to edit the sphinx.conf file on the server itself. Using vi or another editor of choice, edit your sphinx.conf file like so:

source my_search
{
        type                    = mysql
 
        sql_host                = localhost
        sql_user                = myuser
        sql_pass                = mypass
        sql_db                  = mydbname
        sql_port                = 3306  # optional, default is 3306

        sql_query               = \
                SELECT itemid, data, datatype \
                FROM search;
 
        sql_attr_uint           = itemid
        sql_attr_uint           = data
}
 
index my_search
{
        source                  = my_search
        path                    = /var/data/sphinx/search
        morphology              = stem_en
 
        min_word_len            = 3
        min_prefix_len          = 0
        enable_star             = 0
}
 
searchd
{
        listen                  = 9312
        listen                  = 9306:mysql41
        log                     = /var/log/sphinx/searchd.log
        query_log               = /var/log/sphinx/query.log
        read_timeout            = 5
        max_children            = 30
        pid_file                = /var/run/sphinx/searchd.pid
        max_matches             = 1000
        seamless_rotate         = 1
        preopen_indexes         = 1
        unlink_old              = 1
        workers                 = threads # for RT to work
        binlog_path             = /var/lib/sphinx/
}

There's a lot happening there but the key lines are in the source area. That's where we've defined the SQL query and the database connections, as well as the attributes that we're going to be using a little later, once we get to the PHP side of things. There are some different options for morphology so if you want more information about that there are tons of resources out there.

Once you've saved your sphinx.conf file then you need to fire the indexer with the following command line statement:

indexer --config /etc/sphinx/sphinx.conf --all --rotate

The PHP Side

For the purposes of this demonstration we're going to be working with two PHP pages: search.php and searchresults.php. Starting with search.php we need to create a simple HTML form like so:

<form name="search" method="get" action="/searchresults.php">
  <input type="text" name="q" id="q" />
  <input type="submit" value="GO" class="form-submit" />
</form>

Of course, the real work is done on the searchresults.php page. And fortunately, executing a search using PHP is actually pretty simple. First make sure that the PHP Sphinx API is included on the results page and then add the following code:

include '/PATH_TO_INCLUDE/sphinxapi.inc';
// Build search query
$cl = new SphinxClient();
$cl->SetServer( "localhost", 3312 );
$cl->SetMatchMode( SPH_MATCH_EXTENDED  );
$cl->SetRankingMode ( SPH_RANK_SPH04 );
// Execute the query
$q = '"' . $cl->EscapeString($_GET['q']) . '"/1';
$searchresults = $cl->Query($q, 'search' );

If all goes well, you'll be left with your results in the $searchresults array. As before, there are different options for matching mode and ranking mode and you may want to experiment with those to get the results you're after -- but this should be a good place to start.

This is a very basic search -- in part 2 we'll look at pagination, filtering, sorting, and custom weighting so be sure and check back once that article is posted. Follow @TenTenStudios on Twitter to be updated when this and other articles are up and running!

Add new comment

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.

Questions?

Do you have a question about something you saw here in our blog? Are you interested in a custom website for your business? If so, please contact us here.