Saturday, March 7, 2009

Drupal's Search Framework: The execution of a search

Posted on April 25, 2008 - 02:07 by Robert Douglass.
Tags: acquia drupal planet Drupal 6 drupal planet search sprint

Drupal's ambitious search module provides a framework for building searches of all kinds. By isolating the tasks involved in searching, and allowing the actual search implementations to be handled by other modules, the search framework sets the stage for all sorts of creative search applications. This article, which applies to Drupal 6, explores the structure of the search framework by following the steps needed to execute a search.

Structure of a search
Here are the basic steps involved in searching:

1. Build a search index.
2. Build a search form.
3. Accept a POST request from the form.
4. Redirect POST to GET with search query values expressed in the URL.
5. Parse search query values.
6. Construct search based on query values.
7. Return formatted results.

Build a search index
The search module's API for indexing HTML content is very simple.

<?php
search_index($sid, $type, $text);
?>

Example 1: search_index is the way you put stuff into the search index.

$sid is the unique id for a piece of content, $type corresponds to the name of the search implementation (see 'name' $op for hook_search), and $text is the HTML that is to be indexed.
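
In practice, search_index() is usually called from a module's hook_update_index() implementation, which the search module invokes on cron runs. The following is a minimal sketch under stated assumptions: the module name mymodule, the helper mymodule_fetch_unindexed_items() and its sample data are hypothetical, and a stand-in search_index() is defined only so that the snippet can run outside of Drupal.

```php
<?php
// Stand-in so the sketch runs outside Drupal. Inside Drupal 6 the real
// search_index() from search.module exists and this block is skipped.
if (!function_exists('search_index')) {
  function search_index($sid, $type, $text) {
    global $sketch_indexed;
    $sketch_indexed[] = array('sid' => $sid, 'type' => $type, 'text' => $text);
  }
}

// Hypothetical helper: returns the items that still need to be indexed.
function mymodule_fetch_unindexed_items() {
  return array(
    array('id' => 1, 'html' => '<h1>First item</h1><p>Some body text.</p>'),
    array('id' => 2, 'html' => '<h1>Second item</h1><p>More body text.</p>'),
  );
}

/**
 * Implementation of hook_update_index() for the hypothetical mymodule.
 */
function mymodule_update_index() {
  foreach (mymodule_fetch_unindexed_items() as $item) {
    // $sid: unique id, $type: name of the search implementation, $text: HTML.
    search_index($item['id'], 'mymodule', $item['html']);
  }
}

mymodule_update_index();
```

A real implementation would also have to keep track of which items have already been indexed; that bookkeeping is left out here.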


Build a search form
The basic search form is simply a text field with a submit button. This form is available by default to every module that implements search using hook_search. The main method for extending this form is through hook_form_alter (see node.module, node_form_alter). It is also possible to build search functionality using other tools that don't rely totally on the search framework. See views_fastsearch for one such hybrid approach.
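
As a rough illustration of the hook_form_alter approach, the sketch below adds one extra text field to the node search form. It assumes the Drupal 6 hook signature and that the search form stores the search type in $form['module']['#value']; the module name mymodule and the field name mymodule_published_after are hypothetical. The last lines only simulate a call so that the snippet runs outside of Drupal.

```php
<?php
/**
 * Implementation of hook_form_alter() for the hypothetical mymodule.
 */
function mymodule_form_alter(&$form, $form_state, $form_id) {
  // Only touch the search form, and only when it is the node search.
  if ($form_id == 'search_form'
      && isset($form['module']['#value'])
      && $form['module']['#value'] == 'node') {
    $form['mymodule_published_after'] = array(
      '#type' => 'textfield',
      // Inside Drupal the title would be wrapped in t().
      '#title' => 'Published after',
      '#size' => 20,
    );
  }
}

// Outside Drupal: a stripped-down stand-in for the search form array.
$form = array('module' => array('#type' => 'value', '#value' => 'node'));
mymodule_form_alter($form, array(), 'search_form');
```

Adding the field is only half of the job: its value still has to be folded into the search keys by a validation handler before it has any effect on the search.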


Accept a POST request from the form
The search_menu and the search_view functions cooperate to make sure that any incoming POST requests for search get redirected to the same path using GET, only with the POST information expressed as part of the GET request.

<?php
// In search.module
function search_menu() {

  // [...snip...]

  foreach (module_implements('search') as $name) {
    $items['search/'. $name .'/%menu_tail'] = array(
      'page callback' => 'search_view',
      'page arguments' => array($name),
      'type' => MENU_LOCAL_TASK,
      'parent' => 'search',
    );
  }
  return $items;
}

// In search.pages.inc

/**
 * Menu callback; presents the search form and/or search results.
 */
function search_view($type = 'node') {
  // Search form submits with POST but redirects to GET. This way we can keep
  // the search query URL clean as a whistle:
  // search/type/keyword+keyword
  if (!isset($_POST['form_id'])) {
    if ($type == '') {
      // Note: search/node can not be a default tab because it would take on the
      // path of its parent (search). It would prevent remembering keywords when
      // switching tabs. This is why we drupal_goto to it from the parent instead.
      drupal_goto('search/node');
    }

    // [...snip...]

    // Do the search and build the form, expressed as $output

    return $output;
  }

  return drupal_get_form('search_form', NULL, empty($keys) ? '' : $keys, $type);
}
?>


Example 2: search_menu and search_view.

In search_menu it can be seen how a path is being built for every search module that implements a search. You can see this in action on any Drupal installation with search.module enabled at the path http://example.com/search. The Content and the Users tabs come from the node and user modules' search implementations, respectively. An interesting and important detail is the path description: $items['search/'. $name .'/%menu_tail']. The %menu_tail bit passes everything it matches as a parameter, verbatim, without splitting it into further segments. This is important if you want to be able to search for the string "foo/bar", for example. Normally that would be split into path segments based on the forward slash, but %menu_tail prevents the splitting.
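
The difference between ordinary segment splitting and keeping the tail intact can be shown in plain PHP. This is only an illustration of the idea, not the menu system's own code; the example path is made up.

```php
<?php
$path = 'search/node/foo/bar';

// Ordinary splitting treats every slash as a segment boundary.
$all_segments = explode('/', $path);
// array('search', 'node', 'foo', 'bar')

// Limiting the split to three parts keeps everything after the second
// slash together, so the search keys survive as one string.
$with_tail = explode('/', $path, 3);
// array('search', 'node', 'foo/bar')
```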



Figure 1: A basic search form with Content and User tabs.


Redirect POST to GET with search query values expressed in the URL
The first lines of search_view promise to redirect to GET, but the mechanism for doing this isn't visible in the code:

<?php
// Search form submits with POST but redirects to GET. This way we can keep
// the search query URL clean as a whistle:
// search/type/keyword+keyword
if (!isset($_POST['form_id'])) {
?>

Example 2 (detail): search_view only runs the search on GET requests.

Clearly the search itself happens only when there is no POST form_id value (i.e., it is a GET request), but how does the redirect happen? The answer lies deep within the handling of the search form:

<?php

// From search.module
/**
 * As the search form collates keys from other modules hooked in via
 * hook_form_alter, the validation takes place in _submit.
 * search_form_validate() is used solely to set the 'processed_keys' form
 * value for the basic search form.
 */
function search_form_validate($form, &$form_state) {
  form_set_value($form['basic']['inline']['processed_keys'], trim($form_state['values']['keys']), $form_state);
}

/**
 * Process a search form submission.
 */
function search_form_submit($form, &$form_state) {
  $keys = $form_state['values']['processed_keys'];
  if ($keys == '') {
    form_set_error('keys', t('Please enter some keywords.'));
    // Fall through to the drupal_goto() call.
  }

  $type = $form_state['values']['module'] ? $form_state['values']['module'] : 'node';
  $form_state['redirect'] = 'search/'. $type .'/'. $keys;
  return;
}
?>


Example 3: Search form validation and submission.

When the search form is submitted, the values first go to search_form_validate(). The sole purpose of the validation is to make sure the processed search keys (the values of the form submission) are passed on to the submit handler. The submit handler, search_form_submit(), takes on the unusual task of validating the form (checking whether there are actually keys, or whether an empty form was submitted). It can be debated whether that validation belongs in search_form_validate() instead. More interesting to us, however, is the setting of $form_state['redirect']. This is how POSTed search forms get redirected via GET with the search query in the URL. The Forms API will do the redirect after the submit handler has finished.

This process is one of the first mysteries of the search module that often confuses people when they attempt to understand its inner workings. Despite being somewhat mystical in its behavior, the POST -> GET redirect has a very practical advantage: search result pages can be bookmarked.
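
To make the outcome of the submit handler concrete, here is a small sketch that builds the same redirect path outside of Drupal. The helper name sketch_search_redirect_path() and the sample values are hypothetical; the logic follows the last lines of search_form_submit() shown above.

```php
<?php
// Hypothetical helper mirroring the last lines of search_form_submit().
function sketch_search_redirect_path($values) {
  // Fall back to the node search when no search type was submitted.
  $type = !empty($values['module']) ? $values['module'] : 'node';
  return 'search/' . $type . '/' . $values['processed_keys'];
}

$form_state = array(
  'values' => array('module' => 'user', 'processed_keys' => 'foo bar'),
);
$form_state['redirect'] = sketch_search_redirect_path($form_state['values']);
// $form_state['redirect'] is now 'search/user/foo bar'
```

Because the whole query ends up in the path, the resulting URL is what a visitor bookmarks or shares.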


Parse search query values
The one thing that virtually every function needs in the process of doing a search is the $keys variable that contains the search query. In Drupal 6, the entire search query is represented as a string. The function search_get_keys() can be used to fetch this string, and it is a simple function that looks first to the path, and then to the submitted form values in order to find a keyword query. Whatever is found is stored statically in the function and cannot be changed during the lifetime of the request.
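
The lookup order and the static caching can be sketched as follows. This is modeled on my reading of search_get_keys() in Drupal 6 and should be treated as an approximation; sketch_get_keys() is a hypothetical stand-in that takes the path and the request values as parameters, whereas the real function reads them from the request itself.

```php
<?php
// Hypothetical stand-in for search_get_keys().
function sketch_get_keys($q, $request) {
  static $return;
  if (!isset($return)) {
    // The keys are whatever follows the second slash of the path.
    $path = explode('/', $q, 3);
    // Otherwise fall back to a submitted 'keys' value, if there is one.
    $keys = empty($request['keys']) ? '' : $request['keys'];
    $return = count($path) == 3 ? $path[2] : $keys;
  }
  return $return;
}

$first = sketch_get_keys('search/node/foo bar', array());
// A later call with different input still returns the cached value.
$second = sketch_get_keys('search/node/something else', array('keys' => 'baz'));
```

Both $first and $second hold "foo bar" here, which is the behavior described above: once the keys have been determined, they stay fixed for the rest of the request.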

Management of this keyword query string is an interesting issue, especially in the context of the advanced search form. The search module offers two functions, search_query_insert($keys, $option, $value = '') and search_query_extract($keys, $option), which aid in the manipulation of the query string. If you call search_query_extract("foo nid:4711", "nid"), you get the value 4711 in return. If you call search_query_insert("bar", "uid", 42), you get "bar uid:42" in return. Neither of these functions interacts with search_get_keys(), however, so they cannot be used to fetch or manipulate the statically cached keys. See node_form_alter and the $op = 'search' part of node_search for usage examples of these functions. Note in particular how the form is always used as the storage mechanism for the search query string.
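
The behavior of the two helpers can be approximated in a few lines of standalone PHP. The functions sketch_query_extract() and sketch_query_insert() below are hypothetical re-implementations written for illustration; they are not the core code, and details such as the exact regular expressions are assumptions.

```php
<?php
// Return the value of an "option:value" pair in $keys, or NULL if absent.
function sketch_query_extract($keys, $option) {
  $pattern = '/(^| )' . preg_quote($option, '/') . ':([^ ]*)( |$)/i';
  if (preg_match($pattern, $keys, $matches)) {
    return $matches[2];
  }
  return NULL;
}

// Set, replace or (with an empty $value) remove an "option:value" pair.
function sketch_query_insert($keys, $option, $value = '') {
  // Drop any existing pair for this option first.
  if (sketch_query_extract($keys, $option) !== NULL) {
    $pattern = '/(^| )' . preg_quote($option, '/') . ':[^ ]*/i';
    $keys = trim(preg_replace($pattern, '', $keys));
  }
  if ($value != '') {
    $keys .= ' ' . $option . ':' . $value;
  }
  return $keys;
}

$nid = sketch_query_extract('foo nid:4711', 'nid');  // '4711'
$keys = sketch_query_insert('bar', 'uid', 42);       // 'bar uid:42'
```

Both helpers are pure string functions: they take the query string as input and return a new one, which is why the result has to be stored somewhere (in core, the form) to take effect.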


Construct search based on query values
The search framework expects modules to use the parsed search query string to do a search for values and return a structured array of results. This process gets triggered in search_view, which calls search_data, which is a wrapper first and foremost for this code: $results = module_invoke($type, 'search', 'search', $keys); In other words, the $op = 'search' phase of hook_search is initiated. The other responsibility of search_data is to theme the results page, either by invoking the hook_search_page implementation for the module doing the search, or by defaulting to theme('search_results').
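
The dispatch described here boils down to calling a function whose name is built from the module name and the hook name. The sketch below shows that idea outside of Drupal; sketch_module_invoke() is a simplified, hypothetical stand-in for module_invoke(), and mymodule with its canned result is made up for the example.

```php
<?php
// Simplified stand-in for module_invoke(): call {$module}_{$hook}() if it exists.
function sketch_module_invoke($module, $hook) {
  $args = array_slice(func_get_args(), 2);
  $function = $module . '_' . $hook;
  if (function_exists($function)) {
    return call_user_func_array($function, $args);
  }
  return NULL;
}

/**
 * Implementation of hook_search() for the hypothetical mymodule.
 */
function mymodule_search($op = 'search', $keys = NULL) {
  switch ($op) {
    case 'name':
      return 'My module';

    case 'search':
      // A real implementation would run a query for $keys here.
      return array(
        array(
          'link' => 'http://example.com/item/1',
          'type' => 'Item',
          'title' => 'Match for ' . $keys,
        ),
      );
  }
}

// Equivalent in spirit to: module_invoke($type, 'search', 'search', $keys)
$results = sketch_module_invoke('mymodule', 'search', 'search', 'foo');
```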

Despite the fact that the search framework expects modules to do their own searches, it also provides a mechanism for searching the search index (see step #1). The function do_search, in its simplest form, is a breeze to use. Take a list of keywords and specify a type ('node' for searching node content), and get search results like this:

<?php
$results = do_search('foo bar baz', 'node');
?>

Example 4: Using do_search to find content.

The $results will be an array identifying the top-ranking nodes that match the keywords "foo", "bar" and "baz". As far as I can tell, the keywords are combined with AND by default, and alternatives have to be requested explicitly with an uppercase OR. As this function is one of the core API functions of the search module, you can feel free to call it for your own purposes any time you want. For example, call it from within a block, taking keywords from taxonomy terms or a user's profile interests, and use the returned results as a form of content recommendation.

The full function signature for do_search, however, is quite intimidating:

<?php
function do_search($keywords, $type, $join1 = '', $where1 = '1', $arguments1 = array(),
  $columns2 = 'i.relevance AS score', $join2 = '', $arguments2 = array(),
  $sort_parameters = 'ORDER BY score DESC') {
  // [...snip...]
}
?>

Example 5: do_search, Search's API function for finding content.

Discussing all the possible values for the parameters is outside the scope of this article, but the plethora of options is there so that calling code can interact with two distinct queries by injecting JOIN and WHERE clauses into each of them. Sorts can be specified as well, although I don't recall ever seeing this feature utilized.


Return formatted results
If you want to utilize the search module's standard formatting for search results, your hook_search('search') has to build a structured array of results where each result follows the format:

Required keys:
  link: The URL of the item.
  type: The translated type, e.g. "Blog entry".
Optional keys:
  title: The title of the result.
  user: The themed username of the user who created the search result (i.e., the node author).
  date: The timestamp associated with the search result.
  snippet: An excerpt of text that gives the context of the keywords that were found in the search result. The search module provides a function, search_excerpt(), which can be used to highlight the keywords within this snippet, but you must call it yourself while building the search result.
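
A single result in this format might be assembled like this. The sketch runs outside of Drupal, so plain strings stand in for values that would normally come from url(), t(), the themed username and search_excerpt(); the helper sketch_build_result() and the sample item are hypothetical.

```php
<?php
// Hypothetical builder for one search result array.
function sketch_build_result($item) {
  return array(
    // Required keys.
    'link' => 'http://example.com/item/' . $item['id'],
    'type' => 'Blog entry',
    // Optional keys.
    'title' => $item['title'],
    'user' => $item['author'],
    'date' => $item['created'],
    // Inside Drupal, search_excerpt() could be used to build the snippet.
    'snippet' => $item['body'],
  );
}

$results = array();
$results[] = sketch_build_result(array(
  'id' => 7,
  'title' => 'Hello search',
  'author' => 'admin',
  'created' => 1209088020,
  'body' => 'A short excerpt mentioning the keywords.',
));
```
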

Conclusion
There are potentially many steps that go into doing a search and displaying the results. The search module provides a framework for managing all of these steps, and an API for accessing the various bits and pieces even outside of the context of a traditional search page. The functions search_excerpt, search_index and do_search, in particular, can be called by modules outside of the traditional hook_search context.

Fri, Apr/25/2008 - 11:10am GMT - Mark (not verified)
Excellent article. One aspect of the search API that is a bit limiting is that it creates a separate tab in the search results for other modules' invocation of hook_search. It would be nice if module developers could override that behavior and integrate the results from their module's search into the same tab as the results from search.module.

The reason I am pointing this out is that I am the maintainer of the search_attachments module, and the most requested feature from its users is the ability to put the hits on regular node content (i.e., those found by search.module) and hits on files (i.e., those found by search_attachments) in the same tab. In responding to a user request at http://drupal.org/node/242748 I've started to think about ways of doing this but haven't dug too far in so far. Any suggestions?

Fri, Apr/25/2008 - 11:32am GMT - Robert Douglass
Mark, the first suggestion is to open up a feature request against Drupal 7 and add it to the list of issues here: http://groups.drupal.org/node/10569

There is a lot of activity going on with search at the moment, and every bit of help counts. Thanks for your awesome contribution (the search_attachments mod). Look forward to discussing "Unified search across implementations" with you in the search group.

Fri, Apr/25/2008 - 11:35am GMT - Mark (not verified)
Thanks a lot, will do.

Tue, Apr/29/2008 - 5:53am GMT - Bit Santos (not verified)
I'm trying to hack in search-by-date fields (published before and published after fields) into the advanced search of nodes on my D6 site but I'm running into trouble. I've found how to add the necessary fields in node_form_alter() and the code to add these parameters to the search query in node_search() but then the submitted data from those fields don't make it to the generated $keys in search_form_submit(). I've been trying for the past few hours to figure out what's going on when search_form_validate(), form_set_value(), and finally _form_set_value() are called, but at this point in the day I'm getting totally lost.

I've pretty much confirmed that search_form_validate() is the point at which it breaks. If I manually type the URL with the appropriate GET query, the search works just fine.

Can I get a little help? :-)

Tue, Apr/29/2008 - 6:34am GMT - Robert Douglass
You're on the right track. If you're doing this in a module you need to add a validation function. If you're hacking to make a core patch, you need to update node_search_validate. Whichever option you choose, you need to study node_search_validate to see how it rebuilds the string with the $keys using search_query_insert and then packs that string into the form like this:

<?php
if (!empty($keys)) {
  form_set_value($form['basic']['inline']['processed_keys'], trim($keys), $form_state);
}
?>

This is awkward and I hope that we will soon come up with a nicer paradigm for building this (and other) advanced search forms.

Wed, Apr/30/2008 - 1:53am GMT - Bit Santos (not verified)
I didn't think of looking at node_search_validate()! Thanks. Now I'm confident that my new search parameters are getting through, but now I'm having a problem with the query results. I'm getting zero results with search parameters that I'm sure should yield at least one result.

I have the following code in node_search():
if ($start = search_query_extract($keys, 'after')) {
  $conditions1 .= ' AND nz.created >= %d';
  $arguments1[] = intval(date('U', strtotime($start)));
}
if ($end = search_query_extract($keys, 'before')) {
  $conditions1 .= ' AND n.created <= %d';
  $arguments1[] = intval(date('U', strtotime($end)));
}

Is there anything I missed?

Wed, Apr/30/2008 - 3:50am GMT - Bit Santos (not verified)
Whoops, the third line should have n.created. I intentionally made it "nz" so I could see the generated query in the error. :-P

Wed, Apr/30/2008 - 7:44am GMT - Robert Douglass
I'd have to see the actual query being generated before I could say. Make sure to use the devel module and turn query logging on so that you can see all of the queries getting executed, and analyze the query being built, comparing it to the query you expected.

Tue, May/6/2008 - 5:25pm GMT - Anonymous (not verified)
Hi, simple question. After the post back, the keys query string is set textually in the "Enter your keywords:" text box. So, if my keys value is "somesearch xvalue:something", then my textbox has this entire string instead of just "somesearch".

How can we ensure the proper value is set at postback?

thanks!

Wed, May/7/2008 - 4:26pm GMT - Robert Douglass
This depends on what you mean by "proper value". In the ApacheSolr module I decided that no matter what comes in as the URL or POST, any field queries (like nid:5) would not be displayed in the form. This is because the module relies heavily on faceted searching, and if you click 5 facets to drill down, with their somewhat long, non-human-friendly names, the form will become overpopulated with all sorts of trash. So in the ApacheSolr module all of this extra information is stored in a special singleton object. Look at the apachesolr_form_alter function and at the get_query_basic function to see how it is done.

In Drupal's core search, the field values are passed on in the $form. Look at node_form_alter, node_search_validate, and node_search (in that order) to see how the values of the field queries are persisted.

Thu, Nov/6/2008 - 2:53pm GMT - Claudio Cicali
Hi Robert,

I'm using your great ApacheSolr module in a pre-production site (looking forward to the 1.0 and reading all the open issues so far, particularly about the DISMAX query).

One thing that puzzles me is how to change the default behaviour of the search form *block*. I'm on D6, and I'd like it to take the search directly to Solr and not to the default Drupal one (which, in production, I'll hide from the users), where I need to click the "Search" tab.

Thank you!
