PRISE Search Design Notes

The NIST PRISE search engine treats documents and queries as lists of words and responds to a query with a list of documents ranked in order of their statistical similarity to the query. The PRISE search engine assumes the documents to be searched have been indexed using the PRISE indexer. The steps taken to process the query and return a list of documents (a result set) are:

Preprocess the query
This process identifies and normalizes the query terms.
Generate query enhancement terms, query expansion terms, using relevance feedback; if the caller requests them.
Append the reformulated query to the initial query; if enhancement terms are requested and the return of the reformulated query is requested.
Return the reformulated query; if requested.
Reweight the query using relevance feedback processing; if the caller requests it.
Create a new result set; if a search is requested.
Free information created by the search which is not needed by the caller.
Return the result set; if a search is performed.

Processing Details and Locations of Source Code by Step

This section describes in greater detail the major steps executed in the PRISE search engine. Direct access to the PRISE search engine should be achieved using the PRISE Programming Interface. This interface consist of three functions init_search0(), search0(), and close_search0() located in DISTRIBUTION_ROOT/prise_search/src/lib/search/search0.c.
What follows is a short description of search0() processing.

Preprocessing the query involves:
- Allocating the space for the PRISE internal query representation.
- Initializing the internal query representation data structures.
- Indexing the query which transforms the caller's query into the internal data structures used in the search process.
Source: init_qrep() DISTRIBUTION_ROOT/prise_search/src/lib/search/qindex.c
Creating the list of enhanced terms. See Generate the enhancement terms under relevance feedback.
Appending the reformulated query into the query involves:
- Duplicating the original query in the reformed query.
- Removing the original key words from the reformed query key words to make room for the new key words.
- Removing the original weights from the reformed query to make room for the new weights.
- Placing the original key words and their new weights in the reformed query key words. This step is performed because the weights may change due to the use of relevance information.
  If the original query key words contained duplicate terms and duplicate terms are being removed during query preprocessing only one of the original duplicate query terms will appear in the reformulated query.
- Appending the enhancement terms or term groups and their weights in the reformed query key words.
Source: appendReformulatedQuery() in DISTRIBUTION_ROOT/prise_search/src/lib/search/prise_query.c
The caller now has access to the reformulated query, because of the append, through the query.
Reweighting the query. See Reweight the query under relevance feedback.
Creating a result set requires running the user query against the document collection to create a list of documents ranked by similarity to the query.
Source: returnResultSet() in DISTRIBUTION_ROOT/prise_search/src/lib/search/search.stem.neg.c
Returning a result set involves:
- Performing the document keyword search the steps of which are:
- Applying stage one of special restrictions to remove documents from the document array according to caller requirements involves:
  - Examining the query for restrictions and translating them into a usable format if they exist.
  - Compare the term occurrence bitmaps produced by query2restrictions for the terms occurring in special restrictions to the term occurrence bitmaps in the document array produced by the document keyword search. If a document is proven to not contain a minimal subset of the terms needed to meet caller restrictions, remove it from the result set.
    Source: applyRestrictionMasks() in DISTRIBUTION_ROOT/prise_search/src/lib/special/restrictions.c
- Ranking of the documents of the result set is achieved by sorting them into descending order based on their document score.
  Source: rank_results() in DISTRIBUTION_ROOT/prise_search/src/lib/search/rank.c.
- Applying stage two of special restrictions involves doing string search on the piece of text, document or field, to determine if it meets the caller's special requirements. Documents which do not meet the caller's requirements are removed from the result set of documents which will be returned.
  Source: applyRestrictionScans() in DISTRIBUTION_ROOT/prise_search/src/lib/special/restrictions.c

Query indexing

Source: qinit() DISTRIBUTION_ROOT/prise_search/src/lib/search/qindex.c.

The process of indexing the query consist of:

Parsing the string of document keywords in the query.
Source: qparser() in .../misc/src/lib/client_util/qparser.dfa.c
Creating the internal data structures to represent the parsed terms.
- Creating a token list for the terms.
  Source: qindex(), term_index() DISTRIBUTION_ROOT/prise_search/src/lib/search/qindex.c.
  - If the token is a stop word an indicator flag is set in its token list entry.
  - If the token is not a stop word it is lexically normalized. Currently this involves stemming only.
    Source: lexnmzn() in DISTRIBUTION_ROOT/prise_tools/src/lib/lexnmzn/lexnmzn.c. This library was previously located in DISTRIBUTION_ROOT/prise_search/src/lib.
  - The token's dictionary information is placed in the token. If no dictionary information is found for the token its dictionary information is set to NULL.
    Source: setde() in DISTRIBUTION_ROOT/prise_search/src/lib/prise_search/qindex.c.
    - If the token is a repeat it is marked as such.
      A token, term, is considered a repeat, duplicate, if a token already found in the query has the same stem. The reason the stem is used to determine if a term is present in the query is that PRISE indexed collection dictionary information is stored based on stems. Thus all tokens with the same stem will retrieve the same dictionary information. Therefore only the first occurrence of a stem in the query will retrieve any documents associated with the stem which have not already been retrieved.
- Create a list of qterms for the tokens.
  Source: qtinit(), qterm_init() DISTRIBUTION_ROOT/prise_search/src/lib/search/qterm.c.
  This process creates qterms for any tokens which pass the token filter. The token filter removes:
  - Stop words.
  - Terms which do not occur in the dictionary.
  - Duplicate tokens. The removal of duplicate tokens is optional.

Relevance Feedback

Two modes of relevance feedback operation, controlled by four flags and a pointer, are available in the current search engine.
The modes are:

transparent
opaque

The four flags are:

doSearch
returnReformulatedQuery
reformulateQuery
newQuery

The pointer is:

doc_rels

Transparent Relevance Feedback

In transparent mode the caller issues a query and receives as a response a reformulated query which contains the original query plus additional key words, enhancement terms, which may prove helpful in finding documents if added to the query when searching. In this mode the caller has the option to add zero or more of the enhancement terms to the query before requesting a search.
The basic sequence of events in transparent mode is:

The caller requests a search with some initial query getting a result set.
This initial search is denoted in the PRISE query by the following:
- doSearch = 1
- returnReformulatedQuery = 0
- reformulateQuery = 0
- newQuery = 1
- doc_rels = NULL
The caller selects one or more documents from the result set and provides information about the relevance of the document(s) - relevance feedback.
The caller submits the query with the list of selected documents (called "relevance documents" in what follows) and associated relevance judgements, requesting a list of candidate enhancement terms. .
This step is denoted in the query as follows:
- doSearch = 0
- returnReformulatedQuery = 1
- reformulateQuery = 1
- newQuery = 0
- doc_rels != NULL
The search engine generates the candidate enhancement terms from the relevance documents.
The search engine returns the reformulated query to the caller.
Source: appendReformulatedQuery() in DISTRIBUTION_ROOT/prise_search/src/lib/search/prise_query.c
The caller selects zero or more enhancement terms and adds them to the query.
The caller submits the expanded query with the list of relevance documents requesting a search be executed and result set returned.
- doSearch = 1
- returnReformulatedQuery = 0
- reformulateQuery = 0
- newQuery = 0
- doc_rels != NULL
The search engine reweights the query to adjust for any user changes to the query.
The search engine performs the requested search using the weights based on the relevance documents and returns the result set.

Opaque Relevance Feedback

In opaque mode the caller may issue a query and ask that it be automatically expanded with enhancement terms. In this mode the enhancement terms are append to the query, without a chance for caller intervention, and the expanded query is used to create a result set.
The basic sequence of events in opaque mode is:

The caller requests a search with some initial query and gets a result set.
This initial search is denoted in the PRISE query in the same manner as in transparent mode.
The caller selects documents from the result set and provides relevance feedback for these (relevance) documents.
The caller submits the query with the list of relevance documents requesting enhancement terms be automaticly added to the query and a search be executed to return a result set.
- doSearch = 1
- returnReformulatedQuery = 0
- reformulateQuery = 1
- newQuery = 0
- doc_rels != NULL
The search engine generates the enhancement terms from the relevance documents.
The search engine adds the enhancement terms to the original query.
Source: appendQterms() in DISTRIBUTION_ROOT/prise_search/src/lib/search/prise_query.c
The search engine performs the requested search using the weights based on the relevance documents and returns the result set.

Generating enhancement terms

This process is intended to:

Return a ranked list of unique terms, from the relevance documents which will be most useful when added to the existing query in retrieving the desired information. Unique, in this case, means the term is not a duplicate term for one of the caller' current query terms.
Calculate weights, based on the relevance documents, which will be more effective in retrieving the desired information for both the existing query terms and the list of unique terms.

Source: Create_enhancedList() in DISTRIBUTION_ROOT/prise_search/src/lib/relevance/feedback.c

The fact that many terms from the relevance documents can have the same stem raises the issue of which term for a given stem is returned for user consideration in transparent mode. PRISE currently returns a term group to the caller.
A term group is a comma separated list of all the unstemmed terms which occur for the stem of an enhancement term in the relevance documents.
The steps in generating enhancement terms are:

Parse the relevance documents creating a list of terms and stems. Include in the list the information used in term sorting and reweighting for each stem returned.
Source: RelDocQterms() in DISTRIBUTION_ROOT/prise_search/src/lib/relevance/feedback.c
The steps are:
- Apply a document parser very similar to the one used by the PRISE indexer. See:
  - Source: Parse() in DISTRIBUTION_ROOT/prise_search/src/lib/relevance/feedback.c
    This function:
    - Determines the number of relevance documents on the list.
    - Calls parser()
    - Condenses parser's output based on stems.
  - Source: parser() in DISTRIBUTION_ROOT/prise_tools/src/lib/rf.parser/rf.parser.c
    This parser along with dividing documents into a list of terms returns with each term:
    - Its stem.
    - The number of documents it occurs in.
    - This ids of the documents which contain the term.
    - The number of occurrences of the term for each document it appears in.
- Create the internal data structures to represent the parsed stems and terms created by the previous step. This step uses the same functions to create the internal data structures as query indexing.
- Place the relevance information for each of the stems in its corresponding QTERM.
Remove parsed duplicates of original terms and update original term data.
- Remove, from the list of terms found by parsing the relevance documents, any terms already found in the query so that only unique terms remain.
- This step also updates the relevance feedback data for the terms already in the query so reweighting may be accomplished. A term is considered to be in the query if its stem is already part of the query.
Source: RemoveDuplicatesStems() in DISTRIBUTION_ROOT/prise_search/src/lib/relevance/termSort.c
Sort the unique terms remaining in the list created by the parsing.
Source: SortTerms() in DISTRIBUTION_ROOT/prise_search/src/lib/relevance/termSort.c
This allows the search engine to place at the top of the list the terms which, by the sorting method used, are determined to be most likely to help find information relevant to the user's query.
The following is a list of the term sorting methods available with the string used in the options.spec file of a collection to select the method:
- Term Idf. Options.spec string: TermIdf.
  Reference: Harman1
- Postings - the number of retrieved relevant documents containing the term. Options.spec string: TermPostrec
  Reference: Harman1
- Idf within postings. Options.spec string: TermIdfWIPostrec
  Reference: Harman1
- Idf * Frequency within postings, where frequency is the Log2 of total frequency of the term in the retrieved relevant set. Options.spec string: TermIdfFreqWIPostrec
  Reference: Harman1
- Idf * Frequency * Postings. Options.spec string: TermIdfFreqPostrec
  Reference: Harman1
- Idf * Frequency. Options.spec string: TermIdfFreq
  Reference: Harman1
- Porter Options.spec string: Porter.
  Source: Porter() in DISTRIBUTION_ROOT/prise_search/src/lib/relevance/termSort.c
  Reference: Harman1
- Smeaton and vanRijsbergen Options.spec string:SmeatonvanRijsbergen
  Source: SmeatonvanRijsbergen() in DISTRIBUTION_ROOT/prise_search/src/lib/relevance/termSort.c
  Reference: Harman1
- Robertson where w_ij is calculated using the Croft reweighting formula. Options.spec string: RobertsonCroft
  Source: RobertsonCroft() and Robertson() in DISTRIBUTION_ROOT/prise_search/src/lib/relevance/termSort.c
  Reference: Harman1
- Robertson where w_ij is the idf. Options.spec string: RobertsonIdf
  Source: calculate_rf_RobertsonIdfrankValues() and Robertson() in DISTRIBUTION_ROOT/prise_search/src/lib/relevance/termSort.c
  Reference: Harman1
Return the top N unique terms from the sorted list.
Source: BestTerms() in DISTRIBUTION_ROOT/prise_search/src/lib/relevance/termSort.c
This reduces the list of unique terms form relevance documents to at most the number of terms the caller wishes returned or added to the query. This avoids any additional processing on terms which are guaranteed not to be used.
The current default value of N is 10.
The value of N can be set in the options.spec file in PRISE indexed collections using the maxrfterms option. A minrfterms option may also appear in the options.spec file but the current implementation of BestTerms ignores this option.
Reweight both the terms returned by the previous step and the original query terms.
Source: ReweightTerms() in DISTRIBUTION_ROOT/prise_search/src/lib/relevance/reweightTerms.c
The goal of this step is to calculate weights for terms based on the relevance documents which will improve the quality of the search results.
The following is a list of the term reweighting methods available with the string used in the options.spec file of a collection to select the method:
- Robertson & Sparck Jones. Options.spec string: Rsj.
  Source: Rsj() in DISTRIBUTION_ROOT/prise_tools/src/lib/weights/reweights.c
  Reference: Okapi1
- Robertson & Sparck Jones with negative weights being set to zero. Options.spec string: RsjNeg0
  Source: RsjNeg0() in DISTRIBUTION_ROOT/prise_tools/src/lib/weights/reweights.c
- Robertson & Sparck Jones using the Bookstein modification. Options.spec string: BooksteinRsj
  Source: BooksteinRsj() in DISTRIBUTION_ROOT/prise_tools/src/lib/weights/reweights.c
- Idf. Use the term's idf as its weight. Options.spec string: ReweightIdf
  Source: ReweightIdf() in DISTRIBUTION_ROOT/prise_search/src/lib/relevance/reweightTerms.c
- Croft. Options.spec string: Croft
  Source: Croft() in DISTRIBUTION_ROOT/prise_tools/src/lib/weights/reweights.c
  Reference: Harman1
- Croft with negative weights being set to zero. Options.spec string: CroftNeg0.
  Source: CroftNeg0() in DISTRIBUTION_ROOT/prise_tools/src/lib/weights/reweights.c
  Reference: Harman1
- Croft using the Bookstein modification. Options.spec string: Bookstein.
  Source: Bookstein() in DISTRIBUTION_ROOT/prise_tools/src/lib/weights/reweights.c
  Reference: Harman1

Reweighting the query

This process calculates the weights of the terms in the expanded query during transparent relevance feedback.

Reweighting the query is a very similar to generating the enhanced list. Currently the implementation is a distinct function with an explicit step for re-parsing the relevance documents. Future versions may improve on this leaving the interface and its callers unaffected. Currently we re-parse in order to account for user changes to the query terms and relevant document list.

A query term can be added/changed.
If a query term is added/changed then the reweight value of this new term must be calculated. Reweighting in the this case could be accomplished by remembering all terms found in the relevant documents with the information needed for reweighting. Supporting last item in this list makes re-parsing necessary and so the decision was made to re-parse in this case as well.
A user term can be deleted.
If a user term is deleted it is just forgotten and reweighting for that term is not required.
Relevance documents can be added/deleted/changed.
If the relevance document set is changed by the user while selecting enhancement terms or for some other reason then the reweight value of the terms, which is derived from information in the relevance documents on the list at the time this function is called, must be recalculated. To obtain the information needed to recalculate the reweight values parsing of the relevance documents is required.

This function is executed every time there are relevance documents in the query and transparent relevance feedback mode is in use even if no query enhancement terms were added by the caller. Thus the caller's query terms will be reweighted with respect to the relevance documents supplied even though no enhancement terms are in use.

Source: ReweightQuery() in DISTRIBUTION_ROOT/prise_search/src/lib/relevance/feedback.c

The steps in reweighting the query are:

Parse the relevance documents creating a list of terms and stems.
Place in the original query terms the data from the parser needed for term reweighting.
Source: CopyQtermRFData() in DISTRIBUTION_ROOT/prise_search/src/lib/relevance/termSort.c
Reweight the query terms with one of the reweight methods.

Special restrictions

Document phrase search

This restriction requires that, of the documents found by the keyword search, only those containing the phrase supplied by the caller should be returned. For a document to pass phrase processing it must contain the same string the caller has supplied, with the following conditions

Any string of contiguous white space in the query phrase matches any string of contiguous white space in the document phrase regardless of the lengths of the white space strings. Thus
```
Four score and   seven years
```
matches
```
Four   score and seven     years
```

The comparison is case insensitive. Thus

Four score and seven years

matches

four Score and seven YEARS

No stemming is applied prior to the comparison. Thus
```
seven years
```
does not match
```
seven year
```
The order of terms matters. Thus
```
stocks and bonds
```
does not match
```
bonds and stocks
```
Punctuation in the query phrase must appear in the document phrase for them to match. Thus the query phrase
```
beef prices, rise
```
does not match the document phrase
```
beef prices rise
```
Punctuation in the document phrase must appear in the query phrase for them to match unless the punctuation is the last character, immediately follows the last word, of the document phrase. Thus the query phrase
```
beef prices rise
```
does not match the document phrase
```
beef prices, rise
```
but does match the document phrase
```
beef prices rise,
```
Hyphens which are embedded as part of terms in queries or documents must appear in the corresponding query or document terms to match. Thus
```
 mother in law
```
does not match
```
mother-in-law
```

Document proximity search
For this restriction the caller supplies two terms and a distance measured in words. Of the documents found by the document keyword search only those which contain the two terms where the distance between the terms is less then or equal the distance supplied will be returned. The following conditions apply to proximity processing:
- The comparison of query terms to document terms is case insensitive. Thus
```
Stocks
```
  matches
```
stocks
```
- No stemming is applied prior to the comparison. Thus
```
stocks
```
  does not match
```
stock
```
- The order of terms does not matter. Thus
```
stock exchange
```
  using distance
```
10
```
  and
```
exchange stock
```
  using distance
```
10
```
  produce identical results.
- Punctuation in the query terms must appear in the document terms for them to match. Thus the query term
```
NIST's
```
  does not match the document term
```
NIST
```
- Punctuation in the document terms is treated as a token separator if it does not appear in the query terms. Thus the query term
```
NIST
```
  matches the document term
```
NIST's
```
- Hyphens which are embedded as part of terms in queries or documents must appear in the corresponding query or document terms to match. Thus
```
 mother in law
```
  does not match
```
mother-in-law
```

Field keyword search
For this restriction the caller supplies a field name and a list of keywords to search for in the field. Of the documents found by the document keyword search only documents which contain the specified field where the field contains at least one of the listed keywords are returned. The following conditions apply to fielded keyword restrictions:
- The comparison of query terms to document terms is case insensitive. Thus
```
Stocks
```
  matches
```
stocks
```
- Stemming is applied prior to the comparison. Thus
```
stocks
```
  matches
```
stock
```
- The order of terms does not matter. Thus
```
stock exchange
```
  and
```
exchange stock
```
  produce identical results.
- All punctuation is treated as token separators. Thus
```
mother-in-law
```
  becomes three tokens
```
 mother in law
```

The use of special restrictions has the effect of refining the set of documents returned. This occurs because special restrictions can only remove documents after the initial document keyword search. It cannot add documents.

Special restrictions should be used carefully. Although an effort has been made to minimize the use of string searching, a very time consuming process, it is still the last step in supporting special restrictions. Therefore, if it is applied using terms which occur in many documents in the collection and produces a large number of documents even after Stage 1 of restriction processing, then this system may prove very slow.

Limitations

The PRISE Search Engine has some limitations

PRISE will process a query of up to 6000 terms.
Only the first 32 query terms used to retrieve documents will be accounted for in the bitmaps associated with the documents. Any additional query terms will add documents to the result set and be eligible to influence document scores but will not be represented in the bitmap (and will not be elligible for highlighting by the zclient).
Note: When using special restrictions, any special terms which do not occur in the bitmaps will not contribute to Stage 1 of special restrictions. This will result in more documents remaining to be processed in Stage 2 of special restrictions, string searching, which will slow response time.
Keyword searching, document or fielded, on numbers is not possible because numbers are not indexed by the PRISE Indexer.
The string searches used to support special restrictions will time out after 5 minutes and the result set returned will consist of the documents which have passed special restrictions at that time. This time out interval may be adjusted by changing the #define TIME_OUT_SECS.
Source: DISTRIBUTION_ROOT/prise_search/include/restrictions.h
Punctuation in the query is ignored for keyword searching.

PRISE Programming Interface

Includes

#include <stdio.h>
#include <malloc.h>
#include <math.h>
#include <fcntl.h>
#include <string.h>

#include "type_defs.h"
#include "symb_defs.h"

#include "tkn.h"
#include "query.h"

#include "coll.h"
#include "options.h"
#include "result.h"
#include "search_proto.h"
#include "search0.h"

#include "io.h"
#include "errors.h"

Main Data Structures

Collection

typedef struct {
  DICTFILE*     dict;
  POSTFILE*     postings;
  TPIFILE*      tpi;
  STOPFILE*     stoplist;
  COLLSTATS*    stats;
  DOCNO_TABLES* docno_tables;
  FIELD_TABLE*  field_table;
  char*         lexnmzn;      /* Name of lexical normalization function.  */
  int           stemfn;
  char*         locale;
  char*         dir;
  char*         name;
  int           mode;      /*mode refers to SEARCH_MODE or INDEXER_MODE*/
  OPTIONS*      options;
} COLL;

Query

typedef struct TEXT_QUERY
  {
  /*-------------------------------------------------------------------------*
   * Tags delimiting the part of the document to search
   *-------------------------------------------------------------------------*/
  char* start_end_tags;          /* When full document is intended, this     */
                                 /* contains a pointer to a null string      */
  /*-------------------------------------------------------------------------*
   * Data indicating the sort of search to [perform on the document part
   *-------------------------------------------------------------------------*/
  /*-------------------------------------------------------------*
   * Keyword search
   *   No limit on number of terms
   *   Stemming
   *-------------------------------------------------------------*/
  struct
    {
    char* terms;
    char* weights;
    int   numterms;
    } keywds;       
  /*-------------------------------------------------------------*
   * Proximity search
   *   2  terms
   *   no stemming
   *   order does *not* matter
   *-------------------------------------------------------------*/
  struct 	
    {                           
    char* terms;              /* in the order they must appear */
    int   dist;
    } prox;
  /*-------------------------------------------------------------*
   * Phrase search
   *   No limit on number of terms
   *   No stemming
   *   Order matters
   *-------------------------------------------------------------*/
  struct 				
    {
    char* terms;
    } phrase;
  struct TEXT_QUERY* next;  /* text query for next document section */
                            /* NULL if none follows                 */   
  } TEXT_QUERY;


/*---------------------------------------------------------------------*
 * Typedefs to support relevant/nonrelevant document list and relevance
 * feedback operations.
 *---------------------------------------------------------------------*/
typedef struct docrelnode{ /* structure for passing relevant document
			      lists */
  unsigned long docno;     /* document id number */
  float relevance;         /* how relevant the document is */
  struct docrelnode *next; /* pointer to next list member */
} DOCRELNODE;

typedef struct 
  {
  int doSearch;                /* Execute a search */
  int returnReformulatedQuery; /* Send the reformulated query back to
				* the client */
  int reformulateQuery;        /* Use the relevance documents to create a
				* list of unique terms which can be added
				* to the query.                        */
  int newQuery;                 /* This query is not related to any
				 * previous query aka we have not done
				 * relevance feedback to get to this
				 * query
  /*-------------------------------------------------------------------*
   * Limits used to control the number of additional relevant terms
   * used/returned for review. Supporting relevance feedback created the
   * need for these so they are being kept with the relevance document
   * lists.
   * These may also be useful if we were to support boolean queries and
   * the expand operation.
   *-------------------------------------------------------------------*/
   int minrelterms;              /* The minimum number of additional
				  * relevant terms to be used/return for
				  * review. This may be useful if we use
				  * a selection method based on something
				  * other then position in ranked list, 
				  * ex. top N terms.                   */
   int maxrelterms;              /* The maximum number of additional
				  * relevant terms to be used/return for
				  * review.                            */


  DOCRELNODE *doc_rels;      /* List of document relevance information */
  int  num_wanted;            /* Desired number of items in the result */
  enum query_type 
    {
    Q_TYPE_TEXT,
    Q_TYPE_DOCID,
    Q_TYPE_BOOLEAN,
    Q_TYPE_LIMIT	      /* The number of query types */
    } qtype;
  union q_type_union 
    {
    TEXT_QUERY* text_query;
          void* boolean;      /* Not defined, yet.  Not implemented, yet. */
    } u;
union q_type_union reformed;  /* Reformed query returned by searcher. */
  } QUERY;

Result

typedef struct {
  DOC* docs;		  /* Data representing retrieved documents.    */
  int  doc_ct;		  /* The number of documents retrieved.	       */

  int  query_id;	  /* For experimentation and evaluation.       */
  QUERY *query;           /* A PRISE query structure                   */
  QTERM* qterms;	  /* A list of structures representing query   */
                          /* terms.*/
                          /* This is typically a subset of 'tknlist'   */
                          /* list. */
			  /* Stopwords, for example, have been removed.*/
  			  /* QTERMs are not strictly necessary, since  */
                          /* they are redundant given TKNs, but are    */
                          /* provided for compatibility with other     */
                          /* components, e.g., feedback.               */
  			  /* Allocated and set via qtinit().	       */
                          /* qterms added by dld to allow for          */
                          /* highlighting                              */
  TKNLIST* tknlist;	  /* A list of structures representing the     */
                          /* original                                  */
} RESULT;

Functions

These functions are located in DISTRIBUTION_ROOT/prise_search/src/lib/search/search0.c.

int init_search0
	(	     
  	char*  coll_name	Collection
  	char*  coll_dir		Collection directory	     
  	COLL** coll    		Structure representing collection components
  	RESULT** result		Search results.			        	
	)

	Returns:
		RET_OK
  		RET_ERROR	Failed to initialize the collection
				Failed to malloc RESULT
				Failed to allocate result space
		RET_FAIL	Initonly option - show specs no retrieval

The interface function init_search0() must be called prior to the first call to search0.

int search0
	( 
	COLL*   coll 		Structure representing collection components
	QUERY*  query 		Query 
	RESULT* result		Search results
       	)

	Returns:
		RET_OK		Retrieved %d document(s)
		RET_ERROR	Error returned by search()
		RET_FAIL	No documents retrieved

The interface function search0() activates the PRISE search engine function, search() in DISTRIBUTION_ROOT/prise_search/src/lib/search/search.stem.neg.c.

int close_search0
	( 
	COLL*   coll		Structure representing collection components
	RESULT* result		Search results
	)

	Returns:
		RET_OK

References

Harman1: Relevance Feedback Revisited by Donna Harman.
15th Ann Int'l SIGIR'92/Denmark-6/92.
Okapi1: Okapi at TREC-3 by S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, M. Gatford
Overview of the Third Text REtrieval Conference (TREC-3)
NIST Special Publication 500-225, April 1995

National Institute of Standards and Technology Home Page

Last updated:

Date created: Monday, 31-Jul-00