Cheshire II Information Retrieval System

Cheshire II allows you to search a very large collection of documents for those that are relevant to a topic of interest to you. You do this by formulating a query composed of words that are key to that topic. When you ask Cheshire II to perform a search using your query, it finds the set of all documents which it believes are relevant to your query, or those which match a condition that you specify (e.g., the title must contain the word "cat"). This result set may be very large, but you need not look at all the documents in order to find those of interest to you. For topical searches, Cheshire II tries to list the documents with the ones most likely to be relevant closest to the top.

The relevance of a document to a query is predicted based on several factors, among them:

- how many of the query words also occur in the document,
- how many times each such word appears in the document,
- how often each query word appears in the collection as a whole,
- and how long the document is.

You can ask Cheshire II to display brief information about each document, or the full text of the original document. You can also mark documents of interest to be remembered/saved.

The following tutorial will lead you through some examples of the process of searching for, displaying, and remembering/saving documents using a collection of over 200,000 articles (article = document) from the Financial Times of London during the years 1991 - 1994. Follow the instructions for the user (marked with ">>") and note the system's response. We will answer any questions that persist, but may do so by pointing you to places in the tutorial.

Cheshire II is a research prototype that may occasionally fail or display an error message meant for system designers or users. If such an error message appears, simply click on the "OK" button to remove the message and continue.
If the system stops working altogether, let someone know and we will help you restart it. You cannot harm the system or the data in any way by using it. Help information is available and can be displayed by clicking on the "Help" menu item in the upper right corner of the main window.

To start the tutorial please turn to the next page.

Initially you will see a single window. The window has four sections: a menu bar, a section for entering a search, a window for seeing documents retrieved by your search, and a section for performing actions on the documents you've retrieved. At the bottom of the last section is a pair of status indicators: the scale on the left shows how many documents your search located, and how many have been downloaded to your client for display. The small, empty field to the right will display status messages from the client when it is engaged in a task.

>> Use the mouse to click on the button on the menu bar labeled "START". This will cause a list of hosts you can search to appear. Click on the list item for the host named "TREC". The status field will blink the message "Connecting" briefly, then the search entry portion of the window will become active.

>> Click on the button in the search area labeled "Ranking?". This will cause another list to appear. Click on the first list item, "By Record". Then click in the text area to the right of the "Ranking?" button. The frame of the "Ranking?" field gets dark and a blinking vertical bar - the text cursor - appears in the field to indicate where the next letter you type will appear. You can only enter text at the blinking vertical bar cursor.

>> Assume you are interested in answering the following question: "What drugs have been used to treat asthma?" You want to find and remember documents which, taken together, mention as many different asthma drugs as possible.
If you save several documents that talk about some of the same drugs, it doesn't matter, as long as the total set of documents you remember/save covers as many asthma drugs as possible.

>> Type the following words in the Ranking text entry field:

    drugs for the treatment of asthma

If you make a mistake, you can use the backspace key to undo your error and start again. Moving the I-beam cursor to the position you want and clicking will change the location of the text cursor.

>> Click on the "SEARCH" button. The status field will blink "Searching" briefly, then "Retrieving". At this point, Cheshire II has created a list of documents that it believes are relevant to your query. This list is called a result set. The scale at the bottom left will show how many documents are contained in that result set, and how many of those have been downloaded for display. The documents which have been downloaded for display will appear automatically in the text display area.

>> Locate each piece of information discussed below for the first title in the document display area:

- The rank of the document in the current result set (the number displayed at the beginning of each displayed record next to the "Select" button): 1
- A document identifier: FT941-10709
- The date of the article/document: 08 FEB 94
- The title of the article/document: UK Company News: Glaxo asthma drug wins US approval

>> Click on the button labeled "Format" on the button bar, and then click on "Long". The display of the documents will alter, so that the full text of each document is displayed in addition to the title and other bibliographic information.

>> Notice that some words in the documents appear black against a grey background. Why? (These are words from the query which were also found in the document.) They may be helpful in finding relevant parts of a document.

The document display area supports paging up or down through the various records by using the scroll bar on the right of the document display.
If you scroll all the way to the end of the currently displayed group of records, Cheshire II will automatically retrieve the next group of records from this result set and append them to the current display.

>> Click once in the trough of the scrollbar below the scroll button to move down in the document display.

>> Click once in the trough of the scrollbar above the scroll button to move back in the other direction.

>> Read the document to determine if it mentions asthma drugs. It mentions two different asthma drugs: a new drug called Serevent, and an older drug called Ventolin. It seems this is a document you should ask Cheshire II to remember.

>> Scroll backwards to the beginning of this document's information. Click on the button labeled "Select" at the beginning of the document. Then click on the button below the document display area labeled "Save". A dialog box will appear asking you if you want to save all of the currently displayed records, or only the selected ones. Click on the button to indicate that only selected records should be saved. The dialog box will disappear, and the button below the document display labeled "View Saved" will become active (the text of the button will switch from grey to black). This indicates that you have saved some records, which you may view if you choose by clicking on the "View Saved" button.

NOTE: Cheshire II will forget all saved documents if you exit Cheshire II or click on the "Clear" button, which will appear if you view the saved documents. It will also forget the saved documents if you leave Cheshire II unattended for more than 10 minutes.

At this point you have tried the basic functions in Cheshire II. Most of the other buttons are labeled so as to make their purpose clear.

To find and save documents naming more asthma drugs, you could read more of the documents in the current result set and ask Cheshire II to remember the ones that mention additional drugs.
Alternatively, you could modify your query and click on "SEARCH" again to find more documents to be reviewed and possibly remembered. Adding words to the query will usually increase the number of documents Cheshire II finds relevant to your query and may change the composition and/or order of the documents which Cheshire II presents to you.

>> Try adding the word "ventolin" to the query and see if the total number of documents found changes and/or the first group of documents displayed changes. Ventolin is the name of an asthma drug mentioned by several of the articles identified by the first query.

>> Now suppose that you wanted to restrict your search to articles which focused primarily on the products of the Glaxo company. You might assume that documents which contain the word "Glaxo" in their title are more likely to focus on the Glaxo company than those which do not. Click on one of the buttons in the search area labeled "Index?" to produce a list of document indexes you can search, then click on the word "Headline" to specify that you want to include a headline component in your search. Click in the text entry field to the right of the "Index?" button, and type "Glaxo". Then click on the "SEARCH" button once more. Notice that your result set has become considerably smaller. While the documents in your result set are still ranked based on the terms you entered in the "Ranking" text field, only those records from your original ranked search which contain the word "Glaxo" in their headline are in this new result set.

>> Now suppose that you decide that the first document in your retrieved set is a perfect example of the type of document you're looking for. Click on the "Select" button next to the number "1" on the first record. Then click on the "More like Selected" button underneath the document display. A dialog box will appear asking if you really want to search for similar records. Click on the "Yes" button.
A search for similar records can be time consuming, taking between 30 seconds and a minute. However, its benefits can outweigh the time taken to perform such a search. A search for similar records will find all records in the database which contain any of the terms in the document you initially selected, and will then attempt to rank them by their similarity to that initial document. When the search is concluded, check the size of the total result set. You will notice it has become quite large. Fortunately, you don't have to look through all of these records. The ones most likely to be useful to you should be listed first.

This concludes the tutorial.
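
For readers curious about the idea behind the search for similar records, the following is a minimal Python sketch of the two steps described above: collect every record that shares at least one term with the selected record, then rank those records by similarity to it. It is an illustration only. The tutorial does not say how Cheshire II measures similarity, so cosine similarity over raw word counts is used here as a stand-in, and the function names, document identifiers, and toy headlines (other than the first one) are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_selected(selected_id, collection):
    """Rank records that share any term with the selected record.

    Hypothetical helper, not part of Cheshire II. Returns a list of
    (doc_id, score) pairs, most similar first.
    """
    selected = Counter(tokenize(collection[selected_id]))
    results = []
    for doc_id, text in collection.items():
        if doc_id == selected_id:
            continue
        vector = Counter(tokenize(text))
        # Step 1: keep only records with at least one term in common.
        if not (selected.keys() & vector.keys()):
            continue
        # Step 2: score the record by its similarity to the selected one.
        results.append((doc_id, cosine(selected, vector)))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Toy collection; identifiers and all headlines except doc1 are made up.
collection = {
    "doc1": "Glaxo asthma drug wins US approval",
    "doc2": "Glaxo launches new asthma drug",
    "doc3": "Asthma cases rise in London",
    "doc4": "Bank profits fall sharply",
}
ranked = more_like_selected("doc1", collection)
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```

In this toy collection, doc2 shares three words with the selected record and is listed first, doc3 shares one word and is listed second, and doc4 shares none and is left out. Because a single shared term is enough to qualify a record, the set of candidates grows quickly in a real collection, which is consistent with the large result set noted above.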