User evaluation of automatically generated semantic hypertext links in a heavily used procedural manual

John Tebbutt

The National Institute of Standards and Technology,

Gaithersburg, MD 20899

Abstract

This paper is an interim report on our efforts at NIST to construct an information discovery tool through the fusion of Hypertext and Information Retrieval (IR) technologies. The tool works by parsing a contiguous document base into smaller documents and inserting semantic links between these documents using document-document similarity measures based on IR techniques. The focus of the paper is a case study in which domain experts evaluate the utility of the tool in the performance of information discovery tasks on a large, dynamic procedural manual. The results of the case study are discussed, and their implications for the design of large-scale automatic hypertext generation systems are described.

Introduction

This paper is an interim report on our efforts to build an information discovery tool to aid in the interrogation, analysis and modification of a large, online procedural manual. The tool uses Information Retrieval (IR) technology to process the text of the manual, augmenting it with semantic hypertext links based on document-document comparison. The manual text is marked up using HTML (Raggett, 1997), adding the potential for access via the World Wide Web (Berners-Lee et al., 1992), as well as locally, using any of a variety of increasingly sophisticated browsers.

We first introduce the problem domain and give an overview of the tool and its mode of operation. However, our main emphasis is to present the results of a case study which was undertaken in order to assess user reactions to the novel presentation of the manual information and thus refine our approach in developing the software.

The case study consisted of in-depth interviews with domain experts and detailed monitoring of their behavior while using the augmented manual. Our objective was to observe and record user reactions to the semantic links, the perceived relevance and usefulness of the links, and, most importantly, whether the users felt that the links enhanced their ability to perform information discovery tasks on the manual text.

The problem domain

Our work in this field grew out of a series of earlier projects on behalf of the United States Social Security Administration (SSA), in which IR or IR-related techniques had been used as the basis of prototype tools with the potential to aid SSA staff with various information management problems.

The work reported on here forms part of our ongoing research into the creation of a tool to enhance information discovery and management potential for users of the SSA's Program Operations Manual System (POMS). The POMS is a moderately large procedural manual (currently approx. 100MB, or 41,000 pages) which is consulted by Agency employees on a frequent and routine basis for a variety of information discovery and management tasks, including:

informational searches to determine correct procedures relating to a particular benefit claim or set of circumstances;
retrieval of standardized text or form letters for communication with claimants and others;
insertion, deletion or modification of material in response to legislation, court rulings, changes in policy or administrative procedure, or other miscellaneous editorial changes.

The last of these tasks usually involves searching the manual for duplicate, contradictory, or related material in an effort to maintain consistency within the POMS document base. This can become a laborious and time-consuming process, even in the presence of scrupulous manual cataloging, a resource that was not available in this instance.

The master copy of the manual resides at SSA headquarters, and is in a continual state of revision and growth, as new laws, court rulings, policy changes and the like are manually incorporated into it by human writers and editors, as referred to above. A snapshot of this master copy is distributed monthly on CDROM to SSA field offices throughout the U.S., with each month's copy superseding that of the previous month.

The manual has a fairly strict, hierarchical document structure, with six levels: Part, Chapter, Subchapter, Section, Subsection, and Subsubsection. Text segments at the various levels of the hierarchy are systematically labeled to reflect their position in the hierarchy. Table 1 gives as an example a breakdown of the hierarchical path to Subsubsection DI 10505.010.A.3. This hierarchy resembles that described by Frisse (1988), though the depth of subpart dependency is not as great.

LEVEL	CODE	TITLE
Part	DI, Part 4	Disability
Chapter	DI 10500.000	SUBSTANTIAL GAINFUL ACTIVITY (SGA)
Subchapter	DI 10505.000	EVALUATION AND DEVELOPMENT OF EMPLOYMENT
Section	DI 10505.010	Determining "Countable Earnings"
Subsection	A.	SUBSIDIES
Subsubsection	3.	Nonspecific Subsidy

Table 1. Hierarchical structure of the POMS manual segments
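The labeling scheme in Table 1 can be illustrated with a short sketch that expands a Subsubsection label into its hierarchical path. The label pattern and the rule for deriving the Chapter and Subchapter codes are assumptions inferred from the single example in Table 1, not a documented POMS specification, and the function name is hypothetical.

```python
import re

# Hypothetical label pattern, inferred only from the example "DI 10505.010.A.3".
LABEL_RE = re.compile(r"^([A-Z]{2}) (\d{5})\.(\d{3})(?:\.([A-Z]))?(?:\.(\d+))?$")


def hierarchy_path(label):
    """Return (level, code) pairs from Part down to the deepest level in the label."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    part, number, section, subsection, subsubsection = match.groups()
    path = [
        ("Part", part),
        # Assumption: the Chapter code keeps the first three digits and zeroes the rest.
        ("Chapter", f"{part} {number[:3]}00.000"),
        # Assumption: the Subchapter code keeps all five digits and zeroes the suffix.
        ("Subchapter", f"{part} {number}.000"),
        ("Section", f"{part} {number}.{section}"),
    ]
    if subsection:
        path.append(("Subsection", subsection + "."))
    if subsubsection:
        path.append(("Subsubsection", subsubsection + "."))
    return path


for level, code in hierarchy_path("DI 10505.010.A.3"):
    print(f"{level}\t{code}")
```

Under these assumptions the sketch reproduces the CODE column of Table 1 for the example label; labels that follow a different numbering convention would need a different rule.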

The text of the CDROM distribution is marked up in a proprietary format resembling SGML, and is designed to be used in conjunction with a browser which is included with the distribution and which incorporates a Boolean search engine. This version of the POMS is, in essence, a hypertext document, since it includes structural links (e.g. a link from a table of contents to a Section) and referential links (e.g. a link from a citation to the cited text).

In summary, the POMS consists of a large, text-based manual accessed via a fairly standard Boolean search interface. At any given time, moreover, it exists in two versions: a snapshot of the master copy on CDROM in the SSA field offices, which is stable on a monthly basis; and the master copy itself, at SSA headquarters, which is under frequent revision.

Fundamentals: Why hypertext and IR?

The combination of hypertext and information retrieval technology seems particularly well suited to the kinds of tasks associated with information discovery. Both technologies possess the indisputably positive attribute of being easy to use and learn: with the ever more popular World Wide Web, the casual use and understanding of hypertext has become widespread, while the natural language interface to IR search engines enables users to formulate complex queries with little or no training.

A study by Dimitroff and Wolfram (1995) testifies to the ease of use of hypertext. They conducted a usability evaluation study of a hypertext-based information retrieval system which found little or no variation in the duration of search between experienced and novice users. The researchers concluded that such systems "can be successfully searched by individuals with a wide range of experience". This was of particular interest to us, since one of our objectives was the production of a tool which is easily used and has a rapid learning curve. Baron et al. (1996) describe a study in which the efficacy of typed or labeled hypertext links in aiding information discovery was assessed against that of both unlabeled links and the absence of links. Their conclusions indicate that labeled links produced significant gains in what they term query performance (i.e. speed of information discovery), when compared to either of the other link categories. This conclusion supports that of Dimitroff and Wolfram and extends it to indicate that typed or labeled links are of most use. Our proposed design included the production of a document base with a single semantic link type. In order to maximize the utility of these links, we decided to group them together under a title describing the type of the links.

The appeal to us of IR technology lay not only in the natural language-style user interface, but also in its potential use in the automatic creation and updating of hypertext document webs. Agosti (1997) sums up a pervasive theme in the field of automatic hypertext construction: "Since the most difficult part of the automatic construction of a hypertext is building up the links that connect semantically related documents..., it is natural to concentrate on IR techniques, that have always dealt with the construction of relationships dependent on the mutual relevance of objects...". Significant efforts in the automatic linking of documents or of passages within documents (for example Agosti et al. (1996), Allan (1997), Salton et al. (1996)) have demonstrated that IR technology can readily be applied in the construction of semantic links between one body of text and another. It seemed that this approach would work well in our case, as our envisioned end product would be a fairly dynamic, internally linked document web which would be automatically recreated on a regular basis to reflect changes to the base document collection (the "master copy" referred to above). We decided that IR technology might easily be integrated into such a process, and we had a tried and tested IR search engine to hand in NIST's PRISE system (Harman and Candela, 1990).

Thus we envisioned a tool with three key components:

A user interface to a natural language search engine;
A user interface to a hypertext browser; and
A module to automatically create and update the document base in the form of a semantic web, using IR technology.

Creating and maintaining a semantic hypertext

Previously, we had focused our attention solely on the automatic creation of a static hypertext from a collection of autonomous documents, such as those contained in the TREC collections (e.g. Harman, 1996). However, the availability of the POMS gave us the opportunity to apply our model to a dynamic document base, in addition to a ready source of domain expertise which we could tap in order to evaluate our approach.

We hoped to use the POMS as the basis of a prototype online, dynamic, adaptive manual system.

Our objectives were fourfold:

To create a semantic web based on the POMS, thereby providing a browsing capability within subject areas and enabling searchers quickly to move between similar documents;

To provide a flexible, automatic maintenance facility for the semantic web;
To present the material in a clear, text format, while enabling users to navigate quickly and easily around the manual;
To add the potential to access the prototype document base over the Web, giving potential users instant access to the most current edition.

Users would require some sort of entrance point into the prototype hypertext, one which was as intuitive and uncomplicated to use as we hoped our augmented hypertext would be. We decided to use a Web-based interface to PRISE as a gateway into the document web. This way, awkward or complicated interfaces involving Section numbers and so on could be avoided, and we conjectured that this interface would be more flexible than, for example, a hypertext-based table of contents. We also agreed with Thistlethwaite (1997) that the original copy of the document base should not be marked up (to avoid issues such as alienation of the authors and concomitant difficulties in updating, etc.). A separate copy of the semantic web would thus be automatically created from the original on each occasion our tool was run.

Addition of semantic links: the LEIDIR system

The following is a brief description of the mechanics of the prototype system we developed in an attempt to satisfy the requirements discussed above. We call the system LEIDIR (Link-Enhanced Information Discovery using Information Retrieval). This description relates specifically to its use with the POMS manual.

The POMS was augmented using NIST's prototype LEIDIR software, which is designed to create a web of documents from documents or document collections of arbitrary size and format. This is achieved by inserting into each document semantic links to the n most similar documents in the collection, where n is a user-configurable value. In this study, n was set to 5: we wanted to provide a variety of target documents while maintaining a manageably-sized list of links. We refer to the inserted links as semantic links because they are intended to convey a relationship between the content or subject matter of the linked documents. According to Allan's (1997) link taxonomy, the links we employ are equivalence links, because they link documents with a similar subject matter (semantic links of other types are for future study). The resultant documents are formatted in HTML, making them accessible via any World Wide Web browser and thereby removing the need for specialized browsers to view the text.
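The idea of linking each document to its n most similar neighbors can be sketched as follows. This is a minimal illustration only: it uses cosine similarity over raw term counts, which is an assumption standing in for the ranking actually produced by the search engine, and it assumes a document is never linked to itself. The function names are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (whitespace tokenization, lowercased)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_n_links(docs, n=5):
    """Map each document id to its n most similar other documents, with scores."""
    vectors = {doc_id: term_vector(text) for doc_id, text in docs.items()}
    links = {}
    for doc_id, vec in vectors.items():
        scored = [
            (cosine(vec, other_vec), other_id)
            for other_id, other_vec in vectors.items()
            if other_id != doc_id  # assumption: no self-links
        ]
        # Highest score first; ties broken by document id for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        links[doc_id] = [(other_id, score) for score, other_id in scored[:n]]
    return links


docs = {
    "a": "black lung claim appeal",
    "b": "black lung claim folder",
    "c": "form letter paragraphs",
}
print(top_n_links(docs, n=1))
```

Comparing every document with every other is quadratic in the size of the collection; a production system would rely on the search engine's index rather than on pairwise comparison as shown here.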

The system is designed with the goal of providing automatic maintenance of the semantic web as necessary. At the moment this requires rebuilding the web and is thus limited to an overnight run, but with incremental indexing and refinement of software and hardware components, the web could be maintained in real time.

LEIDIR has four principal software components:

a variable preprocessor module;
an invariant main processor module;
a search engine, and
a browser capable of displaying HTML-formatted documents.

The preprocessor module is used to break up the initial document collection into a second generation collection of documents of a manageable size, and to format these documents into HTML, if necessary. The notion of manageable size reflects several important factors:

the documents should be of a size that is optimal for users - this depends on the user community in question;
the documents should be of a size that is acceptable as a query to the search engine;
the documents should be self-contained, i.e. the user should not need to refer to other documents in order to put a document in context.

Based on our limited familiarity with the POMS, it appeared that the Section was the principal content-bearing unit of the manual.¹ It also seemed that parsing at the Section level would satisfy the three criteria outlined above, and the POMS was parsed accordingly, yielding a collection in which the median number of terms per document, after stoplist word removal, was 123, with a minimum of 4 and a maximum of 20,750 terms per document. For the sake of expediency, the input documents were simply parsed at each document header representing a Section or greater level in the document hierarchy. In doing this, we circumvented the need to develop optimal starting points for browsing (Frisse, 1988) as all pertinent descendant documents of a topic would be incorporated into their ancestor, to use Frisse's terminology. The preprocessor generally needs to be custom-coded for a class of documents, since the properties of document classes vary so widely. Thus our preprocessor module was tailored to the POMS document structure.
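A preprocessor of this kind might be sketched as below. The header pattern (two capital letters, a space, five digits, a dot and three digits at the start of a line) is an assumption based on the Section codes shown in this paper, and the function name is hypothetical; the real source markup is proprietary and is not reproduced here. Lines below a header, including Subsection and Subsubsection text, are folded into the enclosing document.

```python
import re

# Hypothetical header pattern based on codes such as "DI 12005.175".
HEADER_RE = re.compile(r"^([A-Z]{2} \d{5}\.\d{3})\b(.*)$")


def split_into_sections(manual_text):
    """Split manual text into one document per header line."""
    sections = []
    current = None
    for line in manual_text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            # A new header starts a new second generation document.
            current = {"code": match.group(1), "title": match.group(2).strip(), "body": []}
            sections.append(current)
        elif current is not None:
            # Everything below the header belongs to that document.
            current["body"].append(line)
    return [
        {"code": s["code"], "title": s["title"], "body": "\n".join(s["body"]).strip()}
        for s in sections
    ]


sample = (
    "DI 12005.175 Black Lung Procedures\n"
    "A. General\n"
    "Some text (see DI 11045.001).\n"
    "NL 00709.100 Black Lung Paragraphs\n"
    "Paragraph text."
)
for section in split_into_sections(sample):
    print(section["code"], "|", section["title"])
```

Because the pattern is anchored at the start of a line, a cross-reference that appears in running text does not start a new document.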

The main processor module is the heart of LEIDIR, and yet, for the time being, its operation remains relatively straightforward: each second generation document is passed to the search engine as a query over the entire second generation collection. HTML links to the top n-ranked (see above) documents are then inserted at the end of the query document, thus creating a document collection in which each document is linked to n others. The inserted links are preceded by a short text label which identifies their origin and function. Along the way, a simplistic calculation is employed to assign a Percentage Similarity Measure (PSM) to each inserted link: the idea behind this measure is to give a rough estimate of the similarity between the document containing the link and the target document of the link, thus aiding the user in making the decision as to whether to follow the link. Finally, a structural link to the search engine interface is inserted at the end of each document.
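The link insertion step can be illustrated with the following sketch, which appends a labeled block of links and a structural link back to the search engine to the end of a document. The PSM calculation is not specified in this paper, so the sketch simply assumes a similarity score between 0 and 1 and rounds it to a percentage; the file naming scheme, the `search_url` default and the function names are likewise hypothetical.

```python
from html import escape


def psm(score):
    """Assumed PSM: a 0..1 similarity score clamped and rounded to a percentage."""
    return round(100 * max(0.0, min(1.0, score)))


def render_link_block(ranked, search_url="/search"):
    """Build the HTML link block for (code, title, score) triples in rank order."""
    lines = [
        f"<p>Automatically generated links to {len(ranked)} related documents "
        "[with % similarity measure]:</p>"
    ]
    for code, title, score in ranked:
        # Hypothetical naming scheme: one HTML file per document code.
        href = escape(code.replace(" ", "_") + ".html", quote=True)
        lines.append(
            f'<p><a href="{href}">{escape(code)} {escape(title)}</a> [{psm(score)}%]</p>'
        )
    # Structural link back to the search engine interface.
    lines.append(
        f'<p><a href="{escape(search_url, quote=True)}">Return to PRISE Search Engine</a></p>'
    )
    return "\n".join(lines)


def append_links(document_html, ranked, search_url="/search"):
    """Insert the link block at the end of the query document."""
    return document_html.rstrip() + "\n" + render_link_block(ranked, search_url)


ranked = [
    ("AO 10010.099", "May, 1995", 0.82),
    ("NL 00709.100", "Black Lung Paragraphs", 0.81),
]
print(append_links("<h1>DI 12005.175 Black Lung Procedures</h1>", ranked))
```

Keeping the rendering separate from the ranking means the position and format of the link block can be changed without touching the similarity computation.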

Figure 1 shows the general appearance of a completed document, with links.

DI 12005.175 Black Lung Procedures

A. General -- Claimant Wishes to Appeal a Black Lung Decision

Although SSA's role in the black lung program has diminished considerably, we are still responsible for maintaining the Part B beneficiary rolls, certain new Part B survivor claims, and some appealed black lung claims which are still in the pipeline. The Department of Labor (DOL) is responsible for all Part C claims (see DI 11045.001 and DI 11045.190 ) including those which were reviewed by DOL and--or SSA under the 1977 amendments to the Federal Coal Mine Health and Safety Act.

. . .

B. Request for Reconsideration

ODO has responsibility for both the disability and nondisability aspects of Part B black lung claims. Therefore, if a claimant files a request for reconsideration and the black lung folder is not needed by the DO, forward the request for reconsideration and any evidence the claimant wishes to submit in connection with his appeal directly to ODO following the procedures used in title II disability claims. Be sure to show that the request for reconsideration pertains to a black lung claim. If the DO needs the black lung folder in connection with any inquiry or request for reconsideration, it should be requested following the same guides used in requesting title II DIB folders. Again be sure to show that the request pertains to a black lung folder.

Automatically generated links to 5 related documents [with % similarity measure]:

AO 10010.099 May, 1995 [82%]

NL 00709.100 Black Lung Paragraphs [81%]

AO 10010.113 MARCH, 1994 [74%]

NL 00708.100 Numbered Paragraphs [68%]

AO 10010.101 MARCH, 1995 [66%]

Return to PRISE Search Engine

Figure 1. A typical POMS Section page augmented with semantic links (abridged).

The search engine we employed was NIST's experimental PRISE system, a statistically-based, ranking information retrieval engine. The search engine serves two vital functions. First, it is essential in the creation of the semantically linked document web, as described above: the ranked output from the search engine determines which semantic links will be inserted between documents. Second, it serves as an entry point into the document web. The user initially submits a text query to the search engine, and is returned links to a number of documents in the collection, ranked in order of relevance. Following one of these links takes the user to a document, and the user is then free to browse within the web using the semantic links included in each document.

The HTML browser is essential for viewing the document web, and the choice of browser can markedly affect the utility of the system. Such features as a search utility, a good history mechanism, bookmarking facility, and display configuration were all found to be of assistance to the users in this study.

Method

Participants

Because of the interim nature of this study, we felt it would be sufficient to gather together a relatively small group of POMS domain experts to consult with in depth: LEIDIR was not yet sufficiently mature to merit the type of rigorous and extensive studies one might envision for the future. Nevertheless, we needed to gather as much detailed feedback as possible.

The participants in the study were SSA staff with a minimum of three years' experience using the POMS on a daily basis. Of the six volunteers who took part, three drew mainly upon their experience as field officers, two on their experience as POMS editors and instructors, and one as a POMS writer. All were very knowledgeable about the POMS, to the degree that they could easily tell the subject area of a Section by looking at the Section number.²

Prior to this study, the principal mode of interaction with the POMS for all participants had been through the CDROM editions and the associated Boolean search engine. About half of the participants had used a modified version of this software in order to carry out editing tasks, as opposed to simple lookup.

Study Sessions

The study was split into four sessions, each of which included one or two participants. Each session began with a brief introduction to LEIDIR and the augmented POMS, and the goals of the study, which were stated thus:

"Having created a fairly robust software prototype, I'm trying to get some feedback from experienced POMS users as to how useful and usable LEIDIR is as an information discovery tool for the POMS. I'm especially interested in a qualitative assessment of the links within each document, as opposed to the search engine interface, but I encourage you to use the system in the way in which you feel most comfortable."

There followed a brief tutorial on how to access the augmented POMS through the PRISE Web interface, and how to navigate around the document base using the semantic links. The tutorial included a "dry run" session in which each volunteer was invited to supply a sample query. The participants were asked to regard the system, not as a replacement for the present system, but as an experimental alternative method of accessing the information in the POMS (in fact, this was impressed on them several times during the course of the study).

The fourth element of the session lasted about an hour. The participants were asked to use the system as best they could to gather information about items or tasks they might typically attempt to find in the course of their regular work (indeed, the usual starting point for most participants was to move directly to an inquiry or subject area they were working on at the time). Participants were encouraged to describe and discuss their actions and motivations at every stage, as well as any other comments or thoughts they might have, as openly as possible. The participants acted independently during this element, the researcher restricting his actions to observations and technical assistance with the system.

The last part of each session consisted of a fourteen-part evaluation questionnaire in which participants were asked to give specific thoughts on selected aspects of the system.

The fourth and fifth elements of each session were recorded on audio tape for later analysis, with the permission of the participants. Each session yielded about one hour and ten minutes of audio tape.

System specifics

The augmented document base resided on a Sun UltraSparc system at NIST in Gaithersburg, MD and was accessed by the participants via the World Wide Web from SSA headquarters in Baltimore, MD, using Netscape 2.0 running on a PC box.

Results

As stated above, the intent of this study was to obtain qualitative feedback on the usefulness of the semantic links. This section attempts to summarize the feedback obtained around several issues we felt to be key. Implications of this feedback, both for our prototype and in a wider context, will be discussed in the next section.

Advantages conferred by the links over the search engine alone

Reactions spanned the spectrum from very positive, through confusion, to negative. Generally, the less experienced users were more enthusiastic about the links.

On the positive side, the links were seen as enhancing navigation through the document space, enabling the user to home in on a subject area, and getting the user on track with a particular topic. A contrasting view was that the links helped widen the user's information discovery options.

The predominant view was that the presence of the links significantly decreased the search time required to retrieve a given information item, as compared to the search engine alone. Reasons cited for this included the continual back-stepping to the search results screen, and the reformulation of search topics, that the search engine alone required in order to browse a variety of documents. Browsing via the links was seen as an aid to steering the user in the right direction, especially in cases where there was no clear vision of what information was required.

One user felt the presence of the links made no difference in the utility of the system.

Appearance and positioning of the links

There was strong support for moving the links to the beginning of the document, immediately after the document title. Reasons cited for this centered around removing the need to scroll to the end of the document in order to view the links, especially in cases where the document could immediately be identified as irrelevant from its title (but presumably some similar document might be of use).

There was also support for the insertion of a structural link back to the search engine at the top of each document, as well as at the bottom (and points in between, for long documents), and for structural links to the previous and next document (in the order they occur in the POMS) at the top and bottom of each document.

The participants were asked whether they would find the links more useful if they were further typed, specifically according to the location (POMS Part) of the target document, and, further, if this type was indicated by displaying the links in different colors. None of the participants responded favorably to this suggestion, saying that the fact that the link titles showed the Section number and title of the target document was sufficient.

Suggested improvements included a better explanation of the Percentage Similarity Measure and a plea for the addition of referential links.

Percentage Similarity Measures (PSMs)

Three camps formed around the utility of the PSM. The first considered the PSM irrelevant at best, on the grounds that the document identifier was a much more important measure of relevance. Also, if one is looking at an irrelevant document, the ability to judge the similarity of the included links would seem superfluous.

The second camp did not trust the figures, feeling that they had been "misled" by high PSMs into looking at target documents with very little connection with the source document, or, mysteriously, that just because a document appeared similar to the source document did not mean that it was similar. This latter view seems more a product of gut feeling than empirical experience.

The third camp was very positive about the PSMs, and indeed suggested that the value of the PSM be used as a threshold for inclusion of links, thus introducing a variable number of links per document, an idea which is under consideration for the next phase of the LEIDIR prototype.

Accuracy of the links

Some participants felt that the semantic links were very accurate, though they did not offer any evidence or comment in support of their assertions. Others felt that the links were not accurate, and cited by way of example a particular first level source document (POMS Part) in which many passages are couched in very similar language, thus increasing the likelihood that many of these passages may be linked together while not actually related in any meaningful way.

Number of links

The general consensus was that the five semantic links included in each document for the study were a minimum and perhaps too few, given the size of the source document collection. Again, the suggestion was made that the number of links be governed by their relevance, with a minimum of five or so. Interestingly, this position was espoused even by those who had previously expressed mistrust of or indifference towards the links.

A strenuous pitch was made for the addition of referential links, i.e. the insertion of hypertext links at existing cross-references between documents. Such references were seen as a crucial class of link between different Parts of the POMS - a class of link which could not be replaced by the semantic links installed by LEIDIR.

Ease of use of the system

On the whole, the participants found the system easy to understand and straightforward to use. The main point of confusion which came up was the Percentage Similarity Measure: it was not clear without further explanation what it was that the linked documents were similar to, the initial query or the current document.⁴

Suggested improvements

We asked the participants for their advice in improving the system. Specifically, what features would they like to see in the system which were not currently present? In order of preference, the features most desired which were not available in the study prototype were:

The ability to use numbers as search terms (e.g. Section numbers, dates, other references);
The inclusion of referential links, i.e. the hypertext linking of cross-references in the text;
The inclusion of structural links, such as from existing tables of contents, or forward and back pointers between Sections;
The highlighting of search terms in the text;
The ability to search for phrases, using quotes;
Direct access to POMS Sections by Section number.

These suggestions and their implications are discussed below under User Mental Models.

User adaptability

All the study participants adapted very well to the new system and had little difficulty processing information presented to them. All were familiar with the capabilities of the Web browser, using them to good effect, and all recognized and navigated the semantic links with ease. However, all participants at some stage seemed to reach some sort of cognitive barrier, beyond which they felt they could not go without resorting to features of the familiar Boolean search engine (see "Suggested improvements", above). The more experienced participants seemed to hit this mental roadblock sooner. We found that at this point, the participants might stop work, cast about for a solution, ask one or more questions pertaining to the availability of familiar search engine features, start a discussion as to the nature of the new tool, make recommendations, or perhaps discard the task and start afresh, such that it was fairly clear that some sort of impasse had been reached in the task at hand.

Discussion

General

This study has provided valuable feedback on the reactions of domain experts to the insertion of automatically generated semantic links into the POMS. While the feedback is clearly specific to the POMS collection and the LEIDIR tool, we believe much of it is also applicable in the general case. Comments regarding the usefulness of the links when the exact target document was unknown, or about confusion over the PSMs, the perceived accuracy of the links, or the ease of use of the system as a whole would seem to be collection and implementation independent.

Overall, the introduction of semantic links into the POMS document base was well received. In particular, the links were useful when the information requirement was ill-defined, i.e. when it was necessary to browse the document collection. The links were also useful in targeting a known document within a set of documents whose subject matter was well-defined - also a browsing task.

Percentage Similarity Measures and the concept of document similarity

We were puzzled by the lack of enthusiasm on the part of the study participants for the PSMs associated with the links, as these had been designed as an easy-to-read, easy-to-follow guide to the similarity between the current document and the document pointed to by the link. It appears that the listing of the links in rank order was sufficient, with the PSM serving only to confuse matters. This finding is of particular interest since many of the major Web search engines incorporate percentage scores into their results, similar to our PSMs.

The problems with link reliability (discussed below) also affected the perception of the PSMs, so that the participants were less willing to trust the figures.

Overall, there seemed to be some difficulty with the notion of document similarity. Participants were unsure of how two documents could be determined to be semantically similar; some believed document similarity to be a subjective concept. Coupled with the unreliability of some of the links, participants' faith in the concept was diminished.

Variable number of links per document

Some of the study participants questioned the value of having a static, predefined number of semantic links in each document, especially if the linked documents had a low PSM. Much more appealing would be the notion of including a link based on its PSM or other measure of similarity, with preset maximum and minimum values for the number of links to be included. Such a scheme might include a link only if its PSM (or other similarity measure) exceeded, say, 70%, subject to a minimum of 5 links and a maximum of 20.
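The selection rule suggested above can be written down as a short sketch. The threshold of 70% and the bounds of 5 and 20 are the illustrative values given in the text; the function name and the representation of links as (target, PSM) pairs are assumptions made for this example.

```python
def select_links(ranked, threshold=70, min_links=5, max_links=20):
    """Choose links by PSM threshold, bounded by a minimum and maximum count.

    `ranked` is a list of (target, psm) pairs with psm expressed as a percentage.
    """
    # Highest PSM first; sorted() is stable, so ties keep their input order.
    ranked = sorted(ranked, key=lambda pair: -pair[1])
    above = [pair for pair in ranked if pair[1] > threshold]
    # Keep every link above the threshold, but never fewer than min_links
    # (when that many candidates exist) and never more than max_links.
    count = min(max(min_links, len(above)), max_links)
    return ranked[:count]


candidates = [
    ("AO 10010.099", 82),
    ("NL 00709.100", 81),
    ("AO 10010.113", 74),
    ("NL 00708.100", 68),
    ("AO 10010.101", 66),
    ("hypothetical sixth document", 50),
]
print(select_links(candidates))
```

With the PSMs shown in Figure 1, only three links exceed 70%, so the minimum of five applies and the lower-scoring sixth candidate is dropped.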

This idea is very appealing and has obvious merits - the participants were often more concerned about what they were not seeing than with what they were. Such a formula for link insertion would go a long way to address such concerns.

Inclusion of structural and referential links

The inclusion of structural and referential links within the final document base was seen as highly desirable by many of the study participants, with a particular emphasis on referential links. Unfortunately it was not possible for us to present the users with a system embodying these link types at the time of the study. Work is underway independently to produce a hypertext version of the POMS containing only structural and referential links, which we will be able to use both as a baseline for future work and as input to LEIDIR for augmentation with semantic links.

User mental models

Many features of the study prototype combined to create a novel presentation of the POMS material which was substantially different to that to which the participants were accustomed. The documents were presented in a contemporary layout through a modern HTML browser, which contrasted with the tty-style of presentation the participants were used to. The search engine was natural language-based as opposed to simple Boolean, affording the opportunity to enter phrases as queries and returning a list of documents ranked in order of relevance. The semantic links enabled the participants to browse the collection, and the features of the browser, such as bookmarks, search tools and document history files, offered new ways to interact with the manual text.

The participant "wish list" of suggested improvements to the prototype interface is instructive in that the features the users most wanted to see added were precisely those features they were accustomed to using in the Boolean POMS search engine. In many instances the users clearly felt hamstrung by the absence of these features.

In return, our study was hamstrung, to a degree, by the users' reliance on familiar frames of reference and by their expertise, which was a quality we had sought but which turned out to be far greater than expected. Domain familiarity effectively nullified two of our principal areas of inquiry: link typing and the document similarity measure (PSM). Since the document numbers (POMS Section numbers) were displayed in the link titles, the participants required no additional information. Thus link typing, at least in the way we suggested, was dismissed completely, and the document similarity measure was welcomed as the basis for computing the number of links to be displayed, but not as a visual aid in the interface.

It may be that the user-related effects described here would be offset by a longer exposure time to the prototype. Our study was, in every case, very short (less than half a day). Given time, the more experienced users would have the opportunity to assimilate the workings of the prototype and the effect of domain expertise might diminish. This is an area for further study.

Unreliability of semantic links and PSMs

Some of the semantic links and their corresponding document similarity measures were perceived as inappropriately placed or unreliable. The root of this problem lies in an unforeseen interaction between a number of very long documents in the collection and the PRISE search engine, which had not been used with long (i.e. greater than sentence-length) queries prior to this study.

Since completion of the study, we have modified the PRISE software to incorporate query processing and a more effective length normalization function, Robertson et al's BM25 function (Robertson et al, 1995), and the problem has been eliminated.
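For readers unfamiliar with BM25, the following sketch shows the commonly published form of its document term weight and how it damps the advantage of long documents. It is not the PRISE implementation: the function name is hypothetical, and the parameter values k1 = 1.2 and b = 0.75 are conventional defaults assumed here, not values reported in this paper.

```python
import math


def bm25_term_weight(tf: int, doc_len: int, avg_doc_len: float,
                     n_docs: int, doc_freq: int,
                     k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25 weight of one term in one document (illustrative only)."""
    # Inverse document frequency in the Robertson/Sparck Jones form.
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: grows with doc_len relative to the average length.
    norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    # Term frequency saturates, so repeated occurrences add less and less.
    return idf * tf * (k1 + 1.0) / (tf + norm)
```

For a fixed term frequency, the weight falls as the document grows longer than average, and it can never exceed idf * (k1 + 1) however often the term occurs, which limits the pull of very long documents.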

Since this is entirely an implementation issue, it will not be treated further here except to say that, given the generally positive reaction to the links in the study, it is reasonable to suggest that better links would have drawn a more positive reaction still.

The problem is discussed in greater depth in Appendix 1 for the record.

Conclusions

This study set out to examine user reactions to automatically inserted semantic hypertext links in a large, familiar, document collection. The feedback we gathered confirmed much of what we had hoped regarding the usefulness of the semantic links, but opened up various issues we had not anticipated.

Our results confirm that it is possible, desirable and worthwhile to use Information Retrieval-based technology to create semantic document webs from existing large-scale document collections. Further, such webs can be dynamic in nature (though our study does not directly demonstrate this) and widely accessible through the World Wide Web.

Automatically-generated semantic links installed between the documents in such a web are generally reliable and provide a valuable browsing capability, allowing users to search around a general subject area and target a specific information item, or to start from a known item and search around it to build a broader information base. However, it appears that the value of this browsing capability decreases with the domain expertise of the user. This finding complements those of Dimitroff and Wolfram (1995) and Baron et al (1996), since it relates to the expertise of the user relative to a knowledge domain, while the earlier studies evaluated the expertise of the user in terms of experience with hypertext systems.

The inclusion of automatically-generated semantic links between documents in a collection speeds the location of information, when compared with the use of a search engine alone. This is in accordance with Dimitroff and Wolfram (1995) and Baron et al (1996).

While valuable in the information discovery process, semantic links alone are insufficient as the basis of an automatically-generated hypertext: structural and referential links should also be present, especially if the document collection is specialized and/or domain experts are likely users.

Ideally, the number of links included per document should be a function of the similarity of the target documents to the source document, as opposed to some fixed quota. The links should be incorporated in such a way as to be easily accessible from the head of the source document (perhaps in a separate window), and a link back to the search or entry page should be available at all times.

Our results indicated that neither visual typing of links nor the addition of percentage scores to indicate document similarity is of much benefit in the automatic creation of a hypertext. However, it may be that this phenomenon is an artifact of the experimental procedure, and that such features might be better appreciated by users after a longer exposure to the system.

Future Work

Improvement of the interface to LEIDIR is already underway: we aim to offer an interface through which the document source, its associated links and a link to the search engine are presented simultaneously but independently in separate areas of the browser window. We feel this should go a long way towards addressing the concerns of the study participants regarding the placement of the semantic links and the search engine link.

The inclusion of structural and referential links into the document base is a work in progress, as is the introduction of additional link types. User reaction to the latter is of great interest to us.

We intend to begin serious consideration of the use of some threshold, based on document similarity, for inclusion of semantic links into documents. This is another area which is of great interest to us.

In the longer term, we hope to repeat and extend this case study in order to determine whether LEIDIR is a more useful tool as a result of the modifications we have made, before going on to a more broadly based series of user evaluations.

Acknowledgements

I would like to extend my sincerest gratitude and appreciation to Donna Harman and Martin P. Smith for their inspiration and encouragement in the production of this report, as well as the staff of the Social Security Administration in Baltimore, Maryland, for their cooperation and assistance in the completion of the case study which this paper describes.

References

Agosti, M., Crestani, F., and Melucci, M. Design and Implementation of a tool for the automatic construction of hypertexts for information retrieval. Information Processing and Management, 32(4):459-476, 1996.

Agosti, M., Crestani, F., and Melucci, M. On the use of information retrieval techniques for the automatic construction of hypertext. Information Processing and Management, 33(2):133-144, 1997.

Allan, J. Building hypertext using information retrieval. Information Processing and Management, 33(2), 1997.

Baron, L., Tague-Sutcliffe, J., and Kinnucan, M.T. Labeled, typed links as cues when reading hypertext documents. Journal of the American Society for Information Science. 47(12):896-908, 1996.

Berners-Lee, T., Cailliau, R., Groff, J., and Pollermann, B. World Wide Web: The Information Universe. Electronic Networking: Research, Applications and Policy 1(2):74-82, 1992.

Dimitroff, A., and Wolfram, D. Searcher response in a hypertext-based bibliographic information retrieval system. Journal of the American Society for Information Science. 46(1):22-29, 1995.

Frisse, M. E. Searching for information in a hypertext medical handbook. Communications of the ACM, 31(7):880-886, 1988.

Harman, D., and Candela, G. Bringing natural language information retrieval out of the closet. SIGCHI Bulletin, 22(1):42-48, 1990.

Harman, D.K. (Editor) The Fourth Text REtrieval Conference (TREC-4). The National Institute of Standards and Technology Special Publication 500-236, 1996.

Raggett, D., HTML 3.2 Reference Specification, W3C Recommendation REC-html32, 14-Jan-1997, 1997.

Robertson, S.E., Walker, S., Jones, S., Hancock-Beaulieu, M.M., and Gatford, M. Okapi at TREC-3. In Harman, D.K. (Editor) The Third Text REtrieval Conference (TREC-3). The National Institute of Standards and Technology Special Publication 500-225, 1995.

Salton, G., Singhal, A., Buckley, C., and Mitra, M. Automatic Text Decomposition Using Text Segments and Text Themes. Proceedings of the Seventh ACM Conference on Hypertext, 1996.

Thistlethwaite, P., Automatic construction and management of large open webs. Information Processing and Management, 33(2):161-173, 1997.

Appendix 1: Disproportionate influence of very long documents

In several instances, the semantic links and/or their associated percentage similarity measures were unreliable or incorrect. For example, a participant might select the top ranked link, with a PSM of 85%, only to be presented with a document which, to all intents and purposes, has little or no connection with the original document (in fact all of the links shown in Figure 1 fall into this class, quite serendipitously).

It turns out that this problem revolves around a small number of documents, the size of which is orders of magnitude greater than that of the majority. These were not broken up by our preprocessor module because of the parsing rule that each document must stand alone and make sense. Thus, the preprocessor has no length restrictions built into the documents it creates - it relies instead on each POMS Section to fall within a fairly narrow length distribution. Indeed, this is the case for Sections in all but two of the twelve Parts of the POMS.

We experienced some fairly serious effects as a result. First, long documents were ranked artificially highly in response to users' initial queries to the search engine. Second, and more damaging for our purposes here, the document web created by LEIDIR was distorted such that the link distribution, instead of being fairly uniform throughout the collection, was skewed towards the long documents. As a result, many regular documents contained links to one or more long documents, which usually were not closely related in terms of content. Finally, following a link to a long document almost invariably led to a document with links only to other long documents, resulting in an information discovery "dead end".

It was the links to long documents which were felt to be inaccurate or unreliable by the study participants and, because such links were accompanied by a corresponding PSM, both the link and the PSM were perceived to be unreliable.

Figure 2 shows the total number of links to a document in the augmented POMS as a function of the document's length. The figure is included here to provide a context for the discussion, and depicts clearly the powerful attraction of the very long documents to semantic hypertext links. In fact the eight longest documents (length > 10,000) attracted 33,944 links out of a total of 138,380, or 24.53% of all links. The total number of documents in the collection was 27,676, and they ranged in length from 4 to 20,750 terms, with a median length of 123 terms.
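A measurement of this kind can be made directly from the link structure. The sketch below is illustrative only and is not part of LEIDIR: it assumes a hypothetical representation in which the web is a mapping from each source document to its list of link targets, alongside a mapping from each document to its length in terms.

```python
from collections import Counter
from typing import Dict, List


def inlink_counts(links: Dict[str, List[str]]) -> Counter:
    """Count how many links point at each document."""
    counts: Counter = Counter()
    for targets in links.values():
        counts.update(targets)
    return counts


def share_to_long_documents(links: Dict[str, List[str]],
                            lengths: Dict[str, int],
                            long_threshold: int = 10_000) -> float:
    """Fraction of all links whose target is longer than long_threshold terms."""
    counts = inlink_counts(links)
    total = sum(counts.values())
    to_long = sum(c for doc, c in counts.items()
                  if lengths[doc] > long_threshold)
    return to_long / total if total else 0.0


# The figures reported above: 33,944 of 138,380 links.
reported_share = 33_944 / 138_380
```

Here reported_share evaluates to roughly 0.2453, matching the 24.53% quoted in the text.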

The disproportionate influence of very long documents on both the search results returned to the study participants and on the structure of the document web created by LEIDIR pointed up two shortcomings in the software which we had failed to account for.

Figure 2. Total links per document as a function of document length in the POMS.

First, the PRISE system does not perform any query processing, other than stemming and the removal of stop words. Thus the term weight for a given term effectively increases by one for each occurrence of that term in the query. The term weighting scheme usually associated with the PRISE system uses the familiar "tf-IDF" type of formula:

Where:

N = the number of documents in the collection, and

n = the number of documents containing the term.

qf = the frequency of the term in the query.

However, this effect mushrooms when whole documents are used as queries, and its size is indeterminate, though clearly dependent on the length of the query document simply because of the number of terms it contains. We feel that the presence of this effect in the document retrieval process had the greater part in promoting the disproportionately high ranking of very long documents, together with the associated side effects. However, the inclusion of this measure should not be discounted, as the terms in the query document provide a valuable source of information, just as is the case with short queries. The effect should be mitigated, though, by factoring in the length of the query document. This is an area for future research.
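One possible way of factoring in the length of the query document is sketched below. It rests on several assumptions that are not taken from PRISE: the query-side weight is assumed to be qf multiplied by log(N/n), the function name and arguments are hypothetical, and the mitigation shown (dividing qf by the number of terms in the query) is only one candidate among many.

```python
import math
from collections import Counter
from typing import Dict, List


def query_term_weights(query_terms: List[str],
                       doc_freq: Dict[str, int],
                       n_docs: int,
                       normalise_by_length: bool = False) -> Dict[str, float]:
    """Assumed query weighting: qf * log(N / n), optionally length-normalised."""
    qf = Counter(query_terms)          # frequency of each term in the query
    q_len = len(query_terms)           # length of the query (document) in terms
    weights: Dict[str, float] = {}
    for term, freq in qf.items():
        n = doc_freq.get(term, 0)
        if n == 0:
            continue                   # term absent from the collection
        # Assumption: use relative rather than raw frequency when normalising.
        factor = freq / q_len if normalise_by_length else freq
        weights[term] = factor * math.log(n_docs / n)
    return weights
```

Under this sketch, a query document that repeats each of its terms fifty times receives raw weights fifty times larger than a two-word query on the same terms, whereas the length-normalised weights of the two queries are identical.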

The second shortcoming of LEIDIR that came to light as a result of the disproportionate influence of very long documents was insufficient document length normalization. Though we feel this to be a secondary feature to the presence of query term frequency in the weighting function, the size difference between very long documents and the majority of documents is too great to be accounted for by PRISE's length normalization function:

F =

length +

length = the total number of terms in the document once the common words or stoplist words have been removed.