From adv-search-list-request@lexis-nexis.com Sat Sep 30 10:42:15 1995
Organization: National Institute of Standards and Technology, Gaithersburg, MD
Date: Sat, 30 Sep 95 10:23:52 EDT
From: lwibberle@cas.org (Les Wibberley - CAS - ext. 2330)
Subject: Raw Notes from the 9/95 Advanced Query Group meeting
To: adv-search-list@lexis-nexis.com
Cc: lwibberle@cas.org, jlahm@cas.org, jskevingt@cas.org, lwibberle@cas.org,
        rledwith@cas.org, tbangert@cas.org, vnichols@cas.org,
        mmarquand@cas.org, jsjostrom@cas.org
Cc: thill@cas.org, janson@cas.org, jamoss@cas.org, mpiekenbr@cas.org,
        jdrobina@cas.org
Content-Length: 22474

Dear Advanced Query folks,

Here are my raw notes from the Advanced/Type 102 Query meeting this past
week in D.C.  Please note that these are very raw notes (accuracy not
guaranteed), and are not intended to serve as official meeting minutes.
Peter and/or other participants may wish to clarify/correct some of these
items, as necessary.

-Les.

Type 102/RLQ meeting
9/27/95 - 9/28/95

Peter: provided overview & background of RLQ effort.

(new draft of RLQ distributed, dated 9/22/95)

Q: is the result set resulting from the restriction accessible to the client?
A: not yet determined.   But could specify a result set as the restriction.
This hasn't come up yet.  Could return name of the result set in the
additionalsearchinfo structure (as an intermediate result set).

Ray: would like to be able to specify the restriction as an RPN query.  Need
to identify what the requirements are before specifying how to specify it.

Q: How would query reformulation happen?
A: Simplest: return the query in the form submitted for review, perhaps as an
external, like within additionalsearchinfo.  Ray didn't like this approach.
But the current searchResult-1 already contains the target's interpretation
of the query: subqueryInterpretation.  Since the type is query, this could
carry the type 102 query.

Q: use of ResourceControl where target prompts the user for reformulation?
The user seeing the results of query formulation should be optional.
The subqueryInterpretation could be carried in either ResourceControl or
SearchResponse.

Q: does the target maintain state of the query formulation?  The protocol
definition should neither assume nor preclude either a stateful or a
stateless implementation.  The user may reformulate, throw away, or submit
as formulated.

Q: how many modes of usage do you envision?
A: two flags: return reformulation and search.  Any combination of the
possible settings.

Q: what are people currently doing along these lines?  What mode will people
expect to be standard behavior (most common)?  Set reformulation OFF,
returnResultSet ON, mData OFF.
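
A minimal sketch of the usage modes discussed above, assuming three
independent boolean flags.  The class and field names are illustrative and
not taken from the draft ASN.1; the defaults mirror the "most common" mode
noted above.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SearchOutputRequest:
    # Hypothetical field names; defaults follow the mode suggested in the
    # notes: reformulation OFF, returnResultSet ON, mData OFF.
    return_reformulated_query: bool = False
    return_result_set: bool = True
    return_mdata: bool = False


def all_modes():
    """Enumerate every combination of the three flags."""
    return [SearchOutputRequest(*bits)
            for bits in product([False, True], repeat=3)]


default_mode = SearchOutputRequest()
modes = all_modes()
print(default_mode)
print(len(modes), "combinations in total")
```

Whether every combination is meaningful is a separate question; the sketch
only enumerates them.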

Q: what if server doesn't support returning reformulated query?
A: This would be a diagnostic.
Q:  What if reformulated Query doesn't fit in the SearchResponse?
A: could create a retrievable metadata element containing the reformulated
query, which can be retrieved via Present, and use segmentation.  This is a
good example of the creation of result-level metadata.  Create a logical
result set containing one record containing the reformulated query,
which the client can retrieve using present with segmentation.  Make this an
option to returnReformulatedQuery.
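
The idea of retrieving an oversized reformulated query in pieces can be
sketched as follows.  This only illustrates splitting one record into
bounded fragments and reassembling them; it is not the actual Z39.50
segmentation encoding, and the function names and sizes are assumptions.

```python
from typing import List


def segment(record: bytes, max_segment_size: int) -> List[bytes]:
    """Split one oversized record into fragments of bounded size."""
    if max_segment_size <= 0:
        raise ValueError("max_segment_size must be positive")
    return [record[i:i + max_segment_size]
            for i in range(0, len(record), max_segment_size)]


def reassemble(fragments: List[bytes]) -> bytes:
    """Client side: concatenate fragments in the order received."""
    return b"".join(fragments)


# Stand-in for an encoded reformulated query too large for a SearchResponse.
reformulated_query = b"q" * 2500
fragments = segment(reformulated_query, 1000)
print([len(f) for f in fragments])
```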

Issue of annotating the query on return (actually expanded, plus human
strings).  But the query ASN.1 would include optional fields to carry the
annotation.  The client could turn the query around and resubmit to the
target.
Q: necessary to remove before resubmitting to the target?  Return
the query twice: one normal, one annotated?

Issue of reformulation by database.  Additionalsearchinfo does allow
reformulated queries to be returned by database.  This is perhaps a valid
issue, since the databases may reside on different servers, which use
different search engines with different reformulation logic.

In oversized case, the result set could contain a record for each database.

Above was discussion of reformulation (item 6 on the agenda).

Paul:  How will we proceed?  Just jump into the agenda?

Return to restrictions.  (item 1 on agenda)

Q: does the restriction need to support multiple databases?  For example,
look at the examples on p.5.  Database names need to be specified by
component, since it may be a multicomponent query.  This could also be
accomplished by having an attribute for databasename, and express in the
query.

Peter had issued a list of candidates for restrictions.

Not sure about supporting the restriction subset of a group (cluster)
database (2 out of 100).  Other components included:  result set, RPN query,
etc.  See 9/5/95 message. Redefine document as perhaps a record or something
else.  John: how about making this HTTP-compatible?

<break>

Amended ASN.1 handout has proposal for specifying restriction.

Q: need to include list of databases excluded by the databasenames (subset of
cluster database name).  Not peculiar to the type 102 case.  Therefore,
propose this to the ZIG for consideration.  This could be done via an
attribute for databasename, and use it with an ANDNOT operator in the boolean
query.  This would be useful in type 1/101 queries as well.  Ability to apply
part of the query criteria to a subset of a collection.  This same concept
applies to the digital library profile, as well.

Restrict set is a combination of inclusions and exclusions.

Q: is a sequence of restrict criteria adequate to express this?  The
resultsetids could be expressed via the RPNQuery.  So could localDocids?
What is the precedence of evaluation of the restrictSet?  This may need to be
spelled out explicitly.  localDocids may need to be characterstrings, instead
of Octetstring?  Turn docid into a URL?  What is the requirement?  Where do
the docids come from?  from a previous result set record?  Need to restrict
the answers from a specific resultsetid?  Can we collapse these all into an
RPN query?  We could use the restriction feature of the attribute plus result
set, define a new USE attribute, where the term is the number, and a range
could be expressed with an AND construct.  It appears that all of these
restrictions can be rolled into a single RPNQuery.  Now, is it
an RPNQuery, or a Sequence of RPNQuery?  Needlist then collapses into an
RPNQuery.   No.  Needlist is a sequence.  RestrictSet collapses into
RPNQuery.  combineNeedLists says how to combine the results of the queries
specified by the needlists.  Issue of the meaning of default.  Leave it in.
There will be a wide variation of how servers will interpret, and a client
will not always have control over this aspect.  How about databasenames?
Addweight is an algorithm, perhaps based on the weights applied to each
individual need statement.  External allows for other algorithms to be
specified.

Ray:  add an indicator to combineNeedLists, a new flag:  1.  use specified
algorithm, or fail search. 2.  recommend algorithm, but use other if better.
3.  do what you think is best.  Q: can we remove database names?  Yes, define
an attribute for database name.
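
A rough sketch of how a server might honor the three-valued indicator Ray
proposed.  The enum, exception, and function names are hypothetical, and the
"recommend" case is simplified to "use the requested algorithm when it is
supported, otherwise the server default"; a real server could also prefer
its own algorithm when it judges it better.

```python
from enum import Enum


class AlgorithmPolicy(Enum):
    REQUIRE = 1        # use specified algorithm, or fail search
    RECOMMEND = 2      # recommend algorithm, but use other if better
    SERVER_CHOICE = 3  # do what you think is best


class SearchFailed(Exception):
    """Raised when a required combining algorithm is unavailable."""


def choose_algorithm(policy, requested, supported, server_default):
    """Pick the algorithm a server would use to combine need lists."""
    if policy is AlgorithmPolicy.SERVER_CHOICE:
        return server_default
    if requested in supported:
        return requested
    if policy is AlgorithmPolicy.REQUIRE:
        raise SearchFailed("unsupported algorithm: " + requested)
    # RECOMMEND: fall back to whatever the server prefers.
    return server_default
```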

Under the NeedStatement, can the rqquery be optional?  Could do an RPNQuery,
with relevance feedback.  Allow the type 102 query to degenerate to an RPN
query in the simplest case.  The original set of database names in the Search
request is the original universe of databases.

RelInfo comments:  relevance  is perhaps the same as weight.  Perhaps
collapse Relinfo into Rqquery?  These are documents cited for relevance
feedback.   WAIS profile has a way with a combination of attributes to
express relevance feedback.  The WAIS approach may not work in this
environment.  This is different than querying.  This provides a set of
documents similar to these identified documents with relevance.  Can express
positive and negative aspect of relevance.  Need to define what 0...1 means.
If 0=irrelevant, 1=relevant, then .5 is don't care.  Could have separate flag
relevant vs irrelevant.  The relevance indicates the degree to which it is
relevant/irrelevant.

Q: need to add databasename?  Why is this a localDocid?  Could this be a URL?
Some question about this issue - defer resolution on this.

Tomorrow: meet here at 9am room G35.

<day 2 - 9/28/95>

Chris Buckley opposed to putting the databases down within the type 1 query.
Trying to merge databases is one of the unsolved research problems in this
area.  Trying to merge ranked list results is unsolved; doing it at an
intermediate stage if databases are buried down in the query is hopeless.
Current syntax only combines it at the top level, where you can specify how
things can be combined.  Trying to specify this at a lower level, including
databases at both the lower and higher level is not reasonable to try to
support.

If you split a single query into multiple queries, have to have a mechanism
for handing back the results for multiple queries, so that the server can
combine the results for the client.  Question of supporting multiple
databases?  How to simplify the options?  Keep support for the multiple
databases in the protocol ASN.1.  But initial implementations may be limited
to single databases.   If reformulation must occur per database, then a
reformulation may need to be returned one per database.

Q: define a query more powerful than initial implementation, and profile down
for initial implementation; later perhaps revise the query based on
implementation experience.  A number of the search engines on the net are
doing this type of stuff (like InfoSeek), but not in the same way.   InfoSeek
is a prime example.

Issue of supporting databases as an attribute?  This happened as a byproduct
of simplifying the restrictSet definition, rolling it into the RPNQuery
structure.  If this causes concern in the ZIG, should we back off on it,
rather than debate at length?
Ray:  No, prefer to keep the definition simple and elegant, for better
acceptance.  Noted need to ensure that the requirements and needs captured in
the text are adhered to, and that we don't lose sight of it relative to the
ASN.1.

How to proceed?  continue with the ASN.1, or discuss the document?   Work
from the ASN.1, but refer to the text.  First, recap the discussion from
yesterday, and reconfirm our agreements.

1.  What localDocids mean, and how to use.  under restrictset and under
    RelInfo.  under restrictset, fold into the query.

2.  Change to combineNeedList.  Sequence of 3 options, plus choice already
    there.  3 options: server choice, use specified algorithm, or try to
    use specified algorithm

Chris:  Does this already get covered under clientServerInfo by reformClause?
How specific do we want to get?  Combining needlist rolled into
reformulating?  Reformclause can be attached to every operand of the query.
Opinion that combining the need lists is separate from the reformClause.
There are tuning knobs at different levels of the query.  combineNeedLists
addresses how you do fusion - research topic.  The reformClause addresses a
different need.  The clientserverinfo is at highest level, as an overall
default, but can also be attached at the per-term level.  The reformclause is
perhaps not needed at the top level.  Still useful to have top-level
clientServerInfo.  Need at both levels.  How about the relationship between
combineNeedLists and clientServerInfo.  Agree that they are separate?  Apply
at separate stages of the query.  The reformclause applies to reformulation.
Keep structures as they are.  Agreed to the extensions to combineNeedLists.

Issue of DatabaseNames being rolled into the RPNQuery.  Databasenames in the
search apply to the RLQuery.  Tentatively: collapse RestrictSet to an
RPNQuery.  Chris: not collapse databasenames into the RPNQuery.  Not treat
them as an attribute in a rankedlist system.  It is a matter of semantics.
The databasenames in the restriction are also the set of databases that the
rlquery applies to.  Agreed compromise:  a sequence of databaseNames and
RPNQuery comprises the RestrictSet.  Query is optional.  Database names
semantics:  inclusion, exclusion, or what?  Intended meaning was that these
are the only databases to be used.  This should be a subset of the set of
databases specified in the SearchRequest.  Perhaps add another parameter
which specifies excluded databases?  Make it a choice between included or
excluded databases.  All referenced databases are a subset of the full list.
Make both the database list & query optional.
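
The compromise above can be sketched as a small data structure, assuming
the RestrictSet carries either an included or an excluded database list
(never both) plus an optional query.  All names are illustrative; the
RPNQuery is represented by a placeholder string.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class RestrictSet:
    included: Optional[FrozenSet[str]] = None  # only these databases
    excluded: Optional[FrozenSet[str]] = None  # all but these databases
    rpn_query: Optional[str] = None            # placeholder for an RPNQuery

    def __post_init__(self):
        if self.included is not None and self.excluded is not None:
            raise ValueError("choose included or excluded, not both")


def effective_databases(search_request_dbs: Iterable[str],
                        restrict: RestrictSet) -> FrozenSet[str]:
    """Resolve the databases a restriction applies to."""
    universe = frozenset(search_request_dbs)
    named = restrict.included if restrict.included is not None else restrict.excluded
    # Every referenced database must come from the SearchRequest list.
    if named is not None and not named <= universe:
        raise ValueError("restriction names a database outside the request")
    if restrict.included is not None:
        return restrict.included
    if restrict.excluded is not None:
        return universe - restrict.excluded
    return universe
```

With neither list present, the restriction falls back to the full set of
databases named in the SearchRequest.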

RelInfo - relevance - allow negative numbers?   Yes.

AttributeSet parameter applies to everything except the RPNQuery in the
RestrictSet (RPNQuery requires an AttributeSet to be specified).

Q: rename relInfo?  Call it FeedbackInfo.  Range is -1 to 1.  Negative
is anti-relevant.  0= don't care.  1 is highly relevant.  -1 is highly
anti-relevant.
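
The -1 to 1 range relates to a 0 to 1 scale with .5 as "don't care" by a
simple linear map.  A small sketch, assuming that linear relationship (the
function name is illustrative):

```python
def to_feedback_range(r: float) -> float:
    """Map a 0..1 relevance value (0.5 = don't care) onto -1..1."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("expected a value between 0 and 1")
    return 2.0 * r - 1.0


for value in (0.0, 0.5, 1.0):
    print(value, "->", to_feedback_range(value))
```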

How about the localDocid?  not make it octet string?  Let the ZIG decide what
we specify.  Perhaps refer to the bib-1 doc-id, and its description within
the WAIS profile?

Q: Need to reference parts of the document to be relevant?  Yes.  Supply a
subsection of a document as relevant text.  Need at least the ability to
submit a section of text.  Perhaps change localDocId to be a choice of a full
document or text.  Text could be either user input or textual subset from a
document.  Also allow for an EXTERNAL to carry other relevance information
(such as a CXF structure).  Make RelInfo a Choice:  documentid, document
text, and external.  How is human-entered text handled?  Express it via the
query as part of the need statement?  Or include it here?  Is this user input
really feedback information?  No.  So it is just additional search criteria.
If it is negative feedback, need to include ability to indicate negative
relevance.
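
One way to picture the proposed choice (document id, document text, or an
external) together with a signed relevance value.  This is a sketch only:
the class names are hypothetical, and the document id is shown as a string
even though its actual type was left open.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class DocumentId:
    value: str  # octet string vs character string was left undecided


@dataclass(frozen=True)
class DocumentText:
    text: str  # a section of text taken from a document


@dataclass(frozen=True)
class External:
    payload: Any  # some other relevance structure


@dataclass(frozen=True)
class FeedbackInfo:
    source: Union[DocumentId, DocumentText, External]
    relevance: float  # -1 anti-relevant, 0 don't care, 1 highly relevant

    def __post_init__(self):
        if not -1.0 <= self.relevance <= 1.0:
            raise ValueError("relevance must lie between -1 and 1")


def kind(info: FeedbackInfo) -> str:
    """Name of the alternative carried by this feedback item."""
    return type(info.source).__name__
```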

Note that we considered that weight within OperandPlusWeight might in the
future be extended to include negative values to indicate negative weights.

SearchOutputRequest - big topic.  defer metadata for now.

Q: why not throw out returnResultSet?  Ray: no.  Perhaps rename: DoSearch or
execute Search to better reflect what it is doing.  Could have any
combination between this and returnReformulatedQuery. Could turn both off,
and just return metadata.  See note about metadata Peter sent out. This is
metadata at various levels.  It provides metadata, which would be returned in
some way. Some question about using a tagset to define returned metadata.
Defer this topic for now.  Perhaps define a Tagset for this, perhaps reuse
Tags from TagSet-M.  There are some questions about just how independent the
3 items in the SearchOutputRequest parms can be.  Do all  combinations make
sense?  Probably not.  Need for response statuses to reflect various
combinations of success/failure of portions of the Search request.  Current
SearchStatus is only a boolean.  Make SearchOutputRequest a choice among the
various options, to control the legal choices?  Real issue is how to return
the detailed status of the request?  Add a sequence of status codes or
diagnostics to either the AdditionalSearchInfo, or the ServerClientInfo
structure.  The entire structure is sent to the target, and returned to the
client.  Easiest thing may be to echo the entire structure.

Ray: perhaps break up which pieces are sent vs returned.  Have a structure
such that RankedQuery contains clientServerInfo, but not ServerClientInfo,
and vice versa on response.  Problem: this info is attached on the operand
level.  Chris: everything is informally called Query.  Idea is for client to
send a query, target to be able to return the same query.  Ray's proposal:

RQ ::= seq {
	[1]
	:
	[5]
	-- but not [6] serverClientInfo
}

ServerInfo ::=
	[1]  RQ
	[2] ServerClientInfo

RQ ::= choice {
	RQRequest [1] RQ
	RQResponse [2]
}

There is a problem that the client/server, server/client information is also
at the operand level.  This precludes the above approach, without a lot of
restructuring.
Chris: ability to iteratively operate on the query at the client and server,
refining the definition at both ends.   This is a difference between the
traditional RPNQuery and this type of Query.  On request, this is carried as
a query, it comes back in additionalsearchinfo.  Break up pieces and send
appropriately?  At the lower level, this allows the client and server to
exchange details about the query.

Q: does the client send the serverClientInfo back?  Yes, but server ignores
it.  Does the server remove left-over serverClientInfo?  Perhaps.

Chris: 2 issues here:  1. We may want to say that the query back to client
doesn't have any serverClient info at the top level, only in queries
returning metadata.  At low level, to return metadata within the query
structure.  A server ignores ServerClient info arriving from client.
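
A sketch of a server discarding left-over serverClientInfo at every level of
an incoming query.  The query is modeled as nested dicts and lists purely
for illustration; the key names echo the notes, and the sample values are
invented.

```python
def strip_server_client_info(node):
    """Return a copy of the query tree without any serverClientInfo."""
    if isinstance(node, dict):
        return {key: strip_server_client_info(value)
                for key, value in node.items()
                if key != "serverClientInfo"}
    if isinstance(node, list):
        return [strip_server_client_info(item) for item in node]
    return node


# Hypothetical query echoed back by a client with stale server annotations.
query = {
    "clientServerInfo": {"reformClause": "expand"},
    "serverClientInfo": {"note": "left over from an earlier response"},
    "operands": [
        {"term": "cat", "weight": 1, "serverClientInfo": {"hits": 12}},
        {"term": "dog", "weight": 2},
    ],
}
cleaned = strip_server_client_info(query)
print(cleaned)
```

The original structure is left untouched, so a server could still log what
the client sent.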

<lunch>

remaining agenda:

2.  Where the metadata will go.
4.  Support boolean operators within the type 102 query?

 If we do it, support under OperandPlusWeight structure, under operand, as a
new choice. That would retain the integrity of the query structure. Not at
the operator level.  Howard/West wanted it originally, but it might not be
needed, at least initially.  An rlqAND with weight of 1 is not the same as a
boolean AND.  It also depends on the weight of the operands.
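
To illustrate why a weighted rlqAND differs from a boolean AND, here is a
toy comparison.  The scoring model (a weighted average of per-operand scores
between 0 and 1) is an assumption made for the example and is not defined
by the draft.

```python
from typing import Sequence


def boolean_and(matches: Sequence[bool]) -> bool:
    """Classic boolean AND: every operand must match."""
    return all(matches)


def ranked_and(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed model: weighted average of per-operand scores (0..1)."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * s for w, s in zip(weights, scores)) / total


# A document that matches the first term but not the second.
print(boolean_and([True, False]))
print(ranked_and([1.0, 0.0], [1.0, 1.0]))
print(ranked_and([1.0, 0.0], [3.0, 1.0]))
```

The boolean AND rejects the document outright, while the ranked form still
assigns it a score, and that score shifts with the operand weights.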

Q: Add it now, since we know where we go, or wait to add it?  Perhaps need a
list of features and how we see them used, which features are placeholders,
etc.  Perhaps start a rationale document to try to capture some of these
ideas/proposals, and the disposition/position on each.  We may also need to
think about Explain, and using it to explain which of these
features/functions, etc. is supported.  Could define a record for Type 102
query details.  Add a comment in the ASN.1 marking the place where a boolean
query might go in the future (resolution of item 3).

5.  Need to talk some about the attribute types we have.  Agreed to submit
this list of attribute types to Cliff's ZIG attribute group.   Peter's Sept.
8 message, LHW's response.  Chris: Location in doc consists of content and
meta-content.  Cut down to items 1&2.

Q: Perhaps split metadata out of this attribute type.  Define a very simple
attribute set which contains attributes which apply to all databases.
Peter:  example: in Freestyle, ability to express indexes, subsets of
documents.  Most useful ones are headline, abstract, etc.

Kevin: Why do we need attributes in the RLQ part of the query?  Because
without attributes, the usefulness of the type102 query is significantly
reduced.  Should we try to take on the whole issue of an attribute set?
Chris: No one really agrees with this approach for the attribute sets for
type 102.

Chris: Does location in document get covered by the context concept?   How
much overlap is there in concept?  Disadvantage of separating one dimension
of attribute from the others.  Is context more like proximity?  Context
speaks more to the relation of the location of multiple terms, where
locationinDoc pertains to location of a given element.

Ray: Q: would renaming context "proximity" help?
Chris:  Attributes do not include elements.  Attributes imply a structure on
a document, which may not represent its actual structure.  Elements in the
document do not necessarily correspond to attributes in bib-1, and so
mappings are not always adequate.

Ray:  if we're willing to defer this attribute discussion, this could be
folded into the new attribute discussion of the ZIG.  Need to have
representation at that meeting.  Submit these ideas to that group.  Chris
will craft a statement about this.  Cliff is aware of these proposed
attribute types.  Ray concerned about the impact of defining yet-another
attribute set, without syncing up with the ZIG attribute effort.

Ray:  metadata:  possibly create a pseudo-result set with one record,
containing the metadata.  That could be the resultset corresponding to the
resultset in the query.  Do we want to be able to retrieve the metadata,
using a record syntax?  Then the tagset would fit in.  The serverClient
structure was intended to contain the metadata.  If you create a result
record of metadata, you can eliminate some of the mData information in the
ASN.1.  Can use usual Present features (segmentation, etc.).  Need to create
result set of metadata, plus actual results?   Perhaps return high-level
metadata in the search response.

Chris: metadata is not all related to the record database.  Applies to the
record in search.  High-level metadata in a separate record in a
pseudo-result set.   Record-level metadata would be retrieved from the result
set, via GRS-1.

Chris: concerned re use of GRS-1, since it's a hammer which solves all
problems.  Concerned about a record which contains transient metadata, (query
level metadata).
Peter:  GRS-1 does solve a lot of problems, and provides consistent syntax
for information.
Ray:  perhaps profile use of GRS-1 for use with type 102 query.  Type 102
query will ultimately become part of Z39.50 standard.  Alternatively, could
define a Type 102 record syntax for retrieving metadata.

Chris: Counter-proposal.  Record syntax defined for type102 record syntax,
and allow use of GRS-1, as well.

Ray: pseudo database for high-level search-level metadata, which could be
retrieved via GRS-1.  Record level metadata would be brought back from the
search result set via present.  Logically, this is additional-search-info
(search level metadata).  Can only come back in a Search Response.

Peter:  general need for result-set-level metadata, even for type 1 queries.
For V4, want the ability to reference relationships between search results.
One element in the metadata resultset could be a pointer/name to the related
search result set.  This could provide a linkage.

Ray: is there a requirement to create intermediate result sets in type 102?
No, not yet.  This could be modeled as representing a Need statement, for
example.   Any requirement for a client to present records from one of those
intermediate result sets?  No, not yet.

Kevin:  for set-level metadata, perhaps under otherinfo in Present request,
indicate that this is a request for set-level metadata.  Perhaps specify
record zero to retrieve the metadata for the resultset.  This last suggestion
was agreed to as the current working approach.  Another idea suggested was to
create a dynamically generated database of metadata, with one record per result
set, containing the metadata for that resultset.  It would have a
well-defined name: IR-ResultSet-Metadata.   Each record would have a key of
the name of the result set it pertains to.
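
Both ideas (record zero, and a metadata database keyed by result set name)
can be pictured with a toy in-memory target.  The class and method names
are hypothetical, and no real Present or Search encoding is implied.

```python
METADATA_DATABASE = "IR-ResultSet-Metadata"  # name suggested in the notes


class Target:
    def __init__(self):
        self._records = {}   # result set name -> list of records
        self._metadata = {}  # result set name -> metadata record

    def create_result_set(self, name, records, metadata):
        self._records[name] = list(records)
        # Each metadata record is keyed by the result set it pertains to.
        self._metadata[name] = dict(metadata, resultSetName=name)

    def present(self, result_set, record_number):
        """Record 0 returns set-level metadata; 1..n are ordinary records."""
        if record_number == 0:
            return self._metadata[result_set]
        return self._records[result_set][record_number - 1]

    def search_metadata(self, result_set):
        """Alternative: look up the metadata database by result set name."""
        return self._metadata[result_set]


target = Target()
target.create_result_set("default", ["rec A", "rec B"], {"hits": 2})
print(target.present("default", 0))
print(target.present("default", 1))
```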

<wrap up>

Les agreed to post these raw notes to the adv.query list.

-- 

 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Les Wibberley                         Internet: les.wibb@cas.org
 Chemical Abstracts Service
 2540 Olentangy River Rd.              Voice:    (614) 447-3600 Extension 2330
 Columbus, Ohio  43210                 FAX:      (614) 447-3854 or 447-3697
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~