gov.nist.nlpir.irf.de.normalize
Class PorterStemmer

java.lang.Object
  |
  +--gov.nist.nlpir.irf.de.normalize.Stemmer
        |
        +--gov.nist.nlpir.irf.de.normalize.PorterStemmer

public class PorterStemmer
extends Stemmer

Implementation of the Porter stemming algorithm documented in: Porter M.F. "An Algorithm For Suffix Stripping," Program 14 (3), July 1980, pp. 130-137.

Adapted to Java by Willie Rogers from the original C version by B. Frakes and C. Cox, 1986.

Notes:

Version:
$Revision: 1.1 $
Author:
This software was produced by NIST, an agency of the U.S. government, and by statute is not subject to copyright in the United States. Recipients of this software assume all responsibilities associated with its operation, modification and maintenance.

Inner Class Summary
(package private)  class PorterStemmer.AddAnE
           
(package private)  class PorterStemmer.Condition
           
(package private)  class PorterStemmer.ContainsVowel
           
(package private)  class PorterStemmer.PorterRule
          Inner class to describe a single rule
(package private)  class PorterStemmer.RemoveAnE
           
 
Field Summary
private static int end
           
private static char EOS
           
private static java.lang.String lambda
           
private  java.util.Vector step1aRules
           
private  java.util.Vector step1b1Rules
           
private  java.util.Vector step1bRules
           
private  java.util.Vector step1cRules
           
private  java.util.Vector step2Rules
           
private  java.util.Vector step3Rules
           
private  java.util.Vector step4Rules
           
private  java.util.Vector step5aRules
           
private  java.util.Vector step5bRules
           
 
Constructor Summary
PorterStemmer()
          This constructor creates all the rules it will use when asked to stem a word.
 
Method Summary
(package private) static boolean endsWithCVC(java.lang.StringBuffer word)
          Some of the rewrite rules apply only to a root with this characteristic.
(package private) static boolean IsVowel(char c)
           
(package private) static int replaceEnd(java.lang.StringBuffer word, java.util.Vector ruleList)
          Apply a set of rules to replace the suffix of a word.
 java.lang.String stemWord(java.lang.String term)
          Stem a word using the Porter stemming rules.
static int wordSize(java.lang.StringBuffer word)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
 

Field Detail

step1aRules

private java.util.Vector step1aRules

step1bRules

private java.util.Vector step1bRules

step1b1Rules

private java.util.Vector step1b1Rules

step1cRules

private java.util.Vector step1cRules

step2Rules

private java.util.Vector step2Rules

step3Rules

private java.util.Vector step3Rules

step4Rules

private java.util.Vector step4Rules

step5aRules

private java.util.Vector step5aRules

step5bRules

private java.util.Vector step5bRules

lambda

private static java.lang.String lambda

EOS

private static final char EOS

end

private static int end

Constructor Detail

PorterStemmer

public PorterStemmer()
This constructor creates all the rules it will use when asked to stem a word.

Method Detail

IsVowel

static final boolean IsVowel(char c)

wordSize

public static final int wordSize(java.lang.StringBuffer word)
Parameters:
word - the word whose size is to be calculated.
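
This page does not define "size". In Porter's paper, the quantity that gates most rules is the measure m: the number of vowel-consonant sequences in a word written as [C](VC)^m[V]. The sketch below computes that measure, on the assumption (not confirmed by this page) that wordSize returns something equivalent. The class and method names MeasureSketch, isConsonant and measure are hypothetical and are not part of PorterStemmer.

```java
// A minimal sketch of Porter's measure m. Hypothetical helper, not the NIST source.
final class MeasureSketch {

    // Porter's definition: a, e, i, o, u are vowels; y is a vowel only
    // when the letter before it is a consonant.
    static boolean isConsonant(CharSequence w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !isConsonant(w, i - 1);
            default:
                return true;
        }
    }

    // Counts vowel-to-consonant transitions, which equals m in [C](VC)^m[V].
    // Assumes a lowercase word.
    static int measure(CharSequence w) {
        int m = 0;
        boolean prevVowel = false;
        for (int i = 0; i < w.length(); i++) {
            boolean consonant = isConsonant(w, i);
            if (consonant && prevVowel) {
                m++;
            }
            prevVowel = !consonant;
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.println(measure("tree"));     // 0
        System.out.println(measure("trouble"));  // 1
        System.out.println(measure("private"));  // 2
    }
}
```

The three sample words are taken from the examples of m = 0, 1 and 2 given in Porter's paper.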

endsWithCVC

static final boolean endsWithCVC(java.lang.StringBuffer word)
Some of the rewrite rules apply only to a root with this characteristic.
Parameters:
word - buffer containing the word to check
Returns:
true if the current word ends with a consonant-vowel-consonant combination and the second consonant is not w, x or y,
false otherwise
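
The condition described above can be sketched as follows. This is an illustration under stated assumptions, not the NIST implementation: the class name CvcSketch is hypothetical, the word is assumed to be lowercase, and y in the middle position is simply treated as a vowel.

```java
// A minimal sketch of the consonant-vowel-consonant test. Hypothetical helper.
final class CvcSketch {

    static boolean endsWithCVC(CharSequence word) {
        int n = word.length();
        if (n < 3) {
            return false; // too short to hold a C-V-C ending
        }
        char first = word.charAt(n - 3);
        char middle = word.charAt(n - 2);
        char last = word.charAt(n - 1);
        return "aeiou".indexOf(first) < 0       // consonant
            && "aeiouy".indexOf(middle) >= 0    // vowel (y counted as a vowel here)
            && "aeiouwxy".indexOf(last) < 0;    // consonant other than w, x or y
    }

    public static void main(String[] args) {
        System.out.println(endsWithCVC(new StringBuffer("hop")));   // true
        System.out.println(endsWithCVC(new StringBuffer("snow")));  // false: ends in w
        System.out.println(endsWithCVC(new StringBuffer("fail")));  // false: "ail" is V-V-C
    }
}
```

In Porter's algorithm this test decides, for example, whether an e is restored after a suffix is stripped, so that "hoping" becomes "hope" rather than "hop".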

replaceEnd

static final int replaceEnd(java.lang.StringBuffer word,
                            java.util.Vector ruleList)
Apply a set of rules to replace the suffix of a word.
Parameters:
word - buffer holding the word being stemmed.
ruleList - vector containing the replacement rules.
Returns:
the ID of the rule that fired, or 0 if no rule fired.
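
The contract above (try each rule in order, rewrite the suffix, report which rule fired) can be sketched as below. Everything here is an assumption made for illustration: the Rule class, its fields, the rule IDs 1 to 4, and the use of java.util.List instead of java.util.Vector are hypothetical, and the real PorterRule may also carry conditions such as a minimum root size. The suffix pairs are the step 1a rules from Porter's paper.

```java
import java.util.Arrays;
import java.util.List;

// A minimal sketch of suffix replacement driven by a rule list. Hypothetical names.
final class ReplaceEndSketch {

    static final class Rule {
        final int id;            // hypothetical rule ID; 0 is reserved for "no rule fired"
        final String oldSuffix;
        final String newSuffix;

        Rule(int id, String oldSuffix, String newSuffix) {
            this.id = id;
            this.oldSuffix = oldSuffix;
            this.newSuffix = newSuffix;
        }
    }

    // Step 1a suffix pairs from Porter's paper; longer suffixes come first.
    static final List<Rule> STEP_1A = Arrays.asList(
        new Rule(1, "sses", "ss"),
        new Rule(2, "ies", "i"),
        new Rule(3, "ss", "ss"),
        new Rule(4, "s", ""));

    // Applies the first rule whose old suffix matches, editing the buffer in place.
    static int replaceEnd(StringBuffer word, List<Rule> rules) {
        String current = word.toString();
        for (Rule rule : rules) {
            if (current.endsWith(rule.oldSuffix)) {
                word.setLength(word.length() - rule.oldSuffix.length());
                word.append(rule.newSuffix);
                return rule.id;
            }
        }
        return 0; // no rule fired
    }

    public static void main(String[] args) {
        StringBuffer word = new StringBuffer("caresses");
        int id = replaceEnd(word, STEP_1A);
        System.out.println(word + " (rule " + id + ")"); // caress (rule 1)
    }
}
```

Returning the rule ID rather than a boolean lets the caller branch on which rule fired, which later steps of the algorithm can use to decide whether follow-up rules apply.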

stemWord

public final java.lang.String stemWord(java.lang.String term)
Stem a word using the Porter stemming rules.
Parameters:
term - the word to be stemmed
Returns:
the stemmed version of the word; if the word is null, the stemmer returns null.
Overrides:
stemWord in class Stemmer