gov.nist.nlpir.irf.de.normalize
Class PorterStemmer
java.lang.Object
|
+--gov.nist.nlpir.irf.de.normalize.Stemmer
|
+--gov.nist.nlpir.irf.de.normalize.PorterStemmer
- public class PorterStemmer
- extends Stemmer
Implementation of the Porter stemming algorithm documented in:
Porter M.F. "An Algorithm For Suffix Stripping," Program 14 (3),
July 1980, pp. 130-137.
Adapted to Java by Willie Rogers
from original C version by B. Frakes and C. Cox, 1986
Notes:
- This code will make little sense without the Porter article.
- The stemming function converts its input to lower case.
- Version:
- $Revision: 1.1 $
- Author:
- This software was produced by NIST, an agency of the U.S. government,
and by statute is not subject to copyright in the United States.
Recipients of this software assume all responsibilities associated
with its operation, modification and maintenance.
Constructor Summary
PorterStemmer()
This constructor creates all the rules it will use when asked to stem a word.
Method Summary
(package private) static boolean endsWithCVC(java.lang.StringBuffer word)
Some of the rewrite rules apply only to a root with this characteristic.
(package private) static boolean IsVowel(char c)
(package private) static int replaceEnd(java.lang.StringBuffer word, java.util.Vector ruleList)
Apply a set of rules to replace the suffix of a word.
java.lang.String stemWord(java.lang.String term)
Stem a word using the Porter stemming rules.
static int wordSize(java.lang.StringBuffer word)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
step1aRules
private java.util.Vector step1aRules
step1bRules
private java.util.Vector step1bRules
step1b1Rules
private java.util.Vector step1b1Rules
step1cRules
private java.util.Vector step1cRules
step2Rules
private java.util.Vector step2Rules
step3Rules
private java.util.Vector step3Rules
step4Rules
private java.util.Vector step4Rules
step5aRules
private java.util.Vector step5aRules
step5bRules
private java.util.Vector step5bRules
lambda
private static java.lang.String lambda
EOS
private static final char EOS
end
private static int end
PorterStemmer
public PorterStemmer()
- This constructor creates all the rules it will use when asked to stem
a word.
IsVowel
static final boolean IsVowel(char c)
wordSize
public static final int wordSize(java.lang.StringBuffer word)
- Parameters:
word
- the word whose size is to be calculated.
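The description above does not say what "size" means. Porter's article views a word as [C](VC)^m[V] and calls m its measure; the sketch below assumes that wordSize returns this measure. The names MeasureSketch, measure and isConsonant are illustrative and not part of this package, and the treatment of y follows the article rather than this implementation.

```java
/** Illustrative only: computes Porter's measure m for a lower-case word. */
final class MeasureSketch {

    /** A letter is a consonant unless it is a, e, i, o, u, or a 'y' that follows a consonant. */
    static boolean isConsonant(CharSequence word, int i) {
        switch (word.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                // A leading 'y' is a consonant; elsewhere 'y' is a consonant only after a vowel.
                return i == 0 || !isConsonant(word, i - 1);
            default:
                return true;
        }
    }

    /** Counts vowel-to-consonant transitions, which equals the number of VC groups. */
    static int measure(CharSequence word) {
        int m = 0;
        boolean previousWasVowel = false;
        for (int i = 0; i < word.length(); i++) {
            boolean consonant = isConsonant(word, i);
            if (consonant && previousWasVowel) {
                m++;
            }
            previousWasVowel = !consonant;
        }
        return m;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"tree", "trouble", "troubles"}) {
            System.out.println(w + " -> " + measure(w));
        }
    }
}
```

With the examples from Porter's article, tree has measure 0, trouble has measure 1 and troubles has measure 2.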
endsWithCVC
static final boolean endsWithCVC(java.lang.StringBuffer word)
- Some of the rewrite rules apply only to a root with this characteristic.
- Parameters:
word
- buffer with the word checked
- Returns:
- true if the current word ends with a consonant-vowel-consonant combination
and the second consonant is not w, x or y,
false otherwise
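The return condition above can be sketched as a check on the last three letters. This is a minimal illustration, not the package's code: the class name CvcSketch is hypothetical, the input is assumed to be lower case already, and the handling of y (vowel in the middle position, consonant in the first) is an assumption.

```java
/** Illustrative only: one way to test for a consonant-vowel-consonant ending. */
final class CvcSketch {

    static boolean endsWithCVC(CharSequence word) {
        int n = word.length();
        if (n < 3) {
            return false; // too short to hold three letters
        }
        char last = word.charAt(n - 1);
        char middle = word.charAt(n - 2);
        char first = word.charAt(n - 3);
        return "aeiouwxy".indexOf(last) < 0   // consonant other than w, x or y
            && "aeiouy".indexOf(middle) >= 0  // vowel (y counted as a vowel here)
            && "aeiou".indexOf(first) < 0;    // consonant (y counted as a consonant here)
    }

    public static void main(String[] args) {
        for (String w : new String[] {"hop", "fil", "box", "snow"}) {
            System.out.println(w + " -> " + endsWithCVC(w));
        }
    }
}
```

Under these assumptions hop and fil qualify, while box and snow do not because their final consonant is x or w.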
replaceEnd
static final int replaceEnd(java.lang.StringBuffer word,
java.util.Vector ruleList)
- Apply a set of rules to replace the suffix of a word.
- Parameters:
word
- buffer with the stemmed word.
ruleList
- vector containing the replacement rules.
- Returns:
- the ID of the rule fired, 0 if none is fired.
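The contract above (try each rule in order, rewrite the suffix in place, return the ID of the rule that fired or 0) might look roughly like the following. The Rule class, its fields and the ID values are hypothetical, since the documentation does not describe the rule representation; the four rules are step 1a of Porter's article, which needs no extra conditions. The sketch takes a List rather than a raw Vector for type safety.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: first-match suffix replacement over an ordered rule list. */
final class ReplaceEndSketch {

    /** Hypothetical rule shape: an ID, the suffix to match, and its replacement. */
    static final class Rule {
        final int id;
        final String oldSuffix;
        final String newSuffix;

        Rule(int id, String oldSuffix, String newSuffix) {
            this.id = id;
            this.oldSuffix = oldSuffix;
            this.newSuffix = newSuffix;
        }
    }

    /** Step 1a of Porter's article; the IDs are arbitrary. */
    static final List<Rule> STEP_1A = Arrays.asList(
        new Rule(1, "sses", "ss"),
        new Rule(2, "ies", "i"),
        new Rule(3, "ss", "ss"),
        new Rule(4, "s", ""));

    /** Applies the first matching rule to word and returns its ID, or 0 if none matches. */
    static int replaceEnd(StringBuffer word, List<Rule> ruleList) {
        for (Rule rule : ruleList) {
            int start = word.length() - rule.oldSuffix.length();
            if (start >= 0 && word.substring(start).equals(rule.oldSuffix)) {
                word.replace(start, word.length(), rule.newSuffix);
                return rule.id;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        StringBuffer word = new StringBuffer("caresses");
        int fired = replaceEnd(word, STEP_1A);
        System.out.println(word + " (rule " + fired + ")");
    }
}
```

Ordering the rules longest suffix first matters here: caresses must match sses before the bare s rule gets a chance.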
stemWord
public final java.lang.String stemWord(java.lang.String term)
- Stem a word using the Porter stemming rules.
- Parameters:
term
- the word to be stemmed
- Returns:
- the stemmed version of the word; if the word is null, the stemmer
returns null.
- Overrides:
- stemWord in class Stemmer
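The input contract of stemWord (null in, null out; input converted to lower case) can be illustrated with a toy stand-in. Everything here is hypothetical: StemmerSketch only approximates the Stemmer base class, which is assumed to declare an overridable stemWord, ToyStemmer strips a single trailing s instead of running the Porter steps, and the use of Locale.ROOT is a choice made for this sketch.

```java
import java.util.Locale;

/** Stand-in for the base class; the real Stemmer may declare more than this. */
abstract class StemmerSketch {
    abstract String stemWord(String term);
}

/** Toy subclass: shows the null and lower-casing contract, not the Porter rules. */
final class ToyStemmer extends StemmerSketch {

    @Override
    String stemWord(String term) {
        if (term == null) {
            return null; // documented behaviour: a null term yields null
        }
        // The class notes say the stemming function lower-cases its input.
        String word = term.toLowerCase(Locale.ROOT);
        // Placeholder for the rule steps: drop one trailing "s", but keep "ss".
        if (word.endsWith("s") && !word.endsWith("ss")) {
            word = word.substring(0, word.length() - 1);
        }
        return word;
    }

    public static void main(String[] args) {
        StemmerSketch stemmer = new ToyStemmer();
        System.out.println(stemmer.stemWord("Cats"));
        System.out.println(stemmer.stemWord(null));
    }
}
```

Callers that pass user input can rely on the null passthrough instead of guarding every call, but should expect the result in lower case.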