Explanation of the incrementToken method in Lucene's StandardTokenizer

Bounty: 30 points [Question closed] Closed on 2013-11-03 10:56

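1. In a TokenFilter, the incrementToken method processes and filters tokens. Each call performs one incrementToken check, which amounts to handling a single token (like one if statement). With an Analyzer, however, you only pass in the reader and every token gets processed, so something must be calling incrementToken in a loop. Where does that loop live? Also, why does the method itself use a while loop for its processing?

Please explain this method, and ideally point out the algorithm and data structures involved, since I want to study it closely.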

public final boolean incrementToken() throws IOException {
    clearAttributes();
    int posIncr = 1;

    while (true) {
        int tokenType = scanner.getNextToken();

        if (tokenType == StandardTokenizerImpl.YYEOF) {
            return false;
        }

        if (scanner.yylength() <= maxTokenLength) { // the code my first question refers to
            posIncrAtt.setPositionIncrement(posIncr);
            scanner.getText(termAtt);
            final int start = scanner.yychar();
            offsetAtt.setOffset(correctOffset(start), correctOffset(start + termAtt.termLength()));
            // This 'if' should be removed in the next release. For now, it converts
            // invalid acronyms to HOST. When removed, only the 'else' part should
            // remain.
            if (tokenType == StandardTokenizerImpl.ACRONYM_DEP) {
                if (replaceInvalidAcronym) {
                    typeAtt.setType(StandardTokenizerImpl.TOKEN_TYPES[StandardTokenizerImpl.HOST]);
                    termAtt.setTermLength(termAtt.termLength() - 1); // remove extra '.'
                } else {
                    typeAtt.setType(StandardTokenizerImpl.TOKEN_TYPES[StandardTokenizerImpl.ACRONYM]);
                }
            } else {
                typeAtt.setType(StandardTokenizerImpl.TOKEN_TYPES[tokenType]);
            }
            return true;
        } else {
            // When we skip a too-long term, we still increment the
            // position increment
            posIncr++;
        }
    }
}
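
To make the two loops in the question concrete, here is a minimal sketch that does not use Lucene at all. SimpleTokenizer, its term and positionIncrement fields, and collect are hypothetical names invented for illustration; only the control flow mirrors the method above. The inner while (true) exists so that a single call can skip over-long tokens and still return exactly one token, while the loop that walks through all tokens belongs to whoever consumes the stream (indexing code, or your own while (stream.incrementToken()) loop), not to the tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenLoopSketch {

    /** Hypothetical stand-in for a tokenizer; this is NOT a Lucene class. */
    static final class SimpleTokenizer {
        private final String[] words;     // pre-split input (assumption: whitespace-separated)
        private final int maxTokenLength; // tokens longer than this are skipped
        private int next = 0;             // index of the next unread word

        // Stand-ins for attributes that the consumer reads after each successful call.
        String term;
        int positionIncrement;

        SimpleTokenizer(String text, int maxTokenLength) {
            String trimmed = text.trim();
            this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            this.maxTokenLength = maxTokenLength;
        }

        /** Advances to the next acceptable token; returns false at end of input. */
        boolean incrementToken() {
            int posIncr = 1;
            while (true) {
                if (next >= words.length) {
                    return false;          // plays the role of the YYEOF check
                }
                String candidate = words[next++];
                if (candidate.length() <= maxTokenLength) {
                    term = candidate;
                    positionIncrement = posIncr;
                    return true;           // exactly one token per call
                }
                posIncr++;                 // skipped an over-long token; keep scanning
            }
        }
    }

    /** The consumer's loop: this is where every token gets visited. */
    static List<String> collect(String text, int maxTokenLength) {
        SimpleTokenizer tokenizer = new SimpleTokenizer(text, maxTokenLength);
        List<String> out = new ArrayList<>();
        while (tokenizer.incrementToken()) {
            out.add(tokenizer.term + "(+" + tokenizer.positionIncrement + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        // "toolongword" exceeds the limit of 3, so "cc" carries a position increment of 2.
        System.out.println(collect("a bb toolongword cc", 3));
    }
}
```

In this sketch the skipped word still leaves a gap in the positions, which is the effect the posIncr++ branch in the original method preserves.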

lucene StandardTokenizer incrementToken

江边流客 | Beginner, Level 1 | Points: 5
Asked on: 2013-10-30 11:52
