
Explaining the incrementToken method of StandardTokenizer in Lucene

Bounty: 30 yuandou [Question closed] Closed on 2013-11-03 10:56

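1. In a TokenFilter, the incrementToken method processes and filters tokens, but each call is a single check, which amounts to handling one token (like an if statement). With an Analyzer, however, you only pass in a Reader and every token gets processed, so incrementToken must be called in a loop somewhere. Where does that loop happen? Also, why does the method below use a while loop internally?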

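2. Please explain the method below, ideally with some pointers on its algorithm and the data structures involved, since I have recently wanted to study it closely.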

public final boolean incrementToken() throws IOException {
    clearAttributes();
    int posIncr = 1;

    while(true) {
        int tokenType = scanner.getNextToken();

        if (tokenType == StandardTokenizerImpl.YYEOF) {
            return false;
        }

        if (scanner.yylength() <= maxTokenLength) { // the code that question 1 refers to
            posIncrAtt.setPositionIncrement(posIncr);
            scanner.getText(termAtt);
            final int start = scanner.yychar();
            offsetAtt.setOffset(correctOffset(start), correctOffset(start+termAtt.termLength()));
            // This 'if' should be removed in the next release. For now, it converts
            // invalid acronyms to HOST. When removed, only the 'else' part should
            // remain.
            if (tokenType == StandardTokenizerImpl.ACRONYM_DEP) {
                if (replaceInvalidAcronym) {
                    typeAtt.setType(StandardTokenizerImpl.TOKEN_TYPES[StandardTokenizerImpl.HOST]);
                    termAtt.setTermLength(termAtt.termLength() - 1); // remove extra '.'
                } else {
                    typeAtt.setType(StandardTokenizerImpl.TOKEN_TYPES[StandardTokenizerImpl.ACRONYM]);
                }
            } else {
                typeAtt.setType(StandardTokenizerImpl.TOKEN_TYPES[tokenType]);
            }
            return true;
        } else
            // When we skip a too-long term, we still increment the
            // position increment
            posIncr++;
    }
}
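
Both questions come down to who owns which loop, and a small sketch may make that easier to see. In the method above, while(true) only repeats when a token is longer than maxTokenLength: the token is dropped, posIncr grows, and the scanner is asked again. The first accepted token ends the call with return true, so each call still yields at most one token. The loop over all tokens is not in the tokenizer or the filters. It belongs to whatever consumes the stream, such as the indexing code or your own while (stream.incrementToken()) loop. The classes below (SketchStream, SketchTokenizer, SketchLowerCaseFilter, collect) are hypothetical stand-ins written for this illustration, not Lucene's API. They split on whitespace instead of using the generated scanner, and they use plain fields instead of attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
class TokenLoopSketch {

    /** Minimal stream contract: advance to the next token, or return false at end of input. */
    static abstract class SketchStream {
        String term;   // text of the current token
        int posIncr;   // position increment of the current token
        abstract boolean incrementToken();
    }

    /** Whitespace tokenizer that skips tokens longer than maxTokenLength. */
    static final class SketchTokenizer extends SketchStream {
        private final String[] words;
        private final int maxTokenLength;
        private int next = 0;

        SketchTokenizer(String text, int maxTokenLength) {
            String trimmed = text.trim();
            this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            this.maxTokenLength = maxTokenLength;
        }

        @Override
        boolean incrementToken() {
            int increment = 1;
            while (true) {                       // inner loop: retry until a token is accepted
                if (next == words.length) {
                    return false;                // end of input
                }
                String candidate = words[next++];
                if (candidate.length() <= maxTokenLength) {
                    term = candidate;
                    posIncr = increment;
                    return true;                 // at most one token per call
                }
                increment++;                     // too long: drop it but record the gap
            }
        }
    }

    /** Filter that lowercases; it handles one token per call and never loops over the stream. */
    static final class SketchLowerCaseFilter extends SketchStream {
        private final SketchStream input;

        SketchLowerCaseFilter(SketchStream input) {
            this.input = input;
        }

        @Override
        boolean incrementToken() {
            if (!input.incrementToken()) {
                return false;
            }
            term = input.term.toLowerCase(Locale.ROOT);
            posIncr = input.posIncr;
            return true;
        }
    }

    /** The consumer owns the outer loop that drains the whole stream. */
    static List<String> collect(String text, int maxTokenLength) {
        SketchStream stream = new SketchLowerCaseFilter(new SketchTokenizer(text, maxTokenLength));
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.term + "(+" + stream.posIncr + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [lucene(+1), is(+1), a(+1), library(+2)]
        System.out.println(collect("Lucene is a searchengineindexlibrary Library", 8));
    }
}
```

The over-long word is skipped, and the token after it carries a position increment of 2, which mirrors what posIncr++ does in the original method. On the data-structure side, the real scanner (StandardTokenizerImpl) is a JFlex-generated table-driven state machine, and the term, offset, type and position increment are kept in attribute objects that are reused from one call to the next.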

江边流客 | Beginner Level 1 | Yuandou: 5
Asked: 2013-10-30 11:52