1.在Tokenfiler中IncrementToken方法,对语汇单元进行处理过滤,是进行一个incrementToken判断,这是相当于对语汇单元的(if语句)一次处理,但是在analyzer分析器中,只需输入对应参数reader,就可以进行全部语汇单元处理,必然是对incrementToken方法进行了循环,这一步在哪里呀?还有为什么在方法里面全部处理使用while呢?
2.
解释一下此方法,最好还能把其算法和涉及到的数据结构提示一下,因为最近想仔细看看
public final boolean incrementToken() throws IOException {
clearAttributes();
int posIncr = 1;
while(true) {
int tokenType = scanner.getNextToken();
if (tokenType == StandardTokenizerImpl.YYEOF) {
return false;
}
if (scanner.yylength() <= maxTokenLength) {//第一个问题的具体代码处
posIncrAtt.setPositionIncrement(posIncr);
scanner.getText(termAtt);
final int start = scanner.yychar();
offsetAtt.setOffset(correctOffset(start), correctOffset(start+termAtt.termLength()));
// This 'if' should be removed in the next release. For now, it converts
// invalid acronyms to HOST. When removed, only the 'else' part should
// remain.
if (tokenType == StandardTokenizerImpl.ACRONYM_DEP) {
if (replaceInvalidAcronym) {
typeAtt.setType(StandardTokenizerImpl.TOKEN_TYPES[StandardTokenizerImpl.HOST]);
termAtt.setTermLength(termAtt.termLength() - 1); // remove extra '.'
} else {
typeAtt.setType(StandardTokenizerImpl.TOKEN_TYPES[StandardTokenizerImpl.ACRONYM]);
}
} else {
typeAtt.setType(StandardTokenizerImpl.TOKEN_TYPES[tokenType]);
}
return true;
} else
// When we skip a too-long term, we still increment the
// position increment
posIncr++;
}
}