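Could anyone tell me whether this is possible: when the pinyin analyzer processes "一种", can the output include "yz" but not "y" or "z"?
Here are my pinyin filter settings: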
"filter": {
  "my_pinyin": {
    "type": "pinyin",
    "keep_first_letter": true,
    "keep_separate_first_letter": false,
    "keep_full_pinyin": true,
    "keep_joined_full_pinyin": true,
    "keep_original": true,
    "keep_none_chinese": true,
    "keep_none_chinese_in_first_letter": true,
    "keep_none_chinese_together": true,
    "keep_none_chinese_in_joined_full_pinyin": true,
    "limit_first_letter_length": 16,
    "lowercase": true,
    "trim_whitespace": true,
    "remove_duplicated_term": true
  }
}
And this is the analysis result I get:
{
  "tokens": [
    {
      "token": "yi",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "一种",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "yizhong",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "yz",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "zhong",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "yi",
      "start_offset": 0,
      "end_offset": 1,
      "type": "TYPE_CNUM",
      "position": 2
    },
    {
      "token": "一",
      "start_offset": 0,
      "end_offset": 1,
      "type": "TYPE_CNUM",
      "position": 2
    },
    {
      "token": "y",
      "start_offset": 0,
      "end_offset": 1,
      "type": "TYPE_CNUM",
      "position": 2
    },
    {
      "token": "zhong",
      "start_offset": 1,
      "end_offset": 2,
      "type": "COUNT",
      "position": 3
    },
    {
      "token": "种",
      "start_offset": 1,
      "end_offset": 2,
      "type": "COUNT",
      "position": 3
    },
    {
      "token": "z",
      "start_offset": 1,
      "end_offset": 2,
      "type": "COUNT",
      "position": 3
    }
  ]
}
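This is probably down to your tokenizer. Your analysis result contains separate tokens for 一 and 种, so the standalone y and z look like the keep_first_letter output for those two single-character tokens, not something produced by keep_separate_first_letter...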
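To make that reasoning concrete, here is a rough Python sketch of a filter that runs once per tokenizer token. It is only a toy model and not the plugin's actual code: the `PINYIN` table and the `expand` and `analyze` helpers are made up for illustration, and they only mimic keep_full_pinyin, keep_original, keep_joined_full_pinyin, keep_first_letter and remove_duplicated_term.

```python
# Toy model of a per-token pinyin filter. This is NOT the plugin's code:
# PINYIN, expand() and analyze() are hypothetical names for this illustration.
PINYIN = {"一": "yi", "种": "zhong"}  # tiny hand-written lookup table


def expand(token):
    """Mimic the filter options for ONE token coming out of the tokenizer."""
    syllables = [PINYIN[ch] for ch in token]      # keep_full_pinyin
    candidates = syllables + [
        token,                                    # keep_original
        "".join(syllables),                       # keep_joined_full_pinyin
        "".join(s[0] for s in syllables),         # keep_first_letter
    ]
    # remove_duplicated_term: drop repeats, keeping first-occurrence order
    return list(dict.fromkeys(candidates))


def analyze(tokens):
    """Apply the filter to every tokenizer token and collect distinct terms."""
    terms = []
    for tok in tokens:
        terms.extend(expand(tok))
    return list(dict.fromkeys(terms))


# Tokenizer that also emits the single characters, as in the output above.
with_singles = analyze(["一种", "一", "种"])
# Tokenizer that emits only the whole word.
whole_word = analyze(["一种"])

print(with_singles)  # contains "yz" and also "y" and "z"
print(whole_word)    # contains "yz" but neither "y" nor "z"
```

In this model the only way to lose "y" and "z" is to stop the tokenizer from emitting 一 and 种 as separate tokens. The CN_WORD, TYPE_CNUM and COUNT types in the output look like they come from the IK tokenizer in its fine-grained mode, which is a guess on my part. If so, a coarser tokenizer that returns only 一种 for this text (possibly ik_smart, or the built-in keyword tokenizer for short fields) may give the result you want; it is worth checking with _analyze before reindexing.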