Mixed Chinese and English search in Lucene.Net

Bounty: 50 园豆 [Unresolved]

Creating the index

string indexPath = @"c:\index";
// Path where the index is stored
FSDirectory fsDire = FSDirectory.Open(new DirectoryInfo(indexPath));
// Create an FSDirectory object
bool isLive = IndexReader.IndexExists(fsDire);
// Check whether an index already exists in that directory
if (isLive) System.IO.Directory.Delete(indexPath,true);
// Delete the old index
System.IO.Directory.CreateDirectory(indexPath);
isLive = IndexReader.IndexExists(fsDire);
if (IndexWriter.IsLocked(fsDire))
{
    // If the index is locked, unlock it
    IndexWriter.Unlock(fsDire);
}

IndexWriter writer = new IndexWriter(fsDire, new PanGuAnalyzer(), !isLive, IndexWriter.MaxFieldLength.UNLIMITED);
// fsDire: where the index is stored
// new PanGuAnalyzer(): the PanGu (盘古) tokenizer
// !isLive: whether to create a new index
// IndexWriter.MaxFieldLength.UNLIMITED: no limit on field length
// An IndexWriter for writing index data to the folder has now been created
string txtPath = @"c:\倾世魔师.txt";
// Path of the article
Document document = new Document();
// Create a new Document (one index entry)
document.Add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));

document.Add(new Field("body", File.ReadAllText(txtPath, Encoding.Default), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));

// Add the fields to the document: one holds the id, the other the main content.
writer.AddDocument(document);
// Write the document to the index
writer.Close();
fsDire.Close();
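
The delete-and-recreate steps above run after the FSDirectory has already been opened on the same folder. A minimal sketch of a simpler variant follows, relying on the documented behaviour that passing true as the create argument makes IndexWriter overwrite an existing index. The class name IndexBuilder, the method name Rebuild, and the namespace Lucene.Net.Analysis.PanGu are assumptions, not taken from the post, and the sketch does not clear a stale write lock the way the code above does.

```csharp
using System.IO;
using System.Text;
using Lucene.Net.Analysis.PanGu; // assumed namespace of PanGuAnalyzer
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Hypothetical helper, for illustration only.
public static class IndexBuilder
{
    public static void Rebuild(string indexPath, string txtPath)
    {
        FSDirectory fsDire = FSDirectory.Open(new DirectoryInfo(indexPath));
        try
        {
            // create = true: overwrite whatever index is already in the folder,
            // so the folder itself never has to be deleted by hand.
            IndexWriter writer = new IndexWriter(fsDire, new PanGuAnalyzer(), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
            try
            {
                Document document = new Document();
                document.Add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
                document.Add(new Field("body", File.ReadAllText(txtPath, Encoding.Default),
                    Field.Store.YES, Field.Index.ANALYZED,
                    Field.TermVector.WITH_POSITIONS_OFFSETS));
                writer.AddDocument(document);
            }
            finally
            {
                // Close the writer even if reading the file or adding the document fails.
                writer.Close();
            }
        }
        finally
        {
            fsDire.Close();
        }
    }
}
```

It would be called as IndexBuilder.Rebuild(@"c:\index", @"c:\倾世魔师.txt"), using the same paths as above.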

The method called when searching

string indexPath = @"c:\index";
// Path where the index is stored.
string keyWords = TextBox1.Text;
// Search keywords
FSDirectory fsDire = FSDirectory.Open(new DirectoryInfo(indexPath),new NoLockFactory());
// Create an FSDirectory object
IndexReader reader = IndexReader.Open(fsDire);
// Create an IndexReader object
IndexSearcher searcher = new IndexSearcher(reader);
// Create an IndexSearcher object
PhraseQuery query = new PhraseQuery();
// Create a PhraseQuery object

Lucene.Net.Analysis.Analyzer analyzer = new PanGuAnalyzer();

Lucene.Net.Analysis.TokenStream tokenStream = analyzer.TokenStream("", new StringReader(keyWords));

Lucene.Net.Analysis.Token token = null;
while ((token = tokenStream.Next()) != null)
{
    query.Add(new Term("body", token.Term()));
}

// Tokenize the keywords, then run the query.
query.SetSlop(50);
TopScoreDocCollector collector = TopScoreDocCollector.create(200,true);
searcher.Search(query,null,collector);
ScoreDoc[] docs = collector.TopDocs(0,collector.GetTotalHits()).scoreDocs;
for (int i = 0; i < docs.Length; i++)
{
    int docId = docs[i].doc;
    Document document = searcher.Doc(docId);
    Response.Write(document.Get("body"));
}
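
The tokenization loop above decides which terms end up in the PhraseQuery, so listing those tokens is a cheap first check when a query returns nothing. The sketch below uses the same older TokenStream.Next() API as the search code. The class name TokenDump, the method name Describe, and the namespace Lucene.Net.Analysis.PanGu are assumptions for illustration, and the accessor methods are those of the Lucene.Net 2.9-style Token class.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.PanGu; // assumed namespace of PanGuAnalyzer

// Hypothetical helper, for illustration only.
public static class TokenDump
{
    // Returns one descriptive line per token the analyzer emits for the text.
    public static List<string> Describe(string text)
    {
        List<string> lines = new List<string>();
        Analyzer analyzer = new PanGuAnalyzer();
        TokenStream tokenStream = analyzer.TokenStream("body", new StringReader(text));

        int position = -1;
        Token token;
        while ((token = tokenStream.Next()) != null)
        {
            // Tokens with a position increment of 0 share a position with the previous token.
            position += token.GetPositionIncrement();
            lines.Add(string.Format("pos={0} term=[{1}] offsets={2}-{3}",
                position, token.Term(), token.StartOffset(), token.EndOffset()));
        }
        return lines;
    }
}
```

In the page from the post, the result could be written out with a loop such as foreach (string line in TokenDump.Describe(keyWords)) Response.Write(line + "<br/>");. If the analyzer emits more tokens than the two expected words, the phrase built from them is stricter than intended.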

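The code is as shown above. Index creation works without problems; I am using the PanGu tokenizer (盘古分词).

When searching, a single word (Chinese 单词 or English word) or a string of digits works fine.

A mixed Chinese and English query, however, returns no results at all.

For example, a query such as "单词 word".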

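Any pointers would be appreciated. This is my first time using an open-source project and there is a lot I don't understand, so please bear with the clumsy code and guide me patiently. Thanks.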
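
One possible cause, which the thread does not confirm: PhraseQuery.Add(Term) places each term at the next consecutive position, and the query matches only where all terms occur within the slop distance of each other in the indexed text. If 单词 and word are not close together in the body field, or if the analyzer emits extra tokens for the keywords, a phrase query can come back empty even though each word matches on its own. A minimal sketch of an alternative is to require each token independently with a BooleanQuery. The class name MixedQueryBuilder, the method name Build, and the namespace Lucene.Net.Analysis.PanGu are assumptions for illustration.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.PanGu; // assumed namespace of PanGuAnalyzer
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper, for illustration only.
public static class MixedQueryBuilder
{
    // Builds one TermQuery per token and requires all of them (logical AND),
    // without any constraint on where the terms sit relative to each other.
    public static BooleanQuery Build(string field, string keyWords)
    {
        Analyzer analyzer = new PanGuAnalyzer();
        TokenStream tokenStream = analyzer.TokenStream(field, new StringReader(keyWords));

        BooleanQuery query = new BooleanQuery();
        Token token;
        while ((token = tokenStream.Next()) != null)
        {
            TermQuery termQuery = new TermQuery(new Term(field, token.Term()));
            // MUST: the document has to contain this term somewhere in the field.
            query.Add(termQuery, BooleanClause.Occur.MUST);
        }
        return query;
    }
}
```

With the rest of the search code unchanged, this would be used as searcher.Search(MixedQueryBuilder.Build("body", keyWords), null, collector);. Switching BooleanClause.Occur.MUST to BooleanClause.Occur.SHOULD gives looser, OR-style matching, which may be the better fit if the analyzer produces several variants of the same word.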


Leo.Exia | Beginner (level 1) | 园豆: 152
Asked: 2012-10-05 10:15


All answers (2)

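Lucene.Net 2.0 supports this: you can use Lucene's own tokenization algorithm together with your own dictionary (you can download a dictionary online or take one from another open-source tokenizer).

Getting no results for mixed Chinese and English queries may be caused by your dictionary and algorithm, so I suggest looking for a better algorithm or dictionary. If you need one, contact me on QQ. I have a complete search engine implemented in C#, although its performance certainly cannot match the likes of Baidu. Email: 185367128@qq.com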

落幕残情 | 园豆: 34 (Beginner, level 1) | 2012-10-05 22:10

I am using the PanGu tokenizer (盘古分词).

Leo.Exia | 园豆: 152 (Beginner, level 1) | 2012-10-06 09:48

It is a full-text search tool. I know of it, but I have never used it.

jerry-Tom | 园豆: 4077 (Veteran, level 4) | 2012-10-08 17:49
