I got hold of a 20-million-record database online, so I finally have plenty of data to practice on.
But when I tried indexing it with Lucene.Net + the PanGu analyzer, the IIS request timed out before long.
My approach
The method is as follows.
int page = 1;
string sql = @"select * from (
        select Name, Address, Gender, CtfID, BirthDay, Zip, Mobile, Tel, Email, Version, Nation,
               ROW_NUMBER() over (order by id asc) as rownumber
        from cdsgus
    ) as tbl
    where tbl.rownumber between (@page - 1) * 100 + 1 and @page * 100";

// Open the index directory and the IndexWriter ONCE, not once per row:
// creating and closing a writer for every document is extremely slow.
FSDirectory directory = FSDirectory.Open(new DirectoryInfo(indexPath), new NativeFSLockFactory());
bool isUpdate = IndexReader.IndexExists(directory);
// If the index directory was left locked (e.g. the process exited abnormally
// while indexing), unlock it first.
if (isUpdate && IndexWriter.IsLocked(directory))
{
    IndexWriter.Unlock(directory);
}
IndexWriter writer = new IndexWriter(directory, new PanGuAnalyzer(), !isUpdate,
    Lucene.Net.Index.IndexWriter.MaxFieldLength.UNLIMITED);

for (int p = page; p < 200502; p++)
{
    // using ensures the reader is closed after each page.
    using (SqlDataReader reader = SqlHelper.ExecuteDataReader(sql, new SqlParameter("@page", p)))
    {
        while (reader.Read())
        {
            Document document = new Document();
            document.Add(new Field("name", reader["Name"].ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Address", reader["Address"].ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Gender", reader["Gender"].ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("CtfID", reader["CtfID"].ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("BirthDay", reader["BirthDay"].ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Zip", reader["Zip"].ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Mobile", reader["Mobile"].ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Tel", reader["Tel"].ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Email", reader["Email"].ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Version", reader["Version"].ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Nation", reader["Nation"].ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            // Add the document; the duplicate "Address" field from the original
            // code has been removed, and the writer is NOT closed per row.
            writer.AddDocument(document);
        }
    }
    Thread.Sleep(100);
}

writer.Close();
directory.Close();
I page through the data with ROW_NUMBER() and write each page to the index.
The problem is that the subquery "select Name, Address, Gender, CtfID, BirthDay, Zip, Mobile, Tel, Email, Version, Nation, ROW_NUMBER() over (order by id asc) as rownumber" runs once per loop iteration, so the server effectively re-numbers all 20 million rows on every page fetch. That is more than the machine can take.
How should the SQL statement be optimized?
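One common optimization (a sketch, not specific to this thread's schema beyond what the original query shows) is keyset pagination, also called the "seek" method: instead of numbering every row with ROW_NUMBER() on each call, remember the last id seen and seek past it. This assumes id is the indexed, monotonically increasing key that the original "order by id asc" implies; @lastId is a hypothetical parameter supplied by the application.

```sql
-- Keyset ("seek") pagination sketch for SQL Server.
-- @lastId = 0 on the first call; each subsequent call passes
-- the largest id returned by the previous batch.
select top (100)
    Name, Address, Gender, CtfID, BirthDay, Zip,
    Mobile, Tel, Email, Version, Nation, id
from cdsgus
where id > @lastId
order by id asc;
```

Each page then costs a cheap index range scan on id rather than a full re-numbering of the table, so page 200,000 is as fast as page 1. Since the indexing job only ever scans forward, a single streaming SqlDataReader over the whole table (no pagination at all) would also work.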
If you run the indexing in an asynchronous background service, it is no longer bound by the web request timeout.
Hi, could you share a download link for this database? I'd like to study it as well.
Also looking for a download link to the database~