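Introduction to Information Retrieval (English Edition) (信息检索导论(英文版))
Category: Books, English and Other Foreign Languages, English Readers, English Editions, Culture and Education
Brand: Christopher D. Manning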
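Basic Information
·Publisher: Posts & Telecom Press (人民邮电出版社)
·Pages: 482
·Publication date: January 2010
·ISBN: 9787115218247
·Barcode: 9787115218247
·Edition: 1st edition
·Binding: Paperback
·Format: 16mo
·Text language: Chinese
·Foreign-language title: Introduction to Information Retrieval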
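Description
Introduction to Information Retrieval (English Edition) is a textbook on information retrieval that aims to present a modern approach to the field from a computer science perspective. Starting from basic concepts, it covers web search as well as text classification and text clustering. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents, of methods for evaluating such systems, and of the application of machine learning methods to text collections. All of the important ideas in the book are explained with examples and figures. The book is well suited as an introductory text for information retrieval courses for advanced undergraduates and graduate students in computer science and related fields, and is equally suitable for researchers and professionals.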
Table of Contents
1 Boolean retrieval
1.1 An example information retrieval problem
1.2 A first take at building an inverted index
1.3 Processing Boolean queries
1.4 The extended Boolean model versus ranked retrieval
1.5 References and further reading
2 The term vocabulary and postings lists
2.1 Document delineation and character sequence decoding
2.2 Determining the vocabulary of terms
2.3 Faster postings list intersection via skip pointers
2.4 Positional postings and phrase queries
2.5 References and further reading
3 Dictionaries and tolerant retrieval
3.1 Search structures for dictionaries
3.2 Wildcard queries
3.3 Spelling correction
3.4 Phonetic correction
3.5 References and further reading
4 Index construction
4.1 Hardware basics
4.2 Blocked sort-based indexing
4.3 Single-pass in-memory indexing
4.4 Distributed indexing
4.5 Dynamic indexing
4.6 Other types of indexes
4.7 References and further reading
5 Index compression
5.1 Statistical properties of terms in information retrieval
5.2 Dictionary compression
5.3 Postings file compression
5.4 References and further reading
6 Scoring, term weighting, and the vector space model
6.1 Parametric and zone indexes
6.2 Term frequency and weighting
6.3 The vector space model for scoring
6.4 Variant tf–idf functions
6.5 References and further reading
7 Computing scores in a complete search system
7.1 Efficient scoring and ranking
7.2 Components of an information retrieval system
7.3 Vector space scoring and query operator interaction
7.4 References and further reading
8 Evaluation in information retrieval
8.1 Information retrieval system evaluation
8.2 Standard test collections
8.3 Evaluation of unranked retrieval sets
8.4 Evaluation of ranked retrieval results
8.5 Assessing relevance
8.6 A broader perspective: System quality and user utility
8.7 Results snippets
8.8 References and further reading
9 Relevance feedback and query expansion
9.1 Relevance feedback and pseudo relevance feedback
9.2 Global methods for query reformulation
9.3 References and further reading
10 XML retrieval
10.1 Basic XML concepts
10.2 Challenges in XML retrieval
10.3 A vector space model for XML retrieval
10.4 Evaluation of XML retrieval
10.5 Text-centric versus data-centric XML retrieval
10.6 References and further reading
11 Probabilistic information retrieval
11.1 Review of basic probability theory
11.2 The probability ranking principle
11.3 The binary independence model
11.4 An appraisal and some extensions
11.5 References and further reading
12 Language models for information retrieval
12.1 Language models
12.2 The query likelihood model
12.3 Language modeling versus other approaches in information retrieval
12.4 Extended language modeling approaches
12.5 References and further reading
13 Text classification and Naive Bayes
13.1 The text classification problem
13.2 Naive Bayes text classification
13.3 The Bernoulli model
13.4 Properties of Naive Bayes
13.5 Feature selection
13.6 Evaluation of text classification
13.7 References and further reading
14 Vector space classification
14.1 Document representations and measures of relatedness in vector spaces
14.2 Rocchio classification
14.3 k nearest neighbor
14.4 Linear versus nonlinear classifiers
14.5 Classification with more than two classes
14.6 The bias–variance tradeoff
14.7 References and further reading
15 Support vector machines and machine learning on documents
15.1 Support vector machines: The linearly separable case
15.2 Extensions to the support vector machine model
15.3 Issues in the classification of text documents
15.4 Machine-learning methods in ad hoc information retrieval
15.5 References and further reading
16 Flat clustering
16.1 Clustering in information retrieval
16.2 Problem statement
16.3 Evaluation of clustering
16.4 K-means
16.5 Model-based clustering
16.6 References and further reading
17 Hierarchical clustering
17.1 Hierarchical agglomerative clustering
17.2 Single-link and complete-link clustering
17.3 Group-average agglomerative clustering
17.4 Centroid clustering
17.5 Optimality of hierarchical agglomerative clustering
17.6 Divisive clustering
17.7 Cluster labeling
17.8 Implementation notes
17.9 References and further reading
18 Matrix decompositions and latent semantic indexing
18.1 Linear algebra review
18.2 Term–document matrices and singular value decompositions
18.3 Low-rank approximations
18.4 Latent semantic indexing
18.5 References and further reading
19 Web search basics
19.1 Background and history
19.2 Web characteristics
19.3 Advertising as the economic model
19.4 The search user experience
19.5 Index size and estimation
19.6 Near-duplicates and shingling
19.7 References and further reading
20 Web crawling and indexes
20.1 Overview
20.2 Crawling
20.3 Distributing indexes
20.4 Connectivity servers
21 Link analysis
21.1 The Web as a graph
21.2 PageRank
21.3 Hubs and authorities
21.4 References and further reading
Bibliography
Index