[SearchEngine] Text summarizer technology

王朝other · Author unknown  2006-03-04

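Kevin's tool, the Tailrank text summarizer, demonstrates language categorization and text summarizer technology. In other words, when it fetches the blog permalink you provide, it works out which language the post is written in and automatically extracts a summary of the post.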
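To give a rough feel for the language categorization step, here is a minimal sketch in Python. It is not Tailrank's method, which the tool does not document here; it simply counts common stopwords per language. The `STOPWORDS` table and the `guess_language` function are hypothetical names made up for this illustration.

```python
import re

# Tiny illustrative stopword lists (an assumption for this sketch).
# A real system would use much larger language profiles.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it", "for", "a"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "zu", "ein", "mit", "den"},
    "fr": {"le", "la", "et", "les", "des", "est", "un", "une", "que", "pour"},
}


def guess_language(text):
    """Return the language code whose stopwords occur most often in text.

    Returns None when no stopword from any list is found.
    """
    # Extract runs of letters (Unicode-aware), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = {
        lang: sum(1 for w in words if w in stopwords)
        for lang, stopwords in STOPWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


if __name__ == "__main__":
    sample = "NDFS provides a fault-tolerant environment for working with very large files."
    print(guess_language(sample))
```

Stopword counting is only one of several simple approaches; it tends to fail on very short texts and on languages missing from the table.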

I tried it out at http://tools.tailrank.com/, for example by giving it this address:

http://weblogs.java.net/blog/tomwhite/archive/2005/09/mapreduce.html

The result Tailrank produced was:

Resulting Summary

summary:

NDFS provides a fault-tolerant environment for working with very large files using cheap commodity hardware. This processing model is ideal for the operations a search engine indexer like Nutch or Google needs to perform - like computing inlinks for URLs, or building inverted indexes - and it will transform Nutch into a scalable, distributed search engine. Currently MapReduce is a part of Nutch, but it has been proposed that it and NDFS be moved into a separate project.

title: Tom White's Blog

lang: en

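Of course, there are two problems: first, the extracted summary is not accurate; second, many blog URLs cannot be processed at all.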
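The summary above looks like a set of sentences lifted from the post, so an extractive approach is a plausible guess at what is going on, though that is only an assumption. A minimal sketch of one such approach in Python: score each sentence by the average frequency of its non-stopword terms and keep the top few in their original order. The names `summarize`, `split_sentences` and `STOPWORDS` are hypothetical and are not taken from Tailrank.

```python
import re
from collections import Counter

# Small illustrative English stopword list (an assumption for this sketch).
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "for", "that", "this", "with", "on", "be", "as", "are",
}


def split_sentences(text):
    """Naively split on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Average term frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    text = "Nutch uses MapReduce. MapReduce makes Nutch scalable. Cats sleep."
    print(summarize(text, max_sentences=1))
```

A scorer this simple rewards sentences that repeat the post's most frequent words, which may or may not be the sentences that actually capture its main point. That is one way a summary can end up inaccurate, as noted above.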

 
 
 