Extracting Word File Contents with TextMining and Apache POI, No MS-Office ActiveX Required

王朝 (Wangchao) · Java/JSP · Author unknown · 2006-01-09

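/*
 * Created on 2005/07/18
 * Uses tm-extractors-0.4.jar
 */
package com.nova.colimas.common.doc;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.textmining.text.extraction.WordExtractor;

/**
 * Deal with MS Word 2000/XP files.
 * DocProcess is a base class from the author's project and is not shown here.
 * try-with-resources and StandardCharsets require Java 7 or later.
 *
 * @author tyrone
 */
public class WordProcess extends DocProcess {

    /**
     * Extracts the plain text of the Word file at filename.
     * Returns null if the file cannot be read or parsed.
     */
    public static String run(String filename) {
        // try-with-resources closes the input stream even if extraction fails
        try (InputStream in = new FileInputStream(filename)) {
            WordExtractor extractor = new WordExtractor();
            return extractor.extractText(in);
        } catch (Exception ex) {
            // log the exception here
            return null;
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: java com.nova.colimas.common.doc.WordProcess <file.doc>");
            return;
        }
        String text = WordProcess.run(args[0]);
        if (text == null) {
            // run() returns null on failure; guard against a NullPointerException
            System.err.println("Could not extract text from " + args[0]);
            return;
        }
        // Write as UTF-8 so non-ASCII text (e.g. Chinese) survives
        // regardless of the platform's default encoding
        try (OutputStream out = new FileOutputStream("result.txt")) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (Exception ex) {
            System.err.println(ex.toString());
        }
    }
}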
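The class comment says WordProcess targets Word 2000/XP files, i.e. the binary .doc format, which is stored as an OLE2 compound document. Word 2007 and later save .docx files as ZIP packages instead, a format that tm-extractors 0.4 predates. The sketch below is a hypothetical helper, WordFormatCheck (not part of the original article), that checks a file's leading signature bytes before it is handed to WordProcess.run. It assumes Java 9 or later for InputStream.readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/**
 * Hypothetical helper (not from the original article): sniffs the first bytes
 * of a file to tell a binary Word 97-2003 .doc (OLE2 compound file) apart from
 * a Word 2007+ .docx (ZIP container).
 */
public class WordFormatCheck {

    public enum Format { OLE2, ZIP, UNKNOWN }

    // Signature of an OLE2 / Compound File Binary container (used by .doc)
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Local file header signature "PK\3\4" of a ZIP archive (used by .docx)
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies a file from its leading bytes. */
    public static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Format.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Format.ZIP;
        return Format.UNKNOWN;
    }

    /** Reads at most 8 bytes from the file and classifies them. */
    public static Format detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8];
            int n = in.readNBytes(buf, 0, buf.length); // Java 9+
            return detect(Arrays.copyOf(buf, n));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("Usage: java WordFormatCheck <file>");
            return;
        }
        System.out.println(detect(Paths.get(args[0])));
    }
}
```

For example, checking that detect(Paths.get(args[0])) returns Format.OLE2 at the start of WordProcess.main would give a clearer message for a .docx input than the null that run() returns when extraction fails.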
