BBYR Achieve
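This is a mirrored post. Source: BYR Forum (北邮人论坛) / www-technology / #9917, synced 2010/4/26
The mirror source has not been updated for more than 30 days; the post may have been deleted on the origin site.
WWWTechnology · posted by bot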

Question: about htmlparser

ps
2010/4/26 · mirror synced · 2 replies
The HTML file contains many paragraphs like the following:

<DIV class=paragraph style=\" padding:0.6pt 108.0pt 0.0pt 33.8pt; text-align:justify; text-indent:10.2pt;\">\
<SPAN class=font1 style=\" line-height:11.0pt;\">equalization of tfie burden, tak- <BR>ing away the exemptions which the nobles and the clergy still<BR>enjoyed. He determined to have all the land carefully ap- <BR>praised and treated alike when this great task had been accom- <BR>plished. <BR>whole plan.uncompro- <SUP>and</SUP> <SUP>statfl<BR></SUP>mising as to give</SPAN><BR>\
</DIV>\

With the approach below, the extracted text seems to drop the <BR> tags automatically:

NodeList nodes = parser.parse(filter);
nodes.elementAt(k).getChildren().visitAllNodesWith(myVisitor); //30
String outPutStr = myVisitor.getExtractedText();

How can I correctly extract the words that are split by a hyphen "-", e.g. taking, appraised, accomplished, uncomproand, statflmising?

Thanks a lot.
2 replies
ps (bot) #1 · 2010/4/26
Alternatively, could someone tell me how to turn <BR> into some other special character (e.g. a space) instead of having it dropped?
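
One way to get a space where each <BR> was is to rewrite the raw HTML string before it reaches the parser, so the line break survives as ordinary text. The sketch below uses only java.util.regex; the class name BrToSpace and the method replaceBr are made up for illustration and are not part of htmlparser. It assumes the HTML is already available as a String.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an htmlparser class): rewrites <BR> tags in raw HTML.
final class BrToSpace {
    // Matches <BR>, <br/>, <br /> and <br ...attributes>, ignoring case.
    // The \b keeps tags that merely start with "br" from matching.
    private static final Pattern BR =
            Pattern.compile("<br\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Replaces every <BR> tag with the given literal text (e.g. a single space).
    static String replaceBr(String html, String replacement) {
        return BR.matcher(html).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String html = "the clergy still<BR>enjoyed. He determined";
        // prints: the clergy still enjoyed. He determined
        System.out.println(replaceBr(html, " "));
    }
}
```

The rewritten string can then be handed to the parser in place of the original HTML, so the text extraction step stays unchanged.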
xw2423 (bot) #2 · 2010/4/26
Get the HTML and delete "- <BR>" from it.

【 In the post by ps (ps): 】
: The HTML file contains many paragraphs like the following
: <DIV class=paragraph style=\" padding:0.6pt 108.0pt 0.0pt 33.8pt; text-align:justify; text-indent:10.2pt;\">\
: <SPAN class=font1 style=\" line-height:11.0pt;\">equalization of tfie burden, tak- <BR>ing away the exemptions which the nobles and the clergy still<BR>enjoyed. He determined to have all the land carefully ap- <BR>praised and treated alike when this
: ...................
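
A minimal sketch of the suggestion to delete "- <BR>" from the HTML, assuming the page is already held in a String and is cleaned before parsing. The class name Dehyphenator and the method joinHyphenated are hypothetical, not htmlparser API. The pattern removes a hyphen, any whitespace, the <BR> tag and any whitespace after it, so the two halves of the word end up adjacent.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not an htmlparser class): rejoins words split as "tak- <BR>ing".
final class Dehyphenator {
    // A hyphen, optional whitespace, a <BR> tag in any case/spelling, optional whitespace.
    private static final Pattern HYPHEN_BR =
            Pattern.compile("-\\s*<br\\b[^>]*>\\s*", Pattern.CASE_INSENSITIVE);

    // Deletes every "- <BR>" sequence; <BR> tags without a preceding hyphen are left alone.
    static String joinHyphenated(String html) {
        return HYPHEN_BR.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "tak- <BR>ing away the exemptions; carefully ap- <BR>praised";
        // prints: taking away the exemptions; carefully appraised
        System.out.println(joinHyphenated(html));
    }
}
```

Two limits of this approach: a genuine compound that happens to break at a line end (e.g. "well- <BR>known") is also joined, and a split such as "uncompro- <SUP>and</SUP>" from the sample is not touched because no <BR> directly follows the hyphen.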