Title: [Text processing] How can a batch script add a marker after every 10 sentences?

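Author: gung    Time: 2011-5-23 16:17     Title: How can a batch script add a marker after every 10 sentences?

Add a <br/> after every 10 sentences, and add "http://www.baidu.com" after every 50 sentences.

A sentence ends with a period, a question mark, or an exclamation mark.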
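
For reference, the requirement can be sketched outside batch like this. It is only a minimal Python illustration, not code from the thread: the function name `add_markers` and its parameters are made up for the example, and it assumes every `.`, `?` or `!` character ends exactly one sentence (so a run such as `!.` counts twice).

```python
def add_markers(text, every=10, url_every=50,
                marker="<br/>", url="http://www.baidu.com"):
    """Append `marker` after every `every` sentences and `url` after
    every `url_every` sentences (hypothetical helper for illustration)."""
    out = []
    count = 0
    for ch in text:
        out.append(ch)
        if ch in ".?!":              # assumed sentence terminators
            count += 1
            if count % every == 0:
                out.append(marker)
            if count % url_every == 0:
                out.append(url)
    return "".join(out)
```

Calling `add_markers(open("a.txt", encoding="utf-8").read())` would process a whole file at once; the file name is just an example.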


Many thanks to this forum!

I do English SEO and would like to get to know technically minded friends, haha. QQ720140 9
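Author: CrLf    Time: 2011-5-23 16:36

  @echo off&setlocal enabledelayedexpansion
  rem Treats each line of 1.txt as one sentence and prints the result to the console.
  for /f "delims=" %%a in (1.txt) do (
     set tmp=%%a
     rem n cycles through 1 to 9 and then 0, so 1/n fails on every 10th line.
     set /a "n=(n+1)%%10",1/n||(
        rem m counts groups of 10 lines and every 5th group also gets the URL.
        set /a m+=1,"1/(m%%5)"||set tmp=!tmp!http://www.baidu.com
     )&&set "tmp=!tmp! <br/>"
     echo !tmp!
  )
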

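Author: gung    Time: 2011-5-23 16:45

Report: the code in the post above doesn't work, haha

I put 1.txt and the bat file in the same directory, but after running it I saw no change in 1.txt
Author: batman    Time: 2011-5-23 16:52

Your text isn't just a single line, is it?
Author: gung    Time: 2011-5-23 17:13

4# batman


It doesn't work, moderator ... could you all test it for me?
Of course the document has more than one line
Author: batman    Time: 2011-5-23 17:15

It would be best to paste a sample of the text here...
Author: gung    Time: 2011-5-23 17:16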

  1. It’s an easy and safe way to turn your shoes in uk or old golf club sets into cold hard cash.Cobra’s 9 Point Technology featured in most Cobra Irons, enhances perimeter weighting and generates 9 points of forgiveness across the christian louboutin Iron face for increased accuracy and distance control throughout the christian louboutin.Cobra Golf is one of the christian louboutin major manufactures not to produce a fashion ball under their brand name.Fresh innovative ideas on regular basis will influence them a fashion.You have to adopt content marketing best practices so that it can have some positive output for your shoes in uk.Many online stores including GolfPitStop have golf balls for sale.As the christian louboutin is in need of land space most of Hong Kong’s buildings rise more than 35m thus giving the christian louboutin a fashion skyline of sky scrapers.This louboutin real time when you should explain your shoes in uk to them.Choose the christian louboutin that appeal to you and your shoes in uk.Callaway irons are designed and built to help you improve distance, accuracy and optimize your shoes in uk selection to suit your shoes in uk.However, many other manufactures do.The christian louboutin you will provide to them, the christian louboutin it will affect their thinking positively towards you.Don’t make your shoes in uk marketing a fashion waste of time and efforts.From razor-sharp stilettos to lace-up boots and studded sneakers, Christian Louboutin is every woman's go-to for heavenly heels and covetable accessories.Yuen Po Street Bird Garden – An Exclusive Street for Birds
  2. Bustling with busy streets, tightly packed sidewalks, and steam filled canteens Hong Kong is a fashion constantly on the christian louboutin.Our designer shoes are cheap and of high quality, welcome you to buy our discount shoes online, enjoy it!.For nearly three decades, Callaway Golf Australia has led the christian louboutin in innovation and premium technology for all golfers.Also known as Bird Street, this louboutin street is filled with birds of all varieties in cages for exhibition as well as for sale.Cobra Irons provide players of all abilities with the christian louboutin technology to take their game to the christian louboutin level.Callaway Golf Australia also market products under the christian louboutin putter brand, acquired in 1997, as well as the christian louboutin Strata and Ben Hogan golf brands picked up following the christian louboutin of Spalding’s former golf division in 2003.This louboutin why many P.You have to understand the christian louboutin of people and benefits you can provide them.But you should remind that in the christian louboutin of updating content on regular basis, don’t compromise with the christian louboutin aspect.Push all limits.If you can visit the christian louboutin Garden in the christian louboutin hours of the christian louboutin you can catch a fashion of how the christian louboutin of Hong Kong take their precious pet birds for a fashion and for a fashion of fresh air.Selling your shoes in uk club sets to GolfPitStop couldn’t be easier.Callaway Golf Australia, offers Callaway Golf drivers and Callaway Irons
  3. Callaway Golf Australia is an American sporting goods company based in Carlsbad, California, specializing in golf equipment and accessories.

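Author: gung    Time: 2011-5-23 17:19

It would be best to paste a sample of the text here...
batman posted on 2011-5-23 17:15

After every 10 periods (or exclamation marks or question marks), add a <br>
Author: batman    Time: 2011-5-23 17:22

Just as I expected: extremely long lines of text. Batch is very inefficient at processing these, so consider solving it with VBS instead...
Author: gung    Time: 2011-5-23 17:32

OH NO, could someone please solve it for me?
Author: gung    Time: 2011-5-23 17:40

I asked a friend, and he said a regex replace can do it?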
Author: batman    Time: 2011-5-23 17:53

This post was last edited by batman on 2011-5-23 17:57
  ' Reads a.txt, tags every sentence end with "#", then rebuilds the text.
  ' Assumes the input itself contains no "#".
  Set fso = CreateObject("Scripting.FileSystemObject")
  vbstr = Replace(fso.OpenTextFile("a.txt", 1).ReadAll, "!", "!#")
  vbstr = Replace(vbstr, ".", ".#")
  vbstr = Replace(vbstr, "?", "?#")
  For Each str In Split(vbstr, "#")
      i = i + 1
      vbvar = vbvar & str
      If i = 10 Then            ' every 10th sentence
          vbvar = vbvar & "<br/>"
          j = j + 1: i = 0
          If j = 5 Then         ' every 5th group of 10, i.e. 50 sentences
              vbvar = vbvar & "http://www.baidu.com"
              j = 0
          End If
      End If
  Next
  ' 2 = ForWriting, True = create 1.txt if it does not exist.
  fso.OpenTextFile("1.txt", 2, True).Write vbvar
  Set fso = Nothing

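Author: CrLf    Time: 2011-5-23 18:44

This post was last edited by zm900612 on 2011-5-23 19:14

Pure batch seems to work too... though it only uses the period as the delimiter
  @echo off&setlocal enabledelayedexpansion
  rem Note: with delayed expansion on, any exclamation mark in a.txt is lost.
  for /f "delims=" %%a in (a.txt) do (
     set tmp=%%a
     rem Put a period before each question mark so it also acts as a delimiter.
     set tmp=!tmp:?=.?!
     for /l %%b in (1 1 100) do (
        rem Take the next 10 sentences and keep the rest of the line for the next pass.
        for /f "tokens=1-10* delims=." %%c in ("!tmp!") do (
           set /a "n=(n+1)%%5"
           if !n!==0 set www=www.baidu.com
           set echo=%%c.%%d.%%e.%%f.%%g.%%h.%%i.%%j.%%k.%%l.!www!^<br/^>
           rem Restore the question marks that were prefixed with a period.
           echo !echo:.?=?!
           set www=
           set tmp=%%m
        )
     )
  )
  pause
Revised it slightly to support question marks. Supporting the English exclamation mark as well would greatly reduce efficiency.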
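Author: plp626    Time: 2011-5-23 19:33

zm, you just keep amusing yourself....
Author: CrLf    Time: 2011-5-23 19:51

14# plp626


It does still solve the problem, doesn't it? Besides, trying all sorts of unconventional methods is fun, hehe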
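Author: powerbat    Time: 2011-5-23 20:42

Since we're using VBS anyway, how can we not use regular expressions?
  Set regEx = New RegExp
  regEx.Global = True
  regEx.IgnoreCase = False
  Set fso = CreateObject("Scripting.FileSystemObject")
  txt = fso.OpenTextFile("a1.txt").ReadAll()
  ' One sentence = text without . ? ! followed by one of them; match 10 in a row.
  regEx.Pattern = "(?:[^.?!]+[.?!]){10}"
  txt = regEx.Replace(txt, "$&<br/>")
  ' Same idea for 50 sentences; the inserted <br/> contains no terminator.
  regEx.Pattern = "(?:[^.?!]+[.?!]){50}"
  txt = regEx.Replace(txt, "$&""http://www.baidu.com""")
  ' 2 = ForWriting, True = create a2.txt if it does not exist.
  fso.OpenTextFile("a2.txt", 2, True).Write txt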
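
The same regex idea can also be sketched in Python with a single pass and one running counter. This is an illustration only, not part of powerbat's post: the name `mark_with_regex` and its default arguments are assumptions, and a sentence is again taken to be a run of non-terminator characters followed by one `.`, `?` or `!`.

```python
import re

# One sentence: one or more non-terminators, then a single terminator.
SENTENCE = re.compile(r"[^.?!]+[.?!]")

def mark_with_regex(text, marker="<br/>", url='"http://www.baidu.com"'):
    """Add `marker` after every 10th and `url` after every 50th sentence."""
    count = 0

    def repl(match):
        nonlocal count
        count += 1
        tail = ""
        if count % 10 == 0:
            tail += marker
        if count % 50 == 0:
            tail += url
        return match.group(0) + tail

    return SENTENCE.sub(repl, text)
```

Because the substitution callback sees each sentence exactly once, the counting for both markers happens in the same pass over the text.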

Author: plp626    Time: 2011-5-23 21:12

I started reading about awk today; how would you do this with gawk?
Author: plp626    Time: 2011-5-23 21:23

This post was last edited by plp626 on 2011-5-24 20:36
  gawk "BEGIN {FS=\".\"};{for (i=1;i<=NF;i++){printf \"%s\",$i; if(i<NF){printf \".\"; n++; if(n==10){printf \"^<br/^>\";n=0;m++}; if(m==5){printf \"http://www.baidu.com \";m=0}}}}" a.txt

Author: gung    Time: 2011-5-23 22:11

Pure batch seems to work too... though it only uses the period as the delimiter
@echo off&setlocal enabledelayedexpansion
for /f "delims=" %%a in (a.txt) do (
   set tmp=%%a
   set tmp=!tmp:?=.?!
   for /l %%b in (1 1 100) do (
      for  ...
zm900612 posted on 2011-5-23 18:44

How do I send the output to a txt file (>>txt)?
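Author: gung    Time: 2011-5-23 22:21

Experts, could someone sum up how to solve my problem?

My idea: put a few dozen articles into one txt file, shuffle them, insert br at random, then use br as the marker for split points and generate many small txt files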
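
The splitting step of that idea might look roughly like this in Python. It is a sketch under assumptions, not something posted in the thread: the helper names `split_on_marker` and `write_pieces`, the `part_0001.txt` naming scheme and the UTF-8 encoding are all invented for the example.

```python
import random
from pathlib import Path

def split_on_marker(text, marker="<br/>"):
    """Return the non-empty pieces of `text` between occurrences of `marker`."""
    return [piece.strip() for piece in text.split(marker) if piece.strip()]

def write_pieces(pieces, out_dir, shuffle=False, seed=None):
    """Write each piece to out_dir/part_0001.txt, part_0002.txt, ...
    and return the list of paths that were written."""
    pieces = list(pieces)
    if shuffle:
        random.Random(seed).shuffle(pieces)   # optional shuffling step
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, piece in enumerate(pieces, start=1):
        path = out / f"part_{i:04d}.txt"
        path.write_text(piece, encoding="utf-8")
        paths.append(path)
    return paths
```

Something like `write_pieces(split_on_marker(marked_text), "out")` would then produce one small file per br-delimited chunk.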
Author: batman    Time: 2011-5-23 22:51

I wanted to use regex too, but my skills aren't up to it, so all I could do was write a VBS that follows the batch way of thinking, haha...
Author: CrLf    Time: 2011-5-23 22:51

  @echo off&setlocal enabledelayedexpansion
  rem Same approach as before, with all output redirected to b.txt.
  (for /f "delims=" %%a in (a.txt) do (
     set tmp=%%a
     set tmp=!tmp:?=.?!
     for /l %%b in (1 1 100) do (
        for /f "tokens=1-10* delims=." %%c in ("!tmp!") do (
           set /a "n=(n+1)%%5"
           if !n!==0 set www=www.baidu.com
           set echo=%%c.%%d.%%e.%%f.%%g.%%h.%%i.%%j.%%k.%%l.!www!^<br/^>
           rem Collapse the runs of periods left behind by empty tokens.
           for /l %%z in (1 1 10) do set echo=!echo:..=.!
           rem Restore the question marks that were prefixed with a period.
           echo !echo:.?=?!
           set www=
           set tmp=%%m
        )
     )
  ))>b.txt
  pause


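Author: Batcher    Time: 2011-5-24 00:08

20# gung


Could you first sum up why, with so many people offering so much code, your problem still isn't solved?
Author: gung    Time: 2011-5-24 22:15

The thing is, I don't understand code, haha

I tried the code from post 22 and found that br shows up very densely: sometimes after 9 sentences, sometimes after only 5
Author: CrLf    Time: 2011-5-24 22:23

24# gung


My skills are limited...
The OP has confused me. If each paragraph of the original contains a varying number of sentences, should all the paragraphs be merged and then re-split? Or should the original paragraphs be kept while also splitting in 10-sentence cycles? Or should the text be split in 10-sentence cycles, but with no cycle running past the end of its original paragraph?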
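
To make those three readings concrete, here is a rough Python sketch. The function names and the way each reading is interpreted are assumptions made for illustration, not anything confirmed in the thread; in particular the third function is only one possible reading of "no cycle running past its paragraph".

```python
import re

SENTENCE = re.compile(r"[^.?!]+[.?!]")

def mark(text, every=10, marker="<br/>", start=0):
    """Insert `marker` after every `every` sentences, counting from `start`.
    Returns (new_text, updated_count)."""
    count = start

    def repl(match):
        nonlocal count
        count += 1
        return match.group(0) + (marker if count % every == 0 else "")

    return SENTENCE.sub(repl, text), count

def merged(paragraphs, every=10):
    """Reading 1: join all paragraphs, then count across the whole text."""
    return mark("".join(paragraphs), every)[0]

def keep_paragraphs(paragraphs, every=10):
    """Reading 2: keep paragraph breaks, but let the count run across them."""
    out, count = [], 0
    for p in paragraphs:
        marked, count = mark(p, every, start=count)
        out.append(marked)
    return "\n".join(out)

def per_paragraph(paragraphs, every=10):
    """Reading 3: restart the count at the start of every paragraph."""
    return "\n".join(mark(p, every)[0] for p in paragraphs)
```

With short paragraphs the three functions place the markers differently, which is one way the br spacing in the output can look uneven.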
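Author: Batcher    Time: 2011-5-25 00:33

24# gung


It doesn't matter if you can't follow the code; you can test each of those scripts separately and then report detailed test results.
Author: gung    Time: 2011-5-26 22:23

24# gung


My skills are limited...
The OP has confused me. If each paragraph of the original contains a varying number of sentences, should all the paragraphs be merged and then re-split? Or should the original paragraphs be kept while also splitting in 10-sentence cycles? Or should the text be split in 10-sentence cycles, but with no cycle running past ...
zm900612 posted on 2011-5-24 22:23


One txt with no paragraph breaks: a br after every 10 sentences, and wwwbaiducom after every 50 sentences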
Author: CrLf    Time: 2011-5-26 22:53

  @echo off&setlocal enabledelayedexpansion
  rem Join every line of a.txt into one long line in the file tmp.
  (for /f "delims=" %%a in (a.txt) do set /p=%%a
  echo;)<nul>tmp
  rem Then split that single line into groups of 10 sentences, as before.
  (for /f "delims=" %%a in (tmp) do (
     set tmp=%%a
     set tmp=!tmp:?=.?!
     for /l %%b in (1 1 100) do (
        for /f "tokens=1-10* delims=." %%c in ("!tmp!") do (
           set /a "n=(n+1)%%5"
           if !n!==0 set www=www.baidu.com
           set echo=%%c.%%d.%%e.%%f.%%g.%%h.%%i.%%j.%%k.%%l.!www!^<br/^>
           rem Collapse the runs of periods left behind by empty tokens.
           for /l %%z in (1 1 10) do set echo=!echo:..=.!
           rem Restore the question marks that were prefixed with a period.
           echo !echo:.?=?!
           set www=
           set tmp=%%m
        )
     )
  ))>b.txt
  pause


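Author: batman    Time: 2011-5-26 23:06

All I can say is that moderator zm is dedicated and persistent, and the OP is obviously one of those, what do you call them...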




Welcome to 批处理之家 (http://bbs.bathome.net/)    Powered by Discuz! 7.2