
Title: [Text Processing] [Solved] How can a batch script extract strings from this text?

Author: batbat001    Time: 2019-10-24 17:08     Title: [Solved] How can a batch script extract strings from this text?

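I have several text files, each containing a single line in the format below. Could any of you experts tell me whether a .bat script can extract the 8-digit number that follows "pdfnum" and discard everything else? The number of such values differs from file to file, but every file has at least one.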

{"indexCadalInfoList":[{"id":"539168","title":"古史研究","title_old":"古史研究·第二集·上册","title_keyword":"古史研究·第二集·上册","title_standard":"古史研究·第二集·上册","title_handle":"古史研究·第二集·上册","pdfnum":"06342858","pdfnum","status":"1","tag_library":"古史;研究;上册;三十年代;**二十六年;**;专著","borrow_times":null,"booklist_id":null,"catalogue":null,"collect_num":"0","click_num":"0","comment_num":"0","creator":"卫聚贤(编)","creator_old":"卫聚贤(编)","creator_keyword":"卫聚贤(编)","creator_stop":"卫聚贤(编)","subject":null,"subject_old":null,"description":"本书出版者不详。","description_standard":"本书出版者不详。","description_old":"本书出版者不详。","publisher":null,"publisher_old":null,"date":"1937","date_standard":"1937","date_old":"1937-04(**二十六年)","title_standard":"古史研究·第二集·下册","title_handle":"古史研究·第二集·下册","pdfnum":"06342859","status":"1","tag_library":"古史;研究;下册;三十年代;**二十三年;**;专著","borrow_times":null,"booklist_id":null,"catalogue":null,"collect_num":"0","click_num":"0","comment_num":"0","creator":"卫聚贤(编)","creator_old":"卫聚贤(编)","creator_keyword":"卫聚贤(编)","creator_stop":"卫聚贤(编)","subject":"史评-中国-古代-文集","subject_old":null,"description":"本书出版者不详。","description_standard":"本书出版者不详。",
Author: zaqmlp    Time: 2019-10-24 19:46

```bat
@echo off
rem Enlarge the console buffer so long output does not scroll away
mode con lines=3000
set info=Mutual help and mutual benefit, scan my avatar with Alipay to tip, thanks
rem Questions? Add QQ956535081 to get in touch quickly
title %info%
rem Work in the folder that contains this .bat file
cd /d "%~dp0"
rem For every .txt file here: read it as UTF-8, print each "pdfnum" value,
rem and collect all values into #result.log (written in the system ANSI code page)
powershell -NoProfile -ExecutionPolicy bypass ^
    $enc=[Text.Encoding]::UTF8;[System.Collections.ArrayList]$s=@();^
    $files=@(dir^|?{('.txt' -eq $_.Extension) -and ($_ -is [System.IO.FileInfo])});^
    for($i=0;$i -lt $files.length;$i++){^
        write-host $files[$i].Name;^
        write-host ('-'*20);^
        $text=[IO.File]::ReadAllText($files[$i].FullName,$enc);^
        $m=[regex]::matches($text,'\"pdfnum\":\"(\d+)\"');^
        foreach($j in $m){^
            write-host $j.Groups[1].value;^
            [void]$s.add($j.Groups[1].value);^
        };^
    };^
    [IO.File]::WriteAllLines('#result.log', $s, [Text.Encoding]::Default);
:end
echo;%info%
pause
```
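For readers who would rather not embed PowerShell in a .bat file, here is a rough Python port of the script above. It is a sketch under stated assumptions, not the poster's method: it needs Python 3.9+, assumes the .txt files are UTF-8 (as the PowerShell version does), and writes `#result.log` as UTF-8 instead of the system ANSI code page. The function names `extract_pdfnums` and `main` are hypothetical.

```python
import re
from pathlib import Path

# Same pattern as the PowerShell answer: "pdfnum":"<digits>", capturing only the digits.
# A bare "pdfnum" with no value (as in the sample data) has no colon, so it is skipped.
PDFNUM_RE = re.compile(r'"pdfnum":"(\d+)"')


def extract_pdfnums(text: str) -> list[str]:
    """Return every pdfnum value in the text, in the order they appear."""
    return PDFNUM_RE.findall(text)


def main(folder: str = ".") -> None:
    """Print each .txt file's pdfnum values and collect them into #result.log."""
    collected: list[str] = []
    # Mirror the batch script: regular files in the folder with a .txt extension
    for path in sorted(Path(folder).glob("*.txt")):
        if not path.is_file():
            continue
        print(path.name)
        print("-" * 20)
        for num in extract_pdfnums(path.read_text(encoding="utf-8")):
            print(num)
            collected.append(num)
    # One number per line; the file name is copied from the batch script
    out = Path(folder, "#result.log")
    out.write_text("".join(num + "\n" for num in collected), encoding="utf-8")
```

Usage: save it next to the .txt files and call `main()` (or `main(r"D:\some\folder")`). Keeping the regex in its own function makes it easy to reuse for a single string.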

Author: batbat001    Time: 2019-10-24 19:52

Reply to #2 zaqmlp


I can't follow this, master. Do I just copy it, save it as a .bat file, and run it?
Author: batbat001    Time: 2019-10-24 19:57

Last edited by batbat001 at 2019-10-24 20:03

Reply to #2 zaqmlp
It worked!!! Thank you so much!!!
How do I tip you? Just scan your avatar? Is WeChat an option too?
Author: zaqmlp    Time: 2019-10-24 20:10

Reply to #4 batbat001
Yes, scan the avatar.
Author: Batcher    Time: 2019-10-24 21:40

Reply to #1 batbat001
```bat
grep -Po "\"pdfnum\":\"\K[0-9]{8}(?=\")" 1.txt | more > 2.txt
```
I suggest downloading a grep build and giving it a try:
http://bcn.bathome.net/s/tool/index.html?key=grep
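`grep -o` prints the entire match, so anything that should not appear in the output (here the `"pdfnum":"` prefix and the closing quote) has to be kept out of the match, for example with PCRE's `\K` or with lookarounds. Python's `re` has no `\K`, but a fixed-width lookbehind does the same job. The sketch below (Python, using a shortened excerpt of the question's data) contrasts the two forms:

```python
import re

# Shortened excerpt of the question's data, including the stray "pdfnum" with no value.
sample = ('"title_handle":"古史研究·第二集·上册","pdfnum":"06342858","pdfnum",'
          '"status":"1","title_handle":"古史研究·第二集·下册","pdfnum":"06342859","status":"1"')

# Whole-match pattern, like grep -Po without \K: the key is part of each result.
whole = re.findall(r'"pdfnum":"[0-9]{8}"', sample)

# Lookbehind and lookahead must match around the digits but are zero-width,
# so they are not included in the returned text.
digits = re.findall(r'(?<="pdfnum":")[0-9]{8}(?=")', sample)

print(whole)   # ['"pdfnum":"06342858"', '"pdfnum":"06342859"']
print(digits)  # ['06342858', '06342859']
```

`[0-9]{8}` is kept from the grep command because it enforces exactly eight ASCII digits; in Python 3, `\d` on a `str` would also accept other Unicode digits.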
Author: batbat001    Time: 2019-10-24 22:11

Reply to #6 Batcher


You really are an expert. One more question: if I want to extract what comes after "title" instead, how should I change it?
Author: sxw    Time: 2019-11-6 17:50
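One way to generalize the regex idea to another key is sketched below in Python 3.9+. The helper name `extract_field` is hypothetical, and the sketch assumes values never contain escaped quotes (`\"`), which holds for the sample data; unquoted values such as `null` are simply not matched. The key point is to include the closing quote after the key, so asking for `title` does not also return `title_old` or `title_standard`.

```python
import re


def extract_field(text: str, key: str) -> list[str]:
    """Return every quoted string value stored under `key` (hypothetical helper).

    The pattern requires '":"' right after the key name, so "title" does not
    also match "title_old", "title_standard", and so on.
    """
    pattern = re.compile('"' + re.escape(key) + r'":"([^"]*)"')
    return pattern.findall(text)


# Shortened excerpt of the question's data.
sample = ('{"indexCadalInfoList":[{"id":"539168","title":"古史研究",'
          '"title_old":"古史研究·第二集·上册","title_standard":"古史研究·第二集·上册",'
          '"pdfnum":"06342858","title_standard":"古史研究·第二集·下册"')

print(extract_field(sample, "title"))           # ['古史研究']
print(extract_field(sample, "title_standard"))  # ['古史研究·第二集·上册', '古史研究·第二集·下册']
```

The same helper also covers the original request: `extract_field(sample, "pdfnum")` returns `['06342858']`.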

Using the Raku programming language:
```raku
my $line = '{"indexCadalInfoList":[{"id":"539168","title":"古史研究","title_old":"古史研究·第二集·上册","title_keyword":"古史研究·第二集·上册","title_standard":"古史研究·第二集·上册","title_handle":"古史研究·第二集·上册","pdfnum":"06342858","pdfnum","status":"1","tag_library":"古史;研究;上册;三十年代;**二十六年;**;专著","borrow_times":null,"booklist_id":null,"catalogue":null,"collect_num":"0","click_num":"0","comment_num":"0","creator":"卫聚贤(编)","creator_old":"卫聚贤(编)","creator_keyword":"卫聚贤(编)","creator_stop":"卫聚贤(编)","subject":null,"subject_old":null,"description":"本书出版者不详。","description_standard":"本书出版者不详。","description_old":"本书出版者不详。","publisher":null,"publisher_old":null,"date":"1937","date_standard":"1937","date_old":"1937-04(**二十六年)","title_standard":"古史研究·第二集·下册","title_handle":"古史研究·第二集·下册","pdfnum":"06342859","status":"1","tag_library":"古史;研究;下册;三十年代;**二十三年;**;专著","borrow_times":null,"booklist_id":null,"catalogue":null,"collect_num":"0","click_num":"0","comment_num":"0","creator":"卫聚贤(编)","creator_old":"卫聚贤(编)","creator_keyword":"卫聚贤(编)","creator_stop":"卫聚贤(编)","subject":"史评-中国-古代-文集","subject_old":null,"description":"本书出版者不详。","description_standard":"本书出版者不详。",';
# comb returns every match; <( and )> trim each reported match to the \w+ part between the quotes
.say for $line.comb(/pdfnum'":"'<( \w+ )>'"'/);
```





Welcome to 批处理之家 (Batch Home, http://bbs.bathome.net/). Powered by Discuz! 7.2