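Title: [Text Processing] [Solved] How can a batch script extract strings from this text?
Author: batbat001 Time: 2019-10-24 17:08 Title: [Solved] How can a batch script extract strings from this text?
I have several text files, each containing a single line in the format shown below. Could anyone tell me whether a bat script can extract the 8-digit number that follows "pdfnum" and discard everything else? The number of such values differs from file to file, but every file contains at least one.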
{"indexCadalInfoList":[{"id":"539168","title":"古史研究","title_old":"古史研究·第二集·上册","title_keyword":"古史研究·第二集·上册","title_standard":"古史研究·第二集·上册","title_handle":"古史研究·第二集·上册","pdfnum":"06342858","pdfnum","status":"1","tag_library":"古史;研究;上册;三十年代;**二十六年;**;专著","borrow_times":null,"booklist_id":null,"catalogue":null,"collect_num":"0","click_num":"0","comment_num":"0","creator":"卫聚贤(编)","creator_old":"卫聚贤(编)","creator_keyword":"卫聚贤(编)","creator_stop":"卫聚贤(编)","subject":null,"subject_old":null,"description":"本书出版者不详。","description_standard":"本书出版者不详。","description_old":"本书出版者不详。","publisher":null,"publisher_old":null,"date":"1937","date_standard":"1937","date_old":"1937-04(**二十六年)","title_standard":"古史研究·第二集·下册","title_handle":"古史研究·第二集·下册","pdfnum":"06342859","status":"1","tag_library":"古史;研究;下册;三十年代;**二十三年;**;专著","borrow_times":null,"booklist_id":null,"catalogue":null,"collect_num":"0","click_num":"0","comment_num":"0","creator":"卫聚贤(编)","creator_old":"卫聚贤(编)","creator_keyword":"卫聚贤(编)","creator_stop":"卫聚贤(编)","subject":"史评-中国-古代-文集","subject_old":null,"description":"本书出版者不详。","description_standard":"本书出版者不详。",
Author: zaqmlp Time: 2019-10-24 19:46
@echo off
rem Enlarge the console buffer so that long result lists stay scrollable.
mode con lines=3000
set info=Mutual help and mutual benefit. Scan the Alipay code on my avatar to tip. Thanks
rem If you run into problems, contact QQ956535081.
title %info%
rem Work in the folder that holds this script; the .txt files are expected there.
cd /d "%~dp0"
rem For each .txt file: print every "pdfnum" value, then save all values to #result.log.
powershell -NoProfile -ExecutionPolicy bypass ^
$enc=[Text.Encoding]::UTF8;[System.Collections.ArrayList]$s=@();^
$files=@(dir^|?{('.txt' -eq $_.Extension) -and ($_ -is [System.IO.FileInfo])});^
for($i=0;$i -lt $files.length;$i++){^
write-host $files[$i].Name;^
write-host ('-'*20);^
$text=[IO.File]::ReadAllText($files[$i].FullName,$enc);^
$m=[regex]::matches($text,'\"pdfnum\":\"(\d+)\"');^
foreach($j in $m){^
write-host $j.Groups[1].value;^
[void]$s.add($j.Groups[1].value);^
};^
};^
[IO.File]::WriteAllLines('#result.log', $s, [Text.Encoding]::Default);
:end
echo;%info%
pause
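For readers who find the one-line PowerShell hard to follow, the same steps can be sketched in Python. This is only an illustration, not the script posted above: the function names `extract_pdfnums` and `collect_from_folder` are made up for this sketch, the files are assumed to be UTF-8 as in the original, and the result is written as UTF-8 rather than in the system default encoding (for plain digits the bytes are the same).

```python
import re
from pathlib import Path

# Same pattern as the PowerShell script: capture the digits of each "pdfnum":"..." pair.
PDFNUM = re.compile(r'"pdfnum":"(\d+)"')


def extract_pdfnums(text):
    """Return every pdfnum value found in text, in order of appearance."""
    return PDFNUM.findall(text)


def collect_from_folder(folder, result_name="#result.log"):
    """Scan each .txt file in folder and write all values to result_name, one per line."""
    folder = Path(folder)
    values = []
    for path in sorted(folder.glob("*.txt")):
        found = extract_pdfnums(path.read_text(encoding="utf-8"))
        # Print the file name, a separator, then the values, like the original script.
        print(path.name, "-" * 20, *found, sep="\n")
        values.extend(found)
    (folder / result_name).write_text(
        "".join(value + "\n" for value in values), encoding="utf-8"
    )
    return values
```

Calling `collect_from_folder(".")` from the folder that holds the text files should leave a `#result.log` next to them. A stray `"pdfnum",` with no value, as in the sample line, is skipped because the pattern requires `":"` right after the key.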
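Author: batbat001 Time: 2019-10-24 19:52
Reply to 2# zaqmlp
I can't follow this at all. Do I just copy it, save it as a .bat file, and run it?
Author: batbat001 Time: 2019-10-24 19:57
Last edited by batbat001 on 2019-10-24 20:03
Reply to 2# zaqmlp
It worked!!! Thank you so much!!!
How do I tip? Do I just scan the avatar? Is there a WeChat option?
Author: zaqmlp Time: 2019-10-24 20:10
Reply to 4# batbat001
Yes, scan the avatar.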
Author: Batcher Time: 2019-10-24 21:40
Reply to 1# batbat001
grep -Po "\"pdfnum\":\"[0-9]{8}\"" 1.txt | more > 2.txt
I recommend downloading a grep build and giving it a try:
http://bcn.bathome.net/s/tool/index.html?key=grep
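One detail worth knowing about the grep command: `-o` prints the whole matched text, so each output line has the form `"pdfnum":"06342858"` rather than the bare digits. The Python sketch below illustrates the difference on a shortened copy of the sample line, and shows one way to keep only the digits by moving the surrounding context into a lookbehind and a lookahead. It is an illustration of the regex behaviour, not a replacement for the command above.

```python
import re

line = '"title_handle":"古史研究·第二集·上册","pdfnum":"06342858","pdfnum","status":"1"'

# The pattern from the grep command: the whole match includes the key and the quotes.
whole = re.findall(r'"pdfnum":"[0-9]{8}"', line)

# Lookbehind and lookahead require the same context but leave it out of the match,
# so only the eight digits are returned.
digits_only = re.findall(r'(?<="pdfnum":")[0-9]{8}(?=")', line)

print(whole)        # ['"pdfnum":"06342858"']
print(digits_only)  # ['06342858']
```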
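Author: batbat001 Time: 2019-10-24 22:11
Reply to 6# Batcher
That really is impressive. One more question: if I want to extract the content after "title" instead, how should the command be changed?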
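The thread does not record an answer to this follow-up. One possible adaptation, sketched in Python under stated assumptions: the helper name `extract_values` is hypothetical, and values are assumed never to contain an escaped quote (`\"`), which holds for the sample line. Anchoring on the opening quote before the key and the `":` after it keeps neighbouring keys such as `title_old` from matching.

```python
import re


def extract_values(text, key):
    """Return the quoted string values stored under exactly `key`."""
    # re.escape protects the key; the quote before it and the `":` right after it
    # mean that "title" does not also match "title_old" or "title_keyword".
    pattern = re.compile(r'"' + re.escape(key) + r'":"([^"]*)"')
    return pattern.findall(text)


line = '{"id":"539168","title":"古史研究","title_old":"古史研究·第二集·上册","pdfnum":"06342858","subject":null,'

print(extract_values(line, "title"))      # ['古史研究']
print(extract_values(line, "title_old"))  # ['古史研究·第二集·上册']
print(extract_values(line, "subject"))    # [] because null is not a quoted string
```

A regular expression is used rather than a JSON parser because the sample line is truncated and contains a stray `"pdfnum",` entry, so it is not valid JSON.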
作者: sxw 时间: 2019-11-6 17:50
Using the Raku Programming Language:
my $line = '{"indexCadalInfoList":[{"id":"539168","title":"古史研究","title_old":"古史研究·第二集·上册","title_keyword":"古史研究·第二集·上册","title_standard":"古史研究·第二集·上册","title_handle":"古史研究·第二集·上册","pdfnum":"06342858","pdfnum","status":"1","tag_library":"古史;研究;上册;三十年代;**二十六年;**;专著","borrow_times":null,"booklist_id":null,"catalogue":null,"collect_num":"0","click_num":"0","comment_num":"0","creator":"卫聚贤(编)","creator_old":"卫聚贤(编)","creator_keyword":"卫聚贤(编)","creator_stop":"卫聚贤(编)","subject":null,"subject_old":null,"description":"本书出版者不详。","description_standard":"本书出版者不详。","description_old":"本书出版者不详。","publisher":null,"publisher_old":null,"date":"1937","date_standard":"1937","date_old":"1937-04(**二十六年)","title_standard":"古史研究·第二集·下册","title_handle":"古史研究·第二集·下册","pdfnum":"06342859","status":"1","tag_library":"古史;研究;下册;三十年代;**二十三年;**;专著","borrow_times":null,"booklist_id":null,"catalogue":null,"collect_num":"0","click_num":"0","comment_num":"0","creator":"卫聚贤(编)","creator_old":"卫聚贤(编)","creator_keyword":"卫聚贤(编)","creator_stop":"卫聚贤(编)","subject":"史评-中国-古代-文集","subject_old":null,"description":"本书出版者不详。","description_standard":"本书出版者不详。",';
.say for $line.comb(/pdfnum'":"'<( \w+ )>'"'/);
Welcome to 批处理之家 (http://bbs.bathome.net/)