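[Text Processing] [Solved] Help needed: batch script to extract the values that follow specific keys in a JSON file
Background:
The file is JSON. Opened in Notepad it looks like the following (it seems to have no line breaks; the whole text is a single line/paragraph):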
[{"id":"e2e2ed190348ed30802f6f08f4f5c1b9","title":"中国证券监督管理委员会人才引进","enabled":1,"is_cloud_task":0,"url":"http://www.csrc.gov.cn/csrc/c100082/common_list.shtml?channelid=59b9842485ba41adad957f55cb90be52","icon":"http://www.csrc.gov.cn/favicon.ico","page":"","path":"div:nth-of-type(4)","ignore_path":"","click_path":"","data_path":"","scroll_down":"0","type":"dom","code":"200","json_query":"","json_header":"","json_data":"","json_data_format":"form","rss_field":"title","star":0,"ua":"","puppeteer_code":"","browser_code":"","shell_type":"javascript","shell_cookie_name":"COOKIE","shell_code":"","ai_prompt_on":"off","ai_prompt_user":"请提取返回值中的价格,并以数字形式返回。注意只返回数字,不要返回多余内容。返回值如下:{{value}}","replace_from_regex":"","replace_to_regex":"","tab_activity":"background","interval":"720","delay":"0","retry":10,"cron":"* * * * *".........(omitted; this part does not repeat the content above)"last_time":1715225597400,"retry_times":0},{"id":"b96d4b076418874b2291e99362e86bf9","title":"重要通知-商务部外贸发展局","enabled":1,"is_cloud_task":0,"url":"https://www.tdb.org.cn/zytz/index.jhtml","icon":"https://www.tdb.org.cn/r/cms/www/default/img/favicon.ico","page":"","path":"div > div:nth-of-type(2) > div:nth-of-type(2) > div:nth-of-type(2) > div > ul","ignore_path":"","click_path":"","data_path":"","scroll_down":"0","type":"dom","code":"200","json_query":"","json_header":"","json_data":"","json_data_format":"form","rss_field":"title","star":0,"ua":"","puppeteer_code":"","browser_code":"","shell_type":"javascript","shell_cookie_name":"COOKIE","shell_code":"","ai_prompt_on":"off","ai_prompt_user":"请提取返回值中的价格,并以数字形式返回。注意只返回数字,不要返回多余内容。返回值如下:{{value}}","replace_from_regex":"","replace_to_regex".........(omitted; this part does not repeat the content above)
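Requirements:
1. Extract the text inside the double quotes that follow ,"title": (without the double quotes).
2. Extract the address inside the double quotes that follow ,"url": (without the double quotes).
3. Separate these two extracted values with a single space.
4. After each title and its URL, start a new line for the next entry.
5. Prefix each line with a three-digit line number starting at 001 (zero-padded when shorter than three digits), followed by a single space.
With the rules above, the extracted text would be: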
001 中国证券监督管理委员会人才引进 http://www.csrc.gov.cn/csrc/c100082/common_list.shtml?channelid=59b9842485ba41adad957f55cb90be52
002 重要通知-商务部外贸发展局 https://www.tdb.org.cn/zytz/index.jhtml
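6. Finally, write the result to a TXT file in the same folder as the source file, named after the original file with the four characters “内容提取” ("content extraction") appended. For example, if the original file is “ABC.JSON”, the output is “ABC内容提取.txt”.
7. Ideally, dragging the JSON file onto the BAT file should be enough to run the extraction.
Could someone please help with this?
Thanks!
Cloud drive download:
Link: https://pan.baidu.com/s/1BJ-6N2KLgoPIXCDMhU-HHA?pwd=4188
Extraction code: 4188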
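The requirements above can be sketched as follows. This is a minimal sketch in Python rather than pure batch, and it parses the file as JSON instead of searching for the text after ,"title": and ,"url":. It rests on several assumptions: the complete file (unlike the abridged excerpt above) is valid JSON whose top level is an array of objects, the file is UTF-8 encoded, and entries lacking a key should yield an empty string. The function names `format_entries` and `extract_file` are hypothetical, chosen only for illustration.

```python
import json
import sys
from pathlib import Path


def format_entries(entries):
    """Return one 'NNN title url' line per entry, numbered from 001."""
    lines = []
    for number, entry in enumerate(entries, start=1):
        # Assumption: a missing key becomes an empty string;
        # the post does not say what should happen in that case.
        title = entry.get("title", "")
        url = entry.get("url", "")
        # :03d zero-pads the line number to three digits.
        lines.append(f"{number:03d} {title} {url}")
    return lines


def extract_file(json_path):
    """Write '<name>内容提取.txt' next to json_path and return the new path."""
    source = Path(json_path)
    # Assumption: UTF-8 (with or without a BOM) holding a JSON array of objects.
    with source.open(encoding="utf-8-sig") as handle:
        entries = json.load(handle)
    # "ABC.JSON" -> "ABC内容提取.txt" in the same folder.
    target = source.with_name(source.stem + "内容提取.txt")
    target.write_text("\n".join(format_entries(entries)) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # Every file path passed on the command line is processed in turn.
    for argument in sys.argv[1:]:
        print(extract_file(argument))
```

For the drag-and-drop request in item 7, a one-line BAT file that forwards its arguments (`%*`) to this script would probably be enough, assuming Python is installed and on the PATH. Parsing the JSON, rather than matching raw text, also avoids picking up the "title" that appears as the value of "rss_field" in the sample.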