OP |
Posted on 2017-7-23 22:08:47
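After using a regex to extract this region and match the chapter titles, how do I download the results with wget? This script is meant to download chapter 500 onward, but I can't get the wget step working. How do I add a step that downloads the links listed in index.html.txt?

@set @n=0;/* & echo off
setlocal enabledelayedexpansion
rem Remove any link list left over from a previous run.
del /a /f /q index.html.txt 2>nul
rem Fetch the chapter index page.
curl -o index.html http://www.yssm.org/uctxt/109/109767/
rem Re-encode the page from UTF-8 to GBK with the third-party wfr tool.
wfr index.html -any -encin:utf-8 -encout:gbk -force
rem Pass the file name to the JScript half of this file.
dir /b index.html|cscript -nologo -e:jscript "%~0"
pause & exit/b & rem */
var fso = new ActiveXObject("Scripting.FileSystemObject");
while (!WSH.StdIn.AtEndOfStream) {
    var f = WSH.StdIn.ReadLine();
    var txt = fso.OpenTextFile(f, 1).ReadAll();
    // Drop everything from the chapter 1 link up to the chapter 500 link (which is kept),
    // then turn the relative chapter hrefs into absolute URLs.
    txt = txt.replace(/<a [^>]*href="[^"]+\.html">第?(一|0*1)[章节][\s\S]*(<a [^>]*href="[^"]+\.html">第?五百章)/g, '$2')
        .replace(/(href=")([0-9]+\.html)/g, '$1http://www.yssm.org/uctxt/109/109767/$2');
    var s = "";
    // [^>]* and [^<]+ keep each match inside one <a> tag, so several links on the same line are all found.
    var re = /(<a [^>]*href=")([^"]+\.html)(">第?[^<]+章)/ig;
    var ar;
    // Extract the chapter links, one URL per line.
    while ((ar = re.exec(txt)) != null) {
        s += ar[2] + "\r\n";
    }
    // Write the list next to the input file, i.e. index.html.txt.
    fso.OpenTextFile(f + ".txt", 2, true).Write(s);
}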
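To check the two regex steps without running the whole batch file, the extraction can be sketched on its own. This is a rough sketch in plain JavaScript (runnable with Node.js, not under cscript), assuming the index page uses simple `<a href="NNN.html">` links; the function name `extractChapterLinks` and the sample markup are made up for illustration and are not taken from the real page.

```javascript
// Hypothetical helper that mirrors the two regex steps of the script above.
function extractChapterLinks(html, baseUrl) {
  // Step 1: cut everything from the chapter 1 link up to the chapter 500 link,
  // keeping the chapter 500 link itself (capture group 2).
  var trimmed = html.replace(
    /<a [^>]*href="[^"]+\.html">第?(一|0*1)[章节][\s\S]*(<a [^>]*href="[^"]+\.html">第?五百章)/,
    "$2"
  );
  // Step 2: collect the href of every remaining chapter link.
  var re = /<a [^>]*href="([^"]+\.html)">第?[^<]+章/gi;
  var links = [];
  var m;
  while ((m = re.exec(trimmed)) !== null) {
    // Only bare numeric names such as 500.html are treated as relative paths.
    links.push(/^[0-9]+\.html$/.test(m[1]) ? baseUrl + m[1] : m[1]);
  }
  return links;
}

// Made-up sample markup (assumption), with several links on one line.
var sample =
  '<a href="1.html">第一章 开始</a><a href="2.html">第二章</a>' +
  '<a href="500.html">第五百章</a><a href="501.html">第五百零一章</a>';
var links = extractChapterLinks(sample, "http://www.yssm.org/uctxt/109/109767/");
console.log(links.join("\r\n"));
```

Once the script has written index.html.txt with one URL per line, a line such as `wget -i index.html.txt` placed in the batch half after the cscript line should fetch every listed chapter, since `-i` tells wget to read its URLs from a file. This assumes wget is installed and on the PATH.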