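After extracting this region with a regex and matching the chapter titles, how do I download the chapters with wget? I only want chapter 500 onward, and I can't get wget to work. How do I add a step to the script so that the links in index.html.txt get downloaded?

```bat
@set @n=0;/* & echo off
setlocal enabledelayedexpansion

rem Start from a clean link list
del /a /f /q index.html.txt 2>nul
rem Fetch the chapter index page
curl -o index.html http://www.yssm.org/uctxt/109/109767/
rem wfr (third-party tool): re-encode the page from UTF-8 to GBK so the JScript
rem part below, which reads files in the ANSI code page, can match the Chinese text
wfr index.html -any -encin:utf-8 -encout:gbk -force

rem Pass the file name to the JScript part of this same file
dir /b index.html|cscript -nologo -e:jscript "%~0"
pause & exit/b & rem */
var fso = new ActiveXObject("Scripting.FileSystemObject");
while (!WSH.StdIn.AtEndOfStream) {
    var f = WSH.StdIn.ReadLine();
    var inp = fso.OpenTextFile(f, 1);          // 1 = ForReading
    var txt = inp.ReadAll();
    inp.Close();
    // Drop everything from the chapter 1 link up to (not including) the chapter 500
    // link, then turn relative "123.html" hrefs into absolute URLs.
    // [^>]* keeps each match inside a single <a ...> tag.
    txt = txt.replace(/<a [^>]*href="[^"]+\.html">第?(一|0*1)[章节][\s\S]*(<a [^>]*href="[^"]+\.html">第?五百章)/g, '$2')
             .replace(/(href=")([0-9]+\.html)/g, '$1http://www.yssm.org/uctxt/109/109767/$2');
    var s = "";
    // Extract one URL per chapter link. [^>]* and [^<]* stop at tag boundaries,
    // so several links on the same line are all captured, not just the last one.
    var re = /(<a [^>]*href=")([^"]+\.html)(">第?[^<]*章)/ig;
    var ar;
    while ((ar = re.exec(txt)) != null) {
        s += ar[2] + "\r\n";
    }
    var out = fso.OpenTextFile(f + ".txt", 2, true);  // 2 = ForWriting, create if missing
    out.Write(s);
    out.Close();
}
```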
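A minimal sketch of the extraction step, assuming the chapter links look like `<a href="123.html">第…章 …</a>` as the regexes above expect. Instead of deleting the first 499 chapters from the page, the hypothetical helper `chapterUrlsFrom` starts collecting at the first link whose text begins with `第五百章` and skips duplicate links. It uses only ES3 syntax (`var`, no arrow functions), so it should run both in the WSH/JScript half of the script and under Node.js. Once index.html.txt holds one URL per line, wget can read the list directly with its `-i` option, e.g. by adding `wget -i index.html.txt` to the batch part after the `cscript` line and before `pause`.

```javascript
// Hypothetical helper (not part of the original script): return chapter URLs
// starting from the first link whose text begins with startTitle (e.g. "第五百章").
// ES3 syntax only, so the same function can be pasted into the WSH/JScript part.
function chapterUrlsFrom(html, baseUrl, startTitle) {
  // [^>]* and [^<]* cannot run past one tag, so several <a> tags on the same
  // line are matched one by one instead of being swallowed by a greedy .*
  var re = /<a\s[^>]*href="([^"]+\.html)"[^>]*>\s*([^<]*)<\/a>/gi;
  var urls = [];
  var seen = {};          // skip links listed twice (e.g. a "latest chapters" box)
  var started = false;    // becomes true once the start chapter is reached
  var m, href, title;
  while ((m = re.exec(html)) !== null) {
    href = m[1];
    title = m[2];
    if (title.indexOf("章") === -1) continue;            // not a chapter link
    if (!started && title.indexOf(startTitle) === 0) started = true;
    if (!started) continue;                               // still before chapter 500
    // Assumption: non-absolute hrefs are relative like "123.html"
    if (!/^https?:/i.test(href)) href = baseUrl + href;
    if (!seen.hasOwnProperty(href)) {
      seen[href] = true;
      urls.push(href);
    }
  }
  return urls;
}
```

In the JScript loop, the regex extraction could then become `s = chapterUrlsFrom(txt, "http://www.yssm.org/uctxt/109/109767/", "第五百章").join("\r\n") + "\r\n";`. Joining relative hrefs onto the index URL by plain concatenation is an assumption that holds for `123.html`-style links but not for root-relative ones such as `/uctxt/...`.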