
[Help] How can I fix a garbled Content-Disposition response header?

    Set objWinHttp = CreateObject("WinHttp.WinHttpRequest.5.1")
    objWinHttp.Open "HEAD", "https://www.nyaa.se/?page=download&tid=613616"
    objWinHttp.Send
    MsgBox objWinHttp.GetResponseHeader("Content-Disposition")

The code above displays:
inline; filename="娴疯醇鐜?65.rar.torrent"

In a batch file I could transcode this with iconv, but how do I solve it in VBS?

You can transcode it with ADODB.Stream.

    Const adTypeBinary = 1
    Const adTypeText = 2

    ' Accept a string and convert it to a byte array in the selected charset
    Function StringToBytes(Str, Charset)
      Dim Stream
      Set Stream = CreateObject("ADODB.Stream")
      Stream.Type = adTypeText
      Stream.Charset = Charset
      Stream.Open
      Stream.WriteText Str
      Stream.Flush
      Stream.Position = 0
      ' Rewind the stream and read bytes
      Stream.Type = adTypeBinary
      StringToBytes = Stream.Read
      Stream.Close
      Set Stream = Nothing
    End Function

    ' Accept a byte array and convert it to a string using the selected charset
    Function BytesToString(Bytes, Charset)
      Dim Stream
      Set Stream = CreateObject("ADODB.Stream")
      Stream.Charset = Charset
      Stream.Type = adTypeBinary
      Stream.Open
      Stream.Write Bytes
      Stream.Flush
      Stream.Position = 0
      ' Rewind the stream and read text
      Stream.Type = adTypeText
      BytesToString = Stream.ReadText
      Stream.Close
      Set Stream = Nothing
    End Function

    ' This will alter the charset of a string from a 1-byte charset (such as windows-1252)
    ' to another 1-byte charset (such as windows-1251)
    Function AlterCharset(Str, FromCharset, ToCharset)
      Dim Bytes
      Bytes = StringToBytes(Str, FromCharset)
      ' HEXS = ""
      ' For i = 1 To LenB(Bytes)
      '   HEXS = HEXS & Hex(AscB(MidB(Bytes, i, 1))) & ","
      ' Next
      ' MsgBox HEXS
      AlterCharset = BytesToString(Bytes, ToCharset)
    End Function

    Set objWinHttp = CreateObject("WinHttp.WinHttpRequest.5.1")
    objWinHttp.Open "HEAD", "https://www.nyaa.se/?page=download&tid=613616"
    objWinHttp.Send
    MsgBox objWinHttp.GetResponseHeader("Content-Disposition")
    ' MsgBox LenB(objWinHttp.GetResponseHeader("Content-Disposition"))
    MsgBox AlterCharset(objWinHttp.GetResponseHeader("Content-Disposition"), "GB2312", "utf-8")

Last edited by tmplinshi on 2016-11-29 10:17

Thank you both!

The result displayed is "海贼�?65.rar.torrent", but the correct one is "海贼王765.rar.torrent".
I wonder whether the string returned by objWinHttp.GetResponseHeader("Content-Disposition") has already lost data by itself.

Last edited by 523066680 on 2016-11-29 11:08

    use Encode;
    use LWP::Simple;
    # Download the torrent and pull the file name out of its contents
    my $all = get("https://www.nyaa.se/?page=download&tid=613616");
    $all =~ /name\d+:(.*?rar)/i;
    # The name is UTF-8; re-encode it as GBK for output
    print encode('gbk', decode('utf8', $1));

Output:
海贼王765.rar

A revised version that reads the header instead:

    use Encode;
    use LWP::Simple;
    my $h = head("https://www.nyaa.se/?page=download&tid=613616");
    print encode('gbk', decode('utf8', $h->{'_headers'}->{'content-disposition'}));

Output:
inline; filename="海贼王765.rar.torrent"
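
The Perl version appears to work because the header value reaches decode('utf8', ...) as untouched bytes. As a rough illustration of that idea (not the poster's method), here is a small Python sketch. RAW_HEADER is copied from the byte escapes in the Data::Dump output of this thread; the function names are made up for the example, and it assumes the server sends the file name as UTF-8. It also shows that a Latin-1 decode is reversible, since Latin-1 maps every byte value to exactly one character.

```python
# Header bytes as they appear in the Data::Dump output (UTF-8 encoded file name).
RAW_HEADER = b'inline; filename="\xe6\xb5\xb7\xe8\xb4\xbc\xe7\x8e\x8b765.rar.torrent"'


def decode_header_bytes(raw: bytes) -> str:
    """Decode raw header bytes as UTF-8 (assumption: the server sends UTF-8)."""
    return raw.decode("utf-8")


def recover_from_latin1(text: str) -> str:
    """Undo a Latin-1 decode, which loses nothing, then decode the bytes as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    print(decode_header_bytes(RAW_HEADER))
    # A client that decoded the header as Latin-1 can still be repaired afterwards.
    print(recover_from_latin1(RAW_HEADER.decode("latin-1")))
```

The point of the sketch is only that a lossless byte-to-character mapping can be undone, while a decode that substitutes "?" cannot.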

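Reply to 5# 523066680

    Thanks a lot. Your method seems to read the file name from the file's contents. If the link pointed to something other than a torrent file, such as an exe, it would not work.

Last edited by 523066680 on 2016-11-29 11:31

Reply to 6# tmplinshi

    I have added a revised version.

The Perl documentation does not explain where the _headers and content-disposition keys come from, but you can print the whole data structure with Data::Dump:

    use LWP::Simple;
    use Data::Dump qw(dump);

    my $h = head("https://www.nyaa.se/?page=download&tid=613616");
    print dump $h;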

do {
  my $a = bless({
    _content => "",
    _headers => bless({
      "cf-ray" => "3092ded3aec707eb-LAX",
      "client-date" => "Tue, 29 Nov 2016 03:11:05 GMT",
      "client-peer" => "104.20.74.106:443",
      "client-response-num" => 1,
      "client-ssl-cert-issuer" => "/C=GB/ST=Greater Manchester/L=Salford/O=COMODO CA Limited/CN=COMODO ECC Domain Validation Secure Server CA 2",
      "client-ssl-cert-subject" => "/OU=Domain Control Validated/OU=PositiveSSL Multi-Domain/CN=ssl366349.cloudflaressl.com",
      "client-ssl-cipher" => "ECDHE-ECDSA-AES128-GCM-SHA256",
      "client-ssl-socket-class" => "IO::Socket::SSL",
      "connection" => "close",
      "content-disposition" => "inline; filename=\"\xE6\xB5\xB7\xE8\xB4\xBC\xE7\x8E\x8B765.rar.torrent\"",
      "content-type" => "application/x-bittorrent",
      "date" => "Tue, 29 Nov 2016 03:11:07 GMT",
      "last-modified" => "Thu, 23 Oct 2014 12:11:17 GMT",
      "server" => "cloudflare-nginx",
      "set-cookie" => "__cfduid=d41adfbdcefc8d9c55b9a6c24451c6fb61480389066; expires=Wed, 29-Nov-17 03:11:06 GMT; path=/; domain=.nyaa.se; HttpOnly",
      "vary" => "Accept-Encoding",
    }, "HTTP::Headers"),
    _msg => "OK",
    _protocol => "HTTP/1.1",
    _rc => 200,
    _request => bless({
      _content => "",
      _headers => bless({ "user-agent" => "LWP::Simple/6.00 libwww-perl/6.04" }, "HTTP::Headers"),
      _method => "HEAD",
      _uri => bless(do{\(my $o = "https://www.nyaa.se/?page=download&tid=613616")}, "URI::https"),
      _uri_canonical => 'fix',
    }, "HTTP::Request"),
  }, "HTTP::Response");
  $a->{_request}{_uri_canonical} = \${$a->{_request}{_uri}};
  $a;
}

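I think three languages are well suited to this kind of job (web scraping): ruby, python, perl.

My plug: ruby.

Last edited by tmplinshi on 2016-11-29 11:50

@523066680 Thank you very much! I would still like a VBS solution, though. What I actually want is to handle this in AHK, and VBS code can be used directly in AHK.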

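Last edited by 523066680 on 2016-11-29 11:52

Reply to 4# tmplinshi

    Look at what the first message box shows:
inline; filename="娴疯醇鐜?65.rar.torrent"

One character has already turned into a question mark. If you encode "娴疯醇鐜?65" back to GBK (treating the text as GBK), the bytes are:
[e6 b5] [b7 e8] [b4 bc] [e7 8e] 3f 36 35

The original bytes (UTF-8) are:
[e6 b5 b7] [e8 b4 bc] [e7 8e 8b] 37 36 35

When the bytes are read as GBK, every byte above 127 starts a two-byte wide character.
After [e6 b5] [b7 e8] [b4 bc] [e7 8e] are taken, 8b 37 36 35 remains.
[8b 37] has no corresponding character in the GBK table, so the pair turns into a single question mark, 3f.

In principle, if the data had been extracted intact, reading it as GBK would only display garbage; nothing should be lost.
Check whether the problem lies in AlterCharset or in objWinHttp.GetResponseHeader("Content-Disposition").
It would be best to print the byte values and look.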
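
The byte analysis above can be checked with a few lines of Python. This is only a sketch: the helper names are invented for the example, and Python's gbk codec stands in for the Windows code page 936 conversion, which is an assumption. Note one difference: Python's strict decoder raises an error on the invalid pair 8b 37, whereas the Windows conversion in this thread silently substituted a question mark.

```python
ORIGINAL = "海贼王765"
UTF8_BYTES = ORIGINAL.encode("utf-8")

# The four characters shown in the message box before the question mark.
GARBLED_PREFIX = "娴疯醇鐜"


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated lowercase hex, like the listing above."""
    return " ".join(f"{b:02x}" for b in data)


def survives_gbk_round_trip(data: bytes) -> bool:
    """True if the bytes decode as GBK and re-encode to exactly the same bytes."""
    try:
        return data.decode("gbk").encode("gbk") == data
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(hex_bytes(UTF8_BYTES))                    # original UTF-8 bytes
    print(hex_bytes(GARBLED_PREFIX.encode("gbk")))  # first eight of those bytes
    print(survives_gbk_round_trip(UTF8_BYTES[:8]))  # the four valid GBK pairs
    print(survives_gbk_round_trip(UTF8_BYTES))      # includes the invalid 8b 37
```

The first eight bytes happen to form valid GBK pairs, so they could be converted back; the ninth byte cannot, which is why AlterCharset has nothing left to recover once the question mark has been substituted.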


D:\Desktop>cscript /nologo test.vbs
inline; filename="娴疯醇鐜?65.rar.torrent"

D:\Desktop>cscript /nologo test.vbs | xd
000000  69 6e 6c 69 6e 65 3b 20 66 69 6c 65 6e 61 6d 65    inline; filename
000010  3d 22 e6 b5 b7 e8 b4 bc e7 8e 3f 36 35 2e 72 61    ="........?65.ra
000020  72 2e 74 6f 72 72 65 6e 74 22 0d 0a                r.torrent"..


curl returns this:

Last edited by aa77dd@163.com on 2016-11-29 16:50

This is the data; I have no idea what encoding it is:

    69,0,6E,0,6C,0,69,0,6E,0,65,0,3B,0,20,0,66,0,69,0,6C,0,65,0,6E,0,61,0,6D,0,65,0,3D,0,22,0,34,5A,AF,75,87,91,1C,94,3F,0,36,0,35,0,2E,0,72,0,61,0,72,0,2E,0,74,0,6F,0,72,0,72,0,65,0,6E,0,74,0,22,0,

Of these, 34,5A,AF,75,87,91,1C,94 decode as UTF-16 LE to 娴疯醇鐜, and the 3F,0 that follows is the question mark.
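
To double-check that reading of the dump, here is a short Python sketch that parses the comma-separated hex values and decodes them as UTF-16 LE. DUMP is copied from the listing above; decode_dump is a name made up for this example. The last line re-encodes the four wide characters as GBK, again assuming Python's gbk codec matches what Windows did.

```python
DUMP = (
    "69,0,6E,0,6C,0,69,0,6E,0,65,0,3B,0,20,0,66,0,69,0,6C,0,65,0,6E,0,61,0,"
    "6D,0,65,0,3D,0,22,0,34,5A,AF,75,87,91,1C,94,3F,0,36,0,35,0,2E,0,72,0,"
    "61,0,72,0,2E,0,74,0,6F,0,72,0,72,0,65,0,6E,0,74,0,22,0,"
)


def decode_dump(dump: str) -> str:
    """Turn 'AA,B,CC,' style hex values into bytes and decode them as UTF-16 LE."""
    values = [int(token, 16) for token in dump.split(",") if token.strip()]
    return bytes(values).decode("utf-16-le")


if __name__ == "__main__":
    text = decode_dump(DUMP)
    print(text)
    # The four wide characters sit right after the opening quote.
    wide = text[18:22]
    print(" ".join(f"{b:02x}" for b in wide.encode("gbk")))
```

In other words, the VBS string is ordinary UTF-16, and it already contains the GBK misreading of the UTF-8 bytes, including the question mark.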

AHK can also use
ComObjCreate("Msxml2.XMLHTTP")

and similar objects.

I think the problem is in WinHttp or VBS.
    Const adTypeBinary = 1
    Const adTypeText = 2

    ' Accept a string and convert it to a byte array in the selected charset
    Function StringToBytes(Str, Charset)
      Dim Stream
      Set Stream = CreateObject("ADODB.Stream")
      Stream.Type = adTypeText
      Stream.Charset = Charset
      Stream.Open
      Stream.WriteText Str
      Stream.Flush
      Stream.Position = 0
      ' Rewind the stream and read bytes
      Stream.Type = adTypeBinary
      StringToBytes = Stream.Read
      Stream.Close
      Set Stream = Nothing
    End Function

    ' Accept a byte array and convert it to a string using the selected charset
    Function BytesToString(Bytes, Charset)
      Dim Stream
      Set Stream = CreateObject("ADODB.Stream")
      Stream.Charset = Charset
      Stream.Type = adTypeBinary
      Stream.Open
      Stream.Write Bytes
      Stream.Flush
      Stream.Position = 0
      ' Rewind the stream and read text
      Stream.Type = adTypeText
      BytesToString = Stream.ReadText
      Stream.Close
      Set Stream = Nothing
    End Function

    ' This will alter the charset of a string from a 1-byte charset (such as windows-1252)
    ' to another 1-byte charset (such as windows-1251)
    Function AlterCharset(Str, FromCharset, ToCharset)
      Dim Bytes
      Bytes = StringToBytes(Str, FromCharset)
      AlterCharset = BytesToString(Bytes, ToCharset)
    End Function

    Set objWinHttp = CreateObject("WinHttp.WinHttpRequest.5.1")
    objWinHttp.Open "HEAD", "https://www.nyaa.se/?page=download&tid=613616"
    objWinHttp.Send

    ' Read the header once and reuse it below
    header = objWinHttp.GetResponseHeader("Content-Disposition")
    MsgBox header
    MsgBox LenB(header)

    ' Dump every byte of the in-memory string as hex
    HEXS = ""
    For i = 1 To LenB(header)
      HEXS = HEXS & Hex(AscB(MidB(header, i, 1))) & ","
    Next
    MsgBox HEXS

    MsgBox AlterCharset(header, "GB2312", "utf-8")

Last edited by 523066680 on 2016-11-29 14:40

Reply to 12# aa77dd@163.com

    Just use the plain Asc function:

    Set objWinHttp = CreateObject("WinHttp.WinHttpRequest.5.1")
    objWinHttp.Open "HEAD", "https://www.nyaa.se/?page=download&tid=613616"
    objWinHttp.Send
    name = objWinHttp.GetResponseHeader("Content-Disposition")
    say = ""
    ' Print the character code of each character as hex
    For i = 1 To Len(name)
        say = say & Hex(Asc(Mid(name, i, 1))) & " "
    Next
    MsgBox say
The message box shows:
69 6E 6C 69 6E 65 3B 20 66 69 6C 65 6E 61 6D 65 3D 22 E6B5 B7E8 B4BC E78E 3F 36 35 2E 72 61 72 2E 74 6F 72 72 65 6E 74 22

You can see that data has already been lost at this point: a question mark (0x3F) appears where the original bytes were.

Reply to 13# 523066680

The problem may lie in the GetResponseHeader("Content-Disposition") method.

    req := ComObjCreate("Microsoft.XMLHTTP")
    ; Open a request with async enabled.
    req.open("GET", "https://www.nyaa.se/?page=download&tid=613616", true)
    ; Set our callback function (v1.1.17+).
    req.onreadystatechange := Func("Ready")
    ; Send the request.  Ready() will be called when it's complete.
    req.send()
    ; /*
    ; If you're going to wait, there's no need for onreadystatechange.
    ; Setting async=true and waiting like this allows the script to remain
    ; responsive while the download is taking place, whereas async=false
    ; will make the script unresponsive.
    while req.readyState != 4
        sleep 100
    ; */
    #Persistent
    Ready() {
        global req
        if (req.readyState != 4)  ; Not done yet.
            return
        if (req.status == 200 || req.status == 304) {
            MsgBox % "responseText: " req.responseText
            t := req.GetResponseHeader("Content-Disposition")
            MsgBox % "Content-Disposition: " t
        }
        else
            MsgBox 16,, % "Status " req.status
        ExitApp
    }

Last edited by 523066680 on 2016-11-29 15:13

Reply to 14# aa77dd@163.com

    Switch languages and the sky's the limit, haha~ (And if you don't like ruby, python or perl either, then C# is the best.)

Well, that remark was meant for tmplinshi.