
[Text processing] [Solved] How can I automatically detect a file's encoding when opening it, then save it as ANSI after processing?

This post was last edited by xp3000 at 2023-1-8 08:33

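I have some txt files that contain a lot of extra content that needs to be deleted, for example
PS:……
PS2:……
PS3:……
The result then has to be saved in ANSI encoding.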
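A minimal sketch of the deletion step, assuming the unwanted lines always begin with `PS`, an optional number and a colon (the half-width `:` from the examples; also accepting the full-width `：` is an assumption). It works on text that has already been decoded into a string. `removeExtraLines` is a hypothetical name, and writing CRLF line endings is an assumption based on ANSI files usually being opened with Windows tools.

```javascript
// Hypothetical helper: drops lines such as "PS:...", "PS2:...", "PS3:..."
// Assumption: the marker is at the start of the line, optionally after whitespace,
// and is followed by a half-width ":" or a full-width "\uFF1A" colon.
const EXTRA_LINE = /^\s*PS\d*[:\uFF1A]/;

function removeExtraLines(text) {
  return text
    .split(/\r?\n/)                            // accept both CRLF and LF line endings
    .filter((line) => !EXTRA_LINE.test(line))  // keep every line that is not a PS line
    .join('\r\n');                             // assumption: Windows-style line endings
}
```

For example, `removeExtraLines('正文\nPS:广告\nPS2:广告\n结尾')` returns `'正文\r\n结尾'`. A line such as `PSA: keep` is left alone because a colon must follow `PS` and its optional digits.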
I found the following code by searching, but I don't know how to use it:
// bytes: the raw file contents (a Buffer from fs.readFileSync() or a Uint8Array).
// A decoded string cannot be used here: decoding has already consumed the BOM bytes.
function detectEncoding(bytes) {
  const startsWith = (...sig) => sig.every((b, i) => bytes[i] === b);
  // Check for a UTF-32 BOM first: the UTF-32 LE BOM (FF FE 00 00)
  // begins with the UTF-16 LE BOM (FF FE)
  if (startsWith(0x00, 0x00, 0xFE, 0xFF)) {
    return 'UTF-32BE';
  } else if (startsWith(0xFF, 0xFE, 0x00, 0x00)) {
    return 'UTF-32LE';
  }
  // Check the other BOMs at the start of the data
  if (startsWith(0xFE, 0xFF)) {
    return 'UTF-16BE';
  } else if (startsWith(0xFF, 0xFE)) {
    return 'UTF-16LE';
  } else if (startsWith(0xEF, 0xBB, 0xBF)) {
    return 'UTF-8BOM';
  }
  // If none of the patterns above is found, assume ANSI or ASCII
  // (a UTF-8 file without a BOM also ends up here)
  return 'ANSI/ASCII';
}
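Detecting the BOM is only half of opening the file; the bytes still have to be decoded. A sketch assuming Node.js (or a browser) with the standard `TextDecoder`: UTF-8 and UTF-16 go through `TextDecoder` after the BOM is cut off, while UTF-32 is decoded by hand because the WHATWG Encoding Standard behind `TextDecoder` does not include UTF-32. `decodeByBom` and `decodeUtf32` are hypothetical names; returning `null` for input without a BOM is a design choice so another strategy can take over.

```javascript
// Hypothetical helper: decodes a Uint8Array/Buffer according to its BOM.
// Returns null when no BOM is present.
function decodeByBom(bytes) {
  const starts = (...sig) => sig.every((b, i) => bytes[i] === b);
  // ignoreBOM: true because the BOM is already sliced off; a later U+FEFF is kept as content
  const decode = (enc, from) => new TextDecoder(enc, { ignoreBOM: true }).decode(bytes.subarray(from));

  // UTF-32 must be tested before UTF-16: FF FE is a prefix of FF FE 00 00
  if (starts(0x00, 0x00, 0xFE, 0xFF)) return decodeUtf32(bytes.subarray(4), false);
  if (starts(0xFF, 0xFE, 0x00, 0x00)) return decodeUtf32(bytes.subarray(4), true);
  if (starts(0xEF, 0xBB, 0xBF)) return decode('utf-8', 3);
  if (starts(0xFE, 0xFF)) return decode('utf-16be', 2);
  if (starts(0xFF, 0xFE)) return decode('utf-16le', 2);
  return null;
}

// TextDecoder has no UTF-32, so read 4-byte code points with a DataView
function decodeUtf32(bytes, littleEndian) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let out = '';
  for (let i = 0; i + 4 <= bytes.length; i += 4) {
    const cp = view.getUint32(i, littleEndian);
    // Surrogates and values above U+10FFFF become U+FFFD instead of throwing
    const valid = cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF);
    out += String.fromCodePoint(valid ? cp : 0xFFFD);
  }
  return out;
}
```

With Node.js, `decodeByBom(fs.readFileSync('a.txt'))` gives the text without its BOM, or `null` if the file has none.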
To determine the character encoding of a file in JavaScript, you have to examine its raw bytes (for example a Buffer returned by fs.readFileSync(), or a Uint8Array), not a string. Once the data is a string it has already been decoded, so .charCodeAt() returns UTF-16 code units rather than the bytes stored in the file. Logic on the byte values can then tell the encodings apart.
For example, to check whether the data is plain ASCII (which reads the same in UTF-8 and in any ANSI code page), you could use the following approach:
function isASCII(bytes) {
  for (let i = 0; i < bytes.length; i++) {
    // Any byte above 0x7F means the data is not pure ASCII
    if (bytes[i] > 0x7F) return false;
  }
  // Every byte is in the range 0x00-0x7F
  return true;
}
Bytes alone cannot prove that a file is ANSI (e.g. GBK), because non-ASCII bytes may be either GBK or UTF-8. A practical rule is to treat a file without a BOM as ANSI when it is not valid UTF-8.
To determine whether the data is valid UTF-8, check that every lead byte is followed by the right number of continuation bytes (10xxxxxx), using the following approach:
function isUTF8(bytes) {
  let i = 0;
  while (i < bytes.length) {
    const lead = bytes[i];
    // 0xxxxxxx: a single byte (ASCII)
    if (lead <= 0x7F) { i++; continue; }
    let extra; // number of continuation bytes that must follow the lead byte
    if (lead >= 0xC2 && lead <= 0xDF) extra = 1;      // 110xxxxx: 2 bytes
    else if (lead >= 0xE0 && lead <= 0xEF) extra = 2; // 1110xxxx: 3 bytes
    else if (lead >= 0xF0 && lead <= 0xF4) extra = 3; // 11110xxx: 4 bytes
    else return false;                                // stray continuation byte or invalid lead byte
    let cp = lead & (0x3F >> extra);                  // payload bits of the lead byte
    for (let k = 1; k <= extra; k++) {
      const next = bytes[i + k];
      // Each continuation byte must exist and look like 10xxxxxx
      if (next === undefined || (next & 0xC0) !== 0x80) return false;
      cp = (cp << 6) | (next & 0x3F);
    }
    // Reject overlong 3- and 4-byte forms, UTF-16 surrogates and values above U+10FFFF
    if ((extra === 2 && cp < 0x800) || (extra === 3 && cp < 0x10000)) return false;
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return false;
    i += extra + 1;
  }
  // Every sequence was well-formed, so the data is valid UTF-8
  return true;
}
To determine whether the data is UTF-8 with a BOM (Byte Order Mark), you could use the following approach:
function isUTF8BOM(bytes) {
  // The BOM for UTF-8 is the byte sequence EF BB BF
  if (bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    // If the first three bytes match the UTF-8 BOM,
    // check that the rest of the data is valid UTF-8
    return isUTF8(bytes.subarray(3));
  }
  // If the first three bytes do not match the UTF-8 BOM,
  // the data is not UTF-8 with a BOM
  return false;
}
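For files without a BOM, the standard `TextDecoder` can do the UTF-8 check itself: with `{ fatal: true }` it throws a `TypeError` on malformed UTF-8. A sketch assuming Node.js with full ICU (the default in official builds, and needed for the `gbk` decoder) and assuming that "ANSI" means code page 936 (GBK), as on Simplified Chinese Windows. It is a heuristic: pure ASCII is reported as UTF-8, which is harmless because ASCII bytes are identical in both, and a very short GBK text can occasionally also be valid UTF-8. `decodeWithoutBom` is a hypothetical name.

```javascript
// Hypothetical helper: decodes BOM-less bytes as UTF-8 if they are valid UTF-8,
// otherwise as GBK (assumed to be the "ANSI" code page).
function decodeWithoutBom(bytes) {
  try {
    // fatal: true makes decode() throw a TypeError on any malformed UTF-8 sequence
    const text = new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return { encoding: 'utf-8', text };
  } catch (e) {
    if (!(e instanceof TypeError)) throw e;
    // Not valid UTF-8: fall back to GBK (requires full ICU)
    return { encoding: 'gbk', text: new TextDecoder('gbk').decode(bytes) };
  }
}
```

For instance, the GBK bytes `D6 D0 CE C4` are not valid UTF-8, so they come back as `{ encoding: 'gbk', text: '中文' }`.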

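The files come in all sorts of encodings, possibly including UTF-32. Blank lines are deleted when people upload the files, and after the website processes them, what I download is a mix of normal and garbled text.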
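If a downloaded file really mixes encodings from line to line (an assumption; text that was already mis-decoded and saved again cannot be repaired this way), each line can be decoded separately. This sketch assumes only byte-oriented encodings (UTF-8 and GBK, not UTF-16/32): the LF byte 0x0A never occurs inside a multi-byte character in either of them, so splitting on it is safe. `decodeMixedLines` is a hypothetical name, and the `gbk` decoder needs a Node.js build with full ICU.

```javascript
// Hypothetical helper: decodes a UTF-8/GBK mix line by line.
// The GBK decoder needs full ICU (the default for official Node.js builds).
const gbkDecoder = new TextDecoder('gbk');

function decodeMixedLines(bytes) {
  const lines = [];
  let start = 0;
  for (let i = 0; i <= bytes.length; i++) {
    // A line ends at an LF byte (0x0A) or at the end of the data
    if (i < bytes.length && bytes[i] !== 0x0A) continue;
    const chunk = bytes.subarray(start, i);
    let text;
    try {
      // fatal: true makes decode() throw if this line is not valid UTF-8
      text = new TextDecoder('utf-8', { fatal: true }).decode(chunk);
    } catch {
      text = gbkDecoder.decode(chunk); // fall back to GBK for this line only
    }
    lines.push(text.replace(/\r$/, '')); // strip the CR of a CRLF line ending
    start = i + 1;
  }
  return lines;
}
```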


\x2B\x2F\x76        file is UTF-7 encoded
\xEF\xBB\xBF        file is UTF-8 BOM encoded
\xFE\xFF        file is UTF-16 BE encoded
\xFF\xFE        file is UTF-16 LE encoded
\x00\x00\xFE\xFF        file is UTF-32 BE encoded
\xFF\xFE\x00\x00        file is UTF-32 LE encoded

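I can't upload files here. I checked, and the files start like this: viewing a file in a hex viewer, its first bytes are the values above without the \x.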
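The signature table above can be turned into a lookup directly. In this sketch the only design choice is the ordering: longer signatures come first, so FF FE 00 00 (UTF-32 LE) is not reported as FF FE (UTF-16 LE). `matchSignature` and `hexHead` are hypothetical helper names; `hexHead` prints the first bytes the way a hex viewer shows them. For UTF-7, the three bytes in the table are only a prefix (a full UTF-7 signature continues with 38, 39, 2B or 2F), and the sketch checks just that prefix.

```javascript
// Signatures from the table, longest first so UTF-32 LE wins over UTF-16 LE
const SIGNATURES = [
  { name: 'UTF-32 BE', bytes: [0x00, 0x00, 0xFE, 0xFF] },
  { name: 'UTF-32 LE', bytes: [0xFF, 0xFE, 0x00, 0x00] },
  { name: 'UTF-8 BOM', bytes: [0xEF, 0xBB, 0xBF] },
  { name: 'UTF-7', bytes: [0x2B, 0x2F, 0x76] },
  { name: 'UTF-16 BE', bytes: [0xFE, 0xFF] },
  { name: 'UTF-16 LE', bytes: [0xFF, 0xFE] },
];

// Returns the name of the first matching signature, or null if none matches
function matchSignature(bytes) {
  const hit = SIGNATURES.find(({ bytes: sig }) => sig.every((b, i) => bytes[i] === b));
  return hit ? hit.name : null;
}

// First n bytes as uppercase hex, e.g. "EF BB BF 41"
function hexHead(bytes, n = 4) {
  return Array.from(bytes.subarray(0, n), (b) => b.toString(16).toUpperCase().padStart(2, '0')).join(' ');
}
```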


This post was last edited by xp3000 at 2023-1-3 09:20

Reply to #10 czjt1234
https://cowtransfer.com/s/215a0b55e04c4e Click the link to view [ UTF-32.zip ], or go to CowTransfer (cowtransfer.com) and enter the transfer code mdty5z to view it.


Can this be done with BAT+JS?
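The question asks about a batch file with embedded JScript; as a sketch, the same per-folder loop is written here in Node.js instead, which is an assumption. `convertFolder` is a hypothetical name and `convert` is a caller-supplied function (raw bytes in, bytes to write out), for example decode, remove the PS lines, then encode as ANSI. Results go to a separate folder so the originals are kept. Node.js has no built-in GBK encoder (`TextEncoder` only produces UTF-8), so ANSI output needs something extra, for example the third-party `iconv-lite` package and its `iconv.encode(text, 'gbk')`.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: runs convert() on every *.txt file in srcDir and writes
// the result under the same file name in outDir. Returns the processed names.
function convertFolder(srcDir, outDir, convert) {
  fs.mkdirSync(outDir, { recursive: true }); // no error if the folder already exists
  const done = [];
  for (const name of fs.readdirSync(srcDir)) {
    if (path.extname(name).toLowerCase() !== '.txt') continue; // only .txt files
    const src = path.join(srcDir, name);
    if (!fs.statSync(src).isFile()) continue;                    // skip folders named *.txt
    fs.writeFileSync(path.join(outDir, name), convert(fs.readFileSync(src)));
    done.push(name);
  }
  return done;
}
```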


VBS would also be fine; it's just that I can understand JS a little.


Learned a lot, thanks everyone!

