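Title: [Text Processing] [Solved] Help wanted: a batch script to extract all keywords from a set of XML files into one CSV
Author: zhengwei007 Time: the day before yesterday 17:07 Title: [Solved] Help wanted: a batch script to extract all keywords from a set of XML files into one CSV
Last edited by zhengwei007 on 2024-11-21 21:19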
I have N small XML files. Their names vary widely, but that does not matter. I want to consolidate the contents of all of these files into a single CSV. I picked a few items as examples of the general format.
The desired batch output is as follows:
    id,type,name,icon,bodypart,armor_type,weight,soulshots,spiritshots,crystal_type,p_dam,rnd_dam,weapon_type,critical,hit_modify,atk_speed,m_dam,price,sellable,dropable,tradeable,pAtk,mAtk,rCrit,accCombat,pAtkSpd,pAtk,mAtk,pDef
    1,Weapon,Short Sword,icon.weapon_small_sword_i00,rhand,,1600,1,1,none,8,10,sword,8,,379,6,768,,,,8,6,8,0,379,0,0,
    2,Weapon,Long Sword,icon.weapon_long_sword_i00,rhand,,1560,2,2,none,24,10,sword,8,,379,17,136000,,,,24,17,8,0,379,0,0,
    6,Weapon,Apprentice's Wand,icon.weapon_apprentices_wand_i00,rhand,,1350,1,1,none,5,20,blunt,4,4,379,7,138,FALSE,FALSE,FALSE,5,7,4,4,379,0,0,
    99,Weapon,Apprentice's Spellbook,icon.weapon_apprentices_spellbook_i00,rhand,,650,1,1,none,9,10,etc,8,,379,12,12500,,,,9,12,8,0,379,0,0,
    9208,Armor,Phantom Mask (Event),icon.accessary_mask_of_chaotic_i00,dhair,none,10,,,none,,,,,,,,,FALSE,FALSE,FALSE,,,,,,,,0
When a new field is encountered, it is simply appended to the column headers; where an item has no data for a field, the value is left empty.
Each [item] is one record. Inside for, the attribute at the end of each line serves as the column header, and the val in front of it is the value.
I am not sure whether this is feasible as a batch script. Could any expert take a look? Thanks in advance, everyone!
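The requirement above (one row per item, columns appended as new fields appear, blanks where a field is missing) can also be sketched with an XML parser instead of line matching. The Python below is not from the thread; Python is substituted purely for illustration. The sample XML is hypothetical: the <list> root element, the order values and the item contents are assumptions modelled on the attribute layout shown in the thread. A stat name repeated within one item would simply overwrite the earlier value in this sketch.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; the <list> root and the item contents are assumptions.
SAMPLE = """<list>
  <item id="1" type="Weapon" name="Short Sword">
    <set name="icon" val="icon.weapon_small_sword_i00" />
    <set name="weight" val="1600" />
    <for>
      <set val="8" order="0x08" stat="pAtk" />
    </for>
  </item>
  <item id="9208" type="Armor" name="Phantom Mask (Event)">
    <set name="armor_type" val="none" />
    <for>
      <add val="0" order="0x10" stat="pDef" />
    </for>
  </item>
</list>"""


def items_to_rows(root):
    """Return (header, rows); columns are appended in first-seen order."""
    header = ["id", "type", "name"]
    rows = []
    for item in root.iter("item"):
        row = {key: item.get(key, "") for key in ("id", "type", "name")}
        for node in item.iter():
            # Skip the item itself and any node without a val attribute (e.g. <for>).
            if node is item or "val" not in node.attrib:
                continue
            # Inside <for> the column is the stat attribute; elsewhere it is name.
            column = node.get("stat") or node.get("name")
            if column is None:
                continue
            if column not in header:
                header.append(column)
            row[column] = node.get("val")
        rows.append(row)
    return header, rows


def to_csv(header, rows):
    """Serialize the rows; fields an item lacks are written as empty strings."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


header, rows = items_to_rows(ET.fromstring(SAMPLE))
print(to_csv(header, rows))
```

Because the header is only complete after every item has been read, the rows are held in memory and written at the end; that is the price of not knowing the columns up front.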
Author: new_user Time: the day before yesterday 21:10
PowerShell
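Author: czjt1234 Time: the day before yesterday 21:12
Last edited by czjt1234 on 2024-11-20 21:30
pAtk,mAtk,rCrit,accCombat,pAtkSpd,pAtk,mAtk,pDef
In the output, why does the end of the first line contain pAtk,mAtk twice?
How does the program that consumes your CSV tell these two identically named columns apart?
Why are the items with IDs 7 and 8 missing from the output?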
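Author: zhengwei007 Time: the day before yesterday 21:36
    <set val="9" order="0x08" stat="pAtk" />
    <set val="12" order="0x08" stat="mAtk" />
    <enchant val="0" order="0x0C" stat="pAtk" />
    <enchant val="0" order="0x0C" stat="mAtk" />
There are two of each, but they mean slightly different things. As long as all the fields come out in the right order, it does not matter to me whether the names are the same.
I would like a general-purpose batch script: when I use other XML files later, I should only need to change the keywords inside the script. Otherwise, whenever the format changes, I would have to come back and ask for help again.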
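Author: czjt1234 Time: the day before yesterday 21:42
Last edited by czjt1234 on 2024-11-20 21:49
The file in your example has 29 fields.
Could other XML files contain new fields?
Should each XML produce its own CSV, or should all XML files be merged into a single CSV?
Author: Five66 Time: the day before yesterday 22:41
The for block describes more than one kind of equipment effect: there is set, add and enchant, and there may well be sub, div or others.
That leaves the number of CSV columns undetermined. How is this supposed to be done? Can it be done at all?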
Author: aloha20200628 Time: yesterday 00:14
Last edited by aloha20200628 on 2024-11-21 00:16
Reply to #1 zhengwei007
The code below targets the sample 1.xml from the first post and writes the result to new.xml. It assumes that identically named fields (such as pAtk, mAtk) sit together inside one node (such as <for ... >).
    @echo off &setlocal enabledelayedexpansion
    set "lst=,item id, type, name,icon,bodypart,armor_type,weight,soulshots,spiritshots,crystal_type,p_dam,rnd_dam,weapon_type,critical,hit_modify,atk_speed,m_dam,price,sellable,dropable,tradeable,pAtk,mAtk,rCrit,accCombat,pAtkSpd,pAtk1,mAtk1,pDef,"
    set "v=!lst!"
    (echo,item,type,name,icon,bodypart,armor_type,weight,soulshots,spiritshots,crystal_type,p_dam,rnd_dam,weapon_type,critical,hit_modify,atk_speed,m_dam,price,sellable,dropable,tradeable,pAtk,mAtk,rCrit,accCombat,pAtkSpd,pAtk1,mAtk1,pDef
    for /f tokens^=1-7^delims^=^ ^<^=^" %%a in (
    'findstr /rbi /c:" *.*=\"^" /c:" ^</item^>" "1.xml" ') do (
    if /i "%%a"=="item id" (
    set "v=!v:%%a,=%%b,!"&set "v=!v:%%c,=%%d,!"&set "v=!v:%%e,=%%f,!"
    ) else if "%%e"==" />" (set "v=!v:%%b,=%%d,!"
    ) else if "%%g"==" />" (
    set "go=1"&for %%s in ("%%f","%%f1") do if defined go if "!v:%%~s,=%%b,!" neq "!v!" (
    set "go="&set "v=!v:%%~s,=%%b,!")
    ) else if /i "%%a"=="/item>" ((for %%s in (!lst!) do set "v=!v:,%%s,=,,!")&echo,!v:~1,-1!&set "v=!lst!")
    ))>"new.xml"
    endlocal&pause&exit/b
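Author: zhengwei007 Time: yesterday 00:34
The file in your example has 29 fields.
Could other XML files contain new fields?
Should each XML produce its own CSV, or should all XML ...
czjt1234 posted on 2024-11-20 21:42
Hello, I would like all the XML files merged into one file.
For duplicate fields, keep only the first occurrence and filter out the later enchant entries.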
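The rule just stated (keep the first value for a stat, drop the enchant entries) might look like this in Python. This is only a sketch: Python is not used in the thread, the function name first_stats and its skip_tags parameter are hypothetical, and the sample block reuses the four lines quoted earlier wrapped in an assumed <for> element.

```python
import xml.etree.ElementTree as ET

# Hypothetical <for> block built from the four lines quoted earlier in the thread.
FOR_BLOCK = """<for>
  <set val="9" order="0x08" stat="pAtk" />
  <set val="12" order="0x08" stat="mAtk" />
  <enchant val="0" order="0x0C" stat="pAtk" />
  <enchant val="0" order="0x0C" stat="mAtk" />
</for>"""


def first_stats(for_node, skip_tags=("enchant",)):
    """Map stat -> val, keeping the first occurrence and ignoring skipped tags."""
    stats = {}
    for child in for_node:
        stat = child.get("stat")
        if child.tag in skip_tags or stat is None:
            continue
        # setdefault leaves an existing entry untouched, so the first value wins.
        stats.setdefault(stat, child.get("val", ""))
    return stats


stats = first_stats(ET.fromstring(FOR_BLOCK))
print(stats)
```

Skipping by tag and keeping the first value are two separate guards here; either one alone would give the same result for this sample, but together they also cover files where a stat is repeated under the same tag.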
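Author: zhengwei007 Time: yesterday 00:36
Reply to zhengwei007
The code below targets the sample 1.xml from the first post and writes the result to new.xml. It assumes that identically named fields (such as pAtk, mAtk ...
aloha20200628 posted on 2024-11-21 00:14
Hello, the script failed with the following message:
    FINDSTR: Cannot open 1.xml
    Press any key to continue . . .
My files are named in all sorts of ways, for example 0000-0099.xml, 2900-2999.xml and 9200-9299.xml.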
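Author: qixiaobin0715 Time: yesterday 08:46
Please provide two or more of the XML files to be processed, unmodified, and upload them to a network drive so that everyone can test with them.
Author: zhengwei007 Time: yesterday 09:28
Link: https://pan.baidu.com/s/1wrnrL-sqME6kMGqtb58ULQ
Extraction code: 3w3r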
Thanks, this is the complete set of files.
Author: hfxiang Time: yesterday 11:15
Reply to #1 zhengwei007
Save the following script as xml2csv.awk in ANSI encoding, download gawk ( http://bcn.bathome.net/tool/4.1.0/gawk.exe ), and run gawk -fxml2csv.awk *.xml>out.csv
    BEGIN {
        # Split on double quotes so attribute values land in even-numbered fields
        FS = "\""
    }

    # Handle every line from an opening <item ...> tag through its </item>
    /^[ \t]*<item id=.+>[ \t]*$/, /^[ \t]*<\/item>[ \t]*$/ {
        if (/^[ \t]*<item id=.+>[ \t]*$/) {
            nnn++
            id_[nnn, "id"] = $2
            type_[nnn, "type"] = $4
            name_[nnn, "name"] = $6
        }
        # <set name="..." val="..."/>: name is the column, val is the value
        if (/^[ \t]*<set name=".+" val=".+".+\/>[ \t]*$/) {
            if (! ($2 in d_id)) {
                id_n++
                d_id[$2] = id_n
            }
            dat_[nnn, $2] = $4
        }
        if (/^[ \t]*<for>[ \t]*$/) {
            for_id = 1
        }
        if (/^[ \t]*<\/for>[ \t]*$/) {
            for_id = 0
        }
        # Inside <for>: stat is the column, val is the value
        if (for_id) {
            if (/^[ \t]*<.+val=".+" order=".+" stat=".+".+\/>[ \t]*$/) {
                if (! ($6 in d_id)) {
                    id_n++
                    d_id[$6] = id_n
                }
                dat_[nnn, $6] = $2
            }
        }
    }

    END {
        # Walk d_id in first-seen order (gawk extension)
        PROCINFO["sorted_in"] = "@val_num_asc"
        printf "%s,%s,%s", "id", "type", "name"
        for (j in d_id) {
            printf ",%s", j
        }
        print ""
        for (i = 1; i <= nnn; i++) {
            printf "%s,%s,%s", id_[i, "id"], type_[i, "type"], name_[i, "name"]
            for (j in d_id) {
                printf ",%s", dat_[i, j]
            }
            print ""
        }
    }
Author: czjt1234 Time: yesterday 11:25
Last edited by czjt1234 on 2024-11-21 11:58
    rem Save as a .bat file with ANSI encoding
    ' & cls & cscript.exe /nologo /e:vbscript "%~f0" %* & pause & exit /b

    Option Explicit
    Dim p, a, b, c, e, d, f, i, v, oDOMDocument, oWshShell, oFSO

    p = "."    'Current directory. Another path can be given, e.g. d:\test\

    e = "id,type,name"    'Fixed fields

    a = "name"    'Attribute name
    b = "val"     'Attribute value

    c = "stat"    'Attribute name
    d = "val"     'Attribute value

    Set oDOMDocument = CreateObject("Msxml2.DOMDocument")
    Set oWshShell = CreateObject("WScript.Shell")
    Set oFSO = CreateObject("Scripting.FileSystemObject")
    p = oFSO.GetAbsolutePathName(p)
    oWshShell.CurrentDirectory = p
    For Each i In oFSO.GetFolder(p).Files
        If LCase(oFSO.GetExtensionName(i)) = LCase("xml") Then
            f = ""
            v = ""
            Call t(i.Path)
        End If
    Next

    Sub t(ByVal file)
        Dim s, i, x, a0, a1, f1, f2, oNode, oXMLDOMSelection
        oDOMDocument.load file
        If oDOMDocument.parseError.errorCode <> 0 Then
            wsh.Echo file & vbCrLf & oDOMDocument.parseError.reason _
                & "Line " & oDOMDocument.parseError.line
            wsh.Quit()
        End If

        'Collect column names from nodes that carry both name and val
        x = ".//*[@" & a & " and @" & b & "]"
        f1 = ","
        For Each oNode In oDOMDocument.documentElement.childNodes
            Set oXMLDOMSelection = oNode.SelectNodes(x)
            For Each i In oXMLDOMSelection
                s = i.nodeName & "_" & i.getAttribute(a) & ","
                If InStr(f1, "," & s ) = 0 Then f1 = f1 & s
            Next
        Next
        f1 = Left(f1, Len(f1) - 1)
        f1 = Right(f1, Len(f1) - 1)
        'Collect column names from nodes that carry both stat and val
        f2 = ","
        x = ".//*[@" & c & " and @" & d & "]"
        For Each oNode In oDOMDocument.documentElement.childNodes
            Set oXMLDOMSelection = oNode.SelectNodes(x)
            For Each i In oXMLDOMSelection
                s = i.nodeName & "_" & i.getAttribute(c) & ","
                If InStr(f2, "," & s ) = 0 Then f2 = f2 & s
            Next
        Next
        If f2 = "," Then
            f = e & "," & f1
        Else
            f2 = Left(f2, Len(f2) - 1)
            f2 = Right(f2, Len(f2) - 1)
            f = e & "," & f1 & "," & f2
        End If

        'Build one CSV line per item
        For Each oNode In oDOMDocument.documentElement.childNodes
            For Each i In Split(e, ",")
                v = v & oNode.getAttribute(i) & ","
            Next
            For Each i In Split(f1, ",")
                a0 = Split(i, "_")(0)
                a1 = Split(i, "_", 2)(1)
                x = ".//" & a0 & "[@" & a & " = '" & a1 & "']"
                Set oXMLDOMSelection = oNode.SelectNodes(x)
                If oXMLDOMSelection.length = 0 Then
                    x = ""
                Else
                    x = oXMLDOMSelection(0).getAttribute(b)
                    If InStr(x, ",") Then x = """" & x & """"
                End If
                v = v & x & ","
            Next
            For Each i In Split(f2, ",")
                If f2 = "," Then Exit For
                a0 = Split(i, "_")(0)
                a1 = Split(i, "_", 2)(1)
                x = ".//" & a0 & "[@" & c & " = '" & a1 & "']"
                Set oXMLDOMSelection = oNode.SelectNodes(x)
                If oXMLDOMSelection.length = 0 Then
                    x = ""
                Else
                    x = oXMLDOMSelection(0).getAttribute(d)
                    If InStr(x, ",") Then x = """" & x & """"
                End If
                v = v & x & ","
            Next
            v = Left(v, Len(v) - 1) & vbCrLf
        Next

        'Write <xml base name>.csv next to the source file
        s = oFSO.GetBaseName(file) & ".csv"
        oFSO.OpenTextFile(s, 2, True).Write f & vbCrLf & v
        wsh.Echo s
    End Sub
Author: qixiaobin0715 Time: yesterday 11:28
In a small-batch test I found that some name fields contain commas, so the name field is wrapped in double quotes to keep the commas from breaking the column layout.
    @echo off
    set TableHeader=id,type,name,icon,bodypart,armor_type,weight,soulshots,spiritshots,crystal_type,p_dam,rnd_dam,weapon_type,critical,hit_modify,atk_speed,m_dam,price,sellable,dropable,tradeable,pAtk,mAtk,rCrit,accCombat,pAtkSpd,pAtk,mAtk,pDef
    for %%i in (%TableHeader%) do set _%%i=true
    (echo,%TableHeader%
    setlocal enabledelayedexpansion
    for /f tokens^=1-6^ delims^=^"^=^  %%1 in ('type *.xml 2^>nul^|findstr "<item </item> <set <add"') do (
        if "%%1"=="<item id" (
            set "id=%%2"
            set "type=%%4"
            set "name="%%6""
        ) else (
            if defined _%%2 (
                set %%2=%%4
            ) else if defined _%%6 (
                set %%6=%%2
            ) else if "%%1"=="</item>" (
                for %%i in (%TableHeader%) do (
                    if defined str (set str=!str!,!%%i!) else set str=!%%i!
                    set %%i=
                )
                echo,!str!
                set str=
            )
        )
    ))>1.csv
    pause
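The quoting concern raised in this post can be handed to a CSV library rather than done by hand. A minimal Python sketch follows; Python is not used in the thread, the helper name csv_line is hypothetical, and the item names are invented for illustration. The standard csv writer quotes a field only when it contains a comma, a double quote or a line break, and doubles any embedded quotes.

```python
import csv
import io


def csv_line(values):
    """Format one CSV record; fields containing commas or quotes get quoted."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(values)
    return out.getvalue().rstrip("\n")


# Invented item names: one with a comma, one with embedded double quotes.
line = csv_line(["1", "Weapon", "Sword, Long", 'The "Old" Blade'])
print(line)
```

Quoting only where needed keeps ordinary rows identical to the hand-built output, while names with commas still load as a single column.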
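Author: czjt1234 Time: yesterday 11:33
Reply to #13 czjt1234
Apart from the fixed fields, every field name is prefixed with its node name; any duplicates that remain after that are ignored.
For example:
set_pAtk,set_mAtk,enchant_pAtk,enchant_mAtk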
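The naming scheme described above can be illustrated in a few lines of Python. This is a sketch under assumptions rather than a port of the VBScript: Python is not used in the thread, the function name prefixed_stats is hypothetical, and the sample wraps the four lines quoted earlier in an assumed <for> element.

```python
import xml.etree.ElementTree as ET

# Hypothetical <for> block built from the four lines quoted earlier in the thread.
FOR_BLOCK = """<for>
  <set val="9" order="0x08" stat="pAtk" />
  <set val="12" order="0x08" stat="mAtk" />
  <enchant val="0" order="0x0C" stat="pAtk" />
  <enchant val="0" order="0x0C" stat="mAtk" />
</for>"""


def prefixed_stats(for_node):
    """Map '<tag>_<stat>' -> val so set and enchant values get separate columns."""
    columns = {}
    for child in for_node:
        stat = child.get("stat")
        if stat is None:
            continue
        # A name that still repeats after prefixing keeps its first value.
        columns.setdefault(f"{child.tag}_{stat}", child.get("val", ""))
    return columns


columns = prefixed_stats(ET.fromstring(FOR_BLOCK))
print(list(columns))
```

Prefixing keeps both values instead of discarding one, at the cost of column names that no longer match the bare stat names in the XML.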
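Author: qixiaobin0715 Time: yesterday 11:34
Reply to #4 zhengwei007
The idea is lovely, but reality is harsh. There is no such thing as truly general-purpose code, and besides, the files you want to process are not very regular.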
Author: aloha20200628 Time: yesterday 12:40
Reply to #11 zhengwei007
I tested the code from post #7 on a random selection of the files provided in post #11 (with 1.xml in the code changed to the name of the file under test), and all of them passed.
The code below turns the post #7 code into a batch version that extracts the target field values from every *.xml in the current directory and merges them into new.xml. The test run extracted 9208 records in total, consistent with the number ranges in the source file names.
    @echo off &setlocal enabledelayedexpansion &echo,Please wait...
    set "lst=,item id, type, name,icon,bodypart,armor_type,weight,soulshots,spiritshots,crystal_type,p_dam,rnd_dam,weapon_type,critical,hit_modify,atk_speed,m_dam,price,sellable,dropable,tradeable,pAtk,mAtk,rCrit,accCombat,pAtkSpd,pAtk1,mAtk1,pDef,"
    set "v=!lst!"
    (echo,item,type,name,icon,bodypart,armor_type,weight,soulshots,spiritshots,crystal_type,p_dam,rnd_dam,weapon_type,critical,hit_modify,atk_speed,m_dam,price,sellable,dropable,tradeable,pAtk,mAtk,rCrit,accCombat,pAtkSpd,pAtk1,mAtk1,pDef
    for /f "delims=" %%F in ('dir/b/a-d *.xml') do for /f tokens^=1-7^delims^=^ ^<^=^" %%a in (
    'findstr /rbi /c:" *.*=\"^" /c:" ^</item^>" "%%F" ') do (
    if /i "%%a"=="item id" (
    set "v=!v:%%a,=%%b,!"&set "v=!v:%%c,=%%d,!"&set "v=!v:%%e,=%%f,!"
    ) else if "%%e"==" />" (set "v=!v:%%b,=%%d,!"
    ) else if "%%g"==" />" (
    set "go=1"&for %%s in ("%%f","%%f1") do if defined go if "!v:%%~s,=%%b,!" neq "!v!" (
    set "go="&set "v=!v:%%~s,=%%b,!")
    ) else if /i "%%a"=="/item>" ((for %%s in (!lst!) do set "v=!v:,%%s,=,,!")&echo,!v:~1,-1!&set "v=!lst!")
    ))>"new.xml"
    endlocal&pause&exit/b
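Author: zhengwei007 Time: yesterday 21:19
Thank you all, the problem is solved. I found that once the script is modified for one format, it simply cannot be reused for other content; I was overthinking it. Thanks again.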
Welcome to 批处理之家 (http://bbs.bathome.net/)