
[Text Processing] Batch: split a TXT file by keyword

Last edited by qd2024 on 2023-2-2 14:32

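In the TXT file, each split point is marked by a keyword at the start of the line, e.g. ★★★★★.

Using “★★★★★” as the marker, split one TXT file into several TXT files.
Each generated TXT file starts with the line containing “★★★★★”, and the text of that line becomes the file name.
The first “★★★★★” marks where the first file begins; the second “★★★★★” marks both the start of the second file and the end of the first; the last file runs to the end of the source.
However, neither the first line of a generated file nor its file name should contain “★★★★★”.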

The sample text below should produce 4 txt files:
Module1 unit1 What a delicious smell
Module1 unit2 You sound just like me!
Module2 unit1 What are you doing
Module2 unit2 They have been to many interesting places.

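If the title contains punctuation that is illegal in a file name, just delete those characters rather than writing them into the file name.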
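A minimal Python sketch of the requirement as stated, for illustration only (the replies in this thread use batch, PowerShell, JScript and gawk). It assumes the marker appears only at the start of title lines, treats `\ / : * ? " < > |` as the illegal file-name characters, and writes UTF-8; the names `split_by_marker`, `safe_name` and `write_chunks` are hypothetical.

```python
import re
from pathlib import Path

MARKER = "★★★★★"
# Characters that Windows does not allow in file names (assumed set)
ILLEGAL = re.compile(r'[\\/:*?"<>|]')


def split_by_marker(lines, marker=MARKER):
    """Group lines into (title, body_lines) chunks, one per marker line.

    The marker is stripped from the title line, which stays as the chunk's
    first line. Lines before the first marker are ignored.
    """
    chunks = []
    for line in lines:
        if line.startswith(marker):
            title = line[len(marker):].strip()
            chunks.append((title, [title]))
        elif chunks:
            chunks[-1][1].append(line)
    return chunks


def safe_name(title):
    """Drop illegal file-name characters instead of replacing them."""
    return ILLEGAL.sub("", title).strip()


def write_chunks(chunks, out_dir=".", encoding="utf-8"):
    """Write each chunk to '<safe title>.txt' and return the paths written."""
    paths = []
    for title, body in chunks:
        path = Path(out_dir) / (safe_name(title) + ".txt")
        path.write_text("\n".join(body) + "\n", encoding=encoding)
        paths.append(path)
    return paths
```

For instance, `write_chunks(split_by_marker(text.splitlines()))` turns the title `Module2 unit1 What are you doing?` into the file `Module2 unit1 What are you doing.txt`, whose first line still reads `Module2 unit1 What are you doing?`.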


Thanks



Sample of the original text:
★★★★★Module1 unit1 What a delicious smell
Tony: Mnn…What a delicious smell! Your pizza looks so nice.
Betty: Thanks! Would you like to try some?
Tony: Yes, please. It looks lovely, it smells delicious and mm, it tastes good.
Daming: What’s that on top?
Betty: Oh, that’s cheese. Do you want to try a piece?
Daming: Ugh! No, thanks. I’m afraid I don’t like cheese. It doesn’t smell fresh. It smells too strong and it tastes a bit sour.
Betty: Well, my chocolate cookies are done now. Have a try!
Daming: Thanks! They taste really sweet and they feel soft in the middle.
Tony: Are you cooking lots of different things? You look very busy!
Betty: Yes, I am! There’s some pizza and some cookies, and now I’m making an apple pie and a cake.
Daming: Apple pie sounds nice. I have a sweet tooth, you know. Shall I get the sugar?
Betty: Yes, please. Oh, are you sure that’s sugar? Taste it first. It might be salt!
Daming: No, it’s OK. It tastes sweet. It’s sugar.
Tony: What’s this? It tastes sweet too.
Betty: That’s strawberry jam, for the cake.
Daming: Good, everything tastes so sweet! It’s my lucky day!
★★★★★Module1 unit2 You sound just like me!
Hi Lingling,
Thanks for your last message. It was great to hear from you, and I can't wait to meet you.
I hope you will know me from my photo when I arrive at the airport. I'm quite tall, with short fair hair, and I wear glasses. I'll wear jeans and a T-shirt for the journey, but I'll also carry my warm coat. I've got your photo - you look very pretty. I'm sure we'll find each other!
Thanks for telling me about your hobbies. You sound just like me! I spend a lot of time playing classical music with my friends at school, but I also like dance music - I love dancing! I enjoy sports as well, especially tennis. My brother is in the school tennis team - I'm very proud of him! He's good at everything, but I'm not. Sometimes I get bad marks at school, and I feel sad. I should work harder.
You asked me, "How do you feel about coming to China?" Well, I often feel a bit sad at first when I leave my mum and dad for a few days, and I'm quite shy when I'm with strangers. I feel nervous when I speak Chinese, but I'll be fine in a few days. I'm always sorry when I don't know how to do things in the right way, so please help me when I'm with you in China! Oh, I'm afraid of flying too. But I can't tell you how excited I am about going to China!
See you next week!
★★★★★Module2 unit1 What are you doing?
Tony: Hi, Lingling. What are you doing?
Lingling: I'm entering a competition.
Tony: What kind of competition?
Lingling: A speaking competition.
Tony: "Great. "It'll help you improve your speaking. And maybe you will win a prize.
Lingling: The first prize is "My dream holiday".
Tony: Have you ever won any prizes before?
Lingling: No, I haven't. I've always wanted to go on a dream holiday. But I can't afford it. The plane tickets are too expensive.
Tony: Well, good luck! I've also entered lots of speaking competitions, but haven't won any prizes. I've stopped trying now.
Lingling: That's a pity. Have you ever thought about other kinds of competitions?
Tony: What do you mean?
Lingling: Look! Here's a writing competition: Around the World in 80 Days. To win it, you need to write a short story about a place you've visited.
Tony: That sounds wonderful, but I haven't travelled much. How can I write about it?
Lingling: Don't worry. It doesn’t need to be true! You can make it up.
Tony: You're right. I'll try. I hope I will win, then I will invite you to come with me.
Lingling: Sorry! The first prize is only the book called Around the World in 80 Days!
★★★★★Module2 unit2 They have been to many interesting places.
Mike Robinson is a fifteen-year-old American boy and his sister Clare is fourteen. At the moment, Mike and Clare are in Cairo in Egypt, one of the biggest and busiest cities in Africa.
They moved here with their parents two years ago. Their father, Peter works for a very big company. "The company has offices in many countries, and it has sent Peter to work in Germany, France and China before. "Peter usually stays in a country for about two years. Then the company moves him again. His family always goes with him.
The Robinsons love seeing the world. They have been to many interesting places. For example, in Egypt, they have seen the Pyramids, travelled on a boat on the Nile River, and visited the palaces and towers of ancient kings and queens.
Mike and Clare have also begun to learn the language of the country, Arabic. This language is different from English in many ways, and they find it hard to spell and pronounce the words. However, they still enjoy learning it. So far they have learnt to speak German, French, Chinese and Arabic. Sometimes they mix the languages. "It's really fun, "said Clare.
The Robinsons are moving again. The company has asked Peter to work back in the US. Mike and Clare are happy about this. They have friends all over the world, but they also miss their friends in the US. They are counting down the days.

Reply to #27 hfxiang


    OK, thanks


Reply to #28 terse


    Thanks, I'll test it. Much appreciated.


Reply to #26 qixiaobin0715


     I use a separate script to add Chinese translations to the words, and that script requires UTF-8.

It works now.


PowerShell, exporting the txt files directly from the Word document (here the file is named a.docx):
```bat
<# : batch portion (begins PowerShell multi-line comment block)
@echo off & setlocal
powershell -noprofile -NoLogo "iex (${%~f0} | out-string)"
pause
exit
#>
# PowerShell portion: read the text of a.docx through Word's COM interface
$word = New-Object -ComObject Word.Application
$file = (ls a.docx).FullName
$doc = $word.Documents.Open($file)
$text = $doc.Content.Text
# Group 1 = "ModuleN unitN" (used as the file name); group 2 = text up to the next heading or the end
$pattern = [regex] '(?i)(Module\d+\s+unit\d+)[\r\n]*(.+?)(?=Module\d+\s+unit\d+|$)'
$paragraphs = [regex]::Matches($text, $pattern)
$doc.Close()
$word.Quit()
# [Text.Encoding]::Default is the ANSI code page in Windows PowerShell 5.1;
# [Text.Encoding]::UTF8 writes UTF-8 (with a BOM) instead
$paragraphs.ForEach({ [IO.File]::WriteAllText($_.Groups[1].Value + '.txt', $_.Groups[2].Value, [Text.Encoding]::Default) })
```
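The same regex idea, sketched in Python against text already exported from Word rather than through the COM object. This is an illustration under assumptions: `re.S` is added so that `.+?` can cross `\n` line breaks (Word's `Content.Text` uses `\r` paragraph marks, which `.` already matches in both .NET and Python), and output is written as UTF-8 instead of `[Text.Encoding]::Default`. `regex_split` and `write_utf8` are hypothetical names.

```python
import re
from pathlib import Path

# Same pattern as the PowerShell post, plus re.S so ".+?" can span "\n"
PATTERN = re.compile(
    r"(Module\d+\s+unit\d+)[\r\n]*(.+?)(?=Module\d+\s+unit\d+|$)", re.I | re.S
)


def regex_split(text):
    """Return (name, body) pairs, mirroring Groups[1] and Groups[2] in the post."""
    return [(m.group(1), m.group(2)) for m in PATTERN.finditer(text)]


def write_utf8(pairs, out_dir="."):
    """Write each body to '<name>.txt' encoded as UTF-8 (no BOM)."""
    for name, body in pairs:
        Path(out_dir, name + ".txt").write_text(body, encoding="utf-8")
```

As in the PowerShell version, only `ModuleN unitN` becomes the file name; the rest of the title line stays at the top of the body.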


Last edited by hfxiang on 2023-2-5 12:30

Reply to #10 qd2024

Save the Word document with GB2312 encoding as “最新八年级外研版英语下册课文.txt”. After repeated testing on Windows 10, the following gawk ( http://bcn.bathome.net/tool/4.1.3/gawk.exe ) script does the job (no mojibake):
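```bat
rem RS is a regex, so RT holds the heading that ended each record; F_n is empty before the first heading
gawk -vRS="Module[0-9]+ unit[0-9]+" "F_n{print F_n\"\n\"$0>(F_n\".txt\")}{F_n=RT}" 最新八年级外研版英语下册课文.txt
```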
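How the gawk one-liner works can be mirrored roughly in Python: with a regex `RS`, each record is the text between two headings and `RT` holds the heading that ended the previous record, so `F_n` always names the file for the record that follows it. The sketch below returns the would-be files as a dict instead of writing them; `split_like_gawk` is a hypothetical name, and it assumes each heading occurs only once (gawk would append a repeated heading's text to the same open file, while the dict keeps only the last one).

```python
import re


def split_like_gawk(text, rs=r"Module[0-9]+ unit[0-9]+"):
    """Emulate: gawk -vRS=<rs> "F_n{print F_n\"\n\"$0>(F_n\".txt\")}{F_n=RT}".

    re.split with a capturing group yields [rec1, RT1, rec2, RT2, rec3, ...];
    each RT names the file for the record after it, and rec1 (before any
    heading) is skipped because F_n is still empty at that point.
    """
    parts = re.split("(" + rs + ")", text)
    files = {}
    for heading, record in zip(parts[1::2], parts[2::2]):
        # print adds ORS ("\n"); with a regex RS the record keeps its own newlines
        files[heading + ".txt"] = heading + "\n" + record + "\n"
    return files
```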


Reply to #22 qd2024
What further processing do you need after that?


Reply to #24 aloha20200628
The OP needs the split files to be encoded as UTF-8.


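Let me share my testing process with the OP:
1. The system is Windows 8.1, Simplified Chinese edition.
2. Copy the OP's original text into Notepad and save it as a.txt with ANSI encoding.
3. Save my batch script in Notepad as a.cmd, also with ANSI encoding.
4. a.txt and a.cmd are in the same directory.
5. Run a.cmd, then drag in or type a.txt.
6. Result: 4 *.txt files are created in a.txt's directory, exactly matching the OP's requirements (passages containing !...! in the original text are not lost).
      How does your testing method differ from the above?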
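Since the outcome depends on which encoding each file was actually saved in, a rough way to check a file before splitting is sketched below in Python. It is a heuristic, not a reliable detector: it looks for a byte-order mark, then tries strict UTF-8, and otherwise assumes GBK (the ANSI code page of Simplified Chinese Windows, code page 936). `guess_encoding` is a hypothetical helper.

```python
def guess_encoding(data: bytes) -> str:
    """Heuristically guess whether raw file bytes are UTF-8 or the Chinese ANSI code page.

    Order: byte-order mark, then strict UTF-8 decoding, else fall back to GBK.
    Pure ASCII data is reported as "utf-8", which is also valid ANSI.
    """
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "gbk"
```

Typical use would be `guess_encoding(Path("a.txt").read_bytes())` before deciding how to open the file.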


Save both the source file and the batch file as UTF-8:
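```bat
@echo off & cls
rem Both this script and 1.txt are assumed to be saved as UTF-8
chcp 65001 >nul
rem Pass 1: record the line numbers of the title lines (lines starting with ModuleN unitN)
findstr /n /r /b "Module[0-9]*.unit[0-9]" 1.txt >1.log
for /f "delims=:" %%a in (1.log) do set _%%a=true
del 1.log
rem Pass 2: send every numbered line to the file named after the latest title line;
rem "?" is removed from the title because Windows does not allow it in file names
for /f "tokens=1* delims=:" %%i in ('findstr /n .* 1.txt') do (
    if defined _%%i (
        set "filename=%%j"
        call set "filename=%%filename:?=%%.txt"
    )
    set "str=%%j"
    setlocal enabledelayedexpansion
    echo,!str!>>"!filename!"
    endlocal
)
pause
```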
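The two passes in this script (collect the line numbers of the title lines, then send every line to the file named after the latest title) can be sketched in Python as follows. It keeps the same `findstr`-style pattern, with `re.match` playing the role of `/b`; `two_pass_split` is a hypothetical name, and it returns lists of lines instead of appending to files.

```python
import re

# Same pattern as the findstr call; re.match anchors at the line start like /b
TITLE = re.compile(r"Module[0-9]*.unit[0-9]")


def two_pass_split(lines):
    """Return {file_name: lines}, routing each line to the latest title's file."""
    # Pass 1: 1-based numbers of the title lines (what 1.log holds)
    title_numbers = {n for n, line in enumerate(lines, 1) if TITLE.match(line)}
    # Pass 2: walk all lines, switching the target file at each title line
    files, current = {}, None
    for n, line in enumerate(lines, 1):
        if n in title_numbers:
            current = line + ".txt"
        if current is not None:  # lines before the first title have no target file
            files.setdefault(current, []).append(line)
    return files
```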


Reply to #21 aloha20200628


    The split TXT files go through further processing afterwards. I tried for a long time,
    and only UTF-8 encoding works.

Thanks


1. The OP can first use Notepad to save the original file with ANSI encoding.
2. Save the batch script below with ANSI encoding as well.
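```bat
@set @v=1 /*
@echo off
rem Batch part: ask for the source file, then run this same file as JScript
set "tF=" &set /p "tF=Source file: "
if not defined tF exit /b
rem Strip any quotes added by drag-and-drop, then quote the path once
set "tF=%tF:"=%"
cscript.exe //nologo //e:jscript "%~f0" "%tF%"
exit /b
*/
// JScript part: split the ANSI text file at every line containing "★"
var v = WScript.Arguments;
var fso = new ActiveXObject('Scripting.FileSystemObject');
var fr = fso.OpenTextFile(v(0));   // default mode: read, system ANSI code page
var alllines = fr.ReadAll().split('\r\n'); fr.Close();
var n, nL = alllines.length, outF = '', fw = null, title;
for (n = 0; n < nL; ++n) {
  if (alllines[n].indexOf('★') != -1) {
    if (fw) fw.Close();
    // Title line without the markers; it is also written as the file's first line
    title = alllines[n].replace(/★/g, '');
    // Drop characters that are illegal in Windows file names
    outF = title.replace(/[\\\/:*?"<>|]/g, '') + '.txt';
    fw = fso.OpenTextFile(outF, 2, true);   // 2 = ForWriting, create if missing
    fw.Write(title + '\r\n');
  }
  else if (fw) fw.Write(alllines[n] + '\r\n');   // lines before the first marker are skipped
}
if (fw) fw.Close();
WSH.Quit(0);
```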
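The script above reads and writes in the system ANSI code page, while the OP needs UTF-8 output. One way to bridge the two, sketched in Python under the assumption that the ANSI code page is GBK (cp936), is to re-encode each generated file afterwards. `ansi_to_utf8` and `convert_all` are hypothetical helpers; whether the later processing step wants a BOM is not stated in the thread, so that is left as a parameter.

```python
from pathlib import Path


def ansi_to_utf8(path, src_encoding="gbk", bom=False):
    """Re-encode a text file in place from an ANSI code page to UTF-8.

    src_encoding="gbk" assumes Simplified Chinese Windows (code page 936).
    bom=True writes a UTF-8 byte-order mark ("utf-8-sig").
    Line endings are preserved because the bytes are converted directly.
    """
    p = Path(path)
    text = p.read_bytes().decode(src_encoding)
    p.write_bytes(text.encode("utf-8-sig" if bom else "utf-8"))


def convert_all(folder=".", pattern="*.txt", **kwargs):
    """Apply ansi_to_utf8 to every matching file in a folder."""
    for p in sorted(Path(folder).glob(pattern)):
        ansi_to_utf8(p, **kwargs)
```

Each file should be converted only once: running it again would decode UTF-8 bytes as GBK, which produces exactly the kind of mojibake reported later in this thread.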


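Reply to #13 qixiaobin0715


    The split TXT files need further processing afterwards. After trying for a long time, I found that only UTF-8 avoids mojibake. Thanks, could you take a look for me?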


Reply to #13 qixiaobin0715
The split files contain mojibake in scattered places. Can this be fixed?

    Module4 unit1
Doctor: How can I help you?
Daming: I fell ill. I鈥檝e got a stomach ache and my head hurts.
Doctor: How long have you been like this?
Daming: Since Friday. I've been ill for about three days!
Doctor: I see. Have you caught a cold?
Daming: I don't think so.
Doctor: Let me take your temperature鈥mm, there's no fever. What kind of food do you eat?
Daming: Usually fast food.
Doctor: Do you have breakfast?
Daming: No, not usually.
Doctor: 鈥淭hat's the problem! Fast food and no breakfast.鈥?That's why you've got a stomach ache.
Daming: What about the headache?
Doctor: Do you do any exercise?
Daming: Not really. I haven't done much exercise since I got my computer last year.
Doctor: 鈥淵ou spend too much time in front of the computer.鈥?It can be very harmful to your health.
Daming: OK, so what should I do?
Doctor: Well, don't worry. It's not serious. First, stop eating fast food and have breakfast every day. Second, get some exercise, such as running. And I'll give you some medicine. Take it three times a day.
Daming: Thank you, doctor.

====

Module3 unit2
鈥淪cientists think that there has been life on the earth for hundreds of millions of years.鈥?However, we have not found life on any other planets yet.
The earth is a planet and it goes around the sun. Seven other planets also go around the sun. None of them has an environment like that of the earth, so scientists do not think they will find life on them. The sun and its planets are called the solar system, and our solar system is a small part of a much larger group of stars and planets, called the Galaxy or the Milky Way. There are billions of stars in the Galaxy, and our sun is only one of them.
Scientists have also discovered many other galaxies in the universe. They are very far away and their light has to travel for many years to reach us. So how large is the universe? It is impossible to imagine.
Scientists have sent spaceships to the planet Mars to take photos. They have even sent spaceships to travel outside the solar system. However, no spaceship has travelled far enough to reach other stars in our Galaxy.
Scientists have always asked the questions: with so many stars in the universe, are we alone, or is there life out there in space? Have there been visitors to the earth from other planets? Why has no one communicated with us? We do not know the answers... yet.



=====

Module3 unit1
Daming: Hi, Tony. What are you up to?
Tony: Hi Daming. I've just made a model spaceship for our school project.
Daming: I haven't started yet because I'm not sure how to make it. Can you help me?
Tony: Sure, no problem. Have you heard the latest news? Scientists have sent a spaceship to Mars. The journey has taken several months.
Daming: Has it arrived yet?
Tony: Yes, it has arrived already. That's why it's on the news.
Daming: So have they discovered life on the Mars?
Tony: No, they haven't yet.
Daming: Are there any astronauts in the spaceship?
Tony: No, there aren't.
Daming: 鈥淲hy not? Astronauts have already been to the moon.鈥?
Tony: Yes, but no one has been to Mars yet, because Mars is very far away, much farther than the moon. Lots of scientists are working hard in order to send astronauts to Mars one day.
Daming: That's interesting! How can I get information on space travel?
Tony: You can go online to search for information.
Daming: I will. Thank you, Tony!
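The garbage above follows the typical pattern of UTF-8 bytes being decoded as GBK: the curly apostrophe `’` is `E2 80 99` in UTF-8; read as GBK, `E2 80` becomes `鈥` and `99` pairs with the next byte, which is how `I’ve` turns into `I鈥檝e`. The Python sketch below reproduces the effect and reverses it when the round trip is lossless; `garble` and `fix_mojibake` are hypothetical helpers. Where a closing `”` (`E2 80 9D`) was followed by a space, the stray byte apparently could not be decoded and was replaced by `?` (as in `鈥?`), so those characters cannot be recovered this way; reading the file with the encoding it was actually saved in is probably the more reliable fix.

```python
def garble(text: str) -> str:
    """Simulate the bug: UTF-8 bytes read back as GBK."""
    return text.encode("utf-8").decode("gbk")


def fix_mojibake(garbled: str) -> str:
    """Undo the mis-decoding: re-encode as GBK, then decode as UTF-8.

    Raises UnicodeError if the damage was lossy (e.g. a byte already replaced by '?').
    """
    return garbled.encode("gbk").decode("utf-8")
```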


Reply to #14 hfxiang


    Thanks

