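Title: [Text processing] How can a batch script extract all lines for several specified fields from a txt file?
Author: WILSONMAO Time: 2021-9-26 12:54 Title: How can a batch script extract all lines for several specified fields from a txt file?
The text content is shown at the end of this post. What is shown is one record; the file repeats about 300,000 such records.
Each line begins with a field tag. I need to extract the full contents of the five fields CT, CY, CL, WC and C1 from every record, and the five values must stay aligned per record, because occasionally a field may be empty.
The desired result looks like this: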
CT CY CL WC C1
1** ** ** ** **
2** ** ** ** **
3** ** ** ** **
.....
FN Clarivate Analytics Web of Science
VR 1.0
PT C
AU Si, D
Cheng, SC
Xing, RW
Liu, C
Wu, OY
AF Si, Dong
Cheng, Sunny Chieh
Xing, Ruiwen
Liu, Chang
Wu, Hoi Yan
GP IEEE
TI Scaling up Prediction of Psychosis by Natural Language Processing
SO 2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL
INTELLIGENCE (ICTAI 2019)
SE Proceedings-International Conference on Tools With Artificial
Intelligence
LA English
DT Proceedings Paper
CT 31st IEEE International Conference on Tools with Artificial Intelligence
(ICTAI)
CY NOV 04-06, 2019
CL Portland, OR
SP IEEE, IEEE Comp Soc
DE Machine learning; Natural language processing; Text classification;
Prediction of psychosis; Schizophrenia; Word embeddings; Convolutional
neural networks
ID HIGH-RISK; SCHIZOPHRENIA; PREVALENCE
AB Mental health professionals currently diagnose and treat mental disorders, such as schizophrenia, mainly by analyzing the language and speech of their patients, a method that maybe improved with the usage of artificial intelligence. This study aims to use machine learning to distinguish between the speech of patients who suffer from mental disorders which cause psychosis from that of healthy individuals to improve early detection of schizophrenia. We analyzed forty interview transcripts from patients who have been diagnosed with first episode psychosis. Word embeddings and convolutional neural network were utilized for the classification of patients from healthy individuals. The preliminary test results achieved a prediction rate of 99%, which indicated that our speech classifier was able to discriminate speech in patients from healthy individuals' daily conversations. This suggested that machine learning models can learn and train upon features of natural languages to predict whether or not an individual is beginning to show the first signs of early psychosis based on their speech. This line of inquiry will contribute to the improved identification of individuals at risk for psychiatric symptoms and lead to the development of targeted therapies.
C1 [Si, Dong; Xing, Ruiwen; Liu, Chang; Wu, Hoi Yan] Univ Washington, Comp & Software Syst, Bothell, WA 98011 USA.
[Cheng, Sunny Chieh] Univ Washington, Nursing & Healthcare Leadership, Tacoma, WA USA.
RP Si, D (corresponding author), Univ Washington, Comp & Software Syst, Bothell, WA 98011 USA.
EM dongsi@uw.edu; ccsunny@uw.edu; ruiwen@uw.edu; chang15@uw.edu;
hoiyanwu@uw.edu
FU Graduate Research Award of Computing and Software Systems division;
University of Washington BothellUniversity of Washington [74-0525];
NVIDIA Corporation (Santa Clara, CA, USA)
FX This research was funded by the Graduate Research Award of Computing and
Software Systems division and the startup fund 74-0525 of the University
of Washington Bothell.; We gratefully acknowledge the support of NVIDIA
Corporation (Santa Clara, CA, USA) with the donation of the GPU used for
this research.
NR 31
TC 1
Z9 1
U1 0
U2 0
PU IEEE COMPUTER SOC
PI LOS ALAMITOS
PA 10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720-1264 USA
SN 1082-3409
BN 978-1-7281-3798-8
J9 PROC INT C TOOLS ART
PY 2019
BP 339
EP 347
DI 10.1109/ICTAI.2019.00055
PG 9
WC Computer Science, Artificial Intelligence; Computer Science, Theory &
Methods
SC Computer Science
GA BP4NY
UT WOS:000553441500046
DA 2021-09-15
ER
Author: qixiaobin0715 Time: 2021-9-26 16:45
Are WC and C1 written in the wrong order?
Are the fields separated by spaces?
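Author: WILSONMAO Time: 2021-9-26 17:07
Reply to 2# qixiaobin0715
The order can be changed freely. There do not seem to be spaces between the fields; each one starts on a new line.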
Author: qixiaobin0715 Time: 2021-9-26 17:10
I meant between CT CY CL WC C1 on the same line.
Author: qixiaobin0715 Time: 2021-9-26 17:21
In the source file, CT CY CL C1 WC always appear in that fixed order, right?
Author: WILSONMAO Time: 2021-9-26 17:42
Reply to 5# qixiaobin0715
Yes
Author: WILSONMAO Time: 2021-9-26 17:43
Reply to 4# qixiaobin0715
Yes
Author: qixiaobin0715 Time: 2021-9-26 17:45
Last edited by qixiaobin0715 on 2021-9-26 17:48
What do you mean by "occasionally a field may be empty"?
Can CT be empty as well?
Author: WILSONMAO Time: 2021-9-26 18:10
Link: https://pan.baidu.com/s/1QX4H6uUy41_ezGPwQuVszw
Extraction code: 1x4z
Attachment link
Author: WILSONMAO Time: 2021-9-26 18:12
Reply to 8# qixiaobin0715
Link: https://pan.baidu.com/s/1QX4H6uUy41_ezGPwQuVszw
Extraction code: 1x4z
See the attachment for details. Many thanks.
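Author: idwma Time: 2021-9-26 20:47
Last edited by idwma on 2021-9-26 22:22

@echo off
setlocal enabledelayedexpansion
rem Tags still to be found in the current record.
set "str=CT CY CL C1 WC"
(for /f "delims=" %%a in (111.txt) do (
    set "strr=%%a"
    rem f is set while the previous line belonged to one of the wanted fields.
    if defined f (
        set ccc=
        rem A line that does not start with two spaces begins a new field.
        if not "!strr:~0,2!"=="  " (
            set f=
            set ccc=1
        )
        rem Otherwise it is a continuation line, so append it to the current field.
        if not defined ccc (
            call set "!ff!=%%!ff!%%!strr:~3!"
        )
    )

    rem Start collecting a field when the line begins with a wanted tag.
    for %%c in (!str!) do (
        if "!strr:~0,2!"=="%%c" (
            set str=!str:%%c=!
            set "ff=%%c"
            set "!ff!=!strr:~3! "
            set f=1
        )
    )

    rem A row is written only after all five tags have been seen,
    rem so this assumes that no record lacks any of them.
    if defined CT if defined CY if defined CL if defined WC if defined C1 (
        if not defined f (
            set /a n+=1
            echo;!n!##!CT!##!CY!##!CL!##!C1!##!WC!
            for %%c in (CT CY CL C1 WC) do set %%c=
            set "str=CT CY CL C1 WC"
        )
    )
))>222.txt
pause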
Author: qixiaobin0715 Time: 2021-9-27 11:22
Last edited by qixiaobin0715 on 2021-9-27 13:58
Reply to 1# WILSONMAO
The file is large, so please be patient:

@echo off &@cls&chcp>nul 65001
set var=CT CY CL WC C1
setlocal enabledelayedexpansion
(echo,CT,CY,CL,WC,C1
rem findstr keeps only the lines that begin with one of the five tags. 2019 is the input file.
for /f "tokens=1*" %%a in ('findstr /br "%var%" 2019') do (
    rem A new CT line starts the next record, so write out the previous record first.
    if "%%a"=="CT" if defined _CT (
        echo,"!_CT!","!_CY!","!_CL!","!_WC!","!_C1!"
        for %%i in (%var%) do set "_%%i="
    )
    set "_%%a=%%b"
)
rem Write out the last record.
echo,"!_CT!","!_CY!","!_CL!","!_WC!","!_C1!"
)>test.csv
pause
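Author: qixiaobin0715 Time: 2021-9-27 11:50
Last edited by qixiaobin0715 on 2021-9-27 12:18
Reply to 1# WILSONMAO
This version is more accurate and considerably faster:

@echo off &@cls&chcp>nul 65001
set var=CT CY CL WC C1
rem Filter the wanted lines into a temporary file first. 2019 is the input file.
findstr /br "%var%" 2019>a.txt
setlocal enabledelayedexpansion
(echo,CT,CY,CL,WC,C1
for /f "tokens=1*" %%a in (a.txt) do (
    rem WC is the last of the five tags in each record, so a WC line completes a row.
    rem This relies on every record having a WC line.
    if "%%a"=="WC" (
        echo,"!_CT!","!_CY!","!_CL!","%%b","!_C1!"
        for %%i in (%var%) do set "_%%i="
    )
    set "_%%a=%%b"
))>test.csv
del a.txt
pause

The csv file can be opened in Excel.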
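The two findstr-based scripts above keep only the first line of each field, so wrapped values such as the second line of CT or WC are dropped. For comparison, here is a minimal sketch of the same extraction in Python rather than batch. It is not anyone's posted solution. It assumes that continuation lines are indented (as in the standard Web of Science plain-text export), that every record ends with an ER line, and that the file is UTF-8. The names parse_records, to_rows and convert are made up for illustration.

```python
import csv

# The five tags requested in the thread, in the requested output order.
FIELDS = ["CT", "CY", "CL", "WC", "C1"]


def parse_records(lines):
    """Yield one dict per record, mapping each two-character tag to its full value.

    Assumption: continuation lines start with whitespace and each record
    ends with an "ER" line.
    """
    record, tag = {}, None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank separator between records
        if line[0].isspace():
            # Continuation line: append it to the field seen most recently.
            if tag is not None:
                record[tag] += " " + line.strip()
            continue
        tag, value = line[:2], line[3:].strip()
        if tag == "ER":
            yield record  # end of record: hand it over and start afresh
            record, tag = {}, None
        else:
            record[tag] = value


def to_rows(records):
    """Yield a header row, then one numbered row per record ("" for a missing field)."""
    yield ["No"] + FIELDS
    for n, rec in enumerate(records, start=1):
        yield [n] + [rec.get(name, "") for name in FIELDS]


def convert(src_path, dst_path):
    """Read a tagged export and write the five fields as CSV (hypothetical helper)."""
    # utf-8-sig drops a leading byte order mark on input and writes one on
    # output, which helps Excel detect UTF-8.
    with open(src_path, encoding="utf-8-sig") as src, \
         open(dst_path, "w", newline="", encoding="utf-8-sig") as dst:
        csv.writer(dst).writerows(to_rows(parse_records(src)))
```

With the file names used in the scripts above, calling convert("2019", "test.csv") would write one numbered row per record. A field that is missing from a record becomes an empty cell, so the five columns stay aligned, and the csv module quotes values that contain commas or quotation marks.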
Author: WILSONMAO Time: 2021-9-30 10:01
Reply to 13# qixiaobin0715
Many thanks
Welcome to 批处理之家 (http://bbs.bathome.net/)