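The contents of drugbank.txt look roughly like this
(the attachment is too large, so I have only uploaded the full record for one drug; the complete file is on Baidu Netdisk at the link below). Everything I want to extract is shown in bold red: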
http://pan.baidu.com/s/1numNLLf
#BEGIN_DRUGCARD DB00001
# AHFS_Codes:
20:12.04.12
# ATC_Codes:
B01AE02
# Absorption:
Bioavailability is 100% following injection.
# Drug_Interactions:
Ginkgo biloba Additive anticoagulant/antiplatelet effects may increase bleed risk. Concomitant therapy should be avoided.
Treprostinil The prostacyclin analogue, Treprostinil, increases the risk of bleeding when combined with the anticoagulant, Lepirudin. Monitor for increased bleeding during concomitant thearpy.
# Indication:
For the treatment of heparin-induced thrombocytopenia
# KEGG_Drug_ID:
D06880
# Pathways:
Lepirudin Pathway SMP00278
#END_DRUGCARD DB00001
After processing, I would like out.txt to look like this:
BEGIN_DRUGCARD ATC_Codes Drug_Interactions Indication KEGG_Drug_ID Pathways
DB00001 B01AE02 Ginkgo biloba,Treprostinil heparin-induced thrombocytopenia D06880 SMP00278
DB00002 ... .... ... ... ...
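Below is my program, which produces no result. I'm hoping an expert can provide a working program (no need to fix mine); anything that gives me the result I want is fine. Many thanks!!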
# 2>nul&@Gawk -f %0 drugbank.txt&Exit
BEGIN{printf("ENTRY ATC code Indication Drug_Interactions PATHWAY\n")>>"$Data.txt";A[2]=I[2]=D[2]=P[2]="~"}
END{printf("\nDrugs with an ATC code: %d\nDrugs with a Drug group: %d\nDrugs with a Therapeutic category: %d\nDrugs with a PATHWAY: %d\n",_A,_I,_D,_P)>>"$Data.txt"}
$1~"///"{
A[2]!="~"?_A++:0;I[2]!="~"?_I++:0;D[2]!="~"?_D++:0;P[2]!="~"?_P++:0
printf("%-16s %-15s %-16s %-31s %s\n",E,A[2],I[2],D[2],P[2])>>"$Data.txt"
A[2]=I[2]=D[2]=P[2]="~"
}
$1~"ENTRY"{split($0,B,"BEGIN_DRUGCARD ");gsub(" ",E[2])}
$0~"ATC code"{split($0,A,"ATC_Codes: ");gsub(" ",",",A[2])}
$0~"Indication:"{split($0,I,"Indication: ");gsub(" ",",",I[2])}
$0~"Drug_Interactions:"&&$0!~"of"{split($0,D,"Drug_Interactions: ");gsub(" ",",",D[2])}
$0~"PATHWAY"{split($0,P,"Pathways: ");gsub(" ",",",P[2])}
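For comparison, here is a minimal Python sketch of the extraction described above. It is not a repair of the gawk script, just one possible approach, and it rests on several assumptions that are not confirmed by the sample: that each `# Field:` header is followed by its values on separate lines, that the drug name and the description in Drug_Interactions (and the pathway name and ID in Pathways) are separated by a tab in the real file, that the Indication should lose a leading "For the treatment of", and that out.txt may be tab-separated with "~" for missing fields. The names `parse_cards`, `summarize`, `convert` and `write_out` are made up for this example.

```python
import re

# Columns requested in the post, after the drug ID column.
FIELDS = ["ATC_Codes", "Drug_Interactions", "Indication", "KEGG_Drug_ID", "Pathways"]


def parse_cards(lines):
    """Yield (drug_id, {field: [value lines]}) for each drug card."""
    drug_id, fields, current = None, {}, None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#BEGIN_DRUGCARD"):
            parts = line.split()
            drug_id = parts[1] if len(parts) > 1 else ""
            fields, current = {}, None
        elif line.startswith("#END_DRUGCARD"):
            if drug_id is not None:
                yield drug_id, fields
            drug_id, current = None, None
        elif line.startswith("# ") and line.rstrip().endswith(":"):
            # Field header such as "# ATC_Codes:"
            current = line[2:].rstrip()[:-1]
            fields[current] = []
        elif drug_id is not None and current is not None and line.strip():
            fields[current].append(line.strip())


def summarize(field, values):
    """Reduce the value lines of one field to a single output cell."""
    if field == "Drug_Interactions":
        # Assumption: "name<TAB>description"; keep only the name.
        values = [v.split("\t")[0].strip() for v in values]
    elif field == "Pathways":
        # Keep only the trailing pathway ID, e.g. SMP00278.
        values = [v.split()[-1] for v in values]
    elif field == "Indication":
        # Assumption: drop the leading phrase, as in the desired output.
        values = [re.sub(r"^For the treatment of\s+", "", v) for v in values]
    sep = " " if field == "Indication" else ","
    return sep.join(values) or "~"  # "~" marks a missing field


def convert(lines):
    """Return the output rows (header first) as tab-separated strings."""
    rows = ["\t".join(["BEGIN_DRUGCARD"] + FIELDS)]
    for drug_id, fields in parse_cards(lines):
        cells = [summarize(f, fields.get(f, [])) for f in FIELDS]
        rows.append("\t".join([drug_id] + cells))
    return rows


def write_out(src="drugbank.txt", dst="out.txt"):
    """Convert the whole file; not called automatically."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        fout.write("\n".join(convert(fin)) + "\n")


# Small demo on an abbreviated copy of the DB00001 card (tabs written as \t).
SAMPLE = """#BEGIN_DRUGCARD DB00001
# ATC_Codes:
B01AE02
# Drug_Interactions:
Ginkgo biloba\tAdditive anticoagulant/antiplatelet effects may increase bleed risk.
Treprostinil\tIncreases the risk of bleeding when combined with Lepirudin.
# Indication:
For the treatment of heparin-induced thrombocytopenia
# KEGG_Drug_ID:
D06880
# Pathways:
Lepirudin Pathway\tSMP00278
#END_DRUGCARD DB00001
"""

for row in convert(SAMPLE.splitlines()):
    print(row)
```

To process the real file, call `write_out()` from the directory that contains drugbank.txt. If the real file turns out not to use tabs inside Drug_Interactions lines, the name cannot be split off this way and `summarize` would need a different rule.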