Cautions:
The version is 1.0.0, which means it has not been touched since it was first made.
(grep is a very common, often-used Unix command, so naming your program that ensures no one will ever be able to find the code or anything related.)
The installation step below won't be necessary if you are running from the link provided, as the requirements will already be installed.
If you are running this elsewhere and get errors saying that scipy, argparse, numpy, or pandas cannot be imported, run this code block in a notebook cell:
%pip install scipy
%pip install argparse
%pip install numpy
%pip install pandas
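Before installing anything, it can help to see which of the four packages are actually missing. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and is not part of grep.py.

```python
import importlib.util

# The four packages named in the import errors above.
REQUIRED = ["scipy", "argparse", "numpy", "pandas"]

def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Only the packages reported as missing need a `%pip install` line. Note that argparse ships with the Python standard library in current Python versions, so it will normally not show up as missing.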
#show first few lines of example input
!head example/RA_trans.genes
PANK4
HES5
LOC115110
TNFRSF14
C1orf93
MMEL1
MFAP2
ATP13A2
SDHB
PADI2
#show directory contents before run
!ls
data LICENSE.md 'Demonstrate Genome for REPositioning drugs.ipynb' README.md example requirements.txt grep.py update_database
%run grep.py --genelist example/RA_trans.genes --out example --test ATC
#show directory contents to see what was made
!ls
data grep.py 'Demonstrate Genome for REPositioning drugs.ipynb' LICENSE.md example README.md example.ATC.detail.txt requirements.txt example.ATC.large.txt update_database
# show first 30 lines of result file
!head -30 example.ATC.detail.txt
#Group	GroupName	OddsRatio	FisherExactP
A01	STOMATOLOGICAL PREPARATIONS	0.0	1.0
A02	DRUGS FOR ACID RELATED DISORDERS	0.0	1.0
A03	DRUGS FOR FUNCTIONAL GASTROINTESTINAL DISORDERS	1.696969696969697	0.45866618350054966
A04	ANTIEMETICS AND ANTINAUSEANTS	0.0	1.0
A05	BILE AND LIVER THERAPY	0.0	1.0
A06	DRUGS FOR CONSTIPATION	0.0	1.0
A07	ANTIDIARRHEALS, INTESTINAL ANTIINFLAMMATORY/ANTIINFECTIVE AGENTS	0.0	1.0
A08	ANTIOBESITY PREPARATIONS, EXCL. DIET PRODUCTS	0.0	1.0
A09	DIGESTIVES, INCL. ENZYMES	0.0	1.0
A10	DRUGS USED IN DIABETES	0.0	1.0
A11	VITAMINS	0.0	1.0
A12	MINERAL SUPPLEMENTS	0.0	1.0
A14	ANABOLIC AGENTS FOR SYSTEMIC USE	0.0	1.0
A16	OTHER ALIMENTARY TRACT AND METABOLISM PRODUCTS	0.0	1.0
B01	ANTITHROMBOTIC AGENTS	1.648989898989899	0.46790118087197186
B02	ANTIHEMORRHAGICS	0.0	1.0
B03	ANTIANEMIC PREPARATIONS	0.0	1.0
B05	BLOOD SUBSTITUTES AND PERFUSION SOLUTIONS	0.0	1.0
B06	OTHER HEMATOLOGICAL AGENTS	0.0	1.0
C01	CARDIAC THERAPY	1.6224662162162162	0.36633761423573746
C02	ANTIHYPERTENSIVES	0.0	1.0
C03	DIURETICS	2.7176308539944904	0.3234969317058849
C04	PERIPHERAL VASODILATORS	3.5258467023172906	0.2632265400754351
C05	VASOPROTECTIVES	0.0	1.0
C07	BETA BLOCKING AGENTS	0.0	1.0
C08	CALCIUM CHANNEL BLOCKERS	2.7176308539944904	0.3234969317058849
C09	AGENTS ACTING ON THE RENIN-ANGIOTENSIN SYSTEM	0.0	1.0
C10	LIPID MODIFYING AGENTS	0.0	1.0
D01	ANTIFUNGALS FOR DERMATOLOGICAL USE	0.0	1.0
# show first 30 lines of other result file
!head -30 example.ATC.large.txt
#Group	GroupName	OddsRatio	FisherExactP
A	ALIMENTARY TRACT AND METABOLISM	0.4175084175084175	0.9073586755836576
B	BLOOD AND BLOOD FORMING ORGANS	0.9292929292929293	0.6667409120705736
C	CARDIOVASCULAR SYSTEM	1.1098790322580645	0.52288230144114
D	DERMATOLOGICALS	0.6980649872216137	0.7653438054311404
G	GENITO URINARY SYSTEM AND SEX HORMONES	0.0	1.0
H	SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS	0.0	1.0
J	ANTIINFECTIVES FOR SYSTEMIC USE	0.0	1.0
L	ANTINEOPLASTIC AND IMMUNOMODULATING AGENTS	3.7395833333333335	0.0016006128913752943
M	MUSCULO-SKELETAL SYSTEM	1.4580792682926829	0.41483225119822686
N	NERVOUS SYSTEM	0.3700581978727674	0.9307597129365626
P	ANTIPARASITIC PRODUCTS, INSECTICIDES AND REPELLENTS	0.0	1.0
R	RESPIRATORY SYSTEM	2.3784119106699753	0.1517209311834251
S	SENSORY ORGANS	0.7069475240206947	0.761171048831641
V	VARIOUS	0.0	1.0
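The result files are easier to scan when sorted by p-value. Below is a minimal sketch, using only the standard library, that parses rows like the ones shown and ranks them by the FisherExactP column. It assumes the files are tab-delimited (the group names contain spaces, so this seems likely, but it is not confirmed here); the helper names `load_results` and `rank_by_p` are hypothetical, and the sample rows are copied from the output above.

```python
import csv
import io

def load_results(text):
    """Parse result text (assumed tab-delimited) into dicts with float scores."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        rows.append({
            "group": row["#Group"],
            "name": row["GroupName"],
            "odds_ratio": float(row["OddsRatio"]),
            "p_value": float(row["FisherExactP"]),
        })
    return rows

def rank_by_p(rows):
    """Return the rows ordered from smallest to largest p-value."""
    return sorted(rows, key=lambda r: r["p_value"])

# Three rows taken from the example.ATC.large.txt output shown above.
sample = (
    "#Group\tGroupName\tOddsRatio\tFisherExactP\n"
    "C\tCARDIOVASCULAR SYSTEM\t1.1098790322580645\t0.52288230144114\n"
    "L\tANTINEOPLASTIC AND IMMUNOMODULATING AGENTS\t3.7395833333333335\t0.0016006128913752943\n"
    "R\tRESPIRATORY SYSTEM\t2.3784119106699753\t0.1517209311834251\n"
)

ranked = rank_by_p(load_results(sample))
for r in ranked:
    print(f'{r["group"]}\t{r["p_value"]:.4g}\t{r["name"]}')
```

To try it on a real file, pass the file contents instead of `sample`, for example `load_results(open("example.ATC.large.txt").read())`. In the sample, group L comes out on top because it has the smallest p-value.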
Make a list of genes with each gene on a separate line, like so:
ABCC10
ABI2
ATXN2
BGLAP
BLM
BMPR2
BNIP1
BRAP
BZRAP1
BZRAP1-AS1
CDKN1A
COL4A2
CORO1C
CREBRF
CRIP3
CTLA4
CTTNBP2NL
CUX2
CYLC2
F11
FAM109A
FAM216A
FAM222A
FAM222A-AS1
FARP1
FES
FGA
FGB
FGG
FICD
FOXC1
FOXF2
FOXN4
FOXQ1
FURIN
GIT2
GLTP
GMDS
GMNN
HTRA1
HVCN1
ICA1L
IFT81
ILF3
ILF3-AS1
IQCD
ISCU
KCNK3
KCTD10
KDM4A
KLKB1
LDLR
LPA
LRCH1
MAN2A2
MAPKAPK5
MAPKAPK5-AS1
MIR142
MIR22
MIR22HG
MIR638
MMAB
MMP12
MMP13
MMP3
MPO
MTMR4
MVK
MYL2
MYO1H
NAA25
NBEAL1
NEDD4
NKX2-5
NOP58
OAS1
OAS2
OAS3
OBFC1
PAQR6
PDE3A
PITX2
PLEKHA1
PLG
PLRG1
PMF1
PMF1-BGLAP
PPM1E
PPP1CC
PPTC7
PRPF8
PRRX1
PTPN11
PTPRF
QTRT1
RAB44
RAD51C
RAD9B
RAPH1
RASAL1
RGS7
RILP
RNF43
RPH3A
RPL6
SART3
SBF2
SBF2-AS1
SCARF1
SELPLG
SEPT4
SH2B3
SH3PXD2A
SLC22A7
SMARCA4
SMG5
SNORD11
SNORD11B
SNORD70
SSH1
ST7L
SUMO1
SUPT4H1
TMEM119
TPCN1
TRAFD1
TRIM37
TRPV4
TSPAN2
TTBK1
UBE3B
UNG
USP30
ZNF474
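If the list was pasted from a spreadsheet or another document, it may carry stray spaces, blank lines, or repeated entries. The following is a small precautionary sketch that tidies such text into one symbol per line. Whether grep.py itself tolerates blanks or duplicates is not stated here, so treat this as optional; the helper name `clean_gene_list` is hypothetical.

```python
def clean_gene_list(text):
    """Return gene symbols from text: stripped, no blank lines, no repeats, order kept."""
    seen = set()
    genes = []
    for line in text.splitlines():
        symbol = line.strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            genes.append(symbol)
    return genes

# Messy input: trailing space, blank line, a repeated symbol, Windows line ending.
raw = "ABCC10\nABI2 \n\nATXN2\nABI2\r\n"
cleaned = clean_gene_list(raw)
print("\n".join(cleaned))
```

The cleaned symbols can then be joined with newlines and saved as the gene list file.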
Normally, you'd make such a file with a text editor; however, here we'll use Python. A text editor is critical; use Sublime Text, Atom, or VS Code. Apart from VS Code, nothing else made by Microsoft will do this. This is a common tripping point, as novices often aren't aware that Microsoft Word and WordPad on their computer aren't suitable for this.
s='''ABCC10
ABI2
ATXN2
BGLAP
BLM
BMPR2
BNIP1
BRAP
BZRAP1
BZRAP1-AS1
CDKN1A
COL4A2
CORO1C
CREBRF
CRIP3
CTLA4
CTTNBP2NL
CUX2
CYLC2
F11
FAM109A
FAM216A
FAM222A
FAM222A-AS1
FARP1
FES
FGA
FGB
FGG
FICD
FOXC1
FOXF2
FOXN4
FOXQ1
FURIN
GIT2
GLTP
GMDS
GMNN
HTRA1
HVCN1
ICA1L
IFT81
ILF3
ILF3-AS1
IQCD
ISCU
KCNK3
KCTD10
KDM4A
KLKB1
LDLR
LPA
LRCH1
MAN2A2
MAPKAPK5
MAPKAPK5-AS1
MIR142
MIR22
MIR22HG
MIR638
MMAB
MMP12
MMP13
MMP3
MPO
MTMR4
MVK
MYL2
MYO1H
NAA25
NBEAL1
NEDD4
NKX2-5
NOP58
OAS1
OAS2
OAS3
OBFC1
PAQR6
PDE3A
PITX2
PLEKHA1
PLG
PLRG1
PMF1
PMF1-BGLAP
PPM1E
PPP1CC
PPTC7
PRPF8
PRRX1
PTPN11
PTPRF
QTRT1
RAB44
RAD51C
RAD9B
RAPH1
RASAL1
RGS7
RILP
RNF43
RPH3A
RPL6
SART3
SBF2
SBF2-AS1
SCARF1
SELPLG
SEPT4
SH2B3
SH3PXD2A
SLC22A7
SMARCA4
SMG5
SNORD11
SNORD11B
SNORD70
SSH1
ST7L
SUMO1
SUPT4H1
TMEM119
TPCN1
TRAFD1
TRIM37
TRPV4
TSPAN2
TTBK1
UBE3B
UNG
USP30
ZNF474
'''
%store s >mygenes.txt
Writing 's' (str) to file 'mygenes.txt'.
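The `%store` magic only works inside IPython or Jupyter. As an alternative, here is a minimal sketch that writes a gene list with plain Python, which also works in an ordinary script. The helper name `write_gene_list` and the file name `demo_genes.txt` are hypothetical, chosen so the example does not overwrite mygenes.txt.

```python
from pathlib import Path

def write_gene_list(genes, path):
    """Write one gene symbol per line and return the number of lines written."""
    genes = list(genes)
    Path(path).write_text("\n".join(genes) + "\n", encoding="utf-8")
    return len(genes)

# A short illustrative list; any iterable of symbols works.
n = write_gene_list(["SUMO1", "SUPT4H1", "TMEM119"], "demo_genes.txt")
print(n, "genes written")
```

The resulting file can be passed to `--genelist` in the same way as mygenes.txt.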
#run with my genes as input
%run grep.py --genelist mygenes.txt --out amazing_results --test ATC
You can also make the list of gene names using the bash shell:
!echo SUMO1 > another_genelist.txt
!echo SUPT4H1 >> another_genelist.txt
!echo TMEM119 >> another_genelist.txt
!echo TPCN1 >> another_genelist.txt
!echo TRAFD1 >> another_genelist.txt
!echo TRIM37 >> another_genelist.txt
!echo TRPV4 >> another_genelist.txt
!echo TSPAN2 >> another_genelist.txt
!echo TTBK1 >> another_genelist.txt
!echo UBE3B >> another_genelist.txt
Or, another way:
#to set up demonstrating another way, first delete the last example
!rm another_genelist.txt
%%bash
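# gene names to write, one per line
myArray=("SUMO1" "SUPT4H1" "TMEM119" "TPCN1" "TRAFD1" "TRIM37" "TRPV4" "TSPAN2" "TTBK1" "UBE3B")
# quote the expansions so each element is handled as a single word
for str in "${myArray[@]}"; do
  echo "$str" >> another_genelist.txt
done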
#run with another set of genes as input
%run grep.py --genelist another_genelist.txt --out another_genelist --test ATC
Enjoy!