@Hip
This is a description of how the software works (there is actually more than one program). Take a deep breath!
The first is a Python library that connects to PubMed, submits queries, and downloads all the matching abstracts to a csv file.
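To give an idea of what such a download step involves, here is a minimal sketch, not the actual library. It assumes the public NCBI E-utilities endpoints (ESearch to get PubMed IDs, EFetch to get the records); the function names and the two-column csv layout (pmid, abstract) are my own assumptions for illustration.

```python
import csv
import json
import urllib.parse
import urllib.request

# Base address of the NCBI E-utilities web service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=100):
    """Build an ESearch URL that returns PubMed IDs for a query as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS + "/esearch.fcgi?" + urllib.parse.urlencode(params)


def efetch_url(pmids):
    """Build an EFetch URL that returns the records for the given IDs as XML."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return EUTILS + "/efetch.fcgi?" + urllib.parse.urlencode(params)


def search_pmids(term, retmax=100):
    """Run the search (network call) and return the list of PubMed IDs."""
    with urllib.request.urlopen(esearch_url(term, retmax)) as resp:
        return json.load(resp)["esearchresult"]["idlist"]


def save_abstracts(rows, path):
    """Write (pmid, abstract) pairs to a csv file with a header row.

    The column names are hypothetical; the real files may be laid out differently.
    """
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["pmid", "abstract"])
        writer.writerows(rows)
```

Calling search_pmids("tetrahydrobiopterin") would return the IDs for that query. Extracting the abstract text from the XML that EFetch returns is left out of the sketch.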
The second program matches a list of topics of interest against the csv files that have been downloaded from PubMed.
At the time of writing, these topics are:
research_subj=["hmgb1"," p53 "," stat1 "," inositol ","adrenergic receptor","caloric restriction","hepatocyte","baroreceptor"," microglia ","immune response","irritable bowel","chaperone"," hepatic stellate cell "," tgfb1 "," perk ","xbp1","ire1","il-10","protease inhibitor","interferon","mcp-1","vcam-1","cimetidine","cholestasis","anhedonia","interleukin","glycerylphosphorylcholine","selenocysteine"," l-arginine ","vitamin k2"," tyrosine kinase ","beta-alanine","trpv","histone deacetylase"," nitroglycerin ","limbic","insomnia","dysautonomia","dexamethasone","reactive oxygen species","rituximab"," tau ","udp-glcnac"," b6 "," dopamine "," gaba ","mucuna"," enos "," inos "," nnos ","fmo3","5-htp"," oxalate ","human growth hormone"," iron ","autism","adhd","osmolyte","manganese","calcium","magnesium","phosphatidylcholine","asymmetric dimethylarginine"," trimethylamine ","trimethylamine-n-oxide","phospholipid","mitochondrial dysfunction","visceral fat","nafld","acetyl-coa","choline acetyltransferase"," acetylcholine ","advanced glycation end","benfotiamine"," hpa axis ","glycoprotein","mpdu1","dpagt1","hyperlipidemia","o-glcnac","n-acetylglucosamine","hexosamine"," glucosamine ","srd5a3","selenoprotein","probiotic","finasteride","accutane"," bile ","cyp1b1","cyp2d6","floaters","social anxiety","d-limonene","cholecalciferol","pyrroloquinoline","mk-4","grapefruit","nigella","omega","curcumin","lipoic","alcar","carnitine","ashwagandha","resveratrol","omega","zinc","magnesium","manganese","butyric","butyrate","l-tyrosine","5-mthf","gpx1","gpx2","gpx3"," nos1 "," nos2 "," nos3 ","tyrosine hydroxylase","tinnitus","cardiovascular"," choline ","homocysteine","uric acid"," semen ","hypothyroid","mastocytosis","mast cell","mastocytosis","histamine","triiodothyronine","dopamine","nrf-2","serotonin","adrenaline","noradrenaline","epinephrine","monoamine oxidase","estrogen","nerve growth factor","cyp2d6","proteinuria"," ammonia ",
"nitric oxide","phenylalanine","sinusitis","nrf1","nrf2","arrhythmia","pgc-1"," creb ","modafinil","methylphenidate","piracetam"," tyrosine ","sirt3","melatonin"," tryptophan ","ritalin"," mtor ","chronic kidney","glutathione","cancer","nitric oxide","n-linked","acetylcholinesterase","piracetam","scfa","anxiety disorder","cortisol"," amyloid precursor protein ","gsk-3","beta-amyloid"," ttr ","neurodegenerati"," gut ","steatohepatitis","microbiome","acetylated histone","ischemic reperfusion","acetylated histone","mnsod","cuznsod","sod1","sod2","sod3","sod4","hypocortisolism","hypercortisolism"," mrna ","butyrate","calcitonin","tauopath","insulin resistance","sinusitis","personality","hydroxysteroid dehydrogenase","alzheimer","parkinson","hsf-1","cortisol","catecholamine","intestina","insulin","hypoxia","ankyrin","80-kda","ribonuclease","pbmc","37-kda","restless","ugt1a1","immune","caspase","dermatitis","gpr78","glutathione","glucose","apoptosis","diabetes","mitochondri","inflammat","gch1","hepatic","constipat","ca2","liver","lymph","cyp1","cyp2","cyp3","p450","dysautonomia","glutam","cd20","dolichol","interstitial","magnesium","lymph","orthostatic","hyperviscosity","neurologic","depression","autonomic","hsp90","er stress","calcium homeostasis","caspase","aplp1","gelsolin","amyloidosis","p53","retinoic","calcium","dihydroprogesterone","allopregnanolone","androstane","pregnane","dehydrotestosterone","gluten","celiac","neuphropathy","prostate","platelet aggregation","hmg-coa","vitamin d3","ascorbic acid","nitric oxide","cytochrome","oxidative stress","testosterone","mthfr","methylation","peroxynitrite","thyroid","misfolded proteins","unfolded protein response","endoplasmic reticulum","selenium","magnesium","taurine","tudca","resveratrol","hsp70","dht","glycosylation","tetrahydrobiopterin","pregnenolone","progesterone","steroidogenesis","phenylketonuria","estradiol","shbg","free testosterone","hypogonadism","adrenal insufficiency","adrenal hyperplasia","igf1","sirt1"]
The csv files that have been downloaded correspond to the following PubMed queries:
tetrahydrobiopterin
nitric oxide
phenylalanine
tyrosine
phenylketonuria
dopamine
l-arginine
enos
tyrosine hydroxylase
liver
oxidative stress
dopamine
serotonin
mrna
inflammat
catecholamine
reactive oxygen species
inos
cardiovascular
tryptophan
neurologic
parkinson
diabetes
interferon
calcium
nnos
peroxynitrite
acetylcholine
iron
omega
cytochrome
immune
interleukin
glucose
glutathione
epinephrine
ca2
hepatic
gch1
depression
insulin
mitochondri
apoptosis
lymph
l-tyrosine
glutam
p450
neurodegenerati
adrenaline
ascorbic acid
noradrenaline
zinc
hepatocyte
homocysteine
cancer
intestina
dexamethasone
insulin resistance
chaperone
hypoxia
alzheimer
estrogen
asymmetric dimethylarginine
immune response
5-htp
autism
estradiol
melatonin
nitroglycerin
uric acid
manganese
nerve growth factor
hsp90
nos3
monoamine oxidase
magnesium
thyroid
platelet aggregation
caspase
nos2
autonomic
mitochondrial dysfunction
p53
phospholipid
microglia
resveratrol
hmg-coa
5-mthf
methylation
ammonia
carnitine
arrhythmia
nos1
hyperlipidemia
il-10
glycosylation
mast cell
hsp70
cortisol
proteinuria
endoplasmic reticulum
p53
vcam-1
nrf2
tau
butyric
selenium
glycoprotein
calcium homeostasis
tyrosine kinase
glucosamine
mcp-1
interstitial
b6
personality
advanced glycation end
beta-amyloid
gut
sod1
restless
retinoic
testosterone
choline
stat1
adhd
n-linked
hexosamine
creb
mthfr
hypothyroid
progesterone
sod3
acetylcholinesterase
gaba
adrenergic receptor
mucuna
phosphatidylcholine
cuznsod
histamine
histone deacetylase
n-acetylglucosamine
hydroxysteroid dehydrogenase
protease inhibitor
gpx1
choline acetyltransferase
sirt1
limbic
dht
chronic kidney
pbmc
triiodothyronine
dermatitis
ischemic reperfusion
modafinil
misfolded proteins
sod2
er stress
mtor
inositol
selenocysteine
steroidogenesis
piracetam
acetyl-coa
taurine
bile
lipoic
baroreceptor
ankyrin
celiac
calcitonin
caloric restriction
methylphenidate
vitamin d3
amyloidosis
glycerylphosphorylcholine
udp-glcnac
tgfb1
dysautonomia
tudca
80-kda
allopregnanolone
fmo3
dehydrotestosterone
cholestasis
ttr
finasteride
adrenal hyperplasia
cd20
tinnitus
vitamin k2
insomnia
trimethylamine-n-oxide
alcar
steatohepatitis
gluten
oxalate
mk-4
probiotic
semen
pregnane
scfa
visceral fat
butyrate
irritable bowel
hsf-1
anxiety disorder
floaters
nafld
pregnenolone
tauopath
dihydroprogesterone
cholecalciferol
beta-alanine
dpagt1
osmolyte
trimethylamine
ribonuclease
constipat
gelsolin
sod4
aplp1
hpa axis
mastocytosis
cyp2d6
orthostatic
nigella
grapefruit
srd5a3
social anxiety
mnsod
benfotiamine
pyrroloquinoline
microbiome
rituximab
neuphropathy
gsk-3
37-kda
amyloid precursor protein
nrf1
hypogonadism
d-limonene
gpr78
ashwagandha
dolichol
xbp1
sirt3
hmgb1
ire1
anhedonia
cimetidine
curcumin
trpv
o-glcnac
prostate
cyp1b1
androstane
accutane
hyperviscosity
ugt1a1
sinusitis
acetylated histone
hypocortisolism
hypercortisolism
human growth hormone
perk
free testosterone
hepatic stellate cell
nrf-2
adrenal insufficiency
igf1
mpdu1
ritalin
pgc-1
gpx3
shbg
unfolded protein response
gpx2
selenoprotein
cyp1
cyp2
cyp3
The software computes the frequency of each topic within each csv file. For example, the topic TUDCA has the following matches against the csv files:
*********Topic : tudca ***************
tudca.csv : 49.53 %
xbp1.csv : 1.24 %
taurine.csv : 1.15 %
gpr78.csv : 0.95 %
er_stress.csv : 0.83 %
perk.csv : 0.59 %
upr.csv : 0.58 %
ire1.csv : 0.55 %
chaperones.csv : 0.22 %
misfolded_proteins.csv : 0.11 %
cholestasis.csv : 0.11 %
tmao.csv : 0.10 %
systemic_amyloidosis.csv : 0.07 %
hepatocytes.csv : 0.07 %
nafld.csv : 0.06 %
osmolytes.csv : 0.06 %
mitochondrial_dysfunction.csv : 0.04 %
oxidative_stress_markers.csv : 0.04 %
choline_deficiency.csv : 0.04 %
heat_shock_protein.csv : 0.04 %
advanced_glycation_end.csv : 0.03 %
caspase_human.csv : 0.03 %
butyrate.csv : 0.03 %
amyloid.csv : 0.03 %
amyloidosis.csv : 0.02 %
lipoic_acid.csv : 0.02 %
3betahsd.csv : 0.02 %
hmgcoa.csv : 0.02 %
cyp2e1.csv : 0.02 %
hepatotoxicity.csv : 0.02 %
cyp1a2.csv : 0.02 %
inositol.csv : 0.02 %
ros.csv : 0.02 %
excitotoxicity.csv : 0.02 %
dht.csv : 0.02 %
mcp-1.csv : 0.02 %
hydroxysteroid_dehydrogenase.csv : 0.02 %
microglia.csv : 0.01 %
vcam-1.csv : 0.01 %
monosodium_glutamate.csv : 0.01 %
endothelial_nos.csv : 0.01 %
glycosylation.csv : 0.01 %
oxidative_stress_protection.csv : 0.01 %
phosphatidylcholine.csv : 0.01 %
calcium_homeostasis.csv : 0.01 %
p53.csv : 0.01 %
tau.csv : 0.01 %
acetyl-coa.csv : 0.01 %
steatohepatitis.csv : 0.01 %
histone_deacetylase.csv : 0.01 %
uric_acid.csv : 0.01 %
human_proteinuria.csv : 0.01 %
hsp70.csv : 0.01 %
...
...
..
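The percentages above could be computed along these lines. This is a minimal sketch and not the actual program: it assumes that "frequency" means the share of abstracts in a csv file that contain the topic as a lowercase substring, and that each csv has a column named abstract (both are my assumptions). It also shows what padding a term with spaces, as in " tau ", does under substring matching: the padded term no longer matches inside longer words such as taurine.

```python
import csv
import io


def topic_frequency(topic, abstracts):
    """Percentage of abstracts that contain `topic` as a lowercase substring."""
    abstracts = list(abstracts)
    if not abstracts:
        return 0.0
    hits = sum(1 for text in abstracts if topic in text.lower())
    return 100.0 * hits / len(abstracts)


def read_abstracts(fh, column="abstract"):
    """Yield the abstract text of every row in an open csv file."""
    for row in csv.DictReader(fh):
        yield row[column]


# Made-up sample standing in for one downloaded csv file.
sample = io.StringIO(
    "pmid,abstract\n"
    "1,TUDCA reduces ER stress in hepatocytes\n"
    "2,Taurine and bile acid metabolism\n"
    "3,The tau protein and tudca in neurons\n"
    "4,Tauopathy models in mice\n"
)
abstracts = list(read_abstracts(sample))

print(f"tudca : {topic_frequency('tudca', abstracts):.2f} %")   # 50.00 %
print(f"' tau ' : {topic_frequency(' tau ', abstracts):.2f} %")  # 25.00 %
print(f"'tau' : {topic_frequency('tau', abstracts):.2f} %")      # 75.00 %
```

Without the padding, "tau" also hits the taurine and tauopathy abstracts, which is presumably why several short terms in the list carry spaces.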
The software also outputs the matches that are above a cutoff value.
After all topics have been searched, we get a csv file that contains the matches against every csv file for every topic.
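A table like that can be built with one row per topic and one column per csv file. The sketch below is only an illustration under the same assumptions as before (substring matching, percentage of abstracts); the function names and the tiny corpora are hypothetical.

```python
import csv
import io


def build_matrix(topics, corpora):
    """Return {topic: {csv_name: percent}} for every topic and csv file."""
    matrix = {}
    for topic in topics:
        row = {}
        for name, abstracts in corpora.items():
            hits = sum(1 for text in abstracts if topic in text.lower())
            row[name] = round(100.0 * hits / len(abstracts), 2) if abstracts else 0.0
        matrix[topic] = row
    return matrix


def write_matrix(matrix, fh):
    """Write one row per topic and one column per csv file."""
    names = sorted({name for row in matrix.values() for name in row})
    writer = csv.writer(fh)
    writer.writerow(["topic"] + names)
    for topic, row in matrix.items():
        writer.writerow([topic] + [row.get(name, 0.0) for name in names])


# Made-up abstracts standing in for two downloaded csv files.
corpora = {
    "tudca.csv": ["tudca and er stress", "bile acids"],
    "er_stress.csv": [
        "er stress and the upr",
        "tudca relieves er stress",
        "perk signalling",
        "ire1 and xbp1",
    ],
}
matrix = build_matrix(["tudca", "er stress"], corpora)
out = io.StringIO()
write_matrix(matrix, out)
print(out.getvalue())
```

Each cell is the match percentage of the row's topic in the column's csv file, so every later step (correlation, T/F flags) can work from this one file.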
Then we can find the linear correlation of any given topic with the other topics.
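Assuming that "linear correlation" here means the Pearson coefficient between the match profiles of two topics (one value per csv file), it can be sketched as follows. The profiles are hypothetical numbers, not real output, and most_correlated is a helper name I made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant profile; reported as 0.0 here
    return cov / math.sqrt(var_x * var_y)


def most_correlated(target, profiles):
    """Rank all other topics by their correlation with `target`, highest first."""
    scores = [
        (topic, pearson(profiles[target], values))
        for topic, values in profiles.items()
        if topic != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical match percentages over the same four csv files.
profiles = {
    "tudca": [49.5, 1.2, 0.8, 0.1],
    "er stress": [0.8, 30.0, 25.0, 0.2],
    "upr": [0.6, 28.0, 27.0, 0.1],
}
for topic, r in most_correlated("er stress", profiles):
    print(f"{topic} : {r:.2f}")
```

Topics whose percentages rise and fall across the same csv files get a coefficient close to 1.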
Finally, we can run association rule mining as follows. I use a cutoff value: a 'T' is inserted where the match frequency is above it, and an 'F' otherwise.
This is a snapshot for a cutoff of 0.5. Notice how row 3 has T (= TRUE) values for acetylcholine and the adrenergic receptor. This means that, for the topic searched, these two csv files (acetylcholine.csv and adrenergic_receptor.csv) matched with a frequency of more than 0.5 %.
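The T/F conversion itself is a one-liner per cell. A minimal sketch, with hypothetical percentages and a function name of my own choosing:

```python
def binarize(matrix, cutoff):
    """Map every percentage to 'T' if it is above the cutoff, else 'F'."""
    return {
        topic: {name: "T" if value > cutoff else "F" for name, value in row.items()}
        for topic, row in matrix.items()
    }


# Hypothetical match percentages (topic -> csv file -> percent).
matrix = {
    "dopamine": {"acetylcholine.csv": 1.3, "adrenergic_receptor.csv": 0.7, "tudca.csv": 0.02},
    "taurine": {"acetylcholine.csv": 0.1, "adrenergic_receptor.csv": 0.5, "tudca.csv": 1.15},
}
flags = binarize(matrix, cutoff=0.5)
print(flags["dopamine"])
# {'acetylcholine.csv': 'T', 'adrenergic_receptor.csv': 'T', 'tudca.csv': 'F'}
```

The comparison is strict, so a value of exactly 0.5 still counts as 'F', in line with "more than 0.5 %".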
Next, we run the association discovery for, say, ER stress. With a high cutoff value the results are more or less what is already known:
[upr=T] 76 == [er_stress=T] 75 conf (0.99) lift (3.45) lev (0.18) conv (27.14)
[gpr78=T] 74 == [er_stress=T] 71 conf (0.96) lift (3.36) lev (0.17) conv (13.21)
[ire1=T] 76 == [er_stress=T] 71 conf (0.93) lift (3.27) lev (0.16) conv (9.05)
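So upr.csv and er_stress.csv tend to match the same topics (TUDCA, say) above the cutoff, which here is a frequency of, say, 1 %. In the first rule, 76 topics have upr=T and 75 of those also have er_stress=T, which gives the confidence of 75/76 ≈ 0.99. If we lower the cutoff value to 0.3 % we get the following: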
[p53=T, mitochondrial_dysfunction=T] 113 == [er_stress=T] 113 conf (1) lift (1.68) lev (0.15) conv (45.8)
[p53=T, peroxynitrite=T] 109 == [er_stress=T] 109 conf (1) lift (1.68) lev (0.15) conv (44.18)
[p53=T, tau=T] 108 == [er_stress=T] 108 conf (1) lift (1.68) lev (0.15) conv (43.77)
[ire1=T] 134 == [er_stress=T] 129 conf (0.96) lift (1.62) lev (0.16) conv (9.05)
[upr=T] 162 == [er_stress=T] 155 conf (0.96) lift (1.61) lev (0.19) conv (8.21)
[p53=T] 125 == [er_stress=T] 116 conf (0.93) lift (1.56) lev (0.14) conv (5.07)
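For reference, the four figures on each rule can be computed from the T/F table as sketched below. The sketch uses the textbook definitions of confidence, lift, leverage and conviction on a made-up table of ten rows; it is not the tool that produced the output above. In particular, the rules above report a finite conviction even at confidence 1, which suggests that tool smooths the denominator, whereas the textbook formula gives infinity in that case.

```python
import math


def rule_metrics(rows, antecedent, consequent):
    """Metrics for the rule: all `antecedent` columns are 'T' -> `consequent` is 'T'."""
    n = len(rows)
    n_a = sum(1 for r in rows if all(r[c] == "T" for c in antecedent))
    n_b = sum(1 for r in rows if r[consequent] == "T")
    n_ab = sum(
        1 for r in rows
        if r[consequent] == "T" and all(r[c] == "T" for c in antecedent)
    )
    if n == 0 or n_a == 0:
        raise ValueError("the antecedent never occurs")
    conf = n_ab / n_a                      # P(B | A)
    p_b = n_b / n                          # P(B)
    lift = conf / p_b if p_b else math.nan
    lev = n_ab / n - (n_a / n) * p_b       # P(A and B) - P(A) * P(B)
    conv = math.inf if conf == 1 else (1 - p_b) / (1 - conf)
    return {"count_a": n_a, "count_ab": n_ab, "conf": conf,
            "lift": lift, "lev": lev, "conv": conv}


# Made-up T/F table: one row per topic searched, one column per csv file.
rows = (
    [{"upr": "T", "er_stress": "T"}] * 4
    + [{"upr": "T", "er_stress": "F"}]
    + [{"upr": "F", "er_stress": "T"}]
    + [{"upr": "F", "er_stress": "F"}] * 4
)
m = rule_metrics(rows, ["upr"], "er_stress")
print(f"[upr=T] {m['count_a']} == [er_stress=T] {m['count_ab']} "
      f"conf ({m['conf']:.2f}) lift ({m['lift']:.2f}) "
      f"lev ({m['lev']:.2f}) conv ({m['conv']:.2f})")
# [upr=T] 5 == [er_stress=T] 4 conf (0.80) lift (1.60) lev (0.15) conv (2.50)
```

A lift above 1 means the two csv files pass the cutoff together more often than they would if they were independent.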
We also have the File Matchings, which I will discuss when I come back from holidays.