Optimization of a Metabolic Pathway Data Mining Inference System
Tomas V. Arredondo, Wladimir O. Ormazábal, Diego C. Candel, Werner Creixell
UTFSM
V Escuela Inv. Robotica UTFSM, Valparaiso
Introduction
Proposed Model
Experiments
Conclusions
We wanted to optimize a bioinformatics inference system to generate better hypotheses (h) given a specific learner (LK) and a specific data set (D).
[Figure: BLAST alignment features from GenBank (Identity, Gene, Positives, E-value, Gaps, BitScore) feed the inference system, which outputs a hypothesis score between 0 and 1.]
Our objectives were:
• a general conceptual framework,
• a system that runs unattended during optimization,
• configurability for a variety of applications.
Proposed Model
Toward this we define a learner Lk as a (learning model, training method) pair:

Lk = (modi, trj)

A vector pk,x contains the parameters for Lk, and Pk is a set of γ such vectors:

∀Lk ∃ Pk = {pk,1, pk,2, …, pk,γ}

And Mk is the set of θ possible model configurations mk,x:

∀Lk ∃ Mk = {mk,1, mk,2, …, mk,θ}
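The definitions above can be sketched as data structures; this is a minimal illustration, and all class and field names here are assumptions, not part of the original system:

```python
from dataclasses import dataclass

# Illustrative sketch of L_k = (mod_i, tr_j); names are hypothetical.
@dataclass(frozen=True)
class Learner:
    model: str      # learning model mod_i, e.g. "FFNN"
    training: str   # training method tr_j, e.g. "Backpropagation"

L_k = Learner(model="FFNN", training="Backpropagation")

# P_k: a set of gamma parameter vectors p_{k,x} for the learner L_k
P_k = [
    {"eta": 0.1,   "hidden_units": 8},
    {"eta": 0.01,  "hidden_units": 16},
    {"eta": 0.001, "hidden_units": 32},
]

# M_k: a set of theta possible model configurations m_{k,x}
M_k = [{"layers": 1}, {"layers": 2}]
```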
From our data set D, r disjoint partitions T1, …, Tr are generated, and each training set Si is formed by holding out the partition Ti:

D = T1 ∪ T2 ∪ … ∪ Ti ∪ … ∪ Tr-1 ∪ Tr
Si = D − Ti
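A minimal sketch of this partitioning step, under the assumption that D is a finite list of samples (the helper name is ours):

```python
import random

# Assumed sketch: split D into r disjoint partitions T_1..T_r,
# then form each training set as S_i = D - T_i.
def disjoint_partitions(D, r, seed=0):
    items = list(D)
    random.Random(seed).shuffle(items)
    return [items[i::r] for i in range(r)]  # r disjoint slices covering D

D = list(range(10))
T = disjoint_partitions(D, r=5)
S = [sorted(set(D) - set(t)) for t in T]  # S_i = D - T_i
```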
We consider a template as the pairing of a model configuration with selected parameters, which together with a specific training set generate a hypothesis:

LA = (mod, tr), template = (mA,j, pA,k)
LA(mA,j, pA,k, Si) → hA,j,k,i
[Figure: applying the template hA,j,k to each training set S1, S2, …, Sr yields the hypotheses hA,j,k,1, hA,j,k,2, …, hA,j,k,r.]
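This fan-out of one template over all training sets can be sketched as a simple loop; `train()` is a hypothetical stand-in for the real learner, and the toy data is ours:

```python
# One fixed template (m_{A,j}, p_{A,k}) trained on each set S_1..S_r
# yields the hypotheses h_{A,j,k,1}..h_{A,j,k,r}.
def train(template, S_i):
    # Placeholder: a real learner would fit a model here.
    return {"template": template, "trained_on": tuple(S_i)}

template = ("m_A_j", "p_A_k")
training_sets = [[1, 2], [3, 4], [5, 6]]   # toy S_1..S_r
hypotheses = [train(template, S_i) for S_i in training_sets]
```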
Experiments
Genetic Algorithm (De Jong, Grefenstette):
• Generations: 100
• Individuals: 20
• Crossover: Two-point
• Selection: Tournament w/ 0.95
• Mutation: 0.01
• Elite: Yes

Simulated Annealing:
• Acceptance: e^(Δx/T)
• Iterations: 100
• Starts: 20

Particle Swarm Optimization (Shi & Eberhart, Pedersen & Chipperfield):
• Generations: 100
• Individuals: 20
• ω: 0.729
• φg: 1.49445
• φp: 1.49445

Stochastic Hillclimbing:
• Iterations: 100
• Starts: 20
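As a minimal sketch of the PSO variant with the coefficients listed above (ω = 0.729, φp = φg = 1.49445): the sphere objective, search bounds, and function names below are illustrative assumptions, not the real fitness function:

```python
import random

OMEGA, PHI_P, PHI_G = 0.729, 1.49445, 1.49445  # coefficients from the slide

def sphere(x):
    # Toy objective to minimize; stands in for the real fitness.
    return sum(xi * xi for xi in x)

def pso(dim=2, particles=20, generations=100, seed=1):
    rng = random.Random(seed)
    X = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(particles)]
    V = [[0.0] * dim for _ in range(particles)]
    P = [x[:] for x in X]            # personal bests
    g = min(P, key=sphere)[:]        # global best
    for _ in range(generations):
        for i in range(particles):
            for d in range(dim):
                rp, rg = rng.random(), rng.random()
                V[i][d] = (OMEGA * V[i][d]
                           + PHI_P * rp * (P[i][d] - X[i][d])
                           + PHI_G * rg * (g[d] - X[i][d]))
                X[i][d] += V[i][d]
            if sphere(X[i]) < sphere(P[i]):
                P[i] = X[i][:]
                if sphere(P[i]) < sphere(g):
                    g = P[i][:]
    return g

best = pso()
```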
Learner:
• model = FFNN (feed-forward neural network)
• tr = Backpropagation
• learning rate η can be static or dynamic (Watkins technical report)
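A toy sketch contrasting a static with a dynamic learning rate η on a quadratic loss; the adaptation rule here (grow η on improvement, halve it on worsening) is a generic illustration, not necessarily the scheme from the cited Watkins report:

```python
# Gradient descent on a single weight w with loss(w) = w^2.
def descend(w0=4.0, eta=0.1, dynamic=False, steps=50):
    loss = lambda w: w * w
    grad = lambda w: 2.0 * w
    w, prev = w0, loss(w0)
    for _ in range(steps):
        w -= eta * grad(w)
        cur = loss(w)
        if dynamic:
            # Illustrative adaptation: reward improvement, punish overshoot.
            eta = eta * 1.05 if cur < prev else eta * 0.5
        prev = cur
    return loss(w)
```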
Methodology: metalearner parameters were chosen from standard references plus some ad hoc meta-tuning, with metalearner function evaluations set to 2000. For the selection of training and test sets we applied cross-validation by random sub-sampling (Bouckaert and Frank, 2004), in a 70%/30% proportion respectively, with 20 averaged runs. For the NN, we restricted the number of Backpropagation iterations to a maximum of 20000 and a minimum of 1000.
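The sub-sampling procedure can be sketched as follows; `evaluate()` is a dummy placeholder for the real learner's test-set score, and the function names are ours:

```python
import random

# Cross-validation by random sub-sampling: 70% train / 30% test,
# repeated 20 times and averaged.
def evaluate(train_set, test_set):
    # Placeholder score; a real run would train and test the learner here.
    return len(test_set) / (len(train_set) + len(test_set))

def subsample_average(D, runs=20, train_frac=0.7, seed=0):
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        items = list(D)
        rng.shuffle(items)
        cut = int(train_frac * len(items))
        scores.append(evaluate(items[:cut], items[cut:]))
    return sum(scores) / len(scores)

avg = subsample_average(list(range(100)))
```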
Learner Parameters and Granularities
MTL-D3
Conclusions
• The metalearning approach worked fairly well on this problem; we also tested it on a social-network problem (determining a user's class) with fairly good results.
• With a better randomizer, the dynamic cases do seem to have done better than the static case, and the GA still seems to be the best approach for this data set.
• Greater granularity is not necessarily better.
• Source is available as simmetalib & simgalib.
Questions?
[email protected]