Optimization of Metabolic Pathway DataMining Inference System

Tomas V. Arredondo, Wladimir O. Ormazábal, Diego C. Candel, Werner Creixell

UTFSM

V Escuela Inv. Robotica UTFSM, Valparaiso




Introduction



Proposed Model



Experiments



Conclusions




We wanted to optimize a bioinformatic inference system so that it generates better hypotheses (h) given a specific learner ($L_K$) and a specific data set (D).

[Diagram: Gene; BLAST against GenBank; features Identity, Positives, E-value, Gaps and BitScore; Inference System (h); output Score (0 to 1).]



Our objectives were:

a general conceptual framework,

a system that works unattended during optimization,

a system that can be configured for a variety of applications.

Proposed Model





Toward this, we define a learner $L_K$ as a pair formed by a learning model ($mod_i$) and a training method ($tr_j$):

$$L_K = (mod_i, tr_j)$$

A vector $p_{K,x}$ contains the parameters for $L_K$, and $P_K$ is a set of $\gamma$ such vectors:

$$\forall L_K \; \exists \; P_K = \{ p_{K,1}, p_{K,2}, \ldots, p_{K,\gamma} \}$$

Likewise, $M_K$ is the set of $\theta$ possible model configurations $m_{K,x}$:

$$\forall L_K \; \exists \; M_K = \{ m_{K,1}, m_{K,2}, \ldots, m_{K,\theta} \}$$
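The definitions above can be sketched in code, as one possible reading rather than the authors' implementation. A learner is a (model, training method) pair, and its parameter set and configuration set are enumerated from finite grids. The class name `Learner`, the helper `enumerate_vectors`, and the grid names and values (`learning_rate`, `hidden_layers`, `hidden_units`) are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Learner:
    model: str     # mod_i
    training: str  # tr_j


def enumerate_vectors(grid):
    """Return every combination of the grid values as a list of dicts."""
    names = sorted(grid)
    combos = product(*(grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


learner = Learner(model="FFNN", training="Backpropagation")

# Hypothetical grids: the real parameters and granularities are not given here.
P_K = enumerate_vectors({"learning_rate": [0.01, 0.1, 0.5]})  # gamma = 3
M_K = enumerate_vectors({"hidden_layers": [1, 2], "hidden_units": [4, 8]})  # theta = 4
```

Here the granularity of each grid determines $\gamma$ and $\theta$, which is what a metalearner would search over.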




From our data set D, several disjoint partitions $T_1, \ldots, T_r$ are generated and a training set $S_i$ is selected:

$$D = T_1 \cup T_2 \cup \ldots \cup T_i \cup \ldots \cup T_{r-1} \cup T_r$$

$$S_i = D - T_i$$
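A minimal sketch of this partitioning, assuming the partitions are formed by shuffling D and dealing its elements into r groups (the slides do not say how the partitions are drawn). The function names `make_partitions` and `training_set` are illustrative only.

```python
import random


def make_partitions(data, r, seed=0):
    """Shuffle the data and split it into r disjoint partitions T_1..T_r."""
    items = list(data)
    random.Random(seed).shuffle(items)
    return [items[i::r] for i in range(r)]


def training_set(partitions, i):
    """Return S_i = D - T_i: every element outside partition i (0-based)."""
    return [x for j, part in enumerate(partitions) if j != i for x in part]


D = list(range(10))
T = make_partitions(D, r=5)
S_0 = training_set(T, 0)
```

Each $T_i$ is held out in turn, so every element of D is left out of exactly one training set.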




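We consider a template to be the selected model configuration together with the selected parameters; combined with a specific training set, a template generates a hypothesis.

[Diagram: for a learner $L_A = (mod, tr)$, the configuration $m_{A,j}$ and the parameters $p_{A,k}$ form the template $h_{A,j,k}$; with the training set $S_i$ it yields the hypothesis $h_{A,j,k,i}$.]

[Diagram: the template $h_{A,j,k}$ applied to the training sets $S_1, S_2, \ldots, S_i, \ldots, S_{r-1}, S_r$ yields the hypotheses $h_{A,j,k,1}, h_{A,j,k,2}, \ldots, h_{A,j,k,i}, \ldots, h_{A,j,k,r-1}, h_{A,j,k,r}$.]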
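The relation between a template and its hypotheses can be illustrated with a toy example. This is only a sketch: the "training" step below (a shrunken mean of the training values) stands in for a real learner, and the names `make_template`, `bias` and `shrink` are invented for the illustration.

```python
from statistics import mean


def make_template(m, p):
    """Bind a model configuration m and parameters p into a template."""

    def train(S):
        # Toy training step: the hypothesis predicts a constant derived from S.
        value = p["shrink"] * mean(S) + m["bias"]
        return lambda x: value  # hypothesis for this training set

    return train


template = make_template(m={"bias": 0.0}, p={"shrink": 0.5})
training_sets = [[1.0, 3.0], [2.0, 6.0], [4.0, 4.0]]  # S_1..S_r with r = 3
hypotheses = [template(S) for S in training_sets]
predictions = [h(None) for h in hypotheses]
```

The same template produces a different hypothesis for each training set, which is why the hypothesis carries the extra index i.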

Experiments

Genetic Algorithm (De Jong, Grefenstette):

Generations: 100
Individuals: 20
Crossover: Two-point
Selection: Tournament w/ 0.95
Mutation: 0.01
Elite: Yes

Simulated Annealing:

$e^{\Delta x / T}$
Iterations: 100
Starts: 20

Particle Swarm Optimization (Shi & Eberhart, Pedersen & Chipperfield):

Generations: 100
Individuals: 20
ω: 0.729
φg: 1.49445
φp: 1.49445

Stochastic Hillclimbing:

Iterations: 100
Starts: 20
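To show where these settings enter, here is a hedged sketch of the textbook particle swarm velocity update using the listed ω, φp and φg, and of a simulated annealing acceptance test built on $e^{\Delta x / T}$. It assumes a maximization problem in which Δx is the change in fitness and improvements are always accepted; the slides do not state this, and the function names are illustrative.

```python
import math
import random

OMEGA, PHI_P, PHI_G = 0.729, 1.49445, 1.49445  # values listed on the slide


def pso_step(x, v, p_best, g_best, rng):
    """One standard PSO update for a single particle (lists of floats)."""
    new_v = [
        OMEGA * vi
        + PHI_P * rng.random() * (pb - xi)  # pull toward the particle's best
        + PHI_G * rng.random() * (gb - xi)  # pull toward the swarm's best
        for xi, vi, pb, gb in zip(x, v, p_best, g_best)
    ]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v


def sa_accept(delta, temperature, rng):
    """Accept improvements; accept worse moves with probability e^(delta/T)."""
    return delta >= 0 or rng.random() < math.exp(delta / temperature)


rng = random.Random(0)
x, v = pso_step([0.0], [1.0], [0.0], [0.0], rng)
```

In this call the particle already sits on both best positions, so only the inertia term ω·v contributes to the new velocity.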


Learner 

model = FFNN



tr = Backpropagation



η can be static or dynamic (Watkins technical report)


Methodology

Metalearner parameters were chosen from standard references, with some ad hoc meta-tuning.

Metalearner function evaluations were set to 2000 (consistent with 100 generations or iterations times 20 individuals or starts).

To select the training and test sets, we applied cross-validation by random sub-sampling (Bouckaert and Frank, 2004), in a 70% - 30% proportion respectively, with 20 averaged runs.

NN: we restricted the maximum number of Backpropagation iterations to 20000 and the minimum number to 1000.
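The random sub-sampling procedure can be sketched as follows, assuming each run draws a fresh random 70% - 30% split and the 20 scores are averaged with a plain mean. The function `random_subsampling` and the placeholder `evaluate` callback are hypothetical; a real evaluator would train the learner on the first split and score it on the second.

```python
import random
from statistics import mean


def random_subsampling(data, evaluate, runs=20, train_fraction=0.7, seed=0):
    """Average evaluate(train, test) over repeated random train/test splits."""
    rng = random.Random(seed)
    cut = round(train_fraction * len(data))
    scores = []
    for _ in range(runs):
        shuffled = list(data)
        rng.shuffle(shuffled)
        scores.append(evaluate(shuffled[:cut], shuffled[cut:]))
    return mean(scores)


# Placeholder evaluator: reports the test share instead of a model score.
data = list(range(100))
score = random_subsampling(data, lambda train, test: len(test) / len(data))
```

Averaging over independent splits reduces the dependence of the reported score on any single partition of the data.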


Learner Parameters and Granularities


MTL-D3


Conclusions 





We found that the meta-learning approach worked fairly well on this problem.

We also tested it on a social network problem, trying to determine a user class, with fairly good results.

With a better randomizer, the dynamic cases seem to have done better than the static case, and the GA still seems to be the best approach for this data set.

Greater granularity is not necessarily better.

Source is available as simmetalib & simgalib.


Questions?
