Optimization of a Metabolic Pathway Data Mining Inference System
Tomas V. Arredondo, Wladimir O. Ormazábal, Diego C. Candel, Werner Creixell
UTFSM
V Escuela Inv. Robotica UTFSM, Valparaiso
Introduction
Proposed Model
Experiments
Conclusions
We wanted to optimize a bioinformatics inference system to generate better hypotheses (h) given a specific learner (LK) and a specific data set (D).
[Figure: BLAST alignment features from GenBank (Identity, Gene, Positives, E-value, Gaps, BitScore) feed the inference system, which outputs a hypothesis score between 0 and 1.]
Our objectives were:
• a general conceptual framework,
• a system that runs unattended during optimization,
• configurability for a variety of applications.
Proposed Model
Toward this we define a learner Lk as a (learning model, training method) pair:

Lk = (modi, trj)

A vector pk,x contains the parameters for Lk, and Pk is a set of γ such vectors:

∀Lk ∃ Pk = {pk,1, pk,2, …, pk,γ}

And Mk is the set of θ possible model configurations mk,x:

∀Lk ∃ Mk = {mk,1, mk,2, …, mk,θ}
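The definitions above can be sketched as data structures; this is a minimal illustration, and all class and field names here are assumptions, not part of the original system:

```python
from dataclasses import dataclass

# Illustrative sketch of L_k = (mod_i, tr_j); names are hypothetical.
@dataclass(frozen=True)
class Learner:
    model: str      # learning model mod_i, e.g. "FFNN"
    training: str   # training method tr_j, e.g. "Backpropagation"

L_k = Learner(model="FFNN", training="Backpropagation")

# P_k: a set of gamma parameter vectors p_{k,x} for the learner L_k
P_k = [
    {"eta": 0.1,   "hidden_units": 8},
    {"eta": 0.01,  "hidden_units": 16},
    {"eta": 0.001, "hidden_units": 32},
]

# M_k: a set of theta possible model configurations m_{k,x}
M_k = [{"layers": 1}, {"layers": 2}]
```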
From our data set D, r disjoint partitions T1, …, Tr are generated, and each training set Si is formed by holding out the partition Ti:

D = T1 ∪ T2 ∪ … ∪ Ti ∪ … ∪ Tr-1 ∪ Tr
Si = D − Ti
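A minimal sketch of this partitioning step, under the assumption that D is a finite list of samples (the helper name is ours):

```python
import random

# Assumed sketch: split D into r disjoint partitions T_1..T_r,
# then form each training set as S_i = D - T_i.
def disjoint_partitions(D, r, seed=0):
    items = list(D)
    random.Random(seed).shuffle(items)
    return [items[i::r] for i in range(r)]  # r disjoint slices covering D

D = list(range(10))
T = disjoint_partitions(D, r=5)
S = [sorted(set(D) - set(t)) for t in T]  # S_i = D - T_i
```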
We consider a template as the pairing of a model configuration with selected parameters, which together with a specific training set generate a hypothesis:

LA = (mod, tr), template = (mA,j, pA,k)
LA(mA,j, pA,k, Si) → hA,j,k,i
[Figure: applying the template hA,j,k to each training set S1, S2, …, Sr yields the hypotheses hA,j,k,1, hA,j,k,2, …, hA,j,k,r.]
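This fan-out of one template over all training sets can be sketched as a simple loop; `train()` is a hypothetical stand-in for the real learner, and the toy data is ours:

```python
# One fixed template (m_{A,j}, p_{A,k}) trained on each set S_1..S_r
# yields the hypotheses h_{A,j,k,1}..h_{A,j,k,r}.
def train(template, S_i):
    # Placeholder: a real learner would fit a model here.
    return {"template": template, "trained_on": tuple(S_i)}

template = ("m_A_j", "p_A_k")
training_sets = [[1, 2], [3, 4], [5, 6]]   # toy S_1..S_r
hypotheses = [train(template, S_i) for S_i in training_sets]
```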
Experiments
Genetic Algorithm (De Jong, Grefenstette):
• Generations: 100
• Individuals: 20
• Crossover: Two-point
• Selection: Tournament w/ 0.95
• Mutation: 0.01
• Elite: Yes

Simulated Annealing:
• Acceptance: e^(Δx/T)
• Iterations: 100
• Starts: 20

Particle Swarm Optimization (Shi & Eberhart, Pedersen & Chipperfield):
• Generations: 100
• Individuals: 20
• ω: 0.729
• φg: 1.49445
• φp: 1.49445

Stochastic Hillclimbing:
• Iterations: 100
• Starts: 20
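As a minimal sketch of the PSO variant with the coefficients listed above (ω = 0.729, φp = φg = 1.49445): the sphere objective, search bounds, and function names below are illustrative assumptions, not the real fitness function:

```python
import random

OMEGA, PHI_P, PHI_G = 0.729, 1.49445, 1.49445  # coefficients from the slide

def sphere(x):
    # Toy objective to minimize; stands in for the real fitness.
    return sum(xi * xi for xi in x)

def pso(dim=2, particles=20, generations=100, seed=1):
    rng = random.Random(seed)
    X = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(particles)]
    V = [[0.0] * dim for _ in range(particles)]
    P = [x[:] for x in X]            # personal bests
    g = min(P, key=sphere)[:]        # global best
    for _ in range(generations):
        for i in range(particles):
            for d in range(dim):
                rp, rg = rng.random(), rng.random()
                V[i][d] = (OMEGA * V[i][d]
                           + PHI_P * rp * (P[i][d] - X[i][d])
                           + PHI_G * rg * (g[d] - X[i][d]))
                X[i][d] += V[i][d]
            if sphere(X[i]) < sphere(P[i]):
                P[i] = X[i][:]
                if sphere(P[i]) < sphere(g):
                    g = P[i][:]
    return g

best = pso()
```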
Learner:
• model = FFNN (feed-forward neural network)
• tr = Backpropagation
• learning rate η can be static or dynamic (Watkins technical report)
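A toy sketch contrasting a static with a dynamic learning rate η on a quadratic loss; the adaptation rule here (grow η on improvement, halve it on worsening) is a generic illustration, not necessarily the scheme from the cited Watkins report:

```python
# Gradient descent on a single weight w with loss(w) = w^2.
def descend(w0=4.0, eta=0.1, dynamic=False, steps=50):
    loss = lambda w: w * w
    grad = lambda w: 2.0 * w
    w, prev = w0, loss(w0)
    for _ in range(steps):
        w -= eta * grad(w)
        cur = loss(w)
        if dynamic:
            # Illustrative adaptation: reward improvement, punish overshoot.
            eta = eta * 1.05 if cur < prev else eta * 0.5
        prev = cur
    return loss(w)
```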
Methodology: metalearner parameters were chosen from standard references plus some ad hoc meta-tuning, with metalearner function evaluations set to 2000. For the selection of training and test sets we applied cross-validation by random sub-sampling (Bouckaert and Frank, 2004), in a 70%/30% proportion respectively, with 20 averaged runs. For the NN, we restricted the number of Backpropagation iterations to a maximum of 20000 and a minimum of 1000.
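The sub-sampling procedure can be sketched as follows; `evaluate()` is a dummy placeholder for the real learner's test-set score, and the function names are ours:

```python
import random

# Cross-validation by random sub-sampling: 70% train / 30% test,
# repeated 20 times and averaged.
def evaluate(train_set, test_set):
    # Placeholder score; a real run would train and test the learner here.
    return len(test_set) / (len(train_set) + len(test_set))

def subsample_average(D, runs=20, train_frac=0.7, seed=0):
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        items = list(D)
        rng.shuffle(items)
        cut = int(train_frac * len(items))
        scores.append(evaluate(items[:cut], items[cut:]))
    return sum(scores) / len(scores)

avg = subsample_average(list(range(100)))
```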
Learner Parameters and Granularities
MTL-D3
Conclusions
• The metalearning approach worked fairly well on this problem; we also tested it on a social-network problem (determining a user's class) with fairly good results.
• With a better randomizer, the dynamic cases do seem to have done better than the static case, and the GA still seems to be the best approach for this data set.
• Greater granularity is not necessarily better.
• Source is available as simmetalib & simgalib.
Questions?
[email protected]