| Application of Machine Learning (ML) in Polymorphism Discovery.
In a polymorphism discovery project, many candidate SNPs are detected.
Each of these SNPs must be expertly evaluated and classified as true
or false. ML was applied to expedite this manual step by filtering out
the majority of the false predictions. The ML program C4.5 was applied
to a set of carefully chosen features to build a SNP classifier
from a training dataset annotated with expert decisions. The ML classifier
has 97% overall accuracy (i.e., the fraction of candidate SNPs that were
correctly classified) and 85% positive predictive value (i.e., the
fraction of candidates classified as true that are real polymorphisms).
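
The two figures quoted above can be derived from a comparison of expert
decisions with classifier output. The following is a minimal sketch in
Python, not the code used in this work; the names ConfusionCounts, tally,
accuracy and positive_predictive_value are hypothetical, and the labels in
the example are made up for illustration and are not the study data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ConfusionCounts:
    """Counts of candidate SNPs by expert decision and classifier call."""
    true_pos: int   # expert: true,  classifier: true
    false_pos: int  # expert: false, classifier: true
    true_neg: int   # expert: false, classifier: false
    false_neg: int  # expert: true,  classifier: false


def tally(expert: Sequence[bool], predicted: Sequence[bool]) -> ConfusionCounts:
    """Compare expert decisions with classifier calls, one pair per candidate."""
    if len(expert) != len(predicted):
        raise ValueError("expert and predicted must have the same length")
    counts = ConfusionCounts(0, 0, 0, 0)
    for is_real, called_real in zip(expert, predicted):
        if called_real and is_real:
            counts.true_pos += 1
        elif called_real and not is_real:
            counts.false_pos += 1
        elif not called_real and not is_real:
            counts.true_neg += 1
        else:
            counts.false_neg += 1
    return counts


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all candidates that were classified correctly."""
    total = c.true_pos + c.false_pos + c.true_neg + c.false_neg
    if total == 0:
        raise ValueError("no candidates to evaluate")
    return (c.true_pos + c.true_neg) / total


def positive_predictive_value(c: ConfusionCounts) -> float:
    """Fraction of candidates called true that the expert also judged true."""
    called_true = c.true_pos + c.false_pos
    if called_true == 0:
        raise ValueError("classifier called no candidate true")
    return c.true_pos / called_true


if __name__ == "__main__":
    # Illustrative labels only (not data from this project).
    expert = [True, True, False, False, False, True, False, False]
    predicted = [True, True, True, False, False, False, False, False]
    counts = tally(expert, predicted)
    print(f"accuracy = {accuracy(counts):.2f}")
    print(f"PPV      = {positive_predictive_value(counts):.2f}")
```

Because false candidates usually outnumber real ones, accuracy can be high
even when positive predictive value is lower, which is why both are reported.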
ML has so far been applied only to polymorphism discovery from
soybean amplified STS analyzed with PolyBayes. However, the optimized
ML feature set and ML framework can be applied to other instances
of polymorphism discovery.
|