LE QUY DON
Technical University

Optimizing genetic algorithm in feature selection for named entity recognition

Thanh, H.L.E. and Van Tran, L. and Nguyen, T.H. and Nguyen, X.H. (2015) Optimizing genetic algorithm in feature selection for named entity recognition. In: 6th International Symposium on Information and Communication Technology, SoICT 2015, 3 December 2015 through 4 December 2015.

Abstract

This paper proposes three strategies to reduce the running time of genetic algorithms used for feature selection in named entity recognition: (i) reducing the population size during the evolution process of the genetic algorithm; (ii) parallelizing the fitness computation; and (iii) using progressive sampling to calculate the optimal sample size of the training data. The Maximum Entropy algorithm is then used as a test classifier to compute the accuracy of the named entity recognition system with the reduced feature sets identified by the genetic algorithm. Experimental results show that our improved genetic algorithm runs three times faster than the standard genetic algorithm, while the accuracy of the named entity recognition system (using Maximum Entropy) on the induced feature subset does not decrease. In addition, the feature subset induced by our improved genetic algorithm is much smaller than the original feature set and helps Maximum Entropy achieve higher accuracy than it does with the original feature set. © 2015 ACM.
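
The three strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the population schedule, the parallelization mechanism, or the sampling schedule, so everything below is an assumption made for illustration. In particular, `toy_fitness` and `INFORMATIVE` are hypothetical stand-ins for the accuracy of a Maximum Entropy classifier, the `shrink` factor and `min_pop` floor are invented parameters, a thread pool stands in for whatever parallelism the paper uses, and `progressive_sample_size` uses a simple geometric schedule with a plateau test.

```python
import random
from concurrent.futures import ThreadPoolExecutor

N_FEATURES = 20
INFORMATIVE = set(range(5))  # hypothetical: pretend only features 0-4 are useful


def toy_fitness(mask):
    """Stand-in for classifier accuracy: reward useful features, penalize subset size."""
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)


def evaluate_population(population, fitness, workers=4):
    """Strategy (ii): score all individuals concurrently (order is preserved)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))


def crossover(a, b, rng):
    """Single-point crossover of two bit masks."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(mask, rng, rate=0.05):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in mask]


def run_ga(fitness, n_features, pop_size=40, min_pop=10, shrink=0.9,
           generations=30, seed=0):
    """Genetic algorithm over feature masks with a shrinking population."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_mask, best_score = None, float("-inf")
    for _ in range(generations):
        scores = evaluate_population(population, fitness)
        ranked = sorted(zip(scores, population), key=lambda p: p[0], reverse=True)
        if ranked[0][0] > best_score:
            best_score, best_mask = ranked[0]
        # Strategy (i): reduce the population size each generation (assumed schedule).
        pop_size = max(min_pop, int(pop_size * shrink))
        parents = [mask for _, mask in ranked[:max(2, pop_size // 2)]]
        population = [ranked[0][1]]  # elitism: carry the best individual over
        while len(population) < pop_size:
            a, b = rng.sample(parents, 2)
            population.append(mutate(crossover(a, b, rng), rng))
    return best_mask, best_score


def progressive_sample_size(accuracy_at, start=100, factor=2,
                            max_size=10000, tol=0.005):
    """Strategy (iii): grow the sample geometrically until the accuracy gain drops below tol."""
    size, prev = start, accuracy_at(start)
    while size * factor <= max_size:
        nxt = size * factor
        acc = accuracy_at(nxt)
        if acc - prev < tol:
            return size  # the larger sample no longer helps; keep the current size
        size, prev = nxt, acc
    return size


best_mask, best_score = run_ga(toy_fitness, N_FEATURES)
print("selected features:", [i for i, bit in enumerate(best_mask) if bit])
print("fitness:", best_score)
```

In a real setting the fitness call would train and evaluate a classifier on the training sample chosen by the progressive sampling step, which is where most of the running time goes. For CPU-bound fitness functions in Python, a process pool would likely be a better fit than the thread pool used here.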

Item Type: Conference or Workshop Item (Paper)
Divisions: Faculties > Faculty of Information Technology
Identification Number: 10.1145/2833258.2833262
Uncontrolled Keywords: Entropy; Evolutionary algorithms; Feature extraction; Genetic algorithms; Maximum entropy methods; Population statistics; Sampling; Evolution process; Fitness computation; Maximum entropy algorithms; Named entity recognition; Parallelizations; Population sizes; Progressive sampling; Standard genetic algorithm; Algorithms
Additional Information: Conference code: 119164. Language of original document: English.
URI: http://eprints.lqdtu.edu.vn/id/eprint/9887
