MARTT: Automatic Markup of Taxonomic Descriptions with XML

Authors

  • Hong Cui, University of Western Ontario

DOI:

https://doi.org/10.29173/cais277

Abstract

Despite the sub-language nature of taxonomic descriptions of animals and plants, researchers have warned of large variations among description collections, both in information content and in how that content is represented. These variations pose a serious threat to the development of automatic tools for structuring large volumes of text-based descriptions. This paper presents a general approach to marking up different collections of taxonomic descriptions with XML, using two large-scale floras as examples. The markup system, MARTT, is based on machine learning methods and enhanced by machine-learned domain rules and conventions. Experiments show that our simple and efficient machine learning algorithms significantly outperform general-purpose algorithms, and that rules learned from one flora can be reused when marking up a second flora to improve markup performance, especially for elements with sparse training examples.
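The abstract does not spell out MARTT's algorithms. As a rough illustration of the general idea of learned XML markup (assigning each clause of a description to an XML element with a trained classifier), the sketch below uses a multinomial naive Bayes classifier as a stand-in; it is not claimed to be MARTT's method. The element names (`leaf`, `stem`, `flower`), the splitting of descriptions into clauses at periods and semicolons, the `NaiveBayesTagger` class, the `mark_up` function, and the toy training clauses in `TRAIN` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from xml.etree import ElementTree as ET

# Hypothetical toy training data: (clause, XML element name) pairs.
TRAIN = [
    ("Leaves alternate, ovate, 3-5 cm", "leaf"),
    ("leaf blades lanceolate, glabrous", "leaf"),
    ("Flowers solitary, axillary", "flower"),
    ("petals 5, white, obovate", "flower"),
    ("Stems erect, branched, hairy", "stem"),
    ("stem woody at base", "stem"),
]


def tokenize(text):
    """Lowercase alphabetic tokens only; numbers and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTagger:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        """Return the element name with the highest log-posterior score."""
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                # Laplace-smoothed log likelihood of each token.
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


def mark_up(description, tagger):
    """Split a description into clauses and wrap each in a predicted element."""
    root = ET.Element("description")
    for clause in (c.strip() for c in re.split(r"[.;]", description)):
        if clause:
            ET.SubElement(root, tagger.predict(clause)).text = clause
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    tagger = NaiveBayesTagger()
    tagger.train(TRAIN)
    print(mark_up("Stems erect; leaves ovate; petals white.", tagger))
```

With the toy data above, the example description comes out as `<description><stem>Stems erect</stem><leaf>leaves ovate</leaf><flower>petals white</flower></description>`. The machine-learned domain rules and conventions that the abstract describes, including their transfer from one flora to another, are not modeled in this sketch.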



Published

2013-10-19

How to Cite

Cui, H. (2013). MARTT: Automatic Markup of Taxonomic Descriptions with XML. Proceedings of the Annual Conference of CAIS / Actes du congrès annuel de l’ACSI. https://doi.org/10.29173/cais277
