
Do Botanists Dream of Electric Sheep?
Botany is a tricky subject when it comes to big data. The study of plants, like the study of most biological subjects, has expanded into the genome-mapping realm.
Storing all of the DNA data of any genome is quite data-intensive, especially when it comes to exotic plant life. Further, while physics and chemistry have for the most part settled on universally accepted terms that split into neat categories when they need to be queried, botany’s vocabulary is much more diverse, making it difficult to retrieve necessary information when conducting a study.
To remedy this situation, Ramona Walls of the New York Botanical Garden and several colleagues across the world of botany have developed an ontological guide for “accessing and analyzing the rapidly growing pool of plant genomic and phenomic data.” Essentially, Walls et al. set out to accomplish two things: provide standard definitions, and organize and cross-reference those definitions so that computers can both find and analyze them easily.
“By providing standardized definitions for the terms used by scientists to represent these classes, and by defining the logical relationships among these terms, ontologies make information about content explicit for computers, allowing them to discover common meaning in diverse data sets.”
A big difference between humans and computers is the ability to understand nuance in language. It is a skill we develop as we learn language for the first time, making it more or less natural and therefore more difficult to teach or, in a computer’s case, program. In this specific case, a trained botanist would know that the words petiole, midrib, and frond are related to the word leaf in that frond is a type of leaf and petiole and midrib are parts of a leaf. A simple computer search engine would not.
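To make the leaf example concrete, here is a minimal Python sketch of how a search could use ontology-style relationships. Everything in it is an assumption for illustration: the two relation tables, the function names, and the three sample document titles are made up, and real ontologies such as the Plant Ontology are far larger and use their own formats and identifiers.

```python
# Toy ontology: every term, relation, and document here is illustrative only.
IS_A = {"frond": "leaf"}                          # a frond is a type of leaf
PART_OF = {"petiole": "leaf", "midrib": "leaf"}   # these are parts of a leaf


def related_terms(term):
    """Return the term plus every term that is a kind of it or a part of it."""
    found = {term}
    changed = True
    while changed:  # repeat until no new terms are discovered
        changed = False
        for relation in (IS_A, PART_OF):
            for child, parent in relation.items():
                if parent in found and child not in found:
                    found.add(child)
                    changed = True
    return found


def search(query, documents):
    """Return documents that mention the query or any related term."""
    terms = related_terms(query)
    return [
        doc for doc in documents
        if any(word in terms for word in doc.lower().split())
    ]


# Hypothetical document titles, invented for this example.
DOCUMENTS = [
    "Petiole length varies with light exposure",
    "Frond development in tree ferns",
    "Root hair growth under drought",
]

# A plain keyword search for "leaf" finds nothing in these titles.
naive = [doc for doc in DOCUMENTS if "leaf" in doc.lower().split()]

# The ontology-aware search finds the petiole and frond documents.
matches = search("leaf", DOCUMENTS)
```

The simple search misses both relevant titles because neither contains the word “leaf,” while the expanded search picks them up through the stored relationships. That is the gap between the trained botanist and the simple search engine described above.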
This would not be a problem if researchers could sift through all the relevant research themselves to complete a study. But given how many papers exist in the botanical world, and how much data backs up those papers, it becomes necessary to enlist the computer’s help. For example, as of right now, there are 25 species of plant whose genomes have been completely mapped.
“Data overload is an issue for nearly every branch of plant science. Complete genomes exist for 25 plant species, with more in progress (Joint Genome Institute, 2012), and new high throughput gene expression, proteomics, and phenomics data sets are being generated continuously.”
Walls is interested not just in being able to search easily, but also in being able to do analysis. One of the four key areas the paper identified as major future uses of ontologies was comparative genetics, genomics, phenomics, and development. Big data analytics has been very helpful to many medical researchers studying human genomes to develop personalized medicine. Walls hopes such analytics can be similarly useful to botany.
So why has botany been relatively slow to adopt these tools? According to Walls, it has a lot to do with the exotic nature of the study subjects. “It is common for the same biological entity to have different names in different taxa. For example, vascular leaf may be called ‘frond’ in cycads, ferns, or palms and ‘needle’ in some conifers. In another example, ‘BBCH principal growth stage 6’ is used in a very specialized way by the Z. mays community for flowering stage.”
It should be noted that Walls’s paper is not itself an ontology, but rather a discussion of the most significant ontologies and their importance to botany. The paper especially lauded the Plant Ontology (PO) for providing definitions that are both rigorous and flexible and that fit well into a computer model. “The ontology approach embraced by the PO does not, however, seek to impose a single, inflexible vocabulary across the whole of plant science. Rather, its strategy of using ontology terms to enhance existing data through annotations is compatible with an approach that involves the use of multiple terminologies by different communities of scientists.”
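The annotation strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the synonym table, the record fields, and the measurements are hypothetical, and the table simplifies the paper’s “some conifers” to a single group. The PO itself assigns formal identifiers to its terms, which are deliberately not reproduced here.

```python
# Hypothetical synonym table: (community term, taxon group) -> shared term.
# The groupings follow the examples quoted in the article; a real mapping
# would be more fine-grained and would use ontology identifiers.
SYNONYMS = {
    ("frond", "cycads"): "vascular leaf",
    ("frond", "ferns"): "vascular leaf",
    ("frond", "palms"): "vascular leaf",
    ("needle", "conifers"): "vascular leaf",
}


def annotate(record):
    """Return a copy of the record with a shared ontology term added.

    The community's own term is kept untouched; the annotation is None
    when no mapping is known.
    """
    annotated = dict(record)
    key = (record["term"], record["taxon"])
    annotated["ontology_term"] = SYNONYMS.get(key)
    return annotated


# Made-up records from two communities that use different vocabularies.
records = [
    {"term": "frond", "taxon": "ferns", "length_cm": 42.0},
    {"term": "needle", "taxon": "conifers", "length_cm": 3.5},
]

annotated = [annotate(r) for r in records]

# Both records can now be queried together without renaming anything.
leaves = [r for r in annotated if r["ontology_term"] == "vascular leaf"]
```

Each community keeps its own word (“frond,” “needle”), and the added annotation is what lets a computer treat the two records as comparable.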
As a result of its complexity and diversity, botany was slow to enter the analytics arena, according to Walls, and the need to incorporate big data systems into it has been increasing with each new study. However, Walls sees the PO as the tool that will make botanical data accessible to computers, allowing her subject to advance with the rest of science.