An Entity-Based Algorithm for Multiple-Relation Extraction from Single Sentences

Hao, Q., Keppens, J. and Rodrigues, O.

Journal of Biomedical Informatics ?:.

April 2018

Abstract

A substantial amount of expert biomedical (and other) knowledge is accumulated in the form of natural language texts, such as peer-reviewed journal articles and conference abstracts. Often, such texts establish, substantiate, confirm, question or undermine associations or relationships between entities that are useful for a broad range of applications. Currently, the knowledge contained within a corpus of biomedical texts is rarely extracted and analysed in a systematic manner, because doing so manually is extremely time-consuming. Efficient techniques to automatically extract such knowledge would facilitate the development of novel applications that systematically review and reuse the knowledge contained in natural language texts. While such techniques are emerging, they often fail to handle complex sentence structures properly: they misinterpret them, fail to recognize embedded relationships, struggle with pronoun references, or have difficulty distinguishing the assertion of a relationship from a conjecture about its existence. In this paper, we present a new entity-based algorithm for multiple-relation extraction from unstructured text obtained in electronic form. The contribution of this paper is threefold. First, our algorithm is able to recognize pronoun references and extract the correct bio-entities they refer to. Second, our algorithm can deal with complex sentence structures. Third, we propose a novel approach, based on the concept of relationship polarity, to determine whether an extracted relationship should be recorded. Our approach offers clear performance advantages over conventional verb-based extraction techniques, achieving an extraction precision of 91.4% and a recall of 94% in an F-measure-based evaluation over a random sample of sentences from the biomedical literature. The algorithms can also be employed in other knowledge domains by changing the database used to identify entities and verbs of interest.
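For reference, the reported precision P = 0.914 and recall R = 0.94 can be combined into a single F-measure value. This assumes the balanced F1 score, the harmonic mean of precision and recall. The abstract does not state which weighting was used, so this is an illustrative calculation only, not a figure reported by the authors:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.914 \times 0.94}{0.914 + 0.94}
    = \frac{1.71832}{1.854}
    \approx 0.927
```

Under that assumption, the two figures correspond to an F1 of roughly 92.7%.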