Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction

Yi Luan, Luheng He, Mari Ostendorf, Hannaneh Hajishirzi.

Abstract

We introduce a multi-task setup of identifying entities, relations, and coreference clusters in scientific articles. We create SCIERC, a dataset that includes annotations for all three tasks, and develop a unified framework called SciIE with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.

The details can be found in our paper:

Datasets

Check out our raw dataset, our processed dataset (tokenized, in JSON format, together with ELMo embeddings), and the annotation guideline.

Our dataset (called SCIERC) includes annotations for scientific entities, their relations, and coreference clusters for 500 scientific abstracts. These abstracts are taken from 12 AI conference/workshop proceedings in four AI communities, from the Semantic Scholar Corpus. SCIERC extends previous datasets for scientific articles, SemEval 2017 Task 10 and SemEval 2018 Task 7, by extending entity types, relation types, and relation coverage, and by adding cross-sentence relations using coreference links.
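To make the three annotation layers concrete, here is a minimal Python sketch of how one annotated abstract could be held in memory, and how a coreference link lets a relation cross a sentence boundary. The class names, field names, token-index convention, and the toy sentence are all assumptions made for illustration; this is not the layout of the released files, and the labels shown are only examples.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    start: int  # index of the first token (inclusive)
    end: int    # index of the last token (inclusive)
    label: str  # entity type; the values used below are illustrative


@dataclass(frozen=True)
class Relation:
    head: Entity
    tail: Entity
    label: str  # relation type; the value used below is illustrative


@dataclass
class AnnotatedAbstract:
    # Hypothetical container for one abstract, not the released format.
    tokens: list[str]
    entities: list[Entity] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)
    clusters: list[list[Entity]] = field(default_factory=list)

    def text(self, entity: Entity) -> str:
        return " ".join(self.tokens[entity.start:entity.end + 1])


def resolve_relations(doc: AnnotatedAbstract) -> list[tuple[str, str, str]]:
    """Rewrite each relation so that mentions point to the first mention
    of their coreference cluster (an assumed choice of representative)."""
    canonical = {m: cluster[0] for cluster in doc.clusters for m in cluster}
    return [
        (
            doc.text(canonical.get(r.head, r.head)),
            r.label,
            doc.text(canonical.get(r.tail, r.tail)),
        )
        for r in doc.relations
    ]


# Made-up two-sentence example: "It" in sentence 2 refers to "parser".
tokens = "We propose a parser . It is evaluated on newswire text .".split()
parser = Entity(3, 3, "Method")
it = Entity(5, 5, "Generic")
corpus = Entity(9, 10, "Material")

doc = AnnotatedAbstract(
    tokens=tokens,
    entities=[parser, it, corpus],
    relations=[Relation(corpus, it, "Evaluate-for")],
    clusters=[[parser, it]],
)

resolved = resolve_relations(doc)
print(resolved)
```

The annotated relation only connects "newswire text" to the pronoun inside the second sentence; following the coreference cluster turns it into a relation between "newswire text" and "parser", which live in different sentences.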

An annotation example is as follows:

Code

Our method, SciIE, is a unified framework for identifying entities, relations, and coreference clusters in scientific articles with shared span representations. Check out our Bitbucket repository.

Application for Knowledge Graph Construction

With SciIE, we are able to extract entities, relations, and coreference clusters from a large collection of scientific papers. We construct a scientific knowledge graph from a large corpus of scientific articles. The corpus includes all abstracts (110k in total) from 12 AI conference proceedings from the Semantic Scholar Corpus. Nodes in the knowledge graph correspond to scientific entities. Edges correspond to scientific relations between pairs of entities. A part of an automatically constructed scientific knowledge graph is as follows:
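The node-and-edge structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the construction procedure used for the graph shown here: the function name `build_graph`, the lowercase normalization used to merge mentions, the frequency counts, and the input triples are all assumptions made up for the example.

```python
from collections import Counter, defaultdict


def build_graph(triples):
    """Aggregate (head, relation, tail) triples into a graph.

    Nodes are entity strings, merged by a simple lowercase/strip
    normalization (an assumed heuristic). Edges are keyed by the
    (head, tail) pair and count how often each relation label occurs.
    """
    nodes = Counter()
    edges = defaultdict(Counter)
    for head, relation, tail in triples:
        h, t = head.strip().lower(), tail.strip().lower()
        nodes[h] += 1
        nodes[t] += 1
        edges[(h, t)][relation] += 1
    return nodes, edges


# Made-up extractions standing in for model output over several abstracts.
triples = [
    ("CRF", "Used-for", "named entity recognition"),
    ("crf", "Used-for", "named entity recognition"),
    ("LSTM", "Used-for", "named entity recognition"),
]

nodes, edges = build_graph(triples)
for (head, tail), labels in edges.items():
    for label, count in labels.items():
        print(f"{head} --{label} ({count})--> {tail}")
```

Keeping counts on nodes and edges is one simple way to rank or filter extractions that recur across many abstracts; a real pipeline would likely need a stronger entity-merging step than lowercasing.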