U.S. Department of Energy

Pacific Northwest National Laboratory

NOUS: Incremental Maintenance of Knowledge Graphs

A knowledge graph is a repository of information about entities, where an entity can be anything of interest: a person, a location, an organization, or even a scientific topic or concept.  An entity is frequently characterized by its associations with other entities.  For example, capturing knowledge about a company involves describing its products, listing its key individuals, its locations and so on. Most big data applications today focus on a few core tasks: classification (which entities are being observed), link prediction (what is the likelihood of an emerging relationship), trend analysis, and search (is a pattern observed in the data).  As a source of contextual information, knowledge graphs improve these tasks through higher accuracy and better interpretation of results. However, graph construction is an expensive process that traditionally involves a large manual effort.  Today, the four V’s of Big Data (Volume, Velocity, Veracity and Variety) all amplify this challenge and motivate research to break this barrier.
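To make the company example concrete, the following is a minimal sketch of a knowledge graph held as (subject, predicate, object) triples. It is an illustration only, not the NOUS data model; the class name `KnowledgeGraph`, the entity `ExampleCorp`, and the predicate names are all hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy in-memory graph of (subject, predicate, object) triples.

    Hypothetical illustration; not the NOUS implementation.
    """

    def __init__(self):
        # Maps each subject entity to the set of (predicate, object) pairs.
        self._out = defaultdict(set)

    def add(self, subject, predicate, obj):
        """Record one association between two entities."""
        self._out[subject].add((predicate, obj))

    def describe(self, entity):
        """Return the entity's associations in sorted order."""
        return sorted(self._out.get(entity, set()))


# A company characterized by its products, key individuals and locations.
kg = KnowledgeGraph()
kg.add("ExampleCorp", "makes", "WidgetOne")
kg.add("ExampleCorp", "hasCEO", "Jane Doe")
kg.add("ExampleCorp", "locatedIn", "Richland")

for predicate, obj in kg.describe("ExampleCorp"):
    print("ExampleCorp", predicate, obj)
```

Storing associations per subject keeps the "entity characterized by its associations" view cheap to query; a production system would typically also index by object and predicate.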

Approach

Most databases that serve as a source of background information store text.  We have therefore focused primarily on constructing knowledge graphs from natural language text. We employ state-of-the-art natural language processing techniques to parse text documents and extract named entities and their relationships, a process that involves confronting the entire data deluge head-on.  We seek to circumvent the need for large manual effort by calling on a human expert only on a case-by-case basis, depending on the noisiness or ambiguity in the data.
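The case-by-case use of a human expert can be pictured as a routing step. The sketch below assumes that each extracted relationship carries a numeric confidence score and that a fixed threshold decides what needs review; both are assumptions made for illustration, and the names `CandidateTriple` and `route` are hypothetical rather than part of NOUS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTriple:
    """A relationship proposed by a text extractor (hypothetical structure)."""
    subject: str
    predicate: str
    obj: str
    confidence: float  # assumed extractor score in [0, 1]


def route(candidates, threshold=0.8):
    """Split candidates into auto-accepted triples and ones for expert review.

    The threshold value is an arbitrary assumption, not a NOUS setting.
    """
    accepted, needs_review = [], []
    for c in candidates:
        if c.confidence >= threshold:
            accepted.append(c)
        else:
            needs_review.append(c)  # noisy or ambiguous: ask a human expert
    return accepted, needs_review


candidates = [
    CandidateTriple("ExampleCorp", "acquired", "OtherCo", 0.95),
    CandidateTriple("ExampleCorp", "locatedIn", "Paris", 0.40),
]
accepted, needs_review = route(candidates)
print(len(accepted), "accepted;", len(needs_review), "sent for review")
```

Only the low-confidence candidates reach the expert, which is how such a scheme would keep manual effort proportional to the ambiguity in the data rather than to its volume.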

Benefits

Applications that need to scale to very large data sources will be the primary beneficiaries of this approach.  An example application is monitoring emerging technical trends by processing the entire volume of scientific articles published in a niche area.  Entities in such knowledge graphs are authors, institutions, and scientific concepts, and relationships range from simple.

https://github.com/streaming-graphs/NOUS

 
