Enterprise knowledge management models are vital for enterprises managing growing data volumes. They help capture, store, and share knowledge, improving decision-making and efficiency. A key challenge is linking unstructured data, which includes emails, documents, and media, with the structured data found in spreadsheets and databases. Gartner estimates that 80% of today’s data is unstructured and often untapped by enterprises. Without integrating this data into the knowledge ecosystem, businesses miss valuable insights. Knowledge graphs address this by linking unstructured data to the rest of the enterprise's information, which improves search, decision-making, and efficiency, and fosters innovation.
How Do Knowledge Graphs (KG) Work?
A knowledge graph is a structured representation where entities (nodes) and relationships (edges) are clearly defined. For instance, in an enterprise knowledge management model, knowledge graphs can link structured data like customer purchases with unstructured data like reviews or social media posts, providing a holistic view of customer behavior. A well-known example is Google’s Knowledge Graph, which enhances search functions by linking structured and unstructured data for contextual results.
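As a minimal sketch of the node-and-edge idea, the snippet below stores a graph as (subject, relation, object) triples and indexes the outgoing edges of each node. The customer, product, and review identifiers and the relation names are made up for illustration.

```python
from collections import defaultdict

# Each edge is a (subject, relation, object) triple. All identifiers are hypothetical.
edges = [
    ("customer:42", "PURCHASED", "product:laptop"),  # from structured sales data
    ("customer:42", "WROTE", "review:981"),          # from unstructured review text
    ("review:981", "MENTIONS", "product:laptop"),
]

def build_adjacency(triples):
    """Index outgoing edges by their source node."""
    adjacency = defaultdict(list)
    for subject, relation, obj in triples:
        adjacency[subject].append((relation, obj))
    return adjacency

graph = build_adjacency(edges)
print(graph["customer:42"])
```

Looking up one customer returns both the purchase and the review, which is the "holistic view" the paragraph above describes in miniature.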
Linking Unstructured Data in Knowledge Graphs
While structured data can be seamlessly integrated into knowledge graphs, unstructured data requires more finesse. The process begins by gathering structured data from databases and ensuring consistency in formats. For unstructured data, NLP techniques extract information from text, while image recognition handles multimedia. After the data is cleaned and standardized, key entities and relationships are identified to structure the graph. Finally, data interoperability is achieved by developing a schema and linking the datasets to it.
In the healthcare sector, linking patient records (unstructured data) with clinical reports (structured data) via knowledge graphs can enable personalized treatment recommendations. For instance, combining historical data from patient records with research on medical conditions can help clinicians identify optimal treatment pathways. Retail businesses can use knowledge graphs to merge structured sales data with unstructured customer reviews and social media mentions, creating a holistic view of customer preferences and market trends. This insight helps businesses customize product offerings and improve customer experience.
Building Knowledge Graphs from Unstructured Data
Several techniques can be used to build knowledge graphs from unstructured text. The sections below cover three of them.
Automating Knowledge Graph Construction from Unstructured Text Using LLMs
LLMs can be utilized to automate the creation of knowledge graphs from unstructured text data, a task that traditionally required significant manual effort.
LLMs extract nodes and edges from text, transforming it into structured data. Large texts are split into smaller, overlapping chunks so that context is preserved across chunk boundaries. A predefined list of node types keeps the naming of similar entities consistent. During entity disambiguation, LLMs identify and merge duplicate entities across chunks, creating a unified representation. Once extracted, the data is formatted into CSV files and imported into Neo4j, a graph database platform designed for interconnected data. Neo4j’s Data Importer allows for a final review before constructing a fully queryable knowledge graph.
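The overlapping-chunk step can be sketched as follows. This is an illustrative word-based splitter, not the chunking used by any particular tool; the function name and the default sizes are assumptions.

```python
def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
for chunk in split_into_chunks(sample, chunk_size=4, overlap=1):
    print(chunk)
```

Each chunk begins with the last word of the previous one, so an entity mentioned at a chunk boundary still appears with some of its surrounding context.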
For example, suppose an electronics seller receives product reviews for electronics like laptops, phones, and tablets. The LLM extracts information like:
Products: “Laptop,” “Mobile Phone,” “Tablet”
Properties: “Brand” (e.g., Apple, Samsung), “Features” (e.g., Touchscreen, 5G), “Reviews” (e.g., ratings)
If the same product is mentioned multiple times under different names, the LLM recognizes that “Samsung Galaxy S23” and “Samsung S23” refer to the same product and merges them into a single entry that combines all related details. Finally, the data is imported into Neo4j, creating a knowledge graph that makes it easy to track product reviews, features, and trends.
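The merge-and-export step might look roughly like this. The extracted records and the alias table are hard-coded stand-ins for what an LLM would produce, and the CSV column names are an assumption rather than a format required by Neo4j.

```python
import csv
import io

# Hypothetical output of an LLM extraction pass over several review chunks.
extracted = [
    {"name": "Samsung Galaxy S23", "type": "Product", "feature": "5G"},
    {"name": "Samsung S23", "type": "Product", "feature": "Touchscreen"},
    {"name": "Apple iPad", "type": "Product", "feature": "Touchscreen"},
]

# Maps variant names to one canonical name. In the workflow described above an LLM
# would propose these merges; here they are hard-coded for illustration.
ALIASES = {"Samsung S23": "Samsung Galaxy S23"}

def merge_entities(records, aliases):
    """Group records by canonical name and collect their features."""
    merged = {}
    for record in records:
        canonical = aliases.get(record["name"], record["name"])
        entry = merged.setdefault(canonical, {"type": record["type"], "features": set()})
        entry["features"].add(record["feature"])
    return merged

def to_csv(merged):
    """Render merged nodes as CSV text with a header row and one row per entity."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "type", "features"])
    for name in sorted(merged):
        features = ";".join(sorted(merged[name]["features"]))
        writer.writerow([name, merged[name]["type"], features])
    return buffer.getvalue()

nodes = merge_entities(extracted, ALIASES)
print(to_csv(nodes))
```

The two Samsung records collapse into one row carrying both features, and the resulting text can be saved as a file for review before import.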
LLMs offer scalability and can process vast amounts of unstructured data efficiently. By automating entity extraction and relationship building, they significantly reduce manual efforts. This makes LLM-based methods better suited for enterprises with large-scale datasets, where speed and automation are critical.
Automating Knowledge Graph Construction from Unstructured Text Using NLP Techniques and RDF Models
To create a knowledge graph, text is first collected from various sources such as documents and websites. Ambiguous references are then resolved through entity disambiguation: vague pronouns are replaced with the specific terms they refer to (a step often called coreference resolution), using tools like spaCy. Named Entity Recognition (NER) identifies key entities such as names and places, while entity linking connects these entities to a knowledge base for additional context. Relationships between entities are then extracted as triples (subject-predicate-object statements) to illustrate connections. Finally, these relationships are stored in a knowledge graph using RDF (Resource Description Framework), allowing for easy querying and analysis.
Consider this example: when analyzing medical literature about “cardiac amyloidosis,” abstracts from PubMed are collected using Biopython, a Python library for biological computation. Ambiguous terms like “it” are clarified through entity disambiguation, which replaces them with the specific term they refer to, such as “High fever.” NER then identifies key entities such as “cardiac amyloidosis” and links them to a knowledge base like the National Cancer Institute Thesaurus (NCIT). Relationships, such as “cardiac amyloidosis is associated with amyloid proteins,” are extracted using REBEL, a triple-extraction model. Finally, these relationships are organized into a knowledge graph using RDF, resulting in a structured and easily queryable format.
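To make the final storage step concrete, here is a small sketch that writes an extracted triple as an N-Triples line, one of the plain-text RDF serializations. The `http://example.org/kg/` namespace and the helper names are placeholders, and in practice an RDF library such as rdflib would normally handle serialization.

```python
from urllib.parse import quote

BASE = "http://example.org/kg/"  # placeholder namespace, not a real vocabulary

def to_iri(label):
    """Turn a free-text label into an IRI under the placeholder namespace."""
    slug = label.strip().lower().replace(" ", "_")
    return f"<{BASE}{quote(slug)}>"

def to_ntriples(triples):
    """Serialize (subject, predicate, object) label triples as N-Triples lines."""
    return "\n".join(f"{to_iri(s)} {to_iri(p)} {to_iri(o)} ." for s, p, o in triples)

# The relationship from the example above, as it might come out of triple extraction.
triples = [("cardiac amyloidosis", "associated with", "amyloid proteins")]
print(to_ntriples(triples))
```

A real pipeline would replace the generated IRIs with the identifiers returned by entity linking, so that the same concept always resolves to the same node.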
While LLMs offer scalability and automation, they might lack the precision needed for domain-specific data. NLP techniques and RDF models provide better control over entity recognition and relationship extraction, which supports higher accuracy in the final knowledge graph. RDF also enforces semantic consistency and is more reliable in complex data scenarios, making this approach better suited for environments that prioritize accuracy over speed, such as regulatory compliance or highly specialized fields.
Enhancing RAG with Knowledge Graphs for Improved Information Retrieval
Standard RAG systems can struggle with finding connections between different text chunks because they handle each part of the text separately. By using a knowledge graph, these text chunks can be linked together, making it easier for the system to understand and retrieve related information using a vector index. This connection helps in answering queries more accurately by recognizing relationships between different pieces of data.
Another enhancement involves converting text into triples (simple statements like “X is related to Y”) using models like REBEL and storing these triples in a graph database. This approach improves efficiency by making it faster and easier to retrieve and analyze information.
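The chunk-linking idea can be sketched as below. Keyword overlap stands in for vector similarity so the example runs without an embedding model, and the chunks, entity tags, and function names are all hypothetical.

```python
import re

# Hypothetical chunks, each tagged with the entities extracted from it.
chunks = {
    "c1": {"text": "Purchase order 1001 is awaiting approval.", "entities": {"PO-1001"}},
    "c2": {"text": "Orders above the limit need director sign-off.", "entities": {"Policy-7"}},
    "c3": {"text": "PO-1001 falls under Policy-7.", "entities": {"PO-1001", "Policy-7"}},
    "c4": {"text": "Quarterly travel spend fell.", "entities": {"Travel"}},
}

def keyword_score(query, text):
    """Stand-in for vector similarity: number of shared lowercase word tokens."""
    tokenize = lambda s: set(re.findall(r"\w+", s.lower()))
    return len(tokenize(query) & tokenize(text))

def retrieve(query, chunks, top_k=1, hops=1):
    """Pick the top_k best-scoring chunks, then follow shared entities for `hops` steps."""
    ranked = sorted(chunks, key=lambda cid: keyword_score(query, chunks[cid]["text"]), reverse=True)
    selected = ranked[:top_k]
    frontier = list(selected)
    for _ in range(hops):
        entities = set().union(*(chunks[cid]["entities"] for cid in frontier))
        # Add unselected chunks that mention any entity seen in the current frontier.
        frontier = [cid for cid in chunks
                    if cid not in selected and chunks[cid]["entities"] & entities]
        selected.extend(frontier)
    return selected

print(retrieve("why is purchase order 1001 delayed", chunks, top_k=1, hops=2))
```

With no graph expansion only the purchase order chunk is returned; following shared entities also brings in the chunk that ties the order to the policy and then the policy text itself, even though neither matches the query wording well.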
Consider a financial department facing challenges in tracking purchase orders, policy compliance, and financial performance. For instance, the department needs to determine why a particular purchase order was delayed and ensure it aligns with internal policies. AI Fortune Cookie, a secure enterprise knowledge management model, addresses these issues by using knowledge graphs to link text chunks from different sources, such as purchase orders, policy documents, and financial reports. This interconnected data allows the system to trace relationships, revealing, for example, that a purchase order delay occurred due to a missing approval outlined in a specific policy document.
Using RAG, the platform efficiently retrieves relevant information from the knowledge graph, ensuring that the full context is considered in every query. The system converts the information into triples, such as “Purchase Order X is delayed due to Policy Y,” and stores them in a graph database, making it easy for employees to find linked information quickly. By streamlining the retrieval of connected data, AI Fortune Cookie helps resolve issues more efficiently, ensuring policy compliance and improving financial performance management.
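As a rough illustration of how stored triples answer such a question, the sketch below matches a pattern against an in-memory list of triples. It is not how AI Fortune Cookie is implemented; the relation names and the `match` helper are assumptions, and a graph database would run the equivalent query in its own query language.

```python
# Hypothetical triples of the kind described above.
triples = [
    ("Purchase Order X", "DELAYED_DUE_TO", "Policy Y"),
    ("Policy Y", "REQUIRES", "Director Approval"),
    ("Purchase Order X", "MISSING", "Director Approval"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

# Why was the order delayed, and what does the blocking policy require?
for _, _, policy in match(triples, subject="Purchase Order X", predicate="DELAYED_DUE_TO"):
    print(policy, match(triples, subject=policy, predicate="REQUIRES"))
```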
Enhancing RAG with knowledge graphs takes information retrieval a step further by linking text chunks in a cohesive way, improving contextual understanding and retrieval accuracy. Instead of treating text chunks separately, it uses the relationships in a knowledge graph to connect them, making it better at answering queries with a holistic view. By converting unstructured text into triples and retrieving them through RAG, this approach can deliver faster and more accurate information retrieval than the previous two methods, especially in dynamic enterprise environments where real-time data access is critical.
At Random Walk, we help you leverage enterprise knowledge management models with tailored AI integration services. Contact us for a personalized consultation and a demo of AI Fortune Cookie, our data visualization tool that uses generative AI to manage and visualize your structured and unstructured enterprise data for efficient operations.