The Random Walk Blog

2024-09-23

Linking Unstructured Data in Knowledge Graphs for Enterprise Knowledge Management

Enterprise knowledge management is vital for organizations handling growing data volumes. It helps capture, store, and share knowledge, improving decision-making and efficiency. A key challenge is linking unstructured data such as emails, documents, and media, which lacks the fixed schema of structured data found in spreadsheets or databases. Gartner estimates that 80% of today’s data is unstructured, and much of it goes untapped by enterprises. Without integrating this data into the knowledge ecosystem, businesses miss valuable insights. Knowledge graphs address this by linking unstructured data to the rest of the enterprise's knowledge, which improves search, decision-making, and efficiency and fosters innovation.

How Do Knowledge Graphs (KG) Work?

A knowledge graph is a structured representation where entities (nodes) and relationships (edges) are clearly defined. For instance, in an enterprise knowledge management model, knowledge graphs can link structured data like customer purchases with unstructured data like reviews or social media posts, providing a holistic view of customer behavior. A well-known example is Google’s Knowledge Graph, which enhances search functions by linking structured and unstructured data for contextual results.
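As a minimal sketch of the node-and-edge idea, the snippet below keeps a tiny knowledge graph in memory. The class name `KnowledgeGraph`, the identifiers such as `customer:42`, and the relation labels are all hypothetical and chosen only for illustration; a production system would typically use a graph database instead.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny in-memory graph: nodes carry a type, edges carry a relation label."""

    def __init__(self):
        self.nodes = {}                 # node id -> node type
        self.edges = defaultdict(list)  # source id -> [(relation, target id)]

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, source, relation, target):
        # Both endpoints must exist so every edge connects known entities.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("add both nodes before linking them")
        self.edges[source].append((relation, target))

    def neighbors(self, node_id, relation=None):
        """Return targets reachable from node_id, optionally filtered by relation."""
        return [t for r, t in self.edges.get(node_id, [])
                if relation is None or r == relation]


kg = KnowledgeGraph()
kg.add_node("customer:42", "Customer")
kg.add_node("order:1001", "Purchase")     # structured data, e.g. a sales record
kg.add_node("review:77", "Review")        # unstructured data, e.g. a free-text review
kg.add_node("product:laptop", "Product")
kg.add_edge("customer:42", "PLACED", "order:1001")
kg.add_edge("customer:42", "WROTE", "review:77")
kg.add_edge("order:1001", "CONTAINS", "product:laptop")
kg.add_edge("review:77", "MENTIONS", "product:laptop")

print(kg.neighbors("customer:42"))           # ['order:1001', 'review:77']
print(kg.neighbors("customer:42", "WROTE"))  # ['review:77']
```

Because the purchase and the review both point at the same product node, a single traversal from the customer reaches structured and unstructured information together.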

Linking Unstructured Data in Knowledge Graphs

While structured data can be integrated into knowledge graphs fairly directly, unstructured data requires more finesse. The process begins by gathering structured data from databases and ensuring consistency in formats. For unstructured data, NLP techniques extract entities and facts from text, while image recognition handles multimedia. After the data is cleaned and standardized, key entities and relationships are identified to structure the graph. Finally, data interoperability is achieved by developing a shared schema and linking the datasets against it.

In the healthcare sector, linking patient records (unstructured data) with clinical reports (structured data) via knowledge graphs can enable personalized treatment recommendations. For instance, combining historical data from patient records with research on medical conditions can help clinicians identify optimal treatment pathways. Retail businesses can use knowledge graphs to merge structured sales data with unstructured customer reviews and social media mentions, creating a holistic view of customer preferences and market trends. This insight helps businesses customize product offerings and improve customer experience.

Building Knowledge Graphs from Unstructured Data

There are several ways to build knowledge graphs from unstructured text. Three are covered here: LLM-based extraction, NLP pipelines with RDF models, and knowledge-graph-enhanced RAG.

Automating Knowledge Graph Construction from Unstructured Text Using LLMs

LLMs can be utilized to automate the creation of knowledge graphs from unstructured text data, a task that traditionally required significant manual effort.

LLMs extract nodes and edges from text, transforming it into structured data. Large texts are split into smaller, overlapping chunks so that context is preserved across chunk boundaries. A predefined list of node types ensures consistency in naming similar entities. During entity disambiguation, LLMs identify and merge duplicate entities across different chunks, creating a unified representation. Once extracted, the data is formatted into CSV files and imported into Neo4j, a graph database platform designed for interconnected data. Neo4j’s Data Importer allows for a final review before constructing a fully queryable knowledge graph.

[Figure: knowledge graph creation]

Source: Mayerhofer, Noah. Construct Knowledge Graphs From Unstructured Text
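The chunking step described above can be sketched as follows. This is only an illustration: it splits on whitespace-separated words, and the function name `split_into_chunks` and the default sizes are assumptions, not values taken from the cited workflow (real pipelines often count tokens rather than words).

```python
def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in split_into_chunks(sample, chunk_size=4, overlap=2):
    print(chunk)
```

With `chunk_size=4` and `overlap=2`, each chunk repeats the last two words of the previous one, so an entity mentioned near a boundary still appears with some of its surrounding context in at least one chunk.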

For example, suppose an electronics seller receives product reviews for electronics like laptops, phones, and tablets. The LLM extracts information like:

Products: “Laptop,” “Mobile Phone,” “Tablet”

Properties: “Brand” (e.g., Apple, Samsung), “Features” (e.g., Touchscreen, 5G), “Reviews” (e.g., ratings)

If the same phone appears under different names across reviews, the LLM recognizes that “Samsung Galaxy S23” and “Samsung S23” refer to the same product. It merges these into a single entry, combining all related details. Finally, the data is imported into Neo4j, creating a knowledge graph that makes it easy to track product reviews, features, and trends.
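To make the merge-and-export step concrete, here is a rough sketch. The alias table stands in for the LLM's disambiguation decision and is hand-written for this example; the field names, ratings, and CSV columns are likewise hypothetical and not a format required by Neo4j.

```python
import csv
import io

# Hypothetical alias table; in the workflow above an LLM would propose these merges.
ALIASES = {
    "samsung s23": "Samsung Galaxy S23",
    "samsung galaxy s23": "Samsung Galaxy S23",
}


def canonical_name(mention):
    """Map a product mention to its canonical name, or keep it unchanged."""
    return ALIASES.get(mention.strip().lower(), mention.strip())


def merge_mentions(extractions):
    """Merge per-chunk extractions into one record per canonical product."""
    merged = {}
    for item in extractions:
        name = canonical_name(item["name"])
        record = merged.setdefault(
            name, {"brand": item.get("brand", ""), "features": set(), "ratings": []}
        )
        record["features"].update(item.get("features", []))
        if "rating" in item:
            record["ratings"].append(item["rating"])
    return merged


def to_nodes_csv(merged):
    """Render merged products as CSV text, one row per product node."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "brand", "features", "avg_rating"])
    for name, rec in sorted(merged.items()):
        avg = sum(rec["ratings"]) / len(rec["ratings"]) if rec["ratings"] else ""
        writer.writerow([name, rec["brand"], ";".join(sorted(rec["features"])), avg])
    return buffer.getvalue()


# Made-up extractions from two different review chunks.
extractions = [
    {"name": "Samsung Galaxy S23", "brand": "Samsung", "features": ["5G"], "rating": 5},
    {"name": "Samsung S23", "brand": "Samsung", "features": ["Touchscreen"], "rating": 4},
]
merged = merge_mentions(extractions)
print(to_nodes_csv(merged))
```

Both mentions collapse into one row whose features and ratings are combined, which is the shape of file that could then be reviewed and loaded through an importer.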

LLMs offer scalability and can process vast amounts of unstructured data efficiently. By automating entity extraction and relationship building, they significantly reduce manual effort. This makes LLM-based methods well suited for enterprises with large-scale datasets, where speed and automation are critical.

Automating Knowledge Graph Construction from Unstructured Text Using NLP Techniques and RDF Models

To create a knowledge graph, text is first collected from various sources such as documents and websites. Ambiguous references are clarified through coreference resolution, which replaces vague pronouns with the specific terms they refer to, using spaCy-based tooling. Named Entity Recognition (NER) identifies key entities such as names and places, while entity linking connects these entities to a knowledge base for additional context. Relationships between entities are then extracted as triples (subject-predicate-object statements) to illustrate connections. Finally, these relationships are stored in a knowledge graph using RDF (Resource Description Framework), allowing for easy querying and analysis.

[Figure: knowledge graph and NLP]

Source: Creating knowledge graphs from unstructured text, Faircookbook.

Consider this example: when analyzing medical literature on “cardiac amyloidosis,” abstracts from PubMed are collected using Biopython, a Python library for biological computation. Coreference resolution clarifies ambiguous references like “it,” replacing them with the specific terms they stand for, such as “high fever.” NER then identifies key entities such as “cardiac amyloidosis,” and entity linking connects them to a knowledge base like the National Cancer Institute Thesaurus (NCIT). Relationships, such as “cardiac amyloidosis is associated with amyloid proteins,” are extracted using REBEL, a triple-extraction model. Finally, these relationships are organized into a knowledge graph using RDF, resulting in a structured and easily queryable format.
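The last step, turning an extracted triple into RDF, can be illustrated with a small sketch that writes N-Triples lines using only the standard library. The `http://example.org/` namespace is a placeholder, and minting IRIs from raw labels is a simplification: in the pipeline described above, entities would instead be linked to identifiers from a knowledge base such as NCIT, and a dedicated RDF library would usually handle serialization.

```python
import re

BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary


def to_iri(label):
    """Turn a free-text label into a simple IRI under the placeholder namespace."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", label.strip()).strip("_")
    return f"<{BASE}{slug}>"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(f"{to_iri(s)} {to_iri(p)} {to_iri(o)} ." for s, p, o in triples)


triples = [("cardiac amyloidosis", "is associated with", "amyloid proteins")]
print(to_ntriples(triples))
# <http://example.org/cardiac_amyloidosis> <http://example.org/is_associated_with> <http://example.org/amyloid_proteins> .
```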

While LLMs offer scalability and automation, they might lack the precision needed for domain-specific data. NLP techniques and RDF models provide better control over entity recognition and relationship extraction, which supports higher accuracy in the final knowledge graph. RDF also enforces semantic consistency and is more reliable in complex data scenarios, making this approach better suited for environments that prioritize accuracy over speed, such as regulatory compliance or highly specialized fields.

Enhancing RAG with Knowledge Graphs for Improved Information Retrieval

Standard RAG systems can struggle with finding connections between different text chunks because they handle each part of the text separately. By using a knowledge graph, these text chunks can be linked together, making it easier for the system to understand and retrieve related information using a vector index. This connection helps in answering queries more accurately by recognizing relationships between different pieces of data.

[Figure: knowledge graphs and RAG]

Source: Salihoğlu, Semih. RAG using unstructured data and the role of knowledge graphs.
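One way to picture the chunk-linking idea is to treat shared entities as the links between chunks. The sketch below assumes that entity extraction has already been run on every chunk and that a vector index has returned the initial hits; both are stubbed with hand-written data, and the function names are hypothetical.

```python
from collections import defaultdict


def build_entity_index(chunk_entities):
    """Invert chunk -> entities into entity -> chunks mentioning it."""
    index = defaultdict(set)
    for chunk_id, entities in chunk_entities.items():
        for entity in entities:
            index[entity].add(chunk_id)
    return index


def expand_with_graph(seed_chunks, chunk_entities, index):
    """Return the seed chunks plus every chunk that shares an entity with a seed."""
    related = set(seed_chunks)
    for chunk_id in seed_chunks:
        for entity in chunk_entities[chunk_id]:
            related |= index[entity]
    return sorted(related)


# Made-up chunks with their extracted entities.
chunk_entities = {
    "chunk_1": {"Samsung Galaxy S23", "5G"},
    "chunk_2": {"Samsung Galaxy S23", "battery life"},
    "chunk_3": {"Tablet", "Touchscreen"},
}
index = build_entity_index(chunk_entities)

seeds = ["chunk_1"]  # stand-in for the top hit returned by a vector index
print(expand_with_graph(seeds, chunk_entities, index))  # ['chunk_1', 'chunk_2']
```

A plain vector search would return only `chunk_1`; following the shared entity also pulls in `chunk_2`, which discusses the same product from another angle.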

Another enhancement involves converting text into triples (simple statements like “X is related to Y”) using models like REBEL and storing these triples in a graph database. This approach improves efficiency by making it faster and easier to retrieve and analyze information.

[Figure: RAG system enhancement]

Source: Salihoğlu, Semih. RAG using unstructured data and the role of knowledge graphs.

Consider a finance department facing challenges in tracking purchase orders, policy compliance, and financial performance. For instance, the department needs to determine why a particular purchase order was delayed and ensure it aligns with internal policies. AI Fortune Cookie, a secure enterprise knowledge management model, addresses these issues by using knowledge graphs to link text chunks from different sources, such as purchase orders, policy documents, and financial reports. This interconnected data allows the system to trace relationships, revealing, for example, that a purchase order delay occurred due to a missing approval outlined in a specific policy document.

Using RAG, the platform efficiently retrieves relevant information from the knowledge graph, ensuring that the full context is considered in every query. The system converts the information into triples, such as “Purchase Order X is delayed due to Policy Y,” and stores them in a graph database, making it easy for employees to find linked information quickly. By streamlining the retrieval of connected data, AI Fortune Cookie helps resolve issues more efficiently, ensuring policy compliance and improving financial performance management.
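The triple-based lookup described above can be approximated with a small in-memory store. This is only a sketch of the general technique and not how AI Fortune Cookie is implemented; the `TripleStore` class, the `trace` helper, and the second triple about manager approval are hypothetical additions for illustration.

```python
class TripleStore:
    """Minimal set of (subject, predicate, object) triples with pattern matching."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


def trace(store, start, max_hops=2):
    """Follow outgoing triples from `start` breadth-first, up to `max_hops` hops."""
    seen, frontier, path = {start}, [start], []
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for s, p, o in store.match(subject=node):
                path.append((s, p, o))
                if o not in seen:
                    seen.add(o)
                    next_frontier.append(o)
        frontier = next_frontier
    return path


store = TripleStore()
store.add("Purchase Order X", "is delayed due to", "Policy Y")
store.add("Policy Y", "requires", "Manager Approval")  # hypothetical second triple

for s, p, o in trace(store, "Purchase Order X"):
    print(f"{s} {p} {o}")
```

Starting from the purchase order, the two-hop trace surfaces both the policy that caused the delay and the requirement behind it, which is the kind of linked answer a chunk-by-chunk lookup tends to miss.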

Enhancing RAG with knowledge graphs takes information retrieval a step further by linking text chunks in a cohesive way, improving contextual understanding and retrieval accuracy. Instead of treating text chunks separately, it uses a knowledge graph to connect relationships, making it better at answering queries with a holistic view. By converting unstructured text into triples and linking them through RAG, this approach ensures faster and more accurate information retrieval than the previous two methods, especially in dynamic enterprise environments where real-time data access is critical.

At Random Walk, we help you leverage enterprise knowledge management models with tailored AI integration services. Contact us for a personalized consultation and a demo of AI Fortune Cookie, our generative AI data visualization tool, to manage and visualize your structured and unstructured enterprise data for efficient operations.

Related Blogs

I Built an AI Agent From Scratch—Here’s What I Learned

I’ve worked with LangChain. I’ve played with LlamaIndex. They’re great—until they aren’t.

How Can Enterprises Benefit from Generative AI in Data Visualization

It’s New Year’s Eve, and John, a data analyst, is finishing up a fun party with his friends. Feeling tired and eager to relax, he looks forward to unwinding. But as he checks his phone, a message from his manager pops up: “Is the dashboard ready for tomorrow’s sales meeting?” John’s heart sinks. The meeting is in less than 12 hours, and he’s barely started on the dashboard. Without thinking, he quickly types back, “Yes,” hoping he can pull it together somehow. The problem? He’s exhausted, and the thought of combing through a massive 1000-row CSV file to create graphs in Excel or Tableau feels overwhelming. Just when he starts to panic, he remembers his secret weapon: Fortune Cookie, the AI-assistant that can turn data into insightful data visualizations in no time. Relieved, John knows he doesn’t have to break a sweat. Fortune Cookie has him covered, and the dashboard will be ready in no time.

Streamlining File Management with MindFolder’s Intelligent Edge

Brain rot, the 2024 Word of the Year, perfectly encapsulates the overwhelming state of mental fatigue caused by endless information overload—a challenge faced by individuals and businesses alike in today’s fast-paced digital world. At its core, this term highlights the need for streamlined systems that simplify the way we interact with data and files.

Refining and Creating Data Visualizations with LIDA and AI Fortune Cookie

Data visualization and storytelling are critical for making sense of today’s data-rich world. Whether you’re an analyst, a researcher, or a business leader, translating raw data into actionable insights often hinges on effective tools. Two innovative platforms that elevate this process are Microsoft’s LIDA and our RAG-enhanced data visualization platform using gen AI, AI Fortune Cookie. While LIDA specializes in refining and enhancing infographics, Fortune Cookie transforms disparate datasets into cohesive dashboards with the power of natural language prompts. Together, they form a powerful combination for visual storytelling and data-driven decision-making.

1-bit LLMs: The Future of Efficient and Accessible Enterprise AI

As data grows, enterprises face challenges in managing their knowledge systems. While Large Language Models (LLMs) like GPT-4 excel in understanding and generating text, they require substantial computational resources, often needing hundreds of gigabytes of memory and costly GPU hardware. This poses a significant barrier for many organizations, alongside concerns about data privacy and operational costs. As a result, many enterprises find it difficult to utilize the AI capabilities essential for staying competitive, as current LLMs are often technically and financially out of reach.
