Leveraging Enterprise Generative AI Capabilities with Knowledge Graphs to Develop and Launch Improv

Problem Statement:

Job seekers face numerous challenges, including limited job openings, intense competition, and outdated skill sets. People who are upskilling or changing careers struggle to identify courses and certifications that align with their goals. The stress and uncertainty of job searching, and the risk of being left behind in a rapidly changing job market, are real.


While there are many point solutions on the market that help you identify keywords from the job description that are missing from your resume, what is missing is a contextual ability to assess a candidate's skills and aspirations and design a suitable career path. Job seekers need guidance on how to approach and apply for suitable jobs that will prepare them for the next level.


Improv was launched to help job seekers increase their chances of getting an interview through personalized resume feedback, by codifying what hiring managers look for in a resume.


Improv offers job seekers the following:

  • Sentences that help align the resume with the job description

  • Suggestions for quantifying experience

  • Missing keywords that the job description requires

  • Detection of grammatical and spelling mistakes

  • A resume summary

  • An email to the hiring manager


Try Improv for free: Improv


Under the hood: how Improv was developed


Enterprise generative AI has the potential to revolutionize various industries by enabling organizations to automate content creation, improve decision-making, and streamline operations. One way to achieve this is by combining knowledge graphs (KGs) and large language models (LLMs).


In this blog, we will explore how KGs and LLMs have been used in Improv, discuss the key software products and libraries involved, introduce key technical concepts, and develop a high-level target architecture specification that developers could use.

  • Knowledge Graphs (KGs):

A knowledge graph is a database that stores information in the form of a graph, where entities are represented as nodes, and relationships between entities are represented as edges. KGs are particularly useful for storing complex, hierarchical, and interconnected data, which makes them ideal for representing domain-specific knowledge. In the context of enterprise generative AI, KGs can be used to represent entities, their properties, and relationships, enabling machines to reason about the underlying data and generate new insights.

  • Large Language Models (LLMs):

Large language models are deep neural networks trained on vast amounts of text data to generate language outputs that are coherent and natural-sounding. LLMs can be fine-tuned for specific tasks, such as text classification, sentiment analysis, and machine translation. In the context of enterprise generative AI, LLMs can be used to generate text based on given prompts or topics, adapt to different styles and tones, and engage in conversations.

  • Natural Language Processing (NLP):

Natural language processing is a subfield of artificial intelligence that deals with the interaction between computers and human language. NLP is essential for analyzing and understanding natural language inputs and generating meaningful responses. In the context of enterprise generative AI, NLP can be used to analyze prompts, understand the context, and generate relevant responses.

  • Machine Learning (ML):

Machine learning is a subset of artificial intelligence that involves training algorithms on data and using those algorithms to make predictions or take actions. ML is essential for developing predictive models that can identify patterns and correlations in data. In the context of enterprise generative AI, ML can be used to train models that can automatically detect skills, group them against various candidate profiles, and correlate adjacent skills in an automated manner.


Key Technology Software Products and Libraries:

  • Knowledge Graph Storage:

Google Cloud Datastore is a fully managed NoSQL database that can be used to store and manage knowledge graphs. It provides efficient storage and querying mechanisms for knowledge graphs, making it an ideal choice for enterprise generative AI initiatives.

  • Large Language Model Management:

Google Vertex AI is a managed service that enables organizations to deploy, manage, and scale large language models. It provides a variety of pre-trained models that can be fine-tuned for specific tasks, as well as tools for monitoring and optimizing model performance.

  • Natural Language Processing (NLP) Library:

spaCy is a Python library that provides a wide range of NLP tools, including tokenization, part-of-speech (POS) tagging, named entity recognition, and dependency parsing. It can be used to analyze prompts and understand the context.

  • Machine Learning (ML) Library:

TensorFlow is an open-source machine learning library developed by Google. It provides a wide range of tools and functionalities for developing and training ML models. It can be used to train models that can automatically detect skills, group them against various candidate profiles, and correlate adjacent skills in an automated manner.


Target Architecture Specification: The target architecture for supporting enterprise generative AI with KGs and LLMs comprises the following components:

  • Proprietary Knowledge Graph:

The proprietary knowledge graph is stored in Google Cloud Datastore and represents various entities, their properties, and relationships. The knowledge graph is designed to enable automatic detection of skills, grouping them against various types of candidate profiles, and correlating adjacent skills in an automated manner.

  • Large Language Model (LLM):

The LLM is managed by Google Vertex AI and is used to generate text based on given prompts or topics. The LLM is trained on a dataset of high-quality text samples and is fine-tuned for specific tasks, such as text classification, sentiment analysis, and machine translation.

  • Natural Language Processing (NLP) Engine:

The NLP engine is built using Stanford CoreNLP and is used to analyze prompts, understand the context, and generate relevant responses.

  • Machine Learning (ML) Model:

The ML model is built using TensorFlow and is used to automatically detect skills, group them against various candidate profiles, and correlate adjacent skills in an automated manner.


  • Data Security and Governance:

The target architecture includes robust security measures to protect sensitive data and ensure compliance with regulatory requirements. Access control policies are implemented to restrict unauthorized access to the knowledge graph and LLM. Data encryption is used to protect data in transit and at rest. Regular backups and disaster recovery processes are also implemented to minimize downtime and data loss.


  • Integration Layer:

The integration layer connects the knowledge graph, LLM, NLP engine, and ML model. It enables seamless communication between the components and ensures that the system works together effectively.
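
As a rough illustration of the knowledge graph component described above, the sketch below stores entities and relationships in memory, detects known skills in a piece of text, and looks up adjacent skills. The class name `SkillGraph`, the relation names, and the sample entities are hypothetical and are not taken from Improv's implementation.

```python
from collections import defaultdict


class SkillGraph:
    """Tiny in-memory knowledge graph (illustrative only, not Improv's schema)."""

    def __init__(self):
        # Maps a source entity to a set of (relation, target) edges.
        self._edges = defaultdict(set)
        self.skills = set()

    def add_skill(self, name):
        self.skills.add(name)

    def relate(self, source, relation, target):
        self._edges[source].add((relation, target))

    def neighbors(self, source, relation):
        """Return the targets reachable from source over the given relation."""
        return sorted(t for r, t in self._edges.get(source, ()) if r == relation)

    def detect_skills(self, text):
        """Naive detection: case-insensitive substring match against known skills."""
        lowered = text.lower()
        return sorted(s for s in self.skills if s.lower() in lowered)


graph = SkillGraph()
for skill in ("Python", "SQL", "Data Modeling"):
    graph.add_skill(skill)
graph.relate("Data Engineer", "requires", "Python")
graph.relate("Data Engineer", "requires", "SQL")
graph.relate("SQL", "adjacent_to", "Data Modeling")

found = graph.detect_skills("Built ETL pipelines in Python and SQL.")
adjacent = graph.neighbors("SQL", "adjacent_to")
print(found, adjacent)
```

Substring matching is deliberately simplistic here; a production system would more likely rely on the NLP engine for entity recognition.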


Configuring an API to access a knowledge graph from large language models (LLMs) requires careful consideration of several factors, including data formats, communication protocols, and authentication mechanisms. Here are some steps and technologies that can help accelerate the development process:

  • Choose a data format:

The first step is to select a data format for representing the knowledge graph. Popular choices include JSON, XML, and RDF. JSON is a lightweight and flexible format that is easy to work with, while XML provides more structure and validation capabilities. RDF is a standard for representing and exchanging data on the web and is often used in knowledge graph applications. We chose JSON as our format.

  • Select a communication protocol:

Once the data format is chosen, the next step is to select a communication protocol for accessing the knowledge graph. REST (Representational State Transfer) is a popular choice for APIs, as it is simple and widely supported. GraphQL is another option that allows for more flexible queries and can reduce the amount of data transferred. We chose REST.

  • Authenticate requests:

To prevent unauthorized access to the knowledge graph, authentication is necessary. Common methods include OAuth, JWT, and basic auth. OAuth is a popular choice for APIs, as it lets a client obtain access on a user's behalf without handling the user's credentials. JWT is a signed token format that can carry claims between parties and is often used for authorization. Basic auth is simple but less secure, as it sends credentials with only Base64 encoding, which is effectively plaintext unless the connection is encrypted. We used OAuth on Google Cloud.

  • Implement request handling:

After configuring the API, the next step is to implement request handling. This involves creating endpoints for retrieving, updating, and deleting knowledge graph data. Endpoints can be created using frameworks like Flask or Django for Python, Express.js for Node.js, or Spring Boot for Java. We used Flask.

  • Utilize libraries and frameworks:

There are several libraries and frameworks available that can accelerate the development process. For example, RDFLib provides a Python interface for working with RDF data, while Apache Spark GraphX offers a scalable platform for processing graph data. We built a proprietary knowledge graph database structure on top of the in-memory capabilities of Redis.

  • Optimize performance:

Finally, it's important to optimize the performance of the API. Caching frequently accessed data, indexing the knowledge graph, and using distributed computing techniques can all help improve performance.
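
The steps above can be sketched as a single framework-agnostic request handler that checks a bearer token and returns a JSON body for a REST-style read. This is a minimal sketch under stated assumptions: `handle_request`, the `/entities/<name>` path, the `GRAPH` dictionary, and the static `VALID_TOKEN` are all hypothetical, and the static token merely stands in for real OAuth token validation. In a Flask application, a route function could delegate to a handler like this.

```python
import hmac
import json

# Hypothetical in-memory graph and token, for illustration only.
GRAPH = {"SQL": {"adjacent_to": ["Data Modeling"]}}
VALID_TOKEN = "example-token"  # stand-in for a token validated via OAuth

ENTITY_PREFIX = "/entities/"


def handle_request(method, path, headers):
    """Return (status_code, json_body) for a REST-style read of one entity."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    # compare_digest avoids leaking token contents through timing differences.
    token_ok = hmac.compare_digest(token.encode(), VALID_TOKEN.encode())
    if scheme != "Bearer" or not token_ok:
        return 401, json.dumps({"error": "unauthorized"})
    if method != "GET":
        return 405, json.dumps({"error": "method not allowed"})
    if not path.startswith(ENTITY_PREFIX):
        return 404, json.dumps({"error": "not found"})
    name = path[len(ENTITY_PREFIX):]
    relations = GRAPH.get(name)
    if relations is None:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps({"entity": name, "relations": relations})


status, body = handle_request(
    "GET", "/entities/SQL", {"Authorization": "Bearer example-token"}
)
print(status, body)
```

Keeping the handler independent of the web framework makes it easy to test without starting a server; caching and indexing would sit behind the `GRAPH` lookup.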


But why did we integrate LLM with Knowledge Graph?


General-purpose LLMs, including those behind ChatGPT and Bard, are limited in their ability to capture complex relationships between entities and lack domain-specific knowledge. Incorporating KGs into the model can address these limitations by providing a rich source of structured knowledge about entities and their relationships.


The knowledge graph serves as a repository of domain-specific knowledge that can augment the large language model's understanding of text. By integrating the knowledge graph, the model can better comprehend the relationships between entities, leading to improved performance in tasks requiring such understanding.


In other words,


LLMs are good at understanding words and sentences, but they don't know much about the world. KGs, on the other hand, contain lots of information about things and how they relate to each other.


Step-by-Step Approach for Integrating a Knowledge Graph with an LLM


Step 1: Knowledge Graph Preprocessing

  • Start by preprocessing the knowledge graph data to ensure it's in a format that can be easily integrated with the LLM.

  • This may involve converting the knowledge graph data into a matrix or vector representation, removing duplicates or redundant edges, and normalizing the data.
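
A minimal sketch of this preprocessing step, assuming the graph arrives as (head, relation, tail) triples: it normalizes entity names, drops duplicate edges, and builds an adjacency matrix. The function name `preprocess` and the sample triples are hypothetical.

```python
def preprocess(triples):
    """Normalize names, drop duplicate edges, and build an adjacency matrix."""
    cleaned = []
    seen = set()
    for head, relation, tail in triples:
        # Normalization here is just trimming whitespace and lowercasing.
        triple = (head.strip().lower(), relation.strip().lower(), tail.strip().lower())
        if triple not in seen:
            seen.add(triple)
            cleaned.append(triple)

    entities = sorted({name for h, _, t in cleaned for name in (h, t)})
    index = {name: i for i, name in enumerate(entities)}

    # matrix[i][j] == 1 means there is an edge from entities[i] to entities[j].
    matrix = [[0] * len(entities) for _ in entities]
    for h, _, t in cleaned:
        matrix[index[h]][index[t]] = 1
    return cleaned, entities, matrix


raw = [
    ("SQL", "adjacent_to", "Data Modeling"),
    (" sql ", "adjacent_to", "data modeling"),  # duplicate after normalization
    ("Python", "adjacent_to", "SQL"),
]
triples, entities, matrix = preprocess(raw)
print(entities)
print(matrix)
```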

Step 2: LLM Pretraining

  • Next, pretrain the LLM on a large corpus of text data. In our case, this was a large corpus of job descriptions and sanitized resumes (with names and contact details removed).

  • This step is important because it allows the LLM to learn general language representations that can be fine-tuned later for the specific task at hand.

Step 3: Knowledge Graph Embedding

  • After pretraining the LLM, create embeddings for the knowledge graph entities and relations.

  • There are several ways to do this, but one popular method is to use a knowledge graph embedding (KGE) model, such as TransE or DistMult, to map the entities and relations to dense vectors in a high-dimensional space. We used DistMult.
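
For reference, DistMult scores a triple (head, relation, tail) as the sum of the element-wise product of the three vectors; higher scores mean the triple is considered more plausible. The sketch below shows only the scoring function. The embedding values are hand-picked for illustration, whereas real embeddings are learned during training.

```python
def distmult_score(head, relation, tail):
    """DistMult score: sum over dimensions of head_i * relation_i * tail_i."""
    if not (len(head) == len(relation) == len(tail)):
        raise ValueError("all embeddings must have the same dimension")
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


# Hypothetical, hand-picked embeddings (not learned values).
sql = [1.0, 0.0, 1.0]
data_modeling = [1.0, 0.0, 0.5]
cooking = [0.0, 1.0, 0.0]
adjacent_to = [1.0, 1.0, 1.0]

plausible = distmult_score(sql, adjacent_to, data_modeling)
implausible = distmult_score(sql, adjacent_to, cooking)
print(plausible, implausible)
```

Because the product is commutative, DistMult gives the same score when head and tail are swapped, so it suits symmetric relations such as skill adjacency better than directional ones.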

Step 4: Integrating Knowledge Graph Embeddings with LLM

  • Once you have the knowledge graph embeddings, integrate them with the LLM using a fusion strategy.

  • One simple way to do this is to add the knowledge graph embeddings to the input embeddings of the LLM, effectively combining the linguistic information from the text with the semantic information from the knowledge graph.
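
The additive fusion described above can be sketched as follows, assuming the knowledge graph embeddings have already been projected to the same dimension as the token embeddings and that an upstream step has linked tokens to entities. The function name `fuse_embeddings` and the sample vectors are hypothetical.

```python
def fuse_embeddings(token_embeddings, entity_embeddings):
    """Add each token's linked entity embedding to its input embedding.

    entity_embeddings holds one entry per token: a vector when the token was
    linked to a knowledge graph entity, or None when it was not.
    """
    if len(token_embeddings) != len(entity_embeddings):
        raise ValueError("expected one entity slot per token")
    fused = []
    for token_vec, entity_vec in zip(token_embeddings, entity_embeddings):
        if entity_vec is None:
            fused.append(list(token_vec))  # no entity: keep the token embedding
            continue
        if len(token_vec) != len(entity_vec):
            raise ValueError("token and entity embeddings must share a dimension")
        fused.append([a + b for a, b in zip(token_vec, entity_vec)])
    return fused


# Tokens "knows" and "sql"; only "sql" is linked to a graph entity.
token_embeddings = [[0.5, 0.5], [1.0, 0.0]]
entity_embeddings = [None, [0.25, 0.75]]
fused = fuse_embeddings(token_embeddings, entity_embeddings)
print(fused)
```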

Step 5: Fine-Tuning the LLM

  • Now that the LLM has been augmented with the knowledge graph embeddings, fine-tune it on a small set of labeled data that contains both text and knowledge graph information.

  • During fine-tuning, the model will learn to align the linguistic information from the text with the semantic information from the knowledge graph, enabling it to perform tasks like textual entailment, question answering, and knowledge grounding.

Step 6: Evaluation and Iteration

  • Evaluate the performance of the integrated LLM on a held-out test set.

  • If the performance is unsatisfactory, iterate on the integration strategy, adjusting parameters, experimenting with different fusion strategies, or even exploring alternative KGE models.
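
A held-out evaluation can be as simple as the sketch below, which splits labeled examples into training and test sets and computes accuracy on predictions. The helper names and the 80/20 split are assumptions for illustration; the appropriate metric depends on the task.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle deterministically and hold out a fraction of examples for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and aligned")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


train, test = train_test_split(range(10))
score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(len(train), len(test), score)
```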

Step 7: Domain Specific Training

  • Finally, train the integrated LLM on a large dataset specifically designed for the task at hand (e.g., question answering, text classification).

  • This step allows the model to further adapt to the task-specific context and refine its performance.

Below is the sequence of steps required to apply this approach to resumes.

  1. Defining Rules for Representing Information: Define clear rules for representing information in the Knowledge Graph. For example, define entities such as skills, experiences, education, etc., and their respective attributes like dates, locations, etc.

  2. Creating New Sentences: Once the LLM and Knowledge Graph are trained, use them to generate new sentences that demonstrate impact, quantify experience, and present more relevant experience upfront. These sentences can be generated based on the information extracted from the resumes and job descriptions.

  3. Demonstrating Impact: Use the LLM to generate sentences that demonstrate the impact of the candidate's previous work experience. For instance, "Led a team that increased sales by 20% within six months" or "Developed a software application that reduced customer complaints by 50%."

  4. Quantifying Experience: Use the Knowledge Graph to quantify the candidate's experience by generating sentences like "Has over 10 years of experience in project management" or "Managed teams of up to 50 people."

  5. Presenting Relevant Experience Upfront: Use both the LLM and Knowledge Graph to present the most relevant experience upfront. For example, generate sentences like "Gained extensive experience in digital marketing, resulting in a 30% increase in website traffic" or "Spearheaded several successful product launches, increasing revenue by 25%."

  6. Ensuring Accuracy and Consistency: To ensure accuracy and consistency, we have multiple models and human reviewers evaluate the generated sentences. This has helped identify any errors or biases in the output and improve the overall quality of the generated sentences.
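
To make the "Quantifying Experience" step concrete, here is a minimal sketch that totals the time a candidate spent on a skill and fills in a sentence template. It assumes experience is stored as (skill, start date, end date) records, which is a hypothetical layout, and it does not merge overlapping date ranges.

```python
from datetime import date


def quantify_experience(roles, skill):
    """Return a sentence summarizing whole years spent on a skill, or None."""
    total_days = sum((end - start).days for s, start, end in roles if s == skill)
    years = total_days // 365  # approximate: ignores leap days
    if years < 1:
        return None
    unit = "year" if years == 1 else "years"
    return f"Has over {years} {unit} of experience in {skill}"


# Hypothetical candidate records: (skill, start, end).
roles = [
    ("project management", date(2012, 1, 1), date(2018, 1, 1)),
    ("project management", date(2019, 1, 1), date(2023, 7, 1)),
    ("digital marketing", date(2018, 1, 1), date(2019, 1, 1)),
]
sentence = quantify_experience(roles, "project management")
print(sentence)
```

Deriving the number from structured records rather than asking the LLM to produce it keeps the quantified claim traceable to the candidate's actual history.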

By following this approach, organizations can integrate LLM and Knowledge Graph technology to create new sentences that effectively communicate the candidate's qualifications and experience, helping hiring managers make more informed decisions during the hiring process.


Try Improv for free and experience the joy of getting hired faster!


Summary


Combining knowledge graphs and large language models can significantly enhance enterprise generative AI capabilities. By leveraging these technologies, businesses can automate content creation, improve decision-making, and streamline operations. The target architecture specification outlined in this blog provides a comprehensive framework for developers to build and implement enterprise generative AI systems that combine KGs and LLMs. The architecture includes a proprietary knowledge graph, LLM, NLP engine, ML model, and integration layer, all working together to enable automatic detection of skills, grouping them against various candidate profiles, and correlating adjacent skills in an automated manner. The architecture also includes robust security measures to protect sensitive data and ensure compliance with regulatory requirements.
