Enhancing AI coding assistants with context using RAG and SEM-RAG

May 27, 2024

Table of contents

Refining LLMs with RAG
How RAG works in coding assistants
Bolstering RAG with semantic memory
Augmenting code quality and developer productivity with AI

Basic AI coding assistants, while helpful, often fall short of delivering the most relevant and contextually accurate code suggestions because they rely on a general understanding of programming languages and the most common patterns for writing software. The code they generate usually solves the problem at hand but is often not aligned with an individual team’s coding standards, conventions and styles. As a result, developers typically have to modify or refine the recommendations before the code can be accepted into the application.

AI coding assistants typically function by leaning on the knowledge contained within a specific large language model (LLM) and applying general coding principles across various scenarios. As a result, they often cannot understand the specific context of a project, leading to suggestions that, while syntactically correct, might not align with a team’s unique guidelines, expected approach or architectural designs. The LLMs that underpin generative AI systems operate on a fixed set of training data, which does not evolve as the project progresses. This static approach can produce mismatches between the generated code and the project’s current state or requirements, forcing developers to make further manual adjustments.

Refining LLMs with RAG

A common misconception is that AI assistants simply pass a request to an LLM and return whatever it generates. Whether you are generating text, images or code, the best AI assistants use a complex set of guidelines to ensure that what the user asks for (e.g., a software function that accomplishes a specific task) and what gets generated (a Java function, in the correct version, with the correct parameters for the application) are aligned.

One of the proven techniques to get the best outputs from any LLM is to provide additional context with the prompt. This approach, referred to as retrieval-augmented generation (RAG), has become a critical component of chatbots, AI assistants and agents that successfully serve enterprise use cases.

AI coding assistants, like all generative AI tools, use LLMs as the foundation for code generation. Bringing highly tailored RAG to coding assistants enables them to generate code that is of higher quality and more closely aligned with a company’s existing codebase and engineering standards.

In the realm of chatbots, RAG considers existing data available in structured and unstructured formats. Through either full-text or semantic search, it retrieves just enough context and injects it into the prompt sent to the LLM.

An AI coding assistant can use a similar (albeit more complex) approach, retrieving context from the existing codebase through the integrated development environment. A high-performing AI coding assistant can crawl the project workspace to access additional context from the current file, open files, Git history, logs, project metadata and even connected Git repositories.
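As a rough illustration of pulling "just enough" context from several workspace sources, the sketch below packs sources into a fixed character budget in priority order. The `ContextSource` type, the priority ranking and the character budget are all assumptions made for this example. They do not describe how any particular assistant ranks or limits its context.

```python
from dataclasses import dataclass


@dataclass
class ContextSource:
    name: str      # e.g. "current file", "open file", "git history"
    priority: int  # lower number = assumed to be more relevant
    text: str


def pack_context(sources: list[ContextSource], budget_chars: int) -> str:
    """Concatenate sources in priority order until the budget is spent."""
    parts: list[str] = []
    remaining = budget_chars
    for src in sorted(sources, key=lambda s: s.priority):
        header = f"# --- {src.name} ---\n"
        room = remaining - len(header) - 1  # reserve one char for a trailing newline
        if room <= 0:
            break  # no space left for this source or any lower-priority one
        block = header + src.text[:room] + "\n"  # truncate text that does not fit
        parts.append(block)
        remaining -= len(block)
    return "".join(parts)


sources = [
    ContextSource("git history", 3, "x" * 500),
    ContextSource("current file", 1, "def connect(): ..."),
    ContextSource("open file", 2, "y" * 50),
]
packed = pack_context(sources, budget_chars=120)
```

With a 120-character budget, the current file and the open file fit and the Git history is dropped. A real assistant would more likely budget in tokens and rank sources by relevance to the request rather than by a fixed priority.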

RAG empowers AI coding assistants to provide highly relevant and precise results by considering specific aspects of the project, such as existing APIs, frameworks and coding patterns. Instead of offering generic solutions, the AI assistant tailors its guidance to align with the project’s established practices, like suggesting database connections consistent with the current implementation or providing code recommendations that seamlessly incorporate private APIs. By utilizing RAG, the assistant can even generate test functions that mirror the structure, style and syntax of your existing tests, ensuring that the code is both contextually accurate and aligned with the project’s requirements.

The result is a level of personalization that makes suggestions easy for developers to accept immediately, often without modification.

How RAG works in coding assistants

Let’s take a look at the steps involved in implementing RAG for coding assistants.

The first phase is indexing and storing. When the coding assistant is first installed and integrated into the development environment, it searches the workspace and identifies all relevant documents that can add context. It then splits each document into chunks and sends them to an embedding model. The embedding model converts each chunk into a vector that preserves its semantic meaning. The resulting vectors are stored in a vector database for future retrieval. Periodically, the coding assistant may rescan the workspace and add new or changed documents to the vector database.
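The indexing phase can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production design: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model (it does not capture semantics the way a trained model does), `VectorStore` is an in-memory stand-in for a vector database, and fixed-size line chunking is only one of several possible chunking strategies. All names are hypothetical.

```python
import hashlib
import math
import re

DIM = 64  # toy vector size; real embedding models use far more dimensions


def chunk_lines(text: str, max_lines: int = 20) -> list[str]:
    """Split a document into chunks of at most `max_lines` consecutive lines."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def toy_embed(chunk: str) -> list[float]:
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z_]\w*", chunk.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


class VectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        # Each row is (source path, chunk text, embedding vector).
        self.rows: list[tuple[str, str, list[float]]] = []

    def add_document(self, path: str, text: str, max_lines: int = 20) -> int:
        """Chunk, embed and store one document; return the number of chunks."""
        chunks = chunk_lines(text, max_lines)
        for chunk in chunks:
            self.rows.append((path, chunk, toy_embed(chunk)))
        return len(chunks)


store = VectorStore()
store.add_document("db.py", "def get_connection():\n    return pool.acquire()\n")
```

A periodic rescan would call `add_document` again for new or changed files. A real implementation would also need to replace stale rows for files that were edited or deleted.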

During the next phase, coding, a developer may create a comment or use the chat assistant to generate a specific function. The assistant uses the prompt to perform a similarity search on the previously indexed collection stored in the vector database. The outcome of this search is retrieved and used to augment the prompt with relevant context. When the LLM receives the enhanced prompt along with the context, it generates a code snippet that is aligned with the code already present in the context.
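The retrieval step described above can be sketched as follows. This is a hedged, self-contained illustration: the sparse bag-of-words `toy_embed` stands in for a real embedding model, a plain list stands in for the vector database, and the prompt template in `build_prompt` is invented for this example. None of these names come from a specific product.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z_]\w*", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k indexed chunks most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda row: cosine(q, row[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(request: str, context_chunks: list[str]) -> str:
    """Augment the developer's request with the retrieved project code."""
    context = "\n\n".join(context_chunks)
    return (
        "Follow the conventions in the project code below.\n\n"
        f"### Project context\n{context}\n\n"
        f"### Task\n{request}\n"
    )


chunks = [
    "def get_connection():\n    return pool.acquire(timeout=5)",
    "def render_invoice(invoice):\n    return template.render(invoice=invoice)",
    "def close_connection(conn):\n    pool.release(conn)",
]
index = [(c, toy_embed(c)) for c in chunks]
request = "# write a function that opens a pool connection and runs a query"
prompt = build_prompt(request, retrieve(request, index, k=2))
```

Here the two connection-pool helpers are retrieved and the invoice renderer is left out, so the prompt sent to the LLM carries the project's existing pool conventions. The augmented `prompt` string is what would be passed to the model in place of the bare request.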

Applying RAG to coding assistants improves the performance, accuracy and acceptability of LLM-generated code. It significantly enhances the tool’s utility and reduces the time developers spend rewriting or adjusting the AI-generated code. Having a direct alignment with the project’s existing codebase leads to higher accuracy in code suggestions and greatly improves developer productivity and code quality.

“Using an AI coding assistant that is not sufficiently aware of your existing code base and your coding standards is like hiring a well-trained software engineer off the street: helpful and well-intentioned, but likely to create code that needs to be modified to fit your application. When you layer in the right level of context — including local files, the codebase of the project or company, and relevant non-code sources of information — it’s like having a senior engineer with years of experience in your company sitting alongside your developers,” said Peter Guagenti, president of Tabnine. “The numbers prove this out. Tabnine users who allow us to use their existing code as context accept 40% more code recommendations without modifications. That number climbs even higher when Tabnine is connected to a company’s entire repository.”

This is one way RAG addresses the limitations of scalability and adaptability that hamper traditional coding assistants. As projects grow and evolve, RAG-equipped tools continue to adapt, refining their suggestions based on new patterns and information gleaned from the codebase. This ability to evolve makes RAG a highly robust tool in a dynamic development environment.

Bolstering RAG with semantic memory

Semantic retrieval-augmented generation (SEM-RAG) is an advanced iteration of RAG techniques tailored to extend RAG’s accuracy and contextualization. It enhances coding assistants by using semantic memory instead of vector search to integrate semantic understanding into the retrieval process.

Unlike traditional RAG, which primarily relies on vector space models for retrieving relevant code snippets, SEM-RAG employs a more nuanced semantic indexing approach. This method leverages static analysis to deeply understand the structure and semantics of the codebase, identifying relationships and dependencies within the code elements.

For instance, SEM-RAG can analyze import statements in languages like Java and TypeScript, enabling it to pull in contextually relevant code elements from libraries — even when direct access to the source code is not available. This capability allows SEM-RAG to comprehend and utilize imported libraries’ bytecode, effectively using these insights to enrich the context provided to the language model.
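The idea can be illustrated with a small Python analog. The article's examples are Java and TypeScript, so treat this as an analogy rather than a description of SEM-RAG itself. The sketch parses `from ... import ...` statements with the standard `ast` module and describes each imported callable by its signature, using introspection of the installed module in place of reading bytecode. The function names `imported_names` and `library_context` are hypothetical.

```python
import ast
import importlib
import inspect


def imported_names(source: str) -> list[tuple[str, str]]:
    """Return (module, name) pairs for every absolute `from module import name`."""
    pairs: list[tuple[str, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            pairs.extend((node.module, alias.name) for alias in node.names)
    return pairs


def library_context(source: str) -> list[str]:
    """Describe imported callables by signature, without reading library source."""
    lines: list[str] = []
    for module_name, name in imported_names(source):
        try:
            obj = getattr(importlib.import_module(module_name), name)
            lines.append(f"{module_name}.{name}{inspect.signature(obj)}")
        except (ImportError, AttributeError, TypeError, ValueError):
            continue  # not installed, not found, or no introspectable signature
    return lines


code = "from textwrap import shorten, dedent\n"
context_lines = library_context(code)
```

Each entry in `context_lines` is a one-line description of a library function that the current file depends on, which could be added to the prompt. Note that `importlib.import_module` executes the imported module, so a real tool would more likely read compiled metadata or type declarations statically.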

While traditional RAG significantly improves the relevance of code suggestions by matching vectorized representations of code snippets with queries, it sometimes lacks the depth to fully grasp the semantic nuances of complex software projects. SEM-RAG addresses this limitation by focusing on the semantic relationships within the code, resulting in a more precise alignment with the project’s coding practices. For example, by understanding the relationships and dependencies defined in a project’s architecture, SEM-RAG can offer suggestions that are not only contextually accurate but also architecturally coherent. This enhances performance by generating code that integrates seamlessly with existing systems, reducing the likelihood of introducing errors or inconsistencies.
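To make the contrast with vector matching concrete, here is a minimal sketch of relationship-based retrieval using Python's `ast` module. It builds a call graph for the top-level functions in one file and follows it to find everything a target function depends on. This is an assumed simplification for illustration: it only tracks direct calls by name within a single module, whereas real static analysis resolves types, methods and cross-file references. The names `call_graph` and `dependencies` are hypothetical.

```python
import ast


def call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the same-module functions it calls."""
    tree = ast.parse(source)
    funcs = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph: dict[str, set[str]] = {}
    for name, node in funcs.items():
        called = {
            c.func.id
            for c in ast.walk(node)
            if isinstance(c, ast.Call) and isinstance(c.func, ast.Name)
        }
        # Keep only calls to functions defined in this module, excluding recursion.
        graph[name] = (called & set(funcs)) - {name}
    return graph


def dependencies(graph: dict[str, set[str]], target: str) -> set[str]:
    """Return every function reachable from `target` by following calls."""
    seen: set[str] = set()
    stack = [target]
    while stack:
        for callee in graph.get(stack.pop(), ()):
            if callee not in seen and callee != target:
                seen.add(callee)
                stack.append(callee)
    return seen


SOURCE = '''
def load_config():
    return {}

def connect():
    return load_config()

def fetch_user(uid):
    return connect(), uid

def format_date(d):
    return str(d)
'''
related = dependencies(call_graph(SOURCE), "fetch_user")
```

When a developer edits `fetch_user`, this approach selects `connect` and `load_config` as context because they are structurally connected to it, even though their text may share few tokens with the request. The unrelated `format_date` is left out.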

SEM-RAG’s approach to treating code as interconnected elements rather than isolated snippets allows deeper contextualization than traditional RAG provides. This depth of understanding promotes a higher degree of automation in coding tasks, particularly in complex domains where interdependencies within the codebase are critical. Therefore, SEM-RAG not only maintains all the benefits of traditional RAG but also surpasses it in environments where understanding the deeper semantic and structural aspects of code is paramount. This makes SEM-RAG an invaluable tool for large-scale and enterprise-level software development, where maintaining architectural integrity is as important as code correctness.

Augmenting code quality and developer productivity with AI

Choosing an AI coding assistant that incorporates contextual awareness through advanced techniques like RAG and SEM-RAG marks a transformative step in the evolution of software development tools. By embedding a deep understanding of the codebase’s context, these assistants significantly enhance the accuracy, relevance and performance of the code they generate. This contextual integration helps ensure that suggestions are not only syntactically correct but also align with your specific coding standards, architectural frameworks and project-specific nuances, effectively reducing the gap between AI-generated code and human expertise.

RAG-enabled AI assistants significantly increase developer productivity and improve code quality. Developers can rely on these enhanced AI assistants to generate code that not only fits the task but also fits seamlessly into the larger project context, thereby minimizing the need for revisions and accelerating the development cycle. By automating more aspects of coding with a high degree of precision, these context-aware coding assistants are setting new standards in software development, pushing toward a future where AI tools understand and adapt to the complex dynamics of project environments as comprehensively as the developers themselves.

This article was originally posted on The New Stack.
