Comtegra GPU Cloud RAG

Comtegra GPU Cloud RAG is an AI system that uses Retrieval-Augmented Generation (RAG) to enhance Large Language Model (LLM) performance by integrating knowledge from external sources. The system automates the transformation of documentation into an AI-ready knowledge base using GitLab CI/CD, embedding models, and the Weaviate vector database. Powered by our in-house LLM Inference API, it provides accurate, context-aware responses through web, audio, and chat interfaces, and it can be integrated with all kinds of web services, bots, and applications.

What is RAG?

RAG, or Retrieval-Augmented Generation, is an AI framework that enhances the performance of large language models (LLMs) by incorporating external knowledge bases. It combines traditional information retrieval with generative LLMs to provide more accurate, up-to-date, and contextually relevant text generation.
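
In code, the RAG flow is a short retrieve-then-generate loop. The sketch below is purely illustrative Python: `embed`, `search`, and `generate` are placeholder stubs standing in for an embedding model, a vector database, and an LLM, not real CGC functions.

```python
# Placeholder stubs; in this system they map to the embedding models,
# Weaviate, and the LLM Inference API described in the sections below.
def embed(text: str) -> list[float]: ...
def search(vector: list[float], limit: int) -> list[str]: ...
def generate(prompt: str) -> str: ...

def answer(question: str, top_k: int = 3) -> str:
    """Minimal RAG loop: retrieve relevant context, then generate."""
    query_vector = embed(question)               # 1. embed the query
    context = search(query_vector, limit=top_k)  # 2. retrieve similar docs
    prompt = (                                   # 3. ground the LLM
        "Answer using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(context) +
        "\n\nQuestion: " + question
    )
    return generate(prompt)
```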

Components

CGC User Docs

The process begins with the CGC User Docs repository. This repository contains the Markdown source files for the documentation published at docs.cgc.comtegra.cloud.

When updates are pushed to the main branch, a CI/CD pipeline in GitLab is automatically triggered.

GitLab

When a new version of CGC User Docs is pushed, the CI/CD pipeline builds and releases a new web service, strips metadata from the docs, and builds an AI-ready knowledge base that can be pushed to the vector database.
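
As a hypothetical sketch of the knowledge-base build step, assuming the docs carry YAML front matter and the pipeline emits a JSON payload of objects for the vector database (the file layout and field names below are illustrative, not the actual pipeline):

```python
import json
import re
from pathlib import Path

# Matches YAML front matter at the very start of a Markdown file.
FRONT_MATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)

def build_knowledge_base(docs_dir: str) -> list[dict]:
    """Strip metadata from Markdown files and emit objects for batch insert."""
    objects = []
    for path in sorted(Path(docs_dir).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        # Remove site metadata the LLM doesn't need.
        body = FRONT_MATTER.sub("", text)
        # Split on top-level headings so each object is one coherent section.
        for section in re.split(r"\n(?=# )", body):
            if section.strip():
                objects.append({"source": str(path), "content": section.strip()})
    return objects

if __name__ == "__main__":
    kb = build_knowledge_base("docs")
    Path("knowledge_base.json").write_text(json.dumps(kb, indent=2))
```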

Embedding Models

Embedding models convert textual data into vector embeddings. This process is crucial for populating the knowledge base, transforming documents into a format the RAG system can understand and query.

During user interaction, these models also embed incoming queries. The resulting query embeddings are then used to search the knowledge base and retrieve the most relevant information to answer the user's question.
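
Since the LLM Inference API described below is OpenAI-compatible, embedding a document or a query can look like the following; the base URL, API key, and model name are placeholders for your actual deployment:

```python
from openai import OpenAI

# Placeholder base URL and model name; substitute your deployment's values.
client = OpenAI(base_url="https://llm.example.cgc.cloud/v1", api_key="YOUR_KEY")

response = client.embeddings.create(
    model="text-embedding-model",
    input=["How do I create a compute instance?"],
)
query_vector = response.data[0].embedding  # list of floats
print(len(query_vector))  # dimensionality of the embedding space
```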

Weaviate

An AI-ready, enterprise vector database that supports multiple external embedding models. It is connected through the Comtegra LLM API for easier integration and makes it easy to create and use collections of documents in many different languages.

Application developers don't need to worry about embedding user queries, as this is handled by the database itself.
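
Because the database embeds queries itself, a semantic search is a single call. The sketch below uses the weaviate-client v4 Python API against a hypothetical `CgcDocs` collection; the connection details and property names are assumptions, not the actual deployment:

```python
import weaviate

# Connection is a placeholder; use connect_to_custom(...) for a remote host.
client = weaviate.connect_to_local()
try:
    docs = client.collections.get("CgcDocs")  # hypothetical collection name
    # near_text: Weaviate embeds the query string itself via its configured
    # vectorizer, so the application never touches raw vectors.
    result = docs.query.near_text(query="How do I mount a volume?", limit=3)
    for obj in result.objects:
        print(obj.properties["content"])
finally:
    client.close()
```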

LLM Inference API

An in-house, open-source LLM inference API that allows multiple models to be served on a single endpoint. Compatibility with the OpenAI API ensures ease of use with existing and new AI applications and databases.

It supports endpoints for chat completions, models, embeddings, and audio transcriptions.
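
For example, a chat completion against the inference API can reuse the standard OpenAI client; only the base URL changes. The URL and model name below are placeholders:

```python
from openai import OpenAI

# Base URL and model name are placeholders for your actual deployment.
client = OpenAI(base_url="https://llm.example.cgc.cloud/v1", api_key="YOUR_KEY")

chat = client.chat.completions.create(
    model="served-llm",
    messages=[
        {"role": "system", "content": "Answer using the provided context."},
        {"role": "user", "content": "What is Comtegra GPU Cloud RAG?"},
    ],
)
print(chat.choices[0].message.content)

# The same endpoint also lists the models it serves:
for model in client.models.list():
    print(model.id)
```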

CGC Web

One of the endpoints creates or rebuilds a collection of our documents from the JSON sent by GitLab: it takes a collection name and a list of objects and batch-inserts them into Weaviate. cgc-web also allows users to interact with our RAG system via the `/chat` endpoint.
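
A hedged sketch of how a client might talk to these endpoints; the host, the collection endpoint path, and the payload fields are assumptions based on the description above, not a documented contract (only `/chat` is named in the docs):

```python
import requests

BASE = "https://cgc-web.example.cgc.cloud"  # placeholder host

# Rebuild a collection from the objects produced by the GitLab pipeline.
# The "/collections" path and payload shape are illustrative assumptions.
payload = {
    "collection": "CgcDocs",
    "objects": [{"source": "docs/intro.md", "content": "Comtegra GPU Cloud..."}],
}
resp = requests.post(f"{BASE}/collections", json=payload, timeout=30)
resp.raise_for_status()

# Ask the RAG a question via the /chat endpoint mentioned above.
answer = requests.post(
    f"{BASE}/chat",
    json={"message": "How do I create a compute instance?"},
    timeout=60,
)
print(answer.json())
```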

Why These Components?

The architecture of Comtegra GPU Cloud RAG is intentionally designed to offer a powerful yet adaptable solution. Our selection of components is guided by a commitment to openness, flexibility, and ease of use, empowering you to build and scale with confidence:

Freedom from Vendor Lock-In

We prioritize open-source technologies and standardized interfaces. Components like GitLab for CI/CD, Weaviate (an open-source vector database), and our in-house, open-source LLM Inference API ensure you retain control. Use your favorite inference engines, such as SGLang, llama.cpp, vLLM, or NVIDIA NIMs.

You have the freedom to adapt, modify, and extend the system as your needs evolve, without being tied to proprietary ecosystems.

Ease of Use & Integration

We believe powerful AI shouldn't require excessive complexity. GitLab automates the knowledge base creation pipeline. Weaviate simplifies development by handling query embeddings internally. Our LLM Inference API offers OpenAI compatibility, making integration with existing applications and the development of new ones straightforward, whether for web services, bots, or other applications. The ability to choose various Embedding Models also means you can select the best fit without a steep learning curve for a proprietary system.

Unparalleled Flexibility

This modular approach means each part of the RAG pipeline can be optimized or even swapped out if your requirements change. You can choose different embedding models, connect to various data sources beyond documentation, and deploy diverse LLMs via our Inference API. The CGC Web (or your custom application layer) serves as a flexible interface, demonstrating how these components can be seamlessly integrated to deliver RAG capabilities through web, audio, and chat interfaces, tailored to your specific use case.

Ultimately, this component strategy empowers you to build sophisticated, reliable, and future-proof RAG solutions tailored to your unique requirements, without compromising on control or adaptability.
