---
title: Building LLM-powered applications in Go
date: 2024-09-12
by:
- Eli Bendersky
tags:
- llm
- ai
- network
summary: LLM-powered applications in Go using Gemini, langchaingo and Genkit
---

As the capabilities of LLMs (Large Language Models) and adjacent tools like
embedding models have grown significantly over the past year, more and more
developers are considering integrating LLMs into their applications.

Since LLMs often require dedicated hardware and significant compute resources,
they are most commonly packaged as network services that provide APIs for
access. This is how the APIs for leading LLMs like OpenAI or Google Gemini work;
even run-your-own-LLM tools like [Ollama](https://ollama.com/) wrap
the LLM in a REST API for local consumption. Moreover, developers who take
advantage of LLMs in their applications often require supplementary tools like
vector databases, which are most commonly deployed as network services as
well.

In other words, LLM-powered applications are a lot like other modern
cloud-native applications: they require excellent support for REST and RPC
protocols, concurrency and performance. These just so happen to be the areas
where Go excels, making it a fantastic language for writing LLM-powered
applications.

This blog post works through an example of using Go for a simple LLM-powered
application. It starts by describing the problem the demo application solves,
then presents several variants of the application that all accomplish the
same task using different packages. All the code for the demos in this post
[is available online](https://github.com/golang/example/tree/master/ragserver).

## A RAG server for Q&A

A common LLM-powered application technique is RAG -
[Retrieval Augmented Generation](https://en.wikipedia.org/wiki/Retrieval-augmented_generation).
RAG is one of the most scalable ways of customizing an LLM's knowledge base
for domain-specific interactions.

We're going to build a *RAG server* in Go. This is an HTTP server that provides
two operations to users:

* Add a document to the knowledge base
* Ask an LLM a question about this knowledge base

In a typical real-world scenario, users would add a corpus of documents to
the server, and proceed to ask it questions. For example, a company can fill up
the RAG server's knowledge base with internal documentation and use it to
provide LLM-powered Q&A capabilities to internal users.

Here's a diagram showing the interactions of our server with the external
world:

<div class="image"><div class="centered">
<figure>
<img src="llmpowered/rag-server-diagram.png" alt="RAG server diagram"/>
</figure>
</div></div>

In addition to the user sending HTTP requests (the two operations described
above), the server interacts with:

* An embedding model to calculate [vector embeddings](https://en.wikipedia.org/wiki/Sentence_embedding)
  for the submitted documents and for user questions.
* A vector database for storing and retrieving embeddings efficiently.
* An LLM for answering questions based on context collected from the knowledge
  base.

Concretely, the server exposes two HTTP endpoints to users:

`/add/: POST {"documents": [{"text": "..."}, {"text": "..."}, ...]}`: submits
a sequence of text documents to the server, to be added to its knowledge base.
For this request, the server:

1. Calculates a vector embedding for each document using the embedding model.
2. Stores the documents along with their vector embeddings in the vector DB.

`/query/: POST {"content": "..."}`: submits a question to the server. For this
request, the server:

1. Calculates the question's vector embedding using the embedding model.
2. Uses the vector DB's similarity search to find the most relevant documents
   to the question in the knowledge base.
3. Uses simple prompt engineering to reformulate the question with the most
   relevant documents found in step (2) as context, and sends it to the LLM,
   returning its answer to the user.

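To make the wire format concrete, here's a minimal sketch of how these request
bodies could be modeled and decoded in Go. The type and field names are
illustrative assumptions for this post, not necessarily the ones used in the
sample code:

```Go
// Illustrative request types for the two endpoints; the names are
// assumptions for this sketch, not taken from the sample code.
type document struct {
	Text string `json:"text"`
}

type addRequest struct {
	Documents []document `json:"documents"`
}

type queryRequest struct {
	Content string `json:"content"`
}

// Inside an HTTP handler, decoding is the usual encoding/json dance:
//
//	var ar addRequest
//	if err := json.NewDecoder(r.Body).Decode(&ar); err != nil {
//		http.Error(w, err.Error(), http.StatusBadRequest)
//		return
//	}
```
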
The services used by our demo are:

* [Google Gemini API](https://ai.google.dev/) for the LLM and embedding model.
* [Weaviate](https://weaviate.io/) for a locally-hosted vector DB; Weaviate
  is an open-source vector database
  [implemented in Go](https://github.com/weaviate/weaviate).

It should be very simple to replace these with other, equivalent services. In
fact, this is what the second and third variants of the server are all about!
We'll start with the first variant, which uses these tools directly.

## Using the Gemini API and Weaviate directly

Both the Gemini API and Weaviate have convenient Go SDKs (client libraries),
and our first server variant uses these directly. The full code of this
variant is [in this directory](https://github.com/golang/example/tree/master/ragserver/ragserver).

We won't reproduce the entire code in this blog post, but here are some notes
to keep in mind while reading it:

**Structure**: the code structure will be familiar to anyone who's written an
HTTP server in Go. Client libraries for Gemini and for Weaviate are initialized,
and the clients are stored in a state value that's passed to HTTP handlers.

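As a rough sketch of what this state value can look like, assuming the Gemini
`genai` and Weaviate client packages (the field names and exact types here are
assumptions, so check the sample code for the real thing):

```Go
// ragServer holds the long-lived clients shared by all HTTP handlers.
// A sketch only; the actual sample code may differ in names and details.
type ragServer struct {
	ctx      context.Context
	wvClient *weaviate.Client       // Weaviate Go client
	genModel *genai.GenerativeModel // Gemini model for generation
	embModel *genai.EmbeddingModel  // Gemini model for embeddings
}
```

Handlers can then be defined as methods on this type, giving each one access
to the shared clients.
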
**Route registration**: the HTTP routes for our server are trivial to set up
using the [routing enhancements](/blog/routing-enhancements) introduced in
Go 1.22:

```Go
mux := http.NewServeMux()
mux.HandleFunc("POST /add/", server.addDocumentsHandler)
mux.HandleFunc("POST /query/", server.queryHandler)
```

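Wiring this mux into a running server then takes only a line or two; a minimal
sketch, assuming the `cmp`, `os`, `log` and `net/http` imports (the port and
environment variable name are illustrative):

```Go
// Pick the port from an (illustrative) environment variable, with a
// default fallback; cmp.Or is available since Go 1.22.
port := cmp.Or(os.Getenv("SERVERPORT"), "9020")
log.Fatal(http.ListenAndServe("localhost:"+port, mux))
```
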
**Concurrency**: the HTTP handlers of our server reach out
to other services over the network and wait for a response. This isn't a problem
for Go, since each HTTP handler runs concurrently in its own goroutine. This
RAG server can handle a large number of concurrent requests, and the code of
each handler is linear and synchronous.

**Batch APIs**: since an `/add/` request may provide a large number of documents
to add to the knowledge base, the server leverages *batch APIs* for both
embeddings (`embModel.BatchEmbedContents`) and the Weaviate DB
(`rs.wvClient.Batch`) for efficiency.

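For a feel of what the batched embedding call looks like, here's a minimal
sketch using the Gemini Go SDK (the helper function and its error handling are
illustrative; `embModel` is assumed to be an initialized
`*genai.EmbeddingModel`):

```Go
// embedDocuments asks the Gemini API for one embedding per document in
// a single round-trip, using the SDK's batch support.
func embedDocuments(ctx context.Context, embModel *genai.EmbeddingModel, docs []string) ([][]float32, error) {
	batch := embModel.NewBatch()
	for _, doc := range docs {
		batch = batch.AddContent(genai.Text(doc))
	}
	res, err := embModel.BatchEmbedContents(ctx, batch)
	if err != nil {
		return nil, err
	}
	vectors := make([][]float32, len(res.Embeddings))
	for i, emb := range res.Embeddings {
		vectors[i] = emb.Values
	}
	return vectors, nil
}
```
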
## Using LangChain for Go

Our second RAG server variant uses LangChainGo to accomplish the same task.

[LangChain](https://www.langchain.com/) is a popular Python framework for
building LLM-powered applications.
[LangChainGo](https://github.com/tmc/langchaingo) is its Go equivalent. The
framework provides tools for building applications out of modular components,
and supports many LLM providers and vector databases behind a common API. This
lets developers write code that works with any supported provider and makes
switching providers easy.

The full code for this variant is [in this directory](https://github.com/golang/example/tree/master/ragserver/ragserver-langchaingo).
You'll notice two things when reading the code:

First, it's somewhat shorter than the previous variant. LangChainGo takes care
of wrapping the full APIs of vector databases in common interfaces, so less
code is needed to initialize and deal with Weaviate.

Second, the LangChainGo API makes it fairly easy to switch providers. Let's say
we want to replace Weaviate with another vector DB; in our previous variant,
we'd have to rewrite all the code interfacing with the vector DB to use a new
API. With a framework like LangChainGo, we no longer need to do so. As long as
LangChainGo supports the new vector DB we're interested in, we should be able
to replace just a few lines of code in our server, since all the DBs implement
a [common interface](https://pkg.go.dev/github.com/tmc/langchaingo@v0.1.12/vectorstores#VectorStore):

```Go
type VectorStore interface {
	AddDocuments(ctx context.Context, docs []schema.Document, options ...Option) ([]string, error)
	SimilaritySearch(ctx context.Context, query string, numDocuments int, options ...Option) ([]schema.Document, error)
}
```

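To illustrate, here's a minimal sketch of code written purely against this
interface; it runs unchanged regardless of which vector DB backs the store.
The helper name is ours, not LangChainGo's, and the `schema` and
`vectorstores` packages come from the `github.com/tmc/langchaingo` module:

```Go
// storeAndSearch exercises the provider-agnostic VectorStore interface:
// it works the same whether the store is backed by Weaviate or any other
// supported vector DB. A sketch; error handling is minimal.
func storeAndSearch(ctx context.Context, store vectorstores.VectorStore, texts []string, query string) ([]schema.Document, error) {
	docs := make([]schema.Document, len(texts))
	for i, t := range texts {
		docs[i] = schema.Document{PageContent: t}
	}
	if _, err := store.AddDocuments(ctx, docs); err != nil {
		return nil, err
	}
	// Retrieve the 3 documents most similar to the query.
	return store.SimilaritySearch(ctx, query, 3)
}
```
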
## Using Genkit for Go

Earlier this year, Google introduced [Genkit for Go](https://developers.googleblog.com/en/introducing-genkit-for-go-build-scalable-ai-powered-apps-in-go/) -
a new open-source framework for building LLM-powered applications. Genkit shares
some characteristics with LangChain, but diverges in other aspects.

Like LangChain, it provides common interfaces that may be implemented by
different providers (as plugins), and thus makes switching from one to the other
simpler. However, it doesn't try to prescribe how different LLM components
interact; instead, it focuses on production features like prompt management and
engineering, and deployment with integrated developer tooling.

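As a small taste of the flow concept, here's a sketch along the lines of the
early Genkit for Go API; the framework was new at the time of writing and its
API may well have changed, so treat the exact calls and signatures below as
assumptions:

```Go
// Define a trivial flow; Genkit flows are typed functions that the
// framework can trace, inspect and deploy. Based on the early API;
// the signatures here are assumptions.
genkit.DefineFlow("echoFlow", func(ctx context.Context, input string) (string, error) {
	return "echo: " + input, nil
})

// Init starts Genkit, including its developer tooling integration.
if err := genkit.Init(context.Background(), nil); err != nil {
	log.Fatal(err)
}
```
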
Our third RAG server variant uses Genkit for Go to accomplish the same task.
Its full code is [in this directory](https://github.com/golang/example/tree/master/ragserver/ragserver-genkit).

This variant is fairly similar to the LangChainGo one - common interfaces for
LLMs, embedders and vector DBs are used instead of direct provider APIs, making
it easier to switch from one to another. In addition, deploying an LLM-powered
application to production is much easier with Genkit; we don't implement this
in our variant, but feel free to read [the documentation](https://firebase.google.com/docs/genkit-go/get-started-go)
if you're interested.

## Summary - Go for LLM-powered applications

The samples in this post provide just a taste of what's possible for building
LLM-powered applications in Go. They demonstrate how simple it is to build
a powerful RAG server with relatively little code; most importantly, the samples
pack a significant degree of production readiness because of some fundamental
Go features.

Working with LLM services often means sending REST or RPC requests to a network
service, waiting for the response, sending new requests to other services based
on the result, and so on. Go excels at all of these, providing great tools for
managing concurrency and the complexity of juggling network services.

In addition, Go's great performance and reliability as a cloud-native language
make it a natural choice for implementing the more fundamental building blocks
of the LLM ecosystem. For some examples, see projects like
[Ollama](https://ollama.com/), [LocalAI](https://localai.io/),
[Weaviate](https://weaviate.io/) or [Milvus](https://zilliz.com/what-is-milvus).