---
title: Building LLM-powered applications in Go
date: 2024-09-12
by:
- Eli Bendersky
tags:
- llm
- ai
- network
summary: LLM-powered applications in Go using Gemini, langchaingo and Genkit
---

As the capabilities of LLMs (Large Language Models) and adjacent tools like
embedding models have grown significantly over the past year, more and more
developers are considering integrating LLMs into their applications.

Since LLMs often require dedicated hardware and significant compute resources,
they are most commonly packaged as network services that provide APIs for
access. This is how the APIs for leading LLMs like OpenAI or Google Gemini work;
even run-your-own-LLM tools like [Ollama](https://ollama.com/) wrap
the LLM in a REST API for local consumption. Moreover, developers who take
advantage of LLMs in their applications often require supplementary tools like
vector databases, which are most commonly deployed as network services as
well.

In other words, LLM-powered applications are a lot like other modern
cloud-native applications: they require excellent support for REST and RPC
protocols, concurrency and performance. These just so happen to be the areas
where Go excels, making it a fantastic language for writing LLM-powered
applications.

This blog post works through an example of using Go for a simple LLM-powered
application. It starts by describing the problem the demo application solves,
then presents several variants of the application that all accomplish the
same task using different packages. All the code for the demos in this post
[is available online](https://github.com/golang/example/tree/master/ragserver).

## A RAG server for Q&A

A common LLM-powered application technique is RAG -
[Retrieval Augmented Generation](https://en.wikipedia.org/wiki/Retrieval-augmented_generation).
RAG is one of the most scalable ways of customizing an LLM's knowledge base
for domain-specific interactions.

We're going to build a *RAG server* in Go. This is an HTTP server that provides
two operations to users:

* Add a document to the knowledge base
* Ask an LLM a question about this knowledge base

In a typical real-world scenario, users would add a corpus of documents to
the server, and proceed to ask it questions. For example, a company can fill up
the RAG server's knowledge base with internal documentation and use it to
provide LLM-powered Q&A capabilities to internal users.

Here's a diagram showing the interactions of our server with the external
world:

<div class="image"><div class="centered">
<figure>
<img src="llmpowered/rag-server-diagram.png" alt="RAG server diagram"/>
</figure>
</div></div>

In addition to the user sending HTTP requests (the two operations described
above), the server interacts with:

* An embedding model to calculate [vector embeddings](https://en.wikipedia.org/wiki/Sentence_embedding)
  for the submitted documents and for user questions.
* A vector database for storing and retrieving embeddings efficiently.
* An LLM for answering questions based on context collected from the knowledge
  base.

Concretely, the server exposes two HTTP endpoints to users:

`/add/: POST {"documents": [{"text": "..."}, {"text": "..."}, ...]}`: submits
a sequence of text documents to the server, to be added to its knowledge base.
For this request, the server:

1. Calculates a vector embedding for each document using the embedding model.
2. Stores the documents along with their vector embeddings in the vector DB.

`/query/: POST {"content": "..."}`: submits a question to the server. For this
request, the server:

1. Calculates the question's vector embedding using the embedding model.
2. Uses the vector DB's similarity search to find the most relevant documents
   to the question in the knowledge base.
3. Uses simple prompt engineering to reformulate the question with the most
   relevant documents found in step (2) as context, and sends it to the LLM,
   returning its answer to the user.

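To make the wire format concrete, here's a minimal sketch of how these request
bodies could be modeled and decoded in Go. The type and field names are
illustrative assumptions for this post, not necessarily the ones used in the
sample code:

```Go
// Illustrative request types for the two endpoints; the names are
// assumptions for this sketch, not taken from the sample code.
type document struct {
	Text string `json:"text"`
}

type addRequest struct {
	Documents []document `json:"documents"`
}

type queryRequest struct {
	Content string `json:"content"`
}

// Inside an HTTP handler, decoding is the usual encoding/json dance:
//
//	var ar addRequest
//	if err := json.NewDecoder(r.Body).Decode(&ar); err != nil {
//		http.Error(w, err.Error(), http.StatusBadRequest)
//		return
//	}
```
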
The services used by our demo are:

* [Google Gemini API](https://ai.google.dev/) for the LLM and embedding model.
* [Weaviate](https://weaviate.io/) for a locally-hosted vector DB; Weaviate
  is an open-source vector database
  [implemented in Go](https://github.com/weaviate/weaviate).

It should be very simple to replace these with other, equivalent services. In
fact, this is what the second and third variants of the server are all about!
We'll start with the first variant, which uses these tools directly.

## Using the Gemini API and Weaviate directly

Both the Gemini API and Weaviate have convenient Go SDKs (client libraries),
and our first server variant uses these directly. The full code of this
variant is [in this directory](https://github.com/golang/example/tree/master/ragserver/ragserver).

We won't reproduce the entire code in this blog post, but here are some notes
to keep in mind while reading it:

**Structure**: the code structure will be familiar to anyone who's written an
HTTP server in Go. Client libraries for Gemini and for Weaviate are initialized,
and the clients are stored in a state value that's passed to HTTP handlers.

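As a rough sketch of what this state value can look like, assuming the Gemini
`genai` and Weaviate client packages (the field names and exact types here are
assumptions, so check the sample code for the real thing):

```Go
// ragServer holds the long-lived clients shared by all HTTP handlers.
// A sketch only; the actual sample code may differ in names and details.
type ragServer struct {
	ctx      context.Context
	wvClient *weaviate.Client       // Weaviate Go client
	genModel *genai.GenerativeModel // Gemini model for generation
	embModel *genai.EmbeddingModel  // Gemini model for embeddings
}
```

Handlers can then be defined as methods on this type, giving each one access
to the shared clients.
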
**Route registration**: the HTTP routes for our server are trivial to set up
using the [routing enhancements](/blog/routing-enhancements) introduced in
Go 1.22:

```Go
mux := http.NewServeMux()
mux.HandleFunc("POST /add/", server.addDocumentsHandler)
mux.HandleFunc("POST /query/", server.queryHandler)
```

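Wiring this mux into a running server then takes only a line or two; a minimal
sketch, assuming the `cmp`, `os`, `log` and `net/http` imports (the port and
environment variable name are illustrative):

```Go
// Pick the port from an (illustrative) environment variable, with a
// default fallback; cmp.Or is available since Go 1.22.
port := cmp.Or(os.Getenv("SERVERPORT"), "9020")
log.Fatal(http.ListenAndServe("localhost:"+port, mux))
```
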
**Concurrency**: the HTTP handlers of our server reach out
to other services over the network and wait for a response. This isn't a problem
for Go, since each HTTP handler runs concurrently in its own goroutine. This
RAG server can handle a large number of concurrent requests, and the code of
each handler is linear and synchronous.

**Batch APIs**: since an `/add/` request may provide a large number of documents
to add to the knowledge base, the server leverages *batch APIs* for both
embeddings (`embModel.BatchEmbedContents`) and the Weaviate DB
(`rs.wvClient.Batch`) for efficiency.

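For a feel of what the batched embedding call looks like, here's a minimal
sketch using the Gemini Go SDK (the helper function and its error handling are
illustrative; `embModel` is assumed to be an initialized
`*genai.EmbeddingModel`):

```Go
// embedDocuments asks the Gemini API for one embedding per document in
// a single round-trip, using the SDK's batch support.
func embedDocuments(ctx context.Context, embModel *genai.EmbeddingModel, docs []string) ([][]float32, error) {
	batch := embModel.NewBatch()
	for _, doc := range docs {
		batch = batch.AddContent(genai.Text(doc))
	}
	res, err := embModel.BatchEmbedContents(ctx, batch)
	if err != nil {
		return nil, err
	}
	vectors := make([][]float32, len(res.Embeddings))
	for i, emb := range res.Embeddings {
		vectors[i] = emb.Values
	}
	return vectors, nil
}
```
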
## Using LangChain for Go

Our second RAG server variant uses LangChainGo to accomplish the same task.

[LangChain](https://www.langchain.com/) is a popular Python framework for
building LLM-powered applications.
[LangChainGo](https://github.com/tmc/langchaingo) is its Go equivalent. The
framework provides tools for building applications out of modular components,
and supports many LLM providers and vector databases behind a common API. This
lets developers write code that works with any supported provider and makes
switching providers easy.

The full code for this variant is [in this directory](https://github.com/golang/example/tree/master/ragserver/ragserver-langchaingo).
You'll notice two things when reading the code:

First, it's somewhat shorter than the previous variant. LangChainGo takes care
of wrapping the full APIs of vector databases in common interfaces, so less
code is needed to initialize and deal with Weaviate.

Second, the LangChainGo API makes it fairly easy to switch providers. Let's say
we want to replace Weaviate with another vector DB; in our previous variant,
we'd have to rewrite all the code interfacing with the vector DB to use a new
API. With a framework like LangChainGo, we no longer need to do so. As long as
LangChainGo supports the new vector DB we're interested in, we should be able
to replace just a few lines of code in our server, since all the DBs implement
a [common interface](https://pkg.go.dev/github.com/tmc/langchaingo@v0.1.12/vectorstores#VectorStore):

```Go
type VectorStore interface {
	AddDocuments(ctx context.Context, docs []schema.Document, options ...Option) ([]string, error)
	SimilaritySearch(ctx context.Context, query string, numDocuments int, options ...Option) ([]schema.Document, error)
}
```

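To illustrate, here's a minimal sketch of code written purely against this
interface; it runs unchanged regardless of which vector DB backs the store.
The helper name is ours, not LangChainGo's, and the `schema` and
`vectorstores` packages come from the `github.com/tmc/langchaingo` module:

```Go
// storeAndSearch exercises the provider-agnostic VectorStore interface:
// it works the same whether the store is backed by Weaviate or any other
// supported vector DB. A sketch; error handling is minimal.
func storeAndSearch(ctx context.Context, store vectorstores.VectorStore, texts []string, query string) ([]schema.Document, error) {
	docs := make([]schema.Document, len(texts))
	for i, t := range texts {
		docs[i] = schema.Document{PageContent: t}
	}
	if _, err := store.AddDocuments(ctx, docs); err != nil {
		return nil, err
	}
	// Retrieve the 3 documents most similar to the query.
	return store.SimilaritySearch(ctx, query, 3)
}
```
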
## Using Genkit for Go

Earlier this year, Google introduced [Genkit for Go](https://developers.googleblog.com/en/introducing-genkit-for-go-build-scalable-ai-powered-apps-in-go/) -
a new open-source framework for building LLM-powered applications. Genkit shares
some characteristics with LangChain, but diverges in other aspects.

Like LangChain, it provides common interfaces that may be implemented by
different providers (as plugins), and thus makes switching from one to the other
simpler. However, it doesn't try to prescribe how different LLM components
interact; instead, it focuses on production features like prompt management and
engineering, and deployment with integrated developer tooling.

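As a small taste of the flow concept, here's a sketch along the lines of the
early Genkit for Go API; the framework was new at the time of writing and its
API may well have changed, so treat the exact calls and signatures below as
assumptions:

```Go
// Define a trivial flow; Genkit flows are typed functions that the
// framework can trace, inspect and deploy. Based on the early API;
// the signatures here are assumptions.
genkit.DefineFlow("echoFlow", func(ctx context.Context, input string) (string, error) {
	return "echo: " + input, nil
})

// Init starts Genkit, including its developer tooling integration.
if err := genkit.Init(context.Background(), nil); err != nil {
	log.Fatal(err)
}
```
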
Our third RAG server variant uses Genkit for Go to accomplish the same task.
Its full code is [in this directory](https://github.com/golang/example/tree/master/ragserver/ragserver-genkit).

This variant is fairly similar to the LangChainGo one - common interfaces for
LLMs, embedders and vector DBs are used instead of direct provider APIs, making
it easier to switch from one to another. In addition, deploying an LLM-powered
application to production is much easier with Genkit; we don't implement this
in our variant, but feel free to read [the documentation](https://firebase.google.com/docs/genkit-go/get-started-go)
if you're interested.

## Summary - Go for LLM-powered applications

The samples in this post provide just a taste of what's possible for building
LLM-powered applications in Go. They demonstrate how simple it is to build
a powerful RAG server with relatively little code; most importantly, the samples
pack a significant degree of production readiness because of some fundamental
Go features.

Working with LLM services often means sending REST or RPC requests to a network
service, waiting for the response, sending new requests to other services based
on the result, and so on. Go excels at all of these, providing great tools for
managing concurrency and the complexity of juggling network services.

In addition, Go's great performance and reliability as a cloud-native language
make it a natural choice for implementing the more fundamental building blocks
of the LLM ecosystem. For some examples, see projects like
[Ollama](https://ollama.com/), [LocalAI](https://localai.io/),
[Weaviate](https://weaviate.io/) or [Milvus](https://zilliz.com/what-is-milvus).