
Use online LLM in parallel #8

@RedLordezh7Venom

Description

Use LLM calls in parallel when online.

https://www.kaggle.com/code/jacoporepossi/text-summarization-with-gemma#Text-summarization:-Methods-and-strategies
discusses the main strategies.

- could switch to Groq
- could use the refine approach too

Map-reduce can run its map step in parallel, so it would be the best fit.
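To illustrate why map-reduce parallelizes well, here is a stdlib-only sketch of the pattern with a stand-in summarizer (`fake_summarize` is hypothetical, not an LLM call — the map step fans out concurrently, the reduce step runs once over the partial results):

```python
# Illustrative map-reduce sketch with plain Python; fake_summarize stands in
# for an LLM call so the pattern can run without any API key.
from concurrent.futures import ThreadPoolExecutor

def fake_summarize(chunk: str) -> str:
    # Stand-in for an LLM call: keep only the first sentence.
    return chunk.split(".")[0] + "."

def map_reduce_summarize(chunks: list[str], max_workers: int = 4) -> str:
    # Map step: summarize every chunk concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_summaries = list(pool.map(fake_summarize, chunks))
    # Reduce step: combine the partial summaries (a real pipeline would
    # send this joined text through the LLM one more time).
    return " ".join(partial_summaries)

print(map_reduce_summarize(["Intro text. Detail.", "Methods text. Detail."]))
# → Intro text. Methods text.
```

Because each chunk is summarized independently, the map step is embarrassingly parallel; only the final reduce is sequential.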

To use LangChain to send parallel calls to Gemini (Google's model) for map-reduce chunking summaries, you’ll want to:

  1. Split your input document into chunks.
  2. Use LangChain’s MapReduceDocumentsChain or custom logic for map-reduce.
  3. Use LangChain’s support for concurrency to process those chunks in parallel.
  4. Connect LangChain to the Gemini API, either through the official langchain-google-genai integration or a small custom LLM wrapper.

✅ Overview of Steps

1. Install dependencies

pip install langchain google-generativeai

google-generativeai is the client library for Gemini (Google AI Studio / Gemini Pro). LangChain also ships an official integration package, langchain-google-genai, which provides ChatGoogleGenerativeAI if you would rather not write a wrapper.


2. Set up Gemini LLM wrapper for LangChain

If you prefer not to use the official langchain-google-genai integration, you can write a minimal custom LLM wrapper like this:

from typing import List, Optional

import google.generativeai as genai
from langchain.llms.base import LLM

class GeminiLLM(LLM):
    # LLM subclasses are pydantic models, so configuration goes in declared
    # fields rather than a custom __init__ (assigning undeclared attributes
    # in __init__ raises a validation error).
    model_name: str = "gemini-pro"
    temperature: float = 0.7

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        genai.configure(api_key="YOUR_GEMINI_API_KEY")
        model = genai.GenerativeModel(self.model_name)
        response = model.generate_content(
            prompt,
            generation_config=genai.types.GenerationConfig(temperature=self.temperature),
        )
        return response.text

    @property
    def _llm_type(self) -> str:
        return "gemini"

3. Chunk the document

Use LangChain’s CharacterTextSplitter, RecursiveCharacterTextSplitter, etc.:

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = splitter.create_documents([your_large_text])
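To make chunk_size and chunk_overlap concrete, here is a naive character-level sketch (the real RecursiveCharacterTextSplitter is smarter — it prefers splitting on separators such as paragraphs and newlines — so this is only an illustration of the two parameters):

```python
# Naive character-level chunking to illustrate chunk_size/chunk_overlap;
# each chunk re-reads the last `chunk_overlap` characters of the previous one.
def naive_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

print(naive_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap gives each chunk some shared context with its neighbor, which helps the map-step summaries stay coherent across chunk boundaries.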

4. Create a map chain and a reduce chain

from langchain.chains import (
    LLMChain,
    MapReduceDocumentsChain,
    ReduceDocumentsChain,
    StuffDocumentsChain,
)
from langchain.prompts import PromptTemplate

llm = GeminiLLM()

# Map step: summarize each chunk
map_prompt = PromptTemplate.from_template("Summarize the following:\n{text}")
map_chain = LLMChain(llm=llm, prompt=map_prompt)

# Reduce step: combine the chunk summaries
reduce_prompt = PromptTemplate.from_template("Combine the summaries into a coherent summary:\n{text}")
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

# The reduce LLMChain must be wrapped so it can accept a list of documents
combine_chain = StuffDocumentsChain(llm_chain=reduce_chain, document_variable_name="text")
reduce_documents_chain = ReduceDocumentsChain(combine_documents_chain=combine_chain)

# MapReduce chain
summary_chain = MapReduceDocumentsChain(
    llm_chain=map_chain,
    reduce_documents_chain=reduce_documents_chain,
    document_variable_name="text",
    return_intermediate_steps=True,
)

5. Run the map step in parallel via LCEL batching or asyncio

LangChain Expression Language (LCEL) runnables can process many inputs concurrently through .batch() (with max_concurrency settable in the config), and RunnableParallel runs several runnables side by side.

If you want full control, you can do:

import asyncio
from typing import List

from langchain.schema import Document

async def summarize_chunk(doc: Document) -> str:
    # map_chain.run is synchronous, so push it onto a worker thread
    return await asyncio.to_thread(map_chain.run, doc.page_content)

async def parallel_summarize(docs: List[Document]) -> List[str]:
    return await asyncio.gather(*(summarize_chunk(doc) for doc in docs))

# Run the map step in parallel
intermediate_summaries = asyncio.run(parallel_summarize(docs))

# Run the reduce step once over the joined partial summaries
final_summary = reduce_chain.run("\n".join(intermediate_summaries))
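Firing all chunks at once can trip an online API's rate limits, so it may help to cap the number of in-flight requests. A stdlib-only sketch with asyncio.Semaphore, where `fake_call` is a hypothetical stand-in for a real LLM request:

```python
# Bound concurrency with asyncio.Semaphore so at most `limit` calls
# run at the same time; fake_call stands in for an LLM request.
import asyncio

async def fake_call(i: int) -> str:
    await asyncio.sleep(0.01)  # pretend network latency
    return f"summary-{i}"

async def bounded_gather(n_tasks: int, limit: int) -> list[str]:
    sem = asyncio.Semaphore(limit)  # at most `limit` permits

    async def one(i: int) -> str:
        async with sem:  # wait for a free permit before calling
            return await fake_call(i)

    return await asyncio.gather(*(one(i) for i in range(n_tasks)))

results = asyncio.run(bounded_gather(8, limit=3))
print(results[0], len(results))  # → summary-0 8
```

In the real pipeline you would wrap summarize_chunk in the semaphore the same way, keeping the parallel speedup while staying under the provider's request ceiling.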

✅ Final Output

You’ll get:

  • Intermediate summaries (from map step)
  • Final reduced summary (concise, high-level output)

🔁 Optional Improvements

  • Use the async Gemini API directly if one becomes available.
  • Tune chunk_size, chunk_overlap, and the prompts for better summaries.
  • Handle Gemini rate limits with retries (and log them).
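For the rate-limit point, a minimal retry-with-exponential-backoff sketch; `flaky_call` is a hypothetical stand-in for a Gemini request that gets throttled twice before succeeding:

```python
# Retry a flaky call with exponential backoff; flaky_call simulates an
# API that returns 429s before eventually succeeding.
import time

def with_retries(fn, max_attempts: int = 3, base_delay: float = 0.01):
    for attempt in range(max_attempts):
        try:
            return fn()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...

calls = {"n": 0}
def flaky_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429 rate limited")  # simulated throttling
    return "summary"

print(with_retries(flaky_call))  # → summary
```

A production version would catch the SDK's specific rate-limit exception instead of RuntimeError and log each retry.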


Labels: enhancement (New feature or request)
