Use LLM calls in parallel when online
https://www.kaggle.com/code/jacoporepossi/text-summarization-with-gemma#Text-summarization:-Methods-and-strategies
discusses the main strategies:
- could switch to Groq
- could use the refine approach too
Map-reduce can be run in parallel, so it would be the best approach.
To use LangChain to send parallel calls to Gemini (Google's model) for map-reduce chunked summaries, you’ll want to:
- Split your input document into chunks.
- Use LangChain’s MapReduceDocumentsChain or custom logic for map-reduce.
- Use LangChain’s concurrency support to process those chunks in parallel.
- Connect LangChain to the Gemini API via a custom LLM wrapper (if one doesn’t exist in LangChain already).
✅ Overview of Steps
1. Install dependencies
```
pip install langchain google-generativeai
```
google-generativeai is used to access Gemini (from Google AI Studio / Gemini Pro).
2. Set up Gemini LLM wrapper for LangChain
LangChain may not yet have a built-in Gemini wrapper (check for the langchain-google-genai package first), but you can write a custom LLM wrapper like this:
```python
from typing import List, Optional

import google.generativeai as genai
from langchain.llms.base import LLM

class GeminiLLM(LLM):
    # LLM is a pydantic model, so declare fields instead of overriding __init__
    model_name: str = "gemini-pro"
    temperature: float = 0.7

    @property
    def _llm_type(self) -> str:
        return "gemini"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        genai.configure(api_key="YOUR_GEMINI_API_KEY")
        model = genai.GenerativeModel(self.model_name)
        response = model.generate_content(
            prompt,
            generation_config=genai.types.GenerationConfig(temperature=self.temperature),
        )
        return response.text
```
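The wrapper's contract is small: subclass the base class and implement `_call` (prompt in, text out). A self-contained sketch of that shape, using a hypothetical `FakeLLM` stand-in for LangChain's base class so it runs without the API:

```python
from typing import List, Optional

class FakeLLM:
    """Hypothetical stand-in for langchain's LLM base class, just to show the contract."""
    def __call__(self, prompt: str) -> str:
        return self._call(prompt)

class EchoLLM(FakeLLM):
    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # A real wrapper would call the Gemini API here
        return f"summary of: {prompt}"

print(EchoLLM()("some chunk"))  # → summary of: some chunk
```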
3. Chunk the document
Use LangChain’s CharacterTextSplitter, RecursiveCharacterTextSplitter, etc.:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = splitter.create_documents([your_large_text])
```
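Independent of LangChain, the chunking idea can be sketched in plain Python: slide a fixed-size window over the text, stepping forward by chunk_size minus overlap (a simplified stand-in — RecursiveCharacterTextSplitter actually tries to break on separators like paragraphs first):

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    # Step forward by chunk_size - chunk_overlap so consecutive chunks share context
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

text = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(text)
print(len(chunks))                          # 3 chunks for 2500 chars
print(chunks[0][-200:] == chunks[1][:200])  # True: consecutive chunks share the overlap
```

The overlap matters for summarization: it keeps sentences that straddle a chunk boundary visible to at least one map call.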
4. Create a map chain and a reduce chain

```python
from langchain.chains import (
    LLMChain,
    MapReduceDocumentsChain,
    ReduceDocumentsChain,
    StuffDocumentsChain,
)
from langchain.prompts import PromptTemplate

llm = GeminiLLM()

# Map step: summarize each chunk
map_prompt = PromptTemplate.from_template("Summarize the following:\n{text}")
map_chain = LLMChain(llm=llm, prompt=map_prompt)

# Reduce step: combine the chunk summaries
reduce_prompt = PromptTemplate.from_template("Combine the summaries into a coherent summary:\n{text}")
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

# The reduce side must be a documents chain, not a bare LLMChain
combine_chain = StuffDocumentsChain(llm_chain=reduce_chain, document_variable_name="text")

summary_chain = MapReduceDocumentsChain(
    llm_chain=map_chain,
    reduce_documents_chain=ReduceDocumentsChain(combine_documents_chain=combine_chain),
    document_variable_name="text",
    return_intermediate_steps=True,
)
```
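Stripped of the LangChain classes, map-reduce summarization is just: summarize each chunk, then summarize the concatenated partial summaries. A minimal sketch with a dummy `summarize` (first sentence of the input) standing in for the LLM call:

```python
def summarize(text: str) -> str:
    # Placeholder for an LLM call: take the first sentence as the "summary"
    return text.split(".")[0].strip() + "."

def map_reduce_summarize(chunks: list[str]) -> str:
    partial = [summarize(c) for c in chunks]  # map step: one call per chunk
    return summarize(" ".join(partial))       # reduce step: combine the partials

chunks = ["First point. Detail A.", "Second point. Detail B."]
print(map_reduce_summarize(chunks))  # → First point.
```

The map step is embarrassingly parallel (each chunk is independent), which is why it's the natural place to fan out concurrent Gemini calls.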
5. Run in parallel using RunnableParallel or asyncio
LangChain supports parallel execution under the hood via RunnableParallel (LangChain Expression Language). If you want full control, you can do it yourself with asyncio:
```python
import asyncio
from typing import List

from langchain.schema import Document

async def summarize_chunk(doc: Document) -> str:
    # Run the synchronous map chain in a worker thread so calls overlap
    return await asyncio.to_thread(map_chain.run, doc.page_content)

async def parallel_summarize(docs: List[Document]) -> List[str]:
    return await asyncio.gather(*(summarize_chunk(doc) for doc in docs))

# Run the map step in parallel
intermediate_summaries = asyncio.run(parallel_summarize(docs))

# Run the reduce step on the joined intermediate summaries
final_summary = reduce_chain.run("\n".join(intermediate_summaries))
```
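The same fan-out pattern, self-contained, with a dummy blocking function (`slow_summarize`, a stand-in for `map_chain.run`) so it runs without LangChain or the API:

```python
import asyncio
import time

def slow_summarize(chunk: str) -> str:
    time.sleep(0.1)  # stand-in for a blocking API call
    return chunk.upper()

async def run_parallel(chunks: list[str]) -> list[str]:
    # asyncio.to_thread offloads each blocking call to the default thread pool,
    # and asyncio.gather waits for all of them concurrently
    return await asyncio.gather(*(asyncio.to_thread(slow_summarize, c) for c in chunks))

results = asyncio.run(run_parallel(["a", "b", "c", "d"]))
print(results)  # → ['A', 'B', 'C', 'D']
```

The four 0.1s calls overlap, so the whole batch takes roughly 0.1s rather than 0.4s sequentially — the same speedup applies to real Gemini calls in the map step.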
✅ Final Output
You’ll get:
- Intermediate summaries (from map step)
- Final reduced summary (concise, high-level output)
🔁 Optional Improvements
- Use an async Gemini API directly if available.
- Tune chunk_size and the prompts for better summarization.
- Add retries and rate-limit handling for the Gemini API.
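For the rate-limit point, a generic retry-with-exponential-backoff helper is usually enough (a sketch — a real version would catch the Gemini SDK's specific rate-limit exception rather than bare Exception):

```python
import time

def with_retries(fn, *args, max_retries: int = 3, base_delay: float = 1.0, **kwargs):
    # Exponential backoff: wait base_delay, 2*base_delay, 4*base_delay ... between attempts
    for attempt in range(max_retries):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Demo: a function that fails once (like a rate-limited call), then succeeds
calls = []
def flaky(x):
    calls.append(x)
    if len(calls) < 2:
        raise RuntimeError("rate limited")
    return x * 2

print(with_retries(flaky, 21, base_delay=0.01))  # → 42
```

Wrapping the `_call` body (or `map_chain.run`) in this makes the parallel map step resilient to transient 429s.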
Would you like a complete working code sample for this setup (with real Gemini API calls)?