Overview
Vapi assistants are stateless across calls: when the same caller phones back next week, the agent starts from scratch and asks for their name and number again. Mengram’s Vapi adapter solves this with two HTTP webhooks the assistant invokes at call start and call end.
- At call start: the assistant calls a custom tool (recall_caller) with the caller’s phone number. Mengram returns a concise summary of everything it knows about that caller.
- At call end: Vapi posts the final transcript to a server URL. Mengram extracts entities, facts, and episodes, keyed to the caller’s phone number.
Each caller’s memory is stored under sub_user_id=voice:<E.164>. One Mengram account can power thousands of caller memories across multiple white-label clients.
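To make the voice:<E.164> key concrete, here is a minimal sketch of how a phone number could be normalized into that form. The function name caller_sub_user_id and the default country code are assumptions for illustration, not part of Mengram’s API, and the exact normalization Mengram applies may differ.

```python
import re


def caller_sub_user_id(phone: str, default_country_code: str = "1") -> str:
    """Build a voice:<E.164> key from a loosely formatted phone number.

    Hypothetical helper: Mengram's own normalization is not documented here.
    """
    has_plus = phone.strip().startswith("+")
    digits = re.sub(r"\D", "", phone)  # drop spaces, dashes, parentheses
    if not digits:
        raise ValueError("phone number contains no digits")
    if not has_plus:
        # Assumption: a number without a leading "+" is a national number.
        digits = default_country_code + digits
    if len(digits) > 15:  # E.164 allows at most 15 digits
        raise ValueError("too many digits for E.164")
    return f"voice:+{digits}"


print(caller_sub_user_id("+1 (415) 555-0123"))
```

Normalizing before building the key matters because the same caller must map to the same sub_user_id on every call, whatever formatting the telephony provider reports.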
Mengram is licensed under Apache 2.0. The free tier covers about 40 inbound calls/month; paid tiers start at $5/mo. The adapter uses the same retrieval stack as the rest of Mengram (hybrid vector + BM25 + RRF, Ebbinghaus decay, reflection cron).
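For readers unfamiliar with RRF (reciprocal rank fusion), the sketch below shows the general technique for merging a vector ranking and a BM25 ranking. It is a generic illustration, not Mengram’s implementation; the constant k=60 is the value commonly used in the literature, and the fact IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["fact_a", "fact_b", "fact_c"]  # hypothetical vector-search order
bm25_hits = ["fact_b", "fact_d", "fact_a"]    # hypothetical BM25 order
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Items that rank well in both lists rise to the top, which is why hybrid retrieval tends to be more robust than either ranking alone.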
Quick Setup
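You’ll need a Mengram API key. Get one at mengram.io → sign up → Dashboard → Keys.
1. Add the recall tool to your Vapi assistant
In the Vapi dashboard, open your assistant → Tools → add a Custom Tool named recall_caller that points at the recall endpoint (POST /v1/voice/vapi/recall).
2. Tell the assistant to call it first
In your assistant’s system prompt, add one paragraph telling it to call recall_caller with the caller’s phone number before it greets the caller.
3. Wire the end-of-call save webhook
Still in the assistant config, set the Server URL to the save endpoint (POST /v1/voice/vapi/save) and add an Authorization: Bearer YOUR_MENGRAM_KEY header in the Server URL config.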
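As a rough illustration of step 1, the sketch below builds a tool definition in the general shape of a Vapi function tool. This is not the exact JSON from the Mengram docs: the base URL is a placeholder, and the phone_number parameter name and the description text are assumptions. Check the Vapi dashboard and the Mengram API Reference for the authoritative fields.

```python
import json

MENGRAM_BASE_URL = "https://YOUR_MENGRAM_HOST"  # placeholder, not a real host


def build_recall_tool(base_url: str) -> dict:
    """Return a function-tool definition that points at the recall endpoint."""
    return {
        "type": "function",
        "function": {
            "name": "recall_caller",
            # Description and parameter name are illustrative assumptions.
            "description": "Look up what is already known about the caller.",
            "parameters": {
                "type": "object",
                "properties": {
                    "phone_number": {
                        "type": "string",
                        "description": "Caller's number in E.164 format",
                    },
                },
                "required": ["phone_number"],
            },
        },
        "server": {"url": f"{base_url}/v1/voice/vapi/recall"},
    }


print(json.dumps(build_recall_tool(MENGRAM_BASE_URL), indent=2))
```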
4. Test it
Make a real test call to the assistant. After it hangs up, wait ~30–60s for extraction, then call back from the same number. The assistant should greet you with what it learned from the first call.
How It Works
The recall endpoint looks up memories by sub_user_id directly (not semantic search): for a known caller you want everything we know, not “most relevant to a query.” Persons are sorted by fact count descending so the caller’s own facts surface ahead of mentioned people (their daughter, their doctor).
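The ordering rule can be sketched in a few lines. The record shape below (a name plus a list of facts) and the sample data are hypothetical; only the "sort by fact count, descending" rule comes from the description above.

```python
def order_persons(persons):
    """Sort person entities so the one with the most facts comes first."""
    return sorted(persons, key=lambda p: len(p["facts"]), reverse=True)


# Hypothetical entities extracted from one caller's earlier calls.
persons = [
    {"name": "Dr. Patel", "facts": ["caller's dentist"]},
    {"name": "Sarah", "facts": ["prefers morning slots",
                                "anxious about novocaine",
                                "has a daughter"]},
    {"name": "Emma", "facts": ["caller's daughter", "age 9"]},
]
ordered = [p["name"] for p in order_persons(persons)]
print(ordered)
```

The caller usually accumulates the most facts, so this heuristic puts them first without needing an explicit "this entity is the caller" flag.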
The save endpoint routes the transcript through Mengram’s standard extraction pipeline — same code path as /v1/add. Entities, facts, episodes, and procedures all get extracted normally. The daily reflection cron then synthesizes patterns (“this caller prefers morning slots”, “anxiety about novocaine”) so the next recall returns insight, not raw transcripts.
Endpoints
POST /v1/voice/vapi/recall
Called by Vapi as a custom tool. Returns the caller context string. Either of two request body shapes is accepted, because Vapi sends both. In the response, result MUST be a string.
Each recall counts as 1 search against your quota.
If message.type is anything other than tool-calls, the endpoint returns 200 {"status":"ignored"} so Vapi doesn’t mark the assistant as broken when lifecycle events arrive at the same URL.
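The dispatch behaviour described above can be sketched as a pure function. This is an illustration, not Mengram’s server code: the toolCallList field, the phone_number argument name, and the lookup callback are assumptions, and the response follows the results/toolCallId/result shape Vapi expects from tool servers.

```python
import json


def handle_recall(body: dict, lookup) -> dict:
    """Dispatch a Vapi webhook body; `lookup` maps a phone number to a summary."""
    message = body.get("message", {})
    if message.get("type") != "tool-calls":
        # Lifecycle events share the URL; acknowledge them without doing work.
        return {"status": "ignored"}
    results = []
    for call in message.get("toolCallList", []):  # assumed field name
        args = call.get("function", {}).get("arguments", {})
        if isinstance(args, str):
            args = json.loads(args)  # arguments may arrive JSON-encoded
        summary = lookup(args.get("phone_number", ""))  # assumed argument name
        # The result must be a string, so coerce whatever lookup returned.
        results.append({"toolCallId": call.get("id"), "result": str(summary)})
    return {"results": results}
```

Returning a 200 with a harmless body for unrelated event types keeps one URL usable for both tool calls and lifecycle events.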
POST /v1/voice/vapi/save
Called by Vapi at end of call. Routes the transcript through Mengram’s extraction pipeline. The request body is Vapi’s end-of-call-report message. Each save counts as 1 add against your quota.
The transcript can also be at message.artifact.transcript — Mengram reads whichever is present. Only end-of-call-report triggers extraction; partial transcript events are ignored.
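A minimal sketch of the selection logic, assuming the primary transcript location is message.transcript (the text above only names the alternate location, so the primary path is an assumption). The function name is hypothetical.

```python
def transcript_to_save(body: dict):
    """Return the transcript to extract from, or None if the event is ignored."""
    message = body.get("message", {})
    if message.get("type") != "end-of-call-report":
        return None  # partial transcript events and other types are skipped
    artifact = message.get("artifact") or {}
    # Assumption: message.transcript is the primary location; fall back to
    # message.artifact.transcript when it is absent or empty.
    return message.get("transcript") or artifact.get("transcript")
```

Gating on the event type first means a stream of partial transcripts never triggers extraction or consumes add quota.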
Pricing
Each inbound call = 1 recall + 1 save = 1 search quota + 1 add quota. Free tier (40 adds, 200 searches/mo) covers about 40 inbound calls per month, enough to validate. Paid tiers from $5/mo. See full pricing.

| Plan | Inbound calls/mo (estimate) |
|---|---|
| Free | ~40 |
| Starter ($5) | ~100 |
| Pro ($19) | ~1,000 |
| Growth ($59) | ~3,000 |
| Business ($99) | ~8,000 |
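The free-tier estimate follows from simple arithmetic: each call consumes one add and one search, so the smaller quota sets the limit. A short worked example using the free-tier numbers quoted above (the function name is illustrative):

```python
def calls_covered(adds: int, searches: int) -> int:
    """Each call uses one add and one search, so the scarcer quota is the limit."""
    return min(adds, searches)


free_tier_calls = calls_covered(adds=40, searches=200)
searches_left = 200 - free_tier_calls  # searches unused once adds run out
print(free_tier_calls, searches_left)
```

On the free tier the add quota is the binding constraint, which leaves spare searches for extra recalls or dashboard queries.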
Performance
Measured against production with a 1,186-word transcript indexed for the caller:
- Single recall: 500–900ms
- Under 10–20 concurrent recalls: p50 ≈ 1200ms, p95 ≈ 1300ms
The assistant calls recall_caller during the natural greeting pause, so callers don’t perceive the latency. If your use case has hard sub-1s SLAs, benchmark in your own setup first.
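If you do benchmark in your own setup, a small harness like the following is one way to collect p50/p95 under concurrency. It is a sketch: fn stands for whatever function issues your recall request (not shown here), and the nearest-rank percentile is one of several common definitions.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def percentile(samples, pct):
    """Nearest-rank percentile of a list of numbers (pct in 0..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def timed(fn):
    """Run fn once and return its duration in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000


def time_concurrent(fn, concurrency=10, total=50):
    """Run fn `total` times across `concurrency` threads; return durations."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda _: timed(fn), range(total)))
```

For example, durations = time_concurrent(my_recall_call, concurrency=20) followed by percentile(durations, 95) gives a p95 comparable to the figures above, where my_recall_call is your own request function.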
FAQ
How is this different from mem0 + Vapi tutorials online?
Mem0’s Vapi tutorials require gluing mem0 + n8n + custom code together. Mengram’s adapter is Vapi-native — paste the JSON, you’re done. Same hybrid retrieval (vector + BM25 + RRF) underneath, just less wiring.
Will this work with Retell, Pipecat, or LiveKit?
The recall/save endpoints are HTTP webhooks — anything that can POST JSON works. Native Pipecat processor and LiveKit agent helper aren’t built yet; email ali@mengram.io if you need one and I’ll prioritize based on demand.
What about HIPAA?
Self-host gives full data residency (Apache 2.0, your Postgres, your OpenAI key). Hosted-cloud BAA isn’t yet available — for now, healthcare voice agents should self-host. See Self-hosting.
Can I test with Vapi's web 'Talk to Assistant' button?
What if a caller calls back from a different number?
They’ll be treated as a new caller. If you need cross-number identity, ask for confirmation in the assistant prompt (“Are you the same Sarah who called last week?”) and use the existing user_id field to merge, or run /v1/merge_user once you’ve confirmed.
Next Steps
- Try the landing demo for a one-page setup walkthrough
- See Memory Types for what gets extracted from transcripts
- Check the API Reference for /v1/voice/vapi/recall and /v1/voice/vapi/save endpoint specs
- Self-host if you want full data residency