MemVector is a PHP extension for vector storage, embedding generation, similarity search and reranking. Everything runs inside your PHP process — no external vector database, no Python sidecar, no per-token API costs.
MemVector is aimed squarely at AI workloads. If you're building RAG pipelines, semantic search, recommendation engines, or any application that works with embeddings, it gives you the vector primitives you need without leaving PHP.
A typical vector search setup in PHP requires calling an embedding API, sending vectors to a database like Pinecone or Qdrant, then querying over the network — 200–600ms across three network hops. MemVector does it all in-process in under 10ms.
| Operation | MemVector | Cloud API |
|---|---|---|
| Embedding generation | 5–15 ms | 50–200 ms |
| Vector search | 0.1–5 ms | 10–50 ms |
| Full RAG pipeline | 10–30 ms | 200–600 ms |
| Cost per token | $0 | Pay-per-use |
Memory footprint: ~1 MB for the extension, ~33 MB with all-MiniLM model loaded, ~40 MB for 100K vectors at 384 dimensions with quantization.
You don't need embeddings or AI models to use MemVector. The most basic use case is storing vectors and computing distances between them. Compared to PHP arrays, the memory savings are significant: a PHP array of 1,536 floats takes ~73 KB per vector, while MemVector stores the same data in ~6 KB:
| Storage | Per vector (1,536 dim) | 10,000 vectors |
|---|---|---|
| PHP array | ~73 KB | ~715 MB |
| MemVector (float32) | ~6 KB | ~60 MB |
| MemVector F16 | ~3 KB | ~30 MB |
| MemVector Int8 | ~1.5 KB | ~15 MB |
| MemVector binary | ~192 B | ~1.9 MB |
Quantization trades a small amount of precision for much lower memory usage. F16 and Int8 work well for most use cases. Binary is useful when you have millions of vectors and memory is tight.
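To make that trade-off concrete, here is a plain-PHP sketch of symmetric int8 quantization. This is illustrative only, not MemVector's internal implementation, which may use a different scheme:

```php
<?php
// Illustrative symmetric int8 quantization: map floats in [-maxAbs, maxAbs]
// onto integers in [-127, 127] using a single scale factor per vector.
function quantizeInt8(array $vector): array
{
    $maxAbs = max(array_map('abs', $vector)) ?: 1.0;
    $scale = $maxAbs / 127.0;
    $quantized = array_map(
        fn (float $v): int => (int) round($v / $scale),
        $vector
    );
    return [$quantized, $scale];
}

// Recover an approximation of the original floats.
function dequantizeInt8(array $quantized, float $scale): array
{
    return array_map(fn (int $q): float => $q * $scale, $quantized);
}

[$q, $scale] = quantizeInt8([0.12, -0.83, 0.45]);
$restored = dequantizeInt8($q, $scale);
// Each restored value differs from the original by at most half a
// quantization step (scale / 2); that rounding error is the precision
// traded away for storing 1 byte per component instead of 4.
```

Each component shrinks from 4 bytes to 1, which is exactly the 4x reduction between the float32 and Int8 rows in the table above.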
Speed differs just as much. Computing cosine similarity against a single 1,536-dimension vector takes ~0.5–1 ms in a PHP loop. MemVector does the same comparison in microseconds and can search across thousands of vectors in that time.
Cosine similarity in plain PHP:
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    for ($i = 0, $n = count($a); $i < $n; $i++) {
        $dot += $a[$i] * $b[$i];
        $normA += $a[$i] * $a[$i];
        $normB += $b[$i] * $b[$i];
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

// Compare a query against all stored vectors
$bestScore = -1;
$bestKey = null;
foreach ($vectors as $key => $vec) {
    $score = cosineSimilarity($queryVector, $vec);
    if ($score > $bestScore) {
        $bestScore = $score;
        $bestKey = $key;
    }
}
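To check the plain-PHP timing claim on your own hardware, here is a quick microtime() benchmark over random vectors (the similarity function is repeated so the snippet runs standalone):

```php
<?php
// Same cosineSimilarity() as above, repeated so this snippet is self-contained.
function cosineSimilarity(array $a, array $b): float
{
    $dot = $normA = $normB = 0.0;
    for ($i = 0, $n = count($a); $i < $n; $i++) {
        $dot += $a[$i] * $b[$i];
        $normA += $a[$i] * $a[$i];
        $normB += $b[$i] * $b[$i];
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

// Build 1,000 random 1,536-dimension vectors, then run one full scan.
$vectors = [];
for ($i = 0; $i < 1000; $i++) {
    $vectors[] = array_map(
        fn ($j) => mt_rand() / mt_getrandmax() - 0.5,
        range(1, 1536)
    );
}
$query = $vectors[0];

$start = microtime(true);
foreach ($vectors as $vec) {
    cosineSimilarity($query, $vec);
}
$elapsed = (microtime(true) - $start) * 1000;

printf("1,000 comparisons: %.1f ms (%.3f ms per vector)\n", $elapsed, $elapsed / 1000);
```

The absolute numbers depend on your CPU and PHP version (JIT helps noticeably), but the per-vector cost in userland PHP is what MemVector's SIMD path collapses to microseconds.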
With MemVector:
$store = new MemVectorStore('/data/vectors', [
    'storage' => 'memory',
    'dimensions' => 1536,
    'distance' => 'cosine',
]);

// Store vectors (from any source — API, database, CSV, etc.)
$store->set('product_1', $vector1, json_encode(['name' => 'Widget A']));
$store->set('product_2', $vector2, json_encode(['name' => 'Widget B']));
$store->set('product_3', $vector3, json_encode(['name' => 'Widget C']));

// Find the 5 most similar vectors — uses SIMD and HNSW index internally
$results = $store->search($queryVector, 5);

// Each result has key, score, and metadata
foreach ($results as $result) {
    echo "{$result['key']}: {$result['score']}\n";
}
The vectors can come from anywhere — an API, a CSV, a database, or your own code. You can switch the distance metric depending on what you need:
// Dot product — useful for unnormalized vectors, recommendation scores
$store = new MemVectorStore(null, ['dimensions' => 384, 'distance' => 'dot']);
// Euclidean distance — useful for spatial data, clustering
$store = new MemVectorStore(null, ['dimensions' => 384, 'distance' => 'euclidean']);
// Manhattan distance — useful for grid-based distances, sparse features
$store = new MemVectorStore(null, ['dimensions' => 384, 'distance' => 'manhattan']);
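As a reference for what each option actually computes, here are plain-PHP equivalents of the three metrics (MemVector's SIMD implementations are its own; these just define the math):

```php
<?php
// Dot product: higher = more similar; vector magnitude matters,
// so it suits unnormalized vectors and recommendation scores.
function dotProduct(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += $v * $b[$i];
    }
    return $sum;
}

// Euclidean (L2) distance: lower = more similar; straight-line distance.
function euclidean(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += ($v - $b[$i]) ** 2;
    }
    return sqrt($sum);
}

// Manhattan (L1) distance: lower = more similar; sum of absolute differences.
function manhattan(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += abs($v - $b[$i]);
    }
    return $sum;
}

// For [1, 2] vs [4, 6]:
// dot = 1*4 + 2*6 = 16; euclidean = sqrt(9 + 16) = 5; manhattan = 3 + 4 = 7
```

Note the direction: dot-product scores go up with similarity, while euclidean and manhattan distances go down.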
MemVector works best with long-lived processes. With OpenSwoole, the shm storage mode shares the vector store across all workers:
<?php
use OpenSwoole\Http\Server;
use OpenSwoole\Http\Request;
use OpenSwoole\Http\Response;
$server = new Server('0.0.0.0', 9501);

$server->set([
    'worker_num' => 4,
]);

$server->on('workerStart', function (Server $server, int $workerId) {
    // Load embedding model once per worker — persists across all requests
    $server->embedding = new MemVectorEmbedding('/models/all-MiniLM-L6-v2.Q8_0.gguf');

    // Use shared memory so all workers access the same vector store
    $server->store = new MemVectorStore('/data/vectors', [
        'storage' => 'shm',
        'dimensions' => $server->embedding->dimensions(), // 384
        'distance' => 'cosine',
    ]);

    echo "Worker {$workerId}: model and store ready\n";
});

$server->on('request', function (Request $request, Response $response) use ($server) {
    $path = $request->server['request_uri'];

    if ($path === '/index' && $request->getMethod() === 'POST') {
        // Index a document
        $body = json_decode($request->getContent(), true);
        $key = $body['id'];
        $text = $body['text'];
        $metadata = json_encode($body['metadata'] ?? []);

        $vector = $server->embedding->embed($text);
        $server->store->set($key, $vector, $metadata);

        $response->header('Content-Type', 'application/json');
        $response->end(json_encode([
            'status' => 'indexed',
            'key' => $key,
            'dimensions' => count($vector),
        ]));
    } elseif ($path === '/search') {
        // Semantic search
        $query = $request->get['q'] ?? '';
        $topK = (int) ($request->get['top_k'] ?? 10);

        $queryVector = $server->embedding->embed($query);
        $results = $server->store->search($queryVector, $topK);

        $response->header('Content-Type', 'application/json');
        $response->end(json_encode([
            'query' => $query,
            'results' => $results,
        ]));
    } elseif ($path === '/stats') {
        $response->header('Content-Type', 'application/json');
        $response->end(json_encode($server->store->stats()));
    } else {
        $response->status(404);
        $response->end('Not Found');
    }
});

$server->start();
Try it out:
# Index documents
curl -X POST http://localhost:9501/index \
  -H 'Content-Type: application/json' \
  -d '{"id": "doc_1", "text": "OpenSwoole is an async PHP framework", "metadata": {"source": "docs"}}'

curl -X POST http://localhost:9501/index \
  -H 'Content-Type: application/json' \
  -d '{"id": "doc_2", "text": "PHP 8.5 introduces the pipe operator", "metadata": {"source": "blog"}}'

# Search
curl "http://localhost:9501/search?q=async+programming&top_k=5"
Broad vector search first, then rerank with a cross-encoder for better precision:
$server->on('workerStart', function (Server $server, int $workerId) {
    $server->embedding = new MemVectorEmbedding('/models/all-MiniLM-L6-v2.Q8_0.gguf');
    $server->reranker = new MemVectorReranker('/models/bge-reranker-v2-m3-Q8_0.gguf');
    $server->store = new MemVectorStore('/data/vectors', [
        'storage' => 'shm',
        'dimensions' => $server->embedding->dimensions(),
        'distance' => 'cosine',
    ]);
});

$server->on('request', function (Request $request, Response $response) use ($server) {
    if ($request->server['request_uri'] === '/rag') {
        $query = $request->get['q'] ?? '';

        // Broad vector search to get 50 candidates
        $queryVector = $server->embedding->embed($query);
        $candidates = $server->store->search($queryVector, 50);

        // Rerank down to top 5
        $reranked = $server->reranker->rerank($query, $candidates, 5);

        $response->header('Content-Type', 'application/json');
        $response->end(json_encode([
            'query' => $query,
            'results' => $reranked,
        ]));
    }
});
The whole pipeline completes in 10–30ms.
The same worker-persistent setup works over WebSockets, for example to push search results to connected clients:

<?php

use OpenSwoole\WebSocket\Server;
use OpenSwoole\WebSocket\Frame;

$server = new Server('0.0.0.0', 9502);

$server->on('workerStart', function ($server, $workerId) {
    $server->embedding = new MemVectorEmbedding('/models/all-MiniLM-L6-v2.Q8_0.gguf');
    $server->store = new MemVectorStore('/data/vectors', [
        'storage' => 'shm',
        'dimensions' => $server->embedding->dimensions(),
        'distance' => 'cosine',
    ]);
});

$server->on('message', function (Server $server, Frame $frame) {
    $data = json_decode($frame->data, true);

    if ($data['action'] === 'search') {
        $vector = $server->embedding->embed($data['query']);
        $results = $server->store->search($vector, $data['top_k'] ?? 5);

        $server->push($frame->fd, json_encode([
            'type' => 'results',
            'query' => $data['query'],
            'results' => $results,
        ]));
    }
});

$server->start();
MemVector also works with PHP-FPM. No persistent model loading or shared memory, but disk-backed mmap storage is still fast. Use an external API for embeddings and MemVector for storage and search.
<?php

// app/Providers/MemVectorServiceProvider.php

namespace App\Providers;

use Illuminate\Support\ServiceProvider;
use MemVectorStore;

class MemVectorServiceProvider extends ServiceProvider
{
    public function register(): void
    {
        $this->app->singleton(MemVectorStore::class, function ($app) {
            return new MemVectorStore(storage_path('app/vectors'), [
                'storage' => 'disk', // mmap-backed, persists across requests
                'dimensions' => 1536, // OpenAI text-embedding-3-small
                'distance' => 'cosine',
                'quantization' => 'f16', // Half precision to save memory
            ]);
        });
    }
}
Register it in bootstrap/app.php or config/app.php.
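In a config/app.php-style setup, that means adding the class to the providers array (the exact location varies by Laravel version; Laravel 11+ uses bootstrap/providers.php instead):

```php
// config/app.php (Laravel 10 and earlier style)
'providers' => [
    // ...
    App\Providers\MemVectorServiceProvider::class,
],
```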
<?php

// app/Services/EmbeddingService.php

namespace App\Services;

use Illuminate\Support\Facades\Http;

class EmbeddingService
{
    public function embed(string $text): array
    {
        $response = Http::withToken(config('services.openai.api_key'))
            ->post('https://api.openai.com/v1/embeddings', [
                'model' => 'text-embedding-3-small',
                'input' => $text,
            ]);

        return $response->json('data.0.embedding');
    }

    public function embedBatch(array $texts): array
    {
        $response = Http::withToken(config('services.openai.api_key'))
            ->post('https://api.openai.com/v1/embeddings', [
                'model' => 'text-embedding-3-small',
                'input' => $texts,
            ]);

        return array_map(
            fn($item) => $item['embedding'],
            $response->json('data')
        );
    }
}
<?php

// app/Console/Commands/IndexDocuments.php

namespace App\Console\Commands;

use App\Models\Article;
use App\Services\EmbeddingService;
use Illuminate\Console\Command;
use MemVectorStore;

class IndexDocuments extends Command
{
    protected $signature = 'vectors:index {--fresh : Rebuild the entire index}';
    protected $description = 'Index all articles into the vector store';

    public function handle(MemVectorStore $store, EmbeddingService $embeddings): int
    {
        $articles = Article::whereNull('embedded_at')
            ->orWhereColumn('updated_at', '>', 'embedded_at')
            ->cursor();

        $batch = [];
        $keys = [];

        foreach ($articles as $article) {
            $batch[] = $article->title . ' ' . $article->body;
            $keys[] = $article;

            if (count($batch) >= 20) {
                $this->indexBatch($store, $embeddings, $keys, $batch);
                $batch = [];
                $keys = [];
            }
        }

        if (!empty($batch)) {
            $this->indexBatch($store, $embeddings, $keys, $batch);
        }

        $this->info("Index complete. Total vectors: {$store->count()}");

        return self::SUCCESS;
    }

    private function indexBatch(
        MemVectorStore $store,
        EmbeddingService $embeddings,
        array $articles,
        array $texts,
    ): void {
        $vectors = $embeddings->embedBatch($texts);

        $items = [];
        foreach ($articles as $i => $article) {
            $items[] = [
                'key' => "article_{$article->id}",
                'vector' => $vectors[$i],
                'metadata' => json_encode([
                    'id' => $article->id,
                    'title' => $article->title,
                    'slug' => $article->slug,
                ]),
            ];
            $article->update(['embedded_at' => now()]);
        }

        $store->batchSet($items);
        $this->info("Indexed " . count($items) . " articles");
    }
}
<?php

// app/Http/Controllers/SearchController.php

namespace App\Http\Controllers;

use App\Services\EmbeddingService;
use Illuminate\Http\Request;
use MemVectorStore;

class SearchController extends Controller
{
    public function __invoke(
        Request $request,
        MemVectorStore $store,
        EmbeddingService $embeddings,
    ) {
        $request->validate(['q' => 'required|string|max:500']);

        $queryVector = $embeddings->embed($request->input('q'));
        $results = $store->search($queryVector, 10);

        // Hydrate results with full models
        $articleIds = array_map(function ($result) {
            $meta = json_decode($result['metadata'], true);
            return $meta['id'];
        }, $results);

        $articles = \App\Models\Article::whereIn('id', $articleIds)->get()
            ->keyBy('id');

        $ranked = array_map(function ($result) use ($articles) {
            $meta = json_decode($result['metadata'], true);
            return [
                'article' => $articles[$meta['id']] ?? null,
                'score' => $result['score'],
            ];
        }, $results);

        return view('search.results', [
            'query' => $request->input('q'),
            'results' => $ranked,
        ]);
    }
}

// routes/web.php
Route::get('/search', SearchController::class)->name('search');
Vector search itself is still 0.1–5 ms under PHP-FPM; the external embedding API call adds another 50–200 ms on top.
| | OpenSwoole | PHP-FPM + Laravel |
|---|---|---|
| Model loading | Once per worker (persistent) | Per-request or external API |
| Vector store | Shared memory across workers | Disk-backed mmap |
| Embeddings | Local GGUF models, 5–15 ms | External API, 50–200 ms |
| Search | 0.1–5 ms | 0.1–5 ms |
| Total latency | 10–30 ms | 100–250 ms |
| Per-token cost | None | API pricing |
Vector search speed is the same either way. OpenSwoole saves on embedding latency and API costs.
The easiest way to install is via PIE:
pie install memvector/ext-memvector
Or build from source:
# Basic installation
phpize && ./configure --enable-memvector && make && make install
# With local embedding support (requires llama.cpp)
phpize && ./configure --enable-memvector --with-llama=/usr/local && make && make install
Add to your php.ini:
extension=memvector
Verify it's loaded:
php -m | grep memvector
If you want local embeddings, download a model. all-MiniLM-L6-v2 is a good starting point at 24 MB and 384 dimensions:
curl -L -o /models/all-MiniLM-L6-v2.Q8_0.gguf \
https://huggingface.co/leliuga/all-MiniLM-L6-v2-GGUF/resolve/main/all-MiniLM-L6-v2.Q8_0.gguf
The project is on GitHub: github.com/memvector/ext-memvector