MemVector: Vector Search, Embeddings and RAG in a PHP Extension

MemVector is a PHP extension for vector storage, embedding generation, similarity search and reranking. Everything runs inside your PHP process — no external vector database, no Python sidecar, no per-token API costs.

MemVector is built for AI workloads. If you're building RAG pipelines, semantic search, recommendation engines or any application that works with embeddings, it gives you the vector primitives you need without leaving PHP.

What MemVector does

A typical vector search setup in PHP requires calling an embedding API, sending vectors to a database like Pinecone or Qdrant, then querying over the network — 200–600 ms across three network hops. MemVector does it all in-process in under 10 ms.

What it supports

  • Key-value vector storage with memory, disk (mmap) and shared memory backends
  • Local embedding generation from GGUF models via llama.cpp
  • HNSW index with cosine, dot product, euclidean and manhattan distance
  • Cross-encoder reranking for two-stage retrieval
  • Quantization: F16, Int8, binary and product quantization
  • Up to 4,096 dimensions with metadata per vector (up to 4 KB)

Performance

Operation             MemVector     Cloud API
Embedding generation  5–15 ms       50–200 ms
Vector search         0.1–5 ms      10–50 ms
Full RAG pipeline     10–30 ms      200–600 ms
Cost per token        $0            Pay-per-use

Memory footprint: ~1 MB for the extension, ~33 MB with all-MiniLM model loaded, ~40 MB for 100K vectors at 384 dimensions with quantization.
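The 100K-vector figure is easy to sanity-check with back-of-the-envelope arithmetic; the remaining gap up to ~40 MB is index and metadata overhead:

```php
// Rough footprint for 100K vectors at 384 dimensions.
$vectors = 100_000;
$dims    = 384;

$float32 = $vectors * $dims * 4; // 4 bytes per component, unquantized
$int8    = $vectors * $dims * 1; // 1 byte per component with Int8 quantization

printf("float32: ~%.0f MB\n", $float32 / 1_048_576); // ~146 MB
printf("int8:    ~%.0f MB\n", $int8 / 1_048_576);    // ~37 MB
```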

Vector calculation without PHP arrays

You don't need embeddings or AI models to use MemVector. The most basic use case is storing vectors and computing distances between them. Compared to PHP arrays, the memory difference is significant — a PHP array of 1,536 floats takes ~73 KB per vector, while MemVector stores the same data in ~6 KB:

Storage              Per vector (1,536 dim)  10,000 vectors
PHP array            ~73 KB                  ~715 MB
MemVector (float32)  ~6 KB                   ~60 MB
MemVector F16        ~3 KB                   ~30 MB
MemVector Int8       ~1.5 KB                 ~15 MB
MemVector binary     ~192 B                  ~1.9 MB

Quantization trades a small amount of precision for much lower memory usage. F16 and Int8 work well for most use cases. Binary is useful when you have millions of vectors and memory is tight.
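To make the trade-off concrete, here is a sketch in plain PHP of what symmetric Int8 quantization does conceptually. MemVector performs this internally in C; the function names here are illustrative, not part of its API:

```php
// Quantize: map each float to an 8-bit integer using a per-vector scale.
function quantizeInt8(array $vector): array
{
    // Largest absolute value maps to 127; `?: 1.0` guards all-zero vectors.
    $scale = max(array_map('abs', $vector)) / 127.0 ?: 1.0;

    $quantized = array_map(
        fn (float $v): int => (int) round($v / $scale),
        $vector
    );

    return [$quantized, $scale]; // 1 byte per component instead of 4
}

// Dequantize: recover an approximation of the original values.
function dequantizeInt8(array $quantized, float $scale): array
{
    return array_map(fn (int $q): float => $q * $scale, $quantized);
}

[$q, $scale] = quantizeInt8([0.12, -0.87, 0.43]);
$approx = dequantizeInt8($q, $scale);
// Each component lands within ~scale/2 of the original value,
// which is usually negligible for similarity ranking.
```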

The speed gap is just as large. Cosine similarity on a 1,536-dimension vector takes ~0.5–1 ms in a PHP loop; MemVector computes it in microseconds and can search across thousands of vectors in that time.

PHP array approach vs MemVector

Cosine similarity in plain PHP:

function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    for ($i = 0, $n = count($a); $i < $n; $i++) {
        $dot += $a[$i] * $b[$i];
        $normA += $a[$i] * $a[$i];
        $normB += $b[$i] * $b[$i];
    }

    // Guard against zero vectors to avoid division by zero
    $denom = sqrt($normA) * sqrt($normB);

    return $denom > 0.0 ? $dot / $denom : 0.0;
}

// Compare a query against all stored vectors
$bestScore = -1;
$bestKey = null;

foreach ($vectors as $key => $vec) {
    $score = cosineSimilarity($queryVector, $vec);
    if ($score > $bestScore) {
        $bestScore = $score;
        $bestKey = $key;
    }
}

With MemVector:

$store = new MemVectorStore('/data/vectors', [
    'storage'    => 'memory',
    'dimensions' => 1536,
    'distance'   => 'cosine',
]);

// Store vectors (from any source — API, database, CSV, etc.)
$store->set('product_1', $vector1, json_encode(['name' => 'Widget A']));
$store->set('product_2', $vector2, json_encode(['name' => 'Widget B']));
$store->set('product_3', $vector3, json_encode(['name' => 'Widget C']));

// Find the 5 most similar vectors — uses SIMD and HNSW index internally
$results = $store->search($queryVector, 5);

// Each result has key, score, and metadata
foreach ($results as $result) {
    echo "{$result['key']}: {$result['score']}\n";
}

The vectors can come from anywhere — an API, a CSV, a database, or your own code. You can switch the distance metric depending on what you need:

// Dot product — useful for unnormalized vectors, recommendation scores
$store = new MemVectorStore(null, ['dimensions' => 384, 'distance' => 'dot']);

// Euclidean distance — useful for spatial data, clustering
$store = new MemVectorStore(null, ['dimensions' => 384, 'distance' => 'euclidean']);

// Manhattan distance — useful for grid-based distances, sparse features
$store = new MemVectorStore(null, ['dimensions' => 384, 'distance' => 'manhattan']);
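For reference, this is what the three metrics compute, sketched in plain PHP (MemVector evaluates them with SIMD in C):

```php
// Dot product: higher means more similar for vectors of comparable magnitude.
function dotProduct(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += $v * $b[$i];
    }
    return $sum;
}

// Euclidean (L2) distance: straight-line distance, lower means more similar.
function euclidean(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += ($v - $b[$i]) ** 2;
    }
    return sqrt($sum);
}

// Manhattan (L1) distance: sum of absolute component differences.
function manhattan(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += abs($v - $b[$i]);
    }
    return $sum;
}
```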

MemVector with OpenSwoole

MemVector works best with long-lived processes. With OpenSwoole:

  • Models load once per worker and stay in memory across requests
  • shm storage mode shares the vector store across all workers
  • Concurrent reads/writes work well with OpenSwoole coroutines
  • mmap-based disk storage survives server restarts

Semantic search API example

<?php

use OpenSwoole\Http\Server;
use OpenSwoole\Http\Request;
use OpenSwoole\Http\Response;

$server = new Server('0.0.0.0', 9501);

$server->set([
    'worker_num' => 4,
]);

$server->on('workerStart', function (Server $server, int $workerId) {
    // Load embedding model once per worker — persists across all requests
    $server->embedding = new MemVectorEmbedding('/models/all-MiniLM-L6-v2.Q8_0.gguf');

    // Use shared memory so all workers access the same vector store
    $server->store = new MemVectorStore('/data/vectors', [
        'storage'    => 'shm',
        'dimensions' => $server->embedding->dimensions(), // 384
        'distance'   => 'cosine',
    ]);

    echo "Worker {$workerId}: model and store ready\n";
});

$server->on('request', function (Request $request, Response $response) use ($server) {
    $path = $request->server['request_uri'];

    if ($path === '/index' && $request->getMethod() === 'POST') {
        // Index a document
        $body = json_decode($request->getContent(), true);
        $key = $body['id'];
        $text = $body['text'];
        $metadata = json_encode($body['metadata'] ?? []);

        $vector = $server->embedding->embed($text);
        $server->store->set($key, $vector, $metadata);

        $response->header('Content-Type', 'application/json');
        $response->end(json_encode([
            'status' => 'indexed',
            'key'    => $key,
            'dimensions' => count($vector),
        ]));
    } elseif ($path === '/search') {
        // Semantic search
        $query = $request->get['q'] ?? '';
        $topK = (int) ($request->get['top_k'] ?? 10);

        $queryVector = $server->embedding->embed($query);
        $results = $server->store->search($queryVector, $topK);

        $response->header('Content-Type', 'application/json');
        $response->end(json_encode([
            'query'   => $query,
            'results' => $results,
        ]));
    } elseif ($path === '/stats') {
        $response->header('Content-Type', 'application/json');
        $response->end(json_encode($server->store->stats()));
    } else {
        $response->status(404);
        $response->end('Not Found');
    }
});

$server->start();

Try it out:

# Index documents
curl -X POST http://localhost:9501/index \
  -H 'Content-Type: application/json' \
  -d '{"id": "doc_1", "text": "OpenSwoole is an async PHP framework", "metadata": {"source": "docs"}}'

curl -X POST http://localhost:9501/index \
  -H 'Content-Type: application/json' \
  -d '{"id": "doc_2", "text": "PHP 8.5 introduces the pipe operator", "metadata": {"source": "blog"}}'

# Search
curl "http://localhost:9501/search?q=async+programming&top_k=5"

Two-stage search with reranking

Broad vector search first, then rerank with a cross-encoder for better precision:

$server->on('workerStart', function (Server $server, int $workerId) {
    $server->embedding = new MemVectorEmbedding('/models/all-MiniLM-L6-v2.Q8_0.gguf');
    $server->reranker = new MemVectorReranker('/models/bge-reranker-v2-m3-Q8_0.gguf');
    $server->store = new MemVectorStore('/data/vectors', [
        'storage'    => 'shm',
        'dimensions' => $server->embedding->dimensions(),
        'distance'   => 'cosine',
    ]);
});

$server->on('request', function (Request $request, Response $response) use ($server) {
    if ($request->server['request_uri'] === '/rag') {
        $query = $request->get['q'] ?? '';

        // Broad vector search to get 50 candidates
        $queryVector = $server->embedding->embed($query);
        $candidates = $server->store->search($queryVector, 50);

        // Rerank down to top 5
        $reranked = $server->reranker->rerank($query, $candidates, 5);

        $response->header('Content-Type', 'application/json');
        $response->end(json_encode([
            'query'   => $query,
            'results' => $reranked,
        ]));
    }
});

The whole pipeline completes in 10–30 ms.

WebSocket example

<?php

use OpenSwoole\WebSocket\Server;
use OpenSwoole\WebSocket\Frame;

$server = new Server('0.0.0.0', 9502);

$server->on('workerStart', function (Server $server, int $workerId) {
    $server->embedding = new MemVectorEmbedding('/models/all-MiniLM-L6-v2.Q8_0.gguf');
    $server->store = new MemVectorStore('/data/vectors', [
        'storage'    => 'shm',
        'dimensions' => $server->embedding->dimensions(),
        'distance'   => 'cosine',
    ]);
});

$server->on('message', function (Server $server, Frame $frame) {
    $data = json_decode($frame->data, true);

    if ($data['action'] === 'search') {
        $vector = $server->embedding->embed($data['query']);
        $results = $server->store->search($vector, $data['top_k'] ?? 5);

        $server->push($frame->fd, json_encode([
            'type'    => 'results',
            'query'   => $data['query'],
            'results' => $results,
        ]));
    }
});

$server->start();

MemVector with PHP-FPM and Laravel

MemVector also works with PHP-FPM. You lose persistent model loading and shared memory, but disk-backed mmap storage remains fast across requests. Use an external API for embeddings and MemVector for storage and search.

Service provider

<?php
// app/Providers/MemVectorServiceProvider.php

namespace App\Providers;

use Illuminate\Support\ServiceProvider;
use MemVectorStore;

class MemVectorServiceProvider extends ServiceProvider
{
    public function register(): void
    {
        $this->app->singleton(MemVectorStore::class, function ($app) {
            return new MemVectorStore(storage_path('app/vectors'), [
                'storage'      => 'disk',          // mmap-backed, persists across requests
                'dimensions'   => 1536,            // OpenAI text-embedding-3-small
                'distance'     => 'cosine',
                'quantization' => 'f16',           // Half precision to save memory
            ]);
        });
    }
}

Register it in bootstrap/providers.php (Laravel 11+) or the providers array in config/app.php (earlier versions).
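On a Laravel 11+ skeleton, for example, registration means adding the class to the array returned from bootstrap/providers.php:

```php
<?php
// bootstrap/providers.php

return [
    App\Providers\AppServiceProvider::class,
    App\Providers\MemVectorServiceProvider::class,
];
```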

Embedding service

<?php
// app/Services/EmbeddingService.php

namespace App\Services;

use Illuminate\Support\Facades\Http;

class EmbeddingService
{
    public function embed(string $text): array
    {
        $response = Http::withToken(config('services.openai.api_key'))
            ->post('https://api.openai.com/v1/embeddings', [
                'model' => 'text-embedding-3-small',
                'input' => $text,
            ]);

        return $response->json('data.0.embedding');
    }

    public function embedBatch(array $texts): array
    {
        $response = Http::withToken(config('services.openai.api_key'))
            ->post('https://api.openai.com/v1/embeddings', [
                'model' => 'text-embedding-3-small',
                'input' => $texts,
            ]);

        return array_map(
            fn($item) => $item['embedding'],
            $response->json('data')
        );
    }
}

Indexing documents with an Artisan command

<?php
// app/Console/Commands/IndexDocuments.php

namespace App\Console\Commands;

use App\Models\Article;
use App\Services\EmbeddingService;
use Illuminate\Console\Command;
use MemVectorStore;

class IndexDocuments extends Command
{
    protected $signature = 'vectors:index {--fresh : Rebuild the entire index}';
    protected $description = 'Index all articles into the vector store';

    public function handle(MemVectorStore $store, EmbeddingService $embeddings): int
    {
        $articles = Article::whereNull('embedded_at')
            ->orWhereColumn('updated_at', '>', 'embedded_at')
            ->cursor();

        $batch = [];
        $pending = [];

        foreach ($articles as $article) {
            $batch[] = $article->title . ' ' . $article->body;
            $pending[] = $article;

            if (count($batch) >= 20) {
                $this->indexBatch($store, $embeddings, $pending, $batch);
                $batch = [];
                $pending = [];
            }
        }

        if (!empty($batch)) {
            $this->indexBatch($store, $embeddings, $pending, $batch);
        }

        $this->info("Index complete. Total vectors: {$store->count()}");
        return self::SUCCESS;
    }

    private function indexBatch(
        MemVectorStore $store,
        EmbeddingService $embeddings,
        array $articles,
        array $texts,
    ): void {
        $vectors = $embeddings->embedBatch($texts);

        $items = [];
        foreach ($articles as $i => $article) {
            $items[] = [
                'key'      => "article_{$article->id}",
                'vector'   => $vectors[$i],
                'metadata' => json_encode([
                    'id'    => $article->id,
                    'title' => $article->title,
                    'slug'  => $article->slug,
                ]),
            ];
            $article->update(['embedded_at' => now()]);
        }

        $store->batchSet($items);
        $this->info("Indexed " . count($items) . " articles");
    }
}

Search controller

<?php
// app/Http/Controllers/SearchController.php

namespace App\Http\Controllers;

use App\Services\EmbeddingService;
use Illuminate\Http\Request;
use MemVectorStore;

class SearchController extends Controller
{
    public function __invoke(
        Request $request,
        MemVectorStore $store,
        EmbeddingService $embeddings,
    ) {
        $request->validate(['q' => 'required|string|max:500']);

        $queryVector = $embeddings->embed($request->input('q'));
        $results = $store->search($queryVector, 10);

        // Hydrate results with full models
        $articleIds = array_map(function ($result) {
            $meta = json_decode($result['metadata'], true);
            return $meta['id'];
        }, $results);

        $articles = \App\Models\Article::whereIn('id', $articleIds)->get()
            ->keyBy('id');

        $ranked = array_map(function ($result) use ($articles) {
            $meta = json_decode($result['metadata'], true);
            return [
                'article' => $articles[$meta['id']] ?? null,
                'score'   => $result['score'],
            ];
        }, $results);

        return view('search.results', [
            'query'   => $request->input('q'),
            'results' => $ranked,
        ]);
    }
}

Route

// routes/web.php
Route::get('/search', SearchController::class)->name('search');

Vector search runs in 0.1–5 ms under PHP-FPM. The embedding API call adds ~100 ms.

OpenSwoole vs PHP-FPM comparison

                OpenSwoole                    PHP-FPM + Laravel
Model loading   Once per worker (persistent)  Per-request or external API
Vector store    Shared memory across workers  Disk-backed mmap
Embeddings      Local GGUF models, 5–15 ms    External API, 50–200 ms
Search          0.1–5 ms                      0.1–5 ms
Total latency   10–30 ms                      100–250 ms
Per-token cost  None                          API pricing

Vector search speed is the same either way. OpenSwoole saves on embedding latency and API costs.

Installation

The easiest way to install is via PIE:

pie install memvector/ext-memvector

Or build from source:

# Basic installation
phpize && ./configure --enable-memvector && make && make install

# With local embedding support (requires llama.cpp)
phpize && ./configure --enable-memvector --with-llama=/usr/local && make && make install

Add to your php.ini:

extension=memvector

Verify it's loaded:

php -m | grep memvector

If you want local embeddings, download a model. all-MiniLM-L6-v2 is a good starting point at 24 MB and 384 dimensions:

curl -L -o /models/all-MiniLM-L6-v2.Q8_0.gguf \
  https://huggingface.co/leliuga/all-MiniLM-L6-v2-GGUF/resolve/main/all-MiniLM-L6-v2.Q8_0.gguf

The project is on GitHub: github.com/memvector/ext-memvector