MemVector: Vector Search, Embeddings and RAG in a PHP Extension

MemVector is a PHP extension for vector storage, embedding generation, similarity search and reranking. Everything runs inside your PHP process — no external vector database, no Python sidecar, no per-token API costs.

MemVector is built for AI workloads. If you're building RAG pipelines, semantic search, recommendation engines or any application that works with embeddings, it gives you the vector primitives you need without leaving PHP.

What MemVector does

A typical vector search setup in PHP requires calling an embedding API, sending vectors to a database like Pinecone or Qdrant, then querying over the network — 200–600 ms across three network hops. MemVector does it all in-process in under 10 ms.

What it supports

  • Key-value vector storage with memory, disk (mmap) and shared memory backends
  • Local embedding generation from GGUF models via llama.cpp
  • HNSW index with cosine, dot product, euclidean and manhattan distance
  • Cross-encoder reranking for two-stage retrieval
  • Quantization: F16, Int8, binary and product quantization
  • Up to 4,096 dimensions with metadata per vector (up to 4 KB)

Performance

Operation             MemVector     Cloud API
Embedding generation  5–15 ms       50–200 ms
Vector search         0.1–5 ms      10–50 ms
Full RAG pipeline     10–30 ms      200–600 ms
Cost per token        $0            Pay-per-use

Memory footprint: ~1 MB for the extension, ~33 MB with all-MiniLM model loaded, ~40 MB for 100K vectors at 384 dimensions with quantization.
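The 100K-vector figure is easy to sanity-check with back-of-the-envelope arithmetic; the remaining gap up to ~40 MB is index and metadata overhead:

```php
// Rough footprint for 100K vectors at 384 dimensions.
$vectors = 100_000;
$dims    = 384;

$float32 = $vectors * $dims * 4; // 4 bytes per component, unquantized
$int8    = $vectors * $dims * 1; // 1 byte per component with Int8 quantization

printf("float32: ~%.0f MB\n", $float32 / 1_048_576); // ~146 MB
printf("int8:    ~%.0f MB\n", $int8 / 1_048_576);    // ~37 MB
```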

Vector calculation without PHP arrays

You don't need embeddings or AI models to use MemVector. The most basic use case is storing vectors and computing distances between them. Compared to PHP arrays, the memory difference is significant — a PHP array of 1,536 floats takes ~73 KB per vector, while MemVector stores the same data in ~6 KB:

Storage              Per vector (1,536 dim)  10,000 vectors
PHP array            ~73 KB                  ~715 MB
MemVector (float32)  ~6 KB                   ~60 MB
MemVector F16        ~3 KB                   ~30 MB
MemVector Int8       ~1.5 KB                 ~15 MB
MemVector binary     ~192 B                  ~1.9 MB

Quantization trades a small amount of precision for much lower memory usage. F16 and Int8 work well for most use cases. Binary is useful when you have millions of vectors and memory is tight.
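To make the trade-off concrete, here is a sketch in plain PHP of what symmetric Int8 quantization does conceptually. MemVector performs this internally in C; the function names here are illustrative, not part of its API:

```php
// Quantize: map each float to an 8-bit integer using a per-vector scale.
function quantizeInt8(array $vector): array
{
    // Largest absolute value maps to 127; `?: 1.0` guards all-zero vectors.
    $scale = max(array_map('abs', $vector)) / 127.0 ?: 1.0;

    $quantized = array_map(
        fn (float $v): int => (int) round($v / $scale),
        $vector
    );

    return [$quantized, $scale]; // 1 byte per component instead of 4
}

// Dequantize: recover an approximation of the original values.
function dequantizeInt8(array $quantized, float $scale): array
{
    return array_map(fn (int $q): float => $q * $scale, $quantized);
}

[$q, $scale] = quantizeInt8([0.12, -0.87, 0.43]);
$approx = dequantizeInt8($q, $scale);
// Each component lands within ~scale/2 of the original value,
// which is usually negligible for similarity ranking.
```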

The speed gap is just as large. Cosine similarity on a 1,536-dimension vector takes ~0.5–1 ms in a PHP loop; MemVector computes it in microseconds and can search across thousands of vectors in that time.

PHP array approach vs MemVector

Cosine similarity in plain PHP:

function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    for ($i = 0, $n = count($a); $i < $n; $i++) {
        $dot += $a[$i] * $b[$i];
        $normA += $a[$i] * $a[$i];
        $normB += $b[$i] * $b[$i];
    }

    // Guard against zero vectors to avoid division by zero
    $denom = sqrt($normA) * sqrt($normB);

    return $denom > 0.0 ? $dot / $denom : 0.0;
}

// Compare a query against all stored vectors
$bestScore = -1;
$bestKey = null;

foreach ($vectors as $key => $vec) {
    $score = cosineSimilarity($queryVector, $vec);
    if ($score > $bestScore) {
        $bestScore = $score;
        $bestKey = $key;
    }
}

With MemVector:

$store = new MemVectorStore('/data/vectors', [
    'storage'    => 'memory',
    'dimensions' => 1536,
    'distance'   => 'cosine',
]);

// Store vectors (from any source — API, database, CSV, etc.)
$store->set('product_1', $vector1, json_encode(['name' => 'Widget A']));
$store->set('product_2', $vector2, json_encode(['name' => 'Widget B']));
$store->set('product_3', $vector3, json_encode(['name' => 'Widget C']));

// Find the 5 most similar vectors — uses SIMD and HNSW index internally
$results = $store->search($queryVector, 5);

// Each result has key, score, and metadata
foreach ($results as $result) {
    echo "{$result['key']}: {$result['score']}\n";
}

The vectors can come from anywhere — an API, a CSV, a database, or your own code. You can switch the distance metric depending on what you need:

// Dot product — useful for unnormalized vectors, recommendation scores
$store = new MemVectorStore(null, ['dimensions' => 384, 'distance' => 'dot']);

// Euclidean distance — useful for spatial data, clustering
$store = new MemVectorStore(null, ['dimensions' => 384, 'distance' => 'euclidean']);

// Manhattan distance — useful for grid-based distances, sparse features
$store = new MemVectorStore(null, ['dimensions' => 384, 'distance' => 'manhattan']);
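For reference, this is what the three metrics compute, sketched in plain PHP (MemVector evaluates them with SIMD in C):

```php
// Dot product: higher means more similar for vectors of comparable magnitude.
function dotProduct(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += $v * $b[$i];
    }
    return $sum;
}

// Euclidean (L2) distance: straight-line distance, lower means more similar.
function euclidean(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += ($v - $b[$i]) ** 2;
    }
    return sqrt($sum);
}

// Manhattan (L1) distance: sum of absolute component differences.
function manhattan(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += abs($v - $b[$i]);
    }
    return $sum;
}
```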

MemVector with OpenSwoole

MemVector works best with long-lived processes. With OpenSwoole:

  • Models load once per worker and stay in memory across requests
  • shm storage mode shares the vector store across all workers
  • Concurrent reads/writes work well with OpenSwoole coroutines
  • mmap-based disk storage survives server restarts

Semantic search API example

<?php

use OpenSwoole\Http\Server;
use OpenSwoole\Http\Request;
use OpenSwoole\Http\Response;

$server = new Server('0.0.0.0', 9501);

$server->set([
    'worker_num' => 4,
]);

$server->on('workerStart', function (Server $server, int $workerId) {
    // Load embedding model once per worker — persists across all requests
    $server->embedding = new MemVectorEmbedding('/models/all-MiniLM-L6-v2.Q8_0.gguf');

    // Use shared memory so all workers access the same vector store
    $server->store = new MemVectorStore('/data/vectors', [
        'storage'    => 'shm',
        'dimensions' => $server->embedding->dimensions(), // 384
        'distance'   => 'cosine',
    ]);

    echo "Worker {$workerId}: model and store ready\n";
});

$server->on('request', function (Request $request, Response $response) use ($server) {
    $path = $request->server['request_uri'];

    if ($path === '/index' && $request->getMethod() === 'POST') {
        // Index a document
        $body = json_decode($request->getContent(), true);
        $key = $body['id'];
        $text = $body['text'];
        $metadata = json_encode($body['metadata'] ?? []);

        $vector = $server->embedding->embed($text);
        $server->store->set($key, $vector, $metadata);

        $response->header('Content-Type', 'application/json');
        $response->end(json_encode([
            'status' => 'indexed',
            'key'    => $key,
            'dimensions' => count($vector),
        ]));
    } elseif ($path === '/search') {
        // Semantic search
        $query = $request->get['q'] ?? '';
        $topK = (int) ($request->get['top_k'] ?? 10);

        $queryVector = $server->embedding->embed($query);
        $results = $server->store->search($queryVector, $topK);

        $response->header('Content-Type', 'application/json');
        $response->end(json_encode([
            'query'   => $query,
            'results' => $results,
        ]));
    } elseif ($path === '/stats') {
        $response->header('Content-Type', 'application/json');
        $response->end(json_encode($server->store->stats()));
    } else {
        $response->status(404);
        $response->end('Not Found');
    }
});

$server->start();

Try it out:

# Index documents
curl -X POST http://localhost:9501/index \
  -H 'Content-Type: application/json' \
  -d '{"id": "doc_1", "text": "OpenSwoole is an async PHP framework", "metadata": {"source": "docs"}}'

curl -X POST http://localhost:9501/index \
  -H 'Content-Type: application/json' \
  -d '{"id": "doc_2", "text": "PHP 8.5 introduces the pipe operator", "metadata": {"source": "blog"}}'

# Search
curl "http://localhost:9501/search?q=async+programming&top_k=5"

Two-stage search with reranking

Broad vector search first, then rerank with a cross-encoder for better precision:

$server->on('workerStart', function (Server $server, int $workerId) {
    $server->embedding = new MemVectorEmbedding('/models/all-MiniLM-L6-v2.Q8_0.gguf');
    $server->reranker = new MemVectorReranker('/models/bge-reranker-v2-m3-Q8_0.gguf');
    $server->store = new MemVectorStore('/data/vectors', [
        'storage'    => 'shm',
        'dimensions' => $server->embedding->dimensions(),
        'distance'   => 'cosine',
    ]);
});

$server->on('request', function (Request $request, Response $response) use ($server) {
    if ($request->server['request_uri'] === '/rag') {
        $query = $request->get['q'] ?? '';

        // Broad vector search to get 50 candidates
        $queryVector = $server->embedding->embed($query);
        $candidates = $server->store->search($queryVector, 50);

        // Rerank down to top 5
        $reranked = $server->reranker->rerank($query, $candidates, 5);

        $response->header('Content-Type', 'application/json');
        $response->end(json_encode([
            'query'   => $query,
            'results' => $reranked,
        ]));
    }
});

The whole pipeline completes in 10–30 ms.

WebSocket example

<?php

use OpenSwoole\WebSocket\Server;
use OpenSwoole\WebSocket\Frame;

$server = new Server('0.0.0.0', 9502);

$server->on('workerStart', function (Server $server, int $workerId) {
    $server->embedding = new MemVectorEmbedding('/models/all-MiniLM-L6-v2.Q8_0.gguf');
    $server->store = new MemVectorStore('/data/vectors', [
        'storage'    => 'shm',
        'dimensions' => $server->embedding->dimensions(),
        'distance'   => 'cosine',
    ]);
});

$server->on('message', function (Server $server, Frame $frame) {
    $data = json_decode($frame->data, true);

    if ($data['action'] === 'search') {
        $vector = $server->embedding->embed($data['query']);
        $results = $server->store->search($vector, $data['top_k'] ?? 5);

        $server->push($frame->fd, json_encode([
            'type'    => 'results',
            'query'   => $data['query'],
            'results' => $results,
        ]));
    }
});

$server->start();

MemVector with PHP-FPM and Laravel

MemVector also works with PHP-FPM. You lose persistent model loading and shared memory, but disk-backed mmap storage remains fast across requests. Use an external API for embeddings and MemVector for storage and search.

Service provider

<?php
// app/Providers/MemVectorServiceProvider.php

namespace App\Providers;

use Illuminate\Support\ServiceProvider;
use MemVectorStore;

class MemVectorServiceProvider extends ServiceProvider
{
    public function register(): void
    {
        $this->app->singleton(MemVectorStore::class, function ($app) {
            return new MemVectorStore(storage_path('app/vectors'), [
                'storage'      => 'disk',          // mmap-backed, persists across requests
                'dimensions'   => 1536,            // OpenAI text-embedding-3-small
                'distance'     => 'cosine',
                'quantization' => 'f16',           // Half precision to save memory
            ]);
        });
    }
}

Register it in bootstrap/providers.php (Laravel 11+) or the providers array in config/app.php (earlier versions).
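On a Laravel 11+ skeleton, for example, registration means adding the class to the array returned from bootstrap/providers.php:

```php
<?php
// bootstrap/providers.php

return [
    App\Providers\AppServiceProvider::class,
    App\Providers\MemVectorServiceProvider::class,
];
```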

Embedding service

<?php
// app/Services/EmbeddingService.php

namespace App\Services;

use Illuminate\Support\Facades\Http;

class EmbeddingService
{
    public function embed(string $text): array
    {
        $response = Http::withToken(config('services.openai.api_key'))
            ->post('https://api.openai.com/v1/embeddings', [
                'model' => 'text-embedding-3-small',
                'input' => $text,
            ]);

        return $response->json('data.0.embedding');
    }

    public function embedBatch(array $texts): array
    {
        $response = Http::withToken(config('services.openai.api_key'))
            ->post('https://api.openai.com/v1/embeddings', [
                'model' => 'text-embedding-3-small',
                'input' => $texts,
            ]);

        return array_map(
            fn($item) => $item['embedding'],
            $response->json('data')
        );
    }
}

Indexing documents with an Artisan command

<?php
// app/Console/Commands/IndexDocuments.php

namespace App\Console\Commands;

use App\Models\Article;
use App\Services\EmbeddingService;
use Illuminate\Console\Command;
use MemVectorStore;

class IndexDocuments extends Command
{
    protected $signature = 'vectors:index {--fresh : Rebuild the entire index}';
    protected $description = 'Index all articles into the vector store';

    public function handle(MemVectorStore $store, EmbeddingService $embeddings): int
    {
        $articles = Article::whereNull('embedded_at')
            ->orWhereColumn('updated_at', '>', 'embedded_at')
            ->cursor();

        $batch = [];
        $pending = [];

        foreach ($articles as $article) {
            $batch[] = $article->title . ' ' . $article->body;
            $pending[] = $article;

            if (count($batch) >= 20) {
                $this->indexBatch($store, $embeddings, $pending, $batch);
                $batch = [];
                $pending = [];
            }
        }

        if (!empty($batch)) {
            $this->indexBatch($store, $embeddings, $pending, $batch);
        }

        $this->info("Index complete. Total vectors: {$store->count()}");
        return self::SUCCESS;
    }

    private function indexBatch(
        MemVectorStore $store,
        EmbeddingService $embeddings,
        array $articles,
        array $texts,
    ): void {
        $vectors = $embeddings->embedBatch($texts);

        $items = [];
        foreach ($articles as $i => $article) {
            $items[] = [
                'key'      => "article_{$article->id}",
                'vector'   => $vectors[$i],
                'metadata' => json_encode([
                    'id'    => $article->id,
                    'title' => $article->title,
                    'slug'  => $article->slug,
                ]),
            ];
            $article->update(['embedded_at' => now()]);
        }

        $store->batchSet($items);
        $this->info("Indexed " . count($items) . " articles");
    }
}

Search controller

<?php
// app/Http/Controllers/SearchController.php

namespace App\Http\Controllers;

use App\Services\EmbeddingService;
use Illuminate\Http\Request;
use MemVectorStore;

class SearchController extends Controller
{
    public function __invoke(
        Request $request,
        MemVectorStore $store,
        EmbeddingService $embeddings,
    ) {
        $request->validate(['q' => 'required|string|max:500']);

        $queryVector = $embeddings->embed($request->input('q'));
        $results = $store->search($queryVector, 10);

        // Hydrate results with full models
        $articleIds = array_map(function ($result) {
            $meta = json_decode($result['metadata'], true);
            return $meta['id'];
        }, $results);

        $articles = \App\Models\Article::whereIn('id', $articleIds)->get()
            ->keyBy('id');

        $ranked = array_map(function ($result) use ($articles) {
            $meta = json_decode($result['metadata'], true);
            return [
                'article' => $articles[$meta['id']] ?? null,
                'score'   => $result['score'],
            ];
        }, $results);

        return view('search.results', [
            'query'   => $request->input('q'),
            'results' => $ranked,
        ]);
    }
}

Route

// routes/web.php
Route::get('/search', SearchController::class)->name('search');

Vector search runs in 0.1–5 ms under PHP-FPM. The embedding API call adds ~100 ms.

OpenSwoole vs PHP-FPM comparison

                OpenSwoole                    PHP-FPM + Laravel
Model loading   Once per worker (persistent)  Per-request or external API
Vector store    Shared memory across workers  Disk-backed mmap
Embeddings      Local GGUF models, 5–15 ms    External API, 50–200 ms
Search          0.1–5 ms                      0.1–5 ms
Total latency   10–30 ms                      100–250 ms
Per-token cost  None                          API pricing

Vector search speed is the same either way. OpenSwoole saves on embedding latency and API costs.

Installation

The easiest way to install is via PIE:

pie install memvector/ext-memvector

Or build from source:

# Basic installation
phpize && ./configure --enable-memvector && make && make install

# With local embedding support (requires llama.cpp)
phpize && ./configure --enable-memvector --with-llama=/usr/local && make && make install

Add to your php.ini:

extension=memvector

Verify it's loaded:

php -m | grep memvector

If you want local embeddings, download a model. all-MiniLM-L6-v2 is a good starting point at 24 MB and 384 dimensions:

curl -L -o /models/all-MiniLM-L6-v2.Q8_0.gguf \
  https://huggingface.co/leliuga/all-MiniLM-L6-v2-GGUF/resolve/main/all-MiniLM-L6-v2.Q8_0.gguf

The project is on GitHub: github.com/memvector/ext-memvector