Building the Cloud Backend for Solo Vault

May 10, 2026

aws · fastapi · pgvector · rag · cloud-computing

I thought Solo Vault would be the simple part of Solo IDE.

Upload a file. Store some metadata. Make it searchable. Done.

Then we started listing the files an IDE actually needs to remember: PDFs, DOCX files, code snippets, CSVs, JSON configs, screenshots, emails, zipped folders, project notes, and random unsorted blobs from a user's machine. That turned the "simple upload feature" into a real indexing backend.

Solo Vault became the cloud memory layer for Solo IDE. The goal is to let the IDE search a user's own project knowledge, not just the open files in the editor. If an agent needs the design notes, a PDF requirement doc, a previous architecture decision, or a chunk of source code, Vault should be able to retrieve it.

The backend ended up being a mix of local developer infrastructure and AWS production architecture: FastAPI, Celery, Redis, MinIO, PostgreSQL, pgvector, S3, SQS, Step Functions, Lambda, EventBridge, SNS, Cognito, KMS, Secrets Manager, CloudWatch, and API Gateway.

The repo is here: github.com/Sachin1801/solo-vault-backend.

Demo Video

Here is the project demo showing the Solo Vault backend and indexing flow:

Watch the demo on YouTube

What We Were Building

Solo IDE already had a local Vault. It could ingest files on one machine and search them locally. That is useful, but it has limits.

Local-only indexing does not naturally give you cloud sync. It does not give you a central API for remote search. It also ties heavy parsing and embedding work to the desktop app, which is exactly where I do not want long-running jobs to live.

So the cloud version needed to do a few things well:

  1. Accept Vault entries from the IDE.
  2. Store uploaded files in object storage.
  3. Validate and parse many file types.
  4. Chunk content in a repeatable way.
  5. Generate vector embeddings and store them in pgvector.
  6. Keep all rows tenant-scoped by user_id.
  7. Stream progress back while indexing runs.
  8. Map cleanly onto AWS services for the cloud project.

The local development stack looks like this:

POST /index  (FastAPI)
  -> Redis queue
  -> Celery worker
       -> validate
       -> download
       -> parse
       -> chunk
       -> embed
       -> store

Progress events
  -> Redis pub/sub
  -> WebSocket /ws/{entry_id}

Storage
  -> MinIO for S3-compatible files
  -> PostgreSQL + pgvector for metadata, chunks, and embeddings

That local stack mattered because we could run the whole service without waiting on AWS deploys. MinIO stood in for S3. Redis was the broker, cache, and progress bus. Postgres with pgvector matched the database shape we wanted in the cloud.
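The stage chain itself is simple enough to sketch without any framework. This is a hypothetical, dependency-free version of the flow (the real stages are Celery tasks; function and key names here are illustrative):

```python
# Hypothetical sketch of the stage chain; the real stages run as Celery tasks.
STAGES = ["validate", "download", "parse", "chunk", "embed", "store"]

def run_pipeline(job, handlers):
    """Run each stage in order, threading the job dict through.

    `handlers` maps stage name -> callable(job) -> job.
    """
    for stage in STAGES:
        job = handlers[stage](job)
        job["last_stage"] = stage  # record progress for debugging
    return job

# Usage with no-op handlers:
job = run_pipeline({"entry_id": "e1"}, {s: (lambda j: j) for s in STAGES})
```

The point of the shape is that every stage has the same signature, so swapping Celery for Step Functions later only changes who calls the handlers, not what they look like.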

The Pipeline

The cleanest part of the project is the indexing pipeline. Every uploaded file becomes a PipelineJob, then moves through the same stages:

validate -> download -> parse -> chunk -> embed -> store

Each stage has one job.

validate checks file size, MIME type, code extension fallbacks, and whether the object exists in S3 or MinIO. Files over 50 MB are rejected. Low-confidence classifier results get pushed into unsorted so we can still index best-effort content instead of pretending the classifier was right.
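A minimal sketch of that validation logic, with assumed MIME lists and extension sets (the real classifier is more involved):

```python
MAX_BYTES = 50 * 1024 * 1024  # 50 MB cap from the validate stage

# Assumed allow-lists for illustration only.
ALLOWED_MIME = {"application/pdf", "text/plain", "text/csv", "application/json"}
CODE_EXTS = {".py", ".ts", ".go", ".rs"}

def validate_entry(size_bytes, mime, filename):
    """Return (ok, kind) for an uploaded entry; hypothetical helper."""
    if size_bytes > MAX_BYTES:
        return False, "too_large"
    if mime in ALLOWED_MIME:
        return True, "known"
    # Code-extension fallback for generic MIME types
    if any(filename.endswith(ext) for ext in CODE_EXTS):
        return True, "code"
    # Low confidence: still index, best-effort, as unsorted
    return True, "unsorted"
```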

download pulls the file through a rate-limited S3 client. It also computes a SHA-256 file hash. That hash is used for idempotency: if the same user uploads the same file twice, we can reuse the previously stored chunks instead of parsing and embedding everything again.
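The hashing side of that is small; a sketch of the idempotency key, assuming it is scoped per user (the exact key format in the repo may differ):

```python
import hashlib

def file_sha256(data: bytes) -> str:
    """SHA-256 of the raw bytes; the basis of the idempotency check."""
    return hashlib.sha256(data).hexdigest()

def idempotency_key(user_id: str, data: bytes) -> str:
    # Scoped per user so identical files from different tenants stay separate.
    return f"{user_id}:{file_sha256(data)}"
```

Before parsing, the worker can look this key up; a hit means the chunks and embeddings already exist and the job can short-circuit to done.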

parse dispatches by entry kind. PDFs and DOCX files go through document parsing. Images go through OCR. CSV, JSON, YAML, and TOML get turned into schema-style text. Code files include a language header. ZIP archives are unpacked and parsed member by member, with limits so one huge archive does not take over the worker.
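The dispatch itself is just a table from entry kind to parser. A toy version, with stand-in parsers (the real ones do OCR, schema extraction, and archive limits):

```python
# Stand-in parsers; names and behavior are illustrative, not the repo's API.
def parse_document(raw):
    return raw.decode("utf-8", "replace")

def parse_code(raw, lang):
    # Code files get a language header so chunks keep their context.
    return f"# language: {lang}\n" + raw.decode("utf-8", "replace")

def parse_data(raw):
    return "schema: (inferred)\n" + raw.decode("utf-8", "replace")

PARSERS = {
    "pdf":  lambda raw, name: parse_document(raw),
    "docx": lambda raw, name: parse_document(raw),
    "code": lambda raw, name: parse_code(raw, name.rsplit(".", 1)[-1]),
    "csv":  lambda raw, name: parse_data(raw),
}

def parse_entry(kind, raw, name):
    parser = PARSERS.get(kind)
    if parser is None:
        # Unknown kinds fall back to best-effort plain text
        return parse_document(raw)
    return parser(raw, name)
```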

chunk is where the IDE and backend contract matters. Chunks have to be deterministic. If the desktop index and the cloud index split text differently, search results become hard to compare and reindexing becomes painful.

The core rule is simple:

CHUNK_SIZE = 500
OVERLAP = 50
ENC = tiktoken.get_encoding("cl100k_base")

Document chunks try to preserve paragraphs. Code chunks try to split around function and class boundaries. Data chunks keep schema information separate from row samples. Images become a single chunk because OCR output is usually short enough.
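The window math behind those rules can be sketched independently of the tokenizer. This version operates on any token list (a whitespace split works as a stand-in for cl100k_base) so the determinism is easy to see:

```python
CHUNK_SIZE = 500
OVERLAP = 50

def chunk_tokens(tokens, size=CHUNK_SIZE, overlap=OVERLAP):
    """Deterministic sliding-window chunking over a token list.

    Same window math as the tiktoken-based pipeline, whatever the
    tokenizer is: fixed size, fixed overlap, no randomness.
    """
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    chunks, step = [], size - overlap
    for start in range(0, max(len(tokens), 1), step):
        window = tokens[start:start + size]
        if not window:
            break
        chunks.append(window)
        if start + size >= len(tokens):
            break
    return chunks
```

Given the same text and the same tokenizer, desktop and cloud produce byte-identical chunks, which is what makes the two indexes comparable.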

embed takes the final chunk text and returns vectors.

store writes everything transactionally into Postgres:

vault.entries
vault.chunks
vault.chunks_fts

The important decision was denormalizing user_id into every chunk row. That makes retrieval queries simpler and safer because search can filter directly with WHERE user_id = $1 without joining back through entries first.
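A sketch of the query this enables (column names are assumptions based on the schema above; `<=>` is pgvector's cosine-distance operator):

```python
def vector_search_sql(limit: int = 10) -> str:
    """Tenant-scoped pgvector similarity query (assumed column names).

    $1 = user_id, $2 = query embedding. No join back to vault.entries
    is needed because user_id is denormalized onto every chunk row.
    """
    return (
        "SELECT entry_id, chunk_index, text, "
        "embedding <=> $2 AS distance "
        "FROM vault.chunks "
        "WHERE user_id = $1 "
        "ORDER BY embedding <=> $2 "
        f"LIMIT {int(limit)}"
    )
```

Besides being faster, the tenant filter is harder to forget when it sits on the same table as the vectors.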

Progress Was Not Optional

Indexing can be slow. OCR can be slow. Embedding can be slow. A desktop app should not feel frozen because a PDF is being processed.

So every stage emits progress:

{
  "type": "index_progress",
  "entry_id": "e1",
  "step": "embed",
  "progress_pct": 80,
  "status": "running",
  "message": "Embedding"
}

The FastAPI app exposes WS /ws/{entry_id}. The worker publishes progress into Redis. The WebSocket handler subscribes to progress:{entry_id} and streams updates until the job is done or failed.
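The payload and channel conventions are small enough to centralize in two helpers; a sketch, using the event shape shown above (the worker would publish `json.dumps(...)` to Redis, and the WebSocket handler subscribes to the channel):

```python
import json

def progress_event(entry_id, step, pct, status="running", message=""):
    """Build the progress payload published for one pipeline stage."""
    return {
        "type": "index_progress",
        "entry_id": entry_id,
        "step": step,
        "progress_pct": pct,
        "status": status,
        "message": message,
    }

def progress_channel(entry_id):
    """Redis pub/sub channel the WebSocket handler subscribes to."""
    return f"progress:{entry_id}"

# Worker side (sketch): redis.publish(progress_channel(eid),
#                                     json.dumps(progress_event(...)))
```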

This is a small detail, but it changes the product feel. The IDE can show "validating", "parsing", "embedding", and "indexed" instead of a spinner with no explanation.

It also made debugging easier. If a job got stuck, we could tell which stage was responsible without reading every log line first.

The AWS Version

For the cloud project, the same logical pipeline maps onto AWS services.

Solo IDE
  -> API Gateway REST API
  -> Lambda CRUD handlers
  -> S3 upload
  -> SQS indexing queue
  -> Step Functions
       -> validate
       -> download/parse
       -> chunk
       -> embed
       -> store
       -> notify
  -> RDS PostgreSQL + pgvector
  -> EventBridge/SNS/WebSocket progress

The AWS architecture uses more services than the local stack, but each service has a clear job:

  • Cognito handles user auth.
  • API Gateway exposes Vault CRUD, search, and WebSocket routes.
  • S3 stores uploaded files.
  • SQS buffers indexing work.
  • Step Functions gives per-stage execution history and retries.
  • Lambda runs lightweight API and pipeline stages.
  • RDS PostgreSQL + pgvector stores entries, chunks, full-text rows, and vectors.
  • EventBridge and SNS route status notifications.
  • DynamoDB is available for cloud-synced agent sessions.
  • KMS and Secrets Manager handle encryption and secrets.
  • CloudWatch gives logs, metrics, and alarms.

The part I liked most was that the local architecture and AWS architecture share the same mental model. I did not have to explain two different systems. The local version is "FastAPI plus Celery runs the stages." The AWS version is "Step Functions runs the stages."

What Was Harder Than Expected

The hardest part was not writing a route or parsing a file. It was keeping contracts stable across the desktop app, local indexer, cloud pipeline, and database schema.

Chunking is one example. A 500-token chunk with 50-token overlap sounds like a tiny implementation detail. It is not. It decides how search results are shaped. It decides whether local and cloud indexes can be compared. It decides whether reindexing is required after a change.

Embeddings are the bigger version of that problem. The model and vector dimension get baked into:

  • the pgvector column type
  • stored chunk rows
  • cached embeddings
  • query embedding code
  • search ranking behavior

Changing that after indexing real data is not a normal refactor. It is a data migration and a full reindex.

The second hard part was file diversity. A PDF, CSV, Python file, screenshot, and ZIP archive should not all be treated like plain text. The parser layer needed enough structure to preserve useful information without turning into a research project.

The third hard part was failure behavior. If an S3 object disappears between enqueue and validate, the entry should not sit forever in a fake pending state. If parsing fails, the user needs a real failure event. If the same file is uploaded twice, the second job should not waste compute.

Those details are boring until they are missing. Then they become the whole product.

The Team

This was a team effort. The split was roughly:

  • Cloud foundation: VPC, networking, Cognito, API Gateway.
  • Security and data layer: KMS, Secrets Manager, RDS PostgreSQL, pgvector schema.
  • Indexing pipeline: validation, parsing, chunking, embedding, storage, progress.
  • AWS orchestration: SQS, Step Functions, Lambda/ECS extraction, EventBridge, SNS.
  • Demo and QA: dataset uploads, integration tests, benchmarks, docs, and runbooks.

The contributor list in the repo tells the same story. This was not one script glued to one endpoint. It was a real backend with infrastructure, API design, data modeling, asynchronous jobs, and operational concerns.

What I Would Do Differently

I would write the embedding contract in one place before any infra lands. Model name, vector dimension, chunking rules, tokenizer, cache keys, and database schema should all point to the same source of truth.
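Concretely, that single source of truth could be one frozen config object that everything else imports. A sketch (model name and dimension here are placeholders, not the values the project uses):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmbeddingContract:
    """Single source of truth for everything the embeddings touch."""
    model: str = "example-embedding-model"  # assumption, not the real model
    dim: int = 1536                         # must match the pgvector column
    tokenizer: str = "cl100k_base"
    chunk_size: int = 500
    overlap: int = 50

    def column_type(self) -> str:
        # The DDL for the embedding column derives from the contract.
        return f"vector({self.dim})"

    def cache_key(self, file_hash: str, chunk_index: int) -> str:
        # Cache keys embed the model and tokenizer, so a model change
        # naturally invalidates stale embeddings.
        return f"{self.model}:{self.tokenizer}:{file_hash}:{chunk_index}"

CONTRACT = EmbeddingContract()
```

If the schema migration, the worker, and the query code all read from `CONTRACT`, a model change becomes one diff plus a reindex instead of a scavenger hunt.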

I would also make the first version of the pipeline event-driven earlier. Celery was great for local development, but Step Functions forces cleaner boundaries. Every stage needs an input shape, output shape, retry behavior, and failure path. That discipline helps even before deployment.

Finally, I would start benchmark tooling sooner. The repo eventually got scripts for bulk indexing and stage timing, but performance questions showed up before the tooling did. For this kind of system, "which stage is slow?" should be easy to answer from day one.

Key Takeaways

  1. A cloud memory layer is not just file upload. The useful part is the indexing contract: parse, chunk, embed, store, and retrieve in a way the IDE can trust.

  2. Deterministic chunking is infrastructure. Once chunks feed embeddings and search, chunking rules become part of the data contract.

  3. Progress events are product quality. Users should know whether a file is validating, parsing, embedding, stored, or failed. It also makes debugging much easier.

  4. pgvector keeps retrieval simple. Metadata, chunks, full-text search rows, and vector search can live in one PostgreSQL system instead of splitting early across multiple databases.

  5. Async pipelines need idempotency. entry_id, file hashes, ON CONFLICT, and user-scoped chunks are what keep retries and duplicate uploads from turning into duplicate data.

  6. Local parity matters. MinIO, Redis, Celery, and Postgres gave us a local version of the cloud system, which made the AWS design much easier to reason about.

Solo Vault started as "make project files searchable." It turned into a useful lesson in cloud backend design: the hard part is not choosing services. The hard part is keeping the contracts clear enough that every service can do one job and hand off to the next one cleanly.