Version: v4.4.0

Object / Blob Storage

This page documents storage backends, blob upload routing, and core Docker mount behavior.

Scope of this page

  • Focus: object/blob backends, keyspaces, upload/read paths, and storage debugging.
  • Not covered here: relational metadata tables and SQL state modeling (see Database).

Storage backends

  • Embedded (default): embedded SeaweedFS (weed mini) blob storage.
  • External: external S3-compatible object storage.

Metadata database mode (SQLite vs Postgres) is configured separately in Database.

SeaweedFS Compatibility Note (April 16, 2026)

OpenReader currently pins embedded SeaweedFS to 4.18 in CI and Docker builds because SeaweedFS 4.19 introduced intermittent InternalError responses on S3 PutObject in our upload flow.

Storage variables are documented in Environment Variables.

Ports

  • 3003: OpenReader app and API routes
  • 8333: Embedded SeaweedFS S3 endpoint for direct browser blob access

Info: Port 8333 is only needed for direct browser presigned access to embedded SeaweedFS.

Upload behavior

  • Primary path: the browser uploads directly to a presigned URL obtained from /api/documents/blob/upload/presign.
  • Fallback path: /api/documents/blob/upload/fallback, used when the direct upload fails or the blob endpoint is unreachable.
  • Document read path: direct presigned access from /api/documents/blob/get/presign, with /api/documents/blob/get/fallback as the app-server fallback.
  • Preview path: /api/documents/blob/preview/ensure reports generation status and version; presigned and fallback routes serve the generated artifact.

Browser Cache Storage

The browser may retain reusable document, preview, and TTS audio responses in the versioned openreader-blobs-v1 Cache Storage cache. This is strictly an evictable performance optimization:

  • The server database and object storage remain authoritative.
  • Clearing or losing Cache Storage must not change application correctness.
  • Cache keys are same-origin synthetic identities and are not fetchable server routes.
  • Successful full 200 responses may be cached; partial, opaque, redirect-error, and failed responses are not.
  • Presigned URLs are network sources only and are never used as persistent cache identities.

Synthetic key layouts:

  • /openreader-cache/documents/{documentId}/{contentVersion}
  • /openreader-cache/previews/{documentId}/{previewVersion}
  • /openreader-cache/audio/{audioKey}/{version}
  • /openreader-cache/audiobooks/{bookId}/chapters/{chapterIndex}/{version} when reusable chapter playback is enabled

Explicit audiobook downloads and exports are not persistently cached.

Document previews

  • PDF/EPUB previews are generated server-side and stored in object storage under document_previews_v1.
  • Preview generation is triggered on upload registration and is also backfilled on the first preview request for older documents.
  • Preview artifacts can be treated as temporary cache entries because they can be regenerated from the source document blob.

FS / Volume Mounts

App data mount

  • Target: /app/docstore
  • Recommended: yes, for persistence
  • Purpose: persists SeaweedFS blob data, SQLite metadata DB, migrations, and local runtime temp state
  • Mount string: -v openreader_docstore:/app/docstore

Library source mount (optional)

  • Target: /app/docstore/library
  • Recommended: optional, use read-only (:ro)
  • Purpose: exposes host files as a source for server library import
  • Mount string: -v /path/to/your/library:/app/docstore/library:ro
  • Details: Server Library Import
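
The ports and mounts described on this page can be combined into one docker run invocation. The following is a minimal sketch rather than the project's documented command: the image reference is supplied by the caller because this page does not name one, and openreader_run_cmd is a hypothetical helper. It only prints the command so it can be reviewed before running.

```shell
# Print a docker run command combining the documented ports and mounts.
# Assumption: the image reference is passed in by the caller; this page does not name one.
openreader_run_cmd() {
  image="$1"        # the OpenReader image you normally deploy
  library_dir="$2"  # optional host directory for server library import

  set -- docker run -d \
    -p 3003:3003 \
    -p 8333:8333 \
    -v openreader_docstore:/app/docstore

  # Add the read-only library mount only when a host directory is given.
  if [ -n "$library_dir" ]; then
    set -- "$@" -v "$library_dir:/app/docstore/library:ro"
  fi

  echo "$@" "$image"
}

# Example: openreader_run_cmd "<your-image>" /path/to/your/library
```

Dropping the -p 8333:8333 argument from the printed command corresponds to the private blob endpoint mode described below.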

Private blob endpoint mode

If 8333 is not published externally:

  • Document uploads still work through the upload fallback proxy.
  • Reads and snippets continue through the app API routes.
  • Direct presigned browser upload/download against the embedded endpoint is unavailable.

Warning: Without 8333, expect higher app-server traffic because uploads/downloads go through API routes instead of direct object endpoint access.
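
To check which mode a client is effectively in, it can help to probe the endpoint from the machine where the browser runs. This is a rough sketch, assuming plain HTTP on the published port. The helper name blob_endpoint_reachable is hypothetical. Any HTTP response, including an error status, is treated as reachable, because only connectivity is being tested.

```shell
# Return success if anything answers HTTP on the given host and port.
# Any HTTP status (including an error status) counts as reachable; only
# connection failures and timeouts count as unreachable.
blob_endpoint_reachable() {
  host="$1"
  port="${2:-8333}"
  curl --silent --output /dev/null --max-time 5 "http://$host:$port/"
}

# Example:
#   if blob_endpoint_reachable localhost; then
#     echo "direct presigned access should be possible"
#   else
#     echo "expect traffic to go through the app fallback routes"
#   fi
```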

Audiobook Storage Debug Commands

Audiobook assets are stored in object storage under the audiobooks_v1 keyspace. Use these commands to inspect and download objects for debugging.

# List all audiobook objects
aws s3 ls "s3://$S3_BUCKET/$S3_PREFIX/audiobooks_v1/" --recursive

# Filter to one book id (replace <book-id>)
aws s3 ls "s3://$S3_BUCKET/$S3_PREFIX/audiobooks_v1/" --recursive | grep "<book-id>-audiobook/"

# Download one object by full key
aws s3 cp "s3://$S3_BUCKET/$S3_PREFIX/audiobooks_v1/<path>/<file>.m4b" "./audiobook.m4b"
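
The commands above use the AWS CLI's default endpoint. When the bucket lives in embedded SeaweedFS, the CLI generally needs an explicit --endpoint-url. A minimal sketch, assuming the embedded S3 endpoint is published at http://localhost:8333 and that credentials are already configured for the CLI. DEBUG_S3_ENDPOINT and aws_embedded are hypothetical names used only for illustration.

```shell
# Assumption: the embedded endpoint is published on localhost:8333; adjust as needed.
# DEBUG_S3_ENDPOINT is a hypothetical variable, not an OpenReader setting.
DEBUG_S3_ENDPOINT="${DEBUG_S3_ENDPOINT:-http://localhost:8333}"

# Run any "aws s3" subcommand against the embedded endpoint.
aws_embedded() {
  aws --endpoint-url "$DEBUG_S3_ENDPOINT" s3 "$@"
}

# Example:
#   aws_embedded ls "s3://$S3_BUCKET/$S3_PREFIX/audiobooks_v1/" --recursive
```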

TTS Segment Storage

Server-side TTS segment audio is stored in object storage under the tts_segments_v1 keyspace.

Typical key layout:

  • ${S3_PREFIX}/tts_segments_v1/users/<url-encoded-user-id>/docs/<document-id>/<document-version>/<settings-hash>/<segment-id>.mp3
  • ${S3_PREFIX}/tts_segments_v1/ns/<test-namespace>/users/<url-encoded-user-id>/docs/... (test namespace mode)
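
The non-namespaced layout above can be assembled mechanically once each component is known. The helper below is an illustrative sketch, not part of OpenReader: tts_segment_key is a hypothetical name, the user id is assumed to be URL-encoded already, and the test namespace variant is not handled.

```shell
# Build the object key for one TTS segment, following the typical layout.
# Assumptions: the user id is already URL-encoded and S3_PREFIX is set
# in the environment; all other components are supplied by the caller.
tts_segment_key() {
  encoded_user_id="$1"
  document_id="$2"
  document_version="$3"
  settings_hash="$4"
  segment_id="$5"
  printf '%s/tts_segments_v1/users/%s/docs/%s/%s/%s/%s.mp3\n' \
    "$S3_PREFIX" "$encoded_user_id" "$document_id" \
    "$document_version" "$settings_hash" "$segment_id"
}

# Example:
#   aws s3 cp "s3://$S3_BUCKET/$(tts_segment_key "$uid" "$doc" "$ver" "$hash" "$seg")" ./segment.mp3
```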

Notes:

  • For the corresponding normalized SQL metadata model (tts_segment_entries and tts_segment_variants), see Database.

Account Deletion Cleanup

Account deletion performs best-effort object cleanup:

  • Document blobs + preview artifacts
  • Audiobook blobs
  • TTS segment blobs under tts_segments_v1

If object deletion fails, account deletion still proceeds and orphaned objects may require manual cleanup.
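
For manual cleanup of orphaned TTS segments, one cautious approach is to preview a recursive delete before running it. The sketch below covers only the tts_segments_v1 keyspace, since that is the only per-user key layout documented on this page. The helper names are hypothetical, and the user id is assumed to be the URL-encoded form used in the object keys.

```shell
# Print the S3 URI prefix holding one user's TTS segments.
# Assumption: the caller passes the URL-encoded user id used in the key layout.
tts_user_prefix() {
  printf 's3://%s/%s/tts_segments_v1/users/%s/\n' "$S3_BUCKET" "$S3_PREFIX" "$1"
}

# Preview which objects a recursive delete would remove.
# --dryrun only prints the operations; nothing is deleted.
preview_orphan_cleanup() {
  aws s3 rm "$(tts_user_prefix "$1")" --recursive --dryrun
}

# Example: preview_orphan_cleanup "<url-encoded-user-id>"
```

Once the listed objects have been reviewed, running the same aws s3 rm command without --dryrun performs the deletion.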

TTS Segment Storage Debug Commands

Use these commands to inspect segment objects.

# List all TTS segment objects
aws s3 ls "s3://$S3_BUCKET/$S3_PREFIX/tts_segments_v1/" --recursive

# Filter to one document id (replace <document-id>)
aws s3 ls "s3://$S3_BUCKET/$S3_PREFIX/tts_segments_v1/" --recursive | grep "/docs/<document-id>/"