RavenDB-26046 - Add CDC Sink documentation #2387

Open
ayende wants to merge 15 commits into ravendb:main from ayende:claude/cdc-sink-docs-main

ayende commented Apr 3, 2026

Summary

Adds full documentation for the new CDC Sink ongoing task (RavenDB 7.2, RavenDB-26046).

Core pages (16): overview, how-it-works, schema-design, embedded-tables, linked-tables, column-mapping, patching, delete-strategies, property-retention, attachment-handling, configuration-reference, api-reference, monitoring, failover-and-consistency, troubleshooting, server-configuration

PostgreSQL pages (9): prerequisites-checklist, wal-configuration, permissions-and-roles, initial-setup, replica-identity, replica-identity-manual-setup, cleanup-and-maintenance, monitoring-postgres, studio-ui

PostgreSQL examples (4): simple-migration, denormalization, event-sourcing, complex-nesting

SQL Server (1): overview stub

Key topics covered:

  • CdcColumnMapping with Column, Name, and CdcColumnType (Default, Json, Attachment)
  • Embedded tables, linked tables, multi-level nesting, relation types
  • JavaScript patches, $row, $old, load(), OnDelete strategies
  • GUID-based slot/publication naming, auto ALTER PUBLICATION
  • Initial load sequence, CDC streaming, failover behavior
  • Error threshold and exponential backoff
  • REST API endpoints, server configuration keys
  • PostgreSQL WAL setup, REPLICA IDENTITY, permissions
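As a rough illustration of the "exponential backoff" topic listed above, here is a minimal sketch of a capped exponential backoff calculation. The function name, the base delay, and the 60-second cap are all hypothetical and are not taken from the CDC Sink implementation or its configuration keys; the documentation pages in this PR remain the reference for the actual behavior.

```javascript
// Illustrative only: the name and defaults below are hypothetical,
// not RavenDB settings or CDC Sink internals.
function backoffDelaySec(consecutiveErrors, baseSec = 1, maxSec = 60) {
  if (consecutiveErrors <= 0) return 0; // no errors, no delay
  // Double the delay for each additional consecutive error, then cap it.
  const delay = baseSec * 2 ** (consecutiveErrors - 1);
  return Math.min(delay, maxSec);
}

const delays = [1, 2, 3, 4, 5, 6, 7, 8].map((n) => backoffDelaySec(n));
console.log(delays.join(", ")); // 1, 2, 4, 8, 16, 32, 60, 60
```

The cap keeps retries from drifting arbitrarily far apart once a source database has been unreachable for a while.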

Test plan

  • Browse to /server/ongoing-tasks/cdc-sink/overview and verify sidebar navigation
  • Spot-check code samples render correctly (C#, SQL, JavaScript)
  • Verify PostgreSQL and SQL Server subsections appear under CDC Sink in sidebar

ayende added 12 commits April 3, 2026 00:44
Adds full CDC Sink ongoing task documentation in Docusaurus MDX format:

- 16 core pages: overview, how-it-works, schema-design, embedded-tables,
  linked-tables, column-mapping, patching, delete-strategies,
  property-retention, attachment-handling, configuration-reference,
  api-reference, monitoring, failover-and-consistency, troubleshooting,
  server-configuration
- 9 PostgreSQL pages: prerequisites-checklist, wal-configuration,
  permissions-and-roles, initial-setup, replica-identity,
  replica-identity-manual-setup, cleanup-and-maintenance,
  monitoring-postgres, studio-ui
- 4 PostgreSQL examples: simple-migration, denormalization,
  event-sourcing, complex-nesting
- 1 SQL Server stub: overview
- 4 _category_.json navigation files
…al features

- Replace ColumnsMapping (Dictionary) + AttachmentNameMapping (Dictionary) with
  unified Columns list of CdcColumnMapping { Column, Name, Type } across all files
- Add CdcColumnType enum documentation (Default, Json, Attachment)
- Add REST API endpoints table to configuration-reference
- Add CdcSink.PollIntervalInSec to server-configuration
- Add error handling details to monitoring (threshold, fallback, exponential backoff)
- Add ALTER PUBLICATION auto-fix note to postgres/initial-setup
- Fix how-it-works: sequential scan description, Child Before Parent section
- Fix Startup and Verification: split into per-database subsections
- Update all prose references from ColumnsMapping to Columns list
…chment handling

- Replace all new() shorthand with new CdcColumnMapping() across all files
- attachment-handling: clarify that text columns (text, nvarchar, etc.) as well
  as binary columns can use Type = CdcColumnType.Attachment
…$old documentation

- Add postgres/type-mapping.mdx: full reference table of PostgreSQL column types
  and their JavaScript/CLR equivalents (scalars, arrays, json/jsonb, bytea, pgvector)

- Add "$row and $old: Names and Types" section to patching.mdx

- Fix cleanup-and-maintenance.mdx: replace obsolete "Configuration Changes That Rename
  Slots" section (described hash-based naming, no longer accurate) with correct
  "Slot and Publication Names Are Immutable" section reflecting enforced immutability
- Rename Name → CollectionName on CdcSinkTableConfig across all files
- Remove Type from CdcSinkLinkedTableConfig (linked tables have no relation type)
- Remove Disabled from CdcSinkEmbeddedTableConfig, add LinkedTables
- Add FactoryName table (Npgsql, SqlClient, MySql) to configuration-reference
- Add CdcColumnMapping and CdcColumnType reference sections
- Add put(id, document) and del(id) to patch capabilities
- Add JSON Columns section to column-mapping
- Remove non-existent Array References section from linked-tables
- Remove non-existent Disabling an Embedded Table section from embedded-tables
- Update server-configuration descriptions (MaxBatchSize, MaxFallbackTimeInSec;
  note that PollIntervalInSec applies to SQL Server only)
- Fix licensing link in overview
- Add SQL Server and MySQL/MariaDB as supported source databases
- postgres/overview.mdx: connection string, logical replication explanation,
  prerequisites summary, section index
- sql-server/overview.mdx: expand from stub to full page with connection string,
  CDC prerequisites, polling behavior, SourceTableSchema default
- mysql/overview.mdx + _category_.json: connection string, binlog prerequisites,
  streaming behavior, required privileges
…roubleshooting sections

- Source Schema Changes: how each database engine handles DDL changes
  on source tables while CDC Sink is running (adding/removing/renaming
  columns, SQL Server capture instance limitations)
- Partial Export/Import and State Loss: @cdc-states collection, recovery
  guidance, SkipInitialLoad workaround, LSN editing risks
ayende added 3 commits April 7, 2026 14:21
…update behavior

Broken links fixed:
- api-reference: send-multiple-operations → what-are-operations
- attachment-handling: what-are-attachments → attachments/overview
- column-mapping, patching: postgres/type-mapping.mdx (nonexistent)
  → cross-reference to patching.mdx#row-column-types

PR comment fixes:
- MySQL overview: rename MyMySqlConnection → MySqlConnection
- Postgres overview: slot/publication names are GUID-based on first
  use, not deterministic hash-based

New content:
- how-it-works: "Updating the Task Configuration" section explaining
  that config changes only apply to new CDC events going forward.
  Existing documents are not retroactively re-processed. To apply
  changes to all documents, delete and recreate the task.

Move schema change documentation from troubleshooting into its own
page with per-engine detail:
- PostgreSQL: auto-detects via RelationMessage, most resilient
- MySQL: detects via TableMapEvent column types, auto-recovers
- SQL Server: requires explicit capture instance procedure (create
  new instance, drain old, then drop)
- Quick reference table, SQL examples, recovery mechanism
- Troubleshooting retains a short summary with link to the new page

MySQL CDC detects changes by column position. Compound ALTER TABLE
statements (add + drop, ADD COLUMN ... AFTER ...) cause positional
shifts that are hard to resolve. Apply one change at a time and let
CDC Sink catch up between each.
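
To show why positional shifts are hard to resolve, here is a small sketch of position-based row decoding. It is a simplified, hypothetical model (the function names, column names, and the count-only check are assumptions for illustration), not the CDC Sink code path, which per the notes above works from TableMapEvent column types.

```javascript
// Hypothetical model of position-based row decoding; not the CDC Sink implementation.
function decodeRow(columnNames, values) {
  // Pair each value with the column name at the same position.
  return Object.fromEntries(values.map((v, i) => [columnNames[i], v]));
}

// Simplified check: a reader that compares only column counts.
function countChanged(columnNames, values) {
  return columnNames.length !== values.length;
}

// Column order cached before the schema change.
const cachedColumns = ["id", "name", "legacy_code"];

// One compound statement: ADD COLUMN notes AFTER id, DROP COLUMN legacy_code.
// The row still has three values, so the count check sees nothing unusual.
const rowAfterCompoundAlter = [42, "rush order", "Alice"];
const decoded = decodeRow(cachedColumns, rowAfterCompoundAlter);
console.log(countChanged(cachedColumns, rowAfterCompoundAlter)); // false
console.log(decoded.name); // the notes value lands under "name"

// A single ADD COLUMN on its own changes the count, which is easy to notice.
const rowAfterSingleAdd = [42, "rush order", "Alice", "X1"];
console.log(countChanged(cachedColumns, rowAfterSingleAdd)); // true
```

In this model the compound change silently mislabels values, while a single change is visible as soon as the next row arrives, which is the motivation for applying one change at a time.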