Commit 3a24f7d

Merge branch 'claims' into 'main'
Update claim types

Closes #18 and #34

See merge request peerdb/peerdb!43
2 parents dd55929 + 8f76407 commit 3a24f7d

167 files changed

Lines changed: 36643 additions & 8191 deletions

.gitlab-ci.yml

Lines changed: 19 additions & 3 deletions
@@ -40,21 +40,23 @@ test_go:

   image: golang:1.25-alpine3.22

+  tags:
+    - saas-linux-2xlarge-amd64
+
   services:
-    - name: elasticsearch:7.16.3
+    - name: "$CI_REGISTRY_IMAGE/elastic/$ELASTIC_VERSION:latest"
       alias: elasticsearch
       variables:
         network.bind_host: 0.0.0.0
         network.publish_host: elasticsearch
-        ES_JAVA_OPTS: "-Xmx1000m"
         discovery.type: single-node
         xpack.security.enabled: false
         ingest.geoip.downloader.enabled: false
         cluster.routing.allocation.disk.watermark.flood_stage: "100%"
+        ES_LOG_STYLE: file
     - name: registry.gitlab.com/tozd/docker/postgresql:18
       alias: postgres
       variables:
-        LOG_TO_STDOUT: 1
         PGSQL_ROLE_1_USERNAME: test
         PGSQL_ROLE_1_PASSWORD: test
         PGSQL_DB_1_NAME: test
@@ -68,10 +70,16 @@ test_go:
     - (cd /go; go install gotest.tools/gotestsum@v1.13.0)
     - (cd /go; go install github.com/boumenot/gocover-cobertura@v1.4.0)
     - go version
+    - wget --spider "http://elasticsearch:9200/_cluster/health?wait_for_status=yellow&timeout=120s"

   script:
     - make test-ci

+  after_script:
+    # TODO: This does not really work. We should move services inside a DinD and add docker binary here.
+    - docker cp elasticsearch:/usr/share/elasticsearch/logs elasticsearch-logs
+    - docker cp postgres:/var/log/postgresql postgresql-logs
+
   artifacts:
     when: always
     reports:
@@ -83,6 +91,8 @@ test_go:
       - tests.xml
       - coverage.html
       - coverage.xml
+      - elasticsearch-logs
+      - postgresql-logs
     expire_in: never

   coverage: '/coverage: \d+\.\d+% of statements/'
@@ -119,6 +129,9 @@ lint_go:

   image: golang:1.25-alpine3.22

+  tags:
+    - saas-linux-2xlarge-amd64
+
   before_script:
     - apk --update add make bash gcc musl-dev git
     - wget -O- -nv https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(go env GOPATH)/bin v2.5.0
@@ -273,6 +286,9 @@ docker:

   image: docker:28-cli

+  tags:
+    - saas-linux-2xlarge-amd64
+
   services:
     - name: docker:28-dind
       command:
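
The `wget --spider` line added to `before_script` in `test_go` blocks until Elasticsearch reports at least yellow cluster health, so tests do not start against a cluster that is still booting. A minimal Go sketch of the same wait follows, in case the check were done in-process instead. The names `healthURL` and `waitForYellow` are hypothetical and do not come from this commit; only the endpoint and its `wait_for_status` and `timeout` parameters are taken from the diff. Any non-200 answer is treated as "not ready".

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"time"
)

// healthURL builds the cluster-health request that the CI job issues with wget.
// Hypothetical helper; only the endpoint and its parameters come from the diff.
func healthURL(base string, timeoutSeconds int) string {
	q := url.Values{}
	q.Set("wait_for_status", "yellow")
	q.Set("timeout", fmt.Sprintf("%ds", timeoutSeconds))
	// Encode sorts parameters by key, so "timeout" precedes "wait_for_status".
	return base + "/_cluster/health?" + q.Encode()
}

// waitForYellow (hypothetical) sends one health request and returns nil only
// when the server answers 200 OK.
func waitForYellow(ctx context.Context, client *http.Client, base string, timeoutSeconds int) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, healthURL(base, timeoutSeconds), nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("cluster not ready: %s", resp.Status)
	}
	return nil
}

func main() {
	// A local stub stands in for Elasticsearch so the sketch runs without a cluster.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	err := waitForYellow(ctx, srv.Client(), srv.URL, 120)
	fmt.Println("ready:", err == nil)
}
```

The client-side context deadline should be longer than the server-side `timeout` parameter when used against a real cluster, otherwise the request is cancelled before Elasticsearch can answer.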

.golangci.yml

Lines changed: 3 additions & 0 deletions
@@ -20,6 +20,9 @@ linters:
       check-type-assertions: true
     exhaustruct:
       allow-empty: true
+    goconst:
+      ignore-string-values:
+        - "none"
     gocritic:
       disabled-checks:
         - ifElseChain

CLAUDE.md

Lines changed: 28 additions & 19 deletions
@@ -13,7 +13,8 @@ Key features:

 - **Versioned document store** with full change history
 - **Real-time collaboration** with conflict detection
-- **Claims-based document schema** supporting 11+ claim types (identifiers, text, relations, amounts, time, files, etc.)
+- **Claims-based document schema** supporting 12 claim types (identifiers, strings, HTML, amounts, time, links,
+  references, etc.)
 - **Adaptive search UI** that automatically adjusts to data and provides relevant filters
 - **Multi-site support** with separate schemas and indices per site

@@ -162,56 +163,64 @@ Manages append-only logs for real-time collaboration sessions.
 - Pagination (max 5000 operations per page)
 - PostgreSQL-backed with JSONB columns

-**PeerDB usage**: Tracks document editing sessions with conflict detection.
+**PeerDB usage**: Tracks document edit sessions with conflict detection.

 #### 3. **Document Schema** (`document/`)

-Claims-based document system supporting 11 claim types:
+Claims-based document system supporting 12 claim types:

 - **IdentifierClaim**: External IDs (e.g., Wikidata Q-IDs)
-- **TextClaim**: Rich text with HTML in multiple languages
 - **StringClaim**: Plain text strings
-- **RelationClaim**: Relationships to other documents
-- **AmountClaim/AmountRangeClaim**: Numeric values with units
-- **TimeClaim/TimeRangeClaim**: Timestamps and time ranges
-- **FileClaim**: File references
-- **ReferenceClaim**: URL references
-- **NoValueClaim/UnknownValueClaim**: Missing data markers
+- **HTMLClaim**: Rich text with HTML
+- **AmountClaim**: Numeric values with precision
+- **AmountIntervalClaim**: Numeric intervals with bounds
+- **TimeClaim**: Timestamps with precision
+- **TimeIntervalClaim**: Time intervals with bounds
+- **LinkClaim**: URL/IRI links
+- **ReferenceClaim**: Relationships to other documents
+- **HasClaim**: Property-only claim (can hold nested claims via sub-claims)
+- **NoneClaim**: Explicitly states no value exists
+- **UnknownClaim**: Value exists but is unknown

 **Core structure**:

 ```go
 type D struct {
-	CoreDocument // ID (22-char identifier), Score, Scores
-	Mnemonic // Human-readable identifier
-	Claims // ClaimTypes (collections of claims)
+	CoreDocument // ID (22-char identifier), Base (base for computing ID)
+	Claims // *ClaimTypes (collections of claims)
+}
+
+type CoreClaim struct {
+	ID identifier.Identifier
+	Confidence Confidence // float64 in [-1, 1]
+	Sub *ClaimTypes // optional sub-claims
 }
 ```

 **Important patterns**:

 - Use the Visitor pattern to traverse/manipulate claims
-- Claims reference properties (also documents) via `prop.id`
-- Built-in classes and properties defined in `core/` (NAME, etc.)
+- Claims reference properties (also documents) via `Prop`
+- Each claim embeds `CoreClaim` with ID, Confidence, and optional Sub-claims
+- Built-in classes and properties defined in `core/` (NAME, DESCRIPTION, etc.)

 #### 4. **Search** (`search/search.go`)

 Elasticsearch query builder with session-based filtering.

 **Filter types**:

-- `RelFilter`: Filter by relation claims
+- `RefFilter`: Filter by reference claims
 - `AmountFilter`: Filter by numeric ranges
 - `TimeFilter`: Filter by time ranges
-- `StringFilter`: Text search on string/text claims

 **Limits**: Max 1000 search results per query

 #### 5. **Storage** (`storage/storage.go`)

 Chunked file upload management with begin/append/end lifecycle.

-#### 6. **ES Bridge** (`internal/es/bridge.go`)
+#### 6. **ES Bridge** (`internal/search/bridge.go`)

 Listens to Store changesets and synchronizes to Elasticsearch using bulk indexing.

@@ -293,6 +302,6 @@ Single PeerDB instance can serve multiple sites:

 ### ElasticSearch Index

-- Index configuration generated in internal/mapping package
+- Index configuration generated in internal/search package
 - Auto-created on first run if missing
 - Run `./peerdb populate` to initialize with core PeerDB properties
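
The updated CLAUDE.md describes each claim as embedding `CoreClaim`, with a `Confidence` in [-1, 1] and optional nested sub-claims that are traversed with a visitor. The sketch below illustrates that shape with simplified stand-in types; it is not PeerDB's actual API. Here `ID` is a plain string rather than `identifier.Identifier`, `Sub` is a slice rather than `*ClaimTypes`, and `Validate` and `Walk` are hypothetical helpers. `Walk` is a plain callback-based traversal standing in for the Visitor pattern the document mentions.

```go
package main

import (
	"errors"
	"fmt"
)

// Confidence mirrors the documented range: a float64 in [-1, 1].
type Confidence float64

var errConfidenceRange = errors.New("confidence outside [-1, 1]")

// Validate is a hypothetical helper. The negated form also rejects NaN.
func (c Confidence) Validate() error {
	if !(c >= -1 && c <= 1) {
		return errConfidenceRange
	}
	return nil
}

// CoreClaim is a simplified stand-in for the documented struct.
type CoreClaim struct {
	ID         string
	Confidence Confidence
	Sub        []*CoreClaim // assumption: a slice instead of *ClaimTypes
}

// Walk calls visit on the claim and then on every nested sub-claim,
// depth-first, stopping at the first error.
func Walk(c *CoreClaim, visit func(*CoreClaim) error) error {
	if c == nil {
		return nil
	}
	if err := visit(c); err != nil {
		return err
	}
	for _, sub := range c.Sub {
		if err := Walk(sub, visit); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	root := &CoreClaim{
		ID:         "a",
		Confidence: 1,
		Sub: []*CoreClaim{
			{ID: "b", Confidence: -0.5},
			{ID: "c", Confidence: 1.5}, // out of range on purpose
		},
	}
	err := Walk(root, func(c *CoreClaim) error {
		if err := c.Confidence.Validate(); err != nil {
			return fmt.Errorf("claim %s: %w", c.ID, err)
		}
		return nil
	})
	fmt.Println(err)
}
```

Because sub-claims are themselves claims, a single recursive traversal covers arbitrarily deep nesting, which is the main reason a visitor-style API fits this schema.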

Makefile

Lines changed: 4 additions & 1 deletion
@@ -39,8 +39,11 @@ package-lock.json: package.json
 test:
 	gotestsum --format pkgname --packages ./... -- -race -timeout 10m -cover -covermode atomic

+# We use -p 1 to run only one package test and thus test process at a time. This allows us to control the number of
+# connections made to a PostgreSQL service to not hit the limit of connections to it. If multiple test processes run
+# in parallel, then our current logic where we track connections and pools inside one process is not enough.
 test-ci:
-	gotestsum --format pkgname --packages ./... --junitfile tests.xml -- -race -timeout 10m -coverprofile=coverage.txt -covermode atomic
+	gotestsum --format pkgname --packages ./... --junitfile tests.xml -- -p 1 -race -timeout 10m -coverprofile=coverage.txt -covermode atomic
 	gocover-cobertura < coverage.txt > coverage.xml
 	go tool cover -html=coverage.txt -o coverage.html
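
The new Makefile comment says connection tracking lives inside a single test process, so several package test binaries running in parallel could together exceed the PostgreSQL connection limit. The sketch below illustrates that reasoning with a process-local limiter. `connLimiter` and its methods are hypothetical and are not the tracking logic used in the repository; two limiter values stand in for two test processes.

```go
package main

import (
	"errors"
	"fmt"
)

var errLimit = errors.New("connection limit reached")

// connLimiter bounds the connections opened by ONE process. It has no view
// of what other processes are doing.
type connLimiter struct {
	slots chan struct{}
}

func newConnLimiter(max int) *connLimiter {
	return &connLimiter{slots: make(chan struct{}, max)}
}

// tryAcquire reserves a slot without blocking.
func (l *connLimiter) tryAcquire() error {
	select {
	case l.slots <- struct{}{}:
		return nil
	default:
		return errLimit
	}
}

// release frees a slot; it is a no-op when nothing is held.
func (l *connLimiter) release() {
	select {
	case <-l.slots:
	default:
	}
}

// inUse reports how many slots are currently held.
func (l *connLimiter) inUse() int { return len(l.slots) }

func main() {
	const perProcess = 2

	// Each limiter models one test process with its own private counter.
	a, b := newConnLimiter(perProcess), newConnLimiter(perProcess)
	for i := 0; i < perProcess; i++ {
		_ = a.tryAcquire()
		_ = b.tryAcquire()
	}

	// Each process stays within its own bound, yet the server sees the sum.
	fmt.Println("process a:", a.inUse(), "process b:", b.inUse(), "server total:", a.inUse()+b.inUse())
}
```

Running one test process at a time with `-p 1` makes the per-process bound equal to the bound seen by the server, at the cost of losing package-level parallelism.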
