Commit 240f723

v1.4.0 release
alex-omophub committed
1 parent e5f1cec, commit 240f723

5 files changed

Lines changed: 200 additions & 4 deletions

DESCRIPTION

Lines changed: 4 additions & 3 deletions
@@ -1,14 +1,15 @@
 Package: omophub
 Title: R Client for the 'OMOPHub' Medical Vocabulary API
-Version: 1.3.0
+Version: 1.4.0
 Authors@R: c(
     person("Alex", "Chen", email = "alex@omophub.com", role = c("aut", "cre", "cph")),
     person("Observational Health Data Science and Informatics", role = c("cph"))
     )
 Description: Provides an R interface to the 'OMOPHub' API for accessing
     'OHDSI ATHENA' standardized medical vocabularies. Supports concept search,
-    vocabulary exploration, hierarchy navigation, relationship queries, and
-    concept mappings with automatic pagination and rate limiting.
+    semantic search using neural embeddings, concept similarity, vocabulary
+    exploration, hierarchy navigation, relationship queries, and concept
+    mappings with automatic pagination and rate limiting.
 License: MIT + file LICENSE
 URL: https://github.com/omopHub/omophub-R,
     https://docs.omophub.com,

NEWS.md

Lines changed: 16 additions & 0 deletions
@@ -1,3 +1,19 @@
+# omophub 1.4.0
+
+## New Features
+
+* **Semantic search** (`semantic()`, `semantic_all()`): Natural language concept
+  search using neural embeddings. Search for clinical intent like "high blood
+  sugar levels" to find diabetes-related concepts. Supports filtering by
+  vocabulary, domain, standard concept, concept class, and minimum similarity
+  threshold. `semantic_all()` provides automatic pagination with progress bar.
+
+* **Similarity search** (`similar()`): Find concepts similar to a reference
+  concept ID, concept name, or natural language query. Three algorithm options:
+  `'semantic'` (neural embeddings), `'lexical'` (string matching), and
+  `'hybrid'` (combined). Configurable similarity threshold with optional
+  detailed scores and explanations.
+
 # omophub 1.3.0
 
 ## New Features
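
The three `similar()` algorithm options named in the 1.4.0 NEWS entry can be pictured with a toy scorer. This is a minimal sketch, not OMOPHub's implementation, which this commit does not show. The helpers `lexical_score()`, `semantic_score()`, and `hybrid_score()`, the 0.5 weight, and the three-dimensional vectors are all hypothetical stand-ins for real string matching and neural embeddings.

```r
# Hypothetical illustration only: none of these helpers are part of omophub.

# Lexical similarity: 1 minus the normalised Levenshtein distance (utils::adist).
lexical_score <- function(a, b) {
  a <- tolower(a)
  b <- tolower(b)
  d <- as.numeric(utils::adist(a, b))
  1 - d / max(nchar(a), nchar(b), 1)
}

# Semantic similarity: cosine similarity between two embedding vectors.
semantic_score <- function(u, v) {
  sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))
}

# Hybrid: weighted mean of the two scores (the weight is an assumption).
hybrid_score <- function(lexical, semantic, weight = 0.5) {
  weight * lexical + (1 - weight) * semantic
}

# Toy embeddings, invented for the example.
emb_query <- c(0.9, 0.1, 0.3)
emb_candidate <- c(0.8, 0.2, 0.4)

lex <- lexical_score("high blood pressure", "Hypertension")
sem <- semantic_score(emb_query, emb_candidate)
hyb <- hybrid_score(lex, sem)
cat(sprintf("lexical: %.2f, semantic: %.2f, hybrid: %.2f\n", lex, sem, hyb))
```

The pair "high blood pressure" and "Hypertension" shares few characters, so a purely lexical score stays low even though the terms are clinically related. That gap is the usual motivation for offering an embedding-based or combined option.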

README.md

Lines changed: 7 additions & 1 deletion
@@ -103,6 +103,12 @@ results <- client$search$semantic(
 
 # Fetch all results with auto-pagination
 all_results <- client$search$semantic_all("chronic kidney disease", page_size = 50)
+
+# Find concepts similar to a reference concept
+similar <- client$search$similar(concept_id = 201826, algorithm = "hybrid")
+for (s in similar$similar_concepts) {
+  cat(sprintf("%s (score: %.2f)\n", s$concept_name, s$similarity_score))
+}
 ```
 
 ## Use Cases
@@ -192,7 +198,7 @@ concepts_df %>%
 | Resource | Description | Key Methods |
 |----------|-------------|-------------|
 | `concepts` | Concept lookup and batch operations | `get()`, `get_by_code()`, `batch()`, `suggest()` |
-| `search` | Full-text and semantic search | `basic()`, `advanced()`, `semantic()`, `semantic_all()`, `basic_all()` |
+| `search` | Full-text and semantic search | `basic()`, `advanced()`, `semantic()`, `semantic_all()`, `similar()`, `basic_all()` |
 | `hierarchy` | Navigate concept relationships | `ancestors()`, `descendants()` |
 | `mappings` | Cross-vocabulary mappings | `get()`, `map()` |
 | `vocabularies` | Vocabulary metadata | `list()`, `get()`, `stats()` |

inst/examples/search_concepts.R

Lines changed: 102 additions & 0 deletions
@@ -174,6 +174,108 @@ for (c in concepts) {
 }
 cat("\n")
 
+# ============================================================================
+# Semantic Search
+# ============================================================================
+
+cat("7. Semantic search (natural language)\n")
+cat("-------------------------------------\n")
+
+# Search using natural language - understands clinical intent
+results <- client$search$semantic("high blood sugar levels", page_size = 5)
+
+cat("Semantic results for 'high blood sugar levels':\n")
+for (r in results$data$results) {
+  cat(sprintf(" [%d] %s (similarity: %.2f)\n",
+              r$concept_id, r$concept_name, r$similarity_score))
+}
+cat("\n")
+
+# ============================================================================
+# Semantic Search with Filters
+# ============================================================================
+
+cat("8. Semantic search with filters\n")
+cat("-------------------------------\n")
+
+results <- client$search$semantic(
+  "heart attack",
+  vocabulary_ids = "SNOMED",
+  domain_ids = "Condition",
+  threshold = 0.5,
+  page_size = 5
+)
+
+cat("Filtered semantic results for 'heart attack':\n")
+for (r in results$data$results) {
+  cat(sprintf(" [%d] %s (similarity: %.2f)\n",
+              r$concept_id, r$concept_name, r$similarity_score))
+}
+cat("\n")
+
+# ============================================================================
+# Auto-Paginated Semantic Search
+# ============================================================================
+
+cat("9. Auto-paginated semantic search\n")
+cat("----------------------------------\n")
+
+all_results <- client$search$semantic_all(
+  "chronic kidney disease",
+  page_size = 10,
+  max_pages = 3,
+  progress = FALSE
+)
+
+cat(sprintf("Fetched %d concepts for 'chronic kidney disease':\n", nrow(all_results)))
+if (nrow(all_results) > 0) {
+  for (i in seq_len(min(5, nrow(all_results)))) {
+    cat(sprintf(" [%d] %s\n",
+                all_results$concept_id[i],
+                all_results$concept_name[i]))
+  }
+  if (nrow(all_results) > 5) {
+    cat(sprintf(" ... and %d more\n", nrow(all_results) - 5))
+  }
+}
+cat("\n")
+
+# ============================================================================
+# Similarity Search
+# ============================================================================
+
+cat("10. Similarity search\n")
+cat("---------------------\n")
+
+# Find concepts similar to Type 2 diabetes mellitus
+similar <- client$search$similar(
+  concept_id = 201826,
+  algorithm = "hybrid",
+  similarity_threshold = 0.6,
+  page_size = 5
+)
+
+cat("Concepts similar to 'Type 2 diabetes mellitus':\n")
+for (s in similar$similar_concepts) {
+  cat(sprintf(" [%d] %s (score: %.2f)\n",
+              s$concept_id, s$concept_name, s$similarity_score))
+}
+cat("\n")
+
+# Similarity by natural language query
+similar <- client$search$similar(
+  query = "high blood pressure",
+  algorithm = "semantic",
+  page_size = 5
+)
+
+cat("Concepts similar to 'high blood pressure' (semantic):\n")
+for (s in similar$similar_concepts) {
+  cat(sprintf(" [%d] %s (score: %.2f)\n",
+              s$concept_id, s$concept_name, s$similarity_score))
+}
+cat("\n")
+
 # ============================================================================
 # Done
 # ============================================================================
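
The example script prints each similarity record inside a `for` loop. When the results feed further analysis, a flat data frame is often easier to filter and sort. The sketch below assumes each record is a list with `concept_id`, `concept_name`, and `similarity_score`, the fields the script reads. `concepts_to_df()` is a hypothetical helper, not part of omophub, and the mock records are placeholders that replace a live API call.

```r
# Hypothetical helper, not part of omophub: flatten a list of concept records
# (each a list with concept_id, concept_name, similarity_score) to a data frame.
concepts_to_df <- function(records) {
  if (length(records) == 0) {
    return(data.frame(
      concept_id = integer(0),
      concept_name = character(0),
      similarity_score = numeric(0),
      stringsAsFactors = FALSE
    ))
  }
  data.frame(
    concept_id = vapply(records, function(r) as.integer(r$concept_id), integer(1)),
    concept_name = vapply(records, function(r) as.character(r$concept_name), character(1)),
    similarity_score = vapply(records, function(r) as.numeric(r$similarity_score), numeric(1)),
    stringsAsFactors = FALSE
  )
}

# Mock response with placeholder values; the shape mirrors the fields the
# example script reads, but the records themselves are invented.
mock_similar <- list(similar_concepts = list(
  list(concept_id = 1L, concept_name = "Example concept A", similarity_score = 0.91),
  list(concept_id = 2L, concept_name = "Example concept B", similarity_score = 0.74),
  list(concept_id = 3L, concept_name = "Example concept C", similarity_score = 0.58)
))

df <- concepts_to_df(mock_similar$similar_concepts)

# Keep records at or above a client-side threshold, highest score first.
strong <- df[df$similarity_score >= 0.6, ]
print(strong[order(-strong$similarity_score), ])
```

Using `vapply()` with an explicit template makes the helper fail loudly if a record lacks a field, rather than silently producing a ragged column.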

vignettes/getting-started.Rmd

Lines changed: 71 additions & 0 deletions
@@ -117,6 +117,77 @@ results <- client$search$basic(
 )
 ```
 
+## Semantic Search
+
+Search using natural language queries powered by neural embeddings:
+
+```{r semantic-search}
+# Natural language search - understands clinical intent
+results <- client$search$semantic("high blood sugar levels")
+for (r in results$data$results) {
+  cat(sprintf("%s (similarity: %.2f)\n", r$concept_name, r$similarity_score))
+}
+```
+
+Filter semantic search results:
+
+```{r semantic-filtered}
+results <- client$search$semantic(
+  "heart attack",
+  vocabulary_ids = "SNOMED",
+  domain_ids = "Condition",
+  threshold = 0.5
+)
+```
+
+Fetch all semantic search results with automatic pagination:
+
+```{r semantic-all}
+all_results <- client$search$semantic_all(
+  "chronic kidney disease",
+  page_size = 50,
+  max_pages = 5,
+  progress = TRUE
+)
+print(nrow(all_results))
+```
+
+## Similarity Search
+
+Find concepts similar to a reference concept:
+
+```{r similar-by-id}
+# Find concepts similar to Type 2 diabetes mellitus
+similar <- client$search$similar(concept_id = 201826)
+for (s in similar$similar_concepts) {
+  cat(sprintf("%s (score: %.2f)\n", s$concept_name, s$similarity_score))
+}
+```
+
+Search by natural language query with different algorithms:
+
+```{r similar-by-query}
+# Semantic similarity (neural embeddings)
+similar <- client$search$similar(
+  query = "high blood pressure",
+  algorithm = "semantic"
+)
+
+# Lexical similarity (string matching)
+similar <- client$search$similar(
+  query = "high blood pressure",
+  algorithm = "lexical"
+)
+
+# Hybrid (combined - default)
+similar <- client$search$similar(
+  query = "high blood pressure",
+  algorithm = "hybrid",
+  include_scores = TRUE,
+  include_explanations = TRUE
+)
+```
+
 ## Autocomplete
 
 Get suggestions for autocomplete:
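
The vignette calls `semantic_all()` with `page_size` and `max_pages`, but the commit does not show how the pagination loop works. The following is a rough sketch of what such a loop might look like, under stated assumptions: `fetch_all_pages()` and the `fetch_page(page, page_size)` callback are hypothetical names, and the mock backend with 23 placeholder rows replaces a live API call.

```r
# Hypothetical sketch of an auto-pagination loop; not omophub's internals.
# fetch_page(page, page_size) is an assumed callback returning a data frame
# with zero rows once results run out.
fetch_all_pages <- function(fetch_page, page_size = 50, max_pages = Inf) {
  pages <- list()
  page <- 1
  while (page <= max_pages) {
    chunk <- fetch_page(page, page_size)
    if (nrow(chunk) == 0) break
    pages[[length(pages) + 1]] <- chunk
    if (nrow(chunk) < page_size) break  # short page: nothing left to fetch
    page <- page + 1
  }
  if (length(pages) == 0) return(data.frame())
  do.call(rbind, pages)
}

# Mock backend holding 23 placeholder rows, used in place of a live API call.
mock_rows <- data.frame(
  concept_id = seq_len(23),
  concept_name = paste("Example concept", seq_len(23)),
  stringsAsFactors = FALSE
)
mock_fetch <- function(page, page_size) {
  start <- (page - 1) * page_size + 1
  if (start > nrow(mock_rows)) return(mock_rows[0, ])
  mock_rows[start:min(start + page_size - 1, nrow(mock_rows)), ]
}

all_rows <- fetch_all_pages(mock_fetch, page_size = 10)
capped <- fetch_all_pages(mock_fetch, page_size = 10, max_pages = 2)
cat(sprintf("all: %d rows, capped: %d rows\n", nrow(all_rows), nrow(capped)))
# prints "all: 23 rows, capped: 20 rows"
```

Stopping on a short page saves one request in the common case. The zero-row check covers the case where the total is an exact multiple of `page_size`.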
