Commit bde5c4c

Text changes; `demo` replaced by `ann_to_process`; warning in the last paragraph removed.
1 parent d3b22c6 commit bde5c4c

1 file changed

Lines changed: 8 additions & 17 deletions

notebooks/collections_demos/bonemarrowwsi_pediatricleukemia.ipynb

@@ -30,7 +30,7 @@
3030
"This notebook introduces the `BoneMarrowWSI-PediatricLeukemia` collection, which is presented in [this preprint](https://www.arxiv.org/pdf/2509.15895) and was recently added to [Imaging Data Commons](https://portal.imaging.datacommons.cancer.gov/).\n",
3131
"\n",
3232
"- **Images**: The `BoneMarrowWSI-PediatricLeukemia` dataset comprises bone marrow aspirate smear WSIs for 246 pediatric cases of leukemia, including acute lymphoid leukemia (ALL), acute myeloid leukemia (AML), and chronic myeloid leukemia (CML). The smears were prepared for the initial diagnosis (i.e., without prior treatment), stained in accordance with the Pappenheim method, and scanned at 40x magnification.\n",
33-
"- **Annotations**: The images have been annotated with rectangular regions of interest (ROI) of the evaluable monolayer area and a total of 45176 cell bounding box annotations have been placed (with few exceptions) within the ROIs. For a subset of 232 ROIs all cells and other haematological structures have been labelled by multiple experts in a consensus labeling approach with 49 distinct (cell type) classes. The consensus labelling approach worked as follows: each bounding box was successively labelled by different experts in so-called \"annotation sessions\" until (a) the bounding box has been labelled by at least two experts, and (b) the most frequent label constitues at least half of all labels given to that bounding box (and is then termed \"consensus class\"). In summary, the following annotations are available: \n",
33+
"- **Annotations**: The images have been annotated with rectangular regions of interest (ROIs) covering the evaluable monolayer area, and more than 40,000 cell bounding box annotations have been placed (with few exceptions) within the ROIs. For a subset of these ROIs, all cells and other haematological structures have additionally been labelled by multiple experts in a consensus labelling approach with 49 distinct (cell type) classes. The consensus labelling approach worked as follows: each bounding box was successively labelled by different experts in so-called \"annotation sessions\" until (a) the bounding box had been labelled by at least two experts and (b) the most frequent label accounted for at least half of all labels given to that bounding box. This most frequent label was then termed the \"consensus class\". In summary, the following annotations are available: \n",
3434
"\n",
3535
" - For each slide: ROI annotations of the monolayer area\n",
3636
" - For some slides: Unlabeled cell bounding boxes\n",
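The stopping rule of the consensus labelling approach described in the **Annotations** bullet can be sketched in Python. This is a minimal illustration, not the dataset authors' implementation: `consensus_label` and `sessions_until_consensus` are hypothetical names, the input is assumed to be the list of labels one bounding box received across successive annotation sessions, and treating a tie for the most frequent label as "no consensus yet" is an assumption the text does not settle.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(labels: Sequence[str]) -> Optional[str]:
    """Return the consensus class of one bounding box, or None if not (yet) reached.

    Hypothetical sketch of the stopping rule: (a) at least two experts have
    labelled the box, and (b) the most frequent label accounts for at least
    half of all labels given to it.
    """
    if len(labels) < 2:
        return None  # condition (a) not met
    ranked = Counter(labels).most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # assumption: a tie for the most frequent label is no consensus
    if 2 * top_count >= len(labels):  # condition (b) in integer arithmetic
        return top_label
    return None


def sessions_until_consensus(session_labels: Sequence[str]) -> Optional[int]:
    """Number of successive annotation sessions after which the rule is first met."""
    for n in range(2, len(session_labels) + 1):
        if consensus_label(session_labels[:n]) is not None:
            return n
    return None
```

For example, a box labelled `A`, `B`, `A` in three sessions reaches the consensus class `A` after the third session, because two of three labels agree.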
@@ -103,7 +103,7 @@
103103
},
104104
"source": [
105105
"## Finding the `BoneMarrowWSI-PediatricLeukemia` dataset on IDC\n",
106-
"To access and download image and ANNs files, we utilize the Python package [idc-index](https://github.com/ImagingDataCommons/idc-index)."
106+
"To access and download image and ANN files, we use the Python package [idc-index](https://github.com/ImagingDataCommons/idc-index) and fetch the `ann_index`, which is specific to DICOM ANN objects."
107107
]
108108
},
109109
{
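A series-level overview of the ANN objects in the collection could be obtained with a query like the one below. This is a sketch under assumptions: the collection ID `bonemarrowwsi_pediatricleukemia`, the helper names, and the join on `SeriesInstanceUID` are not taken from the notebook, and only the `index` columns `collection_id`, `Modality`, `SeriesInstanceUID` and `SeriesDescription` are relied on. It assumes an idc-index version whose `IDCClient` provides `fetch_index("ann_index")` and `sql_query`; running it against IDC requires the package and network access.

```python
import re

# Assumption: IDC collection_id of BoneMarrowWSI-PediatricLeukemia
COLLECTION_ID = "bonemarrowwsi_pediatricleukemia"


def build_ann_series_query(collection_id: str = COLLECTION_ID) -> str:
    """SQL (DuckDB, as used by idc-index) counting ANN series per SeriesDescription."""
    # collection_id is interpolated into the SQL string, so accept IDC-style ids only
    if not re.fullmatch(r"[a-z0-9_]+", collection_id):
        raise ValueError(f"unexpected collection_id: {collection_id!r}")
    return f"""
    SELECT i.SeriesDescription,
           COUNT(DISTINCT i.SeriesInstanceUID) AS num_ann_series
    FROM index AS i
    JOIN ann_index AS a ON i.SeriesInstanceUID = a.SeriesInstanceUID
    WHERE i.collection_id = '{collection_id}' AND i.Modality = 'ANN'
    GROUP BY i.SeriesDescription
    ORDER BY num_ann_series DESC
    """


def summarize_ann_series(client, collection_id: str = COLLECTION_ID):
    """Run the query with an idc-index IDCClient (expected to return a DataFrame)."""
    client.fetch_index("ann_index")  # makes the `ann_index` table queryable
    return client.sql_query(build_ann_series_query(collection_id))


# Usage (needs `pip install idc-index` and network access):
#   from idc_index import IDCClient
#   overview = summarize_ann_series(IDCClient())
```

The `Modality = 'ANN'` filter alone would already select the annotation series; the join is kept to show how `ann_index` can be combined with the main index.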
@@ -164,7 +164,7 @@
164164
"id": "1-ZR8MkhFqKX"
165165
},
166166
"source": [
167-
"Next, let's have a look on the available annotation (ANN) files. The following query collects information about ANN files on series-level from idc-index's `ann_index`."
167+
"Next, we have a look at the available annotation (ANN) files. The following query collects series-level information about ANN files from idc-index's `ann_index`."
168168
]
169169
},
170170
{
@@ -203,7 +203,7 @@
203203
"* Each slide has \"Monolayer regions of interest for cell classification\" annotations.\n",
204204
"* For some slides, there is one ANN Series with \"Unlabeled cell bounding boxes\", while for others, there are multiple ANN Series containing \"Cell bounding boxes with cell type labels\" for different annotation sessions and the consensus labels.\n",
205205
"\n",
206-
"We will use this knowledge later in this notebook to facilitate filtering directly for labeled or unlabeled cell annotations.\n",
206+
"Later in this notebook, we will use the **SeriesDescription** to filter directly for labeled or unlabeled cell annotations.\n",
207207
"\n"
208208
]
209209
},
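Filtering on **SeriesDescription** as described above might look like the following sketch. The three description strings are copied from the bullets; whether labeled series carry additional text (for example, a session identifier) is not shown here, so a case-insensitive substring match is used as an assumption. `ann_subset` and `filter_series` are hypothetical helpers.

```python
from typing import Iterable, List, Optional

# Assumption: SeriesDescription values as quoted in the notebook text. Labeled
# series may add session/consensus details, so matching is by substring.
SUBSET_BY_DESCRIPTION = {
    "monolayer regions of interest for cell classification": "roi",
    "unlabeled cell bounding boxes": "unlabeled",
    "cell bounding boxes with cell type labels": "labeled",
}


def ann_subset(series_description: str) -> Optional[str]:
    """Map an ANN SeriesDescription to 'roi', 'unlabeled', 'labeled' or None."""
    desc = series_description.lower()
    for key, subset in SUBSET_BY_DESCRIPTION.items():
        if key in desc:
            return subset
    return None


def filter_series(descriptions: Iterable[str], subset: str) -> List[str]:
    """Keep only the SeriesDescriptions belonging to `subset`."""
    return [d for d in descriptions if ann_subset(d) == subset]
```

On a pandas DataFrame with a `SeriesDescription` column, `df[df['SeriesDescription'].map(ann_subset) == 'labeled']` would apply the same rule.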
@@ -345,7 +345,7 @@
345345
"- **'roi_id'**: the ID of the ROI\n",
346346
"- **'roi_label'**: its label \n",
347347
"- **'roi_coordinates'**: the 2D coordinates in the image coordinate system of the referenced slide level\n",
348-
"- **'reference_SeriesInstanceUID'** and **'reference_SOPInstanceUID'**: the SeriesInstanceUID and SOPInstanceUID of the slide level the annotations refer to.\n"
348+
"- **'reference_SeriesInstanceUID'** and **'reference_SOPInstanceUID'**: the SeriesInstanceUID and SOPInstanceUID of the slide level the annotations refer to. `reference_SeriesInstanceUID` could also be obtained from `ann_index`; for consistency with `reference_SOPInstanceUID`, it is read directly from the ANN file here.\n"
349349
]
350350
},
351351
{
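Reading `reference_SeriesInstanceUID` and `reference_SOPInstanceUID` directly from an ANN file, as described in the last bullet, could look like this sketch. It assumes the references are stored in the DICOM Common Instance Reference attributes (`ReferencedSeriesSequence` containing `ReferencedInstanceSequence`); the helper name `referenced_uids` is hypothetical, and the notebook's own parsing code may take a different route. It works on a dataset loaded with `pydicom.dcmread`, or on any object exposing the same attributes.

```python
from typing import Any, List, Tuple


def referenced_uids(ann: Any) -> List[Tuple[str, str]]:
    """(reference_SeriesInstanceUID, reference_SOPInstanceUID) pairs of an ANN object.

    `ann` is e.g. pydicom.dcmread(path) for a hypothetical local ANN file.
    Assumption: the referenced slide level is listed under
    ReferencedSeriesSequence > ReferencedInstanceSequence.
    """
    pairs = []
    for series_item in getattr(ann, "ReferencedSeriesSequence", []):
        for instance_item in getattr(series_item, "ReferencedInstanceSequence", []):
            pairs.append((str(series_item.SeriesInstanceUID),
                          str(instance_item.ReferencedSOPInstanceUID)))
    return pairs
```

The notebook text describes a single referenced slide level per ANN file, so the list is expected to hold one pair; returning a list keeps the helper usable if more are present.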
@@ -1109,7 +1109,7 @@
11091109
"- **'cell_label_code_scheme'**: Tuple of the cell label's code value and the designator of its coding scheme, e.g. (414387006, SCT), i.e. code 414387006 from the SNOMED CT ontology\n",
11101110
"- **'cell_label'**: Code meaning of the cell label defined in cell_label_code_scheme, e.g. 'Structure of haematological system'. See the [SNOMED Browser](https://browser.ihtsdotools.org/?perspective=full&conceptId1=414387006&edition=MAIN&release=&languages=en) entry for this example.\n",
11111111
"- **'cell_coordinates'**: the 2D coordinates in the image coordinate system of the referenced slide level\n",
1112-
"- **'reference_SeriesInstanceUID'** and **'reference_SOPInstanceUID'**: the SeriesInstanceUID and SOPInstanceUID of the slide level the annotations refer to."
1112+
"- **'reference_SeriesInstanceUID'** and **'reference_SOPInstanceUID'**: the SeriesInstanceUID and SOPInstanceUID of the slide level the annotations refer to. `reference_SeriesInstanceUID` could also be obtained from `ann_index`; for consistency with `reference_SOPInstanceUID`, it is read directly from the ANN file here."
11131113
]
11141114
},
11151115
{
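The `cell_label_code_scheme` tuple and `cell_label` could be assembled per annotation group as sketched below. This assumes each DICOM ANN annotation group carries its cell type in `AnnotationPropertyTypeCodeSequence` (code value, coding scheme designator, code meaning), as defined for Microscopy Bulk Simple Annotations; `group_labels` and `snomed_browser_url` are hypothetical helpers, the latter reusing the SNOMED Browser link pattern from the bullet.

```python
from typing import Any, List, Optional, Tuple


def group_labels(ann: Any) -> List[Tuple[Tuple[str, str], str]]:
    """((CodeValue, CodingSchemeDesignator), CodeMeaning) for each annotation group.

    Assumption: the cell label of a group is the first item of its
    AnnotationPropertyTypeCodeSequence (DICOM Microscopy Bulk Simple Annotations).
    """
    labels = []
    for group in getattr(ann, "AnnotationGroupSequence", []):
        code = group.AnnotationPropertyTypeCodeSequence[0]
        labels.append(((str(code.CodeValue), str(code.CodingSchemeDesignator)),
                       str(code.CodeMeaning)))
    return labels


def snomed_browser_url(code_value: str, scheme: str) -> Optional[str]:
    """SNOMED Browser link (pattern from the notebook) for 'SCT' codes, else None."""
    if scheme != "SCT":
        return None
    return ("https://browser.ihtsdotools.org/?perspective=full"
            f"&conceptId1={code_value}&edition=MAIN&release=&languages=en")
```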
@@ -23914,8 +23914,8 @@
2391423914
}
2391523915
],
2391623916
"source": [
23917-
"# This code may run for 2-3 minutes, if you remove the demo mode please be patient :)\n",
23918-
"labeled_cells = get_cell_annotations(subset='labeled', demo=True)\n",
23917+
"# This code will run longer as you increase the number of ANN files to be processed\n",
23918+
"labeled_cells = get_cell_annotations(subset='labeled', ann_to_process=10)\n",
2391923919
"sorted_cell_labels = labeled_cells.sort_values(by=['reference_SOPInstanceUID', 'cell_id', 'annotation_session'])\n",
2392023920
"display(sorted_cell_labels.style.hide(axis='index')) # don't show row index"
2392123921
]
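The commit replaces the boolean `demo` flag with an integer `ann_to_process`, so runtime scales with the number of ANN files parsed. The body of `get_cell_annotations` is not part of this diff; the sketch below only illustrates one way such a cap could pick a reproducible subset of ANN series. `select_ann_series` is a hypothetical helper, and treating `None` as "process all files" is an assumption.

```python
from typing import List, Optional, Sequence


def select_ann_series(series_uids: Sequence[str],
                      ann_to_process: Optional[int] = None) -> List[str]:
    """Hypothetical helper: pick at most `ann_to_process` ANN series to parse.

    None means "process everything". Sorting first makes the selection
    reproducible across runs, independent of the order a query returns.
    """
    ordered = sorted(set(series_uids))
    if ann_to_process is None:
        return ordered
    if ann_to_process < 0:
        raise ValueError("ann_to_process must be >= 0 or None")
    return ordered[:ann_to_process]
```

Capping by series can leave a slide with only some of its annotation sessions; if complete per-slide session sets matter, grouping the series by referenced slide before applying the cap would avoid that.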
@@ -23931,15 +23931,6 @@
2393123931
"In the cell below, we catch some of those cases:"
2393223932
]
2393323933
},
23934-
{
23935-
"cell_type": "markdown",
23936-
"metadata": {
23937-
"id": "GUYVO0gWfnfH"
23938-
},
23939-
"source": [
23940-
">[!CAUTION] In the current release, labels from the annotation sessions are flawed, please refrain from using them. However, consensus labels/classes are correct!"
23941-
]
23942-
},
2394323934
{
2394423935
"cell_type": "code",
2394523936
"execution_count": null,
