Tucan

This repository provides a pip-installable software package to run Tucan for methylation-based tumor classification in a clinical setting.

System requirements

Python >= 3.9
Dependencies and version constraints are defined in pyproject.toml: https://github.com/UMCUGenetics/tucan/blob/main/pyproject.toml
The software has been tested on Linux (Ubuntu 20.04) and macOS.
No non-standard hardware is required (CPU sufficient; GPU optional).

Installation

Installation time: ~5–10 minutes on a standard desktop computer.

Clone the repository:

git clone https://github.com/UMCUGenetics/tucan.git
cd tucan

Install the project in editable mode:

pip install -e .

Download the pretrained model from Hugging Face:

pip install -U huggingface_hub
python -c "from src.tucan.download_model import get_model; print(get_model())"

Change into the models directory and zip the model:

cd models
zip -r model.zip model

The model path returned by this command should be provided to the -m argument.
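
On systems without the zip command-line tool, the same archive can probably be produced from Python. The sketch below is a rough equivalent of `cd models` followed by `zip -r model.zip model`. The helper name `zip_model_dir` is hypothetical and not part of Tucan, and the sketch assumes the downloaded model sits in a folder called `model` inside `models`.

```python
import shutil
from pathlib import Path


def zip_model_dir(models_dir="models", model_name="model"):
    """Create <models_dir>/<model_name>.zip containing the <model_name>/ folder.

    Hypothetical helper, not part of Tucan; it mirrors
    `cd models && zip -r model.zip model`.
    """
    models_dir = Path(models_dir).resolve()
    if not (models_dir / model_name).is_dir():
        raise FileNotFoundError(f"{models_dir / model_name} is not a directory")
    # base_name is the archive path without the .zip suffix; root_dir and
    # base_dir keep the top-level "<model_name>/" folder inside the archive.
    archive = shutil.make_archive(
        str(models_dir / model_name),
        "zip",
        root_dir=str(models_dir),
        base_dir=model_name,
    )
    return Path(archive)
```

Calling `zip_model_dir()` from the repository root returns the path of the created zip file, which could then be passed to -m.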

Usage

Usage: tucan [-h] [-i INPUT_FILE] [-m MODEL] [-c NUM_CPGS] [-o OUTPUT_FILE] [-s NUM_SAMPLINGS]
                   [-f FILE_TYPE]

Options:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input_file INPUT_FILE
                        path to input file
  -m MODEL, --model MODEL
                        specify path to model zip you want to use.
  -c NUM_CPGS, --num_CpGs NUM_CPGS
                        specify the number of samples CpG sites (default is to use all available sites).
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        path to output file
  -s NUM_SAMPLINGS, --num_samplings NUM_SAMPLINGS
                        Specify the number of random samples of size num_CpGs. Default is 1 random
                        sampling.
  -f FILE_TYPE, --file_type FILE_TYPE
                        input file type 'bed' or 'csv'

Recommendation: For optimal performance, set -c between 10,000 and 20,000 CpG sites. Using too many CpG sites may cause the model to become overconfident and increase the likelihood of misclassification.
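
To make the interaction between -c and -s more concrete, the sketch below draws NUM_SAMPLINGS random subsets of NUM_CPGS sites from the sites available in a sample. It only illustrates the idea. The function name, the fixed seed, sampling without replacement, and the fallback to all sites when fewer than NUM_CPGS are available are assumptions, not a description of Tucan's actual code.

```python
import random


def draw_cpg_subsets(site_ids, num_cpgs=None, num_samplings=1, seed=0):
    """Return `num_samplings` random subsets of `site_ids`, each of size `num_cpgs`.

    Illustrative only: sampling without replacement and the fixed seed are
    assumptions, not a description of Tucan's internals.
    """
    site_ids = list(site_ids)
    # Mirror the documented default: without a -c value, use all available sites.
    # Falling back to all sites when num_cpgs exceeds the total is an assumption.
    if num_cpgs is None or num_cpgs >= len(site_ids):
        return [site_ids[:] for _ in range(num_samplings)]
    rng = random.Random(seed)
    return [rng.sample(site_ids, num_cpgs) for _ in range(num_samplings)]


# Hypothetical site identifiers, used only to exercise the function.
all_sites = [f"site_{i}" for i in range(50_000)]
subsets = draw_cpg_subsets(all_sites, num_cpgs=10_000, num_samplings=3)
print([len(s) for s in subsets])  # [10000, 10000, 10000]
```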

Preparing data into the right format: bed or csv

Tucan accepts input in either BED or CSV format, containing the methylation calls.
An example BED file snippet is available here: data/bedExample.bed.

Methylation calls can be extracted from a Nanopore BAM file using modkit.
Detailed instructions for this process are available in the Sturgeon repository.

Note: The Sturgeon implementation determines methylation status at CpG sites based on the Illumina methylation array coordinates, including a ±25 bp window around each CpG site.
This means methylation calls are aggregated not only at the exact array position but also within this surrounding margin.
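
The windowing idea can be sketched as follows. This is not the Sturgeon implementation: the function name, the input layout, the inclusive window boundaries, and the choice to aggregate as a methylated fraction are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def aggregate_calls(probes, calls, margin=25):
    """Compute the methylated fraction within +/- `margin` bp of each probe.

    probes: dict mapping probe_id -> (chrom, position)
    calls:  iterable of (chrom, position, is_methylated)
    Probes with no calls inside their window are left out of the result.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, meth in calls:
        by_chrom[chrom].append((pos, int(meth)))

    # Sort once per chromosome so each window becomes a binary search.
    sorted_calls = {}
    for chrom, items in by_chrom.items():
        items.sort()
        sorted_calls[chrom] = ([p for p, _ in items], [m for _, m in items])

    fractions = {}
    for probe_id, (chrom, center) in probes.items():
        positions, states = sorted_calls.get(chrom, ([], []))
        lo = bisect_left(positions, center - margin)
        hi = bisect_right(positions, center + margin)
        if hi > lo:
            fractions[probe_id] = sum(states[lo:hi]) / (hi - lo)
    return fractions


# Hypothetical probe and calls; the call at 1026 falls outside the window.
probes = {"probe_A": ("chr1", 1000)}
calls = [("chr1", 980, True), ("chr1", 1000, False),
         ("chr1", 1025, True), ("chr1", 1026, True)]
result = aggregate_calls(probes, calls)
```

In this example two of the three calls inside the window 975 to 1025 are methylated, so `result["probe_A"]` is 2/3.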

Demo

A small example dataset is provided in the repository under demo_tucan/Data, along with the corresponding expected output in demo_tucan/Output.

Run the demo:

tucan -i demo_tucan/Data/testSample1.merged_probes_methyl_calls.bed -m <path_to_model> -c 10000 -o demo_tucan/Output/OutcomeTestSample1.csv -s 1 -f csv

Expected runtime: ~1–5 minutes per sample on a standard CPU.

Running Tucan on your data

1. Extract methylation calls from Nanopore BAM files (e.g. using modkit).
2. Convert them to BED or CSV format.
3. Run Tucan using the command shown in the Demo section.
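
When processing several samples, the final step above could be scripted. The sketch below assembles a tucan command line using only the options listed under Usage. The helper name and the file names are hypothetical, and the default of 10,000 CpG sites simply follows the recommendation given earlier.

```python
import shlex
import subprocess


def build_tucan_command(input_file, model, output_file, file_type,
                        num_cpgs=10000, num_samplings=1):
    """Assemble a tucan command line from the options listed under Usage."""
    if file_type not in ("bed", "csv"):
        raise ValueError("file_type must be 'bed' or 'csv'")
    return [
        "tucan",
        "-i", str(input_file),
        "-m", str(model),
        "-c", str(num_cpgs),
        "-o", str(output_file),
        "-s", str(num_samplings),
        "-f", file_type,
    ]


# Hypothetical file names, shown only to illustrate the call.
cmd = build_tucan_command("sample1.bed", "models/model.zip",
                          "sample1_result.csv", "bed")
print(shlex.join(cmd))

# To actually run it (requires tucan to be installed and the files to exist):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.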

Reproducibility

The example dataset and output provided in demo_tucan can be used to verify correct installation and execution of the Tucan software. The full methodology is described in the manuscript Methods section.
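
One way to check a demo run against the expected output is a row-by-row comparison of the two CSV files. The sketch below is a minimal, assumed approach: the function names are hypothetical, and because -c and -s involve random sampling, an exact match may be too strict if the output contains sampled scores.

```python
import csv


def csv_rows(path):
    """Read a CSV file into a list of rows (each row a list of strings)."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


def outputs_match(produced_path, expected_path):
    """Return True if both CSV files contain identical rows.

    Exact comparison is an assumption; runs that use random sampling
    may differ slightly from the expected output.
    """
    return csv_rows(produced_path) == csv_rows(expected_path)
```

To use it, write the demo output to a separate file and pass that path together with the path of the expected file in demo_tucan/Output.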

Table: Abbreviation - Tumor Type

See the full table: docs/tumor_abbreviation.md
