Describe the bug
I'm trying to wrap my head around some impossible mappings, and I am unable to understand WTF is going on... In short, I'm trying to solve the problem where one given variant returns multiple mappings on the other build, caused by multiple transcripts mapping elsewhere. While trying to resolve this using what I thought was logical, I got stuck completely.
To Reproduce
This GRCh37-based duplication returns two GRCh38 mappings. Summarized output:
{
  "NC_000009.11:g.95237063_95237068dup": {
    "NC_000009.11:g.95237063_95237068dup": {
      "g_hgvs": "NC_000009.11:g.95237063_95237068dup",
      "hgvs_t_and_p": {
        "NM_001012267.3": {
          "gap_statement": null,
          "primary_assembly_loci": {
            "grch37": {
              "NC_000009.11": {
                "hgvs_genomic_description": "NC_000009.11:g.95237063_95237068dup"
              }
            },
            "grch38": {
              "NC_000009.12": {
                "hgvs_genomic_description": "NC_000009.12:g.92474781_92474786dup"
              }
            }
          },
          "select_status": {
            "mane_select": true
          },
          "t_hgvs": "NM_001012267.3:c.564+94922_564+94927dup"
        },
        "NM_017680.6": {
          "gap_statement": "NM_017680.6 contains 3 fewer bases between c.152_153 than NC_000009.11",
          "primary_assembly_loci": {
            "grch37": {
              "NC_000009.11": {
                "hgvs_genomic_description": "NC_000009.11:g.95237063_95237068dup"
              }
            },
            "grch38": {
              "NC_000009.12": {
                "hgvs_genomic_description": "NC_000009.12:g.92474784_92474786del"
              }
            }
          },
          "select_status": {
            "mane_select": true
          },
          "t_hgvs": "NM_017680.6:c.150_153-1dup"
        }
      }
    }
  }
}
My (faulty?) logic was that the deep intronic variant is probably wrong. Even though there is no gap_statement given, I assumed that because this is a deep intronic variant, VV had to kind of "guess" where in the intron the variant would end up. However, I don't understand the given mapping for NM_017680.6:c.150_153-1dup. Actually, that cDNA variant seems to make no sense at all. Does that duplicate the entire intron? The GRCh38 mapping returns NC_000009.12:g.92474784_92474786del (note: a genomic dup maps to a dup of the entire intron, but then to a del on the other build). When I submit that deletion, I don't get the original dup back, either.
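For reference, this is roughly how the per-transcript liftovers can be pulled out of the summarized output. It is only a minimal sketch, assuming the JSON layout shown above; `per_transcript` is a hypothetical helper name of my own, not part of VariantValidator, and the embedded JSON is a trimmed copy of the first output (only the fields used here).

```python
import json

# Trimmed copy of the first summarized output (assumption: same nesting as shown above).
SUMMARY = json.loads("""
{
  "NC_000009.11:g.95237063_95237068dup": {
    "NC_000009.11:g.95237063_95237068dup": {
      "hgvs_t_and_p": {
        "NM_001012267.3": {
          "gap_statement": null,
          "primary_assembly_loci": {
            "grch38": {"NC_000009.12": {
              "hgvs_genomic_description": "NC_000009.12:g.92474781_92474786dup"}}
          },
          "t_hgvs": "NM_001012267.3:c.564+94922_564+94927dup"
        },
        "NM_017680.6": {
          "gap_statement": "NM_017680.6 contains 3 fewer bases between c.152_153 than NC_000009.11",
          "primary_assembly_loci": {
            "grch38": {"NC_000009.12": {
              "hgvs_genomic_description": "NC_000009.12:g.92474784_92474786del"}}
          },
          "t_hgvs": "NM_017680.6:c.150_153-1dup"
        }
      }
    }
  }
}
""")


def per_transcript(summary, build):
    """Return (transcript, t_hgvs, genomic description on `build`, gap_statement) rows."""
    rows = []
    for by_variant in summary.values():
        for record in by_variant.values():
            for tx, info in record["hgvs_t_and_p"].items():
                for locus in info["primary_assembly_loci"][build].values():
                    rows.append((
                        tx,
                        info["t_hgvs"],
                        locus["hgvs_genomic_description"],
                        info["gap_statement"],
                    ))
    return rows


rows = per_transcript(SUMMARY, "grch38")
for row in rows:
    print(row)

# Transcripts whose alignment carries a gap_statement.
gapped = [tx for tx, _, _, gap in rows if gap is not None]
# Distinct GRCh38 descriptions returned for the single GRCh37 input.
distinct_grch38 = {g for _, _, g, _ in rows}
```

Laid out this way, the single GRCh37 input yields two distinct GRCh38 descriptions, and only the NM_017680.6 row carries a gap_statement.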
See the summarized output for NC_000009.12:g.92474784_92474786del:
{
  "NC_000009.12:g.92474784_92474786del": {
    "NC_000009.12:g.92474784_92474786del": {
      "g_hgvs": "NC_000009.12:g.92474784_92474786del",
      "hgvs_t_and_p": {
        "NM_001012267.3": {
          "gap_statement": null,
          "primary_assembly_loci": {
            "grch37": {
              "NC_000009.11": {
                "hgvs_genomic_description": "NC_000009.11:g.95237066_95237068del"
              }
            },
            "grch38": {
              "NC_000009.12": {
                "hgvs_genomic_description": "NC_000009.12:g.92474784_92474786del"
              }
            }
          },
          "select_status": {
            "mane_select": true
          },
          "t_hgvs": "NM_001012267.3:c.564+94925_564+94927del"
        },
        "NM_017680.6": {
          "gap_statement": "NM_017680.6 contains 3 fewer bases between c.152_153 than NC_000009.12",
          "primary_assembly_loci": {
            "grch37": {
              "NC_000009.11": {
                "hgvs_genomic_description": "NC_000009.11:g.95237066_95237068del"
              }
            },
            "grch38": {
              "NC_000009.12": {
                "hgvs_genomic_description": "NC_000009.12:g.92474784_92474786del"
              }
            }
          },
          "select_status": {
            "mane_select": true
          },
          "t_hgvs": "NM_017680.6:c.151_153="
        }
      }
    }
  }
}
I can understand that a dup returning a del is, in theory, possible, but when I reverse it, I only get dels back! So:
- NC_000009.11:g.95237063_95237068dup maps to NM_001012267.3:c.564+94922_564+94927dup and then to NC_000009.12:g.92474781_92474786dup.
- NC_000009.11:g.95237063_95237068dup also maps to NM_017680.6:c.150_153-1dup (???) and then to NC_000009.12:g.92474784_92474786del. I can't possibly imagine that the cDNA description is correct.
- Taking that last output, NC_000009.12:g.92474784_92474786del, this maps to NM_001012267.3:c.564+94925_564+94927del (a different mapping from the one given for the first input) and then to NC_000009.11:g.95237066_95237068del (again different from the first input).
- NC_000009.12:g.92474784_92474786del also maps to NM_017680.6:c.151_153= (a different mapping from the one given for the first input) and then to NC_000009.11:g.95237066_95237068del (different from the first input, but now the same result as for the other transcript).
In the direction of GRCh38 to GRCh37, at least both liftovers agree. In the GRCh37 to GRCh38 direction, I get two different values, of which I'm pretty sure the second one is incorrect, as the cDNA description is a full intronic duplication. I can't get my head around this.
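To make the asymmetry explicit, here is a small sketch of the round-trip check I'm doing by hand. It only replays the values reported in this issue and does not call VariantValidator; `REPORTED`, `liftovers`, and `round_trip` are hypothetical names used for illustration.

```python
# Values copied from the two summarized outputs in this report.
# input genomic variant -> {transcript: (t_hgvs, genomic description on the other build)}
REPORTED = {
    "NC_000009.11:g.95237063_95237068dup": {
        "NM_001012267.3": ("NM_001012267.3:c.564+94922_564+94927dup",
                           "NC_000009.12:g.92474781_92474786dup"),
        "NM_017680.6": ("NM_017680.6:c.150_153-1dup",
                        "NC_000009.12:g.92474784_92474786del"),
    },
    "NC_000009.12:g.92474784_92474786del": {
        "NM_001012267.3": ("NM_001012267.3:c.564+94925_564+94927del",
                           "NC_000009.11:g.95237066_95237068del"),
        "NM_017680.6": ("NM_017680.6:c.151_153=",
                        "NC_000009.11:g.95237066_95237068del"),
    },
}


def liftovers(variant):
    """Map each transcript to the other-build description reported for `variant`."""
    return {tx: lifted for tx, (_, lifted) in REPORTED[variant].items()}


def round_trip(variant):
    """Per transcript: (lifted description, description after lifting back).

    The second element is None when this report has no output for the
    lifted variant, so nothing is guessed.
    """
    result = {}
    for tx, lifted in liftovers(variant).items():
        back = REPORTED.get(lifted, {}).get(tx)
        result[tx] = (lifted, back[1] if back else None)
    return result


start = "NC_000009.11:g.95237063_95237068dup"
for tx, (lifted, back) in round_trip(start).items():
    status = "no reverse output in this report" if back is None else (
        "reversible" if back == start else "NOT reversible")
    print(tx, lifted, back, status)
```

For NM_017680.6 the trip ends at NC_000009.11:g.95237066_95237068del instead of the original dup; for NM_001012267.3 I have not included the reverse output of the GRCh38 dup here, so that leg is left open.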
While this will, most likely, involve lots of coding, can you also check my logic when solving the general issue of having multiple mappings? Is it indeed more likely that deep intronic variants provide lower quality liftovers? Or should I use some other logic? Both transcripts are MANE Select.
Expected behavior
Variant mapping and lifting over should ideally be reversible.