2026-05-16 Timo Lassmann <timolassmann@icloud.com>
* version 3.6.0 - Drop Python 3.9/3.10 support; CI Node 24 upgrade
- Minimum Python version raised to 3.11. Python 3.9 reached EOL in
October 2025 and 3.10 reaches EOL in October 2026; both are now
beyond their supported lifecycle for upstream wheels (pillow,
matplotlib). Wheels and CI matrices now target 3.11 / 3.12 / 3.13
/ 3.14.
- GitHub Actions updated to Node.js 24 successor versions
(checkout v5, setup-python v6, upload/download-artifact v5,
codecov v5) ahead of the 2026-06-02 forced Node 24 switch.
- Drive-by: dropped redundant `brew install cmake` on macOS-latest
runners (cmake is preinstalled).
* version 3.5.2 - Four-mode preset system, threadpool, hardening
Algorithm and presets
- Four mode presets stable for protein and nucleotide:
fast, default, recall, accurate. Per-mode configurations
derived by NSGA-III multi-objective optimisation on
BAliBASE v4 (protein) and BRAliBASE (RNA).
- Unified nucleotide preset (DNA and RNA share one path).
- Ensemble alignment with POAR consensus and per-column /
per-residue confidence scores.
- New --add mode: append sequences to an existing alignment.
- Sparse consistency-bonus matrix for large MSAs (avoids
quadratic memory in column count).
- Removed deprecated 'precise' mode alias (Python only;
never shipped as a wheel).
Parallelism
- Chase-Lev work-stealing thread pool is now the default
parallelism backend; OpenMP is optional (USE_OPENMP=ON).
- macOS wheels no longer link libomp.dylib, resolving the
conda-forge / numpy OpenMP runtime conflict.
Security / robustness
- POAR file parsing now rejects malformed inputs that would
cause integer overflow in pair-count or per-pair entry
count (lib/src/poar.c). Limited threat model (requires
explicit --load-poar) but tightened anyway.
- finalise_alignment is now atomic w.r.t. msa->sequences:
validates all per-sequence linear lengths before swapping
any pointer, so a failure leaves the MSA unchanged.
- kalign_msa_compare wraps finalise_alignment in RUN(); a
failure surfaces cleanly instead of producing a half-
finalised MSA.
- DSSIM stress test now uses KALIGN_MATRIX_AUTO so it
exercises both protein and DNA biotypes correctly.
- Bumped locked dependencies (cryptography, flask, werkzeug,
pillow, pygments, pytest, urllib3, requests, mako, black)
past their CVE fix versions.
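The validate-before-swap behaviour described above for finalise_alignment can be illustrated with a small Python sketch. This is not the kalign C code: the function name `finalise_sequences`, the dict-based `msa`, and the "ungapped length must match" check are all hypothetical stand-ins, chosen only to show that every check runs before any state is changed.

```python
def finalise_sequences(msa, new_sequences, expected_lengths):
    """Replace msa["sequences"] only if every new sequence passes validation.

    Hypothetical illustration: `msa` is a plain dict, not a kalign structure.
    """
    if len(new_sequences) != len(expected_lengths):
        raise ValueError("sequence count mismatch")

    # Phase 1: validate every sequence without touching `msa`.
    for index, (seq, expected) in enumerate(zip(new_sequences, expected_lengths)):
        ungapped = len(seq.replace("-", ""))
        if ungapped != expected:
            raise ValueError(
                f"sequence {index}: expected {expected} residues, got {ungapped}"
            )

    # Phase 2: all checks passed, so the swap cannot fail half-way.
    msa["sequences"] = list(new_sequences)
    return msa
```

If any sequence fails validation, the exception is raised before the assignment, so the caller still holds the original, unmodified `msa`.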
Build / tooling
- build.zig updated for zig 0.16 (four cross-compile targets).
- tests/check-local.sh: one-command pre-push gate that runs
zig build + native ctest + Linux ASAN container + pytest.
- Containerfile.memcheck (Ubuntu + ASAN + Valgrind) for local
Linux memory-bug reproduction.
- Removed paper-side benchmark machinery, optimizer scripts,
and design-rationale PRDs from the public tree (preserved
in the manuscript repository for reproducibility).
- Cleaned up ~550 lines of dead code (coretralign,
bitShiftRight256ymm, unused split() in bisectingKmeans).
- Build is now warning-free on clang and GCC.
- Dropped the broken benchmark CI workflow (BAliBASE download
endpoint has been gone for months).
* version 3.5.1 - Bugfix release
- Fix memory leak in build_tree_from_pairwise (realign/ensemble)
- Move seaborn from core to optional dependency
- Fix benchmark workflow to handle unavailable datasets gracefully
- Black formatting fixes
* version 3.5.0 - Three modes
Kalign now has three modes: default (best general-purpose), fast
(same as v3.4), and precise (ensemble, highest accuracy, ~10x slower).
- Ensemble alignment (--precise or --ensemble N)
- Per-column confidence scores from ensemble mode
- Ensemble consensus save/load (--save-poar, --load-poar)
- Alignment refinement (--refine)
- PFASUM substitution matrices (--type pfasum, pfasum43, pfasum60)
- stdin via -i - convention (samtools/bcftools style)
- Python align() now supports all parameters (mode, ensemble, etc.)
- License changed from GPL-3.0-or-later to Apache-2.0
- Fixed crash in Python align() on Linux (uninitialized MSA fields)
- Fixed selenocysteine (U) handling in reduced alphabet (Debian #1127766)
2026-02-10 Timo Lassmann <timolassmann@icloud.com>
* version 3.4.9
- Updated documentation to use kalign-python package name
- Added Claude Code release command
2024-04-24 Timo Lassmann <timolassmann@icloud.com>
* version 3.4.1
- Fixed an issue when kalign is given hundreds of identical sequences.
- added build.zig
2023-12-10 Timo Lassmann <timolassmann@icloud.com>
* version 3.4.0
- Added a simple sequence simulator for testing
- Fixed an issue where alignments would be slightly different
depending on the number of threads used.
2022-11-05 Timo Lassmann <timolassmann@icloud.com>
* version 3.3.5
- Added a check to find and remove sequences of length 0.
2022-10-28 Timo Lassmann <timolassmann@icloud.com>
* version 3.3.4 - CMake and more
- Switched to CMake
Added:
1) a Kalign library to make it easier to use Kalign from other projects
2) a block version of Gene Myers' bit-parallel string matching code (described
here: Myers, Gene. "A fast bit-vector algorithm for approximate string matching
based on dynamic programming." Journal of the ACM (JACM) 46.3 (1999): 395-415).
This means Kalign will now run equivalently on processors with and without AVX2
instructions (e.g. Apple M1 / M2 and ARM chips).
3) alignment types giving users more control over alignment parameters.
4) multi-threading
2022-03-21 Timo Lassmann <timolassmann@icloud.com>
* version 3.3.2 - Bug Fix
There was a bug in building a guide tree from highly similar sequences. The fix
involved distributing identical sequences equally among branches. The bug only occurred
when there were thousands of identical sequences.
In addition, Kalign now compiles on Apple's M1 chip and possibly on other ARM architectures
as well (although I did not test the latter).
2021-04-16 Timo Lassmann <timolassmann@icloud.com>
* version 3.3.1 - Bug Fix
The previous version of kalign checked only the first 50 input sequences to determine
whether the sequences were aligned. If the first 50 sequences were not aligned
but later sequences contained gaps (or other characters!), kalign could crash. In this
version (3.3.1) kalign checks all sequences, thereby avoiding this issue.
To alert users to the situation described above and to warn them about the presence of
odd characters, kalign now produces a warning message like this:
[Date Time] : LOG : Start io tests.
[Date Time] : LOG : reading: dev/data/a2m.good.1
[Date Time] : LOG : Detected protein sequences.
[Date Time] : WARNING : -------------------------------------------- (rwalign.c line 505)
[Date Time] : WARNING : The input sequences contain gap characters: (rwalign.c line 506)
[Date Time] : WARNING : "-" : 36 found (rwalign.c line 510)
[Date Time] : WARNING : BUT the sequences do not seem to be aligned! (rwalign.c line 514)
[Date Time] : WARNING : (rwalign.c line 515)
[Date Time] : WARNING : Kalign will remove the gap characters and (rwalign.c line 516)
[Date Time] : WARNING : align the sequences. (rwalign.c line 517)
[Date Time] : WARNING : -------------------------------------------- (rwalign.c line 518)
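The check described above (scan every sequence, count gap characters, and strip them when the input does not look aligned) can be sketched in Python as follows. This is an illustration only, not the logic in rwalign.c: the name `inspect_input`, the set of gap characters, and the "all rows have equal length" test for alignedness are assumptions made for this sketch.

```python
from collections import Counter

GAP_CHARS = "-."  # assumption: characters treated as gaps in this sketch


def inspect_input(sequences):
    """Scan every sequence (not just the first 50) for gap characters.

    Returns (gap_counts, looks_aligned, cleaned_sequences).
    """
    gap_counts = Counter()
    for seq in sequences:
        for ch in seq:
            if ch in GAP_CHARS:
                gap_counts[ch] += 1

    # Assumed heuristic: aligned means gaps are present and all rows share one length.
    lengths = {len(seq) for seq in sequences}
    looks_aligned = bool(gap_counts) and len(lengths) == 1

    if gap_counts and not looks_aligned:
        # Mirror the warning above: drop gap characters before aligning.
        cleaned = ["".join(ch for ch in seq if ch not in GAP_CHARS) for seq in sequences]
    else:
        cleaned = list(sequences)
    return gap_counts, looks_aligned, cleaned
```

Because the loop visits every sequence, a gap that first appears late in the input is still counted and removed.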
2020-11-06 Timo Lassmann <timolassmann@icloud.com>
* version 3.3 - Threading and more
- Kalign now runs pairwise distance estimation, guide tree building and alignments in parallel.
- Memory optimisations.
- Optimised bi-sectional K-means algorithm.
- added -clean option to check for sequences with identical names but different sequences.
- fixed minor bug in alignment I/O module
2020-09-24 Timo Lassmann <timolassmann@icloud.com>
* version 3.2.7 - Development version
- dynamic programming is now more modular.
- fixed rare bug in alignment input / output
- added gap parameters (--gpo, --gpe, --tgpe)
- for protein alignment I now use the CorBLOSUM66_13plus matrix from:
Hess M, Keul F, Goesele M, Hamacher K.
Addressing inaccuracies in BLOSUM computation improves homology search performance.
BMC bioinformatics. 2016 Dec 1;17(1):189.
with the empirically derived gap penalties.
2020-04-22 Timo Lassmann <timolassmann@icloud.com>
* version 3.2.5
- Bug fix: when given long output names, the first lines in msf
output could be truncated.
2020-04-01 Timo Lassmann <timolassmann@icloud.com>
* version 3.2.4
- Fixed issue relating to stdin input on clusters.
- Added more sanity checks
2020-03-16 Timo Lassmann
* version 3.2.3
- replaced timing code with code from the easel lib.
2020-02-23 Timo Lassmann
* version 3.2.2
- Fixed minor bug in rwaln test routine. It assumed that input alignments
were correctly formatted (which was not true for one test case). The
kalign executable was never affected by this.
- Added a script to test a few alignments.
2020-02-22 Timo Lassmann
* version 3.2.1
Minor bug fix: removed "-lrt" (required for old glibc versions) and replaced it
with a search in configure.ac: AC_SEARCH_LIBS([clock_gettime],[rt])
2020-02-15 Timo Lassmann
* version 3.2.0
Added support for reading sequences from standard input:
cat file.fasta | kalign -f fasta | ....
Added support for combining multiple input files into one alignment:
kalign sequencesA.fa sequencesB.fa > msa.fa
Also works in combination:
cat file.fasta | kalign sequencesA.fa sequencesB.fa > msa.fa
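The idea of pooling piped input with one or more named files, as in the commands above, can be sketched in Python. This is not how kalign implements it: `read_fasta` and `collect_inputs` are hypothetical helpers, and the order in which stdin and files are pooled (stdin first) is an assumption of this sketch.

```python
def read_fasta(handle):
    """Parse FASTA records from a text handle into (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def collect_inputs(paths, stdin=None):
    """Pool records from piped stdin (if given) and every named file."""
    records = []
    if stdin is not None:
        records.extend(read_fasta(stdin))
    for path in paths:
        with open(path) as handle:
            records.extend(read_fasta(handle))
    return records
```

A caller would pass `sys.stdin` as `stdin` when data is piped in, and the combined record list would then be aligned as a single set.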
Minor:
- added m4 macros to enable / disable compiler flags
- added m4 macro for valgrind. Now there is a make target
called check-valgrind that runs all tests through valgrind.