AncestralClust: clustering of divergent nucleotide sequences by ancestral sequence reconstruction using phylogenetic trees
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
AncestralClust: clustering of divergent nucleotide sequences by ancestral sequence reconstruction using phylogenetic trees. / Pipes, Lenore; Nielsen, Rasmus.
In: Bioinformatics, Vol. 38, No. 3, 2022, p. 663-670.
RIS
TY - JOUR
T1 - AncestralClust
T2 - clustering of divergent nucleotide sequences by ancestral sequence reconstruction using phylogenetic trees
AU - Pipes, Lenore
AU - Nielsen, Rasmus
PY - 2022
Y1 - 2022
AB - Motivation: Clustering is a fundamental task in the analysis of nucleotide sequences. Despite the exponential increase in the size of sequence databases of homologous genes, few methods exist to cluster divergent sequences. Traditional clustering methods have mostly focused on optimizing high speed clustering of highly similar sequences. We develop a phylogenetic clustering method which infers ancestral sequences for a set of initial clusters and then uses a greedy algorithm to cluster sequences. Results: We describe a clustering program AncestralClust, which is developed for clustering divergent sequences. We compare this method with other state-of-the-art clustering methods using datasets of homologous sequences from different species. We show that, in divergent datasets, AncestralClust has higher accuracy and more even cluster sizes than current popular methods.
KW - SEARCH
U2 - 10.1093/bioinformatics/btab723
DO - 10.1093/bioinformatics/btab723
M3 - Journal article
C2 - 34668516
VL - 38
SP - 663
EP - 670
JO - Bioinformatics (Online)
JF - Bioinformatics (Online)
SN - 1367-4811
IS - 3
ER -