Skip to main content

Alignment Trimming (Beta)

Trim alignments based on the proportion of missing data or the number of parsimony informative sites. This feature will filter sites based on the proportion of missing data and the number of parsimony informative sites.

For example:

>seq1
ATGCAT--A
>seq2
ATGCATA--
>seq3
ATGCAA-A-
>seq4
ATGCATA--

If we set the proportion of missing data to 0.5. It will generate an output without the last two sites. The output will be:

>seq1
ATGCAT-
>seq2
ATGCATA
>seq3
ATGCAA-
>seq4
ATGCATA
info

This feature is still in beta. Please report any issues you encounter. For more information, see the Try Beta Features section.

Filtering based on the proportion of missing data

To trim alignments, use the align trim command with the following options:

segul align trim --dir <alignment-dir> --missing-data <value>

For example:

segul align trim --dir alignments/ --missing-data 0.5

Filtering based on parsimony informative sites

To trim based on the minimum threshold of parsimony informative sites, use the --pinf option:

segul align trim --dir <alignment-dir> --pinf <value>

For example, this command will retain sites with at least 50 parsimony informative sites:

segul align trim --dir alignments/ --pinf 50

Amino acid alignment trimming

If the input is amino acid sequences, you need to use the datatype aa option. For example:

segul align trim --dir alignments/ --missing-data 0.5 --datatype aa

Specifying the output directory and format

By default, the output directory is Align-Trim. You can change the output directory name using the --output or -o option. For example, to name the output directory align-trim, use the command below:

segul align trim --dir alignments/ --output align-trim

To specify the output format, use the --output-format or -F option. The default output format is NEXUS. For example, to change the output format to FASTA, use the command below:

segul align trim --dir alignments/ --output-format fasta