
Alignment Conversion

SEGUL can convert a single file or multiple files in the same directory.

Converting a single file

To convert a single file, use the --input or -i option:

segul align convert --input [a-path-to-file] --output-format [sequence-format]

SEGUL should be able to infer the input format from your file's extension. If the extension is uncommon (for example, .aln instead of .fas), specify the format with --input-format:

segul align convert --input [a-path-to-file] --input-format [sequence-format] --output-format [sequence-format]

For example, here we convert a NEXUS file of DNA sequences named loci.nexus to FASTA format:

segul align convert --input alignments/loci.nexus --output-format fasta

You can omit the output format if you want the output files in NEXUS format, which is the default. For example, to convert loci.fasta to NEXUS format:

segul align convert --input alignments/loci.fasta

Some SEGUL arguments also have short forms (note the uppercase -F for the output format; the lowercase -f sets the input format):

segul align convert -i alignments/loci.nexus -f nexus -F fasta --datatype aa -o alignment-fasta
Note: When converting alignments, SEGUL changes only the file extension and keeps the original file names for the output files. This behavior is the same whether you convert one file or many, which lets you convert many files in a single command.
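Because output names mirror input names, you can check a finished batch by pairing files on their base names. The shell function below is a minimal sketch, not part of SEGUL. missing_outputs is a hypothetical helper name, and the function assumes the naming rule described in this note: same base name with a new extension, such as .fas for FASTA output.

```shell
# Hypothetical helper, not part of SEGUL: print each converted file that
# should exist (same base name, new extension) but does not.
# Returns 1 if any output is missing, 0 otherwise.
missing_outputs() {
  in_dir=$1 in_ext=$2 out_dir=$3 out_ext=$4
  status=0
  for f in "$in_dir"/*."$in_ext"; do
    [ -e "$f" ] || continue            # an unmatched glob stays literal; skip it
    base=$(basename "$f" ".$in_ext")   # e.g. locus_1.nexus -> locus_1
    if [ ! -e "$out_dir/$base.$out_ext" ]; then
      echo "missing: $out_dir/$base.$out_ext"
      status=1
    fi
  done
  return "$status"
}
```

For the directory example in the next section, you could run missing_outputs alignments nexus alignments-fas fas after the conversion finishes.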

Batch converting sequence files in a directory

There are two ways to provide multiple input files. The first is to pass a directory with the --dir or -d option. For a directory input, you must specify the --input-format:

segul align convert --dir [path-to-your-directory] --input-format [sequence-format] --output [your-output-dir-name] --output-format [sequence-format]

In short format:

segul align convert -d [path-to-your-directory] -f [sequence-format] -o [your-output-dir-name] -F [sequence-format]

For example, suppose we want to convert all the NEXUS files in the directory below to FASTA format and name the output directory alignments-fas:

alignments/
├── locus_1.nexus
├── locus_2.nexus
└── locus_3.nexus

The command will be:

segul align convert -d alignments/ -f nexus -F fasta -o alignments-fas

Alternatively, use a wildcard (*) with the --input or -i option to achieve the same result:

segul align convert -i alignments/*.nexus -f nexus -F fasta -o alignments-fas

The outputs will be:

alignments-fas/
├── locus_1.fas
├── locus_2.fas
└── locus_3.fas
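When your alignments are spread across several directories, a shell loop can run one directory conversion per directory. This is a sketch. The name convert_dirs and the directory names in the usage note are hypothetical, and the function uses only the short flags shown on this page (-d, -f, -F, -o).

```shell
# Hypothetical helper: convert each NEXUS directory passed as an argument
# to FASTA, writing the results to "<dir>-fas". Stops at the first failure.
convert_dirs() {
  for dir in "$@"; do
    dir=${dir%/}                       # drop a trailing slash: alignments/ -> alignments
    segul align convert -d "$dir" -f nexus -F fasta -o "$dir-fas" || return 1
  done
}
```

For example, convert_dirs alignments-set1/ alignments-set2/ would write the outputs to alignments-set1-fas and alignments-set2-fas.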

Converting amino acid sequences

By default, SEGUL assumes the input contains DNA sequences. If your files contain amino acid sequences, use the argument --datatype aa. For example:

segul align convert --input alignments/loci.nexus --output-format fasta --datatype aa
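If you are unsure whether a file holds DNA or amino acid sequences, a quick composition check can help you decide whether to pass --datatype aa. The function below is a rough heuristic of our own, not SEGUL's logic. guess_datatype is a hypothetical name, and the function only understands FASTA files. Its 90% threshold for A/C/G/T/U/N characters is also an arbitrary assumption.

```shell
# Hypothetical heuristic, not SEGUL's detection: print "dna" if at least 90%
# of sequence characters are A/C/G/T/U/N, "aa" otherwise, and "unknown" for
# a file with no sequence characters. Assumes FASTA input.
guess_datatype() {
  awk '
    /^>/ { next }                          # skip FASTA header lines
    {
      seq = toupper($0)
      gsub(/[-?. \t\r]/, "", seq)          # drop gaps, missing data, whitespace
      total += length(seq)
      nuc = seq
      n += gsub(/[ACGTUN]/, "", nuc)       # gsub returns the number of matches
    }
    END {
      if (total == 0) { print "unknown"; exit }
      if (n / total >= 0.9) print "dna"; else print "aa"
    }
  ' "$1"
}
```

For example, guess_datatype alignments/loci.fasta prints one of dna, aa, or unknown. IUPAC ambiguity codes and heavily masked sequences can fool this check, so inspect the file when the result is surprising.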

Specifying the output directory

By default, SEGUL writes the results to a directory called SEGUL-convert. To choose a different output directory, use the --output argument. For example, here we name the output directory alignment-fasta:

segul align convert --input alignments/loci --output-format fasta --datatype aa --output alignment-fasta

Sorting the output sequences

By default, SEGUL maintains the original order of the sequences in the input file(s). Using the --sort flag, you can sort the sequences in alphabetical order based on their IDs:

segul align convert --input alignments/loci --output-format fasta --datatype aa --output alignment-fasta --sort
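To confirm that a FASTA output was sorted, you can list its sequence IDs and check their order. This sketch assumes FASTA output and byte-wise (C locale) ordering. SEGUL's exact sort order may differ, for example in how it treats upper- and lowercase IDs, so read a failure on mixed-case IDs with care. fasta_ids and ids_sorted are hypothetical helper names.

```shell
# Hypothetical helpers, not part of SEGUL.
# Print each FASTA header line without its leading ">", in file order.
fasta_ids() {
  sed -n 's/^>//p' "$1"
}

# Exit 0 if the IDs are in byte-wise ascending order (C locale), 1 otherwise.
ids_sorted() {
  fasta_ids "$1" | LC_ALL=C sort -c 2>/dev/null
}
```

For example, running ids_sorted on a file in alignment-fasta/ followed by && echo sorted prints "sorted" when the IDs are in order.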