📄️ Overview
SEGUL CLI aims to be easy for beginners to use while providing powerful options for experienced users. Common arguments have short options, and some have default values where a default is possible and safe. This saves time when typing commands.
📄️ Command Options
A SEGUL CLI command is structured this way:
📄️ Alignment Concatenation
SEGUL CLI provides an easy way to concatenate multiple alignments and generate the partition setting simultaneously.
📄️ Alignment Conversion
SEGUL can convert anything from a single file to multiple files in the same directory.
📄️ Alignment Filtering
In a typical phylogenomic workflow, you may want to filter problematic alignments before running a phylogenetic analysis. This feature provides multiple ways to filter alignments.
📄️ Alignment Partition Conversion
SEGUL CLI can convert single and multiple partition files. It can also extract partitions embedded in NEXUS sequence files.
📄️ Alignment Splitting
This feature splits a concatenated alignment into multiple alignments based on an input partition.
📄️ Alignment Summary
SEGUL generates different summary statistics for DNA and amino acid sequences. By default, the data type is set to DNA. In general, the command is as follows:
📄️ Alignment Trimming (Beta)
Trim alignments by filtering sites based on the proportion of missing data or the number of parsimony-informative sites.
📄️ Genomic File Conversion (Beta)
SEGUL currently supports converting only Multiple Alignment Format (MAF) files.
📄️ Genomic Summary
Since version 0.19.0, SEGUL can calculate summary statistics for raw reads and contiguous sequences.
📄️ Sequence Addition (Beta)
Add sequences to existing sequence files or alignments. Sequences can be added from multiple sources to multiple destinations. The source and destination file formats can differ, but SEGUL requires matching file names for both to add the sequences. Even if the destination files are aligned, all the output sequences will be unaligned. We recommend using MAFFT to align the resulting sequence files.
📄️ Sequence Extraction
SEGUL can extract sequences from a collection of alignments based on their sequence IDs. You can input the sequence IDs in three ways:
📄️ Sequence Filtering
Sequence filtering works at the sequence level, unlike the SEGUL alignment filtering feature, which works at the alignment level. Alignment filtering removes any entire alignment that does not meet the filtering criteria. Sequence filtering instead removes only the sequences that do not meet the criteria and retains the alignment as long as at least one sequence is left in it. The feature works on many alignments simultaneously and never overwrites your original datasets; it creates new files with the filtered sequences.
📄️ Sequence ID Extraction
Often, we need to know which taxa are in our dataset. The most straightforward command would be:
📄️ Sequence ID Mapping
To map the distribution of your samples across your dataset, you only need to pass the --map flag to the command for finding unique IDs:
📄️ Sequence Removal
You can remove sequences from a collection of alignments based on a list of IDs. This feature is the opposite of the segul extract feature. When you need to remove fewer than half of the sequences, it is faster than segul extract.
📄️ Sequence ID Renaming
SEGUL provides an easy way to rename sequence IDs across all your alignments. To use this function, SEGUL requires a list of the original IDs and the new names that should replace them. The input IDs can be written in a tabular format as a comma-delimited file (.csv) or a tab-delimited file (.tsv).
📄️ Sequence Translation
To translate a DNA alignment to amino acid sequences:
📄️ Log File
All terminal output, except the spinning emoji and the program progress messages, is written to a log file saved in the current working directory. The log file also includes the time and the log status.
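
The difference between alignment-level and sequence-level filtering described in the Sequence Filtering entry can be sketched in a few lines of Python. This is only a conceptual illustration, not SEGUL's code or its command-line interface. The toy data, the function names, and the two criteria (a minimum taxon count and a maximum proportion of missing data) are assumptions chosen for the example.

```python
# Hypothetical toy data: alignment name -> {sequence ID: aligned sequence}
alignments = {
    "locus1": {"taxonA": "ACGT", "taxonB": "AC--", "taxonC": "----"},
    "locus2": {"taxonA": "----", "taxonB": "----"},
}


def missing_proportion(seq):
    """Fraction of gap ('-') or missing ('?') characters in a sequence."""
    return sum(ch in "-?" for ch in seq) / len(seq)


def filter_alignments(alns, min_taxa):
    """Alignment-level filtering: each alignment is kept or dropped as a whole."""
    return {name: seqs for name, seqs in alns.items() if len(seqs) >= min_taxa}


def filter_sequences(alns, max_missing):
    """Sequence-level filtering: drop individual sequences and keep the
    alignment as long as at least one sequence remains in it."""
    result = {}
    for name, seqs in alns.items():
        kept = {sid: s for sid, s in seqs.items()
                if missing_proportion(s) <= max_missing}
        if kept:
            result[name] = kept
    return result


# Both functions build new dictionaries, so the input data is left untouched.
by_alignment = filter_alignments(alignments, min_taxa=3)
by_sequence = filter_sequences(alignments, max_missing=0.5)
```

With this toy data, alignment-level filtering keeps `locus1` with all three of its sequences, while sequence-level filtering keeps `locus1` but drops `taxonC` from it. Both approaches discard `locus2`, the first because it has too few taxa and the second because no sequence in it passes the threshold.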