Introduction
Thank you for using SEGUL! 🙏🏻
We developed SEGUL (SEquence and Genomic UtiLities) to address the need for a high-performance and accessible phylogenomic tool. It is particularly well-suited for large-scale phylogenomic projects, especially those involving thousands of loci and hundreds of samples. SEGUL also handles small Sanger datasets effectively. SEGUL is a practical solution for typical phylogenomic data analyses and a proof of concept for genomic software that scales from smartphones, tablets, and personal computers to high-performance computing clusters. Check out Chan's (2024) perspective for an independent review of SEGUL.
Citation
Handika, H., and J. A. Esselstyn. 2024. SEGUL: Ultrafast, memory-efficient and mobile-friendly software for manipulating and summarizing phylogenomic datasets. Molecular Ecology Resources. https://doi.org/10.1111/1755-0998.13964.
If you are interested in reading the journal article but cannot access it, you can request a copy from the authors on ResearchGate.
Navigating the Documentation
Below, we provide a quick start with an overview of using SEGUL. The quick start assumes familiarity with phylogenomic data and some experience with related tools. Detailed instructions on installing SEGUL are available in the installation guidelines. The guidelines are organized into sections by interface (GUI, CLI, or API). Within each section, we provide detailed instructions on how to use each feature. In general, the feature guidelines are independent of each other, so you should be able to jump straight to the feature you need for your interface without reading other sections. However, if you use the GUI app, we recommend checking the GUI general usage section first. For the CLI application, we recommend starting with the CLI Overview section, and for the Python API, with the API Introduction. These sections cover general usage, input and output formats, datatypes, and other information common to each interface. If you have questions, issues, or feedback about the app and its documentation, please refer to the support section. We also welcome collaboration and contributions.
On mobile devices, a dropdown menu at the top of each page lets you navigate between documentation sections. On larger screens, the sidebar on the right side of the page provides the same functionality.
Quick Start ⏱️
Platform Support
Desktop
Platform | GUI | CLI |
---|---|---|
Linux | ✅ | ✅ |
MacOS | ✅ | ✅ |
Windows | ✅ | ✅ |
Windows Subsystem for Linux (WSL) | ❌ | ✅ |
Mobile
Platform | GUI | CLI |
---|---|---|
iOS | ✅ | ❌ |
iPadOS | ✅ | ❌ |
Android | ✅ | ❌ |
CLI vs GUI vs API
For general users, the choice is between the command-line (CLI) and graphical (GUI) applications. If you answer yes to any of the following questions, use SEGUL CLI. Otherwise, either the GUI or the CLI will work for you.
- Are you planning to run the app in a non-GUI environment like an HPC cluster?
- Do you run SEGUL as part of a pipeline?
- Do you need the utmost efficiency?
You can also install both interfaces and use them interchangeably. Each runs independently and does not interfere with the other, so you can take advantage of the strengths of either one depending on your workflow. Learn more about the differences between the CLI and GUI versions here.
For developers, SEGUL is available as a Rust crate that you can easily integrate into Rust code. For Python, PySEGUL provides a Python binding to the SEGUL API; using it does not require any Rust knowledge. The Rust crate also works with R, but using it in R requires some Rust knowledge.
Learn more about using SEGUL API here.
If you write a pipeline in Python, we recommend using PySEGUL. It is available in the Python Package Index (PyPI), which will simplify dependency management.
Installation
We recommend downloading the GUI app from your operating system's official app store; it is a one-click installation. For the CLI, the fastest installation route is the pre-compiled binaries. If you are familiar with installing single-executable CLI apps, download the latest release using the links below. The installation guide provides more detailed instructions.
Platform | Link | Description |
---|---|---|
Linux ARM64 | download | For Linux on ARM64 architecture (uncommon). |
Linux x86_64 (static) | download | For Linux with older GLIBC versions. Common in HPC clusters. |
Linux x86_64 | download | For Linux with modern GLIBC versions. Most recent Linux distributions, including WSL. |
MacOS ARM64 | download | For MacOS on Apple M series. |
MacOS x86_64 | download | For MacOS on Intel. |
Windows x86_64 | download | Most Windows devices. It may work on ARM Windows as well. |
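If you are unsure which row of the table applies to your machine, the sketch below maps the values reported by Python's standard platform module to a table row. It is a hypothetical helper, not part of SEGUL: it returns the row labels from the table above, not actual release file names, and the old_glibc flag is an assumption you set yourself (for example, after checking platform.libc_ver() on an HPC cluster).

```python
import platform


def suggest_segul_build(system: str, machine: str, old_glibc: bool = False) -> str:
    """Map an OS/CPU pair to a row label in the download table.

    Hypothetical helper: labels mirror the table rows, not release asset names.
    """
    system = system.lower()
    machine = machine.lower()
    is_arm = machine in ("arm64", "aarch64")
    is_x86 = machine in ("x86_64", "amd64")

    if system == "linux":
        if is_arm:
            return "Linux ARM64"
        if is_x86:
            # The static build targets older GLIBC versions (common on HPC clusters)
            return "Linux x86_64 (static)" if old_glibc else "Linux x86_64"
    elif system == "darwin":
        if is_arm:
            return "MacOS ARM64"
        if is_x86:
            return "MacOS x86_64"
    elif system == "windows":
        # Only an x86_64 build is listed; it may also work on ARM Windows
        return "Windows x86_64"
    raise ValueError(f"No pre-compiled build listed for {system}/{machine}")


if __name__ == "__main__":
    print(suggest_segul_build(platform.system(), platform.machine()))
```

On Windows, platform.machine() typically reports AMD64, and on Apple M series Macs it reports arm64, which is why the helper accepts both spellings.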
If you use conda on Linux or MacOS, you can install SEGUL with it. Make sure the bioconda channel is set up before installing SEGUL, then use the following command:
conda install bioconda::segul
Or if you use mamba:
mamba install segul
Note that the Conda installation may not work in old Linux distributions often found in HPC clusters. Learn more about the installation using Bioconda here.
Usage Overview
SEGUL has a growing list of features to help you manipulate and summarize your phylogenomic datasets.
Feature Quick Links
Feature | Quick Link |
---|---|
Alignment concatenation | CLI / GUI |
Alignment conversion | CLI / GUI |
Alignment filtering | CLI / GUI |
Alignment splitting | CLI / GUI |
Alignment partition conversion | CLI / GUI |
Alignment summary statistics | CLI / GUI |
Genomic summary statistics | CLI / GUI |
Sequence extraction | CLI / GUI |
Sequence filtering | CLI / GUI (in development) |
Sequence ID extraction | CLI / GUI |
Sequence ID mapping | CLI / GUI |
Sequence ID renaming | CLI / GUI |
Sequence removal | CLI / GUI |
Sequence translation | CLI / GUI |
Supported File Formats
Supported input formats for Genomic tasks:
File Format | Description | Supported extensions |
---|---|---|
FASTQ | For genomic read summary statistics. Supports compressed (.gz) and uncompressed files. | .fastq , .fq , .fastq.gz , .fq.gz |
FASTA | For contig summary statistics. | .fasta , .fa , .fna , .fsa , .fas |
Supported input and output file formats for Alignment and Sequence tasks:
File Format | Description | Supported extensions |
---|---|---|
FASTA | Includes support for the interleaved format. | .fasta , .fa , .fna , .fsa , .fas |
PHYLIP | Supports relaxed PHYLIP only, including the interleaved format. Learn the differences here. | .phy , .phylip , .ph |
NEXUS | Includes support for the interleaved format. | .nexus , .nex , .nxs |
Supported input and output partition formats:
File Format | Description | Supported extensions |
---|---|---|
RAxML | RAxML partition file. | .txt , .part , .partition |
NEXUS | NEXUS partition file. | .nexus , .nex , .nxs |
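To show what a partition file encodes, the sketch below computes the 1-based, inclusive coordinates of each locus in a concatenated alignment and writes them in the common RAxML style (DATATYPE, name = start-end) and as a NEXUS sets block. It is an illustration under assumptions, not SEGUL's own writer: the locus names and lengths are made up, and SEGUL's exact output (spacing, datatype keyword, header lines) may differ.

```python
def partition_ranges(loci):
    """Compute 1-based, inclusive (name, start, end) ranges for concatenated loci."""
    ranges = []
    start = 1
    for name, length in loci:
        end = start + length - 1
        ranges.append((name, start, end))
        # The next locus begins right after the current one
        start = end + 1
    return ranges


def to_raxml(loci, datatype="DNA"):
    """Format ranges as RAxML-style partition lines."""
    return "\n".join(
        f"{datatype}, {name} = {start}-{end}"
        for name, start, end in partition_ranges(loci)
    )


def to_nexus(loci):
    """Format ranges as a NEXUS sets block with one charset per locus."""
    lines = ["#nexus", "begin sets;"]
    lines += [
        f"    charset {name} = {start}-{end};"
        for name, start, end in partition_ranges(loci)
    ]
    lines.append("end;")
    return "\n".join(lines)


# Hypothetical loci: (name, aligned length)
example_loci = [("locus1", 500), ("locus2", 700)]
print(to_raxml(example_loci))
print(to_nexus(example_loci))
```

With these two loci, locus1 spans columns 1-500 and locus2 spans columns 501-1200; converting between partition formats only changes how the same ranges are written.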
SEGUL CLI can handle non-standard file extensions that are not listed above. Use the --format option to set the input format. The GUI version does not accept non-standard file extensions; either change the file extension to one of the supported extensions or use the CLI version.
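The sketch below illustrates extension-based format detection for alignment files, using the extensions from the tables above. It is a hypothetical helper, not SEGUL's detection code: it matches extensions exactly (case-sensitive), covers only the alignment formats (not FASTQ), and returns None for anything else, which is the case where you would pass --format on the CLI.

```python
from pathlib import Path

# Extensions copied from the alignment table above (hypothetical mapping)
ALIGNMENT_EXTENSIONS = {
    "fasta": {".fasta", ".fa", ".fna", ".fsa", ".fas"},
    "phylip": {".phy", ".phylip", ".ph"},
    "nexus": {".nexus", ".nex", ".nxs"},
}


def guess_format(path):
    """Return the alignment format for a file path, or None if unrecognized."""
    suffix = Path(path).suffix
    for fmt, extensions in ALIGNMENT_EXTENSIONS.items():
        if suffix in extensions:
            return fmt
    # Non-standard extension: the format must be set explicitly
    return None
```

For example, guess_format("data/locus1.nex") returns "nexus", while a file named locus1.aln returns None.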
Example Dataset
We provide sample datasets to help you get started, or to test the app if you run into issues with your own datasets. The datasets include small alignments in SEGUL-supported formats. Because genomic files are large, we do not provide genomic datasets; you can download genomic data from public repositories, such as NCBI SRA.
Dataset | Link |
---|---|
FASTA | Download |
NEXUS | Download |
Relaxed-PHYLIP | Download |
Interleaved relaxed-PHYLIP | Download |
Concatenated NEXUS | Download |
GUI Usage
- Open the app.
- Use the navigation bar to select the feature you want to use. For example, to concatenate alignments, click the "Alignments" button.
- Use the dropdown menu to select the task. For alignment concatenation, select "Concatenate alignments."
- Click the "Add file" button to add the input files. On desktop platforms, you can also input a directory by clicking the "Add directory" button; the app will then look for matching input files in that directory.
- The input tab bar displays all the input files. You can remove a file by clicking the "Remove" button. Removing a file will only remove it from the input list, not the file system.
- Click the "Add output directory" button to add the output directory. On mobile platforms, this directory will be the app's default directory.
- You also need to add the parameters for some tasks. For example, you must add the filtering parameters to filter alignments.
- Click the "Run" button to start the task.
- Once the task is done, the app displays the output in the output tab bar. You can also tap a file to open it in the app's file viewer. The current version only supports plain text and comma-separated values (CSV) files.
- You can also share the output. There are two share options. The quick share will create a zip file containing the output and share it using the system share sheet. You can also share individual files by clicking the share button on the output viewer.
The mobile version of the GUI has limited capabilities in handling many files. Find out more in the guideline for mobile users.
CLI Usage
The segul command is structured as follows:
segul <command> <subcommand> --option1 value1 --option2 value2
For example, to concatenate alignments:
segul align concat --dir alignments/ --output aln-concat
Some arguments are required, while others are optional. For example, the --output option is optional. If you do not provide it, the app uses the default output directory. The command below concatenates all the alignments in the alignments/ directory and saves the output in the Align-Concat directory.
segul align concat --dir alignments/
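If you call the CLI from a Python script, a minimal sketch looks like this. It assumes SEGUL 0.19.0 or later and uses only the --dir and --output options shown above; the helper names are hypothetical. Building the argument list separately from running it keeps the command easy to inspect and test, and passing a list to subprocess.run avoids shell quoting issues with paths.

```python
import shutil
import subprocess


def build_concat_command(input_dir, output=None):
    """Build the argument list for `segul align concat` (0.19.0+ syntax)."""
    cmd = ["segul", "align", "concat", "--dir", input_dir]
    if output is not None:
        # Without --output, SEGUL uses its default output directory
        cmd += ["--output", output]
    return cmd


def run_segul(cmd):
    """Run a SEGUL command, failing early if the executable is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("segul executable not found on PATH")
    # check=True raises CalledProcessError if segul exits with a non-zero status
    return subprocess.run(cmd, check=True)
```

Usage: run_segul(build_concat_command("alignments/", output="aln-concat")) runs the same command as the first example above.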
We recommend using segul --help to see the available commands, segul <command> --help to see the available subcommands for a command, and segul <command> <subcommand> --help to see the available options for a subcommand.
Please see the command usage section for more detailed usage information.
Since version 0.19.0, you don't need to specify the input format. The app automatically detects the format based on the file extension. Earlier versions only auto-detect the format for file (non-directory) input. If your files have non-standard extensions, use the --format option to set the input format, regardless of the version.
The commands below assume SEGUL version 0.19.0 or later. In earlier versions, the subcommand is used as the command itself, and directory input requires an explicit input format.
For example:
segul align concat -d <input-directory>
In versions earlier than 0.19.0, the command above is:
segul concat -d <input-directory> -f <input-format>
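Scripts that must support both syntaxes can branch on the installed version. The sketch below is a hypothetical helper that builds the concatenation command for either case. It assumes plain X.Y.Z version strings and that -f sets the input format in both syntaxes, as in the examples above. Versions are compared as integer tuples because string comparison would wrongly rank "0.9.0" above "0.19.0".

```python
def parse_version(version):
    """Turn '0.19.0' into (0, 19, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def concat_command(version, input_dir, input_format=None):
    """Build a concatenation command for the given SEGUL version (hypothetical helper)."""
    if parse_version(version) >= (0, 19, 0):
        # 0.19.0+: command + subcommand, format auto-detected from extensions
        cmd = ["segul", "align", "concat", "-d", input_dir]
        if input_format is not None:
            # Only needed for non-standard file extensions
            cmd += ["-f", input_format]
        return cmd
    # Earlier versions: the subcommand is the command, and -d requires -f
    if input_format is None:
        raise ValueError("SEGUL < 0.19.0 needs an input format for directory input")
    return ["segul", "concat", "-d", input_dir, "-f", input_format]
```

For example, concat_command("0.18.2", "alignments/", "nexus") returns the older form shown above.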
Feature | Commands |
---|---|
Alignment concatenation | segul align concat -d <input-directory> |
Alignment conversion | segul align convert -d <input-directory> |
Alignment filtering | segul align filter -d <input-directory> <filtering-options> |
Alignment splitting | segul align split -d <input-directory> |
Alignment partition conversion | segul partition convert -d <input-directory> |
Alignment summary statistics | segul align summary -d <input-directory> |
Contig summary statistics | segul contig summary -d <input-directory> |
Read summary statistics | segul read summary -d <input-directory> |
Sequence extraction | segul sequence extract -d <input-directory> <extraction-options> |
Sequence filtering | segul sequence filter -d <input-directory> <filtering-options> |
Sequence ID extraction | segul sequence id -d <input-directory> |
Sequence ID mapping | segul sequence id --map -d <input-directory> |
Sequence ID renaming | segul sequence rename -d <input-directory> |
Sequence removal | segul sequence remove -d <input-directory> |
Sequence translation | segul sequence translate -d <input-directory> |
Main help | segul --help |
Command help | segul <command> --help |
Subcommand help | segul <command> <subcommand> --help |
API Usage
SEGUL API is available as a Rust crate. You can use it to develop your application or integrate it with other programming languages. The API is available on crates.io.
To add SEGUL API to your project, use the cargo add command:
cd my-project
cargo add segul
Or add it manually to the Cargo.toml file:
[dependencies]
segul = "0.*"
If you want to use SEGUL API in Python, we provide a Python binding called PySEGUL. The library lets you use SEGUL features like any other Python library, with no Rust knowledge needed. Install it using pip:
pip install pysegul
import pysegul


def concat_alignments():
    input_dir = 'tests/data'
    input_format = 'nexus'
    datatype = 'dna'
    output_format = 'fasta'
    partition_format = 'raxml'
    prefix = 'concatenated'
    output_dir = 'tests/output'

    concat = pysegul.AlignmentConcatenation(
        input_format,
        datatype,
        output_dir,
        output_format,
        partition_format,
        prefix
    )
    # Concatenate all alignments in a directory
    concat.from_dir(input_dir)

    # Alternatively, input a list of files instead of a directory
    input_paths = ['tests/data/alignment1.nex', 'tests/data/alignment2.nex']
    concat.from_files(input_paths)


concat_alignments()
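When you want from_files but the file list depends on what is in a directory, you can build it with the standard library. The sketch below is a hypothetical helper, not part of PySEGUL. It assumes, as in the example above, that from_files accepts a list of path strings, and it filters on NEXUS extensions by default.

```python
from pathlib import Path


def collect_alignment_files(input_dir, extensions=(".nex", ".nexus", ".nxs")):
    """Return sorted path strings for files in input_dir with a matching extension."""
    return sorted(
        str(path)
        for path in Path(input_dir).iterdir()
        if path.is_file() and path.suffix in extensions
    )
```

Usage: concat.from_files(collect_alignment_files('tests/data')). Sorting the paths keeps the input order stable between runs.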
Most PySEGUL features follow the same code pattern, except for features that require specific parameters, such as alignment filtering, sequence extraction, and sequence removal. For these features, set the input files or directory using a setter, then call the method for the analysis with its parameters. For example, to extract sequences using a regular expression:
import pysegul


def extract_sequences():
    input_dir = 'tests/align-data'
    input_format = 'nexus'
    datatype = 'dna'
    output_format = 'fasta'
    output_dir = 'tests/output'

    extract = pysegul.SequenceExtraction(
        input_format,
        datatype,
        output_dir,
        output_format,
    )
    # Set the input with a setter, then run the extraction method
    extract.input_dir = input_dir
    # Case-insensitive match for sequence IDs starting with "abce"
    extract.extract_regex("(?i)^(abce)")


extract_sequences()
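Before running an extraction on many files, it can help to preview which sequence IDs a pattern selects. The sketch below does this with Python's re module on made-up IDs. It is an approximation: SEGUL is written in Rust and presumably uses Rust regex syntax, which agrees with Python for simple patterns like this one (the inline (?i) flag and the ^ anchor) but differs for some features, such as lookarounds, which the Rust regex crate does not support. It also assumes SEGUL tests each ID with an unanchored search; because the pattern starts with ^, search and start-anchored matching give the same result here.

```python
import re

# Same pattern as the extraction example above
PATTERN = re.compile(r"(?i)^(abce)")


def preview_matches(sequence_ids, pattern=PATTERN):
    """Return the IDs that the pattern matches, preserving input order."""
    return [seq_id for seq_id in sequence_ids if pattern.search(seq_id)]


# Hypothetical sequence IDs
ids = ["ABCE_001", "abce_002", "xabce_003", "abc_004"]
print(preview_matches(ids))
```

Here only ABCE_001 and abce_002 are selected: xabce_003 does not start with the prefix, and abc_004 lacks the final "e".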
Learn more about using PySEGUL here.