
Introduction

Thank you for using SEGUL! 🙏🏻

We develop SEGUL (SEquence and Genomic UtiLities) to address the need for a high-performance and accessible phylogenomic tool. It is particularly well-suited for large-scale phylogenomic projects, especially those involving thousands of loci and hundreds of samples. SEGUL also handles small Sanger sequence datasets effectively. SEGUL is a practical solution to typical phylogenomic data analyses and a proof of concept for genomic software that scales from smartphones, tablets, and personal computers to high-performance computing clusters. See the Chan (2024) perspective for an independent review of SEGUL.

info

SEGUL CLI is available on Bioconda. It is available for Linux and MacOS users using ARM or x86_64 CPUs. To install SEGUL, use the following command:

conda install bioconda::segul

Or if you use mamba:

mamba install segul

Learn more about the installation using Bioconda here.

Citation

Handika, H., and J. A. Esselstyn. 2024. SEGUL: Ultrafast, memory-efficient and mobile-friendly software for manipulating and summarizing phylogenomic datasets. Molecular Ecology Resources. https://doi.org/10.1111/1755-0998.13964.

If you are interested in reading the journal article but cannot access it, you can request a copy from the authors on ResearchGate.

Below, we provide a quick start with an overview of using SEGUL. The quick start assumes familiarity with phylogenomic data and some experience using related tools. Detailed instructions on installing SEGUL are available in the installation guidelines.

The guidelines are organized into sections by interface (i.e., GUI, CLI, or API). Within each section, we provide detailed instructions on how to use each feature. In general, feature guidelines are independent of each other, so you should be able to jump to the feature you need for the interface you use without reading another section. However, we recommend starting with the GUI general usage section if you use the GUI app, the CLI Overview section if you use the CLI application, and the API Introduction if you use the Python API. These sections provide a general overview and cover input and output formats, datatypes, and other general information on using the interface.

If you have questions, issues, or feedback about the app and its documentation, please refer to the support section. We also welcome collaboration and contribution.

tip

A dropdown menu at the top of each page allows you to navigate to different documentation sections for mobile reading. The sidebar on the right side of the page provides the same functionality for larger screen reading.

Quick Start ⏱️

Platform Support

Desktop

Platform | GUI | CLI
Linux
MacOS
Windows
Windows Subsystem for Linux (WSL)

Mobile

Platform | GUI | CLI
iOS
iPadOS
Android

CLI vs GUI vs API

For general users, the choice will be between the command-line (CLI) and graphical interface (GUI) applications. If you answer yes to any of the following questions, use SEGUL CLI. Otherwise, either GUI or CLI will help you.

  1. Are you planning to run the app in a non-GUI environment like an HPC cluster?
  2. Do you run SEGUL as part of a pipeline?
  3. Do you need the utmost efficiency?

You can also install both interfaces and use them interchangeably. Each operates in its own environment and does not interfere with the other. Depending on your workflow, you can take advantage of the strengths of either one. Learn more about the differences between the CLI and GUI versions here.

For developers, SEGUL is available as a Rust crate, which you can integrate into Rust code. For Python, a binding to the SEGUL API called PySEGUL is available, and using it does not require Rust knowledge. The Rust crate also works with R, but using it in R requires some Rust knowledge.

Learn more about using SEGUL API here.

tip

If you write a pipeline in Python, we recommend using PySEGUL. It is available in the Python Package Index (PyPI), which will simplify dependency management.

Installation

We recommend downloading GUI apps from your operating system's official store. It is a one-click installation. The fastest installation route for CLI is using the pre-compiled binaries. If you are familiar with installing single executable CLI apps, download the latest release using the links below. The installation guide provides more detailed instructions.

Platform | Link | Description
Linux ARM64 | download | For Linux on ARM64 architecture (uncommon).
Linux x86_64 (static) | download | For Linux with old GLIBC version. Common in HPC clusters.
Linux x86_64 | download | For Linux with modern GLIBC version. Most recent Linux distribution, including WSL.
MacOS ARM64 | download | For MacOS on Apple M series.
MacOS x86_64 | download | For MacOS on Intel.
Windows x86_64 | download | Most Windows devices. It may work on ARM Windows as well.

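If you script the download step, the table above can be turned into a small lookup. The following is a minimal sketch, not part of SEGUL: the function name pick_build, the old_glibc flag, and the machine-name aliases are assumptions made for illustration, and the returned labels are simply the row names from the table.

```python
import platform

# Row labels copied from the download table above.
BUILDS = {
    ("linux", "arm64"): "Linux ARM64",
    ("linux", "x86_64"): "Linux x86_64",
    ("darwin", "arm64"): "MacOS ARM64",
    ("darwin", "x86_64"): "MacOS x86_64",
    ("windows", "x86_64"): "Windows x86_64",
}

# Common names reported by platform.machine() (assumed; extend as needed).
MACHINE_ALIASES = {"aarch64": "arm64", "amd64": "x86_64"}


def pick_build(system=None, machine=None, old_glibc=False):
    """Return the table row matching this machine, or None if unlisted."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    machine = MACHINE_ALIASES.get(machine, machine)
    label = BUILDS.get((system, machine))
    if label == "Linux x86_64" and old_glibc:
        # The static build targets Linux systems with an old GLIBC version.
        label = "Linux x86_64 (static)"
    return label


# Report the row for the machine running this script.
print(pick_build())
```

The sketch returns None for combinations the table does not list, so the caller can fall back to the installation guide.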
tip

If you use conda on Linux or MacOS, you can install SEGUL with it. Ensure you have the bioconda channel set up before installing SEGUL. To install SEGUL, use the following command:

conda install bioconda::segul

Or if you use mamba:

mamba install segul

Note that the Conda installation may not work in old Linux distributions often found in HPC clusters. Learn more about the installation using Bioconda here.

Usage Overview

SEGUL has a growing list of features to help you manipulate and summarize your phylogenomic datasets.

Feature | Quick Link
Alignment concatenation | CLI / GUI
Alignment conversion | CLI / GUI
Alignment filtering | CLI / GUI
Alignment splitting | CLI / GUI
Alignment partition conversion | CLI / GUI
Alignment summary statistics | CLI / GUI
Genomic summary statistics | CLI / GUI
Sequence extraction | CLI / GUI
Sequence filtering | CLI / GUI feature in development
Sequence ID extraction | CLI / GUI
Sequence ID mapping | CLI / GUI
Sequence ID renaming | CLI / GUI
Sequence removal | CLI / GUI
Sequence translation | CLI / GUI

Supported File Formats

Supported input formats for Genomic tasks:

File Format | Description | Supported extensions
FASTQ | For genomic read summary statistics. Supports compressed and uncompressed formats. | .fastq, .fq, .fastq.gz, .fq.gz
FASTA | For contig summary statistics. | .fasta, .fa, .fna, .fsa, .fas

Supported input and output file formats for Alignment and Sequence tasks:

File Format | Description | Supported extensions
FASTA | Include support for interleaved format. | .fasta, .fa, .fna, .fsa, .fas
PHYLIP | Support relaxed-PHYLIP only. Include support for interleaved format. Learn the differences here. | .phy, .phylip, .ph
NEXUS | Include support for interleaved format. | .nexus, .nex, .nxs

Supported input and output partition formats:

File Format | Description | Supported extensions
RAxML | RAxML partition file. | .txt, .part, .partition
NEXUS | NEXUS partition file. | .nexus, .nex, .nxs

info

SEGUL CLI can handle non-standard file extensions that are not listed above. Use the --format option to set the input format. The GUI version will not allow inputting non-standard file extensions. You can change the file extension to one of the supported extensions or use the CLI version.
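To check ahead of time which files would need the --format option or a renamed extension, you can compare file names against the extensions listed above. This is a minimal sketch and not SEGUL's own detection logic: the function name guess_format is hypothetical, the format name 'phylip' is assumed, and matching extensions case-insensitively is also an assumption.

```python
from pathlib import Path

# Extensions copied from the alignment and sequence table above.
EXTENSIONS = {
    "fasta": {".fasta", ".fa", ".fna", ".fsa", ".fas"},
    "phylip": {".phy", ".phylip", ".ph"},
    "nexus": {".nexus", ".nex", ".nxs"},
}


def guess_format(path):
    """Return 'fasta', 'phylip', or 'nexus', or None for other extensions."""
    # Lowercasing the suffix is an assumption made for this sketch.
    suffix = Path(path).suffix.lower()
    for name, suffixes in EXTENSIONS.items():
        if suffix in suffixes:
            return name
    return None


# Hypothetical file names; None marks a non-standard extension.
for name in ["locus1.nex", "locus2.phy", "locus3.aln"]:
    print(name, "->", guess_format(name))
```

Files that map to None are the ones to pass through the CLI with --format or to rename before loading them in the GUI.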

Example Dataset

We provide sample datasets to help you get started or test the app if you find any issues using your datasets. The datasets include small alignments in SEGUL-supported formats. Due to the large file sizes, we cannot provide genomic datasets. You can download the genomic data from public repositories, such as NCBI SRA.

Dataset | Link
FASTA | Download
NEXUS | Download
Relaxed-PHYLIP | Download
Interleaved relaxed-PHYLIP | Download
Concatenated NEXUS | Download

GUI Usage

  1. Open the app.
  2. Use the navigation bar to select the feature you want to use. For example, to concatenate alignments, click the "Alignments" button.
  3. Use the dropdown menu to select the task. For alignment concatenation, select "Concatenate alignments."
  4. Click the "Add file" button to add the input files. You can also input a directory on desktop platforms by clicking the "Add directory" button. The app will look for files matching the directory.
  5. The input tab bar displays all the input files. You can remove a file by clicking the "Remove" button. Removing a file will only remove it from the input list, not the file system.
  6. Click the "Add output directory" button to add the output directory. On mobile platforms, this directory will be the app's default directory.
  7. You also need to add the parameters for some tasks. For example, you must add the filtering parameters to filter alignments.
  8. Click the "Run" button to start the task.
  9. Once done, the app will display the output in the output tab bar. You can also tap the file to open it in the app file viewer. The current version only supports plain text and comma-separated (CSV) data.
  10. You can also share the output. There are two share options. The quick share will create a zip file containing the output and share it using the system share sheet. You can also share individual files by clicking the share button on the output viewer.
warning

The mobile version of the GUI has limited capabilities in handling many files. Find out more in the guideline for mobile users.

CLI Usage

The segul command is structured as follows:

segul <command> <subcommand> --option1 value1 --option2 value2

For example, to concatenate alignments:

segul align concat --dir alignments/ --output aln-concat

Some arguments are required, while others are optional. For example, the --output option is optional. The app will use the default output directory if you do not provide it. The command below will concatenate all the alignments in the alignments/ directory and save the output in the Align-Concat directory.

segul align concat --dir alignments/
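If you call the CLI from a Python pipeline, the two commands above can be assembled as argument lists and passed to subprocess. This is a minimal sketch under the assumption that the segul executable is on your PATH; the function names build_concat_command and run_concat are hypothetical, and only the flags shown above are used.

```python
import shutil
import subprocess


def build_concat_command(input_dir, output=None):
    """Build the argument list for `segul align concat`."""
    command = ["segul", "align", "concat", "--dir", input_dir]
    if output is not None:
        # --output is optional; without it SEGUL uses its default directory.
        command += ["--output", output]
    return command


def run_concat(input_dir, output=None):
    """Run the command if segul is on PATH; raise if it exits with an error."""
    if shutil.which("segul") is None:
        raise FileNotFoundError("segul executable not found on PATH")
    return subprocess.run(build_concat_command(input_dir, output), check=True)


# Show the command line without running it.
print(" ".join(build_concat_command("alignments/", "aln-concat")))
```

Passing a list rather than a single string avoids shell quoting issues when directory names contain spaces.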

We recommend using segul --help to see the available commands, segul <command> --help to see the available subcommands for each command, and segul <command> <subcommand> --help to see the available options for each subcommand.

Please see the command usage section for more detailed usage information.

tip

Since version 0.19.0, you don't need to specify the input format. The app automatically detects the format based on the file extension. Earlier versions only auto-detect the format for non-directory input. If you have non-standard file extensions, use the --format option to set the input format regardless of the version.

warning

The commands below expect SEGUL version 0.19.0+. If you use a lower version, the subcommand becomes the command, and the dir input option requires a specific input format.

For example:

segul align concat -d <input-directory>

In versions lower than 0.19.0, the command above will be:

segul concat -d <input-directory> -f <input-format>
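A script that must run against different SEGUL versions can choose between the two command shapes described in this warning. The sketch below is illustrative only: the function names are hypothetical, the version is passed in as a string rather than queried from the app, and the value given to -f (for example 'nexus') is an assumption.

```python
def parse_version(text):
    """Turn a version string such as '0.19.0' into (0, 19, 0)."""
    return tuple(int(part) for part in text.split("."))


def concat_command(version, input_dir, input_format=None):
    """Return the concatenation command for the given SEGUL version."""
    if parse_version(version) >= (0, 19, 0):
        # 0.19.0+: command plus subcommand, format detected from extensions.
        return ["segul", "align", "concat", "-d", input_dir]
    if input_format is None:
        raise ValueError("versions below 0.19.0 need an input format with -d")
    # Older versions: the subcommand is the command, and -f is required.
    return ["segul", "concat", "-d", input_dir, "-f", input_format]


print(concat_command("0.19.0", "alignments/"))
print(concat_command("0.18.1", "alignments/", "nexus"))
```

Comparing tuples of integers avoids the string-comparison pitfall where "0.9.0" would sort after "0.19.0".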

Feature | Commands
Alignment concatenation | segul align concat -d <input-directory>
Alignment conversion | segul align convert -d <input-directory>
Alignment filtering | segul align filter -d <input-directory> <filtering-options>
Alignment splitting | segul align split -d <input-directory>
Alignment partition conversion | segul partition convert -d <input-directory>
Alignment summary statistics | segul align summary -d <input-directory>
Contig summary statistics | segul contig summary -d <input-directory>
Read summary statistics | segul read summary -d <input-directory>
Sequence extraction | segul sequence extract -d <input-directory> <extraction-options>
Sequence filtering | segul sequence filter -d <input-directory> <filtering-options>
Sequence ID extraction | segul sequence id -d <input-directory>
Sequence ID mapping | segul sequence id --map -d <input-directory>
Sequence ID renaming | segul sequence rename -d <input-directory>
Sequence removal | segul sequence remove -d <input-directory>
Sequence translation | segul sequence translate -d <input-directory>
Main help | segul --help
Command help | segul <command> --help
Subcommand help | segul <command> <subcommand> --help

API Usage

SEGUL API is available as a Rust crate. You can use it to develop your application or integrate it with other programming languages. The API is available on crates.io.

To add SEGUL API to your project, you can use the cargo add command:

cd my-project

cargo add segul

Or add manually in the Cargo.toml file:

[dependencies]
segul = "0.*"

If you want to use SEGUL API in Python, we provide a Python binding called PySEGUL. The library allows you to access SEGUL features like using any Python library. No Rust knowledge is needed. Install it using pip:

pip install pysegul

For example, to concatenate alignments:

import pysegul

def concat_alignments():
    input_dir = 'tests/data'
    input_format = 'nexus'
    datatype = 'dna'
    output_format = 'fasta'
    partition_format = 'raxml'
    prefix = 'concatenated'
    output_dir = 'tests/output'
    concat = pysegul.AlignmentConcatenation(
        input_format,
        datatype,
        output_dir,
        output_format,
        partition_format,
        prefix
    )
    concat.from_dir(input_dir)
    # Alternatively, input a list of files instead of a directory
    input_paths = ['tests/data/alignment1.nex', 'tests/data/alignment2.nex']
    concat.from_files(input_paths)
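When you want to concatenate only some files in a directory, you can build the list for from_files yourself. The following is a minimal sketch: collect_alignments and concat_selected are hypothetical helper names, and the PySEGUL calls simply reuse the constructor arguments and the from_files method shown in the example above.

```python
from pathlib import Path


def collect_alignments(input_dir, suffixes=(".nex", ".nexus", ".nxs")):
    """Return sorted paths (as strings) of files with the given suffixes."""
    root = Path(input_dir)
    return sorted(
        str(path)
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in suffixes
    )


def concat_selected(input_dir, output_dir):
    """Concatenate the collected NEXUS files, mirroring the example above."""
    # Imported here so collect_alignments works without PySEGUL installed.
    import pysegul

    concat = pysegul.AlignmentConcatenation(
        "nexus", "dna", output_dir, "fasta", "raxml", "concatenated"
    )
    concat.from_files(collect_alignments(input_dir))
```

Sorting the paths keeps the input order stable between runs, which makes the resulting partition file easier to compare across analyses.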

Most PySEGUL features follow the same code pattern. The exceptions are features that require specific parameters, such as alignment filtering, sequence extraction, and sequence removal. For these features, use a setter to input files or a directory, then call the method that matches the parameter you want to use to run the analysis. For example, to extract sequences using a regular expression:

import pysegul

def extract_sequences():
    input_dir = 'tests/align-data'
    input_format = 'nexus'
    datatype = 'dna'
    output_format = 'fasta'
    output_dir = 'tests/output'
    extract = pysegul.SequenceExtraction(
        input_format,
        datatype,
        output_dir,
        output_format,
    )
    extract.input_dir = input_dir
    extract.extract_regex("(?i)^(abce)")
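Before running an extraction, it can help to preview which sequence IDs a pattern would select. The sketch below uses Python's built-in re module on a hypothetical list of IDs. It assumes the pattern behaves the same way in Python as in SEGUL; SEGUL is written in Rust, so more complex patterns may differ in syntax.

```python
import re

# Same pattern as the example above: IDs that start with "abce", ignoring case.
PATTERN = re.compile(r"(?i)^(abce)")


def matching_ids(ids):
    """Return the IDs that the pattern would select."""
    return [seq_id for seq_id in ids if PATTERN.search(seq_id)]


# Hypothetical sequence IDs for illustration.
sample_ids = ["ABCE_001", "abce_sp2", "x_abce", "Abcd_003"]
print(matching_ids(sample_ids))  # ['ABCE_001', 'abce_sp2']
```

The (?i) prefix makes the match case-insensitive, and ^ anchors it to the start of the ID, which is why "x_abce" is not selected.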

Learn more about using PySEGUL here.

Additional Resources