Sequence Extraction
Extract sequences from alignment files based on sequence IDs or a regular expression.
Steps
- Select the Sequences button from the navigation bar.
- Select Extract sequences from the dropdown menu.
- Add the input files by clicking the Add file button. On desktop platforms, you can also input a directory by clicking the Add directory button. The app will look for matching files in the directory.
- All the input files will be displayed in the input tab bar. You can remove a file by clicking the Remove button. Removing a file only removes it from the input list, not from the file system.
- Select input format (optional). See the supported file extensions for the list of supported extensions for alignment files.
- Add extraction parameters.
- Add the output directory by clicking the Add output directory button. On mobile platforms, the directory will be the default directory for the app.
- Click the Run button labeled Extract to start the task.
- Share the output (optional).
Parameters
The app allows you to extract sequences based on the following parameters:
Input ID in a file
The app allows you to input a file containing the sequence IDs to extract, and it will match sequence IDs against the entries in that file. The file should be in plain text format, with one sequence ID per line.
seq1
seq2
seq3
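As a rough illustration of how an ID file like the one above might be applied, the Python sketch below reads one ID per line and keeps only the matching records. It is not the app's implementation: the function names (`parse_id_list`, `extract_by_ids`), the exact-match comparison, and the in-memory dictionary of sequences are all assumptions made for this example.

```python
def parse_id_list(text):
    """Return the non-empty, whitespace-trimmed lines of an ID file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def extract_by_ids(sequences, wanted_ids):
    """Keep only records whose ID is in wanted_ids (exact match is an assumption)."""
    wanted = set(wanted_ids)
    return {seq_id: seq for seq_id, seq in sequences.items() if seq_id in wanted}


# Hypothetical alignment held in memory as {sequence ID: sequence}.
alignment = {"seq1": "ATGC", "seq2": "ATGA", "seq4": "ATGG"}

# Contents of the ID file shown above, one ID per line.
ids = parse_id_list("seq1\nseq2\nseq3\n")

extracted = extract_by_ids(alignment, ids)
print(extracted)
```

In this sketch, an ID that is listed in the file but absent from the alignment (here `seq3`) is simply skipped; how the app reports such cases is not covered here.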
Input semi-colon separated IDs
Enter the sequence IDs to extract on a single line, separated by semicolons. The app will match these IDs against the sequence IDs in the input files.
seq1;seq2;seq3
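The semicolon-separated form can be thought of as a compact version of the ID file. The following Python sketch shows one way such a string could be split into individual IDs. The helper name `parse_semicolon_ids` is hypothetical, and trimming whitespace and dropping empty entries are assumptions for illustration; the app may treat spaces or trailing semicolons differently.

```python
def parse_semicolon_ids(raw):
    """Split on semicolons, trimming whitespace and dropping empty entries."""
    return [part.strip() for part in raw.split(";") if part.strip()]


ids = parse_semicolon_ids("seq1;seq2;seq3")
print(ids)
```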
Write regular expression
The regular expression will be used to match the sequence ID. The syntax for the regular expression follows the Rust regex syntax. You can use regex101 to test your regular expression.
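Example:
^seq[0-9]+
will match all sequence IDs that start with seq followed by one or more digits.
^seq[0-9]{2}
will match all sequence IDs that start with seq followed by two digits. Because this pattern is not anchored at the end, an ID with further characters after the two digits can still match; add $ (^seq[0-9]{2}$) if nothing may follow them.
^rattus
will match all sequence IDs that start with rattus.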
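To check how the example patterns behave before running an extraction, a small script can be used alongside regex101. The sketch below uses Python's `re` module rather than the Rust regex engine the app relies on; for simple patterns like these the syntax is the same, but the two engines differ in more advanced features. The sample IDs and the helper `matching_ids` are hypothetical, and the unanchored search is an assumption about how the pattern is applied.

```python
import re

# Hypothetical sequence IDs used only for this illustration.
ids = ["seq1", "seq12", "seq123", "rattus_rattus", "Rattus_norvegicus", "myseq7"]


def matching_ids(pattern, candidates):
    """Return the IDs in which the pattern is found (unanchored search)."""
    regex = re.compile(pattern)
    return [seq_id for seq_id in candidates if regex.search(seq_id)]


one_or_more = matching_ids(r"^seq[0-9]+", ids)     # seq + one or more digits
two_digits = matching_ids(r"^seq[0-9]{2}", ids)    # seq + at least two digits
exactly_two = matching_ids(r"^seq[0-9]{2}$", ids)  # seq + two digits, nothing after
rattus = matching_ids(r"^rattus", ids)             # case-sensitive prefix

print(one_or_more, two_digits, exactly_two, rattus)
```

Note that `^rattus` is case-sensitive in this sketch, so `Rattus_norvegicus` is not selected, and that `^` prevents `myseq7` from matching the `seq` patterns.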
Output file
All matched sequences will be extracted from each input file. The app will create a new file for each input file. Set the format in the output section to determine the output format.