Sequence Removal

Remove sequences based on their sequence IDs. This is the opposite of sequence extraction, and it is the faster choice when the sequences to be removed make up less than half of the total sequences. Available methods:

  • Regular expression
  • List of sequence IDs
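
To make the difference between the two methods concrete, the following pure-Python sketch mimics how each one might select IDs. It does not call PySEGUL. The helper names `match_by_regex` and `match_by_list` and the sample IDs are made up for illustration, the list method is assumed to match IDs exactly and case-sensitively, and PySEGUL's own regex engine may differ from Python's `re` in edge cases.

```python
import re

def match_by_regex(ids, pattern):
    """Return the IDs in which the pattern is found (re.search semantics)."""
    compiled = re.compile(pattern)
    return [seq_id for seq_id in ids if compiled.search(seq_id)]

def match_by_list(ids, targets):
    """Return the IDs that exactly equal one of the target IDs (assumed behavior)."""
    wanted = set(targets)
    return [seq_id for seq_id in ids if seq_id in wanted]

# Hypothetical sequence IDs, for illustration only
sample_ids = ["abce1", "ABCE2", "xyz_abce", "defg1"]

# Case-insensitive, anchored at the start of the ID
regex_hits = match_by_regex(sample_ids, "(?i)^(abce)")
# Exact matches only
list_hits = match_by_list(sample_ids, ["abce1", "abce2"])

print(regex_hits)  # ['abce1', 'ABCE2']
print(list_hits)   # ['abce1']
```

In this sketch the regular expression catches every ID that starts with "abce" regardless of case, while the list only catches IDs spelled exactly as given. A regular expression suits removing a whole group of related IDs; a list suits removing a few known ones.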

Steps

Install PySEGUL using pip if you have not done so already:

pip install pysegul

Create a new Python script, import the library, and write the Python code:

import pysegul

def remove_sequences():
    input_dir = 'tests/align-data'
    input_format = 'nexus'
    datatype = 'dna'
    output_format = 'fasta'
    output_dir = 'tests/output'
    remove = pysegul.SequenceRemoval(
        input_format,
        datatype,
        output_dir,
        output_format,
    )
    remove.input_dir = input_dir
    # Using the regular expression method
    remove.remove_regex("(?i)^(abce)")
    # Using the list of sequence IDs method
    remove.remove_list(['abce1', 'abce2'])
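
After the script runs, it can be useful to confirm that the unwanted IDs are really gone. The sketch below is one possible way to check this with the standard library only. It assumes the output directory holds plain-text FASTA files in which each record header starts with `>`. `fasta_ids` and `find_leftovers` are hypothetical helpers, not part of PySEGUL.

```python
from pathlib import Path

def fasta_ids(path):
    """Collect sequence IDs (first word after '>') from one FASTA file."""
    ids = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    ids.append(header.split()[0])
    return ids

def find_leftovers(output_dir, removed_ids):
    """Map each file name in output_dir to any removed IDs still present."""
    removed = set(removed_ids)
    leftovers = {}
    for path in sorted(Path(output_dir).iterdir()):
        if path.is_file():
            found = [seq_id for seq_id in fasta_ids(path) if seq_id in removed]
            if found:
                leftovers[path.name] = found
    return leftovers
```

For example, `find_leftovers('tests/output', ['abce1', 'abce2'])` should return an empty dictionary if none of the listed IDs remain in the output files. This check only covers exact IDs; for the regular expression method, the pattern would need to be applied to the IDs returned by `fasta_ids` instead.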