


Outline

Tuiuiu removes from a sequence, or from a set of sequences, regions as large as possible that do not contain the repeats being searched for.

Tuiuiu is used as a preliminary step before applying a multiple local alignment tool.

Publication

Modeling and algorithmic details are provided in the following paper.
Please cite this paper if you use Tuiuiu.

P. Peterlongo, G. Sacomoto, A. Pereira do Lago, N. Pisanti, M.-F. Sagot
Lossless filter for multiple repeats with bounded edit distance
BMC Algorithms for Molecular Biology 2009, 4:3 doi:10.1186/1748-7188-4-3

Link: http://www.almob.org/content/4/1/3

Why choose Tuiuiu?

You can use Tuiuiu if you are searching for multiple repeats in one or several large sequences. You must have a fairly precise idea of the length of the repeats and of the degree of similarity between them.

Tuiuiu is convenient both for repeats occurring among several sequences and for repeats occurring several times inside a single sequence.

Why NOT choose Tuiuiu?

Tuiuiu is not convenient if you are looking for short repeats (with length below 30 bp), or for repeats with high degeneracy (distance greater than 15%). Moreover, it is not worth applying Tuiuiu to short sequences that can be treated directly by tools such as clustalw, meme, or glam.

In practice

You have a large sequence, or a large set of sequences, in which you are looking for repeats. You provide the sequence(s) to Tuiuiu and you specify the characteristics of the repeats you are looking for. After the computation, Tuiuiu gives some information about the process and the filtration results, and reports any errors that occurred. On success (the most frequent case!), Tuiuiu provides a fasta file containing the filtered sequences. These filtered sequences are ready to be used in any other multiple local alignment tool.

Example of use:

We have waited months for our set of sequences and, at last, they have arrived. In these sequences, we are looking for all approximate repeats of length 100 that have at least one occurrence in each sequence. Between any two occurrences of a repeat we want to allow at most 10 edit operations (insertion, deletion, or substitution).

Our set of sequences, let's say 20 sequences, is too big to be analyzed by classical tools, which would take too long. This is a typical case for Tuiuiu.

Submission form

In this example we suppose that, in a set of 20 DNA sequences, we want to find repeats of length 100 having an occurrence in at least 10 sequences. The edit distance allowed between each pair of occurrences of such a repeat is at most 10.
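To make the distance criterion concrete, here is a minimal Python sketch of the edit distance (the minimum number of insertions, deletions and substitutions needed to turn one string into another). It only illustrates what "at most 10 edit operations" means; it is not how Tuiuiu filters sequences, and the function names are chosen for this illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Return the minimum number of insertions, deletions and
    substitutions needed to transform a into b (Levenshtein distance)."""
    # previous[j] holds the distance between the current prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def within_distance(a: str, b: str, d: int) -> bool:
    """True if two occurrences differ by at most d edit operations."""
    return edit_distance(a, b) <= d
```

With d=10, two occurrences of length about 100 are accepted as copies of the same repeat when `within_distance(first, second, 10)` is true.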







A tuiuiu is a stork found in Central and South America.



Image from "http://www.dbi.ufms.br/ecopan/"


By kind permission of Ecologia de Campo III.



        Specify input data

  • You have to indicate whether you have a single sequence (check “Single”) or a set of sequences (check “Multiple”)
  • You may consider only the forward strand or both strands. Note that considering both strands increases the computation time. Choose between
    • only forward strand
    • both strands - forward and reverse complement
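
For readers unfamiliar with the term, the reverse complement of a DNA strand can be sketched in a few lines of Python. This is only an illustration of the "both strands" option; the function name is hypothetical and the sketch handles only the letters A, C, G, T and N.

```python
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence: str) -> str:
    """Complement every base, then read the result backwards."""
    return sequence.translate(_COMPLEMENT)[::-1]
```

Searching both strands amounts to looking for repeats in the sequence and in its reverse complement, which is why this option increases the computation time.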

       Upload the sequences, either by giving a file name or by pasting the sequences. The sequences must be in fasta format. The sequences used for this example are 20 random sequences, each containing an occurrence of a repeat of length 100, with occurrences pairwise distant by at most 10 edit operations (substitutions, insertions and deletions).
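
A test set of this kind can be produced with a short Python script. The sketch below is one possible way to do it, not the script used to build the example data: the helper names, the background length of 1000 and the random seed are all assumptions. Each planted copy receives at most 5 edits from a common motif, so any two copies are at most 10 edit operations apart.

```python
import random

ALPHABET = "ACGT"


def random_dna(length, rng):
    """Return a uniformly random DNA string of the given length."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def mutate(motif, max_edits, rng):
    """Apply at most max_edits random edit operations to motif.
    Assumes the motif is longer than max_edits, so it never becomes empty."""
    chars = list(motif)
    for _ in range(rng.randint(0, max_edits)):
        operation = rng.choice(("substitution", "insertion", "deletion"))
        position = rng.randrange(len(chars))
        if operation == "substitution":
            chars[position] = rng.choice(ALPHABET)
        elif operation == "insertion":
            chars.insert(position, rng.choice(ALPHABET))
        else:
            del chars[position]
    return "".join(chars)


def make_test_set(n_sequences=20, sequence_length=1000,
                  repeat_length=100, max_edits_per_copy=5, seed=0):
    """Build (name, sequence) pairs, each hiding one mutated copy of a motif."""
    rng = random.Random(seed)
    motif = random_dna(repeat_length, rng)
    records = []
    for index in range(n_sequences):
        background = random_dna(sequence_length, rng)
        position = rng.randrange(sequence_length + 1)
        copy = mutate(motif, max_edits_per_copy, rng)
        sequence = background[:position] + copy + background[position:]
        records.append((f"seq{index + 1}", sequence))
    return records


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as fasta text with wrapped lines."""
    lines = []
    for name, sequence in records:
        lines.append(f">{name}")
        lines.extend(sequence[i:i + width]
                     for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"
```

The text returned by `to_fasta(make_test_set())` can be saved to a file and uploaded, or pasted directly into the form.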


  • You can either paste a sequence or indicate a file to upload. Moreover, Mobyle has a bookmark system, which makes it possible to reuse the same file several times or to use the output of any Mobyle program as input.



    Specify the parameters

  • You have to choose between version “good” and version “excellent”. In short, “excellent” is slower but filters better, while “good” is faster but filters less. See the paper for details.

  • L, r, d and k. In this example we look for repeats whose occurrences are of length L=100, occurring in at least r=10 sequences (or at least r=10 times in the sequence if you chose “Single”), with a maximal edit distance of d=10.

    Furthermore, we keep the default value of the k parameter, which provides a good balance between specificity and speed.
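
Before submitting, it can help to check the chosen values against the rules of thumb given on this page (repeats shorter than 30 bp, or a distance above 15% of the length, are not a good fit). The Python sketch below does just that; it is a hypothetical helper written for this tutorial, not part of Tuiuiu or Mobyle, and the check on r against the number of sequences is an added assumption for the “Multiple” case.

```python
def check_parameters(L, r, d, n_sequences=None):
    """Return a list of warnings for parameter values that fall outside
    the range where this page says Tuiuiu is convenient."""
    warnings = []
    if L < 30:
        warnings.append(f"L={L}: repeats shorter than 30 bp are not a good fit")
    if d > 0.15 * L:
        warnings.append(f"d={d}: distance exceeds 15% of the repeat length L={L}")
    # Assumption: with several input sequences, r cannot exceed their number.
    if n_sequences is not None and r > n_sequences:
        warnings.append(f"r={r}: more than the {n_sequences} input sequences")
    return warnings
```

For the example of this page, `check_parameters(100, 10, 10, n_sequences=20)` returns an empty list: the distance is 10% of the length and r does not exceed the number of sequences.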



        Choose the output format

  • Tuiuiu can
    • Replace each removed character by an 'N'

    • Replace each run of consecutive removed characters by a single 'N'

    • Not replace removed characters (the kept parts are concatenated)
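
The difference between the three output options can be illustrated on a toy sequence. The following Python sketch is only an illustration: the function names, the mode labels and the 0-based half-open interval convention are assumptions made here, not Tuiuiu's actual interface.

```python
def _placeholder(count, mode):
    """Text that stands in for a run of `count` removed characters."""
    if mode == "each":   # one 'N' per removed character
        return "N" * count
    if mode == "run":    # a single 'N' per run of removed characters
        return "N"
    if mode == "none":   # nothing: kept parts are concatenated
        return ""
    raise ValueError(f"unknown mode: {mode!r}")


def apply_filter(sequence, kept, mode):
    """Render a filtered sequence.
    kept: sorted, non-overlapping (start, end) intervals, 0-based, end excluded."""
    pieces = []
    cursor = 0
    for start, end in kept:
        if start > cursor:
            pieces.append(_placeholder(start - cursor, mode))
        pieces.append(sequence[start:end])
        cursor = end
    if cursor < len(sequence):
        pieces.append(_placeholder(len(sequence) - cursor, mode))
    return "".join(pieces)
```

For the sequence `AAACCCGGGTTT` with kept intervals `[(3, 6), (9, 12)]`, the three modes give `NNNCCCNNNTTT`, `NCCCNTTT` and `CCCTTT` respectively. The first option preserves the original coordinates, while the last one gives the shortest output.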





    email (mandatory)

You have to provide an email address to run jobs. If the computation takes more than 10 seconds, an email is sent to you.


       Then, run!





Results page




After you press “Run”, you will reach the results page. Remember that if the computation takes
more than 10 seconds, you will receive an email linking to the results page.

If a problem occurred, you will find a field describing what happened.
Otherwise, if all went well, you will find the following fields:

  • Information about the process, covering:

    • Input data

    • Parameters used

    • Positions conserved - between '[' and ']' -

    • Computation information (size and percentage kept by Tuiuiu, execution time)

  • Filtered sequences in fasta format. Note that this file can be bookmarked for further use, either by Tuiuiu or by any program hosted on the Mobyle server. Of course you may also download this file.

  • A Job archive link that lets you download a zip archive containing all results presented on the results page.
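
Once downloaded, the filtered fasta file can be read with a few lines of Python, for example to recompute the percentage kept before passing the sequences to another tool. This is a minimal sketch with hypothetical function names; it assumes that the original sequences contain no 'N' of their own, so that every 'N' in the filtered file is a placeholder for removed characters.

```python
def parse_fasta(text):
    """Parse fasta text into a dict mapping each header line to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def percentage_kept(original, filtered):
    """Percentage of the original characters still present after filtering.
    Assumption: every 'N' in the filtered records is a placeholder."""
    total = sum(len(sequence) for sequence in original.values())
    if total == 0:
        raise ValueError("the original records are empty")
    kept = sum(len(sequence) - sequence.upper().count("N")
               for sequence in filtered.values())
    return 100.0 * kept / total
```

The value computed this way can be compared with the percentage reported in the computation information field of the results page.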



For any remarks or questions, don't hesitate to send an email to Pierre Peterlongo pierre.peterlongo@irisa.fr