In principle, it can be applied to both in-vitro and in-vivo experiments.
Input data is a set of fixed-length sequences (in FASTA format) that are enriched with instances of a motif.
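Before submitting a file, it can be useful to confirm that all sequences really have the same length. The following is a minimal sketch in plain Python, not part of the tool itself; the function names `read_fasta` and `length_distribution` and the example records are hypothetical and chosen only for illustration.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_distribution(lines):
    """Count how many sequences occur at each sequence length."""
    return Counter(len(seq) for _, seq in read_fasta(lines))


# Hypothetical example records; read3 is shorter than the others.
example = [">read1", "ACGTACGTAC", ">read2", "TTGACGTCAA", ">read3", "ACGT"]
lengths = length_distribution(example)
is_fixed_length = len(lengths) == 1
```

A file satisfies the fixed-length assumption when `lengths` contains exactly one key. For a real file, pass an open file handle instead of the example list.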
Sequence files can be supplied by upload or by URL, in compressed or uncompressed form. Supported file extensions are:
Here, we briefly describe two main in-vitro technologies that are used to characterize the DNA-binding specificities of transcription factors (TFs): SMiLE-seq and HT-SELEX.
SMiLE-seq (Selective Microfluidics-based Ligand Enrichment followed by sequencing)
SMiLE-seq is a new technique for characterizing DNA-binding proteins faster, more accurately, and more efficiently. The core of SMiLE-seq is a microfluidic platform that involves capillary loading of in vitro-transcribed and -translated bait TFs, and target double-stranded DNA from a pool of random sequences. The TF is bound to the surface of the microfluidic device by antibodies, and some fraction of the DNA binds to the TF. The unbound DNA is removed by washing, so the bound fraction can be measured. Bound DNA is subjected to high-throughput sequencing and to a hidden Markov model (HMM)-based TF motif discovery pipeline for de novo identification of DNA-binding specificities and affinities of different families of full-length TFs and TF dimers.
HT-SELEX (High-Throughput Systematic Evolution of Ligands by EXponential Enrichment)
Using purified proteins to select high-affinity binding sites from random libraries in vitro is a very powerful technique. SELEX involves the binding of proteins to a mixture of oligonucleotides containing, in the first round, a random set of DNA sequences of typically 16-24 bp in length, flanked by primers that allow PCR amplification. Although higher-affinity sites have a higher probability of being bound by the TF, after a single selection most of the bound sequences are still low affinity, because low-affinity sequences greatly outnumber high-affinity sequences in the pool. To increase the fraction of high-affinity sites, the bound fraction can be amplified and rebound, and those steps repeated as many times as needed. Typically, after several rounds, the selected sites would be cloned and sequenced, often yielding fewer than 100 independent sites. Because sequencing methods are capable of read lengths much longer than a binding site, ligating several sites together and sequencing them all at once, a technique known as SELEX-serial analysis of gene expression (SELEX-SAGE), has made the method much more efficient. SELEX-SAGE is capable of obtaining thousands of sequenced binding sites and requires fewer rounds of selection.
By utilizing new sequencing technologies, it is possible to derive binding energy profiles from SELEX methods quite efficiently using a method called high-throughput SELEX (HT-SELEX). HT-SELEX consists of several cycles of incubating the DNA-binding protein with a mixture of DNA sequences, enriching the bound DNA sequences, sequencing a sample of them, and feeding them to the next cycle. An advantage of this approach is that the output (the number of counts observed for each sequence) is digital, and the data are deep enough to allow the use of sophisticated statistical methods that provide more accurate models of protein-DNA binding specificity than were previously available.
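The effect of repeated selection rounds can be illustrated with a toy calculation. The sketch below is a deliberately simplified two-class model, not the procedure used by any SELEX pipeline: it assumes the pool contains only "high-affinity" and "low-affinity" sequences, that each class is bound with a fixed probability, and that amplification preserves the composition of the bound fraction. The function name `enrich` and all numeric values are hypothetical.

```python
def enrich(frac_high, p_high, p_low, rounds):
    """Return the high-affinity fraction of the pool before round 1 and after each round.

    frac_high: initial fraction of high-affinity sequences in the pool (assumed)
    p_high:    probability that a high-affinity sequence is bound (assumed)
    p_low:     probability that a low-affinity sequence is bound (assumed)
    """
    history = [frac_high]
    for _ in range(rounds):
        bound_high = frac_high * p_high
        bound_low = (1.0 - frac_high) * p_low
        # Amplifying the bound fraction keeps its composition,
        # so the next pool has this share of high-affinity sequences.
        frac_high = bound_high / (bound_high + bound_low)
        history.append(frac_high)
    return history


# Illustrative values only: 1 in 1000 sequences is high affinity,
# and high-affinity sites are bound 50 times more often than low-affinity ones.
history = enrich(frac_high=0.001, p_high=0.5, p_low=0.01, rounds=4)
for round_number, fraction in enumerate(history):
    print(f"after round {round_number}: {fraction:.4f}")
```

Under these assumed values the bound pool is still dominated by low-affinity sequences after a single round, and high-affinity sequences only become the majority after further rounds, which matches the qualitative argument for repeating the selection.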