Palindromes, homopolymers and simple repeats

Description

This series contains features computed directly from the genome sequence with ad hoc scripts, including short palindromes, homopolymers and simple repeats. Some of these features appear to be enriched or depleted in certain regions. For instance, palindromes are enriched in the regulatory regions of certain species, whereas simple repeats tend to be depleted in conserved non-coding regions.

Source

Samples

From H. sapiens (Feb 2009 GRCh37/hg19).

Sequence-derived:

#   Filename      Description                   Feature   GEO-ID
1   cg.sga        CpG dinucleotides             CG        -
2   wwwwww.sga    W-hexamers                    6W        -
3   ssssss.sga    S-hexamers                    6S        -
4   rrrrrr.sga    R(+)/Y(-)-hexamers            6R        -
5   mmmmmm.sga    M(+)/K(-)-hexamers            6M        -
6   aaaaaa.sga    hexa-homopolymers (aaaaaa)    aaaaaa    -
7   ababab.sga    3x2-repeats (ababab)          ababab    -
8   abcabc.sga    2x3-repeats (abcabc)          abcabc    -
9   abcxyz.sga    hexa-palindromes (abcxyz)     abcxyz    -
10  abcNxyz.sga   hepta-palindromes (abcNxyz)   abcNxyz   -
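
To make the feature classes in the table concrete, here is a minimal Python sketch of how such patterns could be located in a sequence. It is not the Perl code used to build this series. The letters W, S, R, Y, M and K are the standard IUPAC two-base classes. The remaining choices are assumptions made for illustration: "palindrome" is taken to mean a reverse-complement palindrome (xyz is the reverse complement of abc, with N any single base), repeat units made of a single base are excluded so that they do not overlap with homopolymers, offsets are 0-based, and all function names are hypothetical.

```python
# Standard IUPAC ambiguity codes for the two-base classes named in the table.
IUPAC = {"W": "AT", "S": "CG", "R": "AG", "Y": "CT", "M": "AC", "K": "GT"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _windows(seq, k):
    """Yield (0-based offset, window) for every k-mer of the upper-cased sequence."""
    s = seq.upper()
    for i in range(len(s) - k + 1):
        yield i, s[i:i + k]


def class_hexamers(seq, code, k=6):
    """Offsets of k-mers built only from the bases of one IUPAC class, e.g. 'W'."""
    allowed = set(IUPAC[code])
    return [i for i, w in _windows(seq, k) if set(w) <= allowed]


def homopolymers(seq, k=6):
    """Offsets of k-mers consisting of a single repeated base (aaaaaa)."""
    return [i for i, w in _windows(seq, k) if w[0] in "ACGT" and w == w[0] * k]


def tandem_repeats(seq, unit, copies):
    """Offsets of a unit of length `unit` repeated `copies` times in a row.

    unit=2, copies=3 corresponds to ababab; unit=3, copies=2 to abcabc.
    Assumption: single-base units are skipped to avoid reporting homopolymers.
    """
    hits = []
    for i, w in _windows(seq, unit * copies):
        motif = w[:unit]
        if set(motif) <= set("ACGT") and len(set(motif)) > 1 and w == motif * copies:
            hits.append(i)
    return hits


def palindromes(seq, arm=3, spacer=0):
    """Offsets of reverse-complement palindromes with two arms of length `arm`.

    spacer=0 corresponds to abcxyz, spacer=1 to abcNxyz (any central base).
    """
    hits = []
    for i, w in _windows(seq, 2 * arm + spacer):
        left, right = w[:arm], w[arm + spacer:]
        if set(left) <= set("ACGT") and right == revcomp(left):
            hits.append(i)
    return hits
```

For example, `palindromes("TTGAATTCAA")` reports the EcoRI-like site GAATTC at offset 2, and `palindromes("GAAGTTC", spacer=1)` reports a hepta-palindrome at offset 0. Overlapping windows are reported separately, so a run of seven A's yields two hexa-homopolymer hits; whether the original scripts merge such hits is not stated here.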

Notes on samples:

Technical Notes

The SGA files were generated with Perl scripts available from the MGA script archive at: https://epd.expasy.org/ftp/mga/scripts/
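
For orientation, the sketch below shows how a list of hits might be written out as SGA lines. It assumes the usual SGA layout of five tab-separated fields (sequence identifier, feature name, position, strand, count), 1-based positions, and "0" for unstranded features. The function names and the sequence identifier `chrTest` are hypothetical, and the code is written in Python for illustration; it is not taken from the Perl scripts in the archive.

```python
def sga_line(seq_id, feature, position, strand="0", count=1):
    """Format one SGA record as a tab-separated line (no trailing newline)."""
    if strand not in ("+", "-", "0"):
        raise ValueError("strand must be '+', '-' or '0'")
    return "\t".join([seq_id, feature, str(position), strand, str(count)])


def hits_to_sga(seq_id, feature, offsets, strand="0"):
    """Convert 0-based hit offsets to SGA lines sorted by 1-based position."""
    return [sga_line(seq_id, feature, off + 1, strand) for off in sorted(offsets)]


# Example: two unstranded CpG hits on a hypothetical sequence "chrTest".
lines = hits_to_sga("chrTest", "CG", [4, 0])
```

Stranded features such as R(+)/Y(-) would pass "+" or "-" as the strand; how the original scripts assign positions within a hit is not specified here.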

Last update: 1 Oct 2018