ENSEMBL67, ORF starts

Description

Transcription start sites (TSS) from the ENSEMBL database (release 67), downloaded from BioMart.

Source

Data have been downloaded from BioMart.
Input file format: Tab-delimited TXT

Samples

From Yeast Apr 2011 (NCBI3.1/sacCer3) Assembly

No.  Filename               Description               Feature  GEO-ID
1    sacCer3_ENSEMBL67.sga  ORF start from ENSEMBL67  TSS      -

Technical Notes

The following attributes have been selected:

  1. Ensembl Transcript ID
  2. Chromosome Name
  3. Strand
  4. Transcript Start (bp)
  5. Transcript End (bp)
  6. Gene Start (bp)
  7. Gene End (bp)
  8. Status (transcript)
  9. Status (gene)
  10. Associated Gene Name
Transcripts were then filtered according to the following rules:
  1. Transcript length > 0 (Transcript Start differs from Transcript End)
  2. The transcript lies on a full chromosome
Rule 1 and the conversion to SGA format can be achieved with the following awk command (note that the command itself does not enforce rule 2):

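awk -F '\t' '
# Plus strand: the TSS is Transcript Start (column 4)
$3 == "1"  && $4 != $5 {print "chr" $2 "\tTSS\t" $4 "\t+\t1\t" $10}
# Minus strand: the TSS is Transcript End (column 5); same "chr" prefix as above
$3 == "-1" && $4 != $5 {print "chr" $2 "\tTSS\t" $5 "\t-\t1\t" $10}
' biomart_output.txt | sort -s -k1,1 -k3,3n -k4,4 | compact_sga.pl > ENSEMBL.sga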
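To illustrate the strand handling, the following minimal sketch wraps the filter in a hypothetical shell function `to_sga` and feeds it three made-up BioMart rows (transcript IDs, coordinates, status values and gene names are illustrative only, not real ENSEMBL67 data). It applies the `chr` prefix on both strands. On the plus strand the TSS is Transcript Start (column 4). On the minus strand it is Transcript End (column 5), since the transcript is read in the opposite direction. The zero-length transcript is removed by rule 1. Sorting and `compact_sga.pl` are omitted here.

```shell
#!/bin/sh
# to_sga (hypothetical helper): reads a tab-delimited BioMart export with the
# ten attributes in the order listed above and prints unsorted SGA lines.
to_sga() {
  awk -F '\t' '
    # Plus strand: TSS is Transcript Start (column 4); rule 1 drops Start == End.
    $3 == "1"  && $4 != $5 { print "chr" $2 "\tTSS\t" $4 "\t+\t1\t" $10 }
    # Minus strand: TSS is Transcript End (column 5).
    $3 == "-1" && $4 != $5 { print "chr" $2 "\tTSS\t" $5 "\t-\t1\t" $10 }
  '
}

# Made-up rows: one plus-strand, one minus-strand, one zero-length transcript.
{
  printf 'T1\tI\t1\t335\t649\t335\t649\tKNOWN\tKNOWN\tGENE_A\n'
  printf 'T2\tI\t-1\t1807\t2169\t1807\t2169\tKNOWN\tKNOWN\tGENE_B\n'
  printf 'T3\tI\t1\t500\t500\t500\t500\tKNOWN\tKNOWN\tGENE_C\n'
} | to_sga
# Expected output (tab-separated):
# chrI  TSS  335   +  1  GENE_A
# chrI  TSS  2169  -  1  GENE_B
```

Each SGA line carries the chromosome, the feature name, the TSS position, the strand, a count of 1 and the associated gene name.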

The SGA file can then be converted into an FPS file using sga2fps.pl.

References

  1. Haider S, Ballester B, Smedley D, Zhang J, Rice P, Kasprzyk A.
    BioMart Central Portal--unified access to biological data. Nucleic Acids Res. 2009;37:W23-7. PMID: 19420058

Genome browser viewable files

None.