GSE32218, Histone Modifications in two cell types

Description

This track shows probable locations of the specified histone modifications in the given cell types, as determined by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq). Each experiment is associated with an input signal, which represents the control condition in which immunoprecipitation was performed with non-specific immunoglobulin in the same cell type. For each experiment (cell type vs. antibody), this track shows a graph of enrichment for histone modification (Signal), along with the sites that have the greatest evidence of histone modification, as identified by the PeakSeq algorithm (Peaks).

For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf

Overall design

Cells were grown according to the approved ENCODE cell culture protocols. For details on the chromatin immunoprecipitation protocol used, see Euskirchen et al. (2007), Rozowsky et al. (2009) and Auerbach et al. (2009).

DNA recovered from the precipitated chromatin was sequenced on the Illumina (Solexa) sequencing platform and mapped to the genome using the Eland alignment program. ChIP-seq data were scored based on sequence reads (length ~30 bp) that align uniquely to the genome. From the mapped tags, a signal map of ChIP DNA fragments (average fragment length ~200 bp) was constructed, where the signal height is the number of overlapping fragments at each nucleotide position in the genome.
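
The signal map described above can be illustrated with a small pile-up over BED reads. This is a minimal sketch, not the ENCODE implementation: the function name fragment_pileup is hypothetical, and it assumes 6-column BED input and that each read is extended from its 5' end to a fixed 200 bp fragment in the direction of its strand.

```shell
# Sketch: per-base fragment pile-up from 6-column BED reads on stdin.
# fragment_pileup and the fixed fragment length are illustrative assumptions.
fragment_pileup() {
  local frag_len="${1:-200}"
  awk -v flen="$frag_len" 'BEGIN { FS = OFS = "\t" }
  {
    # Extend each read from its 5-prime end to the assumed fragment length
    # (BED coordinates are 0-based, half-open).
    if ($6 == "-") { e = $3; s = e - flen; if (s < 0) s = 0 }
    else           { s = $2; e = s + flen }
    # Every base covered by the fragment adds one to the signal height.
    for (pos = s; pos < e; pos++) depth[$1 OFS pos]++
  }
  END { for (key in depth) print key, depth[key] }' | sort -k1,1 -k2,2n
}
```

Piping a BED file through `fragment_pileup 200` prints chromosome, position and signal height for every covered base. Holding every position in memory is only practical for small regions; a genome-wide map would need a streaming approach.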

For each 1 Mb segment of each chromosome, a peak height threshold was determined by requiring a false discovery rate <= 0.01 when comparing the number of peaks above that threshold to the number of peaks obtained from multiple simulations of a random null background with the same number of mapped reads (also accounting for the fraction of mappable bases for sequence tags in that 1 Mb segment). The number of mapped tags in a putative binding region is compared to the normalized number of mapped tags in the same region from an input DNA control, where the normalization is obtained by correlating tag counts in genomic 10 kb windows. Using a binomial test, only regions with a p-value <= 0.01 are considered significantly enriched compared to the input DNA control.
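
The comparison against the input control can be sketched as follows. This is one simple reading of the description, not the PeakSeq code: the function names are hypothetical, the scaling factor is assumed to be the least-squares slope through the origin of ChIP versus input counts over 10 kb windows, and the null hypothesis is assumed to be that a tag in the region is equally likely to come from ChIP or from the scaled input (p = 0.5).

```shell
# Sketch only: function names, arguments and the p = 0.5 null are assumptions.

# Least-squares slope through the origin of ChIP vs. input window counts.
# Reads two tab-separated columns (input_count, chip_count), one window per line.
input_scale_factor() {
  awk -F'\t' '{ sxy += $1 * $2; sxx += $1 * $1 }
    END { if (sxx > 0) printf "%.6g\n", sxy / sxx; else print "NA" }'
}

# One-sided binomial p-value P(X >= chip) with X ~ Binomial(n, 0.5),
# where n = ChIP tags + scaled input tags in the candidate region.
# Arguments: chip_count input_count scale_factor
enrichment_pvalue() {
  awk -v chip="$1" -v input="$2" -v alpha="$3" 'BEGIN {
    chip += 0
    scaled = int(input * alpha + 0.5)   # round scaled input to a whole tag count
    n = chip + scaled
    logc = 0                            # running log of the binomial coefficient C(n, i)
    tail = 0
    for (i = 0; i <= n; i++) {
      if (i > 0) logc += log(n - i + 1) - log(i)
      if (i >= chip) tail += exp(logc + n * log(0.5))
    }
    printf "%.6g\n", tail
  }'
}
```

For example, `enrichment_pvalue 40 10 1` evaluates a region with 40 ChIP tags against 10 input tags at a scale factor of 1; a region would be kept when the printed value is <= 0.01.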

Source

Files downloaded from FTP site: ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP008/SRP008276
Input file format: SRA

Samples

From Mouse Jul. 2007 (NCBI37/mm9) Assembly

#  Filename       Description        Feature  GEO-ID
1  GSM798324.sga  MEL H3K4me3        H3K4me3  GSM798324
2  GSM798328.sga  MEL H3K4me3        H3K4me3  GSM798328
3  GSM798327.sga  CH12 H3K4me3       H3K4me3  GSM798327
4  GSM798323.sga  MEL Input treated  Input    GSM798323
5  GSM798325.sga  MEL Input          Input    GSM798325
6  GSM798326.sga  CH12 Input         Input    GSM798326

Technical Notes

SRA files were downloaded from GEO and processed using the following bash commands:

  1. Extract FASTQ from SRA file:
    fastq-dump SAMPLE.sra
    
  2. Map reads to genome using Bowtie:
    bowtie --sam -l 36 -n 3 mm9 -q SAMPLE.fastq > SAMPLE.sam
    
  3. Remove unmapped reads (reference name "*") from the results:
    awk 'BEGIN {FS="\t"} $3 != "*" {print $0}' SAMPLE.sam > SAMPLE_clean.sam
    
  4. Make BAM file:
    samtools view -bS -o SAMPLE.bam SAMPLE_clean.sam
    
  5. Sort it:
    samtools sort SAMPLE.bam SAMPLE_sorted
    
  6. Make BED file:
    bamToBed -i SAMPLE_sorted.bam > SAMPLE.bed
    
  7. Make SGA file:
    bed2sga.pl -s mm9 -f FEATURE < SAMPLE.bed | sort -s -k1,1 -k3,3n -k4,4 | compactsga > SAMPLE.sga
    

References

  1. GEO series GSE32218: Histone Modifications by ChIP-seq from ENCODE/Stanford/Yale