GSE31755, Histone Modifications by ChIP-seq


This track, produced as part of the ENCODE Project, displays genome-wide maps of histone modifications obtained by ChIP-seq in several cell lines. The ChIP-seq method first uses formaldehyde to cross-link histones and other DNA-associated proteins to genomic DNA within cells. The cross-linked chromatin is then extracted, sheared, and immunoprecipitated using specific antibodies. After the cross-links are reversed, the immunoprecipitated DNA is sequenced and mapped to the human reference genome. The relative enrichment of each antibody target (epitope) across the genome is inferred from the density of mapped fragments.

Chemical modifications (e.g., methylation or acetylation) of the histone proteins in chromatin influence gene expression by changing how accessible the chromatin is to transcription factors. For each experiment (defined as a particular antibody in a particular cell type), the track shows the enrichment of the specifically modified histone (Signal), along with the sites of greatest enrichment (Peaks). Also included for each cell type is the input signal, which represents the control condition in which no antibody targeting was performed. In general, the following modifications are associated with particular functional states of chromatin:

H3K4me3 and H3K9ac are considered to be marks of active or potentially active promoter regions. H3K4me1 and H3K27ac are considered to be marks of active or potentially active enhancer regions. H3K36me3 and H3K79me2 are considered to be marks of transcriptional elongation. H3K27me3 and H3K9me3 are considered to be marks of inactive regions.

For data usage terms and conditions, please refer to and

Overall design

Cells were grown according to the approved ENCODE cell culture protocols. Briefly, cells were cross-linked, chromatin was extracted and sonicated using a Bioruptor sonicator (Diagenode) to an average size of 300-500 bp, and individual ChIP assays were performed using antibodies to modified histones. For the K562 and Ntera2 histone ChIP-seq samples, immunoprecipitates were collected using protein G-coupled magnetic beads; a detailed ChIP and library protocol can be found at For the U2OS histone ChIP-seq samples, immunoprecipitates were collected using StaphA cells; a detailed protocol can be found at Library DNA was quantitated using either a Nanodrop or a BioAnalyzer and sequenced on an Illumina GA2.

The sequencing reads were mapped to the genome using the Eland alignment program. ChIP-seq data were scored using only sequence reads (length ~30 bp) that align uniquely to the human genome. From the mapped tags, a signal map of ChIP DNA fragments (average fragment length ~200 bp) was constructed, in which the signal height is the number of overlapping fragments at each nucleotide position in the genome.
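The signal map described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: it assumes each uniquely mapped read is extended in its 3' direction to the ~200 bp average fragment length (forward reads from their start, reverse reads back from their end) and counts overlapping fragments per nucleotide with a difference array. The function name `signal_map` and the BED-style `(start, end, strand)` input are hypothetical.

```python
from itertools import accumulate

FRAGMENT_LENGTH = 200  # average fragment length quoted in the text (~200 bp)


def signal_map(reads, chrom_length, fragment_length=FRAGMENT_LENGTH):
    """Return the number of overlapping fragments at each position of one chromosome.

    reads: iterable of (start, end, strand) tuples in 0-based, half-open (BED)
    coordinates. Assumption: each read is extended in its 3' direction to
    fragment_length bp; forward reads from their start, reverse reads back from
    their end. Fragments are clipped to the chromosome boundaries.
    """
    # Difference array: +1 where a fragment begins, -1 just past where it ends.
    diff = [0] * (chrom_length + 1)
    for start, end, strand in reads:
        if strand == "+":
            frag_start, frag_end = start, start + fragment_length
        else:
            frag_start, frag_end = end - fragment_length, end
        frag_start, frag_end = max(0, frag_start), min(chrom_length, frag_end)
        if frag_start < frag_end:
            diff[frag_start] += 1
            diff[frag_end] -= 1
    # A running sum over the difference array gives the per-nucleotide signal height.
    return list(accumulate(diff[:chrom_length]))
```

For example, `signal_map([(100, 130, "+"), (250, 280, "-")], 400)` gives a height of 2 between positions 100 and 279, where the two extended fragments overlap, and 1 where only one fragment covers the position. The difference array keeps the cost linear in reads plus chromosome length instead of adding 200 to the work for every read.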

For each 1 Mb segment of each chromosome, a peak height threshold was determined by requiring a false discovery rate <= 0.05, estimated by comparing the number of peaks above the threshold with the number obtained from multiple simulations of a random null background containing the same number of mapped reads (and accounting for the fraction of mappable bases for sequence tags in that 1 Mb segment). The number of mapped tags in each putative binding region is then compared to the number of mapped tags in the same region from an input DNA control, after normalizing the input counts by correlating ChIP and input tag counts in 10 kb genomic windows. Using a binomial test, only regions with a p-value <= 0.05 are considered significantly enriched relative to the input DNA control.
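The two statistical steps in the paragraph above (a simulated-null FDR threshold per segment, then a binomial test against normalized input) can be sketched in Python as below. This is a hedged illustration under stated assumptions, not the original scoring code: the null model places reads uniformly over a contiguous block whose length equals the segment's mappable bases, a "peak" is a maximal run of positions whose signal reaches the threshold, the input scaling factor is taken as the least-squares slope (through the origin) of ChIP versus input counts in 10 kb windows, and the binomial test assumes each pooled tag is equally likely (p = 0.5) to come from the ChIP or the scaled input. All function names are hypothetical.

```python
import math
import random
from itertools import accumulate


def coverage(fragment_starts, length, fragment_length):
    """Per-position count of fragments [start, start + fragment_length)."""
    diff = [0] * (length + 1)
    for start in fragment_starts:
        diff[start] += 1
        diff[min(length, start + fragment_length)] -= 1
    return list(accumulate(diff[:length]))


def count_peaks(signal, threshold):
    """Number of maximal runs of positions whose signal is >= threshold."""
    peaks, inside = 0, False
    for value in signal:
        above = value >= threshold
        if above and not inside:
            peaks += 1
        inside = above
    return peaks


def segment_threshold(observed_signal, n_reads, mappable_fraction,
                      fragment_length=200, n_sims=10, fdr=0.05, seed=0):
    """Smallest peak height whose simulated FDR is <= fdr, or None."""
    rng = random.Random(seed)
    # Assumption: the null spreads n_reads uniformly over a block whose length
    # equals the number of mappable bases in the segment.
    null_length = max(1, int(len(observed_signal) * mappable_fraction))
    null_signals = [
        coverage([rng.randrange(null_length) for _ in range(n_reads)],
                 null_length, fragment_length)
        for _ in range(n_sims)
    ]
    for height in range(1, max(observed_signal, default=0) + 1):
        observed_peaks = count_peaks(observed_signal, height)
        if observed_peaks == 0:
            break  # no higher threshold can yield observed peaks either
        expected_null = sum(count_peaks(s, height) for s in null_signals) / n_sims
        if expected_null / observed_peaks <= fdr:
            return height
    return None


def input_scaling_factor(chip_window_counts, input_window_counts):
    """Least-squares slope (through the origin) of ChIP vs input 10 kb window counts."""
    numerator = sum(c * i for c, i in zip(chip_window_counts, input_window_counts))
    denominator = sum(i * i for i in input_window_counts)
    return numerator / denominator if denominator else 1.0


def binomial_sf_half(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 0.5), using integer arithmetic."""
    return sum(math.comb(n, i) for i in range(max(k, 0), n + 1)) / 2 ** n


def region_pvalue(chip_tags, input_tags, scale):
    """One-sided binomial p-value for ChIP enrichment over scaled input."""
    scaled_input = round(input_tags * scale)
    n = chip_tags + scaled_input
    if n == 0:
        return 1.0
    return binomial_sf_half(chip_tags, n)
```

A region would then be kept when `region_pvalue(...) <= 0.05`. Exact integer sums avoid the float underflow that `0.5 ** n` hits for regions with more than about a thousand pooled tags. Pure-Python loops over a full 1 Mb segment are slow; in practice a NumPy coverage array and `scipy.stats.binomtest(k, n, 0.5, alternative="greater")` would be the more practical choice.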


Files downloaded from GEO series: GSE31755
Input file format: SRA


From Human Mar. 2006 (NCBI36/hg18) Assembly

#   Filename       Description                 Feature              GEO-ID
1   GSM788088.sga  K562 H3K27me3B              H3K27me3B            GSM788088
2   GSM788085.sga  K562 H3K4me1                H3K4me1              GSM788085
3   GSM788087.sga  K562 H3K4me3B               H3K4me3B             GSM788087
4   GSM788082.sga  K562 H3K9acB                H3K9acB              GSM788082
5   GSM788074.sga  K562 Input                  Input                GSM788074
6   GSM788071.sga  NT2-D1 H3K27me3B            H3K27me3B            GSM788071
7   GSM788081.sga  NT2-D1 H3K36me3B            H3K36me3B            GSM788081
8   GSM788083.sga  NT2-D1 H3K4me1              H3K4me1              GSM788083
9   GSM788072.sga  NT2-D1 H3K4me3B             H3K4me3B             GSM788072
10  GSM788086.sga  NT2-D1 H3K9acB              H3K9acB              GSM788086
11  GSM788080.sga  NT2-D1 H3K9me3              H3K9me3              GSM788080
12  GSM788077.sga  NT2-D1 Input                Input                GSM788077
13  GSM818826.sga  PANC-1 H3K27ac              H3K27ac              GSM818826
14  GSM818827.sga  PANC-1 H3K4me1_pAb-037-050  H3K4me1_pAb-037-050  GSM818827
15  GSM818828.sga  PANC-1 Input                Input                GSM818828
16  GSM788073.sga  PBMC H3K27me3B              H3K27me3B            GSM788073
17  GSM788084.sga  PBMC H3K4me1                H3K4me1              GSM788084
18  GSM788075.sga  PBMC H3K4me3B               H3K4me3B             GSM788075
19  GSM788079.sga  PBMC H3K9me3                H3K9me3              GSM788079
20  GSM788070.sga  PBMC Input                  Input                GSM788070
21  GSM788076.sga  U2OS H3K36me3B              H3K36me3B            GSM788076
22  GSM788078.sga  U2OS H3K9me3                H3K9me3              GSM788078
23  GSM788069.sga  U2OS Input                  Input                GSM788069

Technical Notes

SRA files were downloaded from GEO and processed using the following bash commands:

  1. Extract FASTQ from SRA file:
    fastq-dump SAMPLE.sra
  2. Map reads to genome using Bowtie:
    bowtie --best --strata -m1 --sam -l 36 -n 3 h_sapiens_ncbi36 -q SAMPLE.fastq > SAMPLE.sam
  3. Clean the results from unmapped reads:
    awk 'BEGIN {FS="\t"} $3 != "*" {print $0}' SAMPLE.sam > SAMPLE_clean.sam
  4. Make BAM file:
    samtools view -bS -o SAMPLE.bam SAMPLE_clean.sam
  5. Sort it by genomic coordinate:
    samtools sort -o SAMPLE_sorted.bam SAMPLE.bam
  6. Make BED file:
    bamToBed -i SAMPLE_sorted.bam > SAMPLE.bed
  7. Make SGA file:
    -s hg18 -f FEATURE < SAMPLE.bed | sort -s -k1,1 -k3,3n -k4,4 | compactsga > SAMPLE.sga
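Step 7 converts each BED read into an SGA line and then collapses reads that share a position with `compactsga`. Assuming SGA is a tab-separated format whose columns are sequence, feature, position, strand, and tag count (consistent with the `sort -k1,1 -k3,3n -k4,4` keys above), the conversion and compaction can be sketched in Python as below. The 1-based 5'-end position convention and the choice to keep chromosome names as-is (rather than translating them as the `-s hg18` option presumably does) are assumptions; `bed_to_sga_rows`, `compact`, and `to_sga_lines` are hypothetical names, not the real tools.

```python
from itertools import groupby


def bed_to_sga_rows(bed_lines, feature):
    """Yield (chrom, feature, position, strand, 1) for each BED6 read.

    Assumption: position is the read's 1-based 5' end (start + 1 on '+',
    end on '-'); chromosome names are passed through unchanged.
    """
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        strand = fields[5] if len(fields) > 5 else "+"
        position = start + 1 if strand == "+" else end
        yield (chrom, feature, position, strand, 1)


def compact(rows):
    """Merge rows with identical (chrom, feature, position, strand), summing counts.

    Sorting by chromosome, position, and strand mirrors the `sort` step that
    precedes compactsga, so equal keys end up adjacent for groupby.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[2], r[3]))
    for key, group in groupby(ordered, key=lambda r: r[:4]):
        yield (*key, sum(r[4] for r in group))


def to_sga_lines(rows):
    """Format rows as tab-separated SGA lines."""
    return ["\t".join(map(str, row)) for row in rows]
```

For a whole file, `to_sga_lines(compact(bed_to_sga_rows(open("SAMPLE.bed"), "FEATURE")))` would stand in for the shell pipeline; unlike the streaming `sort | compactsga` pipe, this sketch holds all rows in memory.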


  1. GEO series GSE31755 Histone Modifications by ChIP-seq from ENCODE/Stanford/Yale/Davis/Harvard.