GSE30263, CTCF Binding Sites by ChIP-seq from ENCODE/University of Washington

Description

CTCF ChIP-seq data from multiple human cell lines. Last updated November 2013.

Source

Files downloaded from FTP site: ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP007/SRP007362
Input file format: SRA

Samples

From Human Feb. 2009 (GRCh37/hg19) Assembly

No. Filename Description Feature GEO-ID
1 GSM1022640.sga A549 CTCF rep1 CTCF GSM1022640
2 GSM1022639.sga A549 CTCF rep2 CTCF GSM1022639
3 GSM749695.sga AG04449 CTCF rep1 CTCF GSM749695
4 GSM749678.sga AG04449 CTCF rep2 CTCF GSM749678
5 GSM749769.sga AG04450 CTCF rep1 CTCF GSM749769
6 GSM1022635.sga AG04450 CTCF rep2 CTCF GSM1022635
7 GSM749750.sga AG09309 CTCF rep1 CTCF GSM749750
8 GSM749680.sga AG09309 CTCF rep2 CTCF GSM749680
9 GSM749728.sga AG09319 CTCF rep1 CTCF GSM749728
10 GSM749723.sga AG09319 CTCF rep2 CTCF GSM749723
11 GSM749714.sga AG10803 CTCF rep1 CTCF GSM749714
12 GSM749759.sga AG10803 CTCF rep2 CTCF GSM749759
13 GSM749666.sga AoAF CTCF rep1 CTCF GSM749666
14 GSM749736.sga AoAF CTCF rep2 CTCF GSM749736
15 GSM1022653.sga BE2_C CTCF rep1 CTCF GSM1022653
16 GSM1022650.sga BE2_C CTCF rep2 CTCF GSM1022650
17 GSM749677.sga BJ CTCF rep1 CTCF GSM749677
18 GSM749752.sga BJ CTCF rep2 CTCF GSM749752
19 GSM749748.sga Caco-2 CTCF rep1 CTCF GSM749748
20 GSM749689.sga Caco-2 CTCF rep2 CTCF GSM749689
21 GSM749708.sga GM06990 CTCF rep1 CTCF GSM749708
22 GSM749705.sga GM06990 CTCF rep2 CTCF GSM749705
23 GSM749711.sga GM12801 CTCF rep1 CTCF GSM749711
24 GSM749676.sga GM12864 CTCF rep1 CTCF GSM749676
25 GSM749762.sga GM12864 CTCF rep2 CTCF GSM749762
26 GSM1022664.sga GM12864 CTCF rep3 CTCF GSM1022664
27 GSM749740.sga GM12865 CTCF rep1 CTCF GSM749740
28 GSM749725.sga GM12865 CTCF rep2 CTCF GSM749725
29 GSM1022636.sga GM12865 CTCF rep3 CTCF GSM1022636
30 GSM849305.sga GM12866 CTCF rep1 CTCF GSM849305
31 GSM849301.sga GM12867 CTCF rep1 CTCF GSM849301
32 GSM849300.sga GM12868 CTCF rep1 CTCF GSM849300
33 GSM849303.sga GM12869 CTCF rep1 CTCF GSM849303
34 GSM849302.sga GM12870 CTCF rep1 CTCF GSM849302
35 GSM849304.sga GM12871 CTCF rep1 CTCF GSM849304
36 GSM749694.sga GM12872 CTCF rep1 CTCF GSM749694
37 GSM749692.sga GM12872 CTCF rep2 CTCF GSM749692
38 GSM1022633.sga GM12872 CTCF rep3 CTCF GSM1022633
39 GSM749730.sga GM12873 CTCF rep1 CTCF GSM749730
40 GSM749686.sga GM12873 CTCF rep2 CTCF GSM749686
41 GSM1022629.sga GM12873 CTCF rep3 CTCF GSM1022629
42 GSM749757.sga GM12874 CTCF rep1 CTCF GSM749757
43 GSM749741.sga GM12874 CTCF rep2 CTCF GSM749741
44 GSM749764.sga GM12875 CTCF rep1 CTCF GSM749764
45 GSM749670.sga GM12875 CTCF rep2 CTCF GSM749670
46 GSM749704.sga GM12878 CTCF rep1 CTCF GSM749704
47 GSM749706.sga GM12878 CTCF rep2 CTCF GSM749706
48 GSM749696.sga HA-sp CTCF rep1 CTCF GSM749696
49 GSM1022668.sga HA-sp CTCF rep2 CTCF GSM1022668
50 GSM1022661.sga HAc CTCF rep1 CTCF GSM1022661
51 GSM1022662.sga HAc CTCF rep2 CTCF GSM1022662
52 GSM749743.sga HBMEC CTCF rep1 CTCF GSM749743
53 GSM749710.sga HBMEC CTCF rep2 CTCF GSM749710
54 GSM749732.sga HCFaa CTCF rep1 CTCF GSM749732
55 GSM1022657.sga HCM CTCF rep1 CTCF GSM1022657
56 GSM1022677.sga HCM CTCF rep2 CTCF GSM1022677
57 GSM749735.sga HCPEpiC CTCF rep1 CTCF GSM749735
58 GSM749745.sga HCPEpiC CTCF rep2 CTCF GSM749745
59 GSM1022652.sga HCT-116 CTCF rep1 CTCF GSM1022652
60 GSM1022651.sga HCT-116 CTCF rep2 CTCF GSM1022651
61 GSM749712.sga HEEpiC CTCF rep1 CTCF GSM749712
62 GSM749726.sga HEEpiC CTCF rep2 CTCF GSM749726
63 GSM749668.sga HEK293 CTCF rep1 CTCF GSM749668
64 GSM749687.sga HEK293 CTCF rep2 CTCF GSM749687
65 GSM1022644.sga HFF CTCF rep1 CTCF GSM1022644
66 GSM1022671.sga HFF-Myc CTCF rep1 CTCF GSM1022671
67 GSM1022669.sga HFF-Myc CTCF rep2 CTCF GSM1022669
68 GSM749688.sga HL-60 CTCF rep1 CTCF GSM749688
69 GSM749753.sga HMEC CTCF rep1 CTCF GSM749753
70 GSM1022631.sga HMEC CTCF rep2 CTCF GSM1022631
71 GSM749665.sga HMF CTCF rep1 CTCF GSM749665
72 GSM749675.sga HMF CTCF rep2 CTCF GSM749675
73 GSM749681.sga HPAF CTCF rep1 CTCF GSM749681
74 GSM749751.sga HPAF CTCF rep2 CTCF GSM749751
75 GSM749699.sga HPF CTCF rep1 CTCF GSM749699
76 GSM749717.sga HPF CTCF rep2 CTCF GSM749717
77 GSM749727.sga HRE CTCF rep1 CTCF GSM749727
78 GSM749737.sga HRE CTCF rep2 CTCF GSM749737
79 GSM749673.sga HRPEpiC CTCF rep1 CTCF GSM749673
80 GSM1022665.sga HRPEpiC CTCF rep2 CTCF GSM1022665
81 GSM749674.sga HUVEC CTCF rep1 CTCF GSM749674
82 GSM749749.sga HUVEC CTCF rep2 CTCF GSM749749
83 GSM1022630.sga HVMF CTCF rep1 CTCF GSM1022630
84 GSM1022628.sga HVMF CTCF rep2 CTCF GSM1022628
85 GSM749729.sga HeLa-S3 CTCF rep1 CTCF GSM749729
86 GSM749739.sga HeLa-S3 CTCF rep2 CTCF GSM749739
87 GSM749715.sga HepG2 CTCF rep1 CTCF GSM749715
88 GSM749683.sga HepG2 CTCF rep2 CTCF GSM749683
89 GSM749690.sga K562 CTCF rep1 CTCF GSM749690
90 GSM749733.sga K562 CTCF rep2 CTCF GSM749733
91 GSM1022658.sga MCF-7 CTCF rep1 CTCF GSM1022658
92 GSM1022663.sga MCF-7 CTCF rep2 CTCF GSM1022663
93 GSM1022643.sga NB4 CTCF rep1 CTCF GSM1022643
94 GSM1022675.sga NHDF-neo CTCF rep1 CTCF GSM1022675
95 GSM1022676.sga NHDF-neo CTCF rep2 CTCF GSM1022676
96 GSM749707.sga NHEK CTCF rep1 CTCF GSM749707
97 GSM749747.sga NHEK CTCF rep2 CTCF GSM749747
98 GSM1022626.sga NHLF CTCF rep1 CTCF GSM1022626
99 GSM1022667.sga RPTEC CTCF rep1 CTCF GSM1022667
100 GSM1022666.sga RPTEC CTCF rep2 CTCF GSM1022666
101 GSM749684.sga SAEC CTCF rep1 CTCF GSM749684
102 GSM749779.sga SAEC CTCF rep2 CTCF GSM749779
103 GSM749693.sga SK-N-SH_RA CTCF rep1 CTCF GSM749693
104 GSM749667.sga SK-N-SH_RA CTCF rep2 CTCF GSM749667
105 GSM749768.sga WERI-Rb-1 CTCF rep1 CTCF GSM749768
106 GSM749679.sga WERI-Rb-1 CTCF rep2 CTCF GSM749679
107 GSM1022637.sga WI-38 CTCF rep1 CTCF GSM1022637
108 GSM1022634.sga WI-38 CTCF rep2 CTCF GSM1022634
109 GSM1022674.sga A549 Input rep1 Input GSM1022674
110 GSM749702.sga AG04449 Input rep1 Input GSM749702
111 GSM749697.sga AG04450 Input rep1 Input GSM749697
112 GSM749718.sga AG09309 Input rep1 Input GSM749718
113 GSM749671.sga AG09319 Input rep1 Input GSM749671
114 GSM749734.sga AG10803 Input rep1 Input GSM749734
115 GSM749772.sga AoAF Input rep1 Input GSM749772
116 GSM1022648.sga BE2_C Input rep1 Input GSM1022648
117 GSM749770.sga BJ Input rep1 Input GSM749770
118 GSM1022625.sga CD20+_RO01778 Input rep1 Input GSM1022625
119 GSM1022645.sga CD20+_RO01794 Input rep1 Input GSM1022645
120 GSM749691.sga Caco-2 Input rep1 Input GSM749691
121 GSM749731.sga GM06990 Input rep1 Input GSM749731
122 GSM749701.sga GM12801 Input rep1 Input GSM749701
123 GSM749754.sga GM12864 Input rep1 Input GSM749754
124 GSM749777.sga GM12865 Input rep1 Input GSM749777
125 GSM749765.sga GM12872 Input rep1 Input GSM749765
126 GSM749685.sga GM12873 Input rep1 Input GSM749685
127 GSM749742.sga GM12874 Input rep1 Input GSM749742
128 GSM749724.sga GM12875 Input rep1 Input GSM749724
129 GSM749669.sga GM12878 Input rep1 Input GSM749669
130 GSM1022649.sga H7-hESC Input rep1 Input GSM1022649
131 GSM749720.sga HA-sp Input rep1 Input GSM749720
132 GSM1022673.sga HAc Input rep1 Input GSM1022673
133 GSM749746.sga HBMEC Input rep1 Input GSM749746
134 GSM749713.sga HCF Input rep1 Input GSM749713
135 GSM749761.sga HCFaa Input rep1 Input GSM749761
136 GSM749738.sga HCM Input rep1 Input GSM749738
137 GSM749776.sga HCPEpiC Input rep1 Input GSM749776
138 GSM749774.sga HCT-116 Input rep1 Input GSM749774
139 GSM749698.sga HEEpiC Input rep1 Input GSM749698
140 GSM749767.sga HEK293 Input rep1 Input GSM749767
141 GSM1022627.sga HFF Input rep1 Input GSM1022627
142 GSM1022654.sga HFF-Myc Input rep1 Input GSM1022654
143 GSM749775.sga HL-60 Input rep1 Input GSM749775
144 GSM749755.sga HMEC Input rep1 Input GSM749755
145 GSM749763.sga HMF Input rep1 Input GSM749763
146 GSM749709.sga HPAF Input rep1 Input GSM749709
147 GSM749773.sga HPF Input rep1 Input GSM749773
148 GSM749778.sga HRE Input rep1 Input GSM749778
149 GSM749771.sga HRPEpiC Input rep1 Input GSM749771
150 GSM749758.sga HUVEC Input rep1 Input GSM749758
151 GSM749703.sga HVMF Input rep1 Input GSM749703
152 GSM749721.sga HeLa-S3 Input rep1 Input GSM749721
153 GSM749756.sga HepG2 Input rep1 Input GSM749756
154 GSM749672.sga Jurkat Input rep1 Input GSM749672
155 GSM749719.sga K562 Input rep1 Input GSM749719
156 GSM1022672.sga LNCaP Input rep1 Input GSM1022672
157 GSM749760.sga MCF-7 Input rep1 Input GSM749760
158 GSM1022659.sga Monocytes-CD14+_RO01746 Input rep1 Input GSM1022659
159 GSM749716.sga NB4 Input rep1 Input GSM749716
160 GSM749722.sga NHDF-neo Input rep1 Input GSM749722
161 GSM749744.sga NHEK Input rep1 Input GSM749744
162 GSM1022641.sga NHLF Input rep1 Input GSM1022641
163 GSM1022632.sga PANC-1 Input rep1 Input GSM1022632
164 GSM1022642.sga RPTEC Input rep1 Input GSM1022642
165 GSM749682.sga SAEC Input rep1 Input GSM749682
166 GSM1022638.sga SK-N-MC Input rep1 Input GSM1022638
167 GSM749700.sga SK-N-SH_RA Input rep1 Input GSM749700
168 GSM1022646.sga SKMC Input rep1 Input GSM1022646
169 GSM749766.sga WERI-Rb-1 Input rep1 Input GSM749766
170 GSM1022647.sga WI-38 Input rep1 Input GSM1022647
171 GSM1022655.sga WI-38 Input rep2 Input GSM1022655
172 GSM1022656.sga H7-hESC Input rep1 Input GSM1022656
173 GSM1022660.sga H7-hESC Input rep2 Input GSM1022660
174 GSM1022670.sga H7-hESC Input rep3 Input GSM1022670
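
As an illustration of how the table can be queried, the sketch below lists the SGA files for one cell line and feature. It assumes the table rows are supplied on standard input as whitespace-separated fields in the layout shown above (row number, filename, cell line, target, replicate, feature, GEO-ID). The function name sga_files_for is hypothetical and not part of any distributed tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original data set):
# print the SGA filenames for one cell line and one feature.
# Input: table rows on stdin, fields split on whitespace:
#   $1 row number, $2 filename, $3 cell line, $4 target,
#   $5 replicate, $6 feature, $7 GEO-ID
sga_files_for() {
    local cell="$1" feature="$2"
    awk -v cell="$cell" -v feature="$feature" \
        '$3 == cell && $6 == feature {print $2}'
}
```

For example, with the rows saved to a file (here assumed to be called samples.txt), `sga_files_for K562 Input < samples.txt` would print the input control file for K562.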

Technical Notes

SRA files were downloaded from GEO and processed using the following bash commands:

  1. Extract FASTQ from SRA file:
    fastq-dump SAMPLE.sra
    
  2. Map reads to genome using Bowtie:
    bowtie --best --strata -m1 --sam -l 36 -n 3 h_sapiens_ncbi36 -q SAMPLE.fastq > SAMPLE.sam
    
  3. Remove unmapped reads from the results:
    awk 'BEGIN {FS="\t"} $3 != "*" {print $0}' SAMPLE.sam > SAMPLE_clean.sam
    
  4. Make BAM file:
    samtools view -bS -o SAMPLE.bam SAMPLE_clean.sam
    
  5. Sort the BAM file:
    samtools sort SAMPLE.bam SAMPLE_sorted
    
  6. Make BED file:
    bamToBed -i SAMPLE_sorted.bam > SAMPLE.bed
    
  7. Make SGA file:
    bed2sga.pl -s hg18 -f FEATURE < SAMPLE.bed | sort -s -k1,1 -k3,3n -k4,4 | compactsga > SAMPLE.sga
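
The seven steps above can be chained for a single sample roughly as follows. This is a minimal sketch, not the script actually used: the function names drop_unmapped and process_sample, the argument order, and the assumption that all tools are on PATH are illustrative. The samtools sort call keeps the output-prefix form shown above; newer samtools releases may require `samtools sort -o SAMPLE_sorted.bam SAMPLE.bam` instead.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the commands listed above for one sample.
# Function names and argument order are assumptions.
set -euo pipefail

# Step 3 as a filter: keep SAM lines whose third field (RNAME) is not "*".
drop_unmapped() {
    awk 'BEGIN {FS="\t"} $3 != "*" {print $0}'
}

# Steps 1-7 for one sample.
#   $1 = sample name (expects $1.sra in the current directory)
#   $2 = feature label passed to bed2sga.pl (e.g. CTCF or Input)
process_sample() {
    local sample="$1" feature="$2"
    fastq-dump "${sample}.sra"
    bowtie --best --strata -m1 --sam -l 36 -n 3 h_sapiens_ncbi36 \
        -q "${sample}.fastq" > "${sample}.sam"
    drop_unmapped < "${sample}.sam" > "${sample}_clean.sam"
    samtools view -bS -o "${sample}.bam" "${sample}_clean.sam"
    samtools sort "${sample}.bam" "${sample}_sorted"
    bamToBed -i "${sample}_sorted.bam" > "${sample}.bed"
    bed2sga.pl -s hg18 -f "${feature}" < "${sample}.bed" \
        | sort -s -k1,1 -k3,3n -k4,4 | compactsga > "${sample}.sga"
}

# Run the pipeline only when called with two arguments,
# so the file can also be sourced for its functions.
if [ "$#" -eq 2 ]; then
    process_sample "$1" "$2"
fi
```

Saved under a name of your choice (say, process_sample.sh), it could be invoked as `bash process_sample.sh SAMPLE CTCF`.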
    

59 new samples were added to this data series in November 2013. They are: GSM1022625 GSM1022626 GSM1022627 GSM1022628 GSM1022629 GSM1022630 GSM1022631 GSM1022632 GSM1022633 GSM1022634 GSM1022635 GSM1022636 GSM1022637 GSM1022638 GSM1022639 GSM1022640 GSM1022641 GSM1022642 GSM1022643 GSM1022644 GSM1022645 GSM1022646 GSM1022647 GSM1022648 GSM1022649 GSM1022650 GSM1022651 GSM1022652 GSM1022653 GSM1022654 GSM1022655 GSM1022656 GSM1022657 GSM1022658 GSM1022659 GSM1022660 GSM1022661 GSM1022662 GSM1022663 GSM1022664 GSM1022665 GSM1022666 GSM1022667 GSM1022668 GSM1022669 GSM1022670 GSM1022671 GSM1022672 GSM1022673 GSM1022674 GSM1022675 GSM1022676 GSM1022677 GSM849300 GSM849301 GSM849302 GSM849303 GSM849304 GSM849305.
Note: sample names follow the UCSC schema as specified in the description file for this series. There are some discrepancies between UCSC names and GEO names in the replicate number.

References

  1. GEO series GSE30263 CTCF Binding Sites by ChIP-seq from ENCODE/University of Washington.
  2. Wang H, Maurano MT, Qu H, Varley KE et al.
    Widespread plasticity in CTCF occupancy linked to DNA methylation. Genome Res 2012 Sep;22(9):1680-8. PMID: 22955980