SNP2TFBS DATA - FILE FORMATS

-----------------------------------------------------------------------------
SNP2TFBS master file format (custom)
"snp2tfbs_JASPAR_CORE_2014_vert.txt.gz"
-----------------------------------------------------------------------------
1. rsID
2. Chromosome
3. SNP pos (ref)
4. Number of TFs affected by SNP
5-8. TF name, ref PWM score, alt PWM score, score difference (alt-ref)
9-. If column 4 > 1, an additional group of 4 columns, laid out like
    columns 5-8, is present for each further TF match.

-----------------------------------------------------------------------------
SNP2TFBS master file format (BED)
"snp2tfbs_JASPAR_CORE_2014_vert.bed.gz"
-----------------------------------------------------------------------------
1. Chromosome
2. SNP pos
3. SNP pos
4. ref allele
5. alt allele
6. rsID
7. Number of TFs affected by SNP
8. Names of TFs affected
9. Score difference

Columns 8 and 9 are sorted by the absolute score difference of the affected
TFs between the ref and alt genomes.

-----------------------------------------------------------------------------
SNP2TFBS master file format (annovar).
This may be used along with annovar to annotate variants of interest
"snp2tfbs_customAnnovar.txt.gz"
-----------------------------------------------------------------------------
1. Chromosome
2. SNP pos
3. SNP pos
4. ref allele
5. alt allele
6. Information: rsID; MATCH (number of TFs affected by the variant);
   TF (names of TFs affected); ScoreDiff (sorted on absolute score difference)

-----------------------------------------------------------------------------
"Custom" output format for individual TFs (Single PWM format)
"directory: mapped_files/custom*"
-----------------------------------------------------------------------------
1. rsID
2. Chromosome
3. SNP pos (reference)
4. SNP pos (alternate)
5. ref-allele
6. alt-allele
7. ref PWM match start position (. if absent)
8. ref PWM match end position (. if absent)
9. ref PWM match seq (. if absent)
10. ref PWM match score (. if absent)
11. alt PWM match start position (. if absent)
12. alt PWM match end position (. if absent)
13. alt PWM match seq (. if absent)
14. alt PWM match score (. if absent)
15. strand
16. score difference alt-ref (if a match is absent, the low score threshold
    is used in place of its score)
17. flag 1/0 (interesting SNP - 1 means one score >= high score threshold)

-----------------------------------------------------------------------------
"Annotated" files format for individual TFs (Single PWM format)
These are annotated for refGene using annovar
"directory: mapped_files/annotated"
-----------------------------------------------------------------------------
1. Chromosome
2. SNP position
3. ref allele
4. alt allele
5. Functional annotation of gene
6. Gene name
7. Gene details (. if absent)
8. ExonicFunc.refGeneAAChange (. if absent)
9. rsID

-----------------------------------------------------------------------------
"bed" format for individual TFs (Single PWM format)
"directory: mapped_files/bed"
-----------------------------------------------------------------------------
1. Chromosome
2. SNP position
3. SNP position
4. rsID
5. Allele change (ref>alt)

-----------------------------------------------------------------------------
"sga" format for individual TFs (Single PWM format)
"directory: mapped_files/sga"
-----------------------------------------------------------------------------
1. Chromosome
2. TF name
3. SNP position
4. Strand ("0")
5. Count
6. rsID
7. ref allele
8. alt allele
9. ref PWM match score (. if absent)
10. alt PWM match score (. if absent)
11. Strand of ref PWM

Refer to https://epd.expasy.org/chipseq/documents.php for details on the
"sga" format.
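
Example: reading the variable-width master file (custom)

The master custom format described at the top of this document has a variable number of columns per row, so a fixed-width table reader may not handle it well. The sketch below shows one possible way to split a row into its per-TF groups of four columns. It is not part of the SNP2TFBS distribution. It assumes the file is tab-delimited, that every score is numeric, and that each TF occupies its own group of four columns as described for columns 5-8; the names TFHit, SNPRecord and parse_master_line are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class TFHit(NamedTuple):
    """One group of four columns (5-8, 9-12, ...) for a single TF."""
    tf_name: str
    ref_score: float
    alt_score: float
    score_diff: float  # alt - ref, as stated in the format description


class SNPRecord(NamedTuple):
    """Columns 1-3 plus the list of per-TF groups."""
    rsid: str
    chrom: str
    pos: int
    hits: List[TFHit]


def parse_master_line(line: str, sep: str = "\t") -> SNPRecord:
    """Parse one row of the master custom file.

    Assumption: fields are separated by `sep` (tab by default) and all
    scores are numeric. Adjust if the actual file differs.
    """
    fields = line.rstrip("\n").split(sep)
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 columns, got {len(fields)}")

    rsid, chrom = fields[0], fields[1]
    pos = int(fields[2])
    n_tf = int(fields[3])  # column 4: number of TFs affected by the SNP

    # Column 4 determines how many 4-column groups must follow.
    expected = 4 + 4 * n_tf
    if len(fields) != expected:
        raise ValueError(
            f"{rsid}: column 4 says {n_tf} TF(s), so {expected} columns "
            f"were expected, but {len(fields)} were found"
        )

    hits = []
    for i in range(n_tf):
        start = 4 + 4 * i
        name, ref, alt, diff = fields[start:start + 4]
        hits.append(TFHit(name, float(ref), float(alt), float(diff)))
    return SNPRecord(rsid, chrom, pos, hits)
```

To read the compressed file directly, the function can be applied to each line yielded by gzip.open(path, "rt") from the Python standard library. Checking the column count against column 4 is a deliberate choice: it fails loudly if the delimiter or layout assumption does not hold for a given file.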