Skip to content

Use chicken in vignettes/README #12

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jun 19, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 7 additions & 7 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: tfboot
Title: Bootstrapping and statistical analysis for TFBS-disrupting SNPs
Version: 0.1.0
Version: 0.2.0
Authors@R:
c(person(given = "Stephen",
family = "Turner",
Expand Down Expand Up @@ -30,14 +30,14 @@ Imports:
tibble,
tidyr
Suggests:
BSgenome.Cfamiliaris.UCSC.canFam3,
BSgenome.Ggallus.UCSC.galGal6,
TxDb.Ggallus.UCSC.galGal6.refGene,
org.Gg.eg.db,
AnnotationDbi,
knitr,
rmarkdown,
TxDb.Cfamiliaris.UCSC.canFam3.ncbiRefSeq
Remotes:
stephenturner/TxDb.Cfamiliaris.UCSC.canFam3.ncbiRefSeq
rmarkdown
VignetteBuilder: knitr
Depends:
R (>= 2.10)
R (>= 4.2.0)
LazyData: true
Language: en-US
2 changes: 1 addition & 1 deletion R/tfboot.R
Original file line number Diff line number Diff line change
Expand Up @@ -100,7 +100,7 @@ get_upstream_snps <- function(snps, txdb, level="genes", ...) {
stop("level must be either 'genes' or 'transcripts'")
}
# Get upstream intervals of those features
upstream <- get_upstream(features, ...)
upstream <- suppressWarnings(get_upstream(features, ...))
# Intersect with SNPs to get SNPs in those upstream regions
upstreamsnps <- plyranges::join_overlap_intersect(snps, upstream)
# Set an attribute to check later if needed
Expand Down
98 changes: 49 additions & 49 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -58,36 +58,36 @@ library(tfboot)
``` r
mbres <- vignettedata$mbres
mbres
#> # A tibble: 449 × 10
#> gene_id SNP_id tf pctRef pctAlt scoreRef scoreAlt effect alleleDiff alleleEffectSize
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
#> 1 ARHGAP30 38:21390808:T:C ACE2 0.754 0.957 3.73 4.71 strong 0.980 0.199
#> 2 ARHGAP30 38:21394731:T:C ASCL1 0.882 0.774 4.68 4.13 weak -0.558 -0.105
#> 3 ARHGAP30 38:21394731:T:C ASCL1 0.857 0.726 5.75 4.91 strong -0.846 -0.127
#> 4 ARHGAP30 38:21390808:T:C AT5G04390 0.861 0.692 4.29 3.48 strong -0.808 -0.163
#> 5 ARHGAP30 38:21394731:T:C Atoh1 0.997 0.848 6.19 5.29 strong -0.901 -0.145
#> 6 ARHGAP30 38:21390808:T:C BZIP52 0.765 0.860 4.98 5.57 weak 0.591 0.0918
#> 7 ARHGAP30 38:21394731:T:C Bhlha15 0.746 0.853 4.98 5.68 weak 0.696 0.105
#> 8 ARHGAP30 38:21394731:T:C ERF096 0.741 0.927 3.47 4.28 strong 0.811 0.176
#> 9 ARHGAP30 38:21394731:T:C ERF13 0.723 0.903 3.99 4.95 strong 0.966 0.177
#> 10 ARHGAP30 38:21394731:T:C ERF7 0.738 0.942 2.72 3.40 weak 0.678 0.189
#> # ℹ 439 more rows
#> # A tibble: 1,335 × 10
#> gene_id SNP_id tf pctRef pctAlt scoreRef scoreAlt effect alleleDiff alleleEffectSize
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
#> 1 100316005 chr33:7472152_G/C ACE2 0.986 0.783 4.85 3.87 strong -0.980 -0.199
#> 2 100316005 chr33:7472176_T/G ADR1 0.759 0.917 3.34 4.01 weak 0.673 0.154
#> 3 100316005 chr33:7471139_A/C AFT2 0.964 0.878 5.50 5.02 weak -0.480 -0.0843
#> 4 100316005 chr33:7472176_T/G AFT2 0.902 0.740 5.15 4.24 strong -0.911 -0.160
#> 5 100316005 chr33:7473834_G/T AFT2 0.879 0.705 5.02 4.04 strong -0.980 -0.172
#> 6 100316005 chr33:7472152_G/C AGL42 0.859 0.693 5.14 4.15 strong -0.986 -0.165
#> 7 100316005 chr33:7471139_A/C AGL55 0.695 0.861 3.97 4.91 strong 0.939 0.165
#> 8 100316005 chr33:7472830_A/T ALX3 0.972 0.776 4.23 3.43 strong -0.806 -0.185
#> 9 100316005 chr33:7473857_T/C ALX3 0.919 0.773 4.02 3.42 weak -0.600 -0.138
#> 10 100316005 chr33:7473583_G/A ARG80 0.731 0.961 3.16 4.15 strong 0.993 0.230
#> # ℹ 1,325 more rows
mball <- vignettedata$mball
mball
#> # A tibble: 34,053 × 10
#> gene_id SNP_id tf pctRef pctAlt scoreRef scoreAlt effect alleleDiff alleleEffectSize
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
#> 1 ACKR1 38:22769951:C:T ABF3 0.920 0.733 4.88 3.94 strong -0.947 -0.179
#> 2 ACKR1 38:22769951:C:T ARG80 0.910 0.727 3.93 3.14 strong -0.788 -0.182
#> 3 ACKR1 38:22769951:C:T CAMTA2 0.916 0.703 4.09 3.17 strong -0.921 -0.207
#> 4 ACKR1 38:22769951:C:T CAMTA3 0.940 0.741 4.27 3.38 strong -0.893 -0.197
#> 5 ACKR1 38:22769951:C:T HES1 0.863 0.682 4.90 3.90 strong -0.997 -0.176
#> 6 ACKR1 38:22769951:C:T NHP10 0.881 0.716 5.41 4.43 strong -0.981 -0.160
#> 7 ACKR1 38:22769951:C:T OJ1058_F05.8 0.851 0.701 5.57 4.59 strong -0.974 -0.149
#> 8 ACKR1 38:22769951:C:T SUT1 0.888 0.639 3.61 2.62 strong -0.993 -0.245
#> 9 ACKR1 38:22769951:C:T ZNF816 0.840 0.896 7.97 8.47 weak 0.498 0.0530
#> 10 ACKR1 38:22769951:C:T nuc-1 0.856 0.697 5.42 4.42 strong -0.998 -0.158
#> # ℹ 34,043 more rows
#> # A tibble: 12,689 × 10
#> gene_id SNP_id tf pctRef pctAlt scoreRef scoreAlt effect alleleDiff alleleEffectSize
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
#> 1 100315917 chr33:5706617_C/T ARG80 0.979 0.749 4.23 3.24 strong -0.993 -0.230
#> 2 100315917 chr33:5706617_C/T ARG81 0.991 0.825 5.89 4.90 strong -0.982 -0.165
#> 3 100315917 chr33:5703167_G/T ARGFX 0.781 0.889 5.06 5.74 weak 0.677 0.105
#> 4 100315917 chr33:5706617_C/T ARR1 0.634 0.880 2.53 3.42 strong 0.892 0.231
#> 5 100315917 chr33:5707492_C/T AT1G19040 0.882 0.779 7.60 6.73 strong -0.861 -0.100
#> 6 100315917 chr33:5704406_C/G AT3G46070 0.784 0.892 5.13 5.79 weak 0.666 0.103
#> 7 100315917 chr33:5706617_C/T ATHB-12 0.781 0.890 4.12 4.67 weak 0.552 0.106
#> 8 100315917 chr33:5706617_C/T ATHB-13 0.721 0.868 4.98 5.98 strong 0.993 0.145
#> 9 100315917 chr33:5706617_C/T ATHB-16 0.813 0.966 5.32 6.32 strong 0.998 0.153
#> 10 100315917 chr33:5706617_C/T ATHB-4 0.778 0.931 4.98 5.94 strong 0.963 0.151
#> # ℹ 12,679 more rows
```

Summarize motifbreakR results on our 5 genes of interest. This shows us
Expand All @@ -101,7 +101,7 @@ mbsmry
#> # A tibble: 1 × 7
#> ngenes nsnps nstrong alleleDiffAbsMean alleleDiffAbsSum alleleEffectSizeAbsMean alleleEffectSizeAbsSum
#> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 5 449 363 0.837 376. 0.157 70.7
#> 1 5 1335 950 0.798 1066. 0.155 206.
```

Bootstrap resample motifbreakR results for all genes. Resample sets of 5
Expand All @@ -112,18 +112,18 @@ set.seed(42)
mbboot <- mb_bootstrap(mball, ngenes=5, boots = 250)
mbboot$bootwide
#> # A tibble: 250 × 9
#> boot genes ngenes nsnps nstrong alleleDiffAbsMean alleleDiffAbsSum alleleEffectSizeAbsMean alleleEffectSizeAbsSum
#> <int> <chr> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 HSD17B7;SLAMF9;LOC102151674;LOC111094585;LOC102155451 5 1463 1100 0.808 1182. 0.162 236.
#> 2 2 SLAMF7;LOC111094577;LOC111094537;HSD17B7;LOC111094548 5 631 419 0.774 489. 0.159 100.
#> 3 3 GPATCH2;CDK18;LOC102153439;LOC111094501;LOC111094466 5 342 258 0.810 277. 0.157 53.6
#> 4 4 LOC111094614;LOC111094516;CD1E;LOC111094588;LOC111094527 5 789 539 0.785 620. 0.160 126.
#> 5 5 LOC111094522;LOC111094553;FAM177B;LOC111094466;CNTN2 5 608 435 0.800 486. 0.151 91.8
#> 6 6 LOC111094613;SLAMF9;RXRG;LOC111094514;APOA2 5 568 381 0.779 443. 0.147 83.4
#> 7 7 PIGM;LOC111094605;LOC111094471;LOC111094509;AIDA 5 991 706 0.799 792. 0.159 158.
#> 8 8 LEMD1;SH2D1B;FCER1A;CDK18;LOC111094593 5 472 336 0.790 373. 0.159 74.9
#> 9 9 FCER1G;LOC111094573;LOC111094582;LEMD1;LOC111094560 5 685 490 0.802 549. 0.152 104.
#> 10 10 EIF2D;LOC102152982;PPP1R15B;LOC111094577;LOC111094514 5 808 486 0.756 611. 0.148 120.
#> boot genes ngenes nsnps nstrong alleleDiffAbsMean alleleDiffAbsSum alleleEffectSizeAbsMean alleleEffectSizeAbsSum
#> <int> <chr> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 431304;426469;100315917;396544;374124 5 1062 750 0.797 847. 0.155 164.
#> 2 2 426183;395959;431304;429501;396170 5 1244 914 0.808 1005. 0.157 195.
#> 3 3 102465361;426183;396544;426469;429172 5 886 655 0.812 720. 0.149 132.
#> 4 4 396007;407779;693245;429501;100498692 5 1514 1139 0.815 1234. 0.153 232.
#> 5 5 426883;396544;408041;426183;426469 5 925 623 0.787 728. 0.147 136.
#> 6 6 425059;429035;100529062;396007;425613 5 1350 1026 0.821 1109. 0.155 209.
#> 7 7 408042;426880;100498692;425499;426885 5 823 627 0.823 677. 0.157 129.
#> 8 8 396170;425058;426886;395587;396045 5 1262 910 0.805 1016. 0.147 185.
#> 9 9 102466833;426183;100529061;396045;395959 5 1263 963 0.822 1038. 0.153 193.
#> 10 10 429035;408042;100529062;100529061;425613 5 1289 987 0.823 1061. 0.154 198.
#> # ℹ 240 more rows
mbboot$bootdist
#> # A tibble: 6 × 2
Expand All @@ -144,14 +144,14 @@ distribution from bootstrap resampling.
bootstats <- mb_bootstats(mbsmry, mbboot)
bootstats
#> # A tibble: 6 × 5
#> metric stat bootdist bootmax p
#> <chr> <dbl> <list> <dbl> <dbl>
#> 1 nsnps 449 <dbl [250]> 1646 0.82
#> 2 nstrong 363 <dbl [250]> 1253 0.732
#> 3 alleleDiffAbsMean 0.837 <dbl [250]> 0.845 0.02
#> 4 alleleDiffAbsSum 376. <dbl [250]> 1347. 0.796
#> 5 alleleEffectSizeAbsMean 0.157 <dbl [250]> 0.173 0.556
#> 6 alleleEffectSizeAbsSum 70.7 <dbl [250]> 264. 0.816
#> metric stat bootdist bootmax p
#> <chr> <dbl> <list> <dbl> <dbl>
#> 1 nsnps 1335 <dbl [250]> 1597 0.136
#> 2 nstrong 950 <dbl [250]> 1210 0.188
#> 3 alleleDiffAbsMean 0.798 <dbl [250]> 0.831 0.836
#> 4 alleleDiffAbsSum 1066. <dbl [250]> 1304. 0.148
#> 5 alleleEffectSizeAbsMean 0.155 <dbl [250]> 0.163 0.456
#> 6 alleleEffectSizeAbsSum 206. <dbl [250]> 251. 0.128
```

Visualize the results:
Expand Down
Binary file modified data/vignettedata.rda
Binary file not shown.
16 changes: 0 additions & 16 deletions inst/extdata/README.sh

This file was deleted.

Binary file added inst/extdata/galGal6-chr33.vcf.gz
Binary file not shown.
Binary file added inst/extdata/galGal6-chr33.vcf.gz.tbi
Binary file not shown.
Binary file removed inst/extdata/rosie38.vcf.gz
Binary file not shown.
Binary file removed inst/extdata/rosie38.vcf.gz.tbi
Binary file not shown.
75 changes: 75 additions & 0 deletions inst/extdata/simulate-vcf.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
library(BSgenome.Ggallus.UCSC.galGal6)
library(TxDb.Ggallus.UCSC.galGal6.refGene)
library(VariantAnnotation)

set.seed(2023-06-19)

makesnp <- function(x) {
if (x=="N") return("N")
nt <- c("A", "C", "G", "T")
stopifnot(x %in% nt)
nt <- setdiff(nt, x)
sample(nt, 1)
}

# Set the BSGenome
bsgenome <- BSgenome.Ggallus.UCSC.galGal6

# Get the length of each chromosome from the BSGenome object
chromosome_lengths <- seqlengths(bsgenome)

# Define how many SNPs you want to generate per chromosome
num_snps <- 20000 # Change as needed

# Initialize an empty GRanges object to store your simulated SNPs
simulated_snps <- GRanges()

# Loop through each chromosome
for (chr_name in names(chromosome_lengths)) {

# skip everything except chr33
if (chr_name!="chr33") next

# Generate random positions for the SNPs
snp_positions <- sort(sample(10000:(chromosome_lengths[chr_name]-10000), num_snps))

# Generate GRanges object for this chromosome
chr_snps <- GRanges(seqnames = chr_name,
ranges = IRanges(start = snp_positions, width = 1),
strand = "*")

# Add the simulated SNPs for this chromosome to the total
simulated_snps <- c(simulated_snps, chr_snps)
}

# Make the VCF
vcf <- VCF(rowRanges = simulated_snps,
colData=DataFrame(Samples=1L, row.names = "sample1"),
exptData=list(header=VCFHeader()))

# Set ref alleles (DNAStringSet)
ref(vcf) <- getSeq(bsgenome, simulated_snps)

# Set ref alleles (DNAStringSetList)
alt(vcf) <-
ref(vcf) |>
as.character() |>
sapply(makesnp) |>
unname() |>
VariantAnnotation:::.toDNAStringSetList()

# Set the genotypes
assays(vcf, withDimnames=FALSE) <- list(GT=matrix(rep("1/1", length(vcf)), ncol=1, dimnames = list(NULL, "sample1")))


# Write the VCF
vcffile <- here::here("inst/extdata/galGal6-chr33.vcf")
writeVcf(vcf, vcffile)
system(paste0("head -n10 ", vcffile))

# Make it spec compliant
system(paste0("gsed -i '1 i##fileformat=VCFv4.3' ", vcffile))
system(paste0("bgzip -f ", vcffile))
system(paste0("tabix -f ", vcffile, ".gz"))
system(paste0("bcftools view ", vcffile, ".gz | head -n10"))

Binary file removed inst/extdata/todd38.vcf.gz
Binary file not shown.
Binary file removed inst/extdata/todd38.vcf.gz.tbi
Binary file not shown.
Binary file modified man/figures/README-plot_bootstats-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading