Generate an expected hic track based on observed hic data

shaman_shuffle_hic_track

shaman_shuffle_hic_track(track_db, obs_track_nm, work_dir,
  exp_track_nm = paste0(obs_track_nm, "_shuffle"), max_jobs = 25,
  shuffle = 80, grid_small = 5e+05, grid_high = 1e+06,
  grid_step_iter = 40, dist_resolution = NA, smooth = NA)

Arguments

track_db	Directory of the misha database.
obs_track_nm	Name of observed 2D genomic track for the hic data.
work_dir	Centralized directory to store temporary files.
exp_track_nm	Name of expected 2D genomic track.
max_jobs	Maximal number of qsub or local jobs. For optimal performance, provide the number of chromosomes.
shuffle	Average number of shuffling transitions for each observed point in the chromosomal contact matrix.
grid_small	Initial size of the maximum distance between contact pairs considered for switching.
grid_high	Final size of the maximum distance between contact pairs considered for switching.
grid_step_iter	Number of iterations in each grid size.
dist_resolution	Number of bins in each log2 distance unit. If NA, value is determined based on observed data (recommended).
smooth	Number of bins to use for smoothing the MCMC target function: the decay curve. If NA, value is determined based on observed data (recommended).
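
As a minimal sketch of the max_jobs guidance above: the helper below caps the job count at the number of chromosomes. The name choose_max_jobs is hypothetical and not part of shaman. The additional cap at the number of local cores is an assumption that only makes sense in multi-core mode.

```r
# Hypothetical helper (not part of shaman): pick a value for max_jobs.
# The description of max_jobs recommends the number of chromosomes; capping
# at the local core count is an added assumption for multi-core mode.
choose_max_jobs <- function(chroms, cores = parallel::detectCores()) {
  n_chroms <- length(unique(chroms))
  if (is.na(cores)) cores <- 1L  # detectCores() can return NA
  max(1L, min(n_chroms, cores))
}

# 24 chromosomes on a machine assumed to have 8 cores
choose_max_jobs(paste0("chr", c(1:22, "X", "Y")), cores = 8)
```

In sge mode the core cap is usually irrelevant, so passing the chromosome count directly follows the documented recommendation.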

Details

This function generates an expected 2D hic track based on observed hic data. Each chromosome is shuffled separately to generate an expected shuffled contact matrix. Note that this function requires sge (qsub) or multicore to be enabled; either mode can be set via shaman.sge_support or shaman.mc_support in the shaman.conf file. Reshuffling an entire dataset requires about 7 hours per 1 billion reads on a machine with one core per chromosome.

Each step creates temporary files of the shuffled matrices, which are then joined into a track. Temporary files are deleted upon track creation.
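
The runtime figure quoted in the Details can be turned into a back-of-the-envelope estimate. The sketch below assumes that runtime scales linearly with the number of reads and that one core per chromosome is available. The function name estimate_shuffle_hours is hypothetical and not part of shaman.

```r
# Hypothetical helper (not part of shaman): rough shuffling time in hours,
# based on the quoted figure of 7 hours per 1 billion reads.
# Linear scaling with read count is an assumption, not a documented guarantee.
estimate_shuffle_hours <- function(n_reads, hours_per_billion = 7) {
  stopifnot(is.numeric(n_reads), all(n_reads >= 0))
  hours_per_billion * n_reads / 1e9
}

estimate_shuffle_hours(2.5e8)  # 250 million reads
#> [1] 1.75
```

Treat the result as an order-of-magnitude guide only; actual runtime depends on the hardware and on parameters such as shuffle and grid_step_iter.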

Examples


# The example below runs on the test misha db provided with shaman.
# Note that this is a toy db sampled from K562 ela data - shuffling the observed track will not produce the expected track.
# options(shaman.sge_support=1) #configuring sge engine mode - preferred
options(shaman.mc_support=1)    #configuring multi-core mode
if (gtrack.exists("hic_obs_shuffle")) {
    gtrack.rm("hic_obs_shuffle", force=TRUE)
    gdb.reload()
}
ret <- shaman_shuffle_hic_track(shaman::shaman_get_test_track_db(),
    obs_track_nm="hic_obs",
    work_dir=tempdir(),                # tempdir() is suitable only in multi-core mode. In sge mode, work_dir must be accessible by all jobs.
    shuffle=1,                         # default is set to 80
    grid_step_iter=1,                  # default is set to 40
    max_jobs=parallel::detectCores())  # optimally set to number of chromosomes
#> Warning: 1 full chrom files were not shuffled:
#>  hic_obs_chrY_0_0.full_chrom_shuffled.uniq
#> chr1
#> chr10
#> chr11
#> chr12
#> chr13
#> chr14
#> chr15
#> chr16
#> chr17
#> chr18
#> chr19
#> chr2
#> chr20
#> chr21
#> chr22
#> chr3
#> chr4
#> chr5
#> chr6
#> chr7
#> chr8
#> chr9
#> chrX
#> chrY
#> missing file
#> Reading input file(s)...
#> 5%...11%...18%...26%...33%...40%...48%...55%...62%...69%...76%...84%...91%...98%...100%
#> Writing the track...
#> 8%...26%...47%...65%...78%...95%...100%
gdb.reload()
gtrack.ls("hic_obs_shuffle") #new shuffled track that was created
#> [1] "hic_obs_shuffle"
