shaman_shuffle_hic_track
shaman_shuffle_hic_track(track_db, obs_track_nm, work_dir, exp_track_nm = paste0(obs_track_nm, "_shuffle"), max_jobs = 25, shuffle = 80, grid_small = 5e+05, grid_high = 1e+06, grid_step_iter = 40, dist_resolution = NA, smooth = NA)
Argument | Description |
---|---|
track_db | Directory of the misha database. |
obs_track_nm | Name of observed 2D genomic track for the hic data. |
work_dir | Centralized directory to store temporary files. |
exp_track_nm | Name of expected 2D genomic track. |
max_jobs | Maximal number of qsub or local jobs - for optimal performance provide the number of chromosomes. |
shuffle | Average number of shuffling transitions for each observed point in the chromosomal contact matrix. |
grid_small | Initial size of the maximum distance between contact pairs considered for switching. |
grid_high | Final size of the maximum distance between contact pairs considered for switching. |
grid_step_iter | Number of iterations at each grid size. |
dist_resolution | Number of bins in each log2 distance unit. If NA, value is determined based on observed data (recommended). |
smooth | Number of bins to use for smoothing the MCMC target function: the decay curve. If NA, value is determined based on observed data (recommended). |
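
To make the `dist_resolution` description more concrete, the following is a minimal sketch of what "bins in each log2 distance unit" could look like. The helper `log2_distance_bin` and the toy distances are hypothetical and are not part of shaman; the package's internal binning may differ.

```r
# Hypothetical helper (not part of shaman): assign each contact distance to a
# bin, using `dist_resolution` bins per log2 unit of distance.
log2_distance_bin <- function(dist, dist_resolution) {
  stopifnot(all(dist > 0), dist_resolution > 0)
  floor(log2(dist) * dist_resolution)
}

# Toy contact distances in base pairs (made-up values for illustration).
distances <- c(1024, 1500, 2048, 4096, 1e6)
bins <- log2_distance_bin(distances, dist_resolution = 4)

# Contacts per bin: a crude, unnormalised stand-in for a decay curve.
decay_counts <- table(bins)
print(decay_counts)
```

A higher `dist_resolution` gives a finer-grained decay curve but fewer contacts per bin, which is presumably why the documentation recommends leaving it as NA so that the value is derived from the observed data.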
This function generates an expected 2D hic track based on observed hic data. Each chromosome is shuffled separately to generate an expected shuffled contact matrix. Note that this function requires sge (qsub) or multicore to be enabled. The mode can be set via shaman.sge_support or shaman.mc_support in the shaman.conf file. Reshuffling of an entire dataset will require 7 hours per 1 billion reads on a machine with one core per chromosome.
Each step creates temporary files of the shuffled matrices which are then joined to a track. Temporary files are deleted upon track creation.
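
As a rough planning aid, the runtime figure quoted above can be turned into a back-of-the-envelope estimate. This sketch assumes the cost scales linearly with the number of reads and that one core per chromosome is available; `estimate_shuffle_hours` is a hypothetical helper, not a shaman function. The multi-core option is set with `options()`, as in the package example.

```r
# Enable multi-core mode for the current session.
options(shaman.mc_support = 1)

# Hypothetical helper (not part of shaman): scale the quoted figure of
# 7 hours per 1 billion reads linearly with the number of reads.
estimate_shuffle_hours <- function(n_reads, hours_per_billion = 7) {
  stopifnot(is.numeric(n_reads), all(n_reads >= 0))
  n_reads / 1e9 * hours_per_billion
}

# Estimated hours for a dataset of 250 million reads.
hours <- estimate_shuffle_hours(2.5e8)
print(hours)
```

Actual runtimes will depend on hardware, the `shuffle` and `grid_step_iter` settings, and how evenly the chromosomes are distributed across jobs, so treat the result only as an order-of-magnitude guide.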
# The example below runs on the test misha db provided with shaman.
# Note that this is a toy db sampled from K562 ela data - shuffling the observed track will not produce the expected track.
# options(shaman.sge_support=1) #configuring sge engine mode - preferred
options(shaman.mc_support=1) #configuring multi-core mode
if (gtrack.exists("hic_obs_shuffle")) {
  gtrack.rm("hic_obs_shuffle", force=TRUE)
  gdb.reload()
}
ret <- shaman_shuffle_hic_track(shaman::shaman_get_test_track_db(),
  obs_track_nm="hic_obs",
  work_dir=tempdir(), # this can be set only in multi-core mode. For sge mode, work_dir must be accessible by all jobs.
  shuffle=1, # default is set to 80
  grid_step_iter=1, # default is set to 40
  max_jobs=parallel::detectCores()) # optimally set to number of chromosomes
#> Warning: 1 full chrom files were not shuffled:
#> hic_obs_chrY_0_0.full_chrom_shuffled.uniq
#> Reading input file(s)...
#> 5%...11%...18%...26%...33%...40%...48%...55%...62%...69%...76%...84%...91%...98%...100%
#> Writing the track...
#> 8%...26%...47%...65%...78%...95%...100%
gdb.reload()
gtrack.ls("hic_obs_shuffle") #new shuffled track that was created
#> [1] "hic_obs_shuffle"