shaman_shuffle_hic_track

shaman_shuffle_hic_track(track_db, obs_track_nm, work_dir,
  exp_track_nm = paste0(obs_track_nm, "_shuffle"), max_jobs = 25,
  shuffle = 80, grid_small = 5e+05, grid_high = 1e+06,
  grid_step_iter = 40, dist_resolution = NA, smooth = NA)

Arguments

track_db

Directory of the misha database.

obs_track_nm

Name of the observed 2D genomic track holding the Hi-C data.

work_dir

Shared directory for temporary files. In SGE mode, it must be accessible by all jobs.

exp_track_nm

Name of the expected 2D genomic track to create. Defaults to obs_track_nm with the suffix "_shuffle".

max_jobs

Maximum number of qsub or local jobs. For optimal performance, set this to the number of chromosomes.

shuffle

Average number of shuffling transitions for each observed point in the chromosomal contact matrix.

grid_small

Initial value of the maximum distance between contact pairs considered for switching.

grid_high

Final value of the maximum distance between contact pairs considered for switching.

grid_step_iter

Number of iterations at each grid size.

dist_resolution

Number of bins in each log2 distance unit. If NA, value is determined based on observed data (recommended).

smooth

Number of bins used to smooth the decay curve, which serves as the MCMC target function. If NA, the value is determined from the observed data (recommended).
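The dist_resolution and smooth arguments both refer to a decay curve measured on a log2 distance scale. The R sketch below is one plausible reading of those two parameters, not shaman's code: the helpers log2_dist_bin and decay_curve are hypothetical, and the floor-based binning and simple moving average are assumptions made for illustration.

```r
# Hypothetical helpers for illustration; these are not shaman functions.

# Map genomic distances (bp, > 0) to log2-distance bins, with
# `dist_resolution` bins per log2 unit (i.e. per doubling of distance).
log2_dist_bin <- function(dist, dist_resolution) {
  stopifnot(all(dist > 0), dist_resolution > 0)
  floor(log2(dist) * dist_resolution)
}

# Decay curve: fraction of contacts in each log2-distance bin, smoothed by a
# moving average over `smooth` bins (centered when `smooth` is odd).
# Bins too close to either edge for a full window become NA.
decay_curve <- function(dist, dist_resolution, smooth) {
  bins <- log2_dist_bin(dist, dist_resolution)
  counts <- tabulate(bins - min(bins) + 1)  # empty bins count as 0
  frac <- counts / sum(counts)
  as.numeric(stats::filter(frac, rep(1 / smooth, smooth), sides = 2))
}

# With 2 bins per log2 unit, 1e5 and 1.5e5 bp land in different bins
log2_dist_bin(c(1e5, 1.5e5, 1e6), dist_resolution = 2)
#> [1] 33 34 39

set.seed(1)
toy_dist <- round(10^runif(1000, 4, 7))  # toy distances, 10 kb to 10 Mb
curve <- decay_curve(toy_dist, dist_resolution = 4, smooth = 3)
```

A larger dist_resolution gives finer distance bins, and a larger smooth flattens noise between neighbouring bins at the cost of blurring sharp features of the curve.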

Details

This function generates an expected 2D Hi-C track based on observed Hi-C data. Each chromosome is shuffled separately to generate an expected shuffled contact matrix. Note that this function requires SGE (qsub) or multicore support to be enabled; set it with the shaman.sge_support or shaman.mc_support option in the shaman.conf file. Reshuffling an entire dataset takes roughly 7 hours per 1 billion reads on a machine with one core per chromosome.

Each step writes temporary files containing the shuffled matrices, which are then joined into a track. Temporary files are deleted once the track is created.
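The shuffling procedure is only summarized above. As a rough mental model, the toy R sketch below shows one kind of MCMC "switch" between two contacts on a single chromosome: swapping their second ends leaves every position with the same number of contacts, and a Metropolis rule accepts or rejects the swap against a target decay function. This is an assumption-laden illustration, not shaman's implementation: switch_step, its grid window (loosely playing the role that grid_small and grid_high bound in the real function), and the power-law log_target are all hypothetical, and the actual acceptance rule and grid schedule are not documented on this page.

```r
# Toy sketch only, NOT shaman's implementation.
# contacts:   2-column matrix of (end1, end2) positions on one chromosome,
#             assumed to contain no zero-distance (self) contacts.
# log_target: function giving the (unnormalized) log target density of a
#             contact distance; stands in for the decay-curve target.
# grid:       hypothetical maximum distance between the first ends of two
#             contacts for them to be considered for switching.
switch_step <- function(contacts, log_target, grid) {
  i <- sample.int(nrow(contacts), 2)  # two distinct contacts
  p <- contacts[i[1], ]
  q <- contacts[i[2], ]
  if (abs(p[1] - q[1]) > grid) return(contacts)  # too far apart to switch
  # Swap second ends: each column keeps the same multiset of positions
  new_p <- c(p[1], q[2])
  new_q <- c(q[1], p[2])
  if (new_p[1] == new_p[2] || new_q[1] == new_q[2]) return(contacts)
  span <- function(v) abs(v[2] - v[1])
  delta <- log_target(span(new_p)) + log_target(span(new_q)) -
    log_target(span(p)) - log_target(span(q))
  # Metropolis rule: always accept improvements, sometimes accept worse moves
  if (log(runif(1)) < delta) {
    contacts[i[1], ] <- new_p
    contacts[i[2], ] <- new_q
  }
  contacts
}

set.seed(42)
n <- 200
x <- sort(sample.int(1e6, n))
contacts <- cbind(x, x + sample(1e3:1e5, n, replace = TRUE))
orig <- contacts
log_target <- function(d) -log(d)  # assumed power-law decay, p(d) ~ 1/d
for (k in seq_len(1000)) {
  contacts <- switch_step(contacts, log_target, grid = 5e4)
}
```

After any number of steps, the sorted first and second ends are identical to the input, which is what "preserving coverage" means in this toy.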

Examples

# The example below runs on the test misha db provided with shaman.
# Note that this is a toy db sampled from K562 ela data, so shuffling the
# observed track will not produce the expected track.
# options(shaman.sge_support=1) # configuring sge engine mode - preferred
options(shaman.mc_support=1) # configuring multi-core mode
if (gtrack.exists("hic_obs_shuffle")) {
  gtrack.rm("hic_obs_shuffle", force=TRUE)
  gdb.reload()
}
ret <- shaman_shuffle_hic_track(shaman::shaman_get_test_track_db(),
  obs_track_nm="hic_obs",
  work_dir=tempdir(), # tempdir() works only in multi-core mode; in sge mode, work_dir must be accessible by all jobs
  shuffle=1, # default is 80
  grid_step_iter=1, # default is 40
  max_jobs=parallel::detectCores()) # optimally set to the number of chromosomes
#> Warning: 1 full chrom files were not shuffled:
#> hic_obs_chrY_0_0.full_chrom_shuffled.uniq
#> chr1
#> chr10
#> chr11
#> chr12
#> chr13
#> chr14
#> chr15
#> chr16
#> chr17
#> chr18
#> chr19
#> chr2
#> chr20
#> chr21
#> chr22
#> chr3
#> chr4
#> chr5
#> chr6
#> chr7
#> chr8
#> chr9
#> chrX
#> chrY
#> missing file
#> Reading input file(s)...
#> 5%...11%...18%...26%...33%...40%...48%...55%...62%...69%...76%...84%...91%...98%...100%
#> Writing the track...
#> 8%...26%...47%...65%...78%...95%...100%
gdb.reload()
gtrack.ls("hic_obs_shuffle") # new shuffled track that was created
#> [1] "hic_obs_shuffle"