Skip to content

getPeak2GeneLinks Error #442

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
YanFang0620 opened this issue Nov 27, 2020 · 27 comments
Closed

getPeak2GeneLinks Error #442

YanFang0620 opened this issue Nov 27, 2020 · 27 comments
Labels
bug Something isn't working

Comments

@YanFang0620
Copy link

Hi,
I try to follow the steps for chapter"Peak2GeneLinkage with ArchR" using the turorial data.

After run the first step:

projHeme5 <- addPeak2GeneLinks(
ArchRProj = projHeme5,
reducedDims = "IterativeLSI"
)

I got this log file:

ArchR-addPeak2GeneLinks-c9f637af053-Date-2020-11-27_Time-12-49-47.log

However, when I run next step:

p2g <- getPeak2GeneLinks(
ArchRProj = projHeme5,
corCutOff = 0.45,
resolution = 1,
returnLoops = TRUE
)

I got an error message:

Error in .validInput(input = start, name = "start", valid = c("integer")) :
Input value for 'start' is not a integer, (start = numeric) please supply valid input!
In addition: Warning message:
In min(abs(c(input%%1, input%%1 - 1)), na.rm = TRUE) :
no non-missing arguments to min; returning Inf

Is there anyone could help me to solve this problem? Thanks a lot!

@YanFang0620 YanFang0620 added the bug Something isn't working label Nov 27, 2020
@rcorces
Copy link
Collaborator

rcorces commented Nov 27, 2020

I cannot reproduce your error. Perhaps you have missed some steps in the tutorial, such as adding a peak set or a peak matrix.

@YanFang0620
Copy link
Author

I cannot reproduce your error. Perhaps you have missed some steps in the tutorial, such as adding a peak set or a peak matrix.

Thank you for your reply!
I checked the available matrix in projHeme5:

getAvailableMatrices(projHeme5)
[1] "ATACMatrix" "ChIPMatrix"
[3] "EncodeTFBSMatrix" "GeneIntegrationMatrix"
[5] "GeneScoreMatrix" "MotifMatrix"
[7] "PeakMatrix" "TileMatrix"

It seems like there already added a PeakMatrix in the ArchRProject.

So how could I check whether there's any peak set in my ArchRProject?

@YanFang0620
Copy link
Author

I cannot reproduce your error. Perhaps you have missed some steps in the tutorial, such as adding a peak set or a peak matrix.

I also tried to check:

head(projHeme5@peakSet)
GRanges object with 6 ranges and 12 metadata columns:
seqnames ranges strand | score
|
Mono chr1 752495-752995 * | 27.3149
CD4.N chr1 757871-758371 * | 5.55037
CD4.N chr1 762690-763190 * | 17.9242
GMP chr1 773670-774170 * | 4.90925
B chr1 801006-801506 * | 15.2793
B chr1 805039-805539 * | 37.7072
replicateScoreQuantile groupScoreQuantile Reproducibility

Mono 0.865 0.737 2
CD4.N 0.835 0.541 2
CD4.N 0.977 0.917 2
GMP 0.513 0.176 2
B 0.685 0.418 2
B 0.955 0.895 2
GroupReplicate distToGeneStart nearestGene

Mono Mono..scATAC_BMMC_R1 10156 LINC00115
CD4.N CD4.N.
.scATAC_PBMC_R1 4780 LINC00115
CD4.N CD4.N..scATAC_PBMC_R1 30 LINC01128
GMP GMP.
.scATAC_CD34_BMMC_R1 10948 LINC01128
B B..scATAC_PBMC_R1 10925 FAM41C
B B.
.scATAC_BMMC_R1 6892 FAM41C
peakType distToTSS nearestTSS GC idx

Mono Distal 10156 uc001aau.3 0.485 1
CD4.N Distal 4780 uc001aau.3 0.5609 2
CD4.N Promoter 30 uc021oeh.1 0.6966 3
GMP Intronic 10741 uc021oeh.1 0.4571 4
B Distal 10925 uc021oei.1 0.4371 5
B Intronic 6892 uc021oei.1 0.7285 6


seqinfo: 23 sequences from an unspecified genome; no seqlengths

So there should be peak set in the ArchRProject.
Actually, all of steps were ran smoothly except the getPeak2GeneLinks() step.

@rcorces
Copy link
Collaborator

rcorces commented Nov 28, 2020

Can you run the following code line by line and tell me what happens?

ArchRProj = projHeme5
corCutOff = 0.45
resolution = 1

coA <- metadata(ArchRProj@peakSet)$CoAccessibility
coA <- coA[coA$correlation >= corCutOff,,drop=FALSE]
peakSummits <- resize(metadata(coA)$peakSet, 1, "center")
summitTiles <- floor(start(peakSummits) / resolution) * resolution + floor(resolution / 2)

loops <- ArchR:::.constructGR(
  seqnames = seqnames(peakSummits[coA[,1]]),
  start = summitTiles[coA[,1]],
  end = summitTiles[coA[,2]]
)

@YanFang0620
Copy link
Author

Can you run the following code line by line and tell me what happens?

ArchRProj = projHeme5
corCutOff = 0.45
resolution = 1

coA <- metadata(ArchRProj@peakSet)$CoAccessibility
coA <- coA[coA$correlation >= corCutOff,,drop=FALSE]
peakSummits <- resize(metadata(coA)$peakSet, 1, "center")
summitTiles <- floor(start(peakSummits) / resolution) * resolution + floor(resolution / 2)

loops <- ArchR:::.constructGR(
  seqnames = seqnames(peakSummits[coA[,1]]),
  start = summitTiles[coA[,1]],
  end = summitTiles[coA[,2]]
)

Hi, I ran the code line by line,and finally I got the loops:

> loops
GRanges object with 96566 ranges and 0 metadata columns:
          seqnames              ranges strand
             <Rle>           <IRanges>  <Rle>
      [1]     chr1       845576-856623      *
      [2]     chr1       845576-856623      *
      [3]     chr1       856623-901555      *
      [4]     chr1       894703-895398      *
      [5]     chr1       894703-895398      *
      ...      ...                 ...    ...
  [96562]     chrX 153665228-153686239      *
  [96563]     chrX 153941234-153959650      *
  [96564]     chrX 153941234-153959650      *
  [96565]     chrX 153959650-153960343      *
  [96566]     chrX 153959650-153960343      *
  -------
  seqinfo: 23 sequences from an unspecified genome; no seqlengths

@rcorces
Copy link
Collaborator

rcorces commented Nov 28, 2020

That just doesnt make sense to me.

If you are running this command:

p2g <- getPeak2GeneLinks(
ArchRProj = projHeme5,
corCutOff = 0.45,
resolution = 1,
returnLoops = TRUE
)

EDIT NOTE: The below is actually not correct. Copy-paste error.
Then the below code is exactly what is being run. If you can run this and it doesnt return an error, then you should try to re-run your original command.

ArchRProj = projHeme5
corCutOff = 0.45
resolution = 1
returnLoops = TRUE



if(is.null(ArchRProj@peakSet)){
  return(NULL)
}


if(is.null(metadata(ArchRProj@peakSet)$CoAccessibility)){
  return(NULL)
}else{
  
  coA <- metadata(ArchRProj@peakSet)$CoAccessibility
  coA <- coA[coA$correlation >= corCutOff,,drop=FALSE]
  
  if(returnLoops){
    
    peakSummits <- resize(metadata(coA)$peakSet, 1, "center")
    
    if(!is.null(resolution)){
      summitTiles <- floor(start(peakSummits) / resolution) * resolution + floor(resolution / 2)
    }else{
      summitTiles <- start(peakSummits)
    }
    
    loops <- ArchR:::.constructGR(
      seqnames = seqnames(peakSummits[coA[,1]]),
      start = summitTiles[coA[,1]],
      end = summitTiles[coA[,2]]
    )
    metadata(coA) <- list()
    mcols(loops) <- coA[,-c(1:3)]
    mcols(loops)$value <- coA$correlation
    
    loops <- loops[order(mcols(loops)$value, decreasing=TRUE)]
    loops <- unique(loops)
    loops <- loops[width(loops) > 0]
    loops <- sort(sortSeqlevels(loops))
    
    loops <- SimpleList(CoAccessibility = loops)

    return(loops)
  }else{
    return(coA)
  }
  
}

@YanFang0620
Copy link
Author

ArchRProj = projHeme5
corCutOff = 0.45
resolution = 1
returnLoops = TRUE

Thank you so much for your suggestion.
However, if I try to run the original codes you provide above,finally I got an error:

Error: no function to return from, jumping to top level

@rcorces
Copy link
Collaborator

rcorces commented Nov 29, 2020

Of course you cannot run the line that says return(loops) because you are not in a function to return from. Can you run the code line by line instead?

Also, have you tried re-running your original command:

p2g <- getPeak2GeneLinks(
ArchRProj = projHeme5,
corCutOff = 0.45,
resolution = 1,
returnLoops = TRUE
)

@YanFang0620
Copy link
Author

If I run:

p2g <- getPeak2GeneLinks(
ArchRProj = projHeme5,
corCutOff = 0.45,
resolution = 1,
returnLoops = TRUE
)

It still showed me:

Error in .validInput(input = start, name = "start", valid = c("integer")) : 
  Input value for 'start' is not a integer, (start = numeric) please supply valid input!
In addition: Warning message:
In min(abs(c(input%%1, input%%1 - 1)), na.rm = TRUE) :
  no non-missing arguments to min; returning Inf

@rcorces
Copy link
Collaborator

rcorces commented Nov 29, 2020

What happens when you run all of this code? It should print one number (1 - 4).

ArchRProj = projHeme5, 
corCutOff = 0.45, 
FDRCutOff = 0.0001,
varCutOffATAC = 0.25,
varCutOffRNA = 0.25,
resolution = 1, 
returnLoops = TRUE

if(is.null(ArchRProj@peakSet)){
  print("1")
}

if(is.null(metadata(ArchRProj@peakSet)$Peak2GeneLinks)){
  print("2")
}else{
  
  p2g <- metadata(ArchRProj@peakSet)$Peak2GeneLinks
  p2g <- p2g[which(p2g$Correlation >= corCutOff & p2g$FDR <= FDRCutOff), ,drop=FALSE]
  
  if(!is.null(varCutOffATAC)){
    p2g <- p2g[which(p2g$VarQATAC > varCutOffATAC),]
  }
  
  if(!is.null(varCutOffRNA)){
    p2g <- p2g[which(p2g$VarQRNA > varCutOffRNA),]
  }
  
  if(returnLoops){
    
    peakSummits <- resize(metadata(p2g)$peakSet, 1, "center")
    geneStarts <- resize(metadata(p2g)$geneSet, 1, "start")
    
    if(!is.null(resolution)){
      summitTiles <- floor(start(peakSummits) / resolution) * resolution + floor(resolution / 2)
      geneTiles <- floor(start(geneStarts) / resolution) * resolution + floor(resolution / 2)
    }else{
      summitTiles <- start(peakSummits)
      geneTiles <- start(geneTiles)
    }
    
    loops <- ArchR:::.constructGR(
      seqnames = seqnames(peakSummits[p2g$idxATAC]),
      start = summitTiles[p2g$idxATAC],
      end = geneTiles[p2g$idxRNA]
    )
    mcols(loops)$value <- p2g$Correlation
    mcols(loops)$FDR <- p2g$FDR
    
    loops <- loops[order(mcols(loops)$value, decreasing=TRUE)]
    loops <- unique(loops)
    loops <- loops[width(loops) > 0]
    loops <- sort(sortSeqlevels(loops))
    
    loops <- SimpleList(Peak2GeneLinks = loops)
    
    print("3")
    
  }else{
    
    print("4")
    
  }
  
}

@YanFang0620
Copy link
Author

It didn't print 1-4. It showed me:

image

@rcorces
Copy link
Collaborator

rcorces commented Nov 29, 2020

Ok well at least that is consistent. Try this:

run this code:

ArchRProj = projHeme5, 
corCutOff = 0.45, 
FDRCutOff = 0.0001,
varCutOffATAC = 0.25,
varCutOffRNA = 0.25,
resolution = 1, 
returnLoops = TRUE

if(is.null(ArchRProj@peakSet)){
  print("1")
}

if(is.null(metadata(ArchRProj@peakSet)$Peak2GeneLinks)){
  print("2")
}else{
  
  p2g <- metadata(ArchRProj@peakSet)$Peak2GeneLinks
  p2g <- p2g[which(p2g$Correlation >= corCutOff & p2g$FDR <= FDRCutOff), ,drop=FALSE]
  
  if(!is.null(varCutOffATAC)){
    p2g <- p2g[which(p2g$VarQATAC > varCutOffATAC),]
  }
  
  if(!is.null(varCutOffRNA)){
    p2g <- p2g[which(p2g$VarQRNA > varCutOffRNA),]
  }
  
  if(returnLoops){
    
    peakSummits <- resize(metadata(p2g)$peakSet, 1, "center")
    geneStarts <- resize(metadata(p2g)$geneSet, 1, "start")
    
    if(!is.null(resolution)){
      summitTiles <- floor(start(peakSummits) / resolution) * resolution + floor(resolution / 2)
      geneTiles <- floor(start(geneStarts) / resolution) * resolution + floor(resolution / 2)
    }else{
      summitTiles <- start(peakSummits)
      geneTiles <- start(geneTiles)
    }
  }
}

Then show me the output of each of these lines:

head(seqnames(peakSummits[p2g$idxATAC]))

str(summitTiles)
str(geneTiles)

str(summitTiles[p2g$idxATAC])
str(geneTiles[p2g$idxRNA])

@YanFang0620
Copy link
Author

The following are the output I got:

> head(seqnames(peakSummits[p2g$idxATAC]))
factor-Rle of length 0 with 0 runs
  Lengths: 
  Values : 
Levels(23): chr1 chr2 chr3 chr4 chr5 chr6 ... chr19 chr20 chr21 chr22 chrX
> str(summitTiles)
 num [1:140865] 752745 758121 762940 773920 801256 ...
> str(geneTiles)
 num [1:18601] 69091 762902 812182 860530 894679 ...
> str(summitTiles[p2g$idxATAC])
 num(0) 
> str(geneTiles[p2g$idxRNA])
 num(0) 

@rcorces
Copy link
Collaborator

rcorces commented Nov 29, 2020

Ok. That shows where the problem is coming from. Show me:
metadata(projHeme5@peakSet)$Peak2GeneLinks

@YanFang0620
Copy link
Author

> metadata(projHeme5@peakSet)$Peak2GeneLinks
DataFrame with 900358 rows and 6 columns
         idxATAC    idxRNA         Correlation       FDR
       <integer> <integer>           <numeric> <numeric>
1              1         2  -0.137592741888392       NaN
2              2         2  -0.014714218205002       NaN
3              3         2  -0.275911420781701       NaN
4              4         2   0.213618225737381       NaN
5              5         2  0.0532897125162421       NaN
...          ...       ...                 ...       ...
900354    140861     18601 -0.0632468248590322       NaN
900355    140862     18601   -0.20948768020154       NaN
900356    140863     18601  -0.154581524969957       NaN
900357    140864     18601 -0.0844574975792414       NaN
900358    140865     18601  0.0175566266875722       NaN
                 VarQATAC           VarQRNA
                <numeric>         <numeric>
1       0.717133425620275 0.490672544486856
2      0.0561104603698577 0.490672544486856
3       0.886913001810244 0.490672544486856
4        0.22905618854932 0.490672544486856
5       0.268434316544209 0.490672544486856
...                   ...               ...
900354  0.107315514854648 0.480619321541853
900355  0.104277144784013 0.480619321541853
900356  0.536797643133497 0.480619321541853
900357  0.397692826465055 0.480619321541853
900358  0.219422851666489 0.480619321541853

@rcorces
Copy link
Collaborator

rcorces commented Nov 29, 2020

I guess I owe you an apology. Sorry - this is related to the issue #385 where you originally posted but I didnt catch it until now. As in that issue, the FDRs are all NaN.

So this line:
p2g <- p2g[which(p2g$Correlation >= corCutOff & p2g$FDR <= FDRCutOff), ,drop=FALSE]
is causing p2g to be empty which causes your error.

I'm going to close this and we can continue the discussion on the previous thread. Sorry that this is not working at the moment. We will get it fixed.

@rcorces rcorces closed this as completed Nov 29, 2020
@jgranja24
Copy link
Contributor

Sorry I think the bug is now fixed in the new release branch -----

To install --
devtools::install_github("GreenleafLab/ArchR", ref="release_1.0.1", repos = BiocManager::repositories())

The change --
o$TStat <- (o$Correlation / sqrt((pmax(1-o$Correlation^2, 0.00000000000000001, na.rm = TRUE))/(ncol(seATAC)-2))) #T-statistic P-value

@YanFang0620
Copy link
Author

I guess I owe you an apology. Sorry - this is related to the issue #385 where you originally posted but I didnt catch it until now. As in that issue, the FDRs are all NaN.

So this line:
p2g <- p2g[which(p2g$Correlation >= corCutOff & p2g$FDR <= FDRCutOff), ,drop=FALSE]
is causing p2g to be empty which causes your error.

I'm going to close this and we can continue the discussion on the previous thread. Sorry that this is not working at the moment. We will get it fixed.

No, its OK. Thank you so much for your patience and let me know about this.
Hope it could be fixed soon. This is a really great package for scATAC-seq analysis!

@sylestiel
Copy link

@rcorces
If installation of the revised version is the solution it is still not addressing the problem for me.
devtools::install_github("GreenleafLab/ArchR", ref="release_1.0.1", repos = BiocManager::repositories())

@rcorces
Copy link
Collaborator

rcorces commented Dec 4, 2020

That doesnt sound possible. did you re-run addPeak2GeneLinks() as I suggested?

@sylestiel
Copy link

I just did it and still does not work!!
Screen Shot 2020-12-04 at 1 01 54 PM

@sylestiel
Copy link

Screen Shot 2020-12-04 at 1 03 14 PM

@jgranja24
Copy link
Contributor

Can you copy the output if you paste addPeak2GeneLinks into R which returns the function. Are you sure you reset your r session and reloading ArchR?

@rcorces
Copy link
Collaborator

rcorces commented Dec 4, 2020

@sylestiel - "it doesnt work" is relative. It is no longer giving you the error and the function is running to completion.

@sylestiel
Copy link

addPeak2GeneLinks
function (ArchRProj = NULL, reducedDims = "IterativeLSI", useMatrix = "GeneIntegrationMatrix",
dimsToUse = 1:30, scaleDims = NULL, corCutOff = 0.75, cellsToUse = NULL,
k = 100, knnIteration = 500, overlapCutoff = 0.8, maxDist = 250000,
scaleTo = 10^4, log2Norm = TRUE, predictionCutoff = 0.4,
addEmpiricalPval = FALSE, seed = 1, threads = max(floor(getArchRThreads()/2),
1), verbose = TRUE, logFile = createLogFile("addPeak2GeneLinks"))
{
.validInput(input = ArchRProj, name = "ArchRProj", valid = c("ArchRProj"))
.validInput(input = reducedDims, name = "reducedDims", valid = c("character"))
.validInput(input = dimsToUse, name = "dimsToUse", valid = c("numeric",
"null"))
.validInput(input = scaleDims, name = "scaleDims", valid = c("boolean",
"null"))
.validInput(input = corCutOff, name = "corCutOff", valid = c("numeric",
"null"))
.validInput(input = cellsToUse, name = "cellsToUse", valid = c("character",
"null"))
.validInput(input = k, name = "k", valid = c("integer"))
.validInput(input = knnIteration, name = "knnIteration",
valid = c("integer"))
.validInput(input = overlapCutoff, name = "overlapCutoff",
valid = c("numeric"))
.validInput(input = maxDist, name = "maxDist", valid = c("integer"))
.validInput(input = scaleTo, name = "scaleTo", valid = c("numeric"))
.validInput(input = log2Norm, name = "log2Norm", valid = c("boolean"))
.validInput(input = threads, name = "threads", valid = c("integer"))
.validInput(input = verbose, name = "verbose", valid = c("boolean"))
.validInput(input = logFile, name = "logFile", valid = c("character"))
tstart <- Sys.time()
.startLogging(logFile = logFile)
.logThis(mget(names(formals()), sys.frame(sys.nframe())),
"addPeak2GeneLinks Input-Parameters", logFile = logFile)
.logDiffTime(main = "Getting Available Matrices", t1 = tstart,
verbose = verbose, logFile = logFile)
AvailableMatrices <- getAvailableMatrices(ArchRProj)
if ("PeakMatrix" %ni% AvailableMatrices) {
stop("PeakMatrix not in AvailableMatrices")
}
if (useMatrix %ni% AvailableMatrices) {
stop(paste0(useMatrix, " not in AvailableMatrices"))
}
ArrowFiles <- getArrowFiles(ArchRProj)
tstart <- Sys.time()
dfAll <- .safelapply(seq_along(ArrowFiles), function(x) {
DataFrame(cellNames = paste0(names(ArrowFiles)[x], "#",
h5read(ArrowFiles[x], paste0(useMatrix, "/Info/CellNames"))),
predictionScore = h5read(ArrowFiles[x], paste0(useMatrix,
"/Info/predictionScore")))
}, threads = threads) %>% Reduce("rbind", .)
.logDiffTime(sprintf("Filtered Low Prediction Score Cells (%s of %s, %s)",
sum(dfAll[, 2] < predictionCutoff), nrow(dfAll), round(sum(dfAll[,
2] < predictionCutoff)/nrow(dfAll), 3)), t1 = tstart,
verbose = verbose, logFile = logFile)
keep <- sum(dfAll[, 2] >= predictionCutoff)/nrow(dfAll)
dfAll <- dfAll[which(dfAll[, 2] > predictionCutoff), ]
set.seed(seed)
peakSet <- getPeakSet(ArchRProj)
.logThis(peakSet, "peakSet", logFile = logFile)
geneSet <- .getFeatureDF(ArrowFiles, useMatrix, threads = threads)
geneStart <- GRanges(geneSet$seqnames, IRanges(geneSet$start,
width = 1), name = geneSet$name, idx = geneSet$idx)
.logThis(geneStart, "geneStart", logFile = logFile)
rD <- getReducedDims(ArchRProj, reducedDims = reducedDims,
corCutOff = corCutOff, dimsToUse = dimsToUse)
if (!is.null(cellsToUse)) {
rD <- rD[cellsToUse, , drop = FALSE]
}
idx <- sample(seq_len(nrow(rD)), knnIteration, replace = !nrow(rD) >=
knnIteration)
.logDiffTime(main = "Computing KNN", t1 = tstart, verbose = verbose,
logFile = logFile)
knnObj <- .computeKNN(data = rD, query = rD[idx, ], k = k)
.logDiffTime(main = "Identifying Non-Overlapping KNN pairs",
t1 = tstart, verbose = verbose, logFile = logFile)
keepKnn <- determineOverlapCpp(knnObj, floor(overlapCutoff *
k))
knnObj <- knnObj[keepKnn == 0, ]
.logDiffTime(paste0("Identified ", nrow(knnObj), " Groupings!"),
t1 = tstart, verbose = verbose, logFile = logFile)
knnObj <- lapply(seq_len(nrow(knnObj)), function(x) {
rownames(rD)[knnObj[x, ]]
}) %>% SimpleList
chri <- gtools::mixedsort(unique(paste0(seqnames(peakSet))))
chrj <- gtools::mixedsort(unique(paste0(seqnames(geneStart))))
chrij <- intersect(chri, chrj)
geneDF <- mcols(geneStart)
peakDF <- mcols(peakSet)
geneDF$seqnames <- seqnames(geneStart)
peakDF$seqnames <- seqnames(peakSet)
.logDiffTime(main = "Getting Group RNA Matrix", t1 = tstart,
verbose = verbose, logFile = logFile)
groupMatRNA <- .getGroupMatrix(ArrowFiles = getArrowFiles(ArchRProj),
featureDF = geneDF, groupList = knnObj, useMatrix = useMatrix,
threads = threads, verbose = FALSE)
rawMatRNA <- groupMatRNA
.logThis(groupMatRNA, "groupMatRNA", logFile = logFile)
.logDiffTime(main = "Getting Group ATAC Matrix", t1 = tstart,
verbose = verbose, logFile = logFile)
groupMatATAC <- .getGroupMatrix(ArrowFiles = getArrowFiles(ArchRProj),
featureDF = peakDF, groupList = knnObj, useMatrix = "PeakMatrix",
threads = threads, verbose = FALSE)
rawMatATAC <- groupMatATAC
.logThis(groupMatATAC, "groupMatATAC", logFile = logFile)
.logDiffTime(main = "Normalizing Group Matrices", t1 = tstart,
verbose = verbose, logFile = logFile)
groupMatRNA <- t(t(groupMatRNA)/colSums(groupMatRNA)) * scaleTo
groupMatATAC <- t(t(groupMatATAC)/colSums(groupMatATAC)) *
scaleTo
if (log2Norm) {
groupMatRNA <- log2(groupMatRNA + 1)
groupMatATAC <- log2(groupMatATAC + 1)
}
names(geneStart) <- NULL
seRNA <- SummarizedExperiment(assays = SimpleList(RNA = groupMatRNA,
RawRNA = rawMatRNA), rowRanges = geneStart)
metadata(seRNA)$KNNList <- knnObj
.logThis(seRNA, "seRNA", logFile = logFile)
names(peakSet) <- NULL
seATAC <- SummarizedExperiment(assays = SimpleList(ATAC = groupMatATAC,
RawATAC = rawMatATAC), rowRanges = peakSet)
metadata(seATAC)$KNNList <- knnObj
.logThis(seATAC, "seATAC", logFile = logFile)
rm(groupMatRNA, groupMatATAC)
gc()
.logDiffTime(main = "Finding Peak Gene Pairings", t1 = tstart,
verbose = verbose, logFile = logFile)
o <- DataFrame(findOverlaps(.suppressAll(resize(seRNA, 2 *
maxDist + 1, "center")), resize(rowRanges(seATAC), 1,
"center"), ignore.strand = TRUE))
o$distance <- distance(rowRanges(seRNA)[o[, 1]], rowRanges(seATAC)[o[,
2]])
colnames(o) <- c("B", "A", "distance")
if (addEmpiricalPval) {
.logDiffTime(main = "Computing Background Correlations",
t1 = tstart, verbose = verbose, logFile = logFile)
nullCor <- .getNullCorrelations(seATAC, seRNA, o, 1000)
}
.logDiffTime(main = "Computing Correlations", t1 = tstart,
verbose = verbose, logFile = logFile)
o$Correlation <- rowCorCpp(as.integer(o$A), as.integer(o$B),
assay(seATAC), assay(seRNA))
o$VarAssayA <- .getQuantiles(matrixStats::rowVars(assay(seATAC)))[o$A]
o$VarAssayB <- .getQuantiles(matrixStats::rowVars(assay(seRNA)))[o$B]
o$TStat <- (o$Correlation/sqrt((max(1 - o$Correlation^2,
1e-17))/(ncol(seATAC) - 2)))
o$Pval <- 2 * pt(-abs(o$TStat), ncol(seATAC) - 2)
o$FDR <- p.adjust(o$Pval, method = "fdr")
out <- o[, c("A", "B", "Correlation", "FDR", "VarAssayA",
"VarAssayB")]
colnames(out) <- c("idxATAC", "idxRNA", "Correlation", "FDR",
"VarQATAC", "VarQRNA")
mcols(peakSet) <- NULL
names(peakSet) <- NULL
metadata(out)$peakSet <- peakSet
metadata(out)$geneSet <- geneStart
if (addEmpiricalPval) {
out$EmpPval <- 2 * pnorm(-abs(((out$Correlation - mean(nullCor[[2]]))/sd(nullCor[[2]]))))
out$EmpFDR <- p.adjust(out$EmpPval, method = "fdr")
}
dir.create(file.path(getOutputDirectory(ArchRProj), "Peak2GeneLinks"),
showWarnings = FALSE)
outATAC <- file.path(getOutputDirectory(ArchRProj), "Peak2GeneLinks",
"seATAC-Group-KNN.rds")
.safeSaveRDS(seATAC, outATAC, compress = FALSE)
outRNA <- file.path(getOutputDirectory(ArchRProj), "Peak2GeneLinks",
"seRNA-Group-KNN.rds")
.safeSaveRDS(seRNA, outRNA, compress = FALSE)
metadata(out)$seATAC <- outATAC
metadata(out)$seRNA <- outRNA
metadata(ArchRProj@peakSet)$Peak2GeneLinks <- out
.logDiffTime(main = "Completed Peak2Gene Correlations!",
t1 = tstart, verbose = verbose, logFile = logFile)
.endLogging(logFile = logFile)
ArchRProj
}
<bytecode: 0x1076420a0>
<environment: namespace:ArchR>

@jgranja24
Copy link
Contributor

You need to reload R and ArchR.

This issue has been fixed in the branch you are referring to (See Above) --

o$TStat <- (o$Correlation/sqrt((max(1 - o$Correlation^2,
1e-17))/(ncol(seATAC) - 2)))

@sylestiel
Copy link

Ok. Many Thanks!!! I will do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants