Incorporating both p-values and the overall column #52

Closed
mgtrek opened this issue May 17, 2021 · 5 comments

@mgtrek

mgtrek commented May 17, 2021

Hi, thanks for a great tool. I thought this wasn't possible to do, and have been trying to do it manually, until I saw this: #21

Does anyone know how to generate a table1 that incorporates both p-values and the overall column as that particular user does in the example shown? Thanks!

@benjaminrich
Owner

Yes. You can do it like this:

pvalue <- function(x, ...) {
  x <- x[-length(x)]  # Remove "overall" group
  # Construct vectors of data y, and groups (strata) g
  y <- unlist(x)
  g <- factor(rep(1:length(x), times=sapply(x, length)))
  if (is.numeric(y)) {
    # For numeric variables, perform an ANOVA
    p <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]
  } else {
    # For categorical variables, perform a chi-squared test of independence
    p <- chisq.test(table(y, g))$p.value
  }
  # Format the p-value, using an HTML entity for the less-than sign.
  # The initial empty string places the output on the line below the variable label.
  c("", sub("<", "&lt;", format.pval(p, digits=3, eps=0.001)))
}

table1(~ Q1 + Q2 | ano, data=data, overall="Total",
    render.missing=NULL, render.categorical="FREQ (PCTnoNA%)",
    extra.col=list(`Valor-p`=pvalue), extra.col.pos=4)
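To see what the function works with, here is a minimal sketch that runs the same logic on a hand-built list. It assumes, as the comment in the code above indicates, that the function receives one vector per stratum with the "overall" group as the last element. The simulated data and the names `pvalue_demo` and `x_sim` are made up for illustration and are not part of table1.

```r
# Same logic as the pvalue function above, repeated so this runs on its own.
pvalue_demo <- function(x, ...) {
  x <- x[-length(x)]  # drop the last element (the "overall" group)
  y <- unlist(x)
  g <- factor(rep(seq_along(x), times = lengths(x)))
  if (is.numeric(y)) {
    p <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]
  } else {
    p <- chisq.test(table(y, g))$p.value
  }
  c("", sub("<", "&lt;", format.pval(p, digits = 3, eps = 0.001)))
}

set.seed(1)
a <- rnorm(30, mean = 0)
b <- rnorm(30, mean = 3)

# Hypothetical input: two strata followed by the pooled "overall" vector.
x_sim <- list(A = a, B = b, Total = c(a, b))

out <- pvalue_demo(x_sim)
print(out)
```

The result has two elements: an empty string, then the formatted p-value. Without the `x[-length(x)]` step the pooled vector would be treated as a third group, which is why the overall column has to be dropped before testing.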


@mgtrek
Author

mgtrek commented May 17, 2021

Works like a charm, thank you so much for the quick reply.

This would be a great addition to the statement "These are the only 2 elements the list will have, because we will use overall=F in table1 (otherwise, there would be a third element in the list corresponding to the overall column)." in the vignette: https://cran.r-project.org/web/packages/table1/vignettes/table1-examples.html

Just to let you and anyone else who stumbles on this issue know: I had trouble adding p-values at all using the table1 package I'd downloaded from CRAN, and had to uninstall it and install directly from GitHub using remotes::install_github("benjaminrich/table1"). I don't know whether it's just me or whether anyone else has had the same issue, but I'm putting this out here in case it helps.

@mgtrek mgtrek closed this as completed May 17, 2021
@tianfeiwei

aov(y ~ g)

It's a great function. I'm just wondering how we can pass aov(y ~ g + a + b + c). I tried that exact call, but table1 gives me an error message: object 'a' not found

@Tophey

Tophey commented Apr 30, 2023

Thank you all for posting this! It helps me understand the function better.
The code below also works. I just subsetted x using "x[1:2]".

pvalue <- function(x, ...) {
  # Construct vectors of data y, and groups (strata) g,
  # using only the first two strata (with two strata, this leaves out the "overall" group)
  y <- unlist(x[1:2])
  g <- factor(rep(1:length(x[1:2]), times=sapply(x[1:2], length)))
  if (is.numeric(y)) {
    # For numeric variables, perform a t-test
    # Check the homoscedasticity assumption
    if (var.test(y ~ g)$p.value >= 0.05) {
      # Set the equal-variance argument of t.test() accordingly
      p <- t.test(y ~ g, var.equal = TRUE)$p.value
    } else {
      p <- t.test(y ~ g, var.equal = FALSE)$p.value
    }
  } else {
    # For categorical variables, perform a chi-squared test of independence
    p <- chisq.test(table(y, g))$p.value
  }
  # Format the p-value, using an HTML entity for the less-than sign.
  # The initial empty string places the output on the line below the variable label.
  c("", sub("<", "&lt;", format.pval(p, digits=3, eps=0.001)))
}
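The difference between the two ways of subsetting shows up once there are more than two strata. A small sketch with a made-up list (the names `x_sim`, `first_two`, and `all_strata` are illustrative only), again assuming the "overall" group comes last:

```r
set.seed(42)

# Hypothetical list: three strata plus "overall" as the last element.
x_sim <- list(A = rnorm(10), B = rnorm(10), C = rnorm(10))
x_sim$Total <- unlist(x_sim, use.names = FALSE)

first_two  <- x_sim[1:2]             # always exactly the first two strata
all_strata <- x_sim[-length(x_sim)]  # every stratum, only "overall" dropped

print(names(first_two))
print(names(all_strata))
```

With exactly two strata both forms select the same groups, which suits a two-sample t-test. With three or more strata, `x[1:2]` silently ignores the remaining groups, while `x[-length(x)]` keeps them all.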

@emoryn

emoryn commented Mar 26, 2024

Thank you for posting this. I modified it for my needs and wanted to share. I added a check before doing the ANOVA to make sure there are 2 or more strata. For the categorical variables, I also added a check: if any cell values are <10, do fisher.test(), and if not, do chisq.test(). I made two functions, one for my tables that don't have the overall column and one for the ones that do. I don't know if anyone else will find this helpful, but just in case. I definitely found what y'all posted helpful :) Also, I'm new to stats, so if I got anything wrong please tell me without being rude, I'm just a worm!

# function for tables without the overall column
pval_wo_overall <- function(x, ...) {
  # Construct vectors of data y, and groups (strata) g
  y <- unlist(x)
  g <- factor(rep(1:length(x), times = sapply(x, length)))
  
  # check if variable is numeric
  if (is.numeric(y)) {
    # For numeric variables, perform an ANOVA
    # check there are 2 or more groups
    if (length(levels(g))>=2){
      # if there are 2 or more strata set p to the following
      p <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]
    } else {
      # if not 2 or more groups then set p to NA
      p <- NA
    }
    
    # if not numeric check if character or factor
  } else if (is.character(y) || is.factor(y)) {
    # For categorical variables, perform a fisher exact test or chisq

    # cross-tabulate the variable by strata (any number of rows and columns)
    tab <- table(y, g)

    # if any of the cell counts is less than 10 do fisher.test from stats package
    if (any(tab < 10)) {
      p <- stats::fisher.test(tab)$p.value
      # if none are <10 do chisq
    } else {
      p <- stats::chisq.test(tab)$p.value
    }

  } else {
    # any other type (e.g. logical): no test, set p to NA
    p <- NA
  }
  # Format the p-value, using an HTML entity for the less-than sign.
  # The initial empty string places the output on the line below the variable label.
  c("", sub("<", "&lt;", format.pval(p, digits = 3, eps = 0.001)))
}

# function for the tables with the overall column
pval_w_overall <- function(x, ...) {
  x <- x[-length(x)] # Remove "overall" group
  # Construct vectors of data y, and groups (strata) g
  y <- unlist(x)
  g <- factor(rep(1:length(x), times = sapply(x, length)))
  
  # check if variable is numeric
  if (is.numeric(y)) {
    # For numeric variables, perform an ANOVA
    # check there are 2 or more groups
    if (length(levels(g))>=2){
      # if there are 2 or more strata set p to the following
      p <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]
    } else {
      # if not 2 or more groups then set p to NA
      p <- NA
    }
    
    # if not numeric check if character or factor
  } else if (is.character(y) || is.factor(y)) {
    # For categorical variables, perform a fisher exact test or chisq

    # cross-tabulate the variable by strata (any number of rows and columns)
    tab <- table(y, g)

    # if any of the cell counts is less than 10 do fisher.test from stats package
    if (any(tab < 10)) {
      p <- stats::fisher.test(tab)$p.value
      # if none are <10 do chisq
    } else {
      p <- stats::chisq.test(tab)$p.value
    }

  } else {
    # any other type (e.g. logical): no test, set p to NA
    p <- NA
  }
  # Format the p-value, using an HTML entity for the less-than sign.
  # The initial empty string places the output on the line below the variable label.
  c("", sub("<", "&lt;", format.pval(p, digits = 3, eps = 0.001)))
}
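Since the two functions differ only in whether the last list element is dropped, one possible way to avoid the duplication is a small factory function. This is only a sketch under the same assumptions as above (one vector per stratum, "overall" last when present). The names `make_pvalue`, `has_overall`, `small_cell`, `p_with_overall`, and `p_without_overall` are hypothetical, and the cell-count threshold of 10 is taken from the comment above.

```r
# Hypothetical helper: builds a p-value function for tables with or without
# an overall column, so the shared logic lives in one place.
make_pvalue <- function(has_overall = TRUE, small_cell = 10) {
  function(x, ...) {
    if (has_overall) x <- x[-length(x)]  # drop the trailing "overall" group
    y <- unlist(x)
    g <- factor(rep(seq_along(x), times = lengths(x)))
    p <- NA  # stays NA if there are fewer than 2 groups or the type is unsupported
    if (nlevels(g) >= 2) {
      if (is.numeric(y)) {
        # numeric variable: one-way ANOVA across strata
        p <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]
      } else if (is.character(y) || is.factor(y)) {
        tab <- table(y, g)
        # Fisher's exact test if any cell count is below the threshold,
        # otherwise a chi-squared test of independence
        p <- if (any(tab < small_cell)) {
          stats::fisher.test(tab)$p.value
        } else {
          stats::chisq.test(tab)$p.value
        }
      }
    }
    c("", sub("<", "&lt;", format.pval(p, digits = 3, eps = 0.001)))
  }
}

p_with_overall    <- make_pvalue(has_overall = TRUE)
p_without_overall <- make_pvalue(has_overall = FALSE)

# Quick check on made-up data: both variants should agree on the same strata.
set.seed(7)
a <- rnorm(25)
b <- rnorm(25, mean = 4)
strata     <- list(A = a, B = b)
with_total <- c(strata, list(Total = c(a, b)))
print(p_with_overall(with_total))
print(p_without_overall(strata))
```

Either generated function could then be supplied through `extra.col` in the same way as the functions above. Keeping the threshold as an argument makes it easy to change without editing two copies.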
