Appendix A — FAQs

A.1 How do I install R packages other than `SeroTrackR`?

The key package you need in order to run the SeroTrackR R package and perform any subsequent analyses is the tidyverse meta-package. Install it as follows:

install.packages("tidyverse")

Once you have installed the tidyverse meta-package, load it as below:

library(tidyverse)

There are many dependencies that the SeroTrackR R package relies on. These packages help us to wrangle our data, process our MFI data into RAU, apply the machine learning classification algorithm, and create PDF reports.

If you are using R via RStudio, then when you load the SeroTrackR package it may prompt you to install all of these other packages. Select yes if that is the case.

If it does not prompt you automatically, or if you are running R outside RStudio, then use the code below:

setup <- function(){
  needed <- c(
    # Imports: Required 
    "dplyr", "drc", "forcats", "ggplot2", "here", "janitor", 
    "kableExtra", "knitr", "magrittr", "openxlsx", "parsnip", 
    "purrr", "ranger", "readr", "readxl", "rmarkdown", "stats", 
    "stringr", "tidyr", "tidyselect", "utils", "workflows", 
    # Imports: Suggested
    "glue", "htmltools", "httr", "jsonlite", "shiny.fluent", 
    "tidyverse", "zoo"
    )
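  for (package in needed) {
    # Install the package only if it cannot already be loaded
    if (!requireNamespace(package, quietly = TRUE)) {
      install.packages(package)
    }

    # Attach the package to the current session
    require(package, character.only = TRUE)
  }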
}

setup()
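If you would rather see which packages are missing before installing anything, the following is a minimal sketch. The helper name `find_missing_packages()` is hypothetical (it is not part of SeroTrackR) and it relies only on base R's `requireNamespace()`.

```r
# Hypothetical helper: return the packages in `pkgs` that cannot be loaded
find_missing_packages <- function(pkgs) {
  is_available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_available]
}

# "stats" and "utils" ship with R, so nothing should be reported as missing
find_missing_packages(c("stats", "utils"))
```

You could pass the `needed` vector from `setup()` to this helper and install only the packages it returns.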

A.2 How do I organise my files?

It is best practice to create an R project (.Rproj) file and store all of your files in the project folder. A comprehensive tutorial on how to create an R project can be found at https://kzeglinski.github.io/new_wehi_r_course/session_1.html.

You can then create a Quarto Markdown Document data_processing.qmd inside your R project where you can process all of your serological data.

my_R_project/
├── data/
│   ├── raw_data_plate1.csv
│   ├── raw_data_plate2.csv
│   ├── raw_data_plate3.csv
│   └── platelayout.xlsx
├── results/
└── data_processing.qmd
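The folders above can also be created from R. This is a minimal sketch using base R only; it builds the structure inside a temporary directory so that it is safe to run anywhere, and `project_root` is an illustrative name. In a real project you would point `project_root` at your own project folder.

```r
# Illustrative location; replace with the path to your own R project
project_root <- file.path(tempdir(), "my_R_project")

# Create the "data" and "results" folders (no warning if they already exist)
dir.create(file.path(project_root, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_root, "results"), showWarnings = FALSE)

# List the folders that now exist inside the project
list.dirs(project_root, full.names = FALSE, recursive = FALSE)
```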

A.3 How can I read in my data?

Once you have set up an R project (.Rproj), save all of your files in a new folder called “data”. You can then point to your data using paths relative to the project folder, as follows:

my_raw_data       <- "data/my_plate1.csv"
my_plate_layout   <- "data/my_plate_layout.xlsx"
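A mistyped path is a common cause of errors later in the pipeline, so it can help to confirm that the files exist first. The sketch below uses base R only. `check_input_files()` is a hypothetical helper, not a SeroTrackR function, and the demonstration uses a temporary file so that it runs anywhere.

```r
# Hypothetical helper: stop with a clear message if any path does not exist
check_input_files <- function(paths) {
  missing_files <- paths[!file.exists(paths)]
  if (length(missing_files) > 0) {
    stop(
      "File(s) not found: ", paste(missing_files, collapse = ", "),
      "\nCurrent working directory: ", getwd()
    )
  }
  invisible(paths)
}

# Demonstration with a temporary file that is known to exist
demo_file <- tempfile(fileext = ".csv")
file.create(demo_file)
check_input_files(demo_file)
```

With your own data you would call `check_input_files(c(my_raw_data, my_plate_layout))` before reading the files in.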

A.4 Is there capability to run this without internet?

You will need an internet connection to download the R package initially and to run the steps that fetch files from the internet.

Important

The developers are currently working on a safe offline option that still allows the classification algorithm to be used. An internet connection is required at the moment because the files containing the algorithm are quite large.

A.5 I have multiple plate layout files. How can I input them?

Use the getPlateLayout() function to create a master plate layout file to then input into the other functions in the package!

getPlateLayout("your/folder/with/plate/layouts/")

Here, replace “your/folder/with/plate/layouts/” with the main folder that contains your plate folders. For example, if your folder looks like this:

my_R_project/
└── data/
    ├── plate_1/
    │   ├── raw_magpix_data_plate1.csv
    │   └── plate_layout_1.xlsx
    ├── plate_2/
    │   ├── raw_magpix_data_plate2.csv
    │   └── plate_layout_2.xlsx
    └── plate_3/
        ├── raw_magpix_data_plate3.csv
        └── plate_layout_3.xlsx

you would write:

getPlateLayout("data/")

you could ALSO write:

getPlateLayout()

OR:

getPlateLayout(folder_path = c("plate_layout_1.xlsx", "plate_layout_2.xlsx", "plate_layout_3.xlsx"))
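Before building the master layout, you may want to check which layout files R can actually see in your folder. The sketch below is independent of `getPlateLayout()` and makes no claim about how that function searches. It uses base R's `list.files()` on a temporary copy of the folder structure above, filled with empty placeholder files.

```r
# Recreate the example folder structure with empty placeholder files
demo_root <- file.path(tempdir(), "layout_demo")
plate_dirs <- file.path(demo_root, "data", paste0("plate_", 1:3))
for (i in seq_along(plate_dirs)) {
  dir.create(plate_dirs[i], recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(plate_dirs[i], paste0("raw_magpix_data_plate", i, ".csv")))
  file.create(file.path(plate_dirs[i], paste0("plate_layout_", i, ".xlsx")))
}

# Find every .xlsx file below data/, searching sub-folders as well
layout_files <- list.files(
  file.path(demo_root, "data"),
  pattern = "\\.xlsx$",
  recursive = TRUE,
  full.names = TRUE
)
basename(layout_files)
```

In your own project you would replace `file.path(demo_root, "data")` with `"data"`.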

A.6 I have multiple Luminex data types I’d like to analyse. How can I do this?

If you have, for example, both Bio-Plex and MAGPIX files and would like to analyse them both, then you can do some clever data manipulation as below:

Note: in this example, one plate layout file contains the layouts for all plates. The same idea can be applied to readPlateLayout().

Input your Bio-Plex file/s:

Using a reproducible example:

library(SeroTrackR)
library(tidyverse)

bioplex_raw_plates <- c(
  system.file("extdata", "example_BioPlex_plate1.xlsx", package = "SeroTrackR"),
  system.file("extdata", "example_BioPlex_plate2.xlsx", package = "SeroTrackR")
)
all_plate_layout <- system.file("extdata", "example_platelayout_1.xlsx", package = "SeroTrackR")

For your data:

bioplex_raw_plates <- c(
  "data/example_BioPlex_plate1.xlsx", 
  "data/example_BioPlex_plate2.xlsx"
)
all_plate_layout <- "data/example_platelayout_1.xlsx"

Input your MAGPIX file/s:

Using a reproducible example:

magpix_raw_plate     <- system.file("extdata", "example_MAGPIX_plate3.csv", package = "SeroTrackR")

For your data:

magpix_raw_plate      <- "data/example_MAGPIX_plate3.csv"

Read the serological data files in:

# Serological data
bioplex_sero_data <- readSeroData(
  raw_data = bioplex_raw_plates,
  platform = "bioplex"
)

PASS: File example_bioplex_plate1.xlsx successfully validated.

PASS: File example_bioplex_plate2.xlsx successfully validated.

magpix_sero_data <- readSeroData(
  raw_data = magpix_raw_plate,
  platform = "magpix", 
  version = "4.2"
)

PASS: File example_magpix_plate3.csv successfully validated.

Merge the files together:

# Start from an empty list and fill in one component at a time
sero_data_merged <- list()
# data_raw 
sero_data_merged$data_raw <- bioplex_sero_data$data_raw %>% 
  bind_rows(magpix_sero_data$data_raw)

# results 
sero_data_merged$results <- bioplex_sero_data$results %>% 
  bind_rows(magpix_sero_data$results)

# counts 
sero_data_merged$counts <- bioplex_sero_data$counts %>% 
  bind_rows(magpix_sero_data$counts)

# blanks
sero_data_merged$blanks <- bioplex_sero_data$blanks %>% 
  bind_rows(magpix_sero_data$blanks)

# stds
sero_data_merged$stds <- bioplex_sero_data$stds %>% 
  bind_rows(magpix_sero_data$stds)

# run 
sero_data_merged$run <- bioplex_sero_data$run %>% 
  bind_rows(magpix_sero_data$run)
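The same merge can be written more compactly by looping over the component names. The sketch below shows the idea on two small toy lists; their contents are made up for illustration and only mimic the general shape of a list of data frames. It assumes that every component you want to merge can be combined with `bind_rows()`.

```r
library(dplyr)

# Toy stand-ins for the two platform outputs (illustrative values only)
bioplex_toy <- list(
  results = tibble(SampleID = c("a", "b"), MFI = c(100, 200)),
  counts  = tibble(SampleID = c("a", "b"), Count = c(50, 60))
)
magpix_toy <- list(
  results = tibble(SampleID = "c", MFI = 300),
  counts  = tibble(SampleID = "c", Count = 70)
)

# Merge every component that appears in both lists, keeping its name
components <- intersect(names(bioplex_toy), names(magpix_toy))
merged_toy <- lapply(
  setNames(components, components),
  function(component) {
    bind_rows(bioplex_toy[[component]], magpix_toy[[component]])
  }
)

merged_toy$results
```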

Continue the rest of the pipeline:

plate_list_all  <- readPlateLayout(
  plate_layout = all_plate_layout, 
  sero_data = sero_data_merged
)

qc_results <- runQC(
  sero_data = sero_data_merged, 
  plate_list = plate_list_all
)

mfi_to_rau_output <- MFItoRAU(
  sero_data = sero_data_merged,
  plate_list = plate_list_all, 
  qc_results = qc_results, 
  std_point = 10
)

# etc..

A.7 How can I count the number of seropositive and seronegative samples?

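To do this, use the template code below. It assumes that you have followed the steps outlined in pvseroapp.qmd and have stored the output of classifyResults() in an object called classifyResults_output.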

classifyResults_output %>% 
  count(pred_class_max)

# A tibble: 2 × 2
  pred_class_max     n
  <fct>          <int>
1 seronegative      43
2 seropositive     209
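If you also want the overall percentage, you can add a `mutate()` step to the count. The sketch below uses a toy data frame, `toy_output`, built to match the counts shown above. It is a stand-in for `classifyResults_output`, not real pipeline output.

```r
library(dplyr)

# Toy stand-in with 43 seronegative and 209 seropositive samples
toy_output <- tibble(
  pred_class_max = factor(
    rep(c("seronegative", "seropositive"), times = c(43, 209))
  )
)

# Count each serostatus and express it as a percentage of all samples
overall_summary <- toy_output %>%
  count(pred_class_max) %>%
  mutate(Percent = round(100 * n / sum(n), 1))

overall_summary
```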

To calculate the serostatus PER PLATE:

classifyResults_output %>% 
  count(Plate, pred_class_max)

# A tibble: 6 × 3
  Plate  pred_class_max     n
  <chr>  <fct>          <int>
1 plate1 seronegative       9
2 plate1 seropositive      75
3 plate2 seronegative      17
4 plate2 seropositive      67
5 plate3 seronegative      17
6 plate3 seropositive      67

To calculate the percentage of the serostatus per plate:

classifyResults_output %>% 
  count(Plate, pred_class_max) %>% 
  group_by(Plate) %>% 
  mutate(Percent = round(100 * n / sum(n), 1))

# A tibble: 6 × 4
# Groups:   Plate [3]
  Plate  pred_class_max     n Percent
  <chr>  <fct>          <int>   <dbl>
1 plate1 seronegative       9    10.7
2 plate1 seropositive      75    89.3
3 plate2 seronegative      17    20.2
4 plate2 seropositive      67    79.8
5 plate3 seronegative      17    20.2
6 plate3 seropositive      67    79.8

You can do some clever data wrangling to make the table look like this:

classifyResults_output %>% 
  # Count our serostatus by plate 
  count(Plate, pred_class_max) %>% 
  group_by(Plate) %>% 
  mutate(
    # Calculate serostatus prevalence for each plate as a percentage 
    Percent = round(100 * n / sum(n), 1),
    # Create new variable "Count" to make it easier to interpret 
    Count = paste0(n, " (", Percent, "%)")
  ) %>% 
  # Remove the "Percent" and "n" columns as they are unnecessary now 
  dplyr::select(-c(Percent, n)) %>% 
  # Create cleaner table to view each serostatus type as a column 
  pivot_wider(
    id_cols = Plate,
    names_from = pred_class_max, 
    values_from = Count
  )

# A tibble: 3 × 3
# Groups:   Plate [3]
  Plate  seronegative seropositive
  <chr>  <chr>        <chr>       
1 plate1 9 (10.7%)    75 (89.3%)  
2 plate2 17 (20.2%)   67 (79.8%)  
3 plate3 17 (20.2%)   67 (79.8%)
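The output above is still grouped by Plate, as shown by the `# Groups:` line. If you plan to save or reuse the table, it can be tidier to call `ungroup()` first. The sketch below does this on a toy data frame built to match the plate1 and plate2 counts above, and writes the result to a temporary CSV file. `toy_output` and `output_file` are illustrative names.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for classifyResults_output covering two plates
toy_output <- tibble(
  Plate = rep(c("plate1", "plate2"), times = c(84, 84)),
  pred_class_max = factor(c(
    rep(c("seronegative", "seropositive"), times = c(9, 75)),
    rep(c("seronegative", "seropositive"), times = c(17, 67))
  ))
)

summary_table <- toy_output %>%
  count(Plate, pred_class_max) %>%
  group_by(Plate) %>%
  mutate(Count = paste0(n, " (", round(100 * n / sum(n), 1), "%)")) %>%
  # Drop the grouping so that later steps treat this as a plain table
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = pred_class_max, values_from = Count)

# Save the table; in a project you might write to the results/ folder instead
output_file <- file.path(tempdir(), "serostatus_by_plate.csv")
write.csv(summary_table, output_file, row.names = FALSE)
```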
