Getting Started with rTASSEL

In this notebook, we will show you the basic usages of rTASSEL. rTASSEL can be utilized in genetic analysis pipelines in several ways:

  • Genotype data handling and manipulation
  • Genetic analysis (e.g. association, LD, etc.)
  • Visualization

Load package

rTASSEL can simply be loaded using the library() method once installed:

In [ ]:
library(rTASSEL)

Load genotype and phenotype data

In the following code block, we will load example phenotype and genotype data from a file path within the rTASSEL package. Once loaded, we can use rTASSEL's readGenotypePhenotype method to create an rTASSEL data object from the file paths:

In [ ]:
# Load gentoype info
genoPathHMP <- system.file("extdata", "mdp_genotype.hmp.txt", package = "rTASSEL")

# Load phenotype info
phenoPath  <- system.file("extdata", "mdp_traits.txt", package = "rTASSEL")

# Create rTASSEL data object
tasGenoPheno <- readGenotypePhenotype(
    genoPathOrObj    = genoPathHMP,
    phenoPathDFOrObj = phenoPath
)

# View data object
tasGenoPheno

Generate kinship analysis

In TASSEL, for mixed linear model analyses, a kinship matrix calculated from genotype data is necessary. This can be accomplished by calculating a kinship TASSEL object using the function kinshipMatrix() from rTASSEL. The main parameter input is a TasselGenotypePhenotype class object that contains a genotype data set (i.e. our prior data object):

In [ ]:
tasKin <- kinshipMatrix(tasObj = tasGenoPheno)
In [ ]:
# Show summary object
tasKin

Since this object is essentially a TASSEL reference object, it is relatively not of use until association analysis. We can coerce this TASSEL object by using the following function, as.matrix() from base R to return a matrix object:

In [ ]:
# Get full R matrix
tasKinRMat <- as.matrix(tasKin)

# Inspect the first 5 rows and columns
tasKinRMat[1:5, 1:5]

Association analyses

One of TASSEL's most powerful functionalities is its capability of performing a variety of different association modeling techniques. If you have started reading the walkthrough here it is strongly suggested that you read the other components of this walkthrough since the following parameters require what we have previously created!

If you are not familar with these methods, more information about how these operate in base TASSEL can be found at following links:

The assocModelFitter() function has several primary components:

  • tasObj: a TasselGenotypePhenotype class R object
  • formula: an R-based linear model formula
  • fitMarkers: a boolean parameter to differentiate between BLUE and GLM analyses
  • kinship: a TASSEL kinship object
  • fastAssociation: a boolean parameter for data sets that have many traits

Probably the most important concept of this function is formula parameter. If you are familar with standard R linear model functions, this concept is fairly similar. In TASSEL, a linear model is composed of the following scheme:

$$y \sim A_{1} + A_{2} + \dots A_{n}$$

Where $y$ is any TASSEL data type and $A_{n}$ is any TASSEL covariate and / or factor types. In the following example, we will fit a mixed linear model (MLM) with our rTASSEL data object. In addition to the prior parameters, we will also need the TASSEL kinship object:

In [ ]:
# Calculate MLM
tasMLM <- assocModelFitter(
    tasObj = tasGenoPheno,  # <- our prior TASSEL object
    formula = EarHT ~ .,    # <- run only EarHT
    fitMarkers = TRUE,      # <- set this to TRUE for GLM
    kinship = tasKin,       # <- our prior kinship object
    fastAssociation = FALSE
)

# Return names of objects returned
names(tasMLM)

For more information about the extent of these methods, please read this section of our introductory vignette.

Visualizations

rTASSEL supports automated visualizations for Manhattan plots and linkage disequilibrium (LD) analyses. In this example, we will generate a Manhattan plot for our prior MLM object:

In [ ]:
# Generate Manhattan plot for ear height trait
manhattanEH <- manhattanPlot(
    assocStats = tasMLM$MLM_Stats,
    trait      = "EarHT",
    threshold  = 5
)

# View visualization
manhattanEH

Similarly, we can also visualize LD using automated methods. Like most LD plots, it is wise to filter your genotype information to a specific region of interest:

In [ ]:
# Filter genotype table by position
tasGenoPhenoFilt <- filterGenotypeTableSites(
    tasObj              = tasGenoPheno,
    siteRangeFilterType = "position",
    startPos            = 228e6,
    endPos              = 300e6,
    startChr            = 2,
    endChr              = 2
)

# Generate and visualize LD
myLD <- ldPlot(
    tasObj  = tasGenoPhenoFilt,
    ldType  = "All",
    plotVal = "r2",
    verbose = FALSE
)

# View LD
myLD

Genomic prediction

rTASSEL also allows for phenotypic prediction through genotype information via genomic best linear unbiased predictors (gBLUPs):

In [ ]:
tasCV <- genomicPrediction(
    tasPhenoObj = tasGenoPheno,
    kinship     = tasKin,
    doCV        = TRUE,
    kFolds      = 5,
    nIter       = 1
)
head(tasCV)