## Getting Started with rTASSEL¶

In this notebook, we will show you the basic usages of rTASSEL. rTASSEL can be utilized in genetic analysis pipelines in several ways:

• Genotype data handling and manipulation
• Genetic analysis (e.g. association, LD, etc.)
• Visualization

rTASSEL can simply be loaded using the library() method once installed:

In [ ]:
library(rTASSEL)


### Load genotype and phenotype data¶

In the following code block, we will load example phenotype and genotype data from a file path within the rTASSEL package. Once loaded, we can use rTASSEL's readGenotypePhenotype method to create an rTASSEL data object from the file paths:

In [ ]:
# Load gentoype info
genoPathHMP <- system.file("extdata", "mdp_genotype.hmp.txt", package = "rTASSEL")

phenoPath  <- system.file("extdata", "mdp_traits.txt", package = "rTASSEL")

# Create rTASSEL data object
genoPathOrObj    = genoPathHMP,
phenoPathDFOrObj = phenoPath
)

# View data object
tasGenoPheno


### Generate kinship analysis¶

In TASSEL, for mixed linear model analyses, a kinship matrix calculated from genotype data is necessary. This can be accomplished by calculating a kinship TASSEL object using the function kinshipMatrix() from rTASSEL. The main parameter input is a TasselGenotypePhenotype class object that contains a genotype data set (i.e. our prior data object):

In [ ]:
tasKin <- kinshipMatrix(tasObj = tasGenoPheno)

In [ ]:
# Show summary object


Since this object is essentially a TASSEL reference object, it is relatively not of use until association analysis. We can coerce this TASSEL object by using the following function, as.matrix() from base R to return a matrix object:

In [ ]:
# Get full R matrix

# Inspect the first 5 rows and columns


### Association analyses¶

One of TASSEL's most powerful functionalities is its capability of performing a variety of different association modeling techniques. If you have started reading the walkthrough here it is strongly suggested that you read the other components of this walkthrough since the following parameters require what we have previously created!

If you are not familar with these methods, more information about how these operate in base TASSEL can be found at following links:

The assocModelFitter() function has several primary components:

• tasObj: a TasselGenotypePhenotype class R object
• formula: an R-based linear model formula
• fitMarkers: a boolean parameter to differentiate between BLUE and GLM analyses
• kinship: a TASSEL kinship object
• fastAssociation: a boolean parameter for data sets that have many traits

Probably the most important concept of this function is formula parameter. If you are familar with standard R linear model functions, this concept is fairly similar. In TASSEL, a linear model is composed of the following scheme:

$$y \sim A_{1} + A_{2} + \dots A_{n}$$

Where $y$ is any TASSEL data type and $A_{n}$ is any TASSEL covariate and / or factor types. In the following example, we will fit a mixed linear model (MLM) with our rTASSEL data object. In addition to the prior parameters, we will also need the TASSEL kinship object:

In [ ]:
# Calculate MLM
tasMLM <- assocModelFitter(
tasObj = tasGenoPheno,  # <- our prior TASSEL object
formula = EarHT ~ .,    # <- run only EarHT
fitMarkers = TRUE,      # <- set this to TRUE for GLM
kinship = tasKin,       # <- our prior kinship object
fastAssociation = FALSE
)

# Return names of objects returned
names(tasMLM)


### Visualizations¶

rTASSEL supports automated visualizations for Manhattan plots and linkage disequilibrium (LD) analyses. In this example, we will generate a Manhattan plot for our prior MLM object:

In [ ]:
# Generate Manhattan plot for ear height trait
manhattanEH <- manhattanPlot(
assocStats = tasMLM\$MLM_Stats,
trait      = "EarHT",
threshold  = 5
)

# View visualization
manhattanEH


Similarly, we can also visualize LD using automated methods. Like most LD plots, it is wise to filter your genotype information to a specific region of interest:

In [ ]:
# Filter genotype table by position
tasGenoPhenoFilt <- filterGenotypeTableSites(
tasObj              = tasGenoPheno,
siteRangeFilterType = "position",
startPos            = 228e6,
endPos              = 300e6,
startChr            = 2,
endChr              = 2
)

# Generate and visualize LD
myLD <- ldPlot(
tasObj  = tasGenoPhenoFilt,
ldType  = "All",
plotVal = "r2",
verbose = FALSE
)

# View LD
myLD


### Genomic prediction¶

rTASSEL also allows for phenotypic prediction through genotype information via genomic best linear unbiased predictors (gBLUPs):

In [ ]:
tasCV <- genomicPrediction(
tasPhenoObj = tasGenoPheno,