Getting Started with the SteatoSITE gene browser

About SteatoSITE

This app allows users to visualise the RNA-seq data in SteatoSITE. SteatoSITE is a retrospective multicentre national cohort of 940 patients, across the complete NAFLD spectrum, integrating quantitative digital pathology, hepatic bulk RNA-seq and 5.67 million days of longitudinal electronic health record follow-up into a secure, searchable platform to accelerate new biomedical discoveries in NAFLD. A total of 632 NAFLD-spectrum cases and 32 controls with successful sequencing are analysed here. NASH-CRN fibrosis stage is included as a factor.

For access to the full SteatoSITE resource, including rich clinical outcome annotation, please visit https://steatosite.com/.

Features

Visualise the data:

  • clustering (PCA plots, heatmaps)
  • group comparisons (scatterplots, volcano plots)
  • gene-level boxplots of expression values

More Help and Info

Additional help information and more detailed instructions are provided under the “Instructions” tab.

App Info

The app has been developed by Tim Kendall (University of Edinburgh), heavily based upon the START app developed by Jessica Minnier, Jiri Sklenar, Anthony Paul Barnes, and Jonathan Nelson (Oregon Health & Science University, Knight Cardiovascular Institute and School of Public Health, https://doi.org/10.1093/bioinformatics/btw624).

This panel constructs box-and-whisker plots of log2(CPM) or CPM values, with dot plots superimposed to show the raw data. When there are three data points, the median and the quartiles bounding the interquartile range are precisely the data values. Medians are denoted by horizontal lines and averages are denoted by open diamonds.
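The summary statistics behind such a boxplot can be sketched in Python. This is only an illustration: the app's own code is not shown here, the function name `box_summary` is hypothetical, and the choice of the "exclusive" quantile method is an assumption (under that method, three data points do yield quartiles equal to the data values).

```python
import statistics

def box_summary(values):
    """Return quartiles, median and mean for one group of expression values."""
    ordered = sorted(values)
    # Assumption: "exclusive" quantiles; other definitions interpolate differently.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    return {
        "q1": q1,
        "median": median,               # drawn as the horizontal line
        "q3": q3,
        "mean": statistics.fmean(ordered),  # drawn as the open diamond
    }

# Three hypothetical log2(CPM) values for a single gene in a single group.
three_points = [4.0, 1.0, 2.5]
summary = box_summary(three_points)
print(summary)
```

With only three values, the lower quartile, median and upper quartile returned here are the smallest, middle and largest value respectively.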

Visualizations

Group Plots

PCA Plot

This plot uses Principal Component Analysis (PCA) to calculate the principal components of the expression data, using data from all genes. Euclidean distances between expression values are used. Samples are projected onto the first two principal components (PCs), and the percentage of variance explained by each of those PCs is displayed along the x and y axes. Ideally, your samples will cluster by group identifier.
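The projection described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the app's implementation: the function name `pca_first_two` is hypothetical, the data are simulated, and genes are centred but not scaled (the text does not say which preprocessing the app applies).

```python
import numpy as np

def pca_first_two(expr):
    """Project samples onto the first two PCs.

    expr: samples x genes array of expression values (e.g. log2 CPM).
    Returns (scores, pct_var) where scores is samples x 2 and pct_var
    holds the percentage of variance explained by PC1 and PC2.
    """
    centered = expr - expr.mean(axis=0)           # centre each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :2] * s[:2]                     # sample coordinates on PC1, PC2
    pct_var = 100 * s**2 / np.sum(s**2)           # variance share per component
    return scores, pct_var[:2]

# Simulated data: two groups of 5 samples, 50 genes, shifted means.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(5, 50))
group_b = rng.normal(3.0, 1.0, size=(5, 50))
expr = np.vstack([group_a, group_b])

scores, pct_var = pca_first_two(expr)
print(scores.shape, pct_var)
```

In this simulated case the two groups separate along PC1, which is the pattern the plot is meant to reveal when samples cluster by group identifier.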

Analysis Plots

These plots use the p-values and fold changes to visualize your data.

Volcano Plot

This is a scatter plot of log fold changes against –log10(p-values), so that genes with the largest fold changes and smallest p-values appear at the extreme top left and top right of the plot. Hover over a point to see which gene it represents. (https://en.wikipedia.org/wiki/Volcano_plot_(statistics))
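The volcano plot coordinates can be sketched as follows. The function name `volcano_points`, the gene names, and the cutoff values are all hypothetical choices for illustration; the app's own thresholds are not stated in this text.

```python
import math

def volcano_points(results, fc_cutoff=1.0, p_cutoff=0.05):
    """Convert (gene, log fold change, p-value) rows into plot coordinates.

    fc_cutoff and p_cutoff are illustrative defaults, not the app's settings.
    """
    points = []
    for gene, log_fc, p_value in results:
        points.append({
            "gene": gene,
            "x": log_fc,                    # horizontal axis: log fold change
            "y": -math.log10(p_value),      # vertical axis: -log10(p-value)
            "highlight": abs(log_fc) >= fc_cutoff and p_value <= p_cutoff,
        })
    return points

# Hypothetical differential expression results.
results = [
    ("GENE_A", 2.0, 0.001),
    ("GENE_B", -1.5, 0.01),
    ("GENE_C", 0.1, 0.5),
]
points = volcano_points(results)
for point in points:
    print(point)
```

Small p-values map to large y values, so strongly changed and statistically significant genes end up in the upper corners.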

Scatter Plot

This is a scatter plot of average gene expression in one group against another group. This allows the viewer to observe which genes have the largest differences between two groups. Genes with similar expression in both groups lie close to the diagonal line, and points far from the diagonal show the largest differences. Hover over a point to see which gene it represents.
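A small sketch of the quantities behind this plot, assuming expression values are held in a plain dictionary; the function name `diagonal_differences`, the gene names, and the sample names are hypothetical.

```python
from statistics import fmean

def diagonal_differences(expr, group_x, group_y):
    """Return (gene, mean_x, mean_y, difference), largest |difference| first.

    expr maps gene -> {sample: expression value}; group_x and group_y are
    lists of sample names. A difference of 0 places the gene on the diagonal.
    """
    rows = []
    for gene, by_sample in expr.items():
        mean_x = fmean(by_sample[s] for s in group_x)
        mean_y = fmean(by_sample[s] for s in group_y)
        rows.append((gene, mean_x, mean_y, mean_y - mean_x))
    rows.sort(key=lambda row: abs(row[3]), reverse=True)
    return rows

# Hypothetical log2 expression values for two genes in four samples.
expr = {
    "GENE_A": {"s1": 5.0, "s2": 5.0, "s3": 5.0, "s4": 5.0},
    "GENE_B": {"s1": 2.0, "s2": 4.0, "s3": 7.0, "s4": 9.0},
}
rows = diagonal_differences(expr, ["s1", "s2"], ["s3", "s4"])
for row in rows:
    print(row)
```

Here GENE_A sits exactly on the diagonal, while GENE_B lies well above it because its mean is higher in the second group.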

Gene Expression Boxplot

Use the search bar to look up genes in your data set. For the selected gene(s), a stripchart (dotplot) and boxplots of the expression values are presented for each group. You may plot one or multiple genes alongside each other. Hover over points for more information about the data.

Heatmap

A heatmap of expression values is shown, with genes and samples arranged by unsupervised clustering. You may filter on test results as well as P-value cutoffs. By default, the top 100 genes (those with the lowest P-values) are shown.
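The gene selection step can be sketched as below. This covers only the filtering by P-value, not the clustering, and the function name `top_genes` and the gene names are hypothetical; how the app applies its filters internally is not described here.

```python
def top_genes(p_values, n=100, p_cutoff=None):
    """Return up to n gene names with the lowest P-values.

    p_values maps gene -> P-value. If p_cutoff is given, genes with a
    P-value above it are dropped before the top n are taken.
    """
    kept = [
        (gene, p) for gene, p in p_values.items()
        if p_cutoff is None or p <= p_cutoff
    ]
    kept.sort(key=lambda item: item[1])   # smallest P-value first
    return [gene for gene, _ in kept[:n]]

# Hypothetical test results.
p_values = {"GENE_A": 0.2, "GENE_B": 0.001, "GENE_C": 0.03, "GENE_D": 0.0004}

top_two = top_genes(p_values, n=2)
significant = top_genes(p_values, p_cutoff=0.05)
print(top_two)
print(significant)
```

The selected genes would then be passed to a clustering routine to order the rows and columns of the heatmap.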