SIMP59: Data Selection and Visualisation VT25
7.5 credits
Figure 1: In this section of the book, you’ll learn how to import, tidy, transform, and visualize data.
Use `mtcars` to create a summary table showing the mean and standard error of miles per gallon (mpg) for different cylinder counts (cyl) and transmission types (am), using dplyr.

| cyl | am | n | mean_mpg | sem_mpg |
|---|---|---|---|---|
| 4 | Automatic | 3 | 22.90000 | 0.8386497 |
| 4 | Manual | 8 | 28.07500 | 1.5852839 |
| 6 | Automatic | 4 | 19.12500 | 0.8158584 |
| 6 | Manual | 3 | 20.56667 | 0.4333333 |
| 8 | Automatic | 12 | 15.05000 | 0.8008991 |
| 8 | Manual | 2 | 15.40000 | 0.4000000 |
#| echo: true
#| output: false
# Load required library
library(dplyr)
# Use mtcars dataset (built-in R dataset)
data(mtcars)
# Create summary table with group size, mean, and SEM of mpg grouped by cyl and am
summary_table <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(
    n = n(),                                      # Number of cars per group
    mean_mpg = mean(mpg, na.rm = TRUE),
    sem_mpg = sd(mpg, na.rm = TRUE) / sqrt(n()),  # Standard error of the mean
    .groups = "drop"                              # Drop grouping structure after summarising
  ) %>%
  mutate(
    am = factor(am, levels = c(0, 1), labels = c("Automatic", "Manual")),
    cyl = factor(cyl)
  )
# Display the summary table
print(summary_table)

Figure 2.5: Fuel efficiency versus displacement, for 32 cars (1973–74 models). This figure uses five separate scales to represent data: (i) the x axis (displacement); (ii) the y axis (fuel efficiency); (iii) the color of the data points (power); (iv) the size of the data points (weight); and (v) the shape of the data points (number of cylinders). Four of the five variables displayed (displacement, fuel efficiency, power, and weight) are numerical continuous. The remaining one (number of cylinders) can be considered to be either numerical discrete or qualitative ordered. Data source: Motor Trend, 1974.
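The five mappings listed in the Figure 2.5 caption can be sketched with ggplot2 and the built-in `mtcars` data. This is a minimal sketch rather than the code behind the original figure; the object name `p_scales` and the axis and legend labels are illustrative assumptions.

```r
library(ggplot2)

# Built-in Motor Trend data (32 cars, 1973-74 models)
data(mtcars)

# Map five variables to five scales:
# x = displacement, y = fuel efficiency, color = power,
# size = weight, shape = number of cylinders
p_scales <- ggplot(
  mtcars,
  aes(x = disp, y = mpg, color = hp, size = wt, shape = factor(cyl))
) +
  geom_point() +
  labs(
    x = "displacement (cu. in.)",
    y = "fuel efficiency (mpg)",
    color = "power (hp)",
    size = "weight (1000 lbs)",
    shape = "cylinders"
  )

# Display the plot
p_scales
```

Number of cylinders is wrapped in `factor()` because the shape scale only accepts discrete variables, which matches the caption's remark that it can be treated as qualitative ordered.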
Figure 6.7: 2016 median U.S. annual household income versus age group and race. Age groups are shown along the x axis, and for each age group there are four bars, corresponding to the median income of Asian, white, Hispanic, and black people, respectively. Data source: United States Census Bureau
Figure 6.8: 2016 median U.S. annual household income versus age group and race. In contrast to Figure 6.7, race is now shown along the x axis, and for each race we show seven bars according to the seven age groups. Data source: United States Census Bureau
# Assumes `boxoffice` is a data frame with columns title_short, rank, and amount
library(dplyr)      # %>%
library(ggplot2)
library(forcats)    # fct_reorder()
library(dviz.supp)  # theme_dviz_hgrid()

boxoffice %>%
  ggplot(aes(x = fct_reorder(title_short, rank), y = amount)) +
  geom_col(fill = "#56B4E9", width = 0.6, alpha = 0.9) +
  scale_y_continuous(
    expand = c(0, 0),
    breaks = c(0, 2e7, 4e7, 6e7),
    labels = c("0", "20", "40", "60"),
    name = "weekend gross (million USD)"
  ) +
  scale_x_discrete(
    name = NULL,
    expand = c(0, 0.4)
  ) +
  coord_cartesian(clip = "off") +
  theme_dviz_hgrid(12, rel_small = 1) +
  theme(
    #axis.ticks.length = grid::unit(0, "pt"),
    axis.line.x = element_blank(),
    axis.ticks.x = element_blank()
  )

Figure 6.14: Internet adoption over time, for select countries. Color represents the percent of internet users for the respective country and year. Countries were ordered by percent internet users in 2016. Data source: World Bank
Figure 7.8: Density estimates of the ages of male and female Titanic passengers. To highlight that there were more male than female passengers, the density curves were scaled such that the area under each curve corresponds to the total number of male and female passengers with known age (468 and 288, respectively).
Figure 12.2: Head length versus body mass for 123 blue jays. The birds’ sex is indicated by color. At the same body mass, male birds tend to have longer heads (and specifically, longer bills) than female birds. Data source: Keith Tarvin, Oberlin College
Figure 12.3: Head length versus body mass for 123 blue jays. The birds’ sex is indicated by color, and the birds’ skull size by symbol size. Head-length measurements include the length of the bill while skull-size measurements do not. Head length and skull size tend to be correlated, but there are some birds with unusually long or short bills given their skull size. Data source: Keith Tarvin, Oberlin College
Figure 13.7: Monthly submissions to three preprint servers covering biomedical research. By directly labeling the lines instead of providing a legend, we have reduced the cognitive load required to read the figure. And the elimination of the legend removes the need for points of different shapes. Thus, we could streamline the figure further by eliminating the dots. Data source: Jordan Anaya, http://www.prepubmed.org/
Figure 14.7: Head length versus body mass for 123 blue jays. The birds’ sex is indicated by color. This figure is equivalent to Figure 12.2, except that now we have drawn linear trend lines on top of the individual data points. Data source: Keith Tarvin, Oberlin College
Figure 4.4: Median annual income in Texas counties. The highest median incomes are seen in major Texas metropolitan areas, in particular near Houston and Dallas. No median income estimate is available for Loving County in West Texas and therefore that county is shown in gray. Data source: 2015 Five-Year American Community Survey.
Figure 4.6: Percentage of people identifying as white in Texas counties. Whites are in the majority in North and East Texas but not in South or West Texas. Data source: 2010 Decennial U.S. Census.
Figure 15.11: Population density in every U.S. county, shown as a choropleth map. Population density is reported as persons per square kilometer. Data source: 2015 Five-Year American Community Survey.
Figure 16.5: Relationship between sample, sample mean, standard deviation, standard error, and confidence intervals, in an example of chocolate bar ratings. The observations (shown as jittered green dots) that make up the sample represent expert ratings of 125 chocolate bars from manufacturers in Canada, rated on a scale from 1 (unpleasant) to 5 (elite). The large orange dot represents the mean of the ratings. Error bars indicate, from top to bottom, twice the standard deviation, twice the standard error (standard deviation of the mean), and 80%, 95%, and 99% confidence intervals of the mean. Data source: Brady Brelinski, Manhattan Chocolate Society.
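The quantities named in the Figure 16.5 caption can be computed in a few lines of base R. The sketch below is only illustrative: it uses the `mpg` column of the built-in `mtcars` data as a stand-in sample (the chocolate ratings are not included here), and it assumes t-based confidence intervals, which the caption does not specify. The helper `mean_ci()` is a hypothetical name introduced for this example.

```r
# Illustrative sample: mpg values from the built-in mtcars data
# (a stand-in for the chocolate ratings, which are not included here)
x <- mtcars$mpg
n <- length(x)

sample_mean <- mean(x)
sample_sd <- sd(x)                # spread of the individual observations
sample_se <- sample_sd / sqrt(n)  # standard error (standard deviation of the mean)

# Hypothetical helper: t-based confidence interval of the mean
mean_ci <- function(level) {
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)
  c(lower = sample_mean - t_crit * sample_se,
    upper = sample_mean + t_crit * sample_se)
}

# One row per confidence level, columns "lower" and "upper"
ci_table <- t(sapply(c("80%" = 0.80, "95%" = 0.95, "99%" = 0.99), mean_ci))
print(ci_table)
```

The standard error shrinks with the square root of the sample size while the standard deviation does not, which is why the confidence intervals of the mean are much narrower than the spread of the individual observations.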
Figure 16.6: Confidence intervals widen with smaller sample size. Chocolate bars from Canada and Switzerland have comparable mean ratings and comparable standard deviations (indicated with simple black error bars). However, over three times as many Canadian bars were rated as Swiss bars, and therefore the confidence intervals (indicated with error bars of different colors and thickness drawn on top of one another) are substantially wider for the mean of the Swiss ratings than for the mean of the Canadian ratings. Data source: Brady Brelinski, Manhattan Chocolate Society.
Figure 16.7: Mean chocolate flavor ratings and associated confidence intervals for chocolate bars from manufacturers in six different countries. Data source: Brady Brelinski, Manhattan Chocolate Society.
Figure 16.8: Mean chocolate flavor ratings for manufacturers from five different countries, relative to the mean rating of U.S. chocolate bars. Canadian chocolate bars are significantly higher rated than U.S. bars. For the other four countries there is no significant difference in mean rating to the U.S. at the 95% confidence level. Confidence levels have been adjusted for multiple comparisons using Dunnett’s method. Data source: Brady Brelinski, Manhattan Chocolate Society.
Figure 16.9: Mean chocolate flavor ratings for manufacturers from four different countries, relative to the mean rating of U.S. chocolate bars. Each panel uses a different approach to visualizing the same uncertainty information. (a) Graded error bars with cap. (b) Graded error bars without cap. (c) Single-interval error bars with cap. (d) Single-interval error bars without cap. (e) Confidence strips. (f) Confidence distributions.
Figure 16.10: Mean butterfat contents in the milk of four cattle breeds. Error bars indicate +/- one standard error of the mean. Visualizations of this type are frequently seen in the scientific literature. While they are technically correct, they represent neither the variation within each category nor the uncertainty of the sample means particularly well. See Figure 7.11 for the variation in butterfat contents within individual breeds. Data Source: Canadian Record of Performance for Purebred Dairy Cattle.
Figure 16.15: Head length versus body mass for male blue jays, as in Figure 14.7. The straight blue line represents the best linear fit to the data, and the gray band around the line shows the uncertainty in the linear fit. The gray band represents a 95% confidence level. Data source: Keith Tarvin, Oberlin College.
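A best-fit line with a 95% confidence band, as described in the Figure 16.15 caption, can be sketched in base R with `lm()` and `predict()`. The blue jay measurements are not available here, so this sketch substitutes weight and fuel efficiency from the built-in `mtcars` data; the variable names `fit`, `grid`, and `band` are illustrative.

```r
# Stand-in data: weight versus fuel efficiency from the built-in mtcars data
fit <- lm(mpg ~ wt, data = mtcars)

# Evaluate the fit on an evenly spaced grid of x values
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50))

# 95% confidence band for the fitted mean at each grid point
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
band <- cbind(grid, band)  # columns: wt, fit, lwr, upr

# Draw the data, the best-fit line, and the edges of the band
plot(mpg ~ wt, data = mtcars, pch = 16)
lines(band$wt, band$fit, col = "blue")
lines(band$wt, band$lwr, lty = 2, col = "gray40")
lines(band$wt, band$upr, lty = 2, col = "gray40")
```

The band describes uncertainty in the fitted line itself, not the scatter of individual points, so it is narrowest near the center of the data and widens toward the extremes.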
Figure 16.17: Head length versus body mass for male blue jays. As in the case of error bars, we can draw graded confidence bands to highlight the uncertainty in the estimate. Data source: Keith Tarvin, Oberlin College.
Figure 21.1: Breakdown of passengers on the Titanic by gender, survival, and class in which they traveled (1st, 2nd, or 3rd).
Figure 21.3: Trends in Bachelor’s degrees conferred by U.S. institutions of higher learning. Shown are all degree areas that represent, on average, more than 4% of all degrees. This figure is labeled as “bad” because all panels use different y-axis ranges. This choice obscures the relative sizes of the different degree areas and it exaggerates the changes that have happened in some of the degree areas. Data Source: National Center for Education Statistics.
Figure 21.4: Trends in Bachelor’s degrees conferred by U.S. institutions of higher learning. Shown are all degree areas that represent, on average, more than 4% of all degrees. Data Source: National Center for Education Statistics.
Figure 21.5: Trends in Bachelor’s Degrees conferred by U.S. institutions of higher learning. (a) From 1970 to 2015, the total number of degrees nearly doubled. (b) Among the most popular degree areas, social sciences, history, and education experienced a major decline, while business and health professions grew. Data Source: National Center for Education Statistics.
Figure 21.6: Variation of Figure 21.5 with poor labeling. The labels are too large and thick, they are in the wrong font, and they are placed in an awkward location. Also, while labeling with capital letters is fine and is in fact quite common, labeling needs to be consistent across all figures in a document. In this book, the convention is that multi-panel figures use lower-case labels, and thus this figure is inconsistent with the other figures in this book.
Figure 21.7: Physiology and body-composition of male and female athletes. (a) The data set encompasses 73 female and 85 male professional athletes. (b) Male athletes tend to have higher red blood cell (RBC, reported in units of $10^{12}$ per liter) counts than female athletes, but there are no such differences for white blood cell counts (WBC, reported in units of $10^{9}$ per liter). (c) Male athletes tend to have a lower body fat percentage than female athletes performing in the same sport. Data source: Telford and Cunningham (1991).
Figure 21.8: Physiology and body-composition of male and female athletes. This figure shows the exact same data as Figure 21.7, but now using a consistent visual language. Data for female athletes is always shown to the left of the corresponding data for male athletes, and genders are consistently color-coded throughout all elements of the figure. Data source: Telford and Cunningham (1991).
Figure 21.9: Variation of Figure 21.8 where all figure panels are slightly misaligned. Misalignments are ugly and should be avoided.
Figure 23.1: Percent body fat versus height in professional male Australian athletes. Each point represents one athlete. This figure devotes way too much ink to non-data. There are unnecessary frames around the entire figure, around the plot panel, and around the legend. The coordinate grid is very prominent, and its presence draws attention away from the data points. Data source: Telford and Cunningham (1991).
Figure 23.2: Percent body fat versus height in professional male Australian athletes. This figure is a cleaned-up version of Figure 23.1. Unnecessary frames have been removed, minor grid lines have been removed, and major grid lines have been drawn in light gray to stand back relative to the data points. Data source: Telford and Cunningham (1991).
Figure 23.3: Percent body fat versus height in professional male Australian athletes. In this example, the concept of removing non-data ink has been taken too far. The axis tick labels and title are too faint and are barely visible. The data points seem to float in space. The points in the legend are not sufficiently set off from the data points, and the casual observer might think they are part of the data. Data source: Telford and Cunningham (1991)
Figure 23.4: Percent body fat versus height in professional male Australian athletes. This figure adds a frame around the plot panel of Figure 23.2, and this frame helps separate the legend from the data. Data source: Telford and Cunningham (1991)
Figure 23.5: Survival of passengers on the Titanic, broken down by gender and class. This small-multiples plot is too minimalistic. The individual facets are not framed, so it’s difficult to see which part of the figure belongs to which facet. Further, the individual bars are not anchored to a clear baseline, and they seem to float.
Figure 23.6: Survival of passengers on the Titanic, broken down by gender and class. This is an improved version of Figure 23.5. The gray background in each facet clearly delineates the six groupings (survived or died in first, second, or third class) that make up this plot. Thin horizontal lines in the background provide a reference for the bar heights and facilitate comparison of bar heights among facets.
Figure 23.7: Stock price over time for four major tech companies. The stock price for each company has been normalized to equal 100 in June 2012. This figure mimics the ggplot2 default look, with white major and minor grid lines on a gray background. In this particular example, I think the grid lines overpower the data lines, and the result is a figure that is not well balanced and that doesn’t place sufficient emphasis on the data. Data source: Yahoo Finance
Figure 23.8: Indexed stock price over time for four major tech companies. In this variant of Figure 23.7, the data lines are not sufficiently anchored. This makes it difficult to ascertain to what extent they have deviated from the index value of 100 at the end of the covered time interval. Data source: Yahoo Finance
Figure 23.9: Indexed stock price over time for four major tech companies. Adding a thin horizontal line at the index value of 100 to Figure 23.8 helps provide an important reference throughout the entire time period the plot spans. Data source: Yahoo Finance
Figure 23.10: Indexed stock price over time for four major tech companies. Adding thin horizontal lines at all major y axis ticks provides a better set of reference points than just the one horizontal line of Figure 23.9. This design also removes the need for prominent x and y axis lines, since the evenly spaced horizontal lines create a visual frame for the plot panel. Data source: Yahoo Finance
Figure 26.1: The same 3D pie chart shown from four different angles. Rotating a pie into the third dimension makes pie slices in the front appear larger than they really are and pie slices in the back appear smaller. Here, in parts (a), (b), and (c), the blue slice corresponding to 25% of the data visually occupies more than 25% of the area representing the pie. Only part (d) is an accurate representation of the data.
Figure 26.2: Numbers of female and male passengers on the Titanic traveling in 1st, 2nd, and 3rd class, shown as a 3D stacked bar plot. The total numbers of passengers in 1st, 2nd, and 3rd class are 322, 279, and 711, respectively (see Figure 6.10). Yet in this plot, the 1st class bar appears to represent fewer than 300 passengers, the 3rd class bar appears to represent fewer than 700 passengers, and the 2nd class bar seems to be closer to 210–220 passengers than the actual 279 passengers. Furthermore, the 3rd class bar visually dominates the figure and makes the number of passengers in 3rd class appear larger than it actually is.
Figure 26.3: Fuel efficiency versus displacement and power for 32 cars (1973–74 models). Each dot represents one car, and the dot color represents the number of cylinders of the car. The four panels (a)–(d) show exactly the same data but use different perspectives. Data source: Motor Trend, 1974.
Figure 26.4: Fuel efficiency versus displacement and power for 32 cars (1973–74 models). The four panels (a)–(d) correspond to the same panels in Figure 26.3, only that all grid lines providing depth cues have been removed. Data source: Motor Trend, 1974.
Figure 26.5: Fuel efficiency versus displacement (a) and power (b). Data source: Motor Trend, 1974.
Figure 26.6: Power versus displacement for 32 cars, with fuel efficiency represented by dot size. Data source: Motor Trend, 1974.
Figure 26.7: Mortality rates in Virginia in 1940, visualized as a 3D bar plot. Mortality rates are shown for four groups of people (urban and rural females and males) and five age categories (50–54, 55–59, 60–64, 65–69, 70–74), and they are reported in units of deaths per 1000 persons. This figure is labeled as “bad” because the 3D perspective makes the plot difficult to read. Data source: Molyneaux, Gilliam, and Florant (1947)
Figure 26.8: Mortality rates in Virginia in 1940, visualized as a Trellis plot. Mortality rates are shown for four groups of people (urban and rural females and males) and five age categories (50–54, 55–59, 60–64, 65–69, 70–74), and they are reported in units of deaths per 1000 persons. Data source: Molyneaux, Gilliam, and Florant (1947)
Figure 29.1: Growth in monthly submissions to the quantitative biology (q-bio) section of the preprint server arXiv.org. A sharp transition in the rate of growth can be seen around 2014. While growth was rapid up to 2014, almost no growth occurred from 2014 to 2018. Note that the y axis is logarithmic, so a linear increase in y corresponds to exponential growth in preprint submissions. Data source: Jordan Anaya, http://www.prepubmed.org/.
Figure 29.2: The leveling off of submission growth to q-bio coincided with the introduction of the bioRxiv server. Shown are the growth in monthly submissions to the q-bio section of the general-purpose preprint server arxiv.org and to the dedicated biology preprint server bioRxiv. The bioRxiv server went live in November 2013, and its submission rate has grown exponentially since. It seems likely that many scientists who otherwise would have submitted preprints to q-bio chose to submit to bioRxiv instead. Data source: Jordan Anaya, http://www.prepubmed.org/.
Figure 29.3: Mean arrival delay versus distance from New York City. Each point represents one destination, and the size of each point represents the number of flights from one of the three major New York City airports (Newark, JFK, or LaGuardia) to that destination in 2013. Negative delays imply that the flight arrived early. Solid lines represent the mean trends between arrival delay and distance. Delta has consistently lower arrival delays than other airlines, regardless of distance traveled. American has among the lowest delays, on average, for short distances, but has among the highest delays for longer distances traveled. This figure is labeled as “bad” because it is overly complex. Most readers will find it confusing and will not intuitively grasp what it is the figure is showing. Data source: U.S. Dept. of Transportation, Bureau of Transportation Statistics.
Figure 29.4: Mean arrival delay for flights out of the New York City area in 2013, by airline. American and Delta have the lowest mean arrival delays of all airlines flying out of the New York City area. Data source: U.S. Dept. of Transportation, Bureau of Transportation Statistics.
Figure 29.5: Number of flights out of the New York City area in 2013, by airline. Delta and American are the fourth and fifth largest carriers by flights out of the New York City area. Data source: U.S. Dept. of Transportation, Bureau of Transportation Statistics.
Figure 29.6: United Airlines departures out of Newark Airport (EWR) in 2013, by weekday. Most weekdays show approximately the same number of departures, but there are fewer departures on weekends. Data source: U.S. Dept. of Transportation, Bureau of Transportation Statistics.
Figure 29.7: Departures out of airports in the New York City area in 2013, broken down by airline, airport, and weekday. United Airlines and ExpressJet make up most of the departures out of Newark Airport (EWR), JetBlue, Delta, American, and Endeavor make up most of the departures out of JFK, and Delta, American, Envoy, and US Airways make up most of the departures out of LaGuardia (LGA). Most but not all airlines have fewer departures on weekends than during the work week. Data source: U.S. Dept. of Transportation, Bureau of Transportation Statistics.
- Interactive maps (`leaflet` or `ggplot2` + `plotly`) enable users to explore geographic patterns, drill down into regions, and overlay multiple data layers.
- Network graphs (`ggraph` or `visNetwork`) allow users to interactively explore relationships, zoom into clusters, and track influence across networks.
- Dashboards (`shiny` or `flexdashboard`) allow decision-makers to dynamically filter data, generate custom reports, and make data-driven decisions in real time.

#| echo: true
#| output: false
# Load required packages
library(plotly)
library(ggplot2)
# Load the mtcars dataset (built-in R dataset)
data(mtcars)
# --- Example 1: Standalone Plotly Plot ---
# Create an interactive scatter plot with Plotly
plotly_plot <- plot_ly(
  data = mtcars,
  x = ~wt,                                    # Weight (x-axis)
  y = ~mpg,                                   # Miles per gallon (y-axis)
  color = ~factor(cyl),                       # Color by number of cylinders
  size = ~hp,                                 # Size points by horsepower
  type = "scatter",
  mode = "markers",
  text = ~paste("HP:", hp, "<br>Cyl:", cyl),  # Hover text
  hoverinfo = "text"
) %>%
  layout(
    title = "Standalone Plotly: Weight vs MPG",
    xaxis = list(title = "Weight (1000 lbs)"),
    yaxis = list(title = "Miles per Gallon"),
    legend = list(title = list(text = "Cylinders"))
  )
# Display the standalone Plotly plot
plotly_plot

Existing ggplot2 plots can be converted with `ggplotly()`, adding interactivity to static plots effortlessly.

#| echo: true
#| output: false
# Load required packages
library(plotly)
library(ggplot2)
# Load the mtcars dataset (built-in R dataset)
data(mtcars)
# --- Example 2: ggplot2 with ggplotly ---
# Create a ggplot2 scatter plot
ggplot_plot <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl), size = hp)) +
  geom_point() +
  labs(
    title = "ggplot2: Weight vs MPG",
    x = "Weight (1000 lbs)",
    y = "Miles per Gallon",
    color = "Cylinders",
    size = "Horsepower"
  ) +
  theme_minimal()
# Convert the ggplot2 plot to an interactive Plotly plot
ggplotly_plot <- ggplotly(ggplot_plot)
# Display the ggplotly plot
ggplotly_plot
# Optional: Save both plots as HTML files for sharing
#htmlwidgets::saveWidget(plotly_plot, "plotly_standalone.html")
#htmlwidgets::saveWidget(ggplotly_plot, "ggplotly_converted.html")

- The `plot_ly()` function initializes a plot with data and type specifications.
- Traces are added with `add_trace()` or `plot_ly()`.
- The `layout()` function is used for fine-tuning.
- Attributes such as `mode` (e.g., “lines”, “markers”) and `marker` control visual styling within traces.

- `plot_ly()` starts a plot, combining data, type (e.g., “scatter”), and mappings in one call.
- `add_trace()` layers additional data series, such as lines or points, onto an existing plot.
- `layout()` adjusts non-data elements, like titles or axis ranges, for presentation polish.
- `ggplotly()` converts ggplot2 objects into interactive Plotly plots with minimal effort.
- `config()` tweaks interactivity options, like hiding the toolbar or enabling downloads.

- … `plot()`, enabling live data exploration.
- The `ui` object defines the app’s layout and input/output elements.
- The `server` function handles logic, linking inputs to reactive outputs dynamically.
- Inputs such as `sliderInput()` or `selectInput()` capture user interactions for real-time updates.
- Outputs such as `plotOutput()` or `tableOutput()` display results driven by server calculations.

- `shinyApp()` launches an app by combining the `ui` and `server` components into one call.
- `renderPlot()` generates dynamic plots in the server, responding to user inputs.
- `reactive()` creates reactive expressions, caching computations for efficient updates.
- `observe()` triggers side effects, like printing messages, based on input changes.
- `runApp()` starts a Shiny app from a directory or file, streamlining development and testing.

#| echo: true
#| output: false
# app.R
library(shiny)
library(threejs)
library(igraph)
library(htmlwidgets)

# Load the Zachary Karate Club network
zachary <- make_graph("Zachary")
# Name each vertex by its original ID so that IDs survive subsetting
V(zachary)$name <- as.character(seq_len(vcount(zachary)))
# Node table used to populate the checkbox choices
nodes <- data.frame(
  id = seq_len(vcount(zachary)),
  label = paste("Node", seq_len(vcount(zachary)))
)

# Define the Shiny UI
ui <- fluidPage(
  titlePanel("Interactive 3D Zachary Network"),
  sidebarLayout(
    sidebarPanel(
      h4("Select Nodes"),
      checkboxGroupInput(
        inputId = "selected_nodes",
        label = "Choose nodes to display:",
        choices = nodes$id,
        selected = nodes$id  # All nodes selected by default
      )
    ),
    mainPanel(
      htmlOutput("network", width = "100%", height = "600px")
    )
  )
)

# Define the Shiny server
server <- function(input, output, session) {
  # Reactive graph based on selected nodes
  filtered_graph <- reactive({
    # Get selected nodes
    selected <- as.numeric(input$selected_nodes)
    # If no nodes are selected, return an empty graph
    if (length(selected) == 0) {
      return(make_empty_graph(n = 0, directed = FALSE))
    }
    # Keep only the selected vertices and the edges between them;
    # selected nodes without remaining edges stay as isolated vertices
    induced_subgraph(zachary, vids = selected)
  })

  # Render the 3D network graph
  output$network <- renderUI({
    g <- filtered_graph()
    # If the graph is empty, return a short HTML message
    if (vcount(g) == 0) {
      return(HTML("<p>No nodes selected.</p>"))
    }
    # Generate a 3D layout and pass it to graphjs explicitly
    coords <- layout_with_fr(g, dim = 3)
    # Create and return the graphjs widget
    graphjs(
      g,
      layout = coords,
      vertex.size = 0.5,
      vertex.label = paste("Node", V(g)$name),
      edge.width = 1,
      edge.color = "gray",
      vertex.color = "blue"
    )
  })
}

# Run the Shiny app
shinyApp(ui, server)

#| echo: true
#| output: false
# app.R
library(shiny)
library(visNetwork)
library(igraph)

# Load the Zachary Karate Club network
zachary <- make_graph("Zachary")
# Name each vertex by its original ID so that IDs survive subsetting
V(zachary)$name <- as.character(seq_len(vcount(zachary)))
# Node table used to populate the checkbox choices
nodes <- data.frame(
  id = seq_len(vcount(zachary)),
  label = paste("Node", seq_len(vcount(zachary)))
)

# Define UI
ui <- fluidPage(
  titlePanel("Interactive Zachary Network"),
  sidebarLayout(
    sidebarPanel(
      h4("Select Nodes"),
      checkboxGroupInput(
        inputId = "selected_nodes",
        label = "Choose nodes to display:",
        choices = nodes$id,
        selected = nodes$id
      )
    ),
    mainPanel(visNetworkOutput("network"))
  )
)

# Define Server
server <- function(input, output, session) {
  filtered_graph <- reactive({
    selected <- as.numeric(input$selected_nodes)
    # Return empty graph early if no nodes selected
    if (length(selected) == 0) {
      return(make_empty_graph(n = 0, directed = FALSE))
    }
    # Keep only the selected vertices and the edges between them;
    # selected nodes without remaining edges stay as isolated vertices
    induced_subgraph(zachary, vids = selected)
  })

  output$network <- renderVisNetwork({
    g <- filtered_graph()
    # Handle empty graph case
    if (vcount(g) == 0) {
      return(visNetwork(
        nodes = data.frame(id = integer(0), label = character(0)),
        edges = data.frame(from = integer(0), to = integer(0))
      ))
    }
    # Create node data frame from the original vertex IDs
    node_data <- data.frame(
      id = as.numeric(V(g)$name),
      label = paste("Node", V(g)$name)
    )
    # Create edge data frame, handling the case with no edges
    edge_data <- if (ecount(g) > 0) {
      edge_matrix <- ends(g, E(g))
      data.frame(
        from = as.numeric(edge_matrix[, 1]),
        to = as.numeric(edge_matrix[, 2])
      )
    } else {
      data.frame(from = integer(0), to = integer(0))
    }
    visNetwork(nodes = node_data, edges = edge_data) %>%
      visEdges(arrows = "to") %>%
      visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)
  })
}

# Run the app
shinyApp(ui, server)

Computer lab 4
References