Components for basic diffusion modeling with socmod

This vignette introduces agent-based modeling in socmod starting with basic functional and object-oriented programming in R. It therefore builds on an assumed knowledge of R basics which can be gained by studying, for example, Hands-On Programming with R or other basic R tutorial introductions.

To model social behavior it helps to have intuitive software tools grounded in an empirically-motivated, self-consistent scientific theory. socmod provides those tools, but to put it to best use it’s helpful to know how it works, especially for people used to using R in a data science context. Many of the same tools are transferable, especially declarative/functional programming, as is used and encouraged by the tidy approach in R for Data Science.

To create simulations that generate social behavior we have to go beyond tabular representations of the world to more complex representations of people and their interactions that can be measured. These measurements are recorded in a standard tabular format, e.g., CSV, which can then be analyzed using tidy/R for Data Science strategies.

In agent-based modeling, we create software representations of simulated people, i.e. agents, or other interacting entities, including the agents’ environment. Object-oriented programming is the natural choice for software design in this case because it provides a structure for defining custom objects like socially interacting agents. An object is one bit of data that could be a number or character type, but could also be something more complicated. In object-oriented programming we define custom classes that specify various data fields and function methods for maintaining and modifying the state of objects. The fields and methods of a class therefore are sets of related data and functions to represent things in the world.

To summarize, objects are a way to track and modify the state of different software entities. We can create simulations of real-world systems by defining custom object types called classes whose state and behaviors are modeled on relevant real-world features and behaviors.

In socmod, we define custom R objects for agent-based models of social behavior using the R6Class method that creates a new object type, i.e., a new class. We use tidy-style functional programming where helpful to represent model features and dynamics.

In the remainder of this vignette, I will first review variable assignment and data structures in R including vectors, lists, data.frames, and tibbles. We then review relevant topics in functional programming, then demonstrate how to create custom classes in R with the library R6. This tutorial then closes with a demonstration of a simple four-agent agent-based model of social behavior, kept simple to highlight the functional and object-oriented design patterns in socmod, representing Agent, an AgentBasedModel, and Trialclasses defined using the R6Class function.

Data collection types in R

In R, there is a hierarchy of data collection types that are necessary to know about for functional programming, which often involves applying a function across every element of a collection of data. Data collections include vectors, lists, data.frames, and tibbles, which we’ll cover here.

Collections of basic data types

In computer science, a collection is an abstract data type that organizes and stores instances of other data types. In R, the most basic data types are numbers (e.g., double or integer) and text, whose data type is character in R whether it’s a single character or several:

print(typeof('c'))

[1] "character"

print(typeof("social science is kewl"))

[1] "character"

Vectors

R vectors are defined using the c() function, e.g., vec <- c(0, 0, 1). The data type of all vector elements must be the same. This is enforced by R coercing data to different types, for example:

print(c(0, 0, 1))

[1] 0 0 1

print(c(0, 0, "yo"))

[1] "0"  "0"  "yo"

We index vectors using single square brackets, with R indexing starting from 1:

vec <- c(1, 2, 3, 4, 5) # equiv to vec <- 1:5 or seq(1, 5)
print(vec[1])

[1] 1

print(vec[length(vec)])

[1] 5

Lists

If one wants to keep elements of different types in a single collection, use the R list:

l <- list(0, 542, "yo")
print(l)

[[1]]
[1] 0

[[2]]
[1] 542

[[3]]
[1] "yo"

Note the visual representation has changed on printout. We now need to use double-square brackets to index list elements themselves:

print(l[[3]])

[1] "yo"

If we leave off one of the square brackets we effectively get a sub-list with just one element:

print(l[3])

[[1]]
[1] "yo"

This is useful if you want to create a new sub-list with more than one element:

print(l[c(2, 3)])

[[1]]
[1] 542

[[2]]
[1] "yo"

Named lists

Named lists are the primary key-value store in R, just like dict in Python. It allows us to label entries of the list and access them using double square brackets with the character name or using the $ access operator:

named_l <- list(a = c(0, 5, 6), b = c(7, 8, 9))
print(named_l$a == named_l[["a"]]) # compares element-by-element

[1] TRUE TRUE TRUE

print(all(named_l$b == named_l[["b"]]))

[1] TRUE

`tibble` (and `data.frame`)

Both the tibble and data.frame classes represent tabular data, meaning data that can be represented in table format, e.g., in comma- or tab-separated value format. There are some subtle differences listed below, but we’ll use tibble for representing our tables.

Tibbles are like fancy lists that have special properties that make data manipulation more efficient. The details of how this works aren’t super important at this point. The important thing to know is how one can initialize tibbles and access tibble columns similarly to lists, as shown in the following example:

tbl <- tibble::tibble(a = c(0, 5, 6), b = c(7, 8, 9))
print(tbl$a)

[1] 0 5 6

print(tbl[["b"]])

[1] 7 8 9

Using tibbles ensures that data manipulation and analysis using the tidyverse will work as expected. For example, the tidyverse provides functions for analyzing different groups of data within a larger dataset. This is a common data analysis pattern called split-apply-combine, which in the tidyverse translates to group-by and summarise when using the dplyr library. For example, we can calculate the mean of measurements in “experimental” and “control” conditions in some fake data:

observations <- tibble::tibble(
  condition = c("experimental", "experimental", "control", "control"),
  measurement = c(13.5, 14.6, 3.4, 5.4)
)

mean_measurement_tbl <- 
  observations %>% 
  dplyr::group_by(condition) %>% 
  dplyr::summarise(mean_measurement = mean(measurement))

print(mean_measurement_tbl)

# A tibble: 2 × 2
  condition    mean_measurement
  <chr>                   <dbl>
1 control                   4.4
2 experimental             14.0

Difference between `data.frame` and `tibble`

The tibble library provides a table data representation (also called tibble) that is a bit more flexible and intuitive than the R built-in data.frame. One reason I prefer tibbles is because traditional data.frames automatically convert strings to factors unless you tell them not to. Tibbles don’t, so you’re never surprised.

Column naming is also more flexible with tibbles. A data.frame requires syntactically valid R names, while tibbles can handle column names that include spaces or even non-standard characters. I often use this feature to use column names like Mean adaptation success that print nicely when used as labels in ggplot.

Finally, tibble operations always return a tibble. For example, if df were a data.frame, the operation df[, 1] would return a vector by default. If it were a tibble it would return another tibble, making behavior more predictable in data analysis pipelines.

Functional programming

Functional programming is especially useful for writing code that applies the same function to several inputs, but wants to leave it up to the user to specify exactly which function should be applied. In socmod we pass functions as arguments to other functions to specify how agents pick their interaction partners (i.e., teachers in a learning context) and how social learning works. Here are some simple examples demonstrating the key concepts and techniques of functional programming that could help with socmod programming.

Higher-order functions: functions with function arguments

In R, and other programming languages supporting the functional style, one can treat functions like any other data and pass it as an argument to other functions. A function that accepts a function as an argument is called a higher-order function. Here is a simple example of a higher order function that takes some data as a first argument and applies the function f to that data twice, putting the result of each calculation in a two-element vector.

repeat_2_higher_order_func <- function(data, f) {
  return (c(f(data), f(data)))
}

Below we call this higher order function by providing data=2 as the first argument and an anonymous function as the second argument, which can be written using \(arg) { ...function body... } syntax:

# \(x) {...} is equivalent to function(x) { ... }; these are 
# anonymous functions.
# Expecting to return c(4, 4)
repeat_2_higher_order_func(data = 2, f = \(x) { return (x * 2) })

[1] 4 4

The above is the representation of a vector printed to screen, so we see that our expectations were matched.

c(4, 4)

[1] 4 4

`map`: a common, useful higher-order function used often in `socmod`

One of the most useful and common higher-order functions is the map function. This function maps a function, denoted .f below, onto every element in a collection, denoted .x. This notation is from the map family of functions in the purrr library for tidy functional programming.

library(purrr)

Here is an example of mapping an anonymous function that multiplies its input by 3 onto a vector with entries 2 and 8:

# Now get we use map_vec that applies .f to every element of .x,
# expecting the following to return a vector with elements 3*2 
# and 3*8, i.e., c(6, 24).
purrr::map_vec(.x = c(2, 8), .f = \(x) { return(3 * x)})

[1]  6 24

If we want to apply a function that takes two variables, the value and index of an element in a collection, we can use the imap family of functions as follows to return a vector that contains the original element multiplied by its place in the input vector:

input_vec <- c(2, 5050, 6)
purrr::imap_vec(input_vec, \(el, idx) { return (idx * el) })

[1]     2 10100    18

Custom objects: R6 classes in socmod

Classes are ways to encapsulate diverse distinct, but related, processes, behaviors, data, attributes, and other types of information in a single object, i.e., a software representation of an instance of that entity. R6 is a library for creating our own custom classes that serve as an abstract template that specifies what distinguishes different types of objects/entities. Below we show first how to create a new agent, i.e., a new instance of the Agent class that is provided by socmod. After that is a fun example of how we can design a different social behavioral model, one of football/soccer matches. We write classes for players and teams and develop a play_match(team1, team2) function that pits two teams against each other.

The `Agent` class in `socmod`

Below we create a new instance of the Agent class using the class constructor, the function written socmod::Agent$new() below that creates a new instance of the class.

a1 <- socmod::Agent$new(1, name = "Matt", 
                        behavior = "Adaptive", 
                        fitness=1e6)

We can use the access operator, $ in R, to access the fields (i.e., attributes) of agent a1 like so:

print(a1$get_fitness())

[1] 1e+06

print(a1$get_name())

[1] "Matt"

print(a1$get_id())

[1] 1

print(a1$get_behavior())

[1] "Adaptive"

We can also use purrr::map functions over neighbors like so:

# Assign neighbors to a1.

a1$set_neighbors(c(
  socmod::Agent$new(id = 2, name = "n2"),
  socmod::Agent$new(id = 3, name = "n3")
))

# Get the list of neighbors back to check it worked.
neighbors <- a1$get_neighbors()
print(class(neighbors))  # should be [1] "Neighbors" "R6"

[1] "Neighbors" "R6"

# Neighbors$map() returns a list...
neighbor_names <- neighbors$map(\(n) n$get_name())
print(neighbor_names)

[[1]]
[1] "n2"

[[2]]
[1] "n3"

# ...use `unlist` to convert it to a vector:
print(unlist(neighbor_names))

[1] "n2" "n3"

Exercise: design and define classes from scratch

In class we started creating our model of a soccer player agent called Footballer, defined below. We ran out of time at the end of class to write methods for Footballer, i.e., ways that a football player could interact with the world, or that the world could act upon a soccer player. However we only had time to create a stub for two methods. A stub is a minimal chunk of code that does very little to nothing, but doesn’t get in the way by causing errors or anything like that. It enables us to document our plans for future development in the exact place where it would happen in the code.

Below we have stubs for score_goal and get_penalty methods for in-game behaviors. Other possibilities could include get_traded that would change its team and perhaps get_signed for cases where a player is a free agent.

library(R6)

Footballer <- R6Class("Footballer",
                        
  public = list(
    # Listing attributes as fields and 
    # setting to zero for their definition.
    speed = 0.0,  # units of max km/h
    accuracy = 0.0, # probability of scoring on a shot
    market_value = 0.0, # 
    aggressiveness = 0.0, # units of penalties per match
    team = "",
    
    initialize = function(speed = 15, 
                          accuracy = 0.2, market_value = 1e6, 
                          aggressiveness = 0.5, 
                          team = "Free agent") {
      self$speed = speed
      self$accuracy = accuracy
      self$market_value = market_value
      self$aggressiveness = aggressiveness
      self$team = team
    },
    
    # Stub two SoccerPlayer class methods...
    # ...one for scoring a goal in a game...
    scored_goal = function() {
      return (ifelse(runif(1) < self$accuracy, 1, 0))
    },
    # ...and one for getting a penalty in a game.
    get_penalty_on_play = function() {
      return (runif(1) < self$aggressiveness)
    }
  )
)

Team <- R6Class("Team",
                
  public = list(
    name = "",
    players = list(),
    wins = 0,
    ties = 0,
    payroll = 0,
    games_played = 0,
    
    # Team name is required, with players optionally specified.
    initialize = function(name, players = list()) {

      self$name = name

      # Initialize players and payroll.
      for (player in players) {
        self$payroll <- self$payroll + player$market_value
        player$team <- name
      }

      self$players <- players
    },
    
    # Add a player to the roster.
    sign_player = function(player) {
      # Add the player to the team.
      self$players <- c(players, player)
      # Update payroll.
      self$payroll <- self$payroll + player$market_value
      # Update the player's team to be this team's name.
      player$team <- self$name
    }
  )
)

footballer <- Footballer$new(accuracy = 0.5)
footballer$scored_goal()

[1] 0

# sum(purrr::map_vec(1:10, \(.) p$scored_goal()))

We can define a function that simulates playing a football match where the team that scores more goals wins, or a tie if teams have the same score. We track penalties received but assume they don’t have an effect on the match outcome. We assume each player gets ten chances to score a goal. We leave it to the reader as an exercise to make use of the Footballer$get_penalty_on_play() method within the game to add penalties and consequences in the game (e.g., penalty kicks or something), and to use the Team$sign_player() add more players to each roster, which should increase the mean total scores in each game based on the simple way we’ve modeled gameplay in play_match.

# Define a function that models a football match.
play_match <- function(team1, team2) {
  
  # Walk over each player to see how many scores they get, 
  # summing to get the total team score. 
  team1_score <- sum(
    purrr::map_vec(
      team1$players,
      \(player) {
      # Calculate the total goals scored by the current player.
        sum(purrr::map_vec(1:10, \(.) player$scored_goal()))
      }
    ) 
  )
  
  team2_score <- sum(
    purrr::map_vec(
      team2$players,
      \(player) {
      # Calculate the total goals scored by the current player.
        sum(purrr::map_vec(1:10, \(.) player$scored_goal()))
      }
    ) 
  )
  
  if (team1_score == team2_score) {
    team1$ties <- team1$ties + 1
    team2$ties <- team2$ties + 1
    cat("Tie game!\n")
  } else if (team1_score > team2_score) {
    cat(team1$name, "wins!!!\n")
    team1$wins <- team1$wins + 1
  } else {
    cat(team2$name, "wins!!!\n")
    team2$wins <- team2$wins + 1
  }
  
  cat(team1$name, ": ", team1_score, "   ",  
      team2$name, ": ", team2_score, "\n")
  
  team1$games_played <- team1$games_played + 1
  team2$games_played <- team2$games_played + 1
}

First we need teams, which we create and to which we assign players using the constructor, Team$new() in the code below.

# Initialize two teams, Whales and Squirrels, each with two
# players. All players have identical default attributes.
whales <- Team$new(
  name = "Whales", 
  players = c(Footballer$new(), Footballer$new())
)
squirrels <- Team$new(
  name = "Squirrels", 
  players = c(Footballer$new(), Footballer$new())
)

# Play three matches.
play_match(whales, squirrels)

Squirrels wins!!!
Whales :  5     Squirrels :  10

play_match(squirrels, whales)

Whales wins!!!
Squirrels :  2     Whales :  4

play_match(squirrels, whales)

Squirrels wins!!!
Squirrels :  7     Whales :  2

# Print how many games each time won.
cat("\nAfter three games...", "\nThe Whales have won", 
    whales$wins, "games and the Squirrels have won", squirrels$wins)


After three games...
The Whales have won 1 games and the Squirrels have won 2

Here we develop our first agent-based models of social behavior. We’ll start small, with the following four-household network of solar panel electrification using only success-biased learning (Figure 1). Then we’ll do a detailed comparison of different social learning strategies with different parameterizations.

Figure 1: Example social network and sustainable adaptation diffusion problem.

In the example, agent 2 has adopted rooftop solar. It has some adaptive fitness, which can be thought of as the financial gain from adopting a sustainable, adaptive behavior. We can directly create the social network using igraph::make_graph() then set it as the model’s social network, along with setting the learning strategy to be contagion learning. The adoption_rate parameter specifies how likely another agent is to adopt the adaptive behavior when exposed to it. The drop_rate is how likely an agent doing the adaptive behavior is to stop doing it and revert to the non-adaptive behavior.

Example 1: Time series of adaptation diffusion

# Run model and save figure
g <- igraph::make_graph(~ 1-2, 1-3, 1-4, 3-2)
mps <- make_model_parameters(
  contagion_learning_strategy, graph = g, adoption_rate = 0.8, drop_rate = 0.1
)
cabm <- make_abm(mps)
cabm$get_agent(2)$set_behavior("Adaptive")
trial <- run_trial(cabm, stop = fixated)

p <- plot_prevalence(trial)

# Same, but now call the adaptive behavior "Baglowie" instead of "Adaptive".
g <- igraph::make_graph(~ 1-2, 1-3, 1-4, 3-2)
mps <- make_model_parameters(
  contagion_learning_strategy, graph = g, adoption_rate = 0.8, drop_rate = 0.1
)

cabm <- make_abm(mps)
cabm$get_agent(2)$set_behavior("Baglowie");

trial <- run_trial(cabm, stop = fixated, adaptive_behavior = "Baglowie")

p <- plot_prevalence(trial, behavior_order = c("Legacy", "Baglowie"))

Example: comparison of learning strategies and parameters

Agent-based models in socmod are composed of sub-models of cognition and social behavior and social networks. One component of agent cognition is its social learning strategy. Three social learning strategies are included in socmod: success-biased, frequency-biased, and contagion learning strategies see ABM course notes for more details.

Here we show how to compare the three learning strategies across different learning parameters. One parameter we systematically vary affects only success-biased learning, and the other we systematically vary affects only contagion learning. We will design computational experiments to understand their effects, and overall differences between the three learning strategies.

# Load ggplot2 to avoid writing it during plotting below.
library(ggplot2)
# Model generation function used in both computational experiments below.
gen <- function(model_parameter_row) {
  
  # Extract adaptive_fitness to create agents.
  adaptive_fitness <- model_parameter_row$adaptive_fitness
  
  agent_1 <- socmod::Agent$new(1, behavior = "Legacy", 
                               fitness = 1.0, name = "a1")
  agent_2 <- socmod::Agent$new(2, behavior = "Adaptive", 
                               fitness = adaptive_fitness, name = "a2")
  agent_3 <- socmod::Agent$new(3, behavior = "Legacy", 
                               fitness = 1.0, name = "a3")
  agent_4 <- socmod::Agent$new(4, behavior = "Legacy", 
                               fitness = 1.0, name = "a4")
  
  agents <- list(agent_1, agent_2, agent_3, agent_4)
  graph <- igraph::make_graph(~ 1-2, 1-3, 1-4, 3-2)
  
  # Extract other necessary model parameters.
  learning_strategy <- model_parameter_row$learning_strategy
  drop_rate <- model_parameter_row$drop_rate
  adoption_rate <- model_parameter_row$adoption_rate
  
  # Make ModelParameters to encapsulate this model's parameters.
  model_parameters <- socmod::make_model_parameters(
    learning_strategy, graph, adaptive_fitness = adaptive_fitness, 
    adoption_rate = adoption_rate, drop_rate = drop_rate
  )
  
  return (
    socmod::make_abm(
      model_parameters,
      agents = agents
    )
  )
}

Experiment over adaptive fitness values

# Run the adaptive fitness trials if there is not already a variable
# name for it in the current environment. Change n_trials_per_param to
# a small number like 2-20 for development purposes.
if (!("trials_adaptive_fitness" %in% ls(all.names = TRUE))) {
  trials_adaptive_fitness <- socmod::run_trials(
    gen, 
    n_trials_per_param = 100,  # change to 2-20 for shorter runs
    stop = socmod::fixated, 
    syncfile = "trials-adaptive_fitness.RData",
    # overwrite = TRUE,
    learning_strategy = c(socmod::success_bias_learning_strategy,
                          socmod::frequency_bias_learning_strategy,
                          socmod::contagion_learning_strategy),
    adaptive_fitness = seq(0.8, 2.4, 0.2),
    adoption_rate = 0.6,
    drop_rate = 0.2
  )
}

trials_summary <- socmod::summarise_by_parameters(
  trials_adaptive_fitness, c("learning_strategy", "adaptive_fitness")
)

trials_success_rate <- dplyr::filter(trials_summary, Measure == "success_rate")

Now plot…

p <- ggplot(trials_success_rate, 
              aes(x=adaptive_fitness, y=Value, 
                  color=learning_strategy)
) +
  geom_line(linewidth=1.0) + geom_point(size=2.15) +
  ggsci::scale_color_aaas() + ggsci::scale_fill_aaas() +
  xlab("Adaptive fitness") + ylab("Success rate") +
  scale_x_continuous(breaks = sort(
    unique(trials_summary$adaptive_fitness)
  )) +
  theme_classic(base_size=14) +
  ylim(0, 1.0) +
  guides(color = guide_legend(title = "Learning strategy")) +
  ggtitle("Adoption rate = 0.6, drop rate = 0.2")

ggsave("vignettes/resources/adaptive_fitness_experiment.png", p, 
       width = 5, height = 3)

Example computational experiment testing effect of adaptive fitness on success rate across the three learning strategies.

Experiment over adoption rate values

if (!("trials_adoption" %in% ls(all.names = TRUE))) {
  trials_adoption <- socmod::run_trials(
    gen,
    n_trials_per_param = 100,
    stop = socmod::fixated, 
    syncfile = "trials-adoption-rate.RData",
    overwrite = TRUE,
    learning_strategy = c(socmod::success_bias_learning_strategy,
                          socmod::frequency_bias_learning_strategy,
                          socmod::contagion_learning_strategy),
    adaptive_fitness = 1.4,
    adoption_rate = c(0.05, 0.2, 0.4, 0.6, 0.8, 1.0),
    drop_rate = 0.2
  )
}
trials_summary <- socmod::summarise_by_parameters(
  trials_adoption, c("learning_strategy", "adoption_rate")
)

trials_success_rate <- dplyr::filter(
  trials_summary, Measure == "success_rate"
)

p <- ggplot(trials_success_rate, 
            aes(x=adoption_rate, y=Value, color=learning_strategy)) +
  geom_line(linewidth=1.0) + geom_point(size=2.15) +
  ggsci::scale_color_aaas() + ggsci::scale_fill_aaas() +
  xlab("Adoption rate") + ylab("Success rate") +
  scale_x_continuous(breaks = sort(unique(trials_summary$adoption_rate))) +
  theme_classic(base_size=14) +
  ylim(0, 1.0) +
  guides(color = guide_legend(title = "Learning strategy")) +
  ggtitle("Adaptive fitness = 1.4, drop rate = 0.2")

ggsave("vignettes/resources/adoption_rate_experiment.png", 
       p, width=5, height=3)

Example computational experiment testing effect of adoption rate on success rate across the three learning strategies.