Risk Analysis, Vol. 35, No. 9, 2015

DOI: 10.1111/risa.12325

Efficient Allocation of Resources for Defense of Spatially Distributed Networks Using Agent-Based Simulation

William M. Kroshl,1,∗ Shahram Sarkani,2 and Thomas A. Mazzuchi3

This article presents ongoing research on the efficient allocation of defense resources to minimize the damage an active adversary can inflict on a spatially distributed physical network such as a pipeline, water system, or power distribution system. It recognizes the fundamental difference between preparing for natural disasters, such as hurricanes, earthquakes, or accidental system failures, and allocating resources to defend against an opponent who is aware of, and anticipating, the defender's efforts to mitigate the threat. Our approach combines integer programming and agent-based modeling to allocate the defensive resources. We conceptualize the problem as a Stackelberg "leader follower" game in which the defender first places his assets to defend key areas of the network, and the attacker then seeks to inflict the maximum damage possible within the constraints of resources and network structure. The criticality of arcs in the network is estimated by a deterministic network interdiction formulation, which then informs an evolutionary agent-based simulation. The evolutionary agent-based simulation is used to determine allocations of attacker and defender resources that constitute evolutionarily stable strategies, in which neither side can increase its share of victories through its own actions alone. We demonstrate these techniques on an example network, comparing the evolutionary agent-based results to a more traditional, probabilistic risk analysis (PRA) approach. Our results show that the agent-based approach yields a greater percentage of defender victories than does the PRA-based approach.

KEY WORDS: Agent-based simulation; game theory; resource allocation; terrorism

1. INTRODUCTION

Those individuals responsible for the security of networks and systems are faced with the problem of protecting those systems from varying threats, both those resulting from natural events and from human design. One of the most pivotal decisions that must be made is the allocation of protective resources to various portions of the network. In practice, resources are never "unlimited," and decisions must be made regarding how a limited quantity of resources should be allocated. In the context of this work, we consider that "resources" represent a finite, fungible, and unitary measure of either protective or offensive capability. Resources could be effort applied to gathering intelligence about a particular part of the network, surveillance to provide information about those attacking a particular part of the network, or guards/terrorist units assigned to defend or target a

1 Engineering Management and Systems Engineering, EMSE Off Campus Programs, George Washington University, 1 Old Oyster Point Rd., Newport News, VA, USA.
2 Engineering Management and Systems Engineering, George Washington University, 1776 G St. NW, Washington, DC, USA.
3 George Washington University, EMSE, 1776 G St. NW, Suite 110, Washington, DC, USA.
∗ Address correspondence to William M. Kroshl, George Washington University, Engineering Management and Systems Engineering, EMSE Off Campus Programs, 1 Old Oyster Point Rd., Newport News, VA, USA; tel: 1-888-694-9627; fax: 1-888-969-4851; [email protected].


© 2015 Society for Risk Analysis

particular node. They could just as easily be the physical hardening of a key node in a spatially distributed physical network, such as a shipping terminal in a pipeline or a transformer station on a power distribution line. In this work, we discuss several different methods that can be used for the resource allocation decision, focusing on the situation where one has an active and malevolent "opponent" who is attempting to cause maximum damage to the network. Without loss of generality, we shall refer to this opponent and the opposition forces controlled by this individual as "terrorists." As reported by Cox,(1) the Department of Homeland Security (DHS) standard for risk analysis in the chemical industry beginning in 2007 is based upon the classic method of defining risk as the product of threat, vulnerability, and consequence. He goes on to discuss how this model, while very useful, is limited when confronting an active opponent. The problem is that the uncertainty represented in this model is fundamentally different depending on whether it results from acts of nature or acts of man. Golany and Kaplan(2) discussed this at some length in 2009, characterizing uncertainty as either caused by random acts of nature or insufficient knowledge (probabilistic uncertainty) or caused by a lack of knowledge of the plans of an active adversary (strategic uncertainty). The problem of mitigation for strategic uncertainty has spurred significant discussion regarding the modeling, planning, and mitigation of risk when one has an active adversary. Insua and Rios(3) focused on the key difference between probabilistic risk analysis (PRA) and adversarial risk analysis (ARA). In PRA, there is a single decisionmaker, but in ARA there are two decisionmakers with mutually opposing interests.
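The classic threat-vulnerability-consequence scoring can be made concrete with a minimal sketch; the asset names and all numeric values here are hypothetical, chosen only to illustrate the scoring:

```python
# Classic risk scoring: risk = threat x vulnerability x consequence.
# All asset names and values are illustrative, not drawn from this article.
def tvc_risk(threat: float, vulnerability: float, consequence: float) -> float:
    """Return the classic TVC risk score for one asset."""
    return threat * vulnerability * consequence

assets = {
    "pump_station": tvc_risk(0.3, 0.5, 100.0),
    "transformer":  tvc_risk(0.1, 0.9, 400.0),
}
highest = max(assets, key=assets.get)
```

The limitation discussed above is that such a static ranking does not change when an adaptive adversary observes where the resulting defenses are placed.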
Our approach to the consideration of two decisionmakers in ARA is to treat the situation as a two-person game of the type generally known as a "leader follower" or Stackelberg game. In this formulation, one player (the leader) places his resources without knowledge of the opponent's actions. The second player (the follower) then allocates resources with full knowledge of how the leader has allocated his resources. We have developed an approach to allocating resources under strategic uncertainty using an agent-based model that finds an evolutionarily stable solution (ESS) to the resource allocation problem.
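For intuition, a tiny leader follower game can be solved by enumerating the leader's options and the follower's best responses; the 2×2 payoff matrices below are hypothetical and are not the model developed in this article:

```python
# Brute-force solution of a tiny Stackelberg (leader-follower) game.
# leader_payoff[i][j] and follower_payoff[i][j] are the payoffs when the
# leader plays strategy i and the follower plays strategy j (toy values).
def solve_stackelberg(leader_payoff, follower_payoff):
    best = None
    for i, row in enumerate(follower_payoff):
        # The follower observes i and best-responds to it.
        j = max(range(len(row)), key=lambda jj: row[jj])
        value = leader_payoff[i][j]
        if best is None or value > best[0]:
            best = (value, i, j)
    return best  # (leader value, leader strategy, follower response)

# Defender (leader) guards site A or B; attacker (follower) hits A or B.
leader = [[ 4, -2],   # guard A: attack on A repelled, attack on B succeeds
          [-3,  5]]   # guard B: attack on A succeeds, attack on B repelled
follower = [[-4, 2],
            [ 3, -5]]
value, i, j = solve_stackelberg(leader, follower)
```

In this toy instance the defender guards site A even though the attacker will then strike B, because that outcome is the best the defender can secure against a follower who moves with full knowledge of the defense.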

2. BACKGROUND

2.1. Brief Review of Selected Literature

Brown and Cox,(4) in a 2011 article, highlight the problems of applying a typical systems engineering approach with PRA-based tactics against a thinking adversary. They specifically criticize the assumption that the "same type of conditional probability assessment applies as well to terrorism risk analysis as to PRA of natural hazards and engineered systems . . . ." They further claim that applying these PRA techniques may increase the risk of terrorist attack when applying the traditional threat vulnerability consequence (TVC) concept previously discussed. They give several examples of how PRA estimates may be self-defeating (attackers deliberately not attacking the targets that PRA predicts they will attack, because they know the defender will be defending those targets). Recognizing the need to consider the behavior of the adversary in developing defensive strategies is a central component of Bier and Nagaraj's(5) 2005 article, in which they state that "good defensive strategies must consider the adversaries' behavior." They maintain that the goals and motivations of the attackers must be considered to determine whether the attackers are opportunistic or determined. To protect a specific target against opportunistic attackers (such as vandals), it is only necessary to provide a defense that is relatively better than that of other targets. For a determined attacker, however, this technique of resource allocation will not help appreciably to defend more important sites. In their 2009 article on ARA, Insua and Rios(3) provide several different possible methods for properly modeling two decisionmakers with mutually opposed interests, including influence diagrams and the solution of game-theoretic (GT) approaches using linear programming techniques.
They begin by discussing solutions in the context of a simultaneous game (where both players move at the same time) and then examine sequential games (such as the leader follower game, where moves are made in sequence rather than simultaneously). Many of the solutions to these game-theory problems are characterized as finding the Nash equilibrium of the game. The Nash equilibrium is a combination of strategies in which both players are at their optimal payoffs, and neither player can increase his payoff by his

own actions alone. It should be noted that a game may have multiple Nash equilibria, and that these equilibria can be computationally challenging to find. One key point that Insua and Rios(3) bring out is that the solution for the sequential game need not correspond to the Nash equilibrium, owing to the lack of common knowledge between the players. The issue of incomplete information is central to a recent article by Rothschild, McLay, and Guikema.(6) They extended an analysis in which players had perfect information to one in which they have imperfect information through the use of level-k game theory, which uses a recursive algorithm to calculate the prior probabilities used to solve the game. The role of information (or the lack thereof) and how it affects players was central to their analysis. Banks and Petralia(7) took another approach to ARA in their 2011 article, using a Bayesian model to develop subjective probabilities of an opponent's actions in a simplified betting game. More recently, Rios and Insua(8) again approached the ARA problem through various models, such as simultaneous defend-attack models and sequential defend-attack models, and examined the role that private information plays in these analyses. In all these examples, they develop a predictive probability model to assess the actions of the adversary. Ezell and Bennett(9) examine many different techniques in their 2010 article, including logic and fault trees, attack and success trees, influence diagrams, causal loop diagrams for system dynamics approaches, Bayesian network analysis, and GT analysis. In addition to the role of information discussed above, they found two main problems with many GT approaches: determining the proper Nash equilibria to study out of the many that might exist, and the fact that the analysis, while it might give the best solution, may not adequately describe the terrorist's actual actions.
Overgaard(10) focuses on finding Nash equilibria under conditions of imperfect information between terrorists and the government. He casts this as a "signaling game," as was also considered by Sandler and Lapan.(11) Machado and Tekinay(12) examine the problems of power management, reliability, and the preservation of wireless sensor networks, using tree diagrams to examine the relationships. Of particular note for our work, they make a distinction between classic game theory, where the players have decision-making abilities for obtaining payoff-maximizing strategies, and evolving game theory, where the most successful strategies are used with greater frequency by various agents.

The solution of various game-theory approaches to the ARA problem quite often involves the use of linear programming techniques to solve for the Nash equilibrium. For example, Rose et al.(13) examine resiliency as a key component of minimizing the effect of earthquakes in Memphis, Tennessee, in their 1997 article. They develop a linear program to minimize the economic disruption caused by an earthquake through the reallocation of resources. Cox(14) makes the point that the games most relevant to terrorist analysis are the leader follower or attacker defender games, where one player (the defender) makes decisions without knowledge of what the attacker will do, and then the second player (the attacker) makes decisions with knowledge of the defender's actions. Cox claimed these were similar to Stackelberg games in economic theory and discussed how they were complementary to PRA-based approaches. Pita and Jain(15) use these games to examine the assignment of police at the Los Angeles airport, solving the problem with a mixed integer linear program. They claimed that existing algorithms did not reflect cases where the adversary might not choose the optimal strategy, or where a lack of information about the leader's strategy might affect results. Leader follower games are most often solved as bilevel mixed integer linear programming problems, as discussed by Moore and Bard,(16) but they are computationally difficult to solve. Modern solvers, combined with the increased computational power available since Moore and Bard's work, have facilitated this approach in many different contexts. Brown and Carlyle(17) develop a model called JOINT DEFENDER, which uses an attacker defender framework for allocating defensive interceptors to counter nuclear attacks at the theater level.
They solve this problem as a min-max model in which the attacker seeks to maximize the value of target damage and the defender minimizes it, subject to constraints on missile availability and available target-missile pairings. These same ideas were applied to infrastructure defense the following year by Brown and Carlyle(18) in three distinct "variants" of models: an attacker defender model of an attack on the U.S. petroleum reserve, a defender attacker model for border defense, and a defender attacker defender model for the defense of electrical power grids. Hong(19) uses network interdiction approaches, solved by LP techniques, to examine conflict on a network. The most efficient allocation of security patrols in Boston harbor is the subject of research by An,(20) in which

he uses optimization techniques to solve the leader follower game but departs from the assumption of perfect rationality of decisionmakers. Sorrentino and Mecholsky(21) examine the issue of limited information and the effect of network structure on the evolution of strategies. They use a directed, weighted network for their work and solve a nonlinear program to develop evolution strategies. Other research on solving Stackelberg games as bilevel mathematical programming problems has focused on formulations of mathematical programs with equilibrium constraints (MPECs). Dempe, Kalashnikov, and Rio-Mercado(22) examine techniques to minimize the cash-out penalties for a natural gas shipper using a mixed integer bilevel formulation. They reformulate the problem to make it much easier to solve while minimizing the effect on the final answer. Chen et al.(23) examine the interplay between firms in the electricity and emission allowance markets and solve the resulting large-scale MPECs through a reformulation. Steffensen and Ulbrich(24) examine these same types of bilevel formulations and recommend solutions using a reformulation that is both robust and efficient. Siddiqui and Gabriel(25) use a Schur's decomposition to solve the bilevel formulation for large-scale problems, applying their model and formulation to the bilevel optimization of the U.S. natural gas market.

2.2. Identification of a Specific Need

There is a growing body of work postulating that game theory, especially leader follower games, has a role to play in ARA. Much of the recent work involving resource allocation focuses on the use of various LP techniques to provide efficient allocation of resources. In this work, varying methods have been proposed to account for the effects of imperfect information and the effects of differing goals.
However, these methods usually involve either the estimation of probabilities of players' actions in one form or another, or some other "top-down" method of specifying the entire model. One key element of many of the studies on terrorism has been the need to describe the terrorists' motivations adequately so that their actions can be predicted. Goldstein(26) proposes agent-based models as a useful tool for developing an understanding of terrorists' actions, but he does not evaluate the interactions of terrorists and defenders. Gilbert(27) characterizes agent-based models as differing from other, equation-based computational models in their focus.

Agent-based models concentrate on developing a simple set of rules that both prescribe and constrain the actions of individual agents. The environment in an agent-based model represents the world the agents inhabit. Information relevant to their actions, such as constraints on their information, movement, or interactions, is all considered part of the environment. In an agent-based model, the complex interactions that characterize the entire system are allowed to develop as a natural outgrowth of the simple interaction rules of the individual agents, and the behavior of agents tends to evolve over time. Other, equation-based models are "top down" in that equations must be developed for all the complex interactions. While the literature on the use of linear programming techniques to solve leader follower games is rich and vital, the use of agent-based models to solve for strategies in game-theory problems and gain insights into them has, by comparison, been largely the province of evolutionary biology and economics.

2.3. Proposed Method for Fulfillment of the Need

The problem of imperfect information is readily modeled in an agent-based model through restrictions on how much information the agents possess about their environment. The issue of multiple goals for terrorists can be modeled as differing rules for the behavior of agents. Finally, the problem of how agents evolve over time (and how their goals might vary from one point in time to another) can be modeled by agents changing their actions based on the surrounding situation and their successes. By changing the rules for the behavior of agents in response to their history, the entire group can be considered to "evolve" over time. The evolution of agents over time may provide additional insights into how the terrorists' goals and strategies might evolve as well. Our research has focused on the development of an agent-based model, or simulation, to model the behavior of agents over time.
Differing goals for the terrorists and defenders can be examined. We examine the allocation of resources for both attack and defense on a spatially distributed physical network. We cannot guarantee that any agent-based model will find the Nash equilibrium of a leader follower game (if it exists). However, borrowing from evolutionary biology, the agent-based simulation can be considered a heuristic for finding evolutionarily stable solutions (ESSs) that may exist in the leader follower game as it evolves over time. Once modeled,

the ESS will provide a resource allocation that is better than, or certainly no worse than, one developed using a PRA-based approach.

2.4. Justification

In a seminal article, Smith and Price(28) examine animal conflict as a sort of "limited war" with two distinct "tactics": lethal and nonlethal. They cast this in a GT perspective and define an ESS as a strategy such that, "if most of the members of the population adopt it, there is no mutant strategy which would give higher reproductive fitness." Smith develops these ideas in some detail in a later work on evolution and the theory of games.(29) In both of these works he defines the strategy in terms of expected payoffs, and states that an ESS is a mixed strategy that performs better than any mutant (differing) strategy. Vega-Redondo(30) adapts these ideas to economic games and demonstrates that sufficient conditions for the existence of an ESS cannot be guaranteed in a general scenario. Schaffer(31) brings the idea of an ESS to a finite population and shows that the generalized ESS can have both locally and globally stable equilibria. He developed these ideas further in an economic context the following year.(32) He also discussed how, under certain conditions, the Nash equilibria and ESS are equivalent. In our current work, we make no particular claim as to whether or not the equilibria we determine are Nash equilibria. Rather, we present our model as a way of determining some of the ESS solutions, and then we show that the best of these solutions are equivalent to or better than the pure PRA-based solution. This approach has the added advantage of easily modeling desired agent behavior and differing agent goals.

3. MODELING APPROACH

The attacker and defender are provided with a quantity of resources, which are then allocated to various nodes of a network.
Each of the nodes has two numeric quantities associated with it: one that reflects the importance of the node in maintaining flow through the system, called a node value (NV), and a quantity called the public relations value (PR), which reflects the value the terrorist would place on the publicity gained by an attack on this particular node. The NV is determined by a combination of two linear programming models. The first is a network

flow model, which determines the flow through each part of the network that will maximize the flow through the system. The second is a network interdiction model, which determines the nodes in the network that, if disabled, will cause the greatest reduction in the flow through the network. In the current version of the model, the PR values can range from 1 to 10. Methodologies for the proper assignment of these values would depend upon available intelligence and profiles of terrorist motivation; a detailed discussion of these methodologies is beyond the scope of the current work. In the context of this work, the PR value can best be considered as an alternative value for the nodes that may affect some terrorists and defenders, but not others. The agent-based model provides combinations of resource allocations for the terrorists and defenders that represent evolutionarily stable strategies. By choosing the allocation of resources (the number of resources assigned to each node) that results from the most favorable equilibrium for the defender, we develop a resource allocation for the defender. We employ a second agent-based model that uses a nonevolving, fixed allocation of defender resources at each node but allows the terrorists to evolve. This model provides a check of the ESS developed in the fully evolutionary model and also allows for exploration of the effects of alternative resource allocations. The overall approach is shown graphically in Fig. 1. Without loss of generality, we have assumed that the network is a directed flow capacitated network with sources and sinks located on the edges of the network. This assumption allows for a compact and relatively efficient network interdiction model.
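As a concrete sketch of these two steps, the following script computes maximum flow with the Edmonds-Karp algorithm (standing in for the LP formulations used in this work) and then brute-forces the single most damaging node to disable, using the in/out node-splitting device to give nodes capacities. The toy network and all capacities are hypothetical:

```python
from collections import deque, defaultdict

def max_flow(cap, source, sink):
    """Edmonds-Karp max flow on a dict-of-dicts capacity graph."""
    residual = defaultdict(lambda: defaultdict(int))
    for u in cap:
        for v, c in cap[u].items():
            residual[u][v] += c
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck and push flow along the path.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

def split_nodes(cap, node_caps):
    """Split each node n into n_in -> n_out so nodes carry capacities."""
    out = defaultdict(dict)
    for u in cap:
        for v, c in cap[u].items():
            out[u + "_out"][v + "_in"] = c
    for n, c in node_caps.items():
        out[n + "_in"][n + "_out"] = c
    return out

def most_damaging_node(cap, node_caps, source, sink):
    """Brute-force interdiction: zero each node's capacity, recompute flow."""
    base = max_flow(split_nodes(cap, node_caps), source + "_in", sink + "_out")
    best = None
    for n in node_caps:
        if n in (source, sink):
            continue
        crippled = dict(node_caps, **{n: 0})
        f = max_flow(split_nodes(cap, crippled), source + "_in", sink + "_out")
        if best is None or base - f > best[1]:
            best = (n, base - f)
    return best  # (node, reduction in system flow)

# Hypothetical toy network: s feeds t through intermediate nodes a and b.
cap = {"s": {"a": 10, "b": 10}, "a": {"t": 10}, "b": {"t": 10}}
node_caps = {"s": 20, "a": 8, "b": 3, "t": 20}
```

On this toy network the base flow is 11, and disabling node a removes 8 units of flow, so a would emerge as the most critical node.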
We deal with the issue of node-arc equivalence by splitting each node virtually into two nodes, one for all incoming arcs and one for all outgoing arcs, connected by a single virtual arc whose capacity defines the capacity of the node.

Fig. 1. Overall analysis approach.

3.1. Network Flow Model Formulation

The network flow model is one that calculates the maximum flow through a capacitated network, as described by Bazaraa et al.(33) The model formulation used in this work is presented below:

Indices
j ∈ J   nodes
k ∈ K   arcs

Sets
K_ja   set of arcs entering node j, a = 1, 2, 3, . . . , n
K_jb   set of arcs leaving node j, b = 1, 2, 3, . . . , n
J_t    set of terminal nodes in the network

Variables
x_k   flow through arc k

Parameters
c_k   capacity of arc k

Model
Max Σ_k x_k, ∀ k ∈ K_ja, j ∈ J_t   (flow into terminal nodes)

Subject to
Σ_{β ∈ K_jb} x_β − Σ_{α ∈ K_ja} x_α = 0, ∀ j ∉ J_t   (flow balance constraints)
x_k ≤ c_k, ∀ k   (arc capacity constraints)
x_k ≥ 0, ∀ k   (nonnegativity constraints)

The input nodes in this model are contained within the flow balance constraints. The value of the input at each node is considered as its inflow capacity. The model seeks to maximize the flow into the terminal nodes of the network, ensuring that the flow into and out of every node is balanced and that the capacity of each arc is not exceeded. By using the flow through each node as an input to the value of the node, the "bottlenecks," or nodes that are operating at capacity, are identified. These are critical nodes that will affect the ability of the network to maintain flow in a damaged state.

3.2. Network Interdiction Model Formulation

Interdiction is interruption of flow through the network, achieved by reducing the capacity of one or more nodes to zero and thereby blocking flow through those nodes. Network interdiction is related to a simple max flow model; however, it is more computationally difficult to calculate. The following formulation, taken from Wood,(34) is used to determine the most important arcs to interdict. It should be noted that, for consistency with the original work and his body of follow-on work on network interdiction, all variables, indices, and definitions are as defined by Wood.

Indices
j ∈ J   nodes
k ∈ K   arcs

Sets
N_s    nodes that are sources
N_t    nodes that are sinks
A_st   set of arcs incident to a node in either N_s or N_t
Ā_st   set of arcs that are not incident to a source or sink node (a complement)

Variables
α_j   variable associated with nodes
β_k   variable associated with arcs; those whose value is 1 define the minimum capacity cut
γ_k   variable associated with arcs; those whose value is 1 define the interdicted arcs

Parameters
u_k   capacity of arc k
r_k   resource cost for cutting arc k
R     total resources available for the interdiction

Model
Min Σ_k u_k β_k   (capacity of unbroken forward arcs)

Subject to
α_s − α_t + β_st + γ_st ≥ 0   for all arcs that connect to a source or sink node
α_s − α_t + β_st + γ_st ≥ 0   for all arcs that do not connect to a source or sink node
α_t − α_s + β_st + γ_st ≥ 0   for all arcs that do not connect to a source or sink node
α_j = 0   for all source nodes
α_j = 1   for all sink nodes
Σ_k r_k γ_k ≤ R   (resource constraint for interdiction activity)
α_j, β_k, γ_k binary

Once this model is run, the key variables are the values of the γ_k. These binary values indicate which arcs are interdicted: if the value is 1, that particular arc is interdicted. The number of arcs interdicted will vary depending upon the resource cost of interdicting each arc and the total resources available to perform the interdiction.

3.3. Determination of the NV in the Network

These two models (the network flow model and the network interdiction model) are used in combination to determine NV. First, the max flow model is run and the flow through the nodes is tabulated. Then, the network interdiction model is run to identify the single node whose interdiction has the greatest effect upon the flow of the system. The flow through each node in the network is determined under the condition that this node is unavailable. The interdiction model is limited to selecting a single node by setting the costs of interdiction and the resources available for interdiction such that the constraints drive the model to select a single node to interdict. This process was repeated for several iterations, looking for other, less critical nodes. The NV was determined by the maximum flow through the node for all cases (both max flow and interdicted).

3.4. Overview of Agent-Based Simulation Models Used

Macal and North(35) provide a useful framework for the development of agent-based models. According to their construct, there are three distinct elements in an agent-based model. These are defined as follows:

(1) A set of agents, their attributes and behaviors.

(2) A set of agent relationships and method of interaction: an underlying topology of connectedness defines how and with whom agents interact.
(3) The agents' environment: agents interact with their environment in addition to other agents.

Each of these three elements is discussed in the following sections.

3.4.1. The Set of Agents

Agents can have differing characteristics and rule sets that govern their behavior. Agents with the same characteristics and rule sets (but not necessarily the same values for those characteristics) are called "breeds." Our models have three distinct breeds of agents: terrorists, defenders, and nodes. Although the nodes are agents, they serve only to describe features in the environment for the terrorists and defenders. They provide locations in the Cartesian grid that constitutes the agents' "world," acting as focal points for interaction and data gathering. The behaviors of the defenders and terrorists are called "tactics." These tactics all relate to how the agents pick their "target," which is the individual node of the network they choose to attack or defend. Each defender will choose one node to defend, and each terrorist will choose one node to attack. Each agent has a rule that defines its tactics; the rule determines how the particular agent picks its target. Initially, these tactics are assigned randomly to all terrorists and defenders. The terrorists' and defenders' physical locations on the Cartesian grid are also randomly assigned each turn. Over time and continuing repetitions, the population of less successful agents is reduced through attrition. The actual number of agents who cluster at each node tends to stabilize as the strategies stabilize. The proportion of agents at each node thus represents the resource allocation that should be assigned to the node. These proportions of agents are referred to as strategies, since they refer to the overall allocation of resources in the attack or defense of the node.
The behaviors of the agents, or their tactics, whether they are terrorists or defenders, can be one of the following:

(1) Attack or defend the most valuable node in the network. This choice reflects an "all or nothing" valuation by the agent.
(2) Attack or defend the closest node in the network. This choice reflects a random spatial distribution of effort, independent of the value of the node to the network.
(3) Attack or defend the least defended node within a specified distance. This strategy reflects the desire to attack or defend a network at its weakest point. For both sides, the specified distance reflects limited information available to the individual agent.
(4) Attack or defend the most valuable node within a specified distance. This reflects the desire to do the most damage in the attack, or to defend the most important part of the network. For both sides, the specified distance reflects limited information available to the individual agent.
(5) Attack or defend a random node. This strategy reflects a desire to attack or defend without regard to the value of a node, the number of defenders, or the relative strength or weakness of the node.
(6) Attack or defend the node with the highest PR value within a specified distance. This represents the desire to gain publicity for a cause through completing an attack, without regard for the actual value of the target node to the network or the defenses of this node. For the defender, it represents the desire to counter this type of action.
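As an illustration, these six tactics can be written as target-selection rules over a snapshot of model state; the field names (value, pr, defenders, dist) and the horizon parameter are hypothetical stand-ins for the information actually visible to an agent:

```python
import random

# Each node is a dict of hypothetical fields as seen by one agent; `dist`
# is the distance from that agent to the node. Tactics 3, 4, and 6 only
# consider nodes within `horizon`, reflecting the agent's limited information.
def pick_target(tactic, nodes, horizon, rng=random):
    visible = [n for n in nodes if n["dist"] <= horizon]
    if tactic == 1:                      # most valuable node anywhere
        return max(nodes, key=lambda n: n["value"])
    if tactic == 2:                      # closest node
        return min(nodes, key=lambda n: n["dist"])
    if tactic == 3 and visible:          # least defended node in range
        return min(visible, key=lambda n: n["defenders"])
    if tactic == 4 and visible:          # most valuable node in range
        return max(visible, key=lambda n: n["value"])
    if tactic == 5:                      # random node
        return rng.choice(nodes)
    if tactic == 6 and visible:          # highest PR value in range
        return max(visible, key=lambda n: n["pr"])
    return None  # nothing visible for a range-limited tactic

nodes = [
    {"id": "A", "value": 9, "pr": 2, "defenders": 4, "dist": 1.0},
    {"id": "B", "value": 5, "pr": 8, "defenders": 1, "dist": 3.0},
    {"id": "C", "value": 7, "pr": 5, "defenders": 0, "dist": 9.0},
]
```

With a horizon of 5, for example, tactic 3 selects node B rather than the wholly undefended node C, because C lies beyond the agent's information horizon.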

These behaviors, or tactics, of the terrorists and defenders are considered representative of the actions of the actual decisionmakers who control the terrorists or defenders. These tactics could be modified to represent different behaviors or different goals. Although the possible tactics in this version of the model are symmetric between attackers and defenders, there is no requirement for them to be reflexive in this manner.

3.4.2. The Set of Agent Relationships

The agents (terrorists and defenders) interact based on their locations. Once the tactic for each agent is determined and his "target" node is defined, the agent moves to that node. First, the defenders determine their target nodes and move to them. After the defenders have been assigned in this way (fixing the resource allocation of the defenders), the terrorists determine their targets and move to their target nodes. After all the agents have determined their strategies and moved toward their target nodes, the number of terrorist agents and defender agents

are counted at each node. The agent breed (terrorist or defender) with the greater number of agents at the node is considered the winner. If the terrorists have numbers in excess of what is required for victory, the strategy of any further agent attempting to move to that node is randomly reassigned. This "overkill threshold" can be set to any desired value; we have used twice the number of defenders as our threshold. The threshold can be thought of as representing the degree of central control of the terrorists and the victory margins they consider comfortable for planning: a smaller ratio implies more control and smaller desired margins, and a larger ratio implies less control and/or greater desired margins.

Agents of both breeds know the locations of the network nodes, and terrorists additionally know the number of terrorists and defenders at each node. Defenders know only how many defenders are at any particular node. Both breeds know the value and the PR value of all nodes. It is important to note that the defenders move to their nodes first and never have knowledge of the attackers. Within the construct of the model, the visibility of information about node characteristics and agent counts could be changed to represent further limitations in information.

The terrorist and defender agents evolve over time; when an agent "evolves," its tactic changes. After a number of turns, each agent is evaluated on the number of times it has succeeded and failed, and a percentage of the least successful agents have their strategies randomly reset to new strategies. This represents evolution in that the less successful agents "die," and, if these all had similar tactics, the number of agents with less successful tactics is reduced. All the success and failure history for each agent is then reset to zero and the simulation continues.
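One evolution step of this kind, culling the least successful agents, mutating a small unconditional fraction, and resetting histories, might be sketched as follows. Parameter names, defaults, and the agent layout are illustrative assumptions, not the article's code:

```python
import random

def evolve(agents, tactics, cull_frac=0.2, mutate_frac=0.05, rng=random):
    # agents: list of dicts with "tactic" and "wins" keys (hypothetical
    # layout); tactics: the pool of available target-selection rules.
    ranked = sorted(agents, key=lambda a: a["wins"])
    for a in ranked[:int(len(agents) * cull_frac)]:
        a["tactic"] = rng.choice(tactics)      # least successful "die"
    for a in rng.sample(agents, int(len(agents) * mutate_frac)):
        a["tactic"] = rng.choice(tactics)      # unconditional exploration
    for a in agents:
        a["wins"] = 0                          # history reset each epoch
    return agents
```

Repeating this step drives the tactic population toward the more successful rules while the unconditional mutation keeps the search from locking in prematurely.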
Additionally, at this time a certain percentage of all agents, regardless of their success or failure record, have their strategies randomly reassigned. This randomization helps ensure that all parts of the solution space are explored.

Fig. 2. Sequence diagram for allocation model.

3.4.3. The Agents' Environment

The agents' environment is a Cartesian plane. The nodes of the network are assigned coordinates on this plane that correspond to their physical locations. The nodes are connected by arcs, which begin and end at specified nodes. In the current version of the model, the arcs do not affect the

decisions of the terrorists or defenders. They only serve to affect the value of the nodes through their effect on flow, as detailed in the max flow model and the interdiction model.
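The article computes node values with integer-programming max-flow and interdiction models (Section 3.3). As a minimal illustration of the underlying idea, scoring a node by the max flow lost when it is interdicted, here is a self-contained Edmonds-Karp sketch on an invented four-node capacity map; the topology, capacities, and function names are all assumptions for illustration:

```python
from collections import deque

def max_flow(cap, s, t):
    # Edmonds-Karp max flow on a capacity dict {(u, v): capacity}.
    res = dict(cap)                      # residual capacities
    adj = {}
    for u, v in cap:
        res.setdefault((v, u), 0)        # reverse edge for residual graph
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0
    while True:
        parent = {s: None}               # BFS for an augmenting path
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in adj.get(u, ()):
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        path, v = [], t                  # recover path, find bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[e] for e in path)
        for u, v in path:
            res[(u, v)] -= push
            res[(v, u)] += push
        flow += push

def node_criticality(cap, node, s, t):
    # Score a node by the max flow lost when it is removed, one simple
    # reading of how an interdiction model can rank criticality.
    reduced = {e: c for e, c in cap.items() if node not in e}
    return max_flow(cap, s, t) - max_flow(reduced, s, t)
```

For example, on capacities `{("S","A"): 10, ("S","B"): 10, ("A","T"): 8, ("B","T"): 6}`, removing A costs more flow than removing B, so A scores as the more critical node.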

3.5. Agent-Based Models Used in This Research

Two distinct agent-based models were developed for this research. In the first model (the allocation model), which was just described, the behaviors (tactics) of both the defending agents and the terrorist agents are allowed to evolve over time. This allocation model is used to develop the solutions corresponding to the ESS for the game. The second (comparison) model is used to compare a fixed (nonevolving) defensive strategy against evolving terrorists. The allocation of defensive resources, node by node, is supplied as an input to this model, and the terrorists evolve to fight it. There are no defender agents in this model, only terrorists. The terrorists behave exactly as described for the previous model; the only difference is that the resources at each node, and hence the overall defender strategy, are invariant over time. The terrorists evolve, changing their strategies to inflict maximum damage on the network. This model is used to compare various defender resource allocations (solutions) to determine how well they protect the network as the terrorists evolve to attack it, providing insight into how well an allocation performs against an informed and evolving terrorist threat.

The sequence diagram, Fig. 2, provides a visual representation of the flow of events in a single turn of the model. The observer tracks all events in the simulation, calculates statistics, and directs events as shown in the diagram. When the agents are first initialized, their tactics are randomly assigned, as are their locations in the XY plane. A single cycle begins when the defenders move to the nodes indicated by their current tactics and locations; once this movement is complete, the terrorists move to the nodes indicated by theirs. Each terrorist or defender agent represents a single unit of resources, so the resolution of the conflict at each node depends on the resource totals there: victory at a node is assigned to the breed of agent (terrorist or defender) with the most resources at that node. After the conflict is resolved and the win/loss counters of each terrorist and defender are updated, a new cycle begins by randomly reassigning locations to every terrorist and defender.

After a number of cycles, the total number of victories is calculated for each terrorist and defender, and a percentage of the agents with the fewest victories have their tactics randomly reassigned. Both this percentage and the number of cycles between evaluations can be set as parameters in the simulation. Over time, the number of agents with less successful tactics is reduced, the percentage of agents with successful tactics increases, and the overall strategy, the number of resources assigned to each node, approaches an ESS for the game.

One inherent problem with any evolutionary approach to finding equilibrium points is that it is impossible to determine whether all of the potential points have been found. Therefore, in no sense do we claim that our techniques provide the "best" solution to the resource allocation problem. No matter how many simulation runs are performed, there is a chance (albeit small) that a better allocation could be found. However, we will show that, on our example network, our approach of choosing the best equilibrium point provides an improvement (in the sense that more nodes are preserved) over a more traditional PRA-based approach. Our example in this article is a 13-node network.
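The single cycle described above can be sketched as follows. The article resolves a node in favor of the breed with more agents there; tie handling is not specified, so crediting ties (and unattacked nodes) to the defender is an assumption here, as are all names:

```python
from collections import Counter

def run_cycle(defenders, terrorists, nodes, choose_target):
    # Leader (defender) commits first; followers (terrorists) then react.
    d_count, t_count = Counter(), Counter()
    for d in defenders:
        d["node"] = choose_target(d, nodes)
        d_count[d["node"]] += 1
    for t in terrorists:
        t["node"] = choose_target(t, nodes)
        t_count[t["node"]] += 1
    # The breed with the greater resource total at a node wins that node;
    # ties and unattacked nodes go to the defender (an assumption).
    return {n: "defender" if d_count[n] >= t_count[n] else "terrorist"
            for n in nodes}
```

Because each agent is one unit of resources, the two counters are exactly the per-node resource totals that the conflict-resolution rule compares.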
Similar results have been demonstrated by Kroshl(36) using these same models and techniques on larger networks: a 31-node network based on a simplification of the topology of the Irish power grid, and an 85-node network based on Kinder Morgan's Plantation Pipeline along the East Coast of the United States.

4. EXAMPLE ANALYSIS

We will present a brief example to illustrate our approach and demonstrate that our techniques provide an improvement over a pure PRA-based allocation. We use a single network, described in

Section 4.1, and then perform a sensitivity analysis on six different scenarios. The scenarios are characterized by the distribution of terrorist strategies and by the PR values of the nodes. The distribution of terrorist strategies is either (1) uniform, meaning that the initial strategies of all terrorists are uniformly distributed among the possible strategies, or (2) favoring PR, in which case one-half of the initial terrorists follow a strategy of attacking a node with a high PR value and the strategies of the rest are uniformly distributed among the remaining strategies. The three sets of node PR values are (1) a baseline in which one node has a higher PR value than the rest, (2) the source nodes have higher PR values, and (3) the sink nodes have higher PR values. The allocation model was run for the six scenarios (combinations of PR values and terrorist strategies), and the best ESS resource allocation was determined from those model runs; this allocation is called the GT allocation. The comparison model was then run on the PRA allocation and on the GT allocation for all six scenarios, and the results were compared. These variables are summarized in Table I.

Table I. Summary of Scenario Variables

Scenario   Publicity      Terrorist Strategy
1          Baseline       Uniform
2          Sinks high     Uniform
3          Sources high   Uniform
4          Baseline       Favor PR
5          Sinks high     Favor PR
6          Sources high   Favor PR
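The six scenarios of Table I are simply the cross product of the two factors; a sketch in code (the labels are mine, not the article's):

```python
from itertools import product

# The two experimental factors behind Table I.
PUBLICITY = ["baseline", "sinks_high", "sources_high"]
STRATEGY = ["uniform", "favor_pr"]

# Enumerate scenarios 1-6 in the same order as Table I.
scenarios = [
    {"scenario": i + 1, "publicity": p, "terrorist_strategy": s}
    for i, (s, p) in enumerate(product(STRATEGY, PUBLICITY))
]
```

Iterating this list is enough to drive a full sensitivity sweep of the allocation model.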

4.1. Evaluation Network

In order to evaluate our approach to resource allocation, we used a 13-node network, representative of a small spatially distributed network, shown in Fig. 3. The values of the network nodes are shown in Table II. It is a capacitated, directed flow network; for clarity, the arcs that connect the "input" and "output" sides of each node are not shown. The NVs were calculated as described in Section 3.3. This network was chosen to provide a small example for demonstration purposes. We have demonstrated similar results on larger networks based on real-world systems, as discussed in Kroshl.(36)


Fig. 3. Sample network.

Table II. Sample Network Node Values

                         Publicity Values
Node ID   Node Value   Baseline PR   Sink High PR   Source High PR
A1        15           1             1              7
A2        45           1             1              7
B1        15           1             1              1
B2        30           1             1              1
B3        15           1             1              1
C1        20           1             1              1
C2        20           1             1              1
C3        15           1             1              1
D1        20           1             7              1
D2        10           7             7              1
D3        20           1             7              1
D4        20           1             7              1
D5        15           1             7              1


Table III. Resource Allocation for Defenders

Node ID   PRA   Scen. 1   Scen. 2   Scen. 3   Scen. 4   Scen. 5   Scen. 6
A1        12    14        15        14        14        15        14
A2        35    25        25        24        25        26        24
B1        12    19        19        19        19        19        19
B2        23    17        16        16        17        16        17
B3        12    19        19        19        19        19        19
C1        15    17        17        17        17        17        17
C2        15    15        15        15        15        15        15
C3        12    16        16        16        16        16        16
D1        15    14        14        15        14        14        15
D2        7     10        9         9         10        9         9
D3        15    10        11        11        10        10        11
D4        15    10        10        11        10        10        10
D5        12    14        14        14        14        14        14

Scenarios 1-3: terrorist strategies uniform, with baseline, sink-high, and source-high PR values, respectively. Scenarios 4-6: terrorist strategies favor PR, with baseline, sink-high, and source-high PR values, respectively.
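The PRA column of Table III can be reproduced approximately by value-proportional rounding over the node values of Table II. The sketch below uses simple rounding, so individual entries can differ from the article's by a unit (the article's D2 entry, for instance, is 7 where simple rounding gives 8), since the article does not state how rounding remainders were handled:

```python
def pra_allocation(values, total_resources):
    # Allocate in proportion to each node's share of total value; simple
    # rounding here, so totals may drift a unit or two from the target.
    total_value = sum(values.values())
    return {n: round(total_resources * v / total_value)
            for n, v in values.items()}

# Node values from Table II.
NODE_VALUES = {"A1": 15, "A2": 45, "B1": 15, "B2": 30, "B3": 15,
               "C1": 20, "C2": 20, "C3": 15, "D1": 20, "D2": 10,
               "D3": 20, "D4": 20, "D5": 15}

alloc = pra_allocation(NODE_VALUES, 200)
```

With 200 defenders (the number used in the experiments), this recovers the bulk of the PRA column, e.g. 35 for A2 and 23 for B2.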

Fig. 4. Distribution of defender victories for Scenarios 1-3.

Fig. 5. Distribution of defender victories for Scenarios 4-6.

4.2. Determination of Resource Allocations

The resource allocation model was run for each of these six scenarios. Both terrorists and defenders had the same number of resources, which contributed to the seeming advantage held by the terrorists in the application of resources. The model ran for 2,000 iterations; examination of the results showed that the distribution of agents (either terrorists or defenders) at each node tended to stabilize, with little change after 1,000 iterations. The number of defenders at each node was therefore averaged over the last 500 iterations, as was the number of defender victories, and this process was repeated for 400 simulation runs to provide a distribution of results. The number of defender victories refers to the number of nodes that were successfully defended. The average number of defenders at each node for the results that were most favorable to the defenders was then determined; this allocation became the GT allocation of resources for the scenario. The PRA allocation was determined by allocating the resources according to the proportion of total value held by each of the nodes, and is invariant across all scenarios. The distribution of the number of defender victories is shown in Figs. 4 and 5. The resource allocations (strategies) used for the


defenders in the PRA case and each of the GT cases are summarized in Table III. Two hundred terrorists and two hundred defenders were used in these examples to minimize rounding error when determining an integer number of defenders for each node. The population of agents using a particular tactic changes over time in response to the defenders of the network under attack; in a similar fashion, the strategy, or allocation of agents to each node, changes over time as the agents evolve. As an example, for Scenario 1, Fig. 6 shows the number of terrorist agents using each of the six defined tactics, for both the PRA-based and the GT-based defender allocations, evolving over a period of 100 iterations of a single model run.

Fig. 6. Example of terrorist tactics evolution.

4.3. Comparison of Allocations

Using the same run parameters (2,000 iterations per run, results taken as the average of the last 500 iterations, 400 runs for each allocation and scenario), the

comparison model was then run for each of these scenarios. The comparison model examined the performance of the PRA allocation and the GT allocation against an evolving terrorist threat in all six of the scenarios.

Table IV. p Values from the Lilliefors Test for Normality

Scenario   Allocation   p Value
1          PRA          0.4122
2          PRA          >0.5
3          PRA          >0.5
4          PRA          >0.5
5          PRA          >0.5
6          PRA          >0.5
1          GT           >0.5
2          GT           >0.5
3          GT           >0.5
4          GT           0.0161
5          GT           0.4607
6          GT           0.1122

Fig. 7. Scenario 1 QQ plot: baseline PR, terrorist strategies distributed uniformly.

Fig. 8. CDF plot for Scenario 1.

The results were examined for normality using quantile-quantile (QQ) plots and tested for normality using the Lilliefors test. The QQ plot plots the quantiles of a normal distribution against the quantiles of the experimental data; the normal distribution is shown as a dashed line on the QQ plot, and the actual data are shown as "+" marks. This graph provides a visual indication of

how closely the data match a normal distribution, by showing how closely the "+" marks lie along the diagonal line. The Scenario 1 QQ plot is shown in Fig. 7; all the other scenario results are very similar. The Lilliefors test is designed to test for normality when both μ and σ are unknown. The p values for the Lilliefors tests for normality are

shown in Table IV. With the possible exception of the GT allocation in Scenario 4 (p = 0.0161), the results are consistent with normality for any reasonable value of α. For each of the scenarios, the cumulative distribution function (CDF) of the number of defender victories observed under the PRA-based allocation and under the GT-based allocation was compared. The CDF plot for Scenario 1 is shown in Fig. 8; the other plots are all very similar, and a complete set of these plots (QQ and CDF) can be found in Kroshl.(36) Examination of the CDF plot in Fig. 8 shows that the PRA-based allocation is dominated by the GT-based allocation in the sense that, for any specified value of the CDF, the GT-based approach always produces a higher number of defender victories than does the PRA-based approach. When conducting a t test for the two distributions in each scenario, with the null hypothesis that μPRA = μGT against the alternative hypothesis that μPRA ≠ μGT, the null is rejected in all scenarios, with t statistics ranging from −310 to −360. In all of these cases, our GT-based allocation of resources yields a greater number of expected defender victories than a PRA-based allocation of resources.

5. CONCLUSIONS AND RECOMMENDATIONS FOR FURTHER RESEARCH

Our research has shown that the evolutionary agent-based approach provides an allocation of resources that yields a statistically significant improvement in the number of defender victories over a PRA-based allocation for the cases where the terrorists can obtain information about the defenders and their network. These cases assumed that the terrorists were able to learn the values of the various nodes, and that they were able to adapt to maximize their damage (the number of nodes damaged). This was demonstrated on a 13-node network representative of a spatially distributed directed network with sources and sinks located on the edges.
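The two-sample comparison behind the Section 4.3 result can be sketched as follows, using synthetic normal samples in place of the 400-run victory distributions; the means and spreads below are invented for illustration and are not the article's data:

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the 400-run victory distributions; the means and
# spreads are invented for illustration, not taken from the article.
rng = np.random.default_rng(7)
pra_wins = rng.normal(loc=6.0, scale=0.5, size=400)
gt_wins = rng.normal(loc=8.0, scale=0.5, size=400)

# Two-sided two-sample t test of H0: mu_PRA = mu_GT vs H1: mu_PRA != mu_GT.
t_stat, p_val = stats.ttest_ind(pra_wins, gt_wins)
```

With a genuine GT advantage the statistic is large and negative, mirroring the sign convention of the article's reported t statistics. A packaged Lilliefors normality test is also available as statsmodels.stats.diagnostic.lilliefors, should the normality check need to be reproduced.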
While the strategies chosen are considered representative of actual tactics, an examination of other tactics, possibly derived from other agent-based models focused on terrorist behavior, would be very interesting. Other work might include modeling the effect of damaging two nodes at the same time; as currently configured, the model does not take synergistic effects between multiple nodes into account. The process scales well to larger networks and increased numbers of agents, as demonstrated in

Kroshl.(36) The agent-based models are effective in exploring the solution space of the problem. Rationality assumptions could be explored through modification of the rule set for the agents and of the way that agents using unsuccessful tactics are redirected. The role of information could be explored in a systematic manner by manipulating the information available to the agents. Finally, the effect of considering the actual value of the nodes saved, rather than the total number of nodes damaged, should be examined.

ACKNOWLEDGMENTS

The authors would like to thank the reviewers of this article for their insightful and useful comments and suggestions. Points of view and opinions expressed in this article are solely those of the authors and do not necessarily reflect the positions or policies of the George Washington University or the Johns Hopkins University Applied Physics Laboratory.

REFERENCES

1. Cox LAT, Jr. Some limitations of "risk = threat x vulnerability x consequence" for risk analysis of terrorist attacks. Risk Analysis, 2008; 28(6):1749-1761.
2. Golany B, Kaplan EH, Marmur A, et al. Nature plays with dice—Terrorists do not: Allocating resources to counter strategic versus probabilistic risks. European Journal of Operational Research, 2009; 192:198-208.
3. Insua IR, Rios J, Banks D. Adversarial risk analysis. Journal of the American Statistical Association, 2009; 104(486):841-854.
4. Brown G, Cox JL. How probabilistic risk assessment can mislead terrorism risk analysts. Risk Analysis, 2011; 31(2):196-204.
5. Bier V, Nagaraj A, Abhichandani V. Optimal allocation of resources for defense of simple series and parallel systems from determined adversaries. Reliability Engineering & System Safety, 2005; 87:313-323.
6. Rothschild C, McLay L, Guikema S. Adversarial risk analysis with incomplete information: A level-k approach. Risk Analysis, 2012; 32(7):1219-1231.
7. Banks D, Petralia F, Wang S. Adversarial risk analysis: Borel games. Applied Stochastic Models in Business and Industry, 2011; 27(2):72-86.
8. Rios J, Insua DR. Adversarial risk analysis for counterterrorism modeling. Risk Analysis, 2012; 32(5):894-915.
9. Ezell BC, Bennett SP, Von Winterfeldt D, Sokolowski J, Collins AJ. Probabilistic risk analysis and terrorism risk. Risk Analysis, 2010; 30(4):575-589.
10. Overgaard PB. The scale of terrorist attacks as a signal of resources. Journal of Conflict Resolution, 1994; 38:452-478.
11. Lapan HE, Sandler T. To bargain or not to bargain: That is the question. American Economic Review, 1988; 78(2):16-21.
12. Machado R, Tekinay S. A survey of game-theoretic approaches in wireless sensor networks. Computer Networks, 2008; 52(16):3047-3061.
13. Rose A, Benavides J, Chang S, Szczesniak P, Dongsoon L. The regional economic impact of an earthquake: Direct and indirect effects of electricity. Journal of Regional Science, 1997; 37(3):437.

14. Cox JL. Game theory and risk analysis. Risk Analysis, 2009; 29(8):1062-1068.
15. Pita J, Jain M, Ordóñez F, et al. Effective solutions for real-world Stackelberg games: When agents must deal with human uncertainties. Pp. 369-376 in Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems, Vol. 1. Budapest, Hungary: International Foundation for Autonomous Agents and Multiagent Systems, 2009.
16. Moore JT, Bard JF. The mixed integer linear bilevel programming problem. Operations Research, 1990; 38(5):911-921.
17. Brown G, Carlyle M, Diehl D, Kline J, Wood K. A two-sided optimization for theater ballistic missile defense. Operations Research, 2005; 53(5):745-763.
18. Brown G, Carlyle M, Salmerón J, Wood K. Defending critical infrastructure. Interfaces, 2006; 36(6):530-544.
19. Hong S. Strategic network interdiction. Working Paper 43, Fondazione Eni Enrico Mattei, 2011.
20. An B, Shieh E, Yang R, Tambe M, Baldwin C, DiRenzo J, Maule B, Meyer G. A deployed game-theoretic system for strategic security allocation for the United States Coast Guard. AI Magazine, 2012; 33(4):96-110.
21. Sorrentino F, Mecholsky N. Stability of strategies in payoff-driven evolutionary games on networks. Chaos, 2011; 21(3):033110.
22. Dempe S, Kalashnikov V, Rios-Mercado RZ. Discrete bilevel programming: Application to a natural gas cash-out problem. European Journal of Operational Research, 2005; 166(2):469-488.
23. Chen Y, Hobbs B, Leyffer S, Munson TS. Leader-follower equilibria for electric power and NOx allowances markets. Computational Management Science, 2006; 3(4):307-330.

24. Steffensen S, Ulbrich M. A new relaxation scheme for mathematical programs with equilibrium constraints. SIAM Journal on Optimization, 2010; 20(5):2504-2539.
25. Siddiqui S, Gabriel SA. An SOS1-based approach for solving MPECs with a natural gas market application. Networks and Spatial Economics, 2013; 13(2):205-227.
26. Goldstein H. Modeling terrorists. IEEE Spectrum, 2006; 43(9):26-34.
27. Gilbert N. Agent-Based Models. Los Angeles: Sage Publications, 2008.
28. Smith JM, Price GR. The logic of animal conflict. Nature, 1973; 246(5427):15-18.
29. Smith JM. Evolution and the Theory of Games. Cambridge: Cambridge University Press, 1982.
30. Vega-Redondo F. Evolution, Games and Economic Behaviour. Oxford: Oxford University Press, 1996.
31. Schaffer ME. Evolutionarily stable strategies for a finite population and a variable contest size. Journal of Theoretical Biology, 1988; 132(4):469-478.
32. Schaffer ME. Are profit-maximisers the best survivors? A Darwinian model of economic natural selection. Journal of Economic Behavior & Organization, 1989; 12(1):29-45.
33. Bazaraa MS, Jarvis JJ, Sherali HD. Linear Programming and Network Flows, 2nd ed. New York: John Wiley & Sons, 1990.
34. Wood K. Deterministic network interdiction. Mathematical and Computer Modelling, 1993; 17(2):1-18.
35. Macal CM, North MJ. Tutorial on agent-based modelling and simulation. Journal of Simulation, 2010; 4(3):151-162.
36. Kroshl WM. Allocation of Resources to Defend Spatially Distributed Networks Using Game Theoretic Allocations. Washington, DC: George Washington University, 2014.
