Nested Monte-Carlo Search of Multi-agent Coalitions Mechanism with Constraints

This paper develops and evaluates a coalition mechanism that enables agents to carry out concurrent tasks in competitive situations in which the agents are subject to several constraints. We focus on situations in which the agents are self-interested, have no a priori knowledge about the preferences of their opponents, and have to cooperate in order to reach their goals. Each agent has its own specific constraints, and this information is private. The agents negotiate for coalition formation (CF) over these constraints, which may be relaxed during negotiations. They start by exchanging their constraints and making proposals, which represent their acceptable solutions, until either an agreement is reached or the negotiation terminates. We explore two techniques that ease the search for suitable coalitions: a constraint-based model and a heuristic search method. We describe a procedure that transforms these constraints into a structured graph on which the agents rely during their negotiations to generate a graph of feasible coalitions. This graph is then explored by a Nested Monte-Carlo search algorithm to generate the best coalitions and to minimize the negotiation time.


Introduction
Forming coalitions of agents that are able to perform tasks effectively is a key issue in many practical application contexts. This paper focuses on self-interested agents that aim to form coalitions with other agents because they cannot reach their objectives individually. Several methods have been developed to control the behavior of the agents involved in such a process [12]. However, few CF mechanisms cope with the dynamics of the agents' constraints in such contexts. Indeed, these constraints can be gradually revealed and relaxed by the agents at different moments of the negotiation in order to meet the requirements of their opponents and thus to ease convergence. Some coalition methods determine the optimal coalitions prior to the negotiation and take into account the constraints of the agents involved in the coalition process. These methods have addressed important issues such as computational complexity and heuristic approaches for optimal coalition structure generation [7,10,13]. In this paper, we focus on contexts where agents neither have the same utility functions nor reveal these functions. Thus, with current optimal coalition search algorithms, it is infeasible to estimate precisely, a priori, the utility of each agent for each feasible proposal of solution. The issues raised by processing the agents' constraints during the negotiation phase of coalition formation deserve particular attention and in-depth study. Yet only a few works propose a mechanism to deal with the dynamics of such constraints while agents negotiate over them. Note that, since we consider that the agents are self-interested and do not share their information and computations, our aim is not to identify the optimal coalitions, but to ease the convergence of these agents to an agreed common solution. Our main contribution is a new mechanism that enables agents to negotiate and form coalitions.
This mechanism is based on three main abstractions: a constraint graph, a coalition graph and a Nested Monte-Carlo search method. First, we develop a constraint-based graph that handles the revealed constraints of the agents. This graph of constraints can be used to specify different types of constraint relations, such as an ordering of constraints over potential decision outcomes. Building upon this, we transform this representation into a flat representation of coalitions in the graph of coalitions. Each level of this graph allows an agent to generate a set of possible coalitions, from which the agent selects the best coalitions that can be accepted. This graphical representation of constraints and coalitions specifies constraint relations in a relatively compact, intuitive, and structured manner. To explore this graph of coalitions, we first define the problem and link it to other existing problems, so that approximate solution techniques and anytime heuristics, which provide increasingly better solutions when given more time, can be reused. We devise new solutions that allow agents to use a Nested Monte-Carlo search algorithm [1], which finds the best coalition that maximizes the utility of each agent. Nested Monte-Carlo search methods address the problem of guiding the search towards better states when no heuristic is available. These methods use nested levels of random games in order to guide the search for coalitions. These algorithms have been studied theoretically on simple abstract problems and applied successfully to several games [4]. Specifically, this paper advances the state of the art in the following ways. We devise new anytime heuristics to find approximate solutions quickly, and we empirically evaluate our algorithm and show that it computes 689 proposals of solutions in less than 600 milliseconds for non-trivial problems involving up to 30 agents and 50 tasks.
Thus, our work encompasses essential aspects of coalition formation: the coalition model, the negotiation, and an anytime heuristic. The remainder of the paper is organized as follows. Section 2 briefly describes the related work. Section 3 introduces some preliminaries and the case study. Section 4 presents the coalition formation mechanism, and a final section concludes the work with a summary of the contributions.

Related Work
From a game-theoretic perspective, coalitional games with constraints have been addressed by a number of works. However, none of these structures is able to model agents' negotiations for reaching joint agreements. [3] proposes a game-theoretical study and focuses on strategic, core-related issues rather than on computational analysis of coalition formation. Our work is closer to [7], where the authors propose a constrained coalition formation model and an algorithm for optimal coalition structure generation. They develop a procedure that transforms the specified set of constraints, making it possible to identify all the feasible coalitions. Building upon this, they provide an algorithm for optimal coalition structure generation. [13] address the problem of coalition formation with sparse synergies, where the set of feasible coalitions is constrained by the edges of a graph. Their aim is to check whether knowledge of the topology of an underlying social or organizational context graph can be used to speed up coalition enumeration and structure generation. [10] define the problem of allocating coalitions of agents to spatially distributed tasks with workloads and deadlines so as to maximize the total number of tasks completed over time. Nevertheless, these works neither address in depth the constraints of the agents in the proposed models nor specify how agents negotiate over them to reach agreements. Constraints on coalition sizes have been considered for coalition value calculation [8,9,11]. However, the semantics of these constraints has not been exploited at the same level as in this paper. [2,5] develop succinct and expressive representations for coalitional games. Such formalisms could be used to encode the constraints, but this is not the main concern of the constrained CF mechanism considered in this paper.

Preliminaries and Case Study
To illustrate the coalition formation mechanism we propose, let us consider a carpooling example in which some travellers want to move from one city to another and want to share their means of transportation. Each traveller gives his agent the goals to be achieved, for example "I want to go from NY to Boston", together with his constraints, such as departure time, duration of the travel, and unit price of a seat. To solve this problem, the agents have to deal with all the constraints of their associated travellers, and with the preferences over them, in order to enable the travellers to share transportation. Agents negotiate over the coalitions to form in order to decrease the unit price of a seat, increase the number of passengers, etc. They can step aside in favor of other agents if an agreement can be found. More formally, consider a set of agents $N = \{a_1, a_2, \ldots, a_n\}$, a set of actions $A = \{b_1, b_2, \ldots, b_m\}$ and a set of constraints $C_t = \{c_{t1}, c_{t2}, \ldots, c_{tk}\}$. The agents of $N$ need to execute the actions of $A$ while satisfying the constraints in $C_t$. The constraints are defined as intervals, for instance: departure time $D \in$ [10a.m., 12a.m.], travel duration in hours $T \in$ [1H, 2H] and price $P \in [20, 25]$. The agents' preferences over those they want to share a car with are represented by a preference relation, for instance $a_x \succ_i a_y$ (for agent $a_i$, $a_x$ is preferred to $a_y$). We consider a coalition $c$ as a nonempty subset of $N$ ($c \subseteq N$). We define $C$ as the set of all possible coalitions. For a coalition $c$ to be formed, each agent $a_i$ in $c$ should get a certain satisfaction. This satisfaction is defined by a utility function $u_i : C \to \mathbb{R}$. Note that a coalition is acceptable for agent $a_i$ if it is preferred over, or equivalent to, a reference coalition whose utility $u_i(\mathit{ref})$ corresponds to the minimal guaranteed gain of the agent during the negotiation.
A solution of the negotiation for each agent $a_i$ introduces a coalition structure, denoted $CS_i$, which is defined on $N$ with its associated utility $u_i(CS_i)$. $CS_i$ contains a set of coalitions $\{c_1, c_2, \ldots, c_q\}$ to be formed for the set of actions. The set of all coalition structures is denoted $S$.
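The definitions above can be sketched in code. The following is a minimal illustration and not the authors' implementation: the names `Interval`, `is_acceptable` and `u1`, as well as the choice of coalition size as a utility, are assumptions introduced only for this example.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

Coalition = FrozenSet[str]  # a nonempty subset of the agent set N


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] expressing one constraint (hypothetical type)."""
    lo: float
    hi: float

    def contains(self, value: float) -> bool:
        return self.lo <= value <= self.hi


def is_acceptable(utility: Callable[[Coalition], float],
                  coalition: Coalition,
                  reference_utility: float) -> bool:
    """A coalition is acceptable for an agent if it is nonempty and its
    utility is at least the reference utility u_i(ref)."""
    return len(coalition) > 0 and utility(coalition) >= reference_utility


# Price and duration constraints taken from the carpooling example.
price = Interval(20, 25)
duration = Interval(1, 2)


def u1(coalition: Coalition) -> float:
    """Assumed utility for agent a_1: more passengers, cheaper seats."""
    return float(len(coalition))


ok = is_acceptable(u1, frozenset({"a1", "a2"}), reference_utility=2.0)
```

Here `ok` is true because the two-agent coalition reaches the assumed reference utility of 2, whereas the singleton coalition containing only `a1` would be rejected.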

Coalition Formation Mechanism (CFM)
In order to satisfy the goals they have to achieve, the agents negotiate over the coalitions they want to form. The CFM therefore requires a step that analyzes the constraints exchanged by the agents, in order to guide the choice of coalitions, and a step that generates coalition structures from these constraints. Constraint analysis relies on constructing a graph of constraints, and coalition generation is based on mapping the constraints to possible coalitions in a coalition graph. The exploration of the search graph of coalitions toward better states is based on a Nested Monte-Carlo algorithm.

Constraint Graph
An effective technique for solving a coalition formation problem is heuristic search through abstract problem spaces. The first problem space can be represented by a directed connected graph, where nodes correspond to constraint sets and edges correspond to actions (cf. Fig. 1). The constraint graph may include many paths from the start to any node. Since the agents are self-interested, every agent constructs its own graph of constraints, based on its own constraints and on those revealed by other agents during the negotiation, in order to search among the constraints to deal with in the coalitions. Given a set of constraints that must be satisfied by an agent to execute a set of actions, and starting from the source node labeled $\{b: \emptyset, c_t: \emptyset\}$ (initially there are no constraints or actions associated with this source node), let us define a graph denoted $G(c_t, b)$ as follows. $G(c_t, b)$ is a directed connected graph containing all possible nodes of constraints represented by intervals, labeled $\{X_1, \ldots, X_k\}$, for each action $b_i$, $1 \le i \le m$, that has to be executed by the agent. Each node has a utility labeled $u$, and directed edges from this node are labeled $\{b_j, \ldots, b_k\}$, where $1 \le j, k \le m$.
A constraint graph gathers the most preferred intervals of constraints in its nodes. At the root node, no action or constraint is added. Each node generates a finite set of child nodes, which correspond to the accepted sets of constraints; the first node of the graph is an outgoing node and the last nodes are incoming nodes. This constraint graph is built following a preference rate on the intervals of constraints.
Let us consider two agents: $a_i$, which has its own interval $X$, and $a_j$, from which $a_i$ receives an interval $Y$. The agent $a_i$ wants to create a new interval $Z$ that meets its constraints and those of $a_j$, $\{Z \models a_i\}$, by merging its interval with the one received from $a_j$. We adopt the convention of denoting the left and right endpoints of an interval $X$ by $\underline{X}$ and $\overline{X}$, respectively [6]. First, $a_i$ tests whether $X \subseteq Y$: if $\underline{Y} \le \underline{X}$ and $\overline{X} \le \overline{Y}$, then $X \subseteq Y$ and $Z = X$. Otherwise, the agent tests whether $X \cap Y$ is empty and calculates the new interval $Z$. If $X \cap Y = \emptyset$, there are no points in common with $a_j$. Otherwise, $Z = [\max\{\underline{X}, \underline{Y}\}, \min\{\overline{X}, \overline{Y}\}]$, and the agent tests whether this interval complies with its actions. If $a_i$ does not choose this interval, it calculates $Z = X \cup Y$, the union of $X$ and $Y$, and tests whether it complies with its actions. For more details about operations over intervals, see [6]. Based on this graph, constraint analysis consists, for an agent, of comparing and grouping its constraints and those received from others. A natural constraint graph analysis involves constructing and linking optimal nodes. Constraints are gathered, based on their relations, into sets represented in the nodes of this graph. Each level of the graph of constraints refers to an action to be performed by a coalition. The advantage of the suggested method is that it directs the search for coalition solutions towards primary constraints, i.e., the important constraints to satisfy, thus reducing search complexity. To move from one node of this graph to another, an action is added to the graph. The utility of a move, which labels the corresponding edge in the search space, is the utility of the action when it is added and performed by the coalition. A solution path represents a particular succession order of the added actions, and the width of that order is the sum of the edge utilities on the solution path.
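The interval-merging test described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: `Interval`, `merge` and the `complies` callback (standing in for the test that an interval complies with the agent's actions) are hypothetical names, and the union is only attempted when the two intervals overlap, so that it is itself an interval.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Interval:
    lo: float  # left endpoint
    hi: float  # right endpoint


def merge(x: Interval, y: Interval,
          complies: Callable[[Interval], bool] = lambda z: True
          ) -> Optional[Interval]:
    """Build Z from a_i's own interval x and the interval y received from a_j."""
    # Case 1: x is contained in y, so x already satisfies both agents.
    if y.lo <= x.lo and x.hi <= y.hi:
        return x
    # Case 2: empty intersection, no points in common with a_j.
    lo, hi = max(x.lo, y.lo), min(x.hi, y.hi)
    if lo > hi:
        return None
    # Case 3: try the intersection first.
    meet = Interval(lo, hi)
    if complies(meet):
        return meet
    # Case 4: fall back to the union (an interval, since x and y overlap).
    union = Interval(min(x.lo, y.lo), max(x.hi, y.hi))
    return union if complies(union) else None


# Departure windows in hours on a 24h clock (illustrative values).
z = merge(Interval(10, 12), Interval(11, 14))
```

With these illustrative values, `z` is the intersection `Interval(11, 14)` clipped to `x`, that is `Interval(lo=11, hi=12)`. Returning `None` for disjoint intervals is a design choice of this sketch; the text only states that no common points exist in that case.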
Let us consider how agent $a_1$ constructs the constraint graph in our previous example. First, assume that agent $a_1$ started a negotiation with agents $a_2$ to $a_5$, in which each of these five agents revealed some of its constraints. The actions that have to be executed are $b_1$, $b_2$, $b_3$, which correspond respectively to: go from NY to Amherst, find a hotel room in Amherst, and go from Amherst to Boston. Agent $a_1$ identifies its constraints for each action, compares its own constraints with those received from these agents, and creates new intervals of constraints $X_{k c_t}$. Each node is labeled with its associated utility.
Let us consider again agent $a_1$, which received intervals of constraints from agent $a_2$ concerning an action. For each action, $a_1$ generates the different possible intervals of constraints that satisfy that action, here $b_1$. We observe that the constraints in each child node are created by taking into account the end of execution of the preceding action. So, to generate intervals of constraints that satisfy the action $b_{i+1}$, the agent takes into account the end time of $b_i$. This allows the agent to manage the relations between the actions that have to be executed. In this example, in the second level of the graph, agent $a_1$ identified these intervals for the action $b_2$: $D \in$ [01p.m., 02p.m.], $T \in$ [2H, 5H] and $P \in [20, 25]$. The beginning of $b_2$ is in [01p.m., 02p.m.] because $b_1$ ends at the latest at 01p.m. The dashed arcs show that nodes can share the same child nodes; the red and bold ones show the most preferred path from the root node to the last one, which results from the Monte-Carlo exploration (detailed below). Every agent $a_i$ that has to negotiate to execute an action $b_i$ while satisfying its constraints $c_t$ chooses intervals of constraints $X_{k c_t}(b_i) \models a_i$. It then creates child nodes for the feasible intervals that satisfy $b_i$. Each created node can be split, under appropriate restrictions, into other child nodes. The agent $a_i$ starts with a node, labeled $X_{k c_t}$, for the action $b_i$. For each action $b_{i+1}$ that must be executed after $b_i$ and needs negotiation, $a_i$ creates the new intervals for $b_{i+1}$, $(X_{1 c_t}, \ldots, X_{k c_t})$, and splits the node $(b_i, X_{k c_t})$ into the child nodes $b_{i+1}: (X_{1 c_t}, \ldots, X_{k c_t})$. The agent $a_i$ repeats this procedure until no action needs negotiation.
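The dependency between consecutive actions can be sketched as a small computation: the departure window of $b_{i+1}$ cannot start before the latest end of $b_i$. The function names and the input values below are assumptions chosen only so that the result matches the window [01p.m., 02p.m.] quoted in the text; the intervals actually negotiated for $b_1$ are not given in the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    lo: float  # left endpoint, hours on a 24h clock
    hi: float  # right endpoint


def latest_end(departure: Interval, duration: Interval) -> float:
    """Latest time at which an action can finish."""
    return departure.hi + duration.hi


def next_departure(prev_departure: Interval, prev_duration: Interval,
                   requested: Interval) -> Optional[Interval]:
    """Clip the requested departure window of b_{i+1} so that it does not
    start before the latest end of b_i; None if nothing remains."""
    lo = max(requested.lo, latest_end(prev_departure, prev_duration))
    return Interval(lo, requested.hi) if lo <= requested.hi else None


# Hypothetical negotiated intervals for b_1 and requested window for b_2.
b1_departure = Interval(10.0, 11.0)
b1_duration = Interval(1.0, 2.0)
b2_window = next_departure(b1_departure, b1_duration, Interval(12.0, 14.0))
```

Under these assumed inputs, `b2_window` spans 13:00 to 14:00, and a requested window ending before 13:00 would yield `None`, meaning that the corresponding child node is not created.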
A notable detail of the construction of the constraint search space is that a solution is measured by its maximum path utility. We use an additive utility function, in which a path is evaluated by summing its edge utilities.
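The additive evaluation of a path can be sketched on a toy graph. The graph encoding (a dictionary from node to a list of child and edge-utility pairs), the node names and the utilities are all assumptions made for illustration. The exhaustive `best_path` is only a reference point for small graphs, not the search method used in the paper.

```python
from typing import Dict, List, Tuple

# node -> [(child, utility of the edge leading to that child)]
Graph = Dict[str, List[Tuple[str, float]]]


def path_utility(graph: Graph, path: List[str]) -> float:
    """Additive utility: sum of the edge utilities along the path."""
    total = 0.0
    for parent, child in zip(path, path[1:]):
        total += dict(graph[parent])[child]
    return total


def best_path(graph: Graph, node: str) -> Tuple[float, List[str]]:
    """Exhaustively find the maximum-utility path from node to a leaf."""
    children = graph.get(node, [])
    if not children:
        return 0.0, [node]
    best_total, best_nodes = float("-inf"), [node]
    for child, utility in children:
        sub_total, sub_nodes = best_path(graph, child)
        if utility + sub_total > best_total:
            best_total, best_nodes = utility + sub_total, [node] + sub_nodes
    return best_total, best_nodes


# Toy constraint graph in which n1 and n2 share the child n3.
toy: Graph = {
    "root": [("n1", 3.0), ("n2", 1.0)],
    "n1": [("n3", 1.0)],
    "n2": [("n3", 5.0)],
}
total, nodes = best_path(toy, "root")
```

On this toy graph the path through `n2` has utility 6 and beats the path through `n1`, whose utility is 4, even though the first edge to `n1` looks more attractive locally.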

Constraint Graph Exploration with Nested Monte-Carlo Algorithm (NMC)
To optimize the search time for the new coalitions to propose or to accept, agents use the Nested Monte-Carlo algorithm. Nested Monte-Carlo Search is used for problems that do not have good heuristics. It was shown that memorizing the best sequence improves the mean result of the search. Experiments on different games gave very good results: finding a new world record of 80 moves at Morpion Solitaire, improving on previous algorithms at SameGame, and being more than 200,000 times faster than depth-first search at 16x16 Sudoku modeled as a Constraint Satisfaction Problem [1]. In the first step of the mechanism, the NMC explores each level of the graph of constraints and stores the best path of constraints that satisfies the agent $a_i$. The idea of NMC is to use lower-level sequences of simulations in order to decide the utility that an agent gets from a path at the current sequence. This step is necessary because agents do not have a priori knowledge about the utility functions of the others. When all simulations of the underlying sequences have been performed, the agent's utility is memorized in the best sequence, which makes it possible to obtain the best path. The solution improves monotonically, since our algorithm keeps track of the best proposal of solution found so far. Nested Monte-Carlo search combines nested calls with randomness in the playouts and memorization of the best sequence of moves. In nested rollouts, the rollouts are based on a heuristic. This implies that nested rollouts always improve on rollouts and on simply following the heuristic. When the base level does not use a heuristic but random moves, it is possible that a nested search gives worse results than a lower-level search. It is therefore useful to memorize the best sequence found so far, in order to follow it when the randomized searches give worse results than the best sequence.
The basic sample function (cf. Algorithm 1) simply explores the graph randomly from a given position; agents use the function play(position, move), which plays the move in the position and returns the resulting position. In the nested search, if none of the moves improves on the best sequence, the move of the best sequence is played; otherwise, the best sequence is updated with the newly found sequence and the best move is played.
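The sample and nested functions described above can be sketched as follows. This is a minimal illustration of Nested Monte-Carlo search with memorization of the best sequence, written for a toy graph. It is not the paper's Algorithm 1, which is not reproduced in the text. The graph encoding, the names `sample`, `nested` and `TOY`, and the assumption of at most one edge between two nodes are choices made for this example; following an edge plays the role of play(position, move).

```python
import random
from typing import Dict, List, Tuple

# node -> [(child, utility of the edge leading to that child)]
Graph = Dict[str, List[Tuple[str, float]]]


def sample(graph: Graph, node: str, rng: random.Random) -> Tuple[float, List[str]]:
    """Random playout from node down to a leaf: (utility, moves played)."""
    total, moves = 0.0, []
    while graph.get(node):
        child, utility = rng.choice(graph[node])
        total += utility
        moves.append(child)
        node = child
    return total, moves


def nested(graph: Graph, node: str, level: int,
           rng: random.Random) -> Tuple[float, List[str]]:
    """Nested search: evaluate each move with a lower-level search,
    memorize the best sequence found so far, and follow it."""
    best_total, best_moves = float("-inf"), []
    gained, played = 0.0, []
    while graph.get(node):
        for child, utility in graph[node]:
            if level == 1:
                sub_total, sub_moves = sample(graph, child, rng)
            else:
                sub_total, sub_moves = nested(graph, child, level - 1, rng)
            total = gained + utility + sub_total
            if total > best_total:  # a move improves on the best sequence
                best_total = total
                best_moves = played + [child] + sub_moves
        # Play the next move of the memorized best sequence.
        move = best_moves[len(played)]
        gained += dict(graph[node])[move]
        played.append(move)
        node = move
    return gained, played


TOY: Graph = {
    "root": [("a", 1.0), ("b", 2.0)],
    "a": [("c", 10.0), ("d", 0.0)],
    "b": [("e", 1.0), ("f", 1.0)],
}
utility, path = nested(TOY, "root", level=2, rng=random.Random(0))
```

On this two-level toy graph a level-2 search evaluates every path, so it returns the path through `a` and `c` with utility 11 whatever the random seed. A level-1 search may instead settle on the path through `b` when its random playout from `a` happens to pick `d`, which illustrates why deeper nesting and memorization of the best sequence help.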

Conclusion
This paper has introduced a new coalition formation mechanism enriched with several principles to deal with the constraints of the agents and a Nested Monte-Carlo based search algorithm. We have detailed how the constraints are modeled as a graph and how this graph is explored using the Nested Monte-Carlo search. From the graph of constraints, each agent gets its most preferred path of constraints and constructs a coalition graph that is used to generate the coalitions to negotiate.