A Metric Learning Approach to Graph Edit Costs for Regression

Abstract. Graph edit distance (GED) is a widely used dissimilarity measure between graphs. It is a natural metric for comparing graphs, respects the nature of the underlying space, and provides interpretability for operations on graphs. As a key ingredient of the GED, the choice of edit cost functions has a dramatic effect on the GED and therefore on classification or regression performance. In this paper, in the spirit of metric learning, we propose a strategy to optimize edit costs according to a particular prediction task, which avoids the use of predefined costs. An alternated iterative procedure is proposed to preserve the distances in both underlying spaces, in which an update of the edit costs obtained by solving a constrained linear problem and a re-computation of the optimal edit paths according to the newly computed costs are performed alternately. Experiments show that regression using the optimized costs yields better performance than random or expert costs.


Introduction
Graphs provide a flexible representation framework to encode relationships between elements. In addition, graphs come with a powerful underlying theory. However, the graph space cannot be endowed with the mathematical tools and properties associated with Euclidean spaces. This issue prevents the use of classical machine learning methods, which are mainly designed to operate on vector representations. To learn models on graphs, several approaches have been designed to work around this limitation; among these, we can cite graph embedding strategies [17], graph kernels [3,27] and, more recently, graph neural networks [8]. Despite their state-of-the-art performance, they seldom operate directly in the graph space, hence reducing the interpretability of the underlying operations.
To overcome these issues, one needs to preserve the properties of the graph space. For this purpose, a dissimilarity measure must be defined in the graph space; this constitutes the minimal requirement to implement simple machine learning algorithms such as the k-nearest neighbors. The most used dissimilarity measure between graphs is the graph edit distance (GED) [10,26]. The GED of two graphs G_1 and G_2 can be seen as the minimal amount of distortion required to transform G_1 into G_2. This distortion is encoded by a set of edit operations whose sequence constitutes an edit path. These edit operations include node and edge substitutions, removals, and insertions. Depending on the context, each edit operation e included in an edit path γ is associated with a non-negative cost c(e). The sum of the costs of all edit operations within the edit path defines the cost A(γ) associated with this edit path. The minimal cost among all edit paths Γ(G_1, G_2) defines the GED between G_1 and G_2, namely

    GED(G_1, G_2) = min_{γ ∈ Γ(G_1, G_2)} A(γ).

Evaluating the GED is computationally costly and cannot be done in practice for graphs having more than about 20 nodes. To avoid this computational burden, strategies to approximate the GED in a limited computational time have been proposed [1], with acceptable classification or regression performance. An essential ingredient of the GED is the underlying edit cost function c(e), which quantifies the distortion carried by the edit operation e. The values of the edit costs for each edit operation have a major impact on the computation of the GED and its performance. Thus, the edit cost function may differ depending on the data encoded by the graph and the property to predict. Generally, the costs are fixed a priori by a domain expert and are provided with the datasets.
However, these predefined costs are not optimal for every prediction task, in the same spirit as the no-free-lunch theorems for machine learning and statistical inference. In addition, these costs may have a great influence on both the prediction performance and the computational time required to compute the graph edit distance. In [9], the authors show that a particular choice of edit costs may reduce the computation of the graph edit distance to well-known graph problems such as (sub)graph isomorphism or finding the maximum common subgraph of a pair of graphs. This again shows the importance of the underlying cost function when computing a graph edit distance.
In this paper, we propose a simple strategy to optimize edit costs according to a particular prediction task, and thus avoid the use of predefined costs. The idea is to align the metric in the graph space (namely, the GED) with the metric in the prediction space. While this idea has been widely used in machine learning (e.g., with the so-called kernel-target alignment [15]), this is the first time that such a line of attack is investigated to estimate optimal edit costs. With this distance-preserving principle, we provide a simple linear optimization procedure to optimize a set of constant edit costs. The edit costs resulting from the optimization can then be analyzed to understand how the graph space is structured. The relevance of the proposed method is demonstrated on two regression tasks, showing that the optimized costs lead to a lower prediction error.
The remainder of the paper is organized as follows. Section 2 presents related works that aim to compute costs adapted to a particular task. Section 3 presents the problem formulation and describes the proposed optimization method. Then, Section 4 presents results from the conducted experiments. Finally, we conclude and open perspectives on this work.

Related Works
As stated in the introduction, the choice of edit costs has a major impact on the computation of graph edit distance, and thus on the performance associated with the prediction task.
The first approach to design these costs is to set them manually, based on knowledge of a given dataset/task (when such knowledge is available). This strategy leads, for instance, to the classical edit cost functions associated with the IAM datasets [24]. However, it is interesting to challenge these predefined settings and examine whether tuned costs can improve the prediction performance.
In order to fit a particular targeted property to predict, tuning the edit costs, and thus the GED, can be seen as a subproblem of metric learning. Metric learning consists in learning a dissimilarity (or similarity) measure given a training set composed of data instances and associated targeted properties. In classical metric learning, where each data instance is encoded by a real-valued vector, the problem consists in learning a dissimilarity measure that decreases (resp. increases) when the vectors have similar (resp. different) targeted properties. Many metric learning works focus on Euclidean data, while only a few address this problem for structured data [5]. A complete review for general structured data representations is given in [22]. In the following, we focus on existing studies that learn edit costs for the graph edit distance.
A trivial approach to tune the edit costs is a grid search over a predefined range. However, the complexity required to compute the graph edit distance and the number of different edit costs forbid such an approach.
The string edit distance constitutes a particular case of the graph edit distance with a lower complexity, where graphs are restricted to be linear and sequential. In [25], the authors propose to learn edit costs using a stochastic approach. This method shows a performance improvement, hence demonstrating the interest of tuning edit costs; it is, however, restricted to strings.
Another strategy is based on a probabilistic approach [19-21]. By providing a probabilistic formulation for the common edition of two graphs, an Expectation-Maximization algorithm is used to derive weights applied to each edit operation. The tuning is then evaluated in an unsupervised manner. In [20], the strategy consists in modifying the label space associated with nodes and edges such that edit operations occurring more often are associated with lower edit costs. Conversely, higher costs are associated with edit operations occurring less often. The learning process was validated on two datasets. However, this approach is computationally too expensive when dealing with general graphs [4].

L. Jia et al.
In [4], the authors propose an interesting way to evaluate whether a distance is a "good" one. The criterion is based on the following concept: a sufficiently large proportion of examples should be, on average, 2γ more similar to reasonable examples of the same class than to reasonable examples of the opposite class, where a τ proportion of examples must be reasonable. This principle is then used to define an objective function to optimize. The cost matrix minimizing this objective function is then used to compute edit distances. However, this approach has only been adapted to strings and trees, not to general graphs.
Another set of methods addressing the problem of learning edit costs for the GED is proposed in [13,14]. These methods optimize edit costs to maximize a ground-truth mapping between the nodes of graphs. This framework thus requires a ground-truth mapping, which is not available for many datasets, e.g., in chemoinformatics.
Proposed Method

Problem formulation
In this section, we propose an optimization procedure to learn edit costs in the context of regression tasks. Consider a dataset G of N graphs, where each graph G_k = (V_k, E_k), for k = 1, 2, …, N, is given by its set of nodes V_k, labeled by a function f_v : V_k → L_v, and its set of edges E_k, with e_ij = (v_i, v_j) ∈ E_k iff an edge connects nodes v_i and v_j in G_k.
The graph edit distance between two graphs is defined as the minimal cost associated with an optimal edit path. Given two graphs G_1 and G_2, an edit path between them is a sequence of edit operations transforming G_1 into G_2. An edit operation e can be a node substitution e = (v_i → v_j), a deletion e = (v_i → ε), or an insertion e = (ε → v_j). Similarly, for edges, we have (e_ij → e_ab), (e_ij → ε), and (ε → e_ab). Each edit operation is associated with a cost characterizing the distortion it induces on the graph. These costs can be encoded by a cost function c that associates a positive real value with each edit operation, depending on the elements being transformed.
In this paper, we restrict ourselves to constant cost functions, so each edit operation can be associated with a constant value. Let c_ns, c_ni, c_nd, c_es, c_ei, c_ed ∈ R_+ be the cost values associated with node substitution, insertion, and deletion, and with edge substitution, insertion, and deletion, respectively.
As shown in [7], any edit path between two graphs G_1 and G_2 can be encoded by two mapping functions: ϕ : V_1 → V_2 ∪ {ε}, which maps each node of G_1 either to a node of G_2 (substitution) or to ε (deletion), and its counterpart ϕ^{-1} : V_2 → V_1 ∪ {ε}, where ϕ^{-1}(v) = ε encodes the insertion of node v. Given a mapping and considering constant cost functions, the cost associated with the node operations of an edit path represented by ϕ and ϕ^{-1} is given by:

    C_V(ϕ) = Σ_{v ∈ V_1, ϕ(v) ≠ ε} c_ns + Σ_{v ∈ V_1, ϕ(v) = ε} c_nd + Σ_{v ∈ V_2, ϕ^{-1}(v) = ε} c_ni.    (2)

The cost associated with edge operations is defined analogously:

    C_E(ϕ) = Σ_{substituted edges} c_es + Σ_{deleted edges} c_ed + Σ_{inserted edges} c_ei.    (3)

The final cost is given by:

    C(ϕ) = C_V(ϕ) + C_E(ϕ).    (4)

Let #ns be the number of node substitutions, i.e., the cardinality of the subset of V_1 mapped onto V_2. This number is given by the number of terms of the first sum in Eq. 2, i.e., #ns = |{v ∈ V_1 : ϕ(v) ≠ ε}|. Defining #nd, #ni, #es, #ed and #ei similarly, let x ∈ N^6 encode the number of each edit operation as x = [#ns, #nd, #ni, #es, #ed, #ei]^T. Note that these values depend on both graphs being compared and on a given mapping between their nodes. Similarly, we define a vector representation of the costs associated with each edit operation by c = [c_ns, c_nd, c_ni, c_es, c_ed, c_ei]^T ∈ R_+^6. Given these representations, the cost associated with an edit path, as defined by Eq. 4, can be rewritten as:

    C(ϕ) = x^T c.    (5)

Therefore, the graph edit distance between two graphs is defined as:

    GED(G_1, G_2) = min_ϕ x^T c.    (6)
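Under the constant-cost model, the edit path cost of Eq. 5 reduces to a dot product between the operation-count vector x and the cost vector c. A minimal sketch (the counts and cost values below are made-up illustrative numbers, not values from the paper):

```python
import numpy as np

# Hypothetical operation counts of one edit path,
# ordered as [#ns, #nd, #ni, #es, #ed, #ei].
x = np.array([3, 1, 0, 2, 2, 1])

# Constant edit costs, ordered as [c_ns, c_nd, c_ni, c_es, c_ed, c_ei].
c = np.array([1.0, 2.0, 2.0, 0.5, 1.0, 1.0])

# Eq. 5: the edit path cost is x^T c.
cost = float(x @ c)  # 3*1 + 1*2 + 0*2 + 2*0.5 + 2*1 + 1*1 = 9.0
```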

Learning the edit costs
Consider that each graph G_k ∈ G is associated with a particular targeted property y_k ∈ Y, namely the target in regression tasks (e.g., Y ⊆ R for real-valued regression). Furthermore, a distance d_Y : Y × Y → R_+ is defined on this targeted property, such as the Euclidean distance d_Y(y_i, y_j) = ‖y_i − y_j‖_2 when Y is a vector space.
The main idea behind the proposed method is that the best metric in the graph space is the one best aligned with the target distances (i.e., d_Y). With this distance-preserving principle, we seek to learn the edit cost vector c by fitting the distances between graphs to the distances between their targeted properties. Ideally, the GED between any two graphs G_i and G_j should match the distance between their targeted properties. Considering the set of N available graphs G_1, …, G_N and their corresponding targets y_1, …, y_N, we seek to have

    GED(G_i, G_j) ≈ d_Y(y_i, y_j),   for all i, j = 1, 2, …, N.
Let ω : G × G × R_+^6 → N^6 be the function that computes an optimal edit path between G_i and G_j according to the cost vector c and returns the vector x ∈ N^6 of the numbers of edit operations associated with this optimal edit path, namely x = ω(G_i, G_j, c). This function can be implemented by any method computing an exact or sub-optimal graph edit distance [1,6].
For any pair of graphs (G_i, G_j), let x_{i,j} be the vector encoding the number of each edit operation. Let X ∈ N^{N²×6} be the matrix stacking the numbers of edit operations for each pair of graphs, namely its (iN + j)-th row is x_{i,j}^T. Then Xc is the N² × 1 vector of the edit distances between all pairs of graphs computed according to c and X. Let d ∈ R_+^{N²} be the vector of distances between targeted properties according to d_Y, with d(iN + j) = d_Y(y_i, y_j). The optimization problem can then be written as:

    min_{c ∈ R_+^6} L(Xc, d),

where L denotes a loss function. Besides the constraint on c that avoids negative costs, one can also add constraints to satisfy the triangle inequality, or to ensure that a deletion cost is equal to the corresponding insertion cost [23].
In the case of a regression problem, L can be defined as the sum of squared differences between the computed graph edit distances and the dissimilarities of the targeted property. The final optimization problem is therefore:

    min_{c ∈ R_+^6} ‖Xc − d‖_2².

Estimating c by solving this constrained optimization problem linearly fits the graph edit distances to a particular targeted property according to the edit paths initially given by ω. However, changing the edit costs may change the optimal edit paths, and thus their description in terms of numbers of edit operations. There is thus an interdependence between the function ω, which computes an optimal edit path according to c, and the objective function, which optimizes c according to the edit paths encoded in X. To resolve this interdependence, we propose an alternated optimization strategy, summarized in Algorithm 1, where Ω(G, c) denotes the computation of ω(G_i, G_j, c) for all i, j ∈ 1…N.

Algorithm 1 Main algorithm to optimize costs
1: initialize c
2: X ← Ω(G, c)
3: while stopping criterion not met do
4:    c ← argmin_{c ∈ R_+^6} ‖Xc − d‖_2²
5:    X ← Ω(G, c)
6: end while

The two main steps of the algorithm are described next:
- Estimate c for fixed X (line 4): this constrained linear problem can be solved using off-the-shelf solvers, such as cvxpy [16] and scipy [28]; it can also be viewed as a non-negative least squares problem [18]. For a given set of edit operations between each pair of graphs, this step linearly optimizes the constant costs so that the difference between the graph edit distances and the distances between targets is minimized.
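The cost-update step (line 4) can be sketched with SciPy's non-negative least squares solver. Here X and d are synthetic stand-ins for the operation-count matrix and the target-distance vector; in the actual method they would come from Ω(G, c) and d_Y:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Synthetic stand-in for X: one row of edit-operation counts per graph pair.
X = rng.integers(0, 10, size=(100, 6)).astype(float)

# Build d from a known cost vector so the fit can be checked.
c_true = np.array([1.0, 2.0, 2.0, 0.5, 1.0, 1.0])
d = X @ c_true

# Non-negative least squares: min ||Xc - d||_2^2  subject to  c >= 0.
c_hat, residual = nnls(X, d)
```

Because d is generated from c_true here, nnls recovers it exactly; with real target distances, the residual measures how well constant costs can align the GED with d_Y.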
- Estimate X for fixed c (line 5): the modification performed on the costs in the previous step may change the optimal edit paths. To address this point, the optimization of the costs is followed by a re-computation of the optimal edit paths according to the newly computed cost vector c. This step can be achieved by any method computing the graph edit distance; for the sake of computational time, one can choose an approximate version of the GED [6,7].
This alternated optimization is repeated to compute both edit costs and edit operations. Since we have no theoretical proof of convergence for this optimization scheme, we limit the number of iterations to 5 in our implementation.
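The alternated scheme above can be sketched compactly, with the 5-iteration cap. `compute_operation_counts` is a hypothetical stand-in for Ω(G, c): a real implementation would call a (possibly approximate) GED solver, while the toy check below freezes it to a fixed matrix:

```python
import numpy as np
from scipy.optimize import nnls

def fit_edit_costs(d, compute_operation_counts, c0, n_iter=5):
    """Alternate between recomputing edit paths (X) and refitting costs (c)."""
    c = np.asarray(c0, dtype=float)
    for _ in range(n_iter):                  # no convergence proof: cap iterations
        X = compute_operation_counts(c)      # line 5: X <- Omega(G, c)
        c, _ = nnls(X, d)                    # line 4: constrained linear fit
    return c

# Toy check with a frozen operation-count matrix.
rng = np.random.default_rng(1)
X_fixed = rng.integers(0, 5, size=(50, 6)).astype(float)
c_true = np.array([0.9, 1.7, 1.7, 0.4, 0.8, 0.8])
d = X_fixed @ c_true
c_est = fit_edit_costs(d, lambda c: X_fixed, np.ones(6))
```

With a frozen X the loop converges after the first pass; in the real setting, each pass may change the optimal edit paths and hence the fitted costs.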

Experiments
We conducted experiments on two well-known chemoinformatics datasets, both composed of molecules and their boiling points. The first dataset is composed of 150 alkanes [11]. An alkane is an acyclic molecule composed solely of carbon and hydrogen atoms. A common representation of such data implicitly encodes hydrogen atoms through the valency of the carbon atoms, which allows alkanes to be represented as acyclic unlabeled graphs. The second dataset is composed of 185 acyclic molecules [12]. In contrast with the previous dataset, these molecules contain several hetero atoms and are thus represented as acyclic labeled graphs.
To evaluate the predictive power of different edit cost settings, we used a k-nearest-neighbors regression model [2], where k is the number of neighbors considered to predict a property. The performance is estimated on ten different random splits. For each split, a test set containing 10% of the graphs in the dataset is randomly selected and used to measure the prediction performance. The remaining 90% are used to optimize the edit costs and the value of k, where k is optimized through a 5-fold cross-validation (CV) procedure over the training set. The proposed optimization procedure is compared to two other edit cost settings: a random set of edit costs and the predefined setting given in [1], referred to as expert costs. The tables in Fig. 1 show the average root mean squared errors (RMSE) obtained for each cost setting over the 10 splits, estimated on the training set and on the test set. The ± sign gives the 95% confidence interval computed over the 10 repetitions. The figures show a different representation of the same results, with error bars modeling the 95% confidence interval. As expected, a clear and significant gain in accuracy is obtained when using fitted costs on the two datasets. These promising results confirm the hypothesis that task-adapted edit costs help the graph edit distance better capture the targeted properties associated with graphs, and thus improve the prediction accuracy while still operating in the graph space.
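The k-nearest-neighbors prediction itself only needs a precomputed distance matrix, so it applies directly to GEDs. A minimal NumPy sketch (the distance matrix and targets below are toy values, not the paper's data):

```python
import numpy as np

def knn_predict(D_test_train, y_train, k=3):
    """k-NN regression from precomputed distances: D_test_train[i, j]
    is the (graph edit) distance between test item i and train item j."""
    idx = np.argsort(D_test_train, axis=1)[:, :k]  # k nearest training items
    return y_train[idx].mean(axis=1)               # average their targets

# Toy usage.
y_train = np.array([1.0, 2.0, 3.0, 4.0])
D = np.array([[0.1, 0.2, 5.0, 6.0],
              [4.0, 3.0, 0.3, 0.2]])
pred = knn_predict(D, y_train, k=2)  # -> [1.5, 3.5]
```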
The fitted values of the edit costs are summarized in Table 1. From these results, we observe that insertion and deletion costs are almost identical, showing the symmetry of these operations. We also observe that deletion and insertion costs are larger than substitution costs, which indicates that the number of atoms matters more than the atom type itself. This is coherent with chemical theory [12]. Finally, we note that the costs associated with nodes are higher than those associated with edges.

Conclusion and future work
In this paper, we introduced a new principle to define optimal edit costs of a GED for a given regression task. Based on this principle, we defined the optimization problem of fitting the edit costs to a particular metric, measured for instance on a targeted property to predict. An alternated optimization strategy was proposed to solve this problem. The experiments conducted on two well-known datasets showed that the optimization process leads to a GED with better predictive power than other cost settings. There are still several challenges to address in future work. First, a clear and complete comparison with the other methods cited in the introduction and related works will be established. Second, we will examine criteria other than the distance-preserving one, such as conformal maps [?]. Third, from a theoretical point of view, we are interested in establishing a convergence proof for our alternated optimization strategy, and in extending such proofs to approximate computations of the graph edit distance. Fourth, this scheme will be extended to classification problems and non-constant costs, to be applicable in more application domains. Considering non-constant costs will require optimizing parametric functions rather than scalar values, hence complicating the procedure.

Table 1: Average and standard deviation of fitted edit cost values