This page collects results we have obtained in recent years concerning the analysis of networks or of processes on networks
We present analytical results for the distribution of the number of cycles in directed and undirected random 2-regular graphs (2-RRGs) consisting of $N$ nodes. In directed 2-RRGs each node has one inbound link and one outbound link, while in undirected 2-RRGs each node has two undirected links. Since all the nodes are of degree $k = 2$, the resulting networks consist of cycles. These cycles exhibit a broad spectrum of lengths, where the average length of the shortest cycle in a random network instance scales with $\ln N$, while the length of the longest cycle scales with $N$. The number of cycles varies between different network instances in the ensemble, where the mean number of cycles $\langle S \rangle$ scales with $\ln N$. Here we present exact analytical results for the distribution $P_N(S = s)$ of the number of cycles $s$ in ensembles of directed and undirected 2-RRGs, expressed in terms of the Stirling numbers of the first kind. In both cases the distributions converge to a Poisson distribution in the large $N$ limit. The moments and cumulants of $P_N(S = s)$ are also calculated. The statistical properties of directed 2-RRGs are equivalent to the combinatorics of cycles in random permutations of N objects. In this context our results recover and extend known results. In contrast, the statistical properties of cycles in undirected 2-RRGs have not been studied before.
Conjugated polymers are employed in a variety of application areas due to their bright fluorescence and strong biocompatability. However, understanding the struc- ture of amorphous conjugated polymers on the nanoscale is extremely challenging compared to their related crystalline phases. Using a bespoke classical force field, we study amorphous poly(9,9-di-n-octylfluorene-alt-benzothiadiazole) (F8BT) with molec- ular dynamics simulations to investigate the role that its nanoscale structure plays in controlling its emergent all-important optical properties. Notably, we show that a giant percolating cluster exists within amorphous F8BT, which has ramifications in under- standing the nature of interchain species that drive the quantum yield reduction and bathochromic shift observed in conjugated polymer-based devices and nanostructures. We also show that distinct conformations can be unravelled from within the disordered structure of amorphous F8BT using a two-stage applied machine learning protocol, highlighting a link between molecular conformation and ring stacking propensity. This work provides new predictive understanding by which to enhance the optical proper- ties of next-generation conjugated polymer-based devices and materials by rational, simulation-led design principles. Supplementary Information
The distribution of shortest path lengths (DSPL) of random networks provides useful information on their large scale structure. In the special case of random regular graphs (RRGs), which consist of $N$ nodes of degree $c \ge 3$, the DSPL, denoted by $P(L=\ell)$, follows a discrete Gompertz distribution. Using the discrete Laplace transform and the Euler-Maclaurin formula we derive a closed-form expression for the moment generating function of the DSPL of RRGs. From the moment generating function we obtain closed-form expressions for the mean and variance of the DSPL. More specifically, we find that the mean distance between pairs of distinct nodes is given by $\langle L \rangle = \frac{\ln N}{\ln (c-1)} + \frac{1}{2} - \frac{ \ln c - \ln (c-2) +\gamma}{\ln (c-1)} + \mathcal{O} \left( \frac{\ln N}{N} \right)$, where $\gamma$ is the Euler-Mascheroni constant. While the leading term is known, this result includes a novel correction term, which yields very good agreement with the results obtained from direct numerical evaluation of $\langle L \rangle$ via the tail-sum formula and with the results obtained from computer simulations. However, it does not account for an oscillatory behavior of $\langle L \rangle$ as a function of $c$, which is negligible in the sparse-network limit but detectable in the dense-network limit. We also derive a closed-form expression for the variance ${\rm Var}(L)$ of the DSPL. This expression captures the overall dependence of the variance on the degree $c$ but does not account for the oscillations. The oscillations of the mean and variance are analyzed and discussed. The results for the mean and variance of the DSPL are compared to the corresponding results obtained in other types of random networks. The relation to the diameter is also discussed.
Dynamic processes of interacting units on a network are out of equilibrium in general. In the case of a directed tree, the dynamic cavity method provides an efficient tool that characterises the dynamic trajectory of the process for the linear threshold model. However, because of the computational complexity of the method, the analysis has been limited to systems where the largest number of neighbours is small. We devise an efficient implementation of the dynamic cavity method which substantially reduces the computational complexity of the method for systems with discrete couplings. Our approach opens up the possibility to investigate the dynamic properties of networks with fat-tailed degree distribution. We exploit this new implementation to study properties of the non-equilibrium steady-state. We extend the dynamical cavity approach to calculate the pairwise correlations induced by different motifs in the network. Our results suggest that just two basic motifs of the network are able to accurately describe the entire statistics of observed correlations. Finally, we investigate models defined on networks containing bi-directional interactions. We observe that the stationary state associated with networks with symmetric or anti-symmetric interactions is biased towards the active or inactive state respectively, even if independent interaction entries are drawn from a symmetric distribution. This phenomenon, which can be regarded as a form of spontaneous symmetry-breaking, is peculiar to systems formulated in terms of Boolean variables, as opposed to Ising spins.
The dynamic cavity method provides the most efficient way to evaluate probabilities of dynamic trajectories in systems of stochastic units with unidirectional sparse interactions. It is closely related to sum-product algorithms widely used to compute marginal functions from complicated global functions of many variables, with applications in disordered systems, combinatorial optimization, and computer science. However, the complexity of the cavity approach grows exponentially with the in-degrees of the interacting units, which creates a defacto barrier for the successful analysis of systems with fat-tailed in-degree distributions. In this paper, we present a dynamic programming algorithm that overcomes this barrier by reducing the computational complexity in the in-degrees from exponential to quadratic, whenever couplings are chosen randomly from (or can be approximated in terms of) discrete, possibly unit-dependent, sets of equidistant values. As a case study, we analyze the dynamics of a random Boolean network with a fat-tailed degree distribution and fully asymmetric binary ±J couplings, and we use the power of the algorithm to unlock the noise-dependent heterogeneity of stationary node activation patterns in such a system.
We investigate the statistics of articulation points and bredges (bridge edges) in complex networks in which bonds are randomly removed in a percolation process. Because of the heterogeneous structure of a complex network, the probability of a node to be an articulation point or the probability of an edge to be a bredge will not be homogeneous across the network. We therefore analyze full distributions of articulation point probabilities as well as bredge probabilities, using a message-passing or cavity approach to the problem. Our methods allow us to obtain these distributions both for large single instances of networks and for ensembles of networks in the configuration model class in the thermodynamic limit, through a single unified approach. We also evaluate deconvolutions of these distributions according to degrees of the node or the degrees of both adjacent nodes in the case of bredges. We obtain closed form expressions for the large mean degree limit of Erdos-Rényi networks. Moreover, we reveal and are able to rationalize a significant amount of structure in the evolution of articulation point and bredge probabilities in response to random bond removal. We find that full distributions of articulation point and bredge probabilities in real networks and in their randomized counterparts may exhibit significant differences even where average articulation point and bredge probabilities do not. We argue that our results could be exploited in a variety of applications, including approaches to network dismantling or to vaccination and islanding strategies to prevent the spread of epidemics or of blackouts in process networks.
In modern society people are being exposed to numerous information, with some of them being frequently repeated or more disruptive than others. In this paper we use a model of opinion dynamics to study how this news impact the society. In particular, our study aims to explain how the exposure of the society to certain events deeply change people’s perception of the present and future. The evolution of opinions which we consider is influenced both by external information and the pressure of the society. The latter includes imitation, differentiation, homophily and its opposite, xenophobia. The combination of these ingredients gives rise to a collective memory effect, which is triggered by external information. In this paper we focus our attention on how this memory arises when the order of appearance of external news is random. We will show which characteristics a piece of news needs to have in order to be embedded in the society’s memory. We will also provide an analytical way to measure how much information a society can remember when an extensive number of news items is presented. Finally we will show that, when a certain piece of news is present in the society’s history, even a distorted version of it is sufficient to trigger the memory of the originally stored information.
We consider a simplified model for gene regulation, where gene expression is regulated by transcription factors (TFs), which are single proteins or protein complexes. Proteins are in turn synthesised from expressed genes, creating a feedback loop of regulation. This leads to a directed bipartite network in which a link from a gene to a TF exists if the gene codes for a protein contributing to the TF, and a link from a TF to a gene exists if the TF regulates the expression of the gene. Both genes and TFs are modelled as binary variables, which indicate, respectively, whether a gene is expressed or not, and a TF is synthesised or not. We consider the scenario where for a TF to be synthesised, all of its contributing genes must be expressed. This results in an ``AND'' gate logic for the dynamics of TFs. By adapting percolation theory to directed bipartite graphs, evolving according to the AND logic dynamics, we are able to determine the necessary conditions, in the network parameter space, under which bipartite networks can support a multiplicity of stable gene expression patterns, under noisy conditions, as required in stable cell types. In particular, the analysis reveals the possibility of a bi-stability region, where the extensive percolating cluster is or is not resilient to perturbations. This is remarkably different from the transition observed in standard percolation theory. Finally, we consider perturbations involving single node removal that mimic gene knockout experiments. Results reveal the strong dependence of the gene knockout cascade on the logic implemented in the underlying network dynamics, highlighting in particular that avalanche sizes cannot be easily related to gene-gene interaction networks.
We investigate the heterogeneity of outcomes of repeated instances of percolation experiments in complex networks using a message passing approach to evaluate heterogeneous, node dependent probabilities of belonging to the giant or percolating cluster, i.e.\, the set of mutually connected nodes whose size scales linearly with the size of the system. We evaluate these both for large finite single instances, and for synthetic networks in the configuration model class in the thermodynamic limit. For the latter, we consider both Erd\H{o}s-R\'enyi and scale free networks as examples of networks with narrow and broad degree distributions respectively. For real-world networks we use an undirected version of a Gnutella peer-to-peer file-sharing network with $N=62,568$ nodes as an example. We derive the theory for multiple instances of both uncorrelated and correlated percolation processes. For the uncorrelated case, we also obtain a closed form approximation for the large mean degree limit of Erd\H{o}s-R\'enyi networks.
In order to understand the development of common orientation of opinions in the modern world, we propose a model of a society described as a large collection of agents that exchange their expressed opinions under the influence of their mutual interactions and external events. In particular, we introduce an interaction bias which creates a collective memory effect such that the society is able to store and recall information coming from several external signals. Our model shows how the inner structure of the society and its future reactions can be shaped by its own history. We will provide an analytical explanation of how this might occur and we will show the emergent similarity between the reaction of a society modelled in this way and the Hopfield mechanism for information retrieval.
A bredge (bridge-edge) in a network is an edge whose deletion would split the network component on which it resides into two separate components. Bredges are vulnerable links that play an important role in network collapse processes, which may result from node or link failures, attacks or epidemics. Therefore, the abundance and properties of bredges affect the resilience of the network to these collapse scenarios. We present analytical results for the statistical properties of bredges in configuration model networks. Using a generating function approach based on the cavity method, we calculate the probability $\widehat P(e \in {\rm B})$ that a random edge $e$ in a configuration model network with degree distribution $P(k)$ is a bredge (B). We also calculate the joint degree distribution $\widehat P(k,k' | {\rm B})$ of the end-nodes $i$ and $i'$ of a random bredge. We examine the distinct properties of bredges on the giant component (GC) and on the finite tree components (FC) of the network. On the finite components all the edges are bredges and there are no degree-degree correlations. We calculate the probability $\widehat P(e \in {\rm B}|{\rm GC})$ that a random edge on the giant component is a bredge. We also calculate the joint degree distribution $\widehat P(k,k'|{\rm B},{\rm GC})$ of the end-nodes of bredges and the joint degree distribution $\widehat P(k,k'|{\rm NB},{\rm GC})$ of the end-nodes of non-bredge (NB) edges on the giant component. Surprisingly, it is found that the degrees $k$ and $k'$ of the end-nodes of bredges are correlated, while the degrees of the end-nodes of non-bredge edges are uncorrelated. We thus conclude that all the degree-degree correlations on the giant component are concentrated on the bredges. We calculate the covariance $\Gamma({\rm B},{\rm GC})$ of the joint degree distribution of end-nodes of bredges and show it is negative, namely bredges tend to connect high degree nodes to low degree nodes. We apply this analysis to ensembles of configuration model networks with degree distributions that follow a Poisson distribution (Erd{\H o}s-R\'enyi networks), an exponential distribution and a power-law distribution (scale-free networks). The implications of these results are discussed in the context of common attack scenarios and network dismantling processes.
Boolean networks are popular models for gene regulation, where genes are regarded as binary units, that can be either expressed or not, each updated at regular time intervals according to a random Boolean function of its neighbouring genes. Stable gene expression profiles, corresponding to cell types, are regarded as attractors of the network dynamics. However, the random character of gene updates gives no insight into the biological mechanism behind the existence of attractors. We propose a bipartite Boolean network approach which integrates genes and regulatory proteins (i.e. transcription factors) into a single network, where interactions incorporate two fundamental aspects of cellular biology, i.e. gene expression and gene regulation, and the resulting dynamics is highly non-linear. Since any finite stochastic system is ergodic, the emergence of an attractor structure, stable under noisy conditions, requires a giant component in the bipartite graph. By adapting graph percolation techniques to directed bipartite graphs, we are able to calculate exactly the region, in the network parameters space, where a cell can sustain steady-state gene expression profiles, in the absence of inhibitors, and we quantify numerically the effect of inhibitors. Results show that for cells to sustain a steady-state gene expression profile, transcription factors should typically be small protein complexes that regulate many genes. This condition is crucial for cell reprogramming and remarkably well in line with biological facts.
We present a method for the construction of ensembles of random networks that consist of a single connected component with a given degree distribution. This approach extends the construction toolbox of random networks beyond the configuration model framework, in which one controls the degree distribution but not the number of components and their sizes. Unlike configuration model networks, which are completely uncorrelated, the resulting single-component networks exhibit degree-degree correlations. Moreover, they are found to be disassortative, namely high-degree nodes tend to connect to low-degree nodes and vice versa. We demonstrate the method for single-component networks with ternary, exponential and power-law degree distributions.
We investigate hide-and-seek games on complex networks using a random walk framework. Specifically, we investigate the efficiency of various degree-biased random walk search strategies to locate items that are randomly hidden on a subset of vertices of a random graph. Vertices at which items are hidden in the network are chosen at random as well, though with probabilities that may depend on degree. We pitch various hide and seek strategies against each other, and determine the efficiency of search strategies by computing the average number of hidden items that a searcher will uncover in a random walk of n steps. Our analysis is based on the cavity method for finite single instances of the problem, and generalises previous work of De Bacco et al. [1] so as to cover degree-biased random walks. We also extend the analysis to deal with the thermodynamic limit of infinite system size. We study a broad spectrum of functional forms for the degree bias of both the hiding and the search strategy and investigate the efficiency of families of search strategies for cases where their functional form is either matched or unmatched to that of the hiding strategy. Our results are in excellent agreement with those of numerical simulations. We propose two simple approximations for predicting efficient search strategies. One is based on an equilibrium analysis of the random walk search strategy. While not exact, it produces correct orders of magnitude for parameters characterising optimal search strategies. The second exploits the existence of an effective drift in random walks on networks, and is expected to be efficient in systems with low concentration of small degree nodes.
An articulation point (AP) in a network is a node whose deletion would split the network component on which it resides into two or more components. APs are vulnerable spots which play an important role in network collapse processes, which may result from node failures, attacks or epidemics. Therefore, the abundance and properties of APs affect the resilience of the network to these collapse scenarios. Here we present analytical results for the statistical properties of APs in configuration model networks. In order to quantify the abundance of APs, we calculate the probability P (i ∈ AP), that a random node, i, in a configuration model network with a given degree distribution, P (K = k), is an AP. We also obtain the conditional probability P (i ∈ AP|k) that a random node of degree k is an AP, and find that high degree nodes are more likely to be APs than low degree nodes. Using Bayes’ theorem, we obtain the conditional degree distribution, P (K = k|AP), over the set of APs and compare it to the overall degree distribution P (K = k). Apart from its degree, each node can be characterized by its articulation rank, r, which is the number of components that would be added to the network upon deletion of that node. For nodes which are not APs the articulation rank is r = 0, while for APs it satisfies r ≥ 1. We obtain a closed form analytical expression for the distribution of articulation ranks, P (R = r). Configuration model networks often exhibit a coexistence between a giant component and finite components. While the giant component is extensive in the network size and exhibits cycles, the finite components are non-extensive tree structures. To examine the distinct properties of APs on the giant and finite components, we evaluate the probabilities presented above separately for the giant and the finite components. We apply these results to ensembles of configuration model networks with degree distributions which follow a Poisson distribution (Erd ̋os-R ́enyi networks), an exponential distribution of the form P (K = k) ∼ e−αk and a power-law distribution of the form P (K = k) ∼ k−γ (scale-free networks), where k ≥ kmin = 1. It is found that for the Poisson and exponential degree distributions, as the mean degree c is increased, the fraction of APs in the network increases in the sparse network regime, reaches a maximum value and then declines as the dense network regime is approached. In contrast, for the power-law distribution the fraction of APs in the network increases monotonically, reaching a non-zero asymptotic value in the dense network limit.
The micro-structure of the giant cluster of the Erd ̋os-R ́enyi network and other configuration model networks is analyzed using generating function methods. While configuration model networks are uncorrelated, the giant cluster exhibits a degree distribution which is different from the overall degree distribution of the network and includes degree-degree correlations of all orders. We present exact analytical results for the degree distributions as well as higher order degree-degree correlations on the giant clusters of configuration model networks. We show that the degree-degree correlations are essential for the integrity of the giant cluster, in the sense that the degree distribution alone cannot guarantee that it will consist of a single connected component. To demonstrate the importance and broad applicability of these results, we apply them to the study of the distribution of shortest path lengths on the giant cluster, percolation on the giant cluster and the spectra of sparse matrices defined on the giant cluster. We show that by using the degree distribution on the giant cluster, and furthermore when taking degree-degree correlations into account, we obtain high quality results for these properties. This suggests that many existing methods could be simply adapted to yield results conditioned on the giant cluster.
We present analytical results for the distribution of shortest cycle lengths (DSCL) in random networks. The approach is based on the relation between the DSCL and the distribution of shortest path lengths (DSPL). We apply this approach to configuration model networks, for which analytical results for the DSPL were obtained before. We first calculate the fraction of nodes in the network which reside on at least one cycle. Conditioning on being on a cycle, we provide the DSCL over ensembles of configuration model networks with degree distributions which follow a Poisson distribution (Erdos-R\'enyi network), degenerate distribution (random regular graph) and a power-law distribution (scale-free network). The mean and variance of the DSCL are calculated. The analytical results are found to be in very good agreement with the results of computer simulations.
We examine the heterogeneous responses of individual nodes in sparse networks to the random removal of a fraction of edges. Using the message-passing formulation of percolation, we discover considerable variation across the network in the probability of a particular node to remain part of the giant component, and in the expected size of small clusters containing that node. In the vicinity of the percolation threshold, weakly non-linear analysis reveals that node-to-node heterogeneity is captured by the recently introduced notion of non-backtracking centrality. We supplement these results for fixed finite networks by a population dynamics approach to analyse random graph models in the infinite system size limit, also providing closed-form approximations for the large mean degree limit of Erdö&-Renyi random graphs. Interpreted in terms of the application of percolation to real-world processes, our results shed light on the heterogeneous exposure of different nodes to cascading failures, epidemic spread, and information flow.
We construct a model of cell reprogramming (the conversion of fully differentiated cells to a state of pluripotency, known as induced pluripotent stem cells, or iPSCs) which builds on key elements of cell biology viz. cell cycles and cell lineages. Although reprogramming has been demonstrated experimentally, much of the underlying processes governing cell fate decisions remain unknown. This work aims to bridge this gap by modelling cell types as a set of hierarchically related dynamical attractors representing cell cycles. Stages of the cell cycle are characterised by the configuration of gene expression levels, and reprogramming corresponds to triggering transitions between such configurations. Two mechanisms were found for reprogramming in a two level hierarchy: cycle specific perturbations and a noise induced switching. The former corresponds to a directed perturbation that induces a transition into a cycle-state of a different cell type in the potency hierarchy (mainly a stem cell) whilst the latter is a priori undirected and could be induced, e.g., by a (stochastic) change in the cellular environment. These reprogramming protocols were found to be effective in large regimes of the parameter space and make specific predictions concerning reprogramming dynamics which are broadly in line with experimental findings.
Rare event statistics for random walks on complex networks are investigated using the large deviation formalism. Within this formalism, rare events are realised as typical events in a suitably deformed path-ensemble, and their statistics can be studied in terms of spectral properties of a deformed Markov transition matrix. We observe two different types of phase transition in such systems: (i) rare events which are singled out for sufficiently large values of the deformation parameter may correspond to {\em localised\/} modes of the deformed transition matrix; (ii) ``mode-switching transitions" may occur as the deformation parameter is varied. Details depend on the nature of the observable for which the rare event statistics is studied, as well as on the underlying graph ensemble. In the present paper we report results on rare events statistics for path averages of random walks in Erd\H{o}s-R\'enyi and scale free networks. Large deviation rate functions and localisation properties are studied numerically. For observables of the type considered here, we also derive an analytical approximation for the Legendre transform of the large deviation rate function, which is valid in the large connectivity limit. It is found to agree well with simulations.
We present analytical results for the distribution of shortest path lengths between random pairs of nodes in configuration model networks. The results, which are based on recursion equations, are shown to be in good agreement with numerical simulations for networks with degenerate, binomial, and power-law degree distributions. The mean, mode, and variance of the distribution of shortest path lengths are also evaluated. These results provide expressions for central measures and dispersion measures of the distribution of shortest path lengths in terms of moments of the degree distribution, illuminating the connection between the two distributions.
We present two complementary analytical approaches for calculating the distribution of shortest path lengths in Erdös-Renyi networks, based on recursion equations for the shells around a reference node and for the paths originating from it. The results are in agreement with numerical simulations for a broad range of network sizes and connectivities. The average and standard deviation of the distribution are also obtained. In the case that the mean degree scales as $N^\alpha$ with the network size, the distribution becomes extremely narrow in the asymptotic limit, namely almost all pairs of nodes are equidistant, at distance $d=⌊1/\alpha⌋$ from each other. The distribution of shortest path lengths between nodes of degree m and the rest of the network is calculated. Its average is shown to be a monotonically decreasing function of m, providing an interesting relation between a local property and a global property of the network. The methodology presented here can be applied to more general classes of networks.
One of the key characteristics of cancer cells is an increased phenotypic plasticity, driven by underlying genetic and epigenetic perturbations. However, at a systems-level it is unclear how these perturbations give rise to the observed increased plasticity. Elucidating such systems-level principles is key for an improved understanding of cancer. Recently, it has been shown that signaling entropy, an overall measure of signaling pathway promiscuity, and computable from integrating a sample's gene expression profile with a protein interaction network, correlates with phenotypic plasticity and is increased in cancer compared to normal tissue. Here we develop a computational framework for studying the effects of network perturbations on signaling entropy. We demonstrate that the increased signaling entropy of cancer is driven by two factors: (i) the scale-free (or near scale-free) topology of the interaction network, and (ii) a subtle positive correlation between differential gene expression and node connectivity. Indeed, we show that if protein interaction networks were random graphs, described by Poisson degree distributions, that cancer would generally not exhibit an increased signaling entropy. In summary, this work exposes a deep connection between cancer, signaling entropy and interaction network topology.
A key challenge in systems biology is the elucidation of the underlying principles, or fundamental laws, which determine the cellular phenotype. Understanding how these fundamental principles are altered in diseases like cancer is important for translating basic scientific knowledge into clinical advances. While significant progress is being made, with the identification of novel drug targets and treatments by means of systems biological methods, our fundamental systems level understanding of why certain treatments succeed and others fail is still lacking. We here advocate a novel methodological framework for systems analysis and interpretation of molecular omic data, which is based on statistical mechanical principles. Specifically, we propose the notion of cellular signalling entropy (or uncertainty), as a novel means of analysing and interpreting omic data, and more fundamentally, as a means of elucidating systems-level principles underlying basic biology and disease. We describe the power of signalling entropy to discriminate cells according to differentiation potential and cancer status. We further argue the case for an empirical cellular entropy-robustness correlation theorem and demonstrate its existence in cancer cell line drug sensitivity data. Specifically, we find that high signalling entropy correlates with drug resistance and further describe how entropy could be used to identify the Achilles heels of cancer cells. In summary, signalling entropy is a deep and powerful concept, based on rigorous statistical mechanical principles, which, with improved data quality and coverage, will allow a much deeper understanding of the systems biological principles underlying normal and disease physiology.