c Complex Networks & Systems - Luis M. Rocha

Distance Backbones

Network science has provided many insights into the organization of complex systems. The success of this approach is its ability to capture the organization of multivariate interactions as networks or graphs without explicit dynamical rules for node variables. As the field matures, however, there is a need to move from understanding to controlling complex systems. This is particularly true in systems biology and medicine, where increasingly accurate models of biochemical regulation have been produced, or in social networks to study epidemic spread for pandemic management. We have contributed to this goal with two mathematical concepts which allow us to remove different forms of redundancy in networks: 1) distance closures, and 2) canalization via schema re-description. The first concept allows us to infer the invariant subgraph that is sufficient to compute all shortest paths in a weighted graph: the distance backbone. This has demonstrated that there is massive redundancy in many networks in different domains, whereby most edges in a network are not necessary to compute shortest paths (e.g. 90% of edges in some brain networks). Removing redundant edges can facilitate computation and discovery of important pathways in many applications. The removal of this redundancy simplifies and indeed enables the characterization of information transmission and dynamics on large biochemical, brain, social, technological, and knowledge (including for automatic fact-checking) networks, which are otherwise too large to study analytically [Correia et al, 2022; Correia, Barrat, and Rocha, 2022; Simas, Correia, and Rocha, 2022; Teixeira et al, 2020; Simas and Rocha, 2015; Ciampaglia et al, 2015; Simas and Rocha, 2012; Rocha, 2002] .

The U.S. domestic nonstop airport transportation network and backbones for the year 2006. A1. Distance network with weights representing the average number of passengers between two airports. This is a reconstruction of the network used in Serrano et al. [55] that keeps only the largest connected component and removes some U.S territory airports (e.g. Guam). A2. Metric backbone with only 16% of the original edges. A3. Ultrametric backbone with only 9% of the original edges. B. Log-binned distribution of semi-metric distortion values for the 84% of semi-metric edges in the network. From Simas, Correia, and Rocha [2022]

Proximity network of drug-drug interactions (DDI network) extracted from a electronic health records of a large population. Nodes denote drugs involved in at least one co-administration known to be a DDI. Node color represents the highest level of primary action class, as retrieved from Drugs.com (see legend). Node size represents the probability of interaction. Edge weights denote risk in population. Edge colors denote edges that are higher risk for females (blue) or males (red). From Correia et al [2019]

Weighted and Proximity Graphs

The prime example of a Document Network (DN) is the World Wide Web (WWW). But many other types of such networks exist: Wikipedia, electronic health records, bibliographic databases containing scientific publications, social media platforms, as well as databases of datasets used in scientific endeavors. Each of these databases possesses several distinct relationships among documents and between documents and semantic tags or indices that classify documents appropriately. For instance, documents in the WWW are related via a hyperlink network, while documents in bibliographic databases are related by citation and collaboration networks. Furthermore, documents can be related to semantic tags such as keywords used to describe their content. Given these relations, we can compute distance functions (typically via co-occurrence measures) amongst documents and/or semantic tags, thus creating associative, weighted networks between these items—which denote stronger or weaker co-associations. We have used such distance and proximity networks for inference and discovery of protein-protein and drug-drug interactions, health and comorbidity risks, gender and age biases, automatic fact-checking, pharmacokinetic parameters in drug interaction and adverse reaction studies, protein sequence and structure prediction, functional annotation of transcription data, enzyme annotation publications, etc (see publications listed below). We have also used distance and proximity graphs to uncover modules or clusters in the network that may be associated with a particular topic or community of interest. We have applied clustering methods to biomedical, social, knowledge, scientific co-authorship and citation networks (see our bibliome informatics and adaptive web projects for more details),etc. We have also used information-theoretical approaches to classify documents of interest in probabilistic graphs of citation co-occurrence in scientific citation networks [Kolchinsky et al, 2010].

Semi-metric networks

We study the non-metric network topologies that arise in weighted graphs obtained from real-world data (e.g. co-occurrence statistics). In particular, we have developed measures to extract the graph edges which most violate the triangle inequality: semi-metric associations (which are removed to reveal distance backbones above). Our working hypothesis is that strong semi-metric associations can be used to identify trends, items with a higher probability of co-occurring in the future, as well the dynamics of such networks in general. This methodology has been successfully applied to networks of biochemical and biomedical entities, published documents, recommender systems for digital libraries at the Los Alamos National Laboratory, web search and recommendation by the givealink.org project, networks of felons obtained from intelligence records, and gene networks (see publications below). This work has been pursued in the Identification of Interests, Trends and Dynamics in Document Networks Project as well as in a Los Alamos Homeland Security LDRD DR project, “Advanced Knowledge Integration (LDRD Reserve)” to discover latent associations in social networks (internal report available). Some of this work was also funded by NSF grant from the Human and Social Dynamics program, With Eliot Smith and Rob Goldstone—which received some attention in the media for being one of the first to look at facebook data.

An associative network of people names extracted from co-occurrence in documents in a database as described in an internal report. You can also see a 3D Video (Real Video) of this network.

Cumulative degree distribution for network sizes as predicted by stochastic model of (Simas and Rocha, 2008)

Stochastic model of cut-off behavior

We have developed a stochastic model of vertex aging in networks, to better predict network growth [Simas and Rocha, 2008]. Real world networks display a cut-off in the power-law node degree distributions of complex networks, not expected by the canonical Barabasi-Albert Model. Amaral et al had shown that this cut-off behavior can be computationally modeled with vertex aging. We produced a mathematical model of vertex aging, which allows accurate predictions of the equilibrium point of active vertices and relate network growth with probability of aging.

Funding Project partially funded by

Project Members (Current and Former)

Luis M. Rocha (PI)

Alaa Abi-Haidar

Alain Barrat

Katy Börner

Paulo Navarro Costa

Rion Brattig Correia

Felipe Xavier Costa

Ziqi Guo

Artemy Kolchinsky

Ana Maguitman

Jordan C. Rozum

Tiago Simas

David Soriano-Paños

Olaf Sporns

Andreia Sofia Teixeira

Selected Project Publications