I am a complexity scientist, currently working as a postdoctoral researcher at the Computational Science Lab (CSL), University of Amsterdam, The Netherlands, under the supervision of Prof. dr. Rick Quax and Prof. dr. Jos A. Bosch. My research interests lie in multilayer disease networks, higher-order interactions, information theory, deep learning, and LLMs on graphs. My current research focuses on building information-theoretic higher-order networks and applying them to unravel the intricacies of the human disease connectome, within the scope of the EU-funded project TO_AITION. The major goal of this project is to understand the causal mechanisms underlying the comorbidity of cardiovascular disease and depression, and to identify significant biomarkers responsible for the development of these conditions. The core of my work is to infer higher-order interactions based on information theory, construct multilayer disease networks, and develop novel methods for their analysis. Before my postdoctoral position, I obtained my Ph.D. at Hokkaido University, Japan, where I worked on a research project on information dynamics for complex ecosystem prediction and design led by Prof. dr. Matteo Convertino. My Ph.D. thesis: Information Dynamics for Complex Ecosystem Prediction and Design.
Literature: Finding key players in complex networks through deep reinforcement learning.
Hypergraphs can better represent complex systems in which interactions extend beyond pairs of nodes. Driver nodes in hypergraphs, crucial for system control, can be defined by their impact on system dynamics in response to an intervention, but they are often identified using structural properties such as centrality metrics. Current research in graph theory reveals that structurally central nodes are not necessarily driver nodes, so system dynamics must be taken into account. Nevertheless, because structural properties strongly influence system dynamics, in this study we investigate whether driver nodes can be predicted by structural centrality metrics. To achieve this, we construct both purely simulated and data-driven Bayesian networks with known dynamics, enabling precise computation of node impact, which serves as the ground truth. Within these networks, nodes exhibit either conditional dependency on a single node or synergistic dependency on two nodes, forming both pairwise and higher-order (triplet) structures. Data samples are generated from the known dynamics of these networks, after which association networks and synergistic hypergraphs are reconstructed from the synthetic data. We then apply various centrality metrics to these graphs and hypergraphs, yielding node centrality data that are used to predict node impact. We evaluate the individual and collective predictive capacities of these centrality metrics by calculating coefficients of determination, comparing node impact against each individual centrality metric, as well as against the node impact predicted by a random forest model that uses all centrality metrics. Results show that no single centrality metric can, on its own, effectively predict driver nodes in hypergraphs. The random forest model, using all centrality metrics, shows improved yet limited performance in purely simulated Bayesian networks.
However, when applied to data-driven networks, its effectiveness diminishes significantly. These findings highlight the need to incorporate system dynamics into methods for predicting driver nodes in hypergraphs. Given the difficulty of fully understanding the dynamics of complex systems, there is a need for advanced algorithms capable of inferring driver nodes, specifically designed for distinct control tasks within hypergraphs.
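The evaluation step in the abstract above rests on the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The plain-Python sketch below shows one way to score a single centrality metric against node impact, assuming a univariate least-squares fit per metric (whose R^2 equals the squared Pearson correlation) and using hypergraph degree, the number of hyperedges containing a node, as the example metric. The toy hypergraph, the impact values and all function names are hypothetical, and the random forest step is not reproduced.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot (fails if y_true is constant)."""
    y_bar = mean(y_true)
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def linear_fit_r_squared(x, y):
    """R^2 of an ordinary least-squares line y ~ a + b*x (fails if x is constant)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return r_squared(y, [intercept + slope * xi for xi in x])


def hyperdegree(nodes, hyperedges):
    """Hypergraph degree centrality: number of hyperedges that contain each node."""
    return {v: sum(v in edge for edge in hyperedges) for v in nodes}


# Hypothetical toy hypergraph mixing pairwise links and synergistic triplets
nodes = ["A", "B", "C", "D", "E"]
hyperedges = [{"A", "B"}, {"B", "C", "D"}, {"C", "E"}, {"A", "D", "E"}, {"B", "E"}]
# Hypothetical ground-truth node impact (e.g. computed from a known Bayesian network)
impact = {"A": 0.10, "B": 0.40, "C": 0.05, "D": 0.30, "E": 0.15}

degree = hyperdegree(nodes, hyperedges)
x = [degree[v] for v in nodes]
y = [impact[v] for v in nodes]
print("R^2 (hyperdegree vs impact):", round(linear_fit_r_squared(x, y), 3))
```

For the collective case, the linear fit would be replaced by a multivariate regressor over all metrics (the abstract uses a random forest) whose predictions are passed to `r_squared`.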
One of the most impactful comorbidities is the association between cardiovascular disease (CVD) and major depression (MD). Despite its significance, the underlying biological pathways remain elusive, possibly due to complex, non-linear associations spread across multiple mechanisms. In this study, we propose a multipartite projection method based on mutual information correlation to construct multilayer disease networks. We apply our method to a dataset from the Young Finns Study. In this cross-sectional cohort, two phenotype modules are studied (CVD and MD phenotypes), along with related risk factors and two types of biological measurements: metabolites and lipids. Instead of directly correlating CVD and MD phenotype variables, we extend the notion of a bipartite network to create a multipartite network that connects phenotype variables to intermediate biological variables. Projecting through these intermediate groups results in a weighted multilayer network, in which each link between CVD and MD phenotypes is labeled by its 'layer' (in our use case: metabolome or lipidome). We test four projection methods based on the expectation that the weighted multilayer network should ideally function as a decomposition of the pairwise correlation between phenotype variables. We find that using the sum of the average correlations between a phenotype pair and their shared neighboring variables as the projected correlation performs best on our current dataset. The projected correlation network identifies gender and BMI as important risk factors for cardiovascular disease and depression. The projection method identifies significant biological mediators explaining the comorbidity between CVD and MD, including creatinine and leucine among metabolites, and acylcarnitine and phosphatidylcholines among lipids. These findings suggest that these mediators may play a role in the development of comorbidity following exposure to the risk factors.
Our method generalizes to any number of layers and phenotype modules, offering a truly system-level overview of biological pathways contributing to comorbidity.
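To make the best-performing projection rule concrete, here is a minimal plain-Python sketch under stated assumptions: variables are already discretized, dependence is measured with a plug-in mutual information estimate in bits (a stand-in for the mutual information correlation used in the study), and a phenotype is linked to an intermediate variable only if their mutual information exceeds a chosen threshold. Within each layer, the projected weight of a phenotype pair (p, q) is read as the sum, over their shared neighbors m, of the average of I(p;m) and I(q;m). The variable names, threshold, data and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two equally long discrete samples."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def project_layer(phenotypes, intermediates, data, threshold):
    """Project phenotype-intermediate links of one layer onto phenotype pairs.

    Weight of (p, q) = sum over shared neighbors m of (I(p;m) + I(q;m)) / 2.
    """
    # Bipartite links: keep phenotype-intermediate pairs whose MI exceeds the threshold
    links = {}
    for p in phenotypes:
        for m in intermediates:
            value = mutual_information(data[p], data[m])
            if value > threshold:
                links[(p, m)] = value
    projected = {}
    for p, q in combinations(phenotypes, 2):
        shared = [m for m in intermediates if (p, m) in links and (q, m) in links]
        if shared:
            projected[(p, q)] = sum((links[(p, m)] + links[(q, m)]) / 2 for m in shared)
    return projected


def project_multilayer(phenotypes, layers, data, threshold):
    """One projected phenotype network per layer, e.g. {"metabolome": ..., "lipidome": ...}."""
    return {name: project_layer(phenotypes, members, data, threshold)
            for name, members in layers.items()}


# Hypothetical discretized samples: two phenotypes and two intermediate variables
data = {
    "P1": [0, 0, 1, 1],
    "P2": [0, 0, 1, 1],
    "M1": [0, 0, 1, 1],  # shares information with both phenotypes
    "M2": [0, 1, 0, 1],  # independent of both phenotypes
}
layers = {"layer_a": ["M1", "M2"]}
print(project_multilayer(["P1", "P2"], layers, data, threshold=0.5))
# {'layer_a': {('P1', 'P2'): 1.0}}
```

Keeping one projected network per layer preserves the layer label on each phenotype-phenotype link, which is what turns the projection into a multilayer network rather than a single aggregated graph.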
The detection of causal interactions is of great importance when inferring the functional and structural networks of complex ecosystems for basic and applied research. Convergent cross mapping (CCM), based on nonlinear state-space reconstruction, has made substantial progress in network inference by measuring how reliably historical values of one variable can estimate the states of other variables. Here we investigate the ability of the optimal information flow (OIF) ecosystem model we developed to infer bidirectional causality, and compare it to CCM. Results from synthetic datasets generated by a simple predator-prey model, from data on a real-world sardine-anchovy-temperature system, and from a multispecies fish ecosystem show that the proposed OIF predicts population and community patterns better than CCM. Specifically, OIF provides a larger gradient of inferred interactions, higher point-value accuracy, and smaller fluctuations of interactions and -diversity, including their characteristic time delays. Overall, OIF outperforms all other models in assessing predictive causality (also in terms of computational complexity) owing to its explicit consideration of the synchronization, divergence, and diversity of events that define model sensitivity, uncertainty, and complexity. Thus, OIF offers broad ecological information by extracting predictive causal networks of complex ecosystems from time-series data in the space-time continuum. The accurate inference of species interactions at any biological scale of organization is highly valuable because it allows biodiversity changes to be predicted, for instance as a function of climate and other anthropogenic stressors. This has practical implications for optimal ecosystem management and design, such as fish stock prioritization and the delineation of marine protected areas based on the derived collective multispecies assembly.
OIF can be applied to any complex system and used for model evaluation and design where causality should be considered as non-linear predictability of diverse events of populations or communities.
Journals
1. Jie Li, Cillian Hourican, Stavroula Tassi, Jos A. Bosch, Rick Quax. Predicting driver nodes in synergistic hypergraphs using machine learning techniques. (To be submitted)
Patents
1. Zhengdi Qin, Jie Li, Tianjiong Zhang, Siping Chen. “Digital TCD Ultrasound Blood Flow Detection System” [P]. Chinese Patent: 2015200954772, 2015.7.29. (Chinese Patent)
Conferences
1. Shaoxing Li, Tianjiong Zhang, Jie Li, et al. “Experimental Study on Digital Design of Doppler Ultrasound with Coded Excitation”, 4th AMITP, 24-26 Sep. 2016, Guilin, China.
Presentations
1. Jie Li. “Multilayer Networks of Cardiovascular Diseases and Depression via Multipartite Projections”. Dutch NetSci Summer Symposium 2023 (Dutch NetSci2023), Delft, The Netherlands, August 30-31, 2023.
Fall 2023, Scientific Data Analysis, lecturing and management.
Spring 2023, Seminars Computational Science, lecturing and management.
Fall 2022, Scientific Data Analysis, lecturing and grading.
Spring 2023, Johanna Gehlen, Master Student.
Thesis: An exploration of the bias in the O-information estimation with application to the comorbidity between cardiovascular disease and depression.
Congratulations to Johanna on passing her master thesis with an excellent score of 9!
You can reach out to me through the following channels:
Address: Lab 42, Science Park 900, 1098 XH Amsterdam, The Netherlands
Email: jieli198973@gmail.com