Jie Li

Research

Exploring the use of machine learning in complex systems (graphs and hypergraphs).

Literature: Finding key players in complex networks through deep reinforcement learning.

Large Language Models on Graphs: A Comprehensive Survey.

Discovering biomarkers associated and predicting cardiovascular disease with high accuracy using a novel nexus of machine learning techniques for precision medicine.

Do structural features predict dynamic node impact in synergistic hypergraphs?

Hypergraphs can better represent complex systems in which interactions extend beyond pairs of nodes. Driver nodes in hypergraphs, which are crucial for system control, can be defined by their impact on system dynamics in response to an intervention, but they are often identified using structural properties such as centrality metrics. Current research in graph theory shows that structurally central nodes are not necessarily driver nodes and that system dynamics must be taken into account. Nevertheless, because structural properties strongly influence system dynamics, we investigate in this study whether driver nodes can be predicted from structural centrality metrics. To this end, we construct both purely simulated and data-driven Bayesian networks with known dynamics, enabling precise computation of node impact, which serves as the ground truth. Within these networks, nodes exhibit either conditional dependency on a single node or synergistic dependency on two nodes, forming both pairs and higher-order triplets. Data samples are generated from the known dynamics of these networks, and association networks and synergistic hypergraphs are then reconstructed from the synthetic data. We apply various centrality metrics to these graphs and hypergraphs, and the resulting node centrality data is used to predict node impact. We evaluate the individual and collective predictive capacities of these centrality metrics using the coefficient of determination, comparing node impact against each individual centrality metric as well as against the node impact predicted by a random forest model. Results show that no single centrality metric can effectively predict driver nodes in hypergraphs on its own. The random forest model, using all centrality metrics in purely simulated Bayesian networks, shows improved yet limited performance. However, when applied to data-driven networks, its effectiveness diminishes significantly. These findings highlight the need to incorporate system dynamics into methods for predicting driver nodes in hypergraphs. Given the difficulty of fully understanding the dynamics of complex systems, an advanced algorithm is needed that can infer driver nodes for specific control tasks within hypergraphs.
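
As a rough illustration of the single-metric evaluation described above, the sketch below scores two simple structural measures against node impact using the coefficient of determination of a fitted line. It is not the study's pipeline: the toy graph, the toy triplets, the `impact` values, and the helper names `degree_centrality`, `hyperdegree`, and `r_squared` are all assumptions made up for this example, and the random forest step is left out.

```python
from statistics import mean


def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly linked to."""
    nodes = sorted({n for edge in edges for n in edge})
    neighbours = {n: set() for n in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {n: len(neighbours[n]) / (len(nodes) - 1) for n in nodes}


def hyperdegree(hyperedges):
    """Number of hyperedges (here: triplets) that each node takes part in."""
    counts = {}
    for hyperedge in hyperedges:
        for n in hyperedge:
            counts[n] = counts.get(n, 0) + 1
    return counts


def r_squared(x, y):
    """Coefficient of determination of a least-squares line fitted to (x, y)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Toy inputs, invented purely for illustration (not data from the study).
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
hyperedges = [("a", "b", "c"), ("a", "c", "d")]
impact = {"a": 0.9, "b": 0.2, "c": 0.5, "d": 0.1}  # hypothetical ground truth

nodes = sorted(impact)
metrics = {"degree": degree_centrality(edges), "hyperdegree": hyperdegree(hyperedges)}
scores = {
    name: r_squared([values[n] for n in nodes], [impact[n] for n in nodes])
    for name, values in metrics.items()
}
for name, score in scores.items():
    print(f"{name}: R^2 = {score:.3f}")
```

A score near 1 would mean the metric alone tracks node impact closely; the study reports that no single metric achieves this in hypergraphs.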

Figure 1: Multiplex disease graphs.

Figure 2: Predicted impact vs. actual impact in log-log scale. (A) Model trained by centrality data from simulated BNs. (B) Model trained by centrality data from simulated BNs and the ground truth.

Multilayer Networks of Cardiovascular Diseases and Depression via Multipartite Projections

One of the most impactful comorbidities is the association between cardiovascular disease (CVD) and major depression (MD). Despite its significance, the underlying biological pathways remain elusive, possibly because of complex, non-linear associations spread across multiple mechanisms. In this study, we propose a multipartite projection method based on mutual information correlation to construct multilayer disease networks. We apply the method to a dataset from the Young Finns Study. In this cross-sectional cohort, two phenotype modules are studied, CVD and MD phenotypes, along with related risk factors and two biological measures: metabolites and lipids. Instead of directly correlating CVD and MD phenotype variables, we extend the notion of a bipartite network to create a multipartite network that connects phenotype variables to intermediate biological variables. Projecting from these intermediate groups results in a weighted multilayer network, where each link between CVD and MD phenotypes is marked by its 'layer' (in our use case, metabolome or lipidome). We test four projection methods, based on the expectation that the weighted multilayer network should ideally function as a decomposition of the pairwise correlation between phenotype variables. We find that using the sum of the average correlations between a phenotype pair and their shared neighboring variables as the projected correlation performs best on our current dataset. The projected correlation network identifies gender and BMI as important risk factors related to cardiovascular diseases and depression. The projection method also identifies significant biological mediators that explain the comorbidity between CVD and MD, including creatinine and leucine among the metabolites, and acylcarnitine and phosphatidylcholines among the lipids. These findings suggest a potential role for these mediators in the development of comorbidity following exposure to the risk factors. Our method generalizes to any number of layers and phenotype modules, offering a truly system-level overview of the biological pathways contributing to comorbidity.
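
One possible reading of the best-performing projection rule ("the sum of the average correlations between a phenotype pair and their shared neighboring variables") can be sketched as follows. The function name `project_pair`, the variable names, and all numbers are hypothetical placeholders rather than results from the Young Finns Study, and the exact averaging used in the study may differ.

```python
def project_pair(weights, a, b):
    """Project the multipartite links of phenotypes a and b onto one weight per layer.

    weights maps layer -> {(phenotype, mediator): correlation}.
    For every mediator linked to both phenotypes, the two correlations are
    averaged; the averages are then summed within each layer.
    """
    projected = {}
    for layer, links in weights.items():
        neighbours_a = {m for (p, m) in links if p == a}
        neighbours_b = {m for (p, m) in links if p == b}
        shared = neighbours_a & neighbours_b
        projected[layer] = sum((links[(a, m)] + links[(b, m)]) / 2 for m in shared)
    return projected


# Made-up correlations between two phenotype variables and their mediators.
weights = {
    "metabolome": {
        ("cvd_var", "met_1"): 0.4,
        ("md_var", "met_1"): 0.2,
        ("cvd_var", "met_2"): 0.3,  # not shared with md_var, so it is ignored
    },
    "lipidome": {
        ("cvd_var", "lip_1"): 0.1,
        ("md_var", "lip_1"): 0.3,
        ("cvd_var", "lip_2"): 0.5,
        ("md_var", "lip_2"): 0.1,
    },
}

layers = project_pair(weights, "cvd_var", "md_var")
total = sum(layers.values())
print(layers, total)
```

Under the decomposition idea in the text, `total` is the quantity one would compare against the direct pairwise correlation between the two phenotype variables.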

Figure 1: Stylized figure of the multipartite projection method.

Figure 2: Multilayer projected disease networks.

Inferring ecosystem networks as information flows

The detection of causal interactions is of great importance when inferring complex ecosystem functional and structural networks for basic and applied research. Convergent cross mapping (CCM), based on nonlinear state-space reconstruction, has made substantial progress in network inference by measuring how well historical values of one variable can reliably estimate states of other variables. Here we investigate the ability of a developed optimal information flow (OIF) ecosystem model to infer bidirectional causality and compare it to CCM. Results from synthetic datasets generated by a simple predator-prey model, data from a real-world sardine-anchovy-temperature system, and data from a multispecies fish ecosystem highlight that the proposed OIF performs better than CCM in predicting population and community patterns. Specifically, OIF provides a larger gradient of inferred interactions, higher point-value accuracy, and smaller fluctuations of interactions and α-diversity, including their characteristic time delays. Overall, OIF outperforms all other models in assessing predictive causality (also in terms of computational complexity) due to the explicit consideration of synchronization, divergence and diversity of events that define model sensitivity, uncertainty and complexity. Thus, OIF offers broad ecological information by extracting predictive causal networks of complex ecosystems from time-series data in the space-time continuum. The accurate inference of species interactions at any biological scale of organization is highly valuable because it allows the prediction of biodiversity changes, for instance as a function of climate and other anthropogenic stressors. This has practical implications for defining optimal ecosystem management and design, such as fish stock prioritization and delineation of marine protected areas based on derived collective multispecies assembly. OIF can be applied to any complex system and used for model evaluation and design where causality should be considered as non-linear predictability of diverse events of populations or communities.
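
The abstract does not spell out how OIF is computed, so the sketch below is not the OIF model. It only illustrates the general idea of directed, time-lagged predictability with a plug-in transfer entropy estimate on discrete series, a quantity named in the related talks listed under Presentations. The function name `transfer_entropy`, the binary toy series, and the one-step delay are assumptions chosen for this example.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(source, target, lag=1):
    """Plug-in estimate, in bits, of the transfer entropy from source to target.

    Measures how much knowing source[t] improves the prediction of
    target[t + lag] beyond what target[t] already provides.
    """
    n = len(target) - lag
    triples, pairs_yx, pairs_yy, singles = Counter(), Counter(), Counter(), Counter()
    for t in range(n):
        y_next, y_now, x_now = target[t + lag], target[t], source[t]
        triples[(y_next, y_now, x_now)] += 1
        pairs_yx[(y_now, x_now)] += 1
        pairs_yy[(y_next, y_now)] += 1
        singles[y_now] += 1
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_full = count / pairs_yx[(y_now, x_now)]            # p(y_next | y_now, x_now)
        p_self = pairs_yy[(y_next, y_now)] / singles[y_now]  # p(y_next | y_now)
        te += (count / n) * log2(p_full / p_self)
    return te


# Toy series: y copies x with a one-step delay, so information flows x -> y only.
random.seed(0)
x = [random.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]

te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
print(f"TE(x -> y) = {te_xy:.3f} bits, TE(y -> x) = {te_yx:.3f} bits")
```

In this toy case the estimate should be large in the direction of the true coupling and close to zero in the reverse direction, which is the kind of asymmetry a causal network inference method relies on.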

Figure 1: Validating the OIF model and applying the model to real ecosystems.

Publications

Journals

1. Jie Li, Cillian Hourican, Stavroula Tassi, Jos A. Bosch, Rick Quax. Predicting driver nodes in synergistic hypergraphs using machine learning techniques. (To be submitted)
2. Jie Li, Jos A. Bosch, Rick Quax. Multilayer networks of cardiovascular diseases and depression via multipartite projections. (To be submitted)
3. Rydin, A., Milaneschi, Y., Quax, R., Li, J., et al. (2023). A network analysis of depressive symptoms and metabolomics. Psychological Medicine, 1-10. DOI: https://doi.org/10.1017/S0033291723001009
4. Galbraith, E., Li, J., Rio-Vilas, V.J.D. et al. In.To. COVID-19 socio-epidemiological co-causality. Sci Rep 12, 5831 (2022). DOI: https://doi.org/10.1038/s41598-022-09656-1
5. Li, J., Convertino, M. (2021) Temperature increase drives critical slowing down of fish ecosystems. PLOS ONE 16(10): e0246222. DOI: https://doi.org/10.1371/journal.pone.0246222
6. Li, J., Convertino, M. Inferring ecosystem networks as information flows. Sci Rep 11, 7094 (2021). DOI: https://doi.org/10.1038/s41598-021-86476-9
7. Li, J., Convertino, M. Optimal Microbiome Networks: Macroecology and Criticality. Entropy. 2019; 21(5):506. DOI: https://doi.org/10.3390/e21050506

Patents

1. Zhengdi Qin, Jie Li, Tianjiong Zhang, Siping Chen. “Digital TCD Ultrasound Blood Flow Detection System” [P]. Chinese Patent: 2015200954772, 2015.7.29. (Chinese Patent)

Conferences

1. Shaoxing Li, Tianjiong Zhang, Jie Li, et al. “Experimental Study on Digital Design of Doppler Ultrasound with Coded Excitation”, 4th AMITP, 24-26 Sep. 2016, Guilin, China.
2. Li, J., Diao X., Zhan K., Qin Z. (2015) "A Full Digital Design of TCD Ultrasound System Using Normal Pulse and Coded Excitation". In: Su FC., Wang SH., Yeh ML. (eds) 1st GCBME & 9th APCMBE. IFMBE Proceedings, vol 47. Springer, Cham. DOI: https://doi.org/10.1007/978-3-319-12262-5_38
3. Jie Li, Kai Zhan, Panpan Liu, Xiaonian He, Zhengdi Qin. “Coded Excitation System for Stationary Target Detection Using Multi Segment Coding”, IEIT2014, pp.395-399, 16-18 May 2014, Tianjin, China.
4. Xiaonian He, Kai Zhan, Jie Li, Panpan Liu, Zhengdi Qin. "An application on adding window technology in truncated long code ultrasound system", ICSPCC2013, pp.1-3, 5-8 Aug. 2013.
5. Panpan Liu, Xianfen Diao, Jie Li, Kai Zhan, Xiaonian He, Zhengdi Qin. “Orthogonal frequency ultrasound vibration pulses for SDUV”, ICSPCC2013, pp.1-4, 5-8 Aug. 2013.

Presentations

1. Jie Li. “Multilayer Networks of Cardiovascular Diseases and Depression via Multipartite Projections”. Dutch NetSci Summer Symposium 2023 (Dutch NetSci2023), Delft, The Netherlands, August 30-31, 2023.
2. Jie Li, Arja Rydin, Rick Quax. “Multilayer disease networks via multipartite projections: linking risk factors to CVD-depression multi-morbidities via molecular mediators”. NetSci2023, Vienna, Austria, July 10-14, 2023.
3. Jie Li, Stavroula Tassi, Rick Quax. “A network-based approach to identifying synergistic triplets in high-dimensional data”. IPCS2022/CCS2022, Palma de Mallorca, Spain, October 17-21, 2022.
4. Jie Li, Matteo Convertino. “Taming Network Inference: Optimal Information Flow Model”. CCS2020, Online, December 4-11, 2020.
5. Jie Li, Matteo Convertino. “Computational and Applied Biocomplexity: Patterns, Connections and Design”. BASF, IBM and Syngenta, Research Triangle Park, Durham, NC, USA, December 2-7, 2019.
6. Jie Li, Matteo Convertino. “Optimal Microbiome Networks: Macroecological Characterization and Criticality”, CCS 2019, Singapore, September 30-October 04, 2019. (Accepted)
7. Jie Li, Matteo Convertino. “Model vs Data Centrality: Probing Transfer Entropy”. 2019 Summer International Symposium on Big-Data, Cybersecurity and IoT at Sapporo, Japan, August 8-9, 2019.
8. Jie Li, Matteo Convertino. “Taming Network Inference: Optimal Transfer Entropy Model”. 2018 Winter International Symposium on Big-Data, Cybersecurity and IoT at Sapporo, Japan, December 20-21, 2018.
9. Jie Li, Matteo Convertino. “Inference of Complex Microbiome Networks: Macroecology and Entropy Balance”. 2018 Summer International Symposium on Big-Data, Cybersecurity and IoT at Sapporo, Japan, August 7-8, 2018.
