Compositional analyses of the Human Cell Atlas with sccomp

Author(s): Stefano Mangiola

Affiliation(s): Adelaide University



Large-scale single-cell resources such as the Human Cell Atlas present an unprecedented opportunity to explore fundamental questions in biology and immunology across organs and demographics. For example, analysing how immune composition changes across organs, with ageing, and between sexes and ethnicities can reveal aspects of human population diversity that should be considered in precision medicine research. These resources are also complex, and statistical analysis tools must account for that complexity. The first layer of complexity is the tree structure of our Human Cell Atlas-derived resource, which is organised by organ and includes multiple datasets. The second layer is bias in data coverage: tissue representation is starkly uneven, with a difference of two orders of magnitude between the most sampled tissue (blood) and the least sampled (e.g. rectum). The third layer is the variable age distribution among tissue samples, which makes analyses susceptible to Simpson's paradox. To navigate these challenges, `sccomp` (Mangiola et al., PNAS 2023) offers a robust solution for fitting arbitrary multilevel (random effect) linear models of categorical and continuous factors to single-cell composition data. Our demonstration will illustrate how `sccomp` efficiently manages these complexities to discern body-level and organ-level immune patterns. We will present examples demonstrating the tool's capacity to identify outliers and remove unwanted variation in complex multilevel models.