[PREPRINT] Large-scale computational analyses of gut microbial CAZyme repertoires enabled by Cayman

Ducarmon QR, Karcher N, Tytgat HLP, Delannoy-Bruno O, Pekel S, Springer F, Schudoma C, Zeller G, bioRxiv (2024).


Carbohydrate-active enzymes (CAZymes) are crucial for digesting glycans, but bioinformatics tools for CAZyme profiling and interpretation of substrate preferences in microbial community data are lacking. To address this, we developed a CAZyme profiler (Cayman) and a hierarchical substrate annotation scheme. Leveraging these, we genomically survey CAZymes in human gut microbes (n=107,683 genomes), which suggests novel mucin-foraging species. In a subsequent meta-analysis of CAZyme repertoires in Western versus non-Western gut metagenomes (n=4,281), we find that non-Western metagenomes are richer in fibre-degrading CAZymes despite lower overall CAZyme richness. We additionally pinpoint the taxonomic drivers underlying these CAZyme community shifts. A second meta-analysis comparing colorectal cancer (CRC) patients to controls (n=1,998) shows that CRC metagenomes are depleted in dietary fibre-targeting CAZymes but enriched in glycosaminoglycan-targeting CAZymes. A genomic analysis of co-localizing CAZyme domains predicts novel substrates for CRC-enriched CAZymes. Cayman is broadly applicable across microbial communities and freely available from https://github.com/zellerlab/cayman.