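You Yijun, Liang Dandan, Chen Tianlu. A preliminary comparison of four correlation analysis methods for association between microbiota and metabolites [J]. Translational Medicine Journal, 2018, 7(2): 93-96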
A preliminary comparison of four correlation analysis methods for association between microbiota and metabolites
  
Keywords: Correlation analysis; Metabolomics; Microbiome; Translational medicine
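Funding: National Natural Science Foundation of China (31501079, 31500954, 81772530); Intramural Pre-research Program of Shanghai Jiao Tong University Affiliated Sixth People's Hospital (2017)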
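Authors and affiliations
You Yijun, Center for Translational Medicine, Shanghai Jiao Tong University Affiliated Sixth People's Hospital
Liang Dandan, Center for Translational Medicine, Shanghai Jiao Tong University Affiliated Sixth People's Hospital
Chen Tianlu, Center for Translational Medicine, Shanghai Jiao Tong University Affiliated Sixth People's Hospital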
Abstract:
      Objective: High-throughput omics data are large and information-rich, and the relationships among their variables are complex. Correlation analysis helps identify valid correlated pairs in such data and is a common tool in translational medicine and systems biology. Because metagenomics and metabolomics both support holistic, system-level analysis, the two platforms are widely used to study associations between microbiota and metabolites. Metagenomic and metabolomic data differ in source, structure, and characteristics, so the correlation method must be chosen carefully for high-quality cross-omics research.
      Methods: Four typical correlation analysis methods were selected (two classical methods and two methods designed specifically for compositional metagenomic data), and their performance was tested and compared on simulated and experimental datasets.
      Results: On both simulated and real data, CCLasso yielded the smallest correlation coefficients, the largest percentage error, and the fewest correlated pairs. SparCC showed the opposite pattern. The results of Pearson and Spearman fell between the two.
      Conclusion: For correlation analysis between metagenomic and metabolomic data, CCLasso is relatively stringent and prone to false negatives, SparCC is relatively permissive and prone to false positives, and Pearson and Spearman lie in between. Researchers should choose a method according to the aims and emphasis of their study.