Aggregation tests identify new gene associations with breast cancer in populations with diverse ancestry
Institution: University College London
Corresponding Researcher: Karoline Kuchenbaecker
Data Link(s): Gene aggregation results for all genes and all analyses, as well as the code used in the analysis for this manuscript, are available in the following GitHub repository: https://github.com/stef-mueller/BCAC_genotype_aggregation_analysis. Code for running mummy on genotypes is available in a public GitHub repository: https://github.com/stef-mueller/mummy_for_genotypes. An implementation of MONSTER adapted for analyzing large-scale genotype data is available on GitHub: https://github.com/stef-mueller/MONSTER. Annotation sources used in this project are (1) ClinVar, https://www.ncbi.nlm.nih.gov/clinvar/; (2) MalaCards, https://www.malacards.org/; (3) Genetics Home Reference, https://medlineplus.gov/genetics/; (4) COSMIC Cancer Gene Census data, https://cancer.sanger.ac.uk/census. Summary statistics of GWAS data for breast cancer are available through the BCAC website: http://bcac.ccge.medschl.cam.ac.uk. The individual-level datasets analyzed during the current study are not publicly available because of participant privacy and confidentiality protections and the ownership of the contributing institutions, but they may be made available in anonymized form via the corresponding author on reasonable request and after approval by the involved institutions. To request access to the data, a concept form must be submitted for review by the BCAC Data Access Coordination Committee (DACC); see http://bcac.ccge.medschl.cam.ac.uk/bcacdata/.
Keyword(s): low-frequency variants, susceptibility, European ancestry, Asian ancestry, African ancestry, Latin American ancestry, Hispanic ancestry
Summary
BACKGROUND. Low-frequency variants play an important role in breast cancer (BC) susceptibility. Gene-based methods can increase statistical power by combining multiple variants in the same gene and can help identify target genes. METHODS. We evaluated the potential of gene-based aggregation in the Breast Cancer Association Consortium cohorts, including 83,471 cases and 59,199 controls. Low-frequency variants were aggregated within the coding and regulatory regions of individual genes. Gene-based association results in European-ancestry samples were compared with single-marker association results in the same cohort. Gene-based associations were also combined in a meta-analysis across individuals of European, Asian, African, and Latin American and Hispanic ancestry. RESULTS. In European-ancestry samples, 14 genes were significantly associated (q < 0.05) with BC. Of those, two genes, FMNL3 (P = 6.11 × 10−6) and AC058822.1 (P = 1.47 × 10−4), represent new associations. High FMNL3 expression has previously been linked to poor prognosis in several other cancers. Meta-analysis of samples with diverse ancestry identified additional associations, including the established candidate genes ESR1 and CBLB. Furthermore, literature review and database queries provided further support for a biologically plausible link with cancer for the genes CBLB, FMNL3, FGFR2, LSP1, MAP3K1, and SRGAP2C. CONCLUSIONS. Using extended gene-based aggregation tests that include coding and regulatory variation, we identified plausible target genes for previously reported single-marker associations with BC as well as novel genes implicated in BC development. Including multi-ancestry cohorts in this study enabled the identification of disease associations that would otherwise have been missed, such as ESR1 (P = 1.31 × 10−5), demonstrating the importance of diversifying study cohorts.
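The two statistical steps named in the summary, declaring genes significant at q < 0.05 and combining gene-level results across ancestry groups, can be illustrated with a minimal Python sketch. The summary does not state which false-discovery-rate procedure or which meta-analysis method was used, so this sketch assumes the Benjamini-Hochberg procedure for q-values and a weighted Stouffer Z combination of per-ancestry gene-level p-values (with weights such as the square root of each group's sample size). The function names are hypothetical and are not taken from the linked repositories.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order.

    Assumption: BH is used here for illustration; the study's exact FDR
    procedure is not stated in the summary.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def stouffer_meta(pvalues, weights):
    """Weighted Stouffer combination of per-group p-values into one p-value.

    Assumption (hypothetical choice): each p-value is mapped to an upper-tail
    Z score, and weights might be sqrt(sample size) per ancestry group.
    """
    z_scores = []
    for p in pvalues:
        p = min(max(p, 1e-300), 1 - 1e-12)  # keep inv_cdf inside (0, 1)
        z_scores.append(-_STD_NORMAL.inv_cdf(p))  # small p -> large positive Z
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sum(w * w for w in weights) ** 0.5
    combined_z = numerator / denominator
    # Upper-tail probability, computed as cdf(-Z) to keep precision for small p.
    return _STD_NORMAL.cdf(-combined_z)


if __name__ == "__main__":
    gene_p = [0.01, 0.04, 0.03, 0.2]  # toy gene-level p-values, not study data
    print(bh_qvalues(gene_p))
    print(stouffer_meta([0.01, 0.01], [1.0, 1.0]))
```

With the toy input above, the two smallest p-values both map to q-values above 0.04, showing how the adjustment accounts for the number of genes tested. Any real analysis would use the gene-level results and methods in the linked repositories rather than this sketch.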