Readme
eMIND evaluation dataset
3/31/2022
Format: BioC

This set is composed of 60 abstracts that were selected from three different sources to provide a variety of examples about the impact of protein mutations on AD/ADRDs.

1-UniProt: We downloaded protein entry information from the AD disease portal (UniProtKB Release 2021_04), which contains proteins that are candidates for AD (Breuza et al., 2020). From these, we selected 15 publications that constitute the evidence for existing UniProt annotations of natural variants involved in AD/ADRDs, covering 12 different proteins. Specifically, we selected annotations whose note describes an impact after the mention of the disease name. For example, from P10636, VAR_019660, corresponding to p.Arg5His, has the annotation: (in FTD; reduces the ability of tau to promote microtubule assembly and promotes fibril formation in vitro; dbSNP:rs63750959) evidence="ECO:0000269|PubMed:11921059". We reviewed the examples and selected those where the impact information was present in the abstract.

2-AD computational bibliography set: The UniProt computationally mapped bibliography contains publications collected from a variety of sources. One source is our preliminary literature mining method, based on EDG and a small number of handcrafted rules about variants with impact on AD. We randomly selected 35 abstracts about variants with impact on the protein-related aspects of interest.

3-LitSuggest (Allot et al., 2021): We collected an additional set of abstracts directly from the literature using LitSuggest. The set of articles described in 2 was used as the positive set to train the LitSuggest classifier with impact in 'Alzheimer', with options include_mutation: true, include_gene: true. The negative set was automatically selected by LitSuggest. After training the classifier, the newly suggested articles were reviewed for relevance to impact, and based on this new annotation, the system was retrained on the new relevant articles.
From that set, the first 12 relevant publications were included in the evaluation set.

PubTator was used to pre-annotate entities of interest (genes, variants, disease, species). Validation of entities of interest and annotation of impact relations were made using the TeamTat text annotation tool (Islamaj et al., 2020).

The following impact types were annotated:
impact_protein_abundance
impact_protein_ptm
impact_protein_function_activity
impact_protein_process
impact_protein_localization
impact_Protein_processing
impact_protein_interaction
impact_protein_aggregation
impact_protein_structure
impact_other

Relation annotated: variant has_impact impact type

References:

Allot, A., Lee, K., Chen, Q., Luo, L., and Lu, Z. (2021). LitSuggest: a web-based system for literature recommendation and curation using machine learning. Nucleic Acids Res 49, W352–W358. doi:10.1093/nar/gkab326.

Breuza, L., Arighi, C. N., Argoud-Puy, G., Casals-Casas, C., Estreicher, A., Famiglietti, M. L., et al. (2020). A Coordinated Approach by Public Domain Bioinformatics Resources to Aid the Fight Against Alzheimer’s Disease Through Expert Curation of Key Protein Targets. J Alzheimers Dis 77, 257–273. doi:10.3233/JAD-200206.

Islamaj, R., Kwon, D., Kim, S., and Lu, Z. (2020). TeamTat: a collaborative text annotation tool. Nucleic Acids Res 48, W5–W11. doi:10.1093/nar/gkaa333.
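
Example (illustrative only): the sketch below shows one way the impact relations described in this Readme might be tallied from a BioC file. It assumes the BioC XML serialization and assumes that each relation stores its impact type in an infon with key "type"; neither detail is stated in this Readme, so check the actual files first. The embedded sample document, its ids, its text, and the node role names are hypothetical and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature BioC XML collection, NOT real dataset content.
# Assumption: relations carry their impact type in <infon key="type">.
SAMPLE = """<collection>
  <source>example</source>
  <document>
    <id>doc1</id>
    <passage>
      <offset>0</offset>
      <text>placeholder abstract text</text>
      <annotation id="1"><infon key="type">Variant</infon></annotation>
      <annotation id="2"><infon key="type">Impact</infon></annotation>
      <annotation id="3"><infon key="type">Impact</infon></annotation>
    </passage>
    <relation id="R1">
      <infon key="type">impact_protein_aggregation</infon>
      <node refid="1" role="variant"/>
      <node refid="2" role="impact"/>
    </relation>
    <relation id="R2">
      <infon key="type">impact_protein_function_activity</infon>
      <node refid="1" role="variant"/>
      <node refid="3" role="impact"/>
    </relation>
  </document>
</collection>"""


def count_relation_types(root):
    """Count relations by the text of their 'type' infon."""
    counts = Counter()
    # iter() finds relations at any depth (document or passage level).
    for relation in root.iter("relation"):
        for infon in relation.findall("infon"):
            if infon.get("key") == "type":
                counts[infon.text] += 1
    return counts


counts = count_relation_types(ET.fromstring(SAMPLE))
for impact_type, n in sorted(counts.items()):
    print(impact_type, n)
```

To run this on a downloaded file, replace ET.fromstring(SAMPLE) with ET.parse(path).getroot(), where path is the location of the BioC XML file.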