OpenMinTeD has set up a mechanism that provides access to scholarly and scientific content from a wide range of sources (publishers, repositories, journals, etc.) and lets users search this content and select the documents they want to mine. Selection is done through a faceted search or a Google-like natural text query over the harmonised metadata descriptions of the documents (e.g. publication year, keywords, domain); the selected documents together form a collection, or corpus. The OpenMinTeD registry provides content made available by two major content aggregators, OpenAIRE and CORE, and by other open access content providers. The OpenMinTeD Corpus Builder is unique in that it exploits the largest available body of Open Access scholarly content, brought together in one source and described in a harmonised way. Users can therefore select subsets with a single query and get direct access to the full text of the selected publications, instead of going through the APIs of the various content providers one by one and reformulating the query to match each provider's system in order to collect the publications that fit their research topic. They can then process this dataset with one of the TDM applications offered by the OpenMinTeD platform.
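The idea of selecting a corpus with a single query over harmonised metadata can be illustrated with a minimal sketch. It does not reproduce the OpenMinTeD Corpus Builder or its API: the `Record` class, its field names, the `build_corpus` function and the sample records are all hypothetical, invented here only to show how facet filters and a free-text query might be combined once every source is described with the same fields.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical harmonised metadata record for one publication."""
    title: str
    year: int
    domain: str
    source: str  # the aggregator or provider the record came from
    keywords: list[str] = field(default_factory=list)


def build_corpus(records, query="", **facets):
    """Return the records that match every facet value and every query term.

    Facets are exact matches on metadata fields (e.g. year=2017); the query
    is a naive substring match of each term against title and keywords.
    """
    terms = query.lower().split()
    corpus = []
    for rec in records:
        # Skip the record if any requested facet has a different value.
        if any(getattr(rec, name) != value for name, value in facets.items()):
            continue
        text = " ".join([rec.title, *rec.keywords]).lower()
        if all(term in text for term in terms):
            corpus.append(rec)
    return corpus


# Illustrative records only; they do not describe real publications.
records = [
    Record("Text mining of clinical notes", 2016, "medicine", "CORE", ["text mining"]),
    Record("Crop yield prediction", 2017, "agriculture", "OpenAIRE", ["machine learning"]),
    Record("Mining biomedical literature", 2017, "medicine", "OpenAIRE", ["text mining", "genes"]),
]

corpus = build_corpus(records, query="text mining", year=2017, domain="medicine")
print([rec.title for rec in corpus])
```

Because every record shares the same fields regardless of its source, one call covers content from all providers; without harmonisation, the same selection would need a separately formulated query per provider.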
Resource availability and languages