OpenMinTeD Catalogue of Corpora

Find easily accessible corpora of scholarly content and mine them!

A catalogue of corpora (datasets) consisting mainly of Open Access scholarly publications. Users can view publicly available corpora that have been created with the OpenMinTeD Corpus Builder for Scholarly Works or manually uploaded to the OpenMinTeD platform. The catalogue can be browsed and searched via faceted navigation or a Google-like free-text search query. All users can view the descriptions of the corpora (with administrative and technical information such as language, domain, keywords, licence, and resource creator), as well as their contents and, when available, the metadata descriptions of the individual files that compose them. In addition, registered users can process the corpora with the text and data mining (TDM) applications offered by OpenMinTeD and download them in accordance with their licensing conditions.

The catalogue is intended for users who want to find corpora in various languages and domains that are easily accessible and ready to be processed with TDM applications. All corpora are described with a uniform metadata schema, which makes it easier to compare them and select the appropriate one.

Scientific categorisation
  • Generic
    • Generic
Target users
  • Researchers
Resource availability and languages
  • English

The EOSC Portal is operated by the EOSC Enhance (Grant Agreement no. 871160), EOSC-hub (Grant Agreement no. 777536), and OpenAIRE-Advance (Grant Agreement no. 777541) projects, funded by the European Union’s Horizon 2020 research and innovation programme. For a complete list of contributors, visit the About EOSC Portal page.