Index

Creator Datahub/agrovoc-skos#N7f07166ce767467ca7ea3d57015a1641
Description AGROVOC is a controlled vocabulary covering all areas of interest to the Food and Agriculture Organization of the UN (FAO). AGROVOC is available as an RDF/SKOS linked dataset. It consists of over 32,000 concepts, available in up to 21 languages, and linked to 16 other vocabularies and resources. For detailed information see: http://aims.fao.org/agrovoc
Rights http://creativecommons.org/licenses/by-nc/2.0/
Source DataHub
Title AGROVOC
Description Export of the ALPINO Dutch Treebank to RDF.
Rights http://www.opendefinition.org/licenses/cc-by-sa
Source DataHub
Title ALPINO RDF Treebank
Creator Datahub/analisi-del-blog-http-www-beppegrillo-it#N49543fe4f75b469b9ca5d43a5dd90852
Description Analysis of the blog http://www.beppegrillo.it/. The data run from January 2005 to February 2013. The package is divided into four datasets: data on individual posts; data on individual tags; trigrams extracted from the text of the posts; trigrams extracted from the comments. The analysis was carried out using Processing.
Rights http://www.opendefinition.org/licenses/cc-by
Source DataHub
Title Analisi del blog http://www.beppegrillo.it/
Creator Datahub/asit#Naed6ee713bbe4e94ae1cdb0c2937fb79
Description The Atlante Sintattico d'Italia, Syntactic Atlas of Italy (ASIt) enterprise builds on a long-standing tradition of collecting and analysing linguistic corpora, which has given rise to different efforts and projects over the years. ASIt accounts for minimally different variants within a sample of closely related languages, so it does not need a thorough part of speech (POS) disambiguation, since the "trivial" identification of basic POS (e.g. Nouns vs Verbs) is not enough to capture cross-linguistic differences between closely related languages. Secondly, the linguistic variants cannot be reduced to lexical distinctions only, i.e. syntactic differences are in general unpredictable on the basis of the properties of single lexical items. A specific tag set designed to capture sentence-level phenomena without taking into consideration POS tags is needed. As a consequence, while other tag sets are designed to carry out a gross linguistic analysis of a vast corpus, the ASIt tag set aims to capture fine-grained grammatical differences by comparing various dialectal translations of the same sentence. Moreover, in order to pin down these subtle asymmetries, the linguistic analysis must be carried out manually. To explain why the needs of ASIt are so special we have to take into consideration two different aspects: the nature of Italian dialects, and the kind of linguistic theory ASIt aims to interact with. The Italian dialectal area presents a kind of variation that involves parametric choices affecting many general aspects of syntax, morphology, and phonology. The kind of information we want to gather involves not only the presence of a certain element, but also the absence of an element; an element can be omitted only in some constructions and in conjunction with specific characteristics of the language.
For this reason, ASIt proposed the creation of a specific set of tags starting from a universal core shared by all languages (on the basis of the work done by DynaSAND), and subsequently developing a language-specific periphery which is compatible with other projects. Dialectal data stored in the ASIt were gathered during a twenty-year-long survey investigating the distribution of several grammatical phenomena across the dialects of Italy. These data and information were collected by means of questionnaires formed by sets of Italian sentences: dialectal speakers were asked to translate them into their dialects and write their translations in the questionnaire; therefore, each questionnaire is associated with many parallel dialectal translations. At present, there are eight different questionnaires written in Italian and almost 500 questionnaires, corresponding to the eight Italian questionnaires, written in more than 240 different dialects, for a total of more than 54,000 sentences and more than 40,000 tags stored in the data resource managed by the ASIt digital library system.
Rights http://www.opendefinition.org/licenses/cc-by-sa
Source DataHub
Title Atlante Sintattico d'Italia (ASIt)
Description ASJP collects 40 words from 5500 languages in a simplified phonetic representation. More background can be found at http://email.eva.mpg.de/~wichmann/ASJPHomePage.htm
Rights http://www.opendefinition.org/licenses/cc-by
Source DataHub
Title Automated Similarity Judgment Program lexical data
Description This is the Basque EuroWordNet-Lemon lexicon. The lexicon was created from the Spanish Word-Net-LMF lexicon which is part of the Multilingual Central Repository (MCR http://adimen.si.ehu.es/web/MCR). The lexicon conforms to the 'lemon' specification.
Rights http://www.opendefinition.org/licenses/cc-by
Source DataHub
Title Basque EuroWordNet-lemon lexicon (3.0)
Creator Datahub/brown-corpus-in-rdf-nif#N9975bc86baa043718518170e40b547b3
Description RDF version of the Brown Corpus (W. N. Francis, H. Kucera; Brown University; 1979). 1,014,312 words in 500 documents, taken from newspaper texts on diverse topics, non-fiction and fiction books, as well as government documents. The original corpus contains manually annotated sentence and token boundaries as well as word class annotations (POS; inflectional morphemes such as noun plural, verb tense and adjective comparison; and special tags for foreign words and proper nouns). The converted corpus contains complete texts reconstructed from the TEI/XML version of the Brown Corpus. Word classes were linked via OLiA to ontological categories for aggregated querying.
Rights http://www.opendefinition.org/licenses/cc-by-sa
Source DataHub
Title Brown Corpus in RDF/NIF
Description This is the Catalan EuroWordNet-Lemon lexicon. The lexicon was created from the Catalan Word-Net-LMF lexicon which is part of the Multilingual Central Repository (MCR http://adimen.si.ehu.es/web/MCR). The lexicon conforms to the 'lemon' specifications and it is linked to the Catalan Parole/Simple lemon lexicon (http://lod.iula.upf.edu/resources/metadata_SimpleParole-Catalanv2).
Rights http://www.opendefinition.org/licenses/cc-by
Source DataHub
Title Catalan EuroWordNet-lemon lexicon (3.0)
Creator Datahub/chat-game-corpus#N6617551eedae4a93b77ce1cd78d9778c
Description A corpus resulting from an object arrangement game using a computer-mediated setting.
Rights http://www.opendefinition.org/licenses/odc-by
Source DataHub
Title Chat Game corpus
Creator Datahub/clean-energy-data-reegle#N50381869686f4f0b9f289ce5ef4e9508
Description Comprehensive set of linked clean energy data including policy and regulatory country profiles, key stakeholders (organisation profiles), project outcome documents, and a thesaurus (SKOS format) on renewables, energy efficiency and climate change, for public re-use.
Rights http://reference.data.gov.uk/id/open-government-licence
Source DataHub
Title Linked Clean Energy Data (reegle.info)
Description AfBo: A world-wide survey of affix borrowing published by the CLLD project
Rights http://www.opendefinition.org/licenses/cc-by-sa
Source DataHub
Title CLLD-afbo
Description APiCS Online published by the CLLD project
Rights http://www.opendefinition.org/licenses/cc-by-sa
Source DataHub
Title CLLD-APICS
Description eWAVE published by the CLLD project
Rights http://www.opendefinition.org/licenses/cc-by-sa
Source DataHub
Title CLLD-EWAVE
Description PHOIBLE Online published by the CLLD project
Rights http://www.opendefinition.org/licenses/cc-by
Source DataHub
Title CLLD-PHOIBLE
Description SAILS Online published by the CLLD project
Rights http://creativecommons.org/licenses/by-nc/2.0/
Source DataHub
Title CLLD-SAILS
Description WALS Online published by the CLLD project
Rights http://www.opendefinition.org/licenses/cc-by
Source DataHub
Title CLLD-WALS
Description WOLD published by the CLLD project
Rights http://www.opendefinition.org/licenses/cc-by-sa
Source DataHub
Title CLLD-WOLD
Description WordNet-like concept network developed at MIT. ConceptNet aims to give computers access to common-sense knowledge, the kind of information that ordinary people know but usually leave unstated. The data in ConceptNet is collected from ordinary people who contribute it on sites like Open Mind Common Sense. ConceptNet represents this data in the form of a semantic network, and makes it available to be used in natural language processing and intelligent user interfaces. ConceptNet is an open source project, with a Python implementation and a REST API that anyone can use to add computational common sense to their own project. A great tool to help you use ConceptNet in your software is Divisi. Licences: Creative Commons 3.0 Attribution (data set); GPL 3.0 (data + tools).
Rights http://www.opendefinition.org/licenses/cc-by
Source DataHub
Title ConceptNet
Creator Datahub/core#Nbc9465d70d9c4472a54babc5395b2c41
Description The CORE dataset contains information about similarities between scientific papers stored across Open Access repositories. The similarities are calculated using Natural Language Processing techniques based on the full text. The similarities are provided only for research articles with an accessible and machine-readable full text. More information about the data structure can be found at: http://core-project-local.kmi.open.ac.uk/data-description. RDF statistics: at the moment we expose more than 92 million RDF triples describing similarities calculated on a set of more than 400k full-text articles harvested from over 230 Open Access repositories. Links: the data about the similarities are represented using the MuSIM ontology (http://kakapo.dcs.qmul.ac.uk/ontology/musim/0.2/musim.html) and BIBO ontologies (http://bibliontology.com/), with links to the OAI (RKBExplorer) repository available in the Linked Data cloud.
Rights http://www.opendefinition.org/licenses/cc-by
Source DataHub
Title CORE - Semantic Similarity of Open Access publications
Creator Datahub/dbnary#Nf82de97a0daa4f518eac989b52108162
Description Extracts of Wiktionary data for several languages, structured as an RDF graph, based mainly on the LEMON model. Languages covered: English, Finnish, French, German, Greek, Italian, Japanese, Portuguese, Russian and Turkish.
Rights http://www.opendefinition.org/licenses/cc-by-sa
Source DataHub
Title dbnary