Index

Contact Point Metashare/33e0098e177c11e2bc05842b2b6a04d7365c11457d854e17a8331a6ec735305b#contact Person
Language English
Source META-SHARE
Subject everydayScenes
human human interaction
Title POETICON Multisensory and Multimedia Recordings of Everyday Interaction
Σώμα Πολυαισθητηριακών και Πολυμεσικών Καταγραφών Καθημερινής Διάδρασης POETICON
Type Corpus
Contact Point Metashare/3ccd9728c75911e1ae69000e0c4ad2262c2dadbb610b4c7a85dbaf4bb172e8b4#contact Person
Description This resource is intended to upgrade the CHIL2007+ corpus with two newly recorded and annotated audiovisual technical meetings. The new recordings are in Spanish and Catalan. A relevant objective is to include situations in which the semantic content cannot be extracted from a single modality alone.
Language Catalan
Spanish
Rights MS-NC-NoReD-ND
Source META-SHARE
Subject Video and audio technologies in a smart room
Title TM2: Technical Meetings
Type Corpus
Contact Point Metashare/354d05da63ce11e2861d842b2b6a04d78e6cbafa36c242728cefc76344f14c0a#contact Person
Description This is a bilingual (Greek - English) comparable corpus of news texts that pertain to the following domains: Technology, Politics, Entertainment (Culture, Sports), News (Terrorism, Economy), and Science (Physics, Health). The corpus amounts to approximately 3.5M words collected from various websites. The texts have been classified according to selected elements of the IPTC subject reference system and, subsequently, vertical clustering and horizontal mapping have been performed.
Language English
Greek (modern)
Rights underNegotiation
Source META-SHARE
Subject various
Title Bilingual Greek-English Comparable Corpus of News Texts
Type Corpus
Language French
Source Information Retrieval Facility (via CLARIN VLO)
Subject domain specific
Title MAREC
Language Spanish
Source Universitat de Barcelona (via CLARIN VLO)
Subject general
Title Spanish Wordnet 3.0
Description The Mannheimer Korpus historischer Zeitungen und Zeitschriften (Mannheim Corpus of Historical Newspapers and Magazines) consists of 21 German-language newspapers and magazines from the 18th and 19th centuries and comprises about 750 individual issues with a total of 3,532 printed pages. The corpus was compiled and digitized between 2009 and 2011.
Language German
Source IDS Repository (via CLARIN VLO)
Subject Historical corpus
Title Mannheimer Korpus historischer Zeitungen und Zeitschriften
Contributor Radboud University Nijmegen; Museum het Rembrandthuis
Creator Radboud University Nijmegen
Description RemDoc is a digital collection of primary documents that relate to the life and works of Rembrandt van Rijn (1606-1669), produced in the 15th to 18th centuries. RemDoc aims to collect and make available all known documents that relate to Rembrandt, as a person and as an artist, as well as to his ancestors and family. To date, more than 1100 documents have been included in the database.
Language English
Rights Copyright of the individual documents resides with their original author
Source Huygens Metadata Repository (via CLARIN VLO)
Subject Primary sources; historical documents; Rembrandt van Rijn
Title RemDoc
Type http://purl.org/net/def/metashare#corpus
Contributor Netherlands Institute for Art History (RKD)
Creator Netherlands Institute for Art History (RKD)
Description RKDexplore allows the user to browse all of the RKD's collections in one search. RKDartists is a database containing biographical data of Dutch and foreign artists from the Middle Ages to modern times. RKDimages is a database containing descriptions and images of mainly Dutch paintings, drawings, prints and original photographs from before WWII.
Language English
Rights The copyright resides with RKD
Source Huygens Metadata Repository (via CLARIN VLO)
Subject Art history collections; artists; works of art
Title RKDexplore
Type http://purl.org/net/def/metashare#corpus
Contributor Radboud University Nijmegen
Creator Radboud University Nijmegen
Description RUQuest offers access to full-text articles from e-journals to which the University Library subscribes, as well as to the collection of e-books. The same search also covers the Radboud Repository and the printed books and journals from the University Library Catalogue.
Language English
Rights The copyright resides with Radboud University
Source Huygens Metadata Repository (via CLARIN VLO)
Subject Library catalog
Title RUQuest
Type http://purl.org/net/def/metashare#corpus
Contributor Esther van Gelder
Creator Esther van Gelder
Description Website Edition of letters from/to Carolus Clusius (1526-1609)
Language English
Italian
German
Dutch
Latin
French
Spanish
Source Huygens Metadata Repository (via CLARIN VLO)
Subject Letters from/to Carolus Clusius (1526-1609)
Title Clusius correspondence
Contributor Academia Sinica Computing Centre
Creator Academia Sinica Computing Centre
Description Institute of Information Science, Institute of Linguistics, ASCC. All Rights Reserved.
Rights This notice regulates your usage of this web site and its associated services including interface, corpus data, segmenting and tagging standard, etc. All rights are reserved by Academia Sinica. In your research you may apply the data resulting from the searching processes of our interface systems. However, you are prohibited to abstract, alter or publish any searching results voluntarily. The copyright of corpus data is still reserved by original author or source and cannot be reproduced, copied or violate anything involving intellectual property.
Source Academia Sinica Balanced Corpus of Modern Chinese (via CLARIN VLO)
San-sui-ping-yao-zhuan
Subject en-us
Title Academia Sinica Tagged Corpus of Early Mandarin Chinese
Type http://purl.org/net/def/metashare#corpus
Contributor leeve kazaalae
Description The Formosan languages belong to a widespread language family called "Austronesian", which includes all the languages spoken throughout the islands of the Pacific and Indian Oceans (Madagascar, Indonesia, the Philippines, Taiwan, New Guinea, New Zealand, Hawaii and the islands of Micronesia, Melanesia and Polynesia). A few languages are found on the Malay peninsula and the Indo-Chinese peninsula (Vietnam and Cambodia). The Formosan languages exhibit very rich linguistic diversity, and the variation among dialects and languages is enormous. These languages are extremely useful in comparative work, but although they have been known to be on the verge of extinction for years, Formosan linguistics as a specific field has bloomed only very recently, with more scholars adopting different contemporary linguistic approaches to investigate individual languages or establish cross-linguistic comparisons. Unlike Chinese, the Formosan languages do not have any writing system, and the lack of written records limits our knowledge of extinct languages. Today, while elders are still able to speak their mother tongues fluently, the young cannot, as a result of migration to the cities and the prevalence of Mandarin Chinese in everyday life. We are currently attempting to record and maintain these languages, but we believe that collecting and/or editing existing texts (sentences, textbooks, folktales, narratives) in a digital format constitutes the most precious legacy for future generations. To achieve our goal, we hope that more people will devote themselves to the study of the Formosan languages and join our project. The Formosan Language Archive contains: (i) texts, (ii) a geographical information system and (iii) four databases on related publications.
1. Texts: During the first year of the project, we focused on Rukai, a Formosan language that stretches across the south of Taiwan and includes six dialects (Mantauran, Maga, Tona, Budai, Labuan and Tanan). We provide a search system that enables users to choose one of these dialects and download recorded texts. Each text is divided into sentences, and every sentence is translated into both Chinese and English. Glosses allow users to understand the meaning of each word. Users can also listen to the pronunciation of each sentence through the recorded sound file. Every word is analyzed, with morphemes separated by hyphens. The search function shows how the various affixes of the language are used and gives the lexical category of each word. Currently, information on the Mantauran dialect can be searched online.
2. Geographical Information System: The geographical information system permits a search of basic lexical items in the Formosan languages, an identification of cognates and non-cognates, and their mapping onto the map of Taiwan.
3. Related Publications: Over the past few months, we have constructed four databases that permit publication queries pertaining to linguistics, language teaching, literature and music.
Rights Copyright 2001 Institute of Linguistics (Preparatory Office), Academia Sinica. All rights reserved.
Source Academia Sinica Balanced Corpus of Modern Chinese (via CLARIN VLO)
Language: Rukai, Dialect: Mantauran, Informant: Yu-zhi Lu, Fieldworker: Elizabeth Zeitoun and Hui-chuan Lin, Data collected: 1992, 1997-1999, Chinese and English Translations: 1992-2001, Proof-reading and editing: 1999-2001. The present volume aims at narrating the memories of our late Mantauran (Rukai) informant, Lu Yu-zhi, who passed away on May 6, 2000, as they were recorded between August 1992 and November 1998, then later edited and revised between January 1999 and May 2001. The volume is divided into two major parts: the first part consists of 178 paragraphs translated into Chinese and English with ethnographic illustrations (maps, photos and additional data). The second part provides morphemic analyses, glosses and linguistic annotations. An index provides a list of major lexical items (derivations are not included, as they will appear in Zeitoun c). This work represents the result of years of collaboration. Elizabeth Zeitoun began fieldwork on Mantauran (Rukai) in August 1992 and later trained Hui-chuan Lin in ethno-linguistics (Sept 1997~), who eventually published a series of textbooks on Mantauran (Lin 1999). The investigation out of which the present volume grew began as an exploration into the life of our late informant and the discovery - for both authors - of a fascinating world but was not, in the early stages, directed toward the writing of her memories. Two stories - the first on marriage, the second on childbirth - were collected along with other folktales in August 1992 during the very first period of fieldwork on Mantauran. The others were recorded between November 1997 and November 1998 as short paragraphs to illustrate lexical items of the Thematic dictionary (see Lin and Zeitoun 1997) that we were, at the time, compiling. When it became apparent that these narratives were too long and did not fit into a dictionary, we decided to put them together in a separate volume where we conserved, however, the major themes that formed the basis of the Thematic dictionary. We re-organized and edited the data in such a way that it could read as a novel. The manuscript was revised and corrected over the years (January 1999 ~ May 2001) but the original (i.e., Mantauran) version was finished, entirely read to Lu Yu-zhi and approved by her during our last fieldwork sessions in January 1999.
Subject Dialect Mantauran
Title Academia Sinica Formosan Language Digital Library
Type Sound
Contributor Cornell Language Acquisition Laboratory (CLAL)
Creator Parkinson, David
Description The CLAL-Parkinson corpus of Inuktitut child language data includes language from at least 30 children between 3.10.04 and 6.10.22 (years, months, days) (mean age: 5.06.27) acquiring Inuktitut as their first language. Many were recruited at the Angmak elementary school in Arviat, Nunavut, Canada or directly through parents in Arviat.
Rights Any technical documentation that is made available by Cornell University, Cornell Language Acquisition Lab is the copyrighted work of Cornell University, Cornell Language Acquisition Lab, or the Virtual Linguistics Laboratory
Source Cornell Language Acquisition Laboratory CLAL (via CLARIN VLO)
Subject Inuktitut
Title First Language Acquisition of Inuktitut
Creator Universitat Pompeu Fabra (UPF)
Description Given a lemma and a category, this WS returns the sentences of the IULA corpus where this lemma occurs. The user can perform a domain search. The languages supported are Spanish and English.
Source IULA UPF Centre de Competencia CLARIN (via CLARIN VLO)
Subject NLP service, Querying, Corpus Processing
Title Service - IULA concordancer Web Service
Type http://purl.org/net/def/metashare#toolService
Services
Creator M. Dolores Molina González
Description iSOL is a list of domain-dependent opinion signal words in Spanish. The domain is movie reviews. The list was built using a corpus-based approach, in this case from the Spanish Movie Reviews corpus.
Rights This resource is licensed under a Creative Commons Attribution 3.0 Unported License (http://creativecommons.org/licenses/by/3.0/). The availability status of the resource is: 'available-unrestrictedUse'.
Source IULA UPF Centre de Competencia CLARIN (via CLARIN VLO)
Subject 'language resources', 'lexical conceptual resource', 'monolingual lexicon'
Title eSOL
Creator Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
Description This workflow annotates the crawled input data with the FreeLing PoS tagger and creates a Weka file using the given regular expressions and gold standard. This Weka file is then classified using a Naive Bayes classifier and the given model. An image preview of the workflow can be found at: http://myexperiment.elda.org/workflows/68/versions/1/previews/full
Rights by-sa
Source IULA UPF Centre de Competencia CLARIN (via CLARIN VLO)
Subject NLP workflow, Taverna 2, noun classification, naive bayes classification, basicxces, crawled, panacea, tagging
Title Freeling tagging, weka creation and classification from crawled data
Creator Universitat Pompeu Fabra. Institut Universitari de Lingüística Aplicada (IULA)
Description This workflow can be used to check whether web services are compliant with the Panacea Common Interface. The input data is a list of "function separator WSDL" entries. Function can be: SentenceSplitter, Tokenizer, Lemmatizer, POSTagger, ShallowParser, Parser, TextAligner, CQPIndexer, CQPQuerier, Concordancer, NgramsCooccurrences, Crawler, NERecognition, TermExtraction, DictionaryLookup. Example: "POSTagger separator http://ws04.iula.upf.edu/soaplab2-axis/typed/services/morphosintactic_tagging.freeling_tagging?wsdl". An image preview of the workflow can be found at: http://myexperiment.elda.org/workflows/25/versions/1/previews/full
Rights by-sa
Source IULA UPF Centre de Competencia CLARIN (via CLARIN VLO)
Subject NLP workflow, Taverna 2, panacea, ci, common interface, validation
Title Panacea Common Interface validation for Soaplab web services
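The input format named in the description of this workflow ("function separator WSDL") can be illustrated with a minimal Python sketch. It assumes that the separator is the literal token "separator", as it appears in the single example given, and that the three parts are divided by whitespace; neither assumption is confirmed by the record. The names `ServiceEntry` and `parse_entry` are hypothetical and are not part of Panacea or Taverna. The sketch only parses an entry and does not contact any web service.

```python
from typing import NamedTuple

# Function names exactly as listed in the workflow description.
KNOWN_FUNCTIONS = {
    "SentenceSplitter", "Tokenizer", "Lemmatizer", "POSTagger",
    "ShallowParser", "Parser", "TextAligner", "CQPIndexer", "CQPQuerier",
    "Concordancer", "NgramsCooccurrences", "Crawler", "NERecognition",
    "TermExtraction", "DictionaryLookup",
}


class ServiceEntry(NamedTuple):
    """One parsed input entry (hypothetical helper type)."""
    function: str
    wsdl_url: str


def parse_entry(line: str, separator: str = "separator") -> ServiceEntry:
    """Split one 'function separator WSDL' line into function and WSDL URL.

    Assumption: the separator is the literal token from the example,
    surrounded by whitespace.
    """
    parts = line.split()
    if len(parts) != 3 or parts[1] != separator:
        raise ValueError(f"expected 'function {separator} WSDL', got: {line!r}")
    function, _, wsdl_url = parts
    if function not in KNOWN_FUNCTIONS:
        raise ValueError(f"unknown function: {function!r}")
    return ServiceEntry(function, wsdl_url)


# The example entry from the description.
example = (
    "POSTagger separator "
    "http://ws04.iula.upf.edu/soaplab2-axis/typed/services/"
    "morphosintactic_tagging.freeling_tagging?wsdl"
)
entry = parse_entry(example)
print(entry.function)
print(entry.wsdl_url)
```

Checking the function name against the listed set catches misspelled entries before any service is validated; if the workflow actually uses a different separator, it can be passed through the `separator` argument.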
Creator Universitat Pompeu Fabra (UPF)
Description This WS creates an alignment file combining the Hunalign output and two sentence ID lists extracted from GrAF documents.
Source IULA UPF Centre de Competencia CLARIN (via CLARIN VLO)
Subject NLP service, Alignment, Format Conversion
Title Service - Hungalign to GrAF converter Web Service
Type http://purl.org/net/def/metashare#toolService
Services
Creator Universitat Pompeu Fabra (UPF)
Description PANACEA converter that creates GrAF documents from the output of PoS taggers (Freeling and IULA tagger). Input: Freeling: http://ws02.iula.upf.edu/panacea/examples/ws/freeling_tagging/freeling_tagging.output.example.txt IULA tagger: http://ws02.iula.upf.edu/panacea/examples/ws/iula_tagger/outputTagger.txt Output: GrAF set of files.
Source IULA UPF Centre de Competencia CLARIN (via CLARIN VLO)
Subject NLP service, Format Conversion
Title Service - Post tagging to GrAF converter Web Service
Type http://purl.org/net/def/metashare#toolService
Services
Creator Universitat Pompeu Fabra (UPF)
Description FreeLing-based chunker parser (v 3.0). Languages: English, Catalan, Spanish, Asturian and Galician. Input: plain text. Output: FreeLing output format, XML, or CQP-ready XML. Input example: http://ws02.iula.upf.edu/panacea/examples/ws/freeling_parsed/freeling_parsed.input.example.txt Output example: http://ws02.iula.upf.edu/panacea/examples/ws/freeling_parsed/freeling_parsed.output.example.txt Output XML example: http://ws02.iula.upf.edu/panacea/examples/ws/freeling_parsed/freeling_parsed.output.xml Output XML CQP example: http://ws02.iula.upf.edu/panacea/examples/ws/freeling_parsed/freeling_parsed.output.cqp.xml
Source IULA UPF Centre de Competencia CLARIN (via CLARIN VLO)
Subject NLP service, Corpus Processing, Syntactic Tagging
Title Service - FreeLing Chunker parser Web Service
Type http://purl.org/net/def/metashare#toolService
Services