DATA - Speech and Language Resources


    TALP has a proven track record in creating speech and text resources. Since it was set up, it has participated in national and international projects to develop language tools and resources.

    In addition, the center has contributed to the setting of standards and is a member of the European Language Resources Association (ELRA).

    A number of linguistic resources, tools and open source software packages have been maintained since the center's inception. We have distributed expressive speech, speech synthesis, automatic speech recognition and multimodal databases under Open Access licenses, participating in the META-SHARE program. We also contribute to improving the FESTIVAL speech synthesis system for several languages.

    Resources have been created in several areas:

    • Multilingual Lexicons
    • Machine Translation Resources 
    • Speech databases
    • Multimodal Resources
    • Emotion databases
    • Generation of corpora
    • Generation of basic processing tools


    LEXICAL-SEMANTIC KNOWLEDGE ACQUISITION: Extracting structured knowledge about word meanings from text documents

    Knowledge representation involves the modeling of systems that use artificial intelligence to process information. The form it takes depends largely on the task for which the knowledge is required. The type and quantity of knowledge needed to perform a task are therefore taken into consideration, as is the approach adopted to encode and store it.

    We work on the acquisition of linguistic knowledge, particularly lexical and semantic knowledge, to enrich ontologies that are useful in NLP tasks and applications such as word sense disambiguation, machine translation and question answering.

    Large-scale lexical-semantic ontologies, rule systems and computational lexicons with different content and purposes (verb diathesis models, full and partial grammars, selectional restrictions, etc.) are the most commonly used structures.

    One line of research is the building and expansion of these ontologies by automatic and semiautomatic means: the syntactic and semantic analysis of large quantities of text makes it possible to learn and acquire new concepts and to create new relationships between them, which then become part of the knowledge stored in the ontology.
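
    As a rough illustration of how new concepts and relationships can be acquired from text, the sketch below uses a single lexico-syntactic pattern of the "X such as Y, Z and W" kind to propose hypernym relations and add them to a toy ontology. It is a minimal example under stated assumptions, not a description of the center's actual tools: the function names `extract_relations` and `enrich`, the dictionary-based ontology and the sample sentences are all hypothetical, and real systems rely on full syntactic and semantic analysis rather than one regular expression.

```python
import re
from collections import defaultdict

# Separators allowed between listed items: ", ", ", and ", ", or ", " and ", " or ".
SEPARATOR = r", (?:and |or )?| and | or "

# Illustrative pattern: "<hypernym> such as <item>, <item> and <item>".
PATTERN = re.compile(r"(\w+) such as (\w+(?:(?:" + SEPARATOR + r")\w+)*)")


def extract_relations(text):
    """Return (hyponym, hypernym) pairs proposed by the pattern (hypothetical helper)."""
    relations = []
    for match in PATTERN.finditer(text):
        hypernym = match.group(1).lower()
        for hyponym in re.split(SEPARATOR, match.group(2)):
            if hyponym:
                relations.append((hyponym.lower(), hypernym))
    return relations


def enrich(ontology, text):
    """Add the proposed relations to a toy ontology mapping hypernym -> set of hyponyms."""
    for hyponym, hypernym in extract_relations(text):
        ontology[hypernym].add(hyponym)
    return ontology


ontology = defaultdict(set)
enrich(ontology, "Fruits such as apples, pears and plums are sweet. "
                 "Instruments such as violins and cellos need tuning.")
for concept, members in sorted(ontology.items()):
    print(concept, "->", sorted(members))
```

    In a semiautomatic setting, the proposed pairs would typically be reviewed by a person before being stored, since surface patterns of this kind also produce spurious relations.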
