Text Simplification in Spanish
Research and collaboration on automatic text simplification
Problem Context
Many academic, scientific, and institutional texts are written using complex vocabulary, long sentences, and dense syntactic structures. While this level of complexity is appropriate for expert audiences, it often creates significant barriers to comprehension for non-expert readers, students, and people with reading or visual disabilities. In Spanish, these challenges are amplified by the limited availability of high-quality simplified corpora, standardized guidelines, and evaluation resources compared to other high-resource languages.
Manual text simplification is a time-consuming process that requires specialized linguistic knowledge, making it difficult to scale in real-world settings. As a result, there is a growing demand for automatic text simplification (ATS) systems that can reduce textual complexity while preserving meaning, coherence, and grammatical correctness. Achieving this balance remains an open research problem, particularly for morphologically rich languages such as Spanish.
This project addresses these challenges through an interdisciplinary collaboration with the Escuela de Ciencias del Lenguaje at the Instituto Tecnológico de Costa Rica, combining linguistic expertise with computational methods in natural language processing and machine learning, with a strong focus on accessibility and social impact.
Research Directions
The research is structured around the following core directions, developed jointly with researchers from the Escuela de Ciencias del Lenguaje:
- Linguistically Informed Text Complexity Analysis
  Identification of lexical, syntactic, and semantic indicators of complexity in Spanish texts, grounded in linguistic theory and empirical analysis.
- Automatic Simplification Methods
  Exploration of rule-based, statistical, machine learning, and large language model approaches to lexical substitution, sentence splitting, and syntactic restructuring, informed by linguistic constraints.
- Accessibility-Focused Applications
  Application of the proposed methods to educational materials, scientific communication, and assistive technologies, particularly to support users with visual impairments or reading difficulties.
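To make the idea of lexical and syntactic complexity indicators more concrete, the following is a minimal sketch of a few surface-level measures that are commonly used as rough proxies: average sentence length, average word length, type-token ratio, and the share of long words. The function name `surface_complexity`, the `long_word_len` threshold, and the naive punctuation-based sentence split are illustrative assumptions, not the project's actual indicators or tooling.

```python
import re


def surface_complexity(text: str, long_word_len: int = 10) -> dict:
    """Compute rough surface indicators of text complexity.

    Illustrative only: the indicator set and the long-word threshold
    are assumptions, not the indicators defined by the project.
    """
    # Naive sentence split on terminal punctuation; a real system would
    # use a Spanish-aware sentence segmenter.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Runs of Unicode letters, so accented characters and "ñ" are kept
    # while digits, underscores, and marks such as "¿" are dropped.
    words = re.findall(r"[^\W\d_]+", text.lower())

    if not sentences or not words:
        return {
            "avg_sentence_len": 0.0,
            "avg_word_len": 0.0,
            "type_token_ratio": 0.0,
            "long_word_ratio": 0.0,
        }

    return {
        # Words per sentence: a crude proxy for syntactic load.
        "avg_sentence_len": len(words) / len(sentences),
        # Characters per word: a crude proxy for lexical difficulty.
        "avg_word_len": sum(len(w) for w in words) / len(words),
        # Distinct words over total words: lexical variety.
        "type_token_ratio": len(set(words)) / len(words),
        # Share of words at or above the (assumed) length threshold.
        "long_word_ratio": sum(len(w) >= long_word_len for w in words) / len(words),
    }


if __name__ == "__main__":
    sample = "El gato duerme. El perro come."
    for name, value in surface_complexity(sample).items():
        print(f"{name}: {value:.3f}")
```

Measures like these capture only surface form. Semantic indicators and linguistically grounded syntactic features, as described in the first research direction, would require parsing and lexical resources beyond this sketch.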
Through this collaboration between language sciences and artificial intelligence, the project aims to advance automatic text simplification in Spanish and contribute to the development of more inclusive and accessible language technologies.