MULTIMODAL MATCHING OF JOBS AND VET COURSES IN PORTUGAL: INTEGRATING SEMANTIC AND GEOSPATIAL ANALYSIS TO SUPPORT FUTURE SKILLS
1 NOVA Information Management School (NOVA IMS), Universidade Nova de Lisboa (PORTUGAL)
2 Nova School of Business and Economics (NOVA SBE), Universidade Nova de Lisboa (PORTUGAL)
About this paper:
Conference name: 20th International Technology, Education and Development Conference
Dates: 2-4 March, 2026
Location: Valencia, Spain
Abstract:
In recent years, researchers and policymakers have increasingly highlighted the persistent misalignment between higher-education supply and labor-market demand in Portugal. The OECD country note for Portugal (2022) reinforces this concern: despite relatively strong employment outcomes for graduates, employers still struggle to understand the skills that higher-education programmes develop, and national systems do not forecast emerging skill needs effectively. These findings motivate our study. To help reduce the gap between educational supply and labor-market demand, we conduct a multimodal analysis of programme–job alignment for tertiary-level vocational education and training (VET) in Portugal. We integrate diverse data sources and analytical approaches (semantic skill extraction, geospatial proximity and curricular clustering) to produce actionable insights for policymakers, curriculum designers and student-support systems.
We collect and integrate job listings from multiple online platforms and extract all VET course descriptions from Diário da República PDFs. After downloading and processing the documents, we use Gemini 2.5 Pro to summarize the main technical and non-technical skills taught in each course. We apply the same process to job postings, using Gemini 2.5 Pro to identify the most important skills required in each offer. This approach ensures a consistent skill representation across educational and employment datasets.
We compute semantic similarity scores for each course–job pair using Sentence-BERT embeddings and cosine similarity, quantifying the alignment between course skill clusters and job requirements. We also map all courses to their official CNAEF (National Classification of Education and Training Areas) codes, which allows us to identify the fields in highest demand, i.e., those with the highest number of recommended jobs per area.
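The scoring step can be illustrated with a minimal sketch. It assumes the skill summaries have already been turned into embedding vectors (for instance by a Sentence-BERT model); the three-dimensional vectors, the identifiers `job_a` and `job_b`, and the helper `rank_jobs_for_course` are hypothetical and only serve to show how cosine similarity orders jobs for one course.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for empty/zero embeddings
    return dot / (norm_u * norm_v)

def rank_jobs_for_course(course_vec, job_vecs):
    """Return (job_id, score) pairs sorted from most to least similar."""
    scores = {job_id: cosine_similarity(course_vec, vec)
              for job_id, vec in job_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy embeddings (assumption: real embeddings would have hundreds of dimensions).
course_vec = [0.9, 0.1, 0.3]
job_vecs = {
    "job_a": [0.8, 0.2, 0.4],  # points in nearly the same direction as the course
    "job_b": [0.1, 0.9, 0.0],  # points in a very different direction
}

ranking = rank_jobs_for_course(course_vec, job_vecs)
```

Because cosine similarity depends only on the direction of the vectors, a long and a short skill summary that cover the same topics can still score highly against each other.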
In the geospatial phase, we perform an exhaustive cross-matching between every VET course and every job posting. Using institutional geocodes (via geopy) and district centroids to approximate company locations, we rank each pair by proximity (≤ 30 km: High, 31–60 km: Medium, > 60 km: Low), generating over one million course–job matches. We further enrich this analysis by matching pairs located in similar districts, allowing for broader regional insights.
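A minimal sketch of the proximity ranking follows. It assumes coordinates are already available as (latitude, longitude) pairs and uses the haversine great-circle formula as the distance measure; the abstract does not state which distance computation was used, so that choice is an assumption. The coordinates are approximate, the identifiers are hypothetical, and distances between 30 and 60 km are treated as Medium, which is one reading of the stated bands.

```python
import math
from itertools import product

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def proximity_tier(distance_km):
    """Map a distance to the High / Medium / Low bands described in the text."""
    if distance_km <= 30:
        return "High"
    if distance_km <= 60:
        return "Medium"
    return "Low"

# Hypothetical inputs: one institution and two district centroids (approximate).
courses = {"course_lisbon": (38.7223, -9.1393)}
jobs = {
    "job_lisbon": (38.7223, -9.1393),
    "job_porto": (41.1579, -8.6291),
}

# Exhaustive cross-matching: every course is paired with every job posting.
matches = [
    (c_id, j_id, proximity_tier(haversine_km(c_xy, j_xy)))
    for (c_id, c_xy), (j_id, j_xy) in product(courses.items(), jobs.items())
]
```

The number of pairs grows as the product of the two collection sizes, which is how a few hundred courses and a few thousand postings can yield over a million matches.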
To capture emerging skills, we integrate data from ESCO, Skill2Vec, and the World Economic Forum’s Future of Jobs 2025 report. This enables us to flag VET courses that already include or align with skills projected to be in higher demand in the coming years.
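One simple way to picture the flagging step is as an overlap check between each course's extracted skills and a list of emerging skills. The sketch below is an assumption, not the method used in the study: it relies on exact matching after light text normalization, whereas matching against ESCO or Skill2Vec would likely be semantic. The skill names, course identifiers and the helper `flag_future_ready` are placeholders and are not taken from the cited sources.

```python
def normalize(skill):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(skill.lower().split())

def flag_future_ready(course_skills, emerging_skills):
    """Return {course_id: [matched emerging skills]} for courses with any overlap."""
    emerging = {normalize(s) for s in emerging_skills}
    flagged = {}
    for course_id, skills in course_skills.items():
        overlap = sorted({normalize(s) for s in skills} & emerging)
        if overlap:
            flagged[course_id] = overlap
    return flagged

# Placeholder data (assumption): illustrative skill names only.
emerging_skills = ["Data Analysis", "Cybersecurity"]
course_skills = {
    "course_1": ["data  analysis", "Welding"],
    "course_2": ["Carpentry"],
}

flagged = flag_future_ready(course_skills, emerging_skills)
```

Exact matching will miss paraphrased skills, so in practice an embedding-based similarity threshold could replace the set intersection.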
We visualize these results through an interactive analytics dashboard that combines geospatial correspondence, district-level skill distributions, and CNAEF-based alignment graphs. These visual tools enable policymakers and educators to identify skill gaps, anticipate future needs, and align curricula with regional economic development priorities.
In future work, we plan to integrate a broader set of job offers, benchmark multiple embedding models and recommendation systems, and expand the platform’s features to include explainable recommendations and predictive labor-market insights.
Keywords:
Multimodal Analysis, Semantic Skill Extraction, Skills Demand, Tertiary Education, Vocational Education and Training.