TL;DR
Portugal’s AMÁLIA, a €5.5 million European Portuguese LLM, is now operational but faces three critical questions about its openness, native data, and objectives. These issues highlight broader challenges in Europe’s sovereign AI efforts.
Portugal’s €5.5 million investment in the AMÁLIA large language model has resulted in a functional, publicly accessible model that outperforms previous open models on European Portuguese benchmarks. However, critical questions about the model’s openness, native-language data, and strategic objectives remain unanswered, posing broader challenges for Europe’s sovereign AI initiatives.
AMÁLIA was developed by a consortium of approximately 60 researchers across Portugal’s top institutions, including NOVA, IST, and IT. The project was announced in December 2024, the base version was completed by September 2025, and the public launch followed in October 2025. The model is currently available to 450,000 academic users via the FCT’s IAedu platform, with a knowledge cutoff at the end of 2023.
Technically, AMÁLIA continues training from the EuroLLM multilingual foundation rather than starting from scratch, with a focus on European Portuguese. It outperforms previous open models on Portuguese benchmarks and beats Qwen 3-8B on most tests, although it still lags on some benchmarks, such as ALBA. The final version is scheduled for release in June 2026, with development and evaluation ongoing.
Despite these advances, Duarte O. Carmo’s analysis highlights three pressing questions: How open is ‘fully open’ in practice? How much native-language data is sufficient? What should the model optimize for? These questions are central to evaluating the strategic and technical success of the project and of broader European efforts.
AMÁLIA: The three hard questions.
Portugal spent €5.5M to build a European Portuguese LLM. The base version is operational, and it beats Qwen 3-8B on most pt-PT benchmarks. So why are the most important questions still unanswered?
Last month, Duarte O. Carmo published the sharpest public analysis of AMÁLIA, Portugal’s state-funded European Portuguese large language model. He prefaces his critique with the necessary diplomatic caveats before doing what almost nobody else in the European sovereign-LLM discourse has been willing to do publicly: asking whether the work, as released, actually does what it set out to do. This piece is a structural extension of his analysis. The AMÁLIA case study exposes three hard questions that every national LLM effort needs to answer publicly, and the broader European sovereign-LLM movement has been operating without explicit answers to any of them.
Three questions every national LLM effort needs to answer publicly.
Duarte O. Carmo’s framing maps cleanly onto the structural argument, and each question applies concretely to AMÁLIA.
The three questions form a dependency chain. Q3 (the optimization target) determines Q2 (how much native data is needed), which in turn conditions Q1 (how much openness is enough for community contribution). The European sovereign-LLM movement as a whole benefits when these questions become standard methodology disclosure rather than exceptional critique.
As an affiliate, we earn on qualifying purchases.
107 billion tokens. 5.8 billion clearly pt-PT.
This is the most tractable of the three questions, and its answer is surprising. For a model whose entire stated purpose is to prioritize European Portuguese, the clearly pt-PT share of extended pre-training is roughly 5.4% (5.8 billion of 107 billion tokens). The implications cascade into every other question.
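The share follows directly from the two token counts in the heading. The snippet below is a minimal sketch of that arithmetic, assuming the rounded figures quoted here (107 billion tokens in total, 5.8 billion clearly pt-PT) are the right inputs; unrounded counts could shift the result by a tenth of a percentage point or so. The function and constant names are illustrative only.

```python
# Rounded token counts as quoted in the text (assumed inputs, not exact figures).
TOTAL_TOKENS = 107e9   # extended pre-training tokens
PT_PT_TOKENS = 5.8e9   # tokens identified as clearly European Portuguese


def native_share(native_tokens: float, total_tokens: float) -> float:
    """Return the fraction of the training mix that is native-language data."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return native_tokens / total_tokens


share = native_share(PT_PT_TOKENS, TOTAL_TOKENS)
share_label = f"{share:.1%}"
print(f"pt-PT share of extended pre-training: {share_label}")
```

The same two-line accounting could be published for any national model, which is what makes this the easiest of the three questions to answer in public.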
The Olmo standard. AMÁLIA’s current state.
The Allen Institute for AI’s Olmo project defines what “fully open” requires in operational terms. Olmo doesn’t lead frontier benchmarks. That’s not the point. The point is to be the structural reference for openness. AMÁLIA’s “fully open source” claim should be measured against that operational standard.
Four strategic positions. AMÁLIA between two and three.
Publicly disclosed European sovereign-LLM funding across the major initiatives totals roughly €100M or more. The structural question every project faces is which competitive position it is actually staking out. There are four options, none mutually exclusive, and each requires different commitments.
Three standards. For AMÁLIA and the movement.
The structural critique generalizes beyond AMÁLIA. Italy, France, Germany, Switzerland, the OpenEuroLLM consortium, and every subsequent national project benefit from public discourse holding national LLM efforts to operational standards on openness, data accounting, and strategic positioning.
The European sovereign-AI agenda is a serious strategic project that deserves serious public discourse. Duarte O. Carmo’s analysis is what serious public discourse looks like: appropriately diplomatic, structurally rigorous, and willing to ask the hard questions in public when the public investment justifies it. More of this is needed across every European sovereign-LLM project, not just AMÁLIA.
Implications for European Sovereign AI Strategies
The AMÁLIA project exemplifies the challenges faced by European countries in developing independent, native-language AI models. The unresolved questions about openness, native data sufficiency, and strategic goals influence national policies, resource allocation, and Europe’s position in global AI development. Addressing these issues is crucial for building trustworthy, transparent, and effective AI systems that serve local languages and communities.
European Sovereign-Language Model Initiatives and Challenges
Across Europe, nations like Italy, Germany, France, and Nordic countries are investing in sovereign-language models, often with similar structural and strategic questions. Many projects, including Italy’s Minerva and France’s Mistral, are at early stages, with ongoing debates about whether to train from scratch or adapt multilingual foundations. The European Union’s push for independent AI capabilities underscores the importance of addressing core questions of openness, native data, and purpose, which remain largely unresolved in the current landscape.
“The three questions about openness, native data, and objectives are fundamental to understanding the true state and potential of European sovereign models.”
- Duarte O. Carmo
Unanswered Questions About AMÁLIA’s Openness and Strategy
It remains unclear how open AMÁLIA truly is in practice, especially regarding access to training data and model weights. Whether there is enough native Portuguese data for future improvements is also uncertain, since clearly pt-PT content makes up a small proportion of the current dataset. The strategic goals are still under discussion as well: the developers have taken no definitive public stance on whether the model is meant to serve as a national AI asset or a research prototype.
Next Steps for AMÁLIA and European Sovereign Models
The final version of AMÁLIA is scheduled for June 2026, with ongoing evaluations and potential updates based on benchmarking and community feedback. European projects will likely face increased scrutiny regarding transparency and data policies, prompting public discussions about openness and purpose. The coming months will also reveal how these models perform in real-world applications and whether they can meet strategic national and regional AI goals.
Key Questions
What are the main technical features of AMÁLIA?
AMÁLIA continues from the EuroLLM multilingual foundation, with extended pre-training on approximately 107 billion tokens and a focus on European Portuguese. It outperforms previous open models on Portuguese benchmarks and is designed for multimodal capabilities in future versions.
Why are the questions about openness and native data important?
These questions determine how transparent, trustworthy, and effective the model can be. Openness affects access and collaboration, while native data sufficiency impacts the model’s accuracy and cultural relevance.
What is the broader significance of these issues for Europe?
Addressing these questions is vital for Europe’s goal of developing independent, culturally aligned AI systems that can compete globally and serve regional needs without over-reliance on non-European models.
When will we see the final version of AMÁLIA?
The final version is expected in June 2026, after further testing, benchmarking, and potential adjustments based on ongoing research and feedback.
Are there similar projects in other European countries?
Yes, countries like Italy, France, Germany, and Nordic nations are pursuing their own sovereign-language models, facing similar structural and strategic questions as Portugal with AMÁLIA.
Source: ThorstenMeyerAI.com