Namsor

Name origin – Find origin from a first name and a last name

Namsor's world-leading AI specializes in cutting-edge morphological and onomastic analysis, making it the most accurate tool for determining a name's origin. Leveraging billions of names from international scientific research, our advanced AI identifies the country of origin of a last name, first name, or full name with unmatched precision.

600+ research contributions

99.99% names availability

13 billion names processed

Discover name origins with our advanced AI analysis

Analyze a first name, surname, or full name to determine a person's country of origin. This refers to the geographical, linguistic, and cultural roots of a person. In multicultural countries (e.g. the United States, Canada, France, South Africa, Australia, New Zealand), Name Diaspora may provide a more relevant classification.

Slightly more accurate with separate names.

Origin: first & last name

Ideal feature for estimating the country of origin from a split name (first name and last name in separate fields). It also returns religious statistics for countries of origin.

First name, given name, nickname.

Last name, family name, surname.

How to interpret the returned values

When you use our name origin finder API or interface, you get key indicators. These help you find out where a name comes from. Here's what they mean:

  • Region & Sub-region of origin: estimates the broader geographic and cultural origins of the name.

  • Script (Latin, Cyrillic, etc.): identifies the writing system used, helping determine linguistic and cultural roots.

  • Country of origin (ISO 3166-1 alpha-2): the most likely country associated with the name.

  • Calibrated probability (between 0% and 100%): indicates the confidence level of the country of origin estimate. For instance, a 98% score means high certainty.

  • Alternative country of origin (ISO 3166-1 alpha-2): the second most likely country of origin.

  • Alt. calibrated probability (between 0% and 100%): represents how likely it is that the name belongs to either the primary or the alternative country of origin. Because it covers both candidates rather than one, the value is always greater than the standard calibrated probability.
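As an illustration of how these indicators fit together, here is a minimal Python sketch that turns one result into a readable summary. The field names (`countryOrigin`, `probabilityCalibrated`, and so on) and the sample values are assumptions made for this example. Check the API documentation for the exact response schema.

```python
from typing import Any

# Hypothetical result, for illustration only. Field names and values are
# assumptions that mirror the indicators described above.
sample_response: dict[str, Any] = {
    "regionOrigin": "Asia",
    "subRegionOrigin": "Southern Asia",
    "script": "LATIN",
    "countryOrigin": "IN",             # ISO 3166-1 alpha-2
    "probabilityCalibrated": 0.82,     # confidence in the primary country
    "countryOriginAlt": "NP",          # second most likely country
    "probabilityAltCalibrated": 0.91,  # primary OR alternative country
}


def summarize_origin(response: dict[str, Any]) -> str:
    """Build a one-line summary from the key indicators."""
    primary = response["countryOrigin"]
    alt = response["countryOriginAlt"]
    p = response["probabilityCalibrated"]
    p_alt = response["probabilityAltCalibrated"]
    # The alternative probability covers two countries, so it should never
    # be lower than the probability of the primary country alone.
    if p_alt < p:
        raise ValueError("alt. probability is lower than the primary probability")
    return (
        f"{primary} ({p:.0%}); {primary} or {alt} ({p_alt:.0%}); "
        f"region: {response['regionOrigin']} / {response['subRegionOrigin']}"
    )


print(summarize_origin(sample_response))
```

Probabilities are assumed here to be fractions between 0 and 1. If they arrive as percentages, adjust the formatting accordingly.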

What is the origin of a name and how to find it?

A name origin refers to the geographical, cultural, and linguistic roots of a given name. It reflects the historical migration, ethnic background, and linguistic traditions associated with surnames and first names.

Example of a basic morphological analysis of the Sharma surname.

By analyzing names through onomastics, we can determine their likely origin. This process involves morphological and phonetic analysis, helping to trace names back to specific countries, regions, or linguistic groups.

By combining morphological, linguistic, and geographical insights, Namsor's name origin finder provides a very reliable estimation of where a name comes from.

Some first names and last names are found in many countries in similar proportions, which makes their origin less distinct. In these cases, the calibrated probability might drop to 30%, showing that the name could come from several different origins. To provide a more comprehensive analysis, we return a list of the 10 most likely countries of origin for the given name.
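One way to act on a low calibrated probability is to fall back on the ranked list of countries instead of a single answer. The sketch below is only an illustration. The 0.5 threshold, the function name, and the shape of the inputs are assumptions for this example, not values recommended by Namsor.

```python
from typing import Sequence


def candidate_origins(
    country: str,
    probability: float,
    top_countries: Sequence[str],
    threshold: float = 0.5,   # assumed cut-off, tune it for your use case
    max_candidates: int = 3,  # how many countries to keep when ambiguous
) -> list[str]:
    """Return one country for a confident estimate, several otherwise."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= threshold:
        return [country]
    # Ambiguous name: keep the primary country first, then the next most
    # likely countries in their original ranking order.
    ranked = [country] + [c for c in top_countries if c != country]
    return ranked[:max_candidates]


# Confident estimate: a single country is enough.
print(candidate_origins("IN", 0.90, ["IN", "NP", "PK"]))
# Ambiguous estimate (30%): keep several candidates.
print(candidate_origins("GB", 0.30, ["GB", "IE", "FR", "DE"]))
```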

How do we identify the country of origin from a name?

At Namsor, we develop specialized AI-powered name origin analysis tools that leverage large-scale data and advanced natural language processing (NLP) techniques. Every step in our system is designed to improve accuracy and adaptability.

  1. Large-scale data collection and preparation

  2. Onomastic model training for name origin estimation

  3. Model comparison and validation

  4. Continuous learning and cultural adaptation

Additional origin taxonomies

  • Ethnicity

    The diaspora taxonomy categorizes people by shared cultural, national, or linguistic backgrounds rather than geography.

  • Residence country

    A person's residence country is where they have lived most in the past year, often a better indicator than nationality.

  • U.S. race/ethnicity

    The U.S. Census classifies race and ethnicity into six categories based on social and cultural traits.

Is Namsor the best tool to determine a name's origin?

Discover how Namsor's specialized onomastics outperforms LLMs, static databases, and other name analysis tools in accurately determining the origin of last names, first names, and full names.

Criterion | Namsor | Database comparison | Large Language Model (LLM) | Onomastic solutions
Accuracy | | | |
Language coverage | | | |
Names covered | 99.99% | 75% to 92% (depending on the solution) | 80% to 95% (depending on the models) | 99.99%
Distinguishes typos from cultural nuances | | | |
Specialised onomastic analysis | Morphology, context | Coarse indexing | Not dedicated to names | Partial
Data updates | Continuous (data & algorithms) | Sporadic | Not a priority | Irregular
Speed of analysis per name (lower is better) | 0.03 sec. | 0.03 sec. | From 1 sec. to 5 sec. | 0.2 sec.
Privacy and anonymity | Very high (anonymisable data, deactivatable machine learning) | Medium (no anonymisation of data) | Very low (data retention and compulsory machine learning) | Low (data retention)

How to use our name origin finder

Discover the geographical origins of names using our API documentation, CSV/Excel tools, or developer resources. Choose the method that best suits your project requirements.

CSV and Excel Tool

Process name lists by uploading your file and selecting the origin analysis type. Get instant insights on the country of origin for first names, last names, or full names.

This tool is ideal for small to medium datasets requiring quick surname origin exploration.
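Before uploading a list, it can help to tidy the file so that each name appears once and first and last names sit in separate columns. The following Python sketch shows one way to do that locally. The column names `first_name` and `last_name` are assumptions for this example, so adapt them to your own file.

```python
import csv
import io


def load_names(
    csv_text: str,
    first_col: str = "first_name",  # assumed column name
    last_col: str = "last_name",    # assumed column name
) -> list[tuple[str, str]]:
    """Read (first name, last name) pairs, skipping blanks and duplicates."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen: set[tuple[str, str]] = set()
    names: list[tuple[str, str]] = []
    for row in reader:
        first = (row.get(first_col) or "").strip()
        last = (row.get(last_col) or "").strip()
        if not first and not last:
            continue  # nothing to analyze on this row
        key = (first.casefold(), last.casefold())
        if key in seen:
            continue  # same name already kept, ignoring letter case
        seen.add(key)
        names.append((first, last))
    return names


example = "first_name,last_name\nCarlos,García\n ,\ncarlos,garcía\nJoão,Silva\n"
print(load_names(example))
```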

Process a CSV or Excel file

API Documentation

For advanced requirements, our API integrates with your system to automate name origin analysis with high precision.

Built for dynamic applications, it includes comprehensive documentation with step-by-step guides and code examples in Python, JavaScript, Java, and Shell.
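As a rough idea of what an integration can look like, the Python sketch below prepares an HTTP request for a first name and a last name using only the standard library. The base URL, the `/origin/{first}/{last}` path, and the `X-API-KEY` header are assumptions for this example and must be verified against the API documentation before use. The request is only built here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed base URL, to be confirmed in the API documentation.
BASE_URL = "https://v2.namsor.com/NamSorAPIv2/api2/json"


def build_origin_request(
    first_name: str,
    last_name: str,
    api_key: str,
    base_url: str = BASE_URL,
) -> Request:
    """Prepare a GET request for the origin of a split name."""
    # Percent-encode each name so accents, spaces and slashes are safe
    # inside the URL path.
    path = f"/origin/{quote(first_name, safe='')}/{quote(last_name, safe='')}"
    headers = {
        "X-API-KEY": api_key,  # assumed authentication header
        "Accept": "application/json",
    }
    return Request(base_url + path, headers=headers)


request = build_origin_request("Carlos", "García", api_key="your-api-key")
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` and decode the body with `json.load`. Keeping the construction separate from the network call makes the URL encoding easy to test.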

Explore the API Documentation

Developer Tools

Access advanced name origin analysis using our SDKs and CLI for Python, Java, Go, and JavaScript.

With advanced morphological and linguistic processing, these tools provide precise origin insights for both individual queries and large datasets.

Download Developer Tools

In what cases can name origin analysis be used?

Analyzing first names, surnames, and full names to determine country of origin is valuable in numerous industries.

Research

Understanding migration patterns and ethnic distributions is crucial for demographic studies.

Researchers use Namsor to analyze name origins and track historical and modern population shifts.

Fraud prevention and KYC

Verifying identity in global transactions is key for compliance.

Banks and financial institutions use name origin analysis to detect inconsistencies in identity records and prevent fraud.

Internal and external security

Governments and security agencies use Namsor to study name origins.

They do this for risk assessment, border control, and international cooperation. This helps improve security strategies and detect threats.

Population analysis

Some governments use Namsor to analyze name-origin patterns across hiring, education, and policy-related datasets.

The insights support population-level monitoring and data-informed evaluation.

Diaspora mapping

Cities and international organisations are using Namsor to map diasporas and better understand migration patterns.

This supports population-level analysis for urban planning, service allocation, and long-term demographic monitoring.

Marketing

Successful marketing relies on personalization.

With Namsor, companies can segment customer databases based on surname origins to tailor campaigns by cultural and regional preferences.

AI Act compliance

Namsor offers clear and ethical name origin analysis.

This helps organizations comply with the EU AI Act by providing explainable and auditable AI-driven insights.

Historical analysis

Genealogists often face challenges in tracing ancestral roots.

Namsor helps sort large databases by last name origin, making family tree reconstruction faster and more accurate.

Frequently asked questions about name origin

What is the most accurate name origin and ethnicity API?

Namsor is the most accurate API for inferring geographic origin and ethnicity from a name, validated by independent benchmarks on hundreds of thousands of real-world names.

Origin classification: 92% accuracy vs 62% for LLMs

In a benchmark on approximately 400,000 names, Namsor correctly classified 92% of names by country of origin. The best-performing large language model reached only 62%, with 18% of names left unclassified, 8% assigned to an incompatible taxonomy and 12% classified to the wrong country.

Validated on 250,000 real individuals

Researchers from Harvard and the University of Chicago validated Namsor's origin and ethnicity inference on 250,000 individuals from the North Carolina voter registry, where self-reported race and ethnicity data was available for ground-truth comparison (Bursztyn, Chaney, Hassan & Rao).

Tested on 88,699 researcher names

A peer-reviewed study published in PLOS ONE tested Namsor's origin classification on 88,699 names of researchers worldwide, confirming high precision across cultural backgrounds.

Coverage: 131 countries, 22 writing systems

Namsor classifies origin across 131 countries and supports names in 22 writing systems, from Latin and Cyrillic to Arabic, Han, Hangul, Devanagari and beyond. Most competing tools cover fewer countries and only support Latin script.

Trusted by leading institutions and global companies

Elsevier, Springer Nature, the European Commission, Harvard, Columbia University, Yale and the World Bank rely on Namsor's origin and ethnicity inference for bibliometric analyses, policy research and academic studies.

In the private sector, global leaders in transportation and aviation, travel and tourism, financial services and money transfer, intelligence and risk analysis, and recruitment use Namsor's origin and ethnicity features in production.

What is the difference between name origin, ethnicity / diaspora, US race and country of residence?

These four Namsor features answer four different questions about a person. They often return different results for the same name, and choosing the right one depends on what you're trying to learn.

The four questions, in plain language

A concrete example: "Carlos García" living in Bogotá

Feature | Returns | What it tells you
Origin | ES (Spain) | His ancestors come from Spain, not where he lives
Ethnicity | HispanoLatino | His cultural identity is Hispanic/Latino
Country of Residence | CO (Colombia) | He currently lives in Colombia
US Race | HL (Hispanic/Latino) | His US Census racial category

Same name, four different answers, four different insights.

Why Origin doesn't cover every country

Origin classifies names into the 131 countries that are historically sources of population, not destinations. Countries built through immigration (USA, Canada, Australia, Brazil, Argentina, New Zealand and most of Latin America) are not in the Origin taxonomy because there is no single "American origin" or "Australian origin." The people living there come from Europe, Africa, Asia, the Middle East and elsewhere. Origin tells you where from, not where to.

Common pitfall: Origin returns Spain or Portugal for people living in Latin America

Because Origin reflects ancestral roots and not current location, it will not return Colombia, Mexico, Argentina, Brazil or any other Latin American country for people who live there. It will return the country their family historically came from.

  • For Carlos García living in Bogotá, Origin returns ES (Spain) — his Spanish ancestral roots, not Colombia.
  • For João Silva living in São Paulo, Origin returns PT (Portugal) — his Portuguese ancestral roots, not Brazil.
  • For María Rodríguez living in Mexico City, Origin returns ES (Spain) — not Mexico.

The same logic applies to the US, Canada, Australia and other immigration countries. If you need to know the country where the person actually lives, use Country of Residence instead of Origin. If you need cultural segmentation across the Hispanic or Latino diaspora as a group, use Ethnicity / Diaspora.

Why Ethnicity goes beyond countries

Ethnicity captures cultural identities that don't align with national borders:

  • Sub-national groups: Scottish, Welsh and English instead of just "British." Flemish and Walloon instead of just "Belgian." Catalan within Spain.
  • Transnational groups: Hispanic and HispanoLatino cover the entire Spanish-speaking diaspora across dozens of countries, as a shared cultural identity rather than a specific nationality.
  • Religious and cultural groups: Jewish, which is a cultural and religious identity present across many countries.
  • Ethnic minorities: Tatar, AfricanAmerican, AsianAmerican, NativeHawaiian.

This is why Ethnicity is more granular than Origin for multicultural countries and diasporas.

What each feature takes as input

The features differ not only in what they return, but also in what context they accept:

  • Origin: name only. No country code input. The classification relies entirely on the name itself.
  • Ethnicity / Diaspora: name + optional country code. Providing a local context (country of residence, country of work) significantly improves precision, especially in multicultural countries.
  • Country of Residence: name only. The goal is to infer the country, so no country input is needed.
  • US Race: name + optional country code + optional US ZIP code. Adding a ZIP code provides neighborhood-level context for greater precision.
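The input rules above can be captured in a small lookup table that rejects context a feature does not accept. This Python sketch is only an illustration. The feature keys, parameter names, and function name are hypothetical and do not reflect the actual API parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeatureInputs:
    accepts_country: bool  # optional country code allowed
    accepts_zip: bool      # optional US ZIP code allowed


# Mirrors the list above. The keys are hypothetical labels for this sketch.
FEATURES = {
    "origin": FeatureInputs(accepts_country=False, accepts_zip=False),
    "ethnicity_diaspora": FeatureInputs(accepts_country=True, accepts_zip=False),
    "country_of_residence": FeatureInputs(accepts_country=False, accepts_zip=False),
    "us_race": FeatureInputs(accepts_country=True, accepts_zip=True),
}


def build_inputs(
    feature: str,
    name: str,
    country_code: Optional[str] = None,
    zip_code: Optional[str] = None,
) -> dict[str, str]:
    """Collect the inputs for a feature, refusing unsupported context."""
    spec = FEATURES[feature]
    if country_code is not None and not spec.accepts_country:
        raise ValueError(f"{feature} does not take a country code")
    if zip_code is not None and not spec.accepts_zip:
        raise ValueError(f"{feature} does not take a ZIP code")
    inputs = {"name": name}
    if country_code is not None:
        inputs["country_code"] = country_code
    if zip_code is not None:
        inputs["zip_code"] = zip_code
    return inputs


print(build_inputs("ethnicity_diaspora", "Carlos García", country_code="US"))
```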

When to use which feature

  • You know where the person lives or works: use Ethnicity / Diaspora with the country code. This is the most precise option for immigration countries like the US, Canada, Australia, France or the UK, where a name alone may not distinguish between multiple possible origins.
  • You have a list of names without any context (social media aliases, pseudonyms, historical records with no location data): use Origin. It works from the name alone and doesn't require any additional information. Keep in mind that for Latin American or other immigration contexts, Origin will return the ancestral country, not the current one.
  • You need to know where someone currently lives (compliance, localization, routing, or simply the actual country for people in Latin America, the US, Canada, Australia, etc.): use Country of Residence.
  • You need US Census-aligned categories (federal reporting, disparate impact analysis): use US Race, ideally with a ZIP code for maximum precision.

As a general rule: when a local context is available, Ethnicity / Diaspora is more precise and more coherent than Origin for countries with diverse populations. Origin is the right choice when no context is available at all.
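The guidance above can be summarized as a short decision function. This is a simplified Python sketch. The parameter names and the order in which the checks are applied are assumptions for this example, and real projects may need to combine several features.

```python
def choose_feature(
    needs_us_census_categories: bool = False,
    needs_current_country: bool = False,
    has_local_context: bool = False,
) -> str:
    """Pick a feature from what you need and what context you have."""
    if needs_us_census_categories:
        # Federal reporting, disparate impact analysis.
        return "US Race"
    if needs_current_country:
        # Compliance, localization, routing.
        return "Country of Residence"
    if has_local_context:
        # A known country of residence or work improves precision.
        return "Ethnicity / Diaspora"
    # Name only, no context at all.
    return "Origin"


print(choose_feature(has_local_context=True))  # Ethnicity / Diaspora
print(choose_feature())                        # Origin
```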