Namsor

Name diaspora – AI ethnicity guesser from first and last name

Through advanced onomastic research and morphological intelligence, Namsor's AI technology stands as the world's most reliable and accurate ethnicity guesser. Our research-validated AI system analyzes last names, first names, or full names and estimates the ethnic background associated with them.

600+ research contributions

99.99% names availability

13B names processed

Guess name ethnicity with our advanced AI tool

Analyze a first name, surname, or full name to guess a person's ethnic background. Add a country of residence for local context. The ethnicity/diaspora feature categorizes people by shared cultural, national, or linguistic backgrounds rather than geography.

Results are slightly more accurate when first and last names are entered separately.

Ethnicity: first & last name

Ideal feature for estimating the ethnic background from a split name:

First name, given name, nickname.

Last name, family name, surname.

Country of residence, in ISO 3166-1 alpha-2 format. Defaults to "US" if no value is given.


Understanding the meaning of the returned values

When you use our ethnicity guesser API or interface, you receive key indicators that help determine the ethnic background. Here’s what each indicator represents:

  • Ethnicity: the most likely ethnicity associated with the name.

  • Calibrated probability (between 0% and 100%): the confidence level of the ethnicity estimate. For example, a 98% score indicates very high confidence in the estimate.

  • Alternative ethnicity: the second most likely ethnicity associated with the name.

  • Alt. calibrated probability (between 0% and 100%): the overall likelihood that the name corresponds to either the primary or the alternative ethnicity. Because it covers both possibilities, it is at least as high as, and usually higher than, the standard calibrated probability.

  • Script (Latin, Cyrillic, etc.): the writing system used, which helps identify linguistic and cultural roots.
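To make these indicators concrete, here is a minimal sketch that maps a JSON result onto the five values listed above. The key names (`ethnicity`, `probabilityCalibrated`, `ethnicityAlt`, `probabilityAltCalibrated`, `script`) are assumptions about the response shape and should be checked against the API documentation; the sample values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class EthnicityResult:
    """The five indicators described above, in a typed container."""
    ethnicity: str
    probability: float       # calibrated probability of the top ethnicity, 0.0-1.0
    ethnicity_alt: str
    probability_alt: float   # probability of "top OR alternative", 0.0-1.0
    script: str


def parse_result(payload: dict) -> EthnicityResult:
    """Map a JSON payload onto EthnicityResult (key names are assumed)."""
    result = EthnicityResult(
        ethnicity=payload["ethnicity"],
        probability=float(payload["probabilityCalibrated"]),
        ethnicity_alt=payload["ethnicityAlt"],
        probability_alt=float(payload["probabilityAltCalibrated"]),
        script=payload["script"],
    )
    # The alt probability covers two candidates, so it can never be
    # lower than the probability of the top candidate alone.
    if result.probability_alt < result.probability:
        raise ValueError("alt probability is lower than the primary probability")
    return result


def describe(result: EthnicityResult) -> str:
    """Human-readable summary with probabilities shown as percentages."""
    return (
        f"{result.ethnicity} ({result.probability:.0%}), "
        f"{result.ethnicity} or {result.ethnicity_alt} ({result.probability_alt:.0%}), "
        f"script: {result.script}"
    )


# Invented sample payload, for illustration only.
sample = {
    "ethnicity": "Indian",
    "probabilityCalibrated": 0.92,
    "ethnicityAlt": "Nepalese",
    "probabilityAltCalibrated": 0.97,
    "script": "LATIN",
}
print(describe(parse_result(sample)))
```

The consistency check encodes the relationship stated above: a probability for "either of two groups" cannot be smaller than the probability for one of them.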

How to find ethnicity and diaspora from names?

By analyzing first names, surnames or full names through onomastics, we can find their likely ethnicity or diaspora. This process involves morphological and phonetic analysis, helping to trace names back to specific countries, regions, or linguistic groups. Adding a country of residence to the name can improve the accuracy of the estimated ethnic group or diaspora.

Example of a basic morphological analysis of the Sharma surname.

An ethnicity is a group of people who identify with each other on the basis of common attributes distinguishing them from other groups: a common set of traditions, origin, culture, ancestry, language, history, religion or social treatment.

A diaspora is a population living outside the area where it lived for a long time or where its ancestors lived. The origins of this population differ from the country in which it now lives.

Some first and last names are common among several ethnic groups. In these cases, the probability score can fall as low as 30%, which suggests several plausible ethnic backgrounds. To provide a more comprehensive analysis, we therefore estimate a list of the 10 most likely ethnicities associated with the given name.
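A ranked shortlist like this is straightforward to derive from a probability distribution. The sketch below assumes the distribution is available as a mapping from group name to probability; the group labels, the numbers, and the 0.5 ambiguity threshold are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple


def top_ethnicities(distribution: Dict[str, float], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n most likely groups, highest probability first."""
    return sorted(distribution.items(), key=lambda item: item[1], reverse=True)[:n]


def is_ambiguous(distribution: Dict[str, float], threshold: float = 0.5) -> bool:
    """Flag names whose best guess is weak (threshold is an arbitrary example)."""
    return max(distribution.values()) < threshold


def coverage(ranked: List[Tuple[str, float]]) -> float:
    """Total probability mass captured by a ranked shortlist."""
    return sum(p for _, p in ranked)


# Hypothetical distribution for a name shared by several groups.
dist = {"GroupA": 0.31, "GroupB": 0.27, "GroupC": 0.18, "GroupD": 0.14, "GroupE": 0.10}
shortlist = top_ethnicities(dist, n=3)
print(shortlist, is_ambiguous(dist), round(coverage(shortlist), 2))
```

When the top score is around 30%, the shortlist and its combined coverage say more about the name than the single best guess does.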

How does our ethnicity guesser AI work?

At Namsor, we develop specialized AI-powered name ethnicity finder tools that use large-scale data and advanced natural language processing (NLP) techniques. Our system is meticulously designed to enhance accuracy and adaptability at every stage.

  1. Large-scale data collection and preparation
  2. Onomastic model training for ethnic estimation
  3. Model comparison and validation
  4. Continuous learning and cultural adaptation

Additional origin taxonomies

  • Origin

    Origin is a taxonomy that categorizes a person's origin based on their own, their parents', or their ancestors' country of origin.

    Find name origin
  • Residence country

    A person's residence country is where they have lived most in the past year. It is often more informative than nationality.

    Identify location
  • U.S. race/ethnicity

    The U.S. Census classifies race and ethnicity into six categories based on social and cultural traits.

    Estimate U.S. race/ethnicity

How to use our name ethnicity finder

Discover the ethnic and cultural origins of names using our API documentation, CSV/Excel tools, or developer resources. Choose the method that best fits your ethnicity analysis needs.


CSV and Excel Tool

Run a name ethnicity analysis by uploading your file and selecting the analysis type. Quickly identify the ethnic and cultural origins of first names, last names, or full names.

This tool is ideal for small to medium datasets requiring efficient name classification.

Process a CSV or Excel file

API Documentation

For advanced requirements, our API integrates with your system to automate accurate name ethnicity detection.

The API is designed for dynamic applications, and its documentation provides step-by-step guides and code examples in Python, JavaScript, Java, and Shell for seamless integration.
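As a rough illustration of an integration, the sketch below builds a diaspora request with the Python standard library. The URL layout (`.../api2/json/diaspora/{countryIso2}/{firstName}/{lastName}`) and the `X-API-KEY` header are assumptions to confirm against the API documentation, and the "US" fallback simply mirrors the web form's default described above; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint layout; confirm the path and header name in the API documentation.
BASE_URL = "https://v2.namsor.com/NamSorAPIv2/api2/json/diaspora"


def normalize_country(country: Optional[str]) -> str:
    """Return an uppercase two-letter code, defaulting to "US" like the web form.

    This only checks the shape of an ISO 3166-1 alpha-2 code, not whether
    the code is actually assigned to a country.
    """
    if not country:
        return "US"
    code = country.strip().upper()
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"not an ISO 3166-1 alpha-2 code: {country!r}")
    return code


def build_diaspora_url(first_name: str, last_name: str,
                       country: Optional[str] = None) -> str:
    """Hypothetical helper: assemble the request URL for one split name."""
    segments = [normalize_country(country), first_name.strip(), last_name.strip()]
    # Percent-encode every segment (safe="" also escapes "/") so accented
    # letters, spaces and slashes in names cannot break the path.
    return BASE_URL + "/" + "/".join(urllib.parse.quote(s, safe="") for s in segments)


def fetch_diaspora(first_name: str, last_name: str, api_key: str,
                   country: Optional[str] = None, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body (performs network I/O)."""
    request = urllib.request.Request(
        build_diaspora_url(first_name, last_name, country),
        headers={"X-API-KEY": api_key, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


print(build_diaspora_url("Carlos", "García", "co"))
```

Building the URL separately from sending it keeps the encoding and country checks testable without an API key or network access.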

Explore the API Documentation

Developer Tools

Access advanced ethnicity analysis using our SDKs and CLI for Python, Java, Go, and JavaScript.

These tools leverage advanced morphological and linguistic processing to provide precise ethnicity insights for both individual queries and large datasets.

Download Developer Tools

Why use an ethnicity detector?

Estimating ethnicity from first names, surnames, and full names provides valuable insights across various industries.


Diaspora mapping

Governments count on Namsor to pinpoint diaspora communities worldwide.

This helps countries target their policies, attract funding, and encourage expatriates to participate in national development.


Data mining

Companies use name ethnicity estimation in data mining to detect patterns in large datasets.

It helps reveal demographic trends and improve predictive analytics for market segmentation and risk assessment.


Population flows analysis

Government bodies and organizations use name ethnicity data to analyze migration trends.

They can adapt services to different groups by understanding ethnic distribution patterns. This helps improve planning for global mobility.


Marketing

Marketers use name ethnicity data to segment campaigns by cultural and ethnic groups.

Knowing the distribution of diasporas enables companies to tailor their products and services to specific communities.


International expansion

Diasporas often play a key role in international trade by acting as bridges between their home and host countries.

That's why global businesses use name ethnicity analysis to identify key diaspora communities.


Fraud prevention and KYC

Banks and financial institutions integrate Namsor into their KYC processes to enhance risk assessment.

By analyzing name ethnicity and origin, they strengthen fraud prevention and build user trust.


Population analysis

Organizations use name ethnicity analysis to detect biases in hiring, lending, and services.

By identifying discrimination patterns, they promote fair policies and ensure equal opportunities.
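One common screening heuristic for this kind of audit is the "four-fifths rule" from the US EEOC Uniform Guidelines: a group whose selection rate is below 80% of the highest group's rate is flagged for closer review. The sketch below assumes each applicant record has already been labeled with an estimated ethnic group; the records and group labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def selection_rates(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of applicants selected in each (estimated) ethnic group."""
    totals: Dict[str, int] = defaultdict(int)
    selected: Dict[str, int] = defaultdict(int)
    for group, was_selected in records:
        totals[group] += 1
        if was_selected:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


def four_fifths_flags(rates: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Groups whose selection rate is below `threshold` times the highest rate."""
    best = max(rates.values())
    if best == 0:
        raise ValueError("no group has any selections; the ratio is undefined")
    return sorted(group for group, rate in rates.items() if rate / best < threshold)


# Hypothetical hiring outcomes: 30/100 selected in GroupA, 20/100 in GroupB.
records = ([("GroupA", True)] * 30 + [("GroupA", False)] * 70
           + [("GroupB", True)] * 20 + [("GroupB", False)] * 80)
rates = selection_rates(records)
print(rates, four_fifths_flags(rates))  # GroupB's ratio is 0.2 / 0.3, about 0.67
```

Because the group labels are estimates, a more careful audit might weight each record by its calibrated probability rather than counting only the top label.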


AI Act compliance

AI developers leverage name ethnicity data to audit and reduce biases in machine learning models.

This supports fairness, transparency, and compliance with the EU AI Act.

Frequently asked questions about name ethnicity and diaspora

How does Namsor estimate ethnicity and diaspora from a name?

Namsor infers ethnicity and diaspora using predictive AI models that analyze the morphological, phonetic and cultural signals embedded in a name, then optionally refine the result with a local geographic context.

Step 1: AI-powered name analysis

Namsor's predictive models identify statistical patterns in names: letter sequences, syllable structures, phonetic features and character combinations that correlate with specific cultural or linguistic backgrounds. The AI has learned these correlations from billions of names, building a probabilistic model rather than applying fixed rules.

For example, the suffix "-ović" is a strong statistical signal for South Slavic linguistic association, and the ending "-nen" correlates with Finnish-speaking populations. No single pattern is deterministic on its own. The model combines many such signals across the full name to produce a probability distribution over possible ethnicities.
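The idea of combining weak signals into a distribution can be shown with a deliberately tiny toy model. This is not Namsor's model: it uses only the two suffixes mentioned above, hand-picked weights, and an "Other" baseline group, all invented for illustration. Evidence from every token of the full name is summed and then turned into probabilities with a softmax.

```python
import math
import unicodedata
from typing import Dict

# Toy weights for the two suffixes mentioned above; the numbers are invented.
SUFFIX_WEIGHTS = {
    "ović": {"SouthSlavic": 3.0},
    "nen": {"Finnish": 3.0},
}
GROUPS = ["SouthSlavic", "Finnish", "Other"]


def score_name(full_name: str) -> Dict[str, float]:
    """Sum suffix evidence over every token of the full name."""
    scores = {group: 0.0 for group in GROUPS}
    # NFC normalization makes a precomposed "ć" and "c" + combining accent match.
    text = unicodedata.normalize("NFC", full_name).lower()
    for token in text.split():
        for suffix, weights in SUFFIX_WEIGHTS.items():
            if token.endswith(unicodedata.normalize("NFC", suffix)):
                for group, weight in weights.items():
                    scores[group] += weight
    return scores


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn additive evidence into a probability distribution."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {group: math.exp(s - top) for group, s in scores.items()}
    total = sum(exps.values())
    return {group: e / total for group, e in exps.items()}


print(softmax(score_name("Marko Petrović")))
```

Even with a strong suffix, the toy model keeps some probability on the other groups, which reflects the point above that no single pattern is deterministic on its own.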

Step 2: Optional geographic context

When a country code is provided (country of residence, country of work), Namsor uses it to narrow the classification. The same name can point to different diasporas depending on where the person lives:

  • "Chen" in the US → Chinese diaspora
  • "Chen" in Malaysia → Chinese-Malaysian community
  • "Smith" in the UK → English
  • "Smith" in South Africa → British diaspora

Without a country code, Namsor returns the globally most likely ethnicity. With a country code, it returns the locally most likely diaspora.
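One simplified way to picture the effect of a country code is Bayes' rule: name-based scores are reweighted by how common each group is in that country, then renormalized. This is an illustrative sketch, not a description of Namsor's internal method; the group labels, scores, and priors are hypothetical.

```python
from typing import Dict, Optional


def apply_country_context(name_likelihood: Dict[str, float],
                          country_prior: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Reweight name-based scores by how common each group is in a country.

    Without a prior, the name scores are only normalized (the "global" answer).
    """
    if country_prior is None:
        weights = dict(name_likelihood)
    else:
        weights = {g: s * country_prior.get(g, 0.0) for g, s in name_likelihood.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("the country prior rules out every candidate group")
    return {g: w / total for g, w in weights.items()}


# Hypothetical numbers: the same name scores, with and without local context.
name_scores = {"GroupA": 0.6, "GroupB": 0.4}
global_view = apply_country_context(name_scores)
local_view = apply_country_context(name_scores, {"GroupA": 0.1, "GroupB": 0.9})
print(max(global_view, key=global_view.get), max(local_view, key=local_view.get))
```

With these numbers the globally most likely group is GroupA, while in the hypothetical country where GroupB is far more common, the locally most likely group becomes GroupB.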

What it returns

Namsor classifies names into 139 ethnic and cultural groups. Unlike Origin (which returns a country code), Ethnicity returns named groups that reflect cultural identity rather than geography:

  • Sub-national identities: Scottish, Welsh, English, Flemish, Walloon, Catalan
  • Transnational identities: Hispanic, HispanoLatino, Jewish
  • Diaspora identities: AfricanAmerican, AsianAmerican, NativeHawaiian

When should I use Ethnicity/Diaspora instead of Origin or US Race?

Use Ethnicity/Diaspora when you need to understand a person's cultural identity with more granularity than a country code, especially in multicultural countries where Origin alone is too broad.

When Ethnicity is better than Origin

Origin returns a country code. It works well for identifying where a person's family comes from, regardless of where they live now. But Origin has two limitations that Ethnicity solves:

1. Immigration countries are not in the Origin taxonomy, and Origin returns the ancestral country instead. Countries like the US, Canada, Australia, Brazil, Argentina or any Latin American country are destinations, not origins. For someone living in these countries, Origin will return the country their family historically came from: ES (Spain) for Carlos García in Bogotá, PT (Portugal) for João Silva in São Paulo, ES (Spain) for María Rodríguez in Mexico City. Ethnicity solves this by returning groups like AfricanAmerican, AsianAmerican, HispanoLatino or NativeHawaiian, which describe the person's cultural identity regardless of where they live.

2. Ethnicity is more granular than a country code. Origin returns "GB" for a British name, but Ethnicity distinguishes English, Scottish and Welsh. Origin returns "BE" for a Belgian name, but Ethnicity distinguishes Flemish and Walloon. Ethnicity also captures identities that don't map to any single country: Jewish, Hispanic, Catalan, Tatar.

With a country code as context, Ethnicity becomes even more precise. In any country, Namsor can return any of its 139 cultural groups depending on the name analyzed. This is particularly valuable in multicultural countries where many different communities coexist.

As a rule of thumb: if you know the person's local context (country of residence or work), or if the names come from a recent multicultural country built through immigration (US, Canada, Australia), Ethnicity/Diaspora is the better choice.

When to use Country of Residence

Use Country of Residence when you need to know where someone currently lives, not where their family comes from or what their cultural identity is. Country of Residence covers 247 countries and territories, including recent multicultural countries (United States, Canada, Australia), Latin American countries, newly formed states, overseas territories and micro-states that neither Origin nor Ethnicity covers. It is the most geographically complete of the four features.

This is also the right choice when Origin would return an ancestral country instead of the actual current location. If you want Colombia for Carlos García, Brazil for João Silva, or Mexico for María Rodríguez, use Country of Residence rather than Origin.

When US Race is more appropriate than Ethnicity

US Race returns the six categories defined by the US Census (White, Black/African American, Hispanic/Latino, Asian, Native Hawaiian/Pacific Islander, American Indian/Alaska Native). Use US Race when:

  • You need to align with US federal reporting requirements
  • You are running disparate impact analysis under US law
  • Your dataset is US-only and you need Census-compatible categories

Ethnicity is more granular: it distinguishes Chinese from Indian from Korean within the "Asian" Census category, or Nigerian from Senegalese within the "Black/African American" category. If you need this level of detail, use Ethnicity even for US datasets.

The practical decision

  • You know a local context such as country of residence, country of work or country of study → use Ethnicity/Diaspora with the country code for maximum precision
  • You have no context at all (social media aliases, anonymous lists) → use Origin (works from name alone, but returns ancestral country for people in immigration countries)
  • You need to determine where someone currently lives → use Country of Residence (247 countries and territories, including Latin America and other immigration countries)
  • You need US Census categories specifically → use US Race, with ZIP code for best accuracy
  • You want both cultural detail AND Census compliance → run Ethnicity + US Race on the same dataset
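The decision list above can be encoded as a small function, for example to route datasets automatically in a pipeline. This is a sketch under assumptions: the flag names and the order in which the rules are checked are one reasonable reading of the list and of the rule of thumb, not an official Namsor API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Needs:
    """Hypothetical description of what a dataset offers and requires."""
    country_code: Optional[str] = None     # residence, work or study, if known
    from_immigration_country: bool = False  # e.g. US, Canada, Australia names
    current_location: bool = False          # need where the person lives now
    us_census: bool = False                 # need US Census categories
    cultural_detail: bool = False           # need finer groups than the Census


def choose_features(needs: Needs) -> List[str]:
    """Pick feature(s) following the practical decision list."""
    if needs.current_location:
        return ["Country of Residence"]
    if needs.us_census:
        # Census compliance plus cultural detail: run both on the same dataset.
        return ["Ethnicity/Diaspora", "US Race"] if needs.cultural_detail else ["US Race"]
    if needs.country_code or needs.from_immigration_country:
        return ["Ethnicity/Diaspora"]
    # No context at all: Origin works from the name alone.
    return ["Origin"]


print(choose_features(Needs(country_code="CA")))
```

Checking current location and Census needs first reflects that those requirements dictate the output format, whereas the remaining rules only trade off precision.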