Use Ethnicity/Diaspora when you need to understand a person's cultural identity with more granularity than a country code, especially in multicultural countries where Origin alone is too broad.
When Ethnicity is better than Origin
Origin returns a country code. It works well for identifying where a person's family comes from, regardless of where they live now. But Origin has two limitations that Ethnicity solves:
1. Immigration countries are not in the Origin taxonomy, so Origin returns the ancestral country instead. The US, Canada, Australia, Brazil, Argentina and the other Latin American countries are treated as destinations, not origins. For someone living in one of these countries, Origin returns the country their family historically came from: ES (Spain) for Carlos García in Bogotá, PT (Portugal) for João Silva in São Paulo, ES (Spain) for María Rodríguez in Mexico City. Ethnicity solves this by returning groups like AfricanAmerican, AsianAmerican, HispanoLatino or NativeHawaiian, which describe the person's cultural identity regardless of where they live.
2. Ethnicity is more granular than a country code. Origin returns "GB" for a British name, but Ethnicity distinguishes English, Scottish and Welsh. Origin returns "BE" for a Belgian name, but Ethnicity distinguishes Flemish and Walloon. Ethnicity also captures identities that don't map to any single country: Jewish, Hispanic, Catalan, Tatar.
With a country code as context, Ethnicity becomes even more precise. In any country, Namsor can return any of its 139 cultural groups depending on the name analyzed. This is particularly valuable in multicultural countries where many different communities coexist.
As a rule of thumb: if you know the person's local context (country of residence or work), or if the names come from a multicultural country built through recent immigration (US, Canada, Australia), Ethnicity/Diaspora is the better choice.
When to use Country of Residence
Use Country of Residence when you need to know where someone currently lives, not where their family comes from or what their cultural identity is. Country of Residence covers 247 countries and territories, including recent multicultural countries (United States, Canada, Australia), Latin American countries, newly formed states, overseas territories and micro-states that neither Origin nor Ethnicity covers. It is the most geographically complete of the four features.
This is also the right choice when Origin would return an ancestral country instead of the actual current location. If you want Colombia for Carlos García, Brazil for João Silva, or Mexico for María Rodríguez, use Country of Residence rather than Origin.
When US Race is more appropriate than Ethnicity
US Race returns the six categories defined by the US Census (White, Black/African American, Hispanic/Latino, Asian, Native Hawaiian/Pacific Islander, American Indian/Alaska Native). Use US Race when:
- You need to align with US federal reporting requirements
- You are running disparate impact analysis under US law
- Your dataset is US-only and you need Census-compatible categories
Ethnicity is more granular: within the "Asian" Census category it distinguishes Chinese, Indian and Korean, and within the "Black/African American" category it distinguishes Moroccan, Senegalese and Nigerian. If you need this level of detail, use Ethnicity even for US datasets.
The practical decision
- You know a local context such as country of residence, country of work or country of study → use Ethnicity/Diaspora with the country code for maximum precision
- You have no context at all (social media aliases, anonymous lists) → use Origin (it works from the name alone, but returns the ancestral country for people in immigration countries)
- You need to determine where someone currently lives → use Country of Residence (247 countries and territories, including Latin America and other immigration countries)
- You need US Census categories specifically → use US Race, with ZIP code for best accuracy
- You want both cultural detail AND Census compliance → run Ethnicity + US Race on the same dataset
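
The decision list above can be sketched as a small helper. This is only an illustration: the Request class, its field names and the choose_features function are hypothetical and not part of Namsor's API, and the order in which the checks run is an assumption, since the list does not rank the cases. The function returns the feature names used in this guide and does not call any service.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    # All field names are hypothetical, chosen for this sketch only.
    country_context: Optional[str] = None  # known country of residence, work or study
    need_current_residence: bool = False   # the question is "where does this person live now?"
    need_us_census: bool = False           # US Census categories are required
    need_cultural_detail: bool = False     # finer cultural groups are wanted as well


def choose_features(req: Request) -> List[str]:
    """Map what you know and what you need to the feature(s) named in this guide."""
    # Assumed priority: current location first, since only one feature answers it.
    if req.need_current_residence:
        return ["Country of Residence"]
    # Census categories: US Race, optionally run alongside Ethnicity for detail.
    if req.need_us_census:
        if req.need_cultural_detail:
            return ["Ethnicity/Diaspora", "US Race"]
        return ["US Race"]
    # A known local context makes Ethnicity/Diaspora the more precise option.
    if req.country_context:
        return ["Ethnicity/Diaspora"]
    # No context at all: Origin works from the name alone.
    return ["Origin"]
```

For instance, choose_features(Request(country_context="US")) selects Ethnicity/Diaspora, while a request with no fields set falls back to Origin.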