Frequently Asked Questions - Everything you need to know about Namsor
About Namsor
Is Namsor the best tool available?
Yes. Namsor is the most widely validated and comprehensive name analysis tool on the market. Here is the evidence, dimension by dimension.
Most accurate, validated by peer-reviewed studies
Gender detection. A study published in Internal and Emergency Medicine (Springer) on 11,999 marathon runners from seven major international marathons showed that Namsor achieved an error rate of 4.8%, nearly half that of the next best tool at 8.0% (p < 0.001).
Origin classification. On a benchmark of 400,000 names, Namsor reached 92% accuracy, compared to 62% for the best-performing large language model. Researchers from Harvard and the University of Chicago validated Namsor on 250,000 individuals from the North Carolina voter registry (Bursztyn, Chaney, Hassan & Rao, ). A study published in PLOS ONE confirmed accuracy on 88,699 researcher names.
Highest coverage: 99.99% of names classified
A study published in the Journal of the Medical Library Association on 6,131 Swiss physicians showed that Namsor left 0% of names unclassified, compared to 0.3% to 16.4% for competing tools. On unique names, Namsor's error rate only moves from 2.0% to 3.1%, while a major competing tool's error rate jumps from 17.7% to 28.2%.
Fastest: 30 ms per name, 80 to 500 ms per batch
Namsor processes a single name in under 30 ms and a batch of several hundred names in 80 ms to less than 500 ms depending on name complexity. For comparison, large language models (LLMs) typically take 1 to 5 seconds per name for similar classification tasks. At this speed, processing 1 million names takes minutes, not days.
Most complete: nine features, deepest taxonomies in the industry
Namsor offers nine classification features: gender detection, origin (131 countries), ethnicity and diaspora (139 cultural groups), country of residence (247 countries and territories), US race/ethnicity (US Census categories), Indian name analysis (caste, religion, state), name parsing, name type recognition and phone number formatting. This range covers 22 writing systems (Latin, Cyrillic, Arabic, Han, Hangul, Devanagari, Hiragana, Katakana, Hebrew, Thai and more) with the deepest taxonomy segmentation in the industry.
Most private
Namsor is the only name analysis tool offering both data anonymization through SHA hashing and deactivatable machine learning on your data, fully compliant with GDPR, CCPA and the EU AI Act. Unlike LLMs, which transmit your data to third-party providers and may reuse it for their training, Namsor operates on dedicated infrastructure and provides a downloadable Data Processing Agreement.
Recognized by the global scientific community
Namsor is cited in over 1,200 Google Scholar publications and has contributed to more than 600 academic studies published in venues such as Nature, The Lancet Global Health, PLOS ONE, the British Journal of Surgery, the Journal of Medical Internet Research, Scientometrics, the Journal of the Medical Library Association and Internal and Emergency Medicine.
Elsevier and Springer Nature rely on Namsor for their own bibliometric analyses of author demographics. Namsor was selected by the European Commission to power the gender statistics in its SheFigures reports.
What features does Namsor offer?
Namsor provides a comprehensive suite of name analysis features, all accessible via REST API, SDKs, CSV/Excel upload, Google Sheets or no-code integrations.
Standard features
- Gender detection: determine if a name is male or female
- Name origin: identify the country of origin across 131 countries
- Ethnicity and diaspora: estimate cultural and ethnic background across 139 groups
- Country of residence: infer where a person currently lives among 247 countries and territories
- US race/ethnicity: classify according to US Census categories
- Indian name analysis: detect 12 caste groups, religions and states
- Name parsing: split a full name into first and last name
- Name type recognition: classify as personal name, brand, pseudonym or place name
- Phone number formatting: infer a phone number's country code from its owner's name and validate the number's structure
Name embeddings
Namsor generates name embeddings: numerical vector representations of proper names that capture morphological, cultural and linguistic signals. These vectors can be integrated into your own machine learning pipelines for clustering, similarity search or custom classification tasks. Available on namsor.ai.
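As a rough illustration of similarity search over name embeddings, the sketch below ranks names by cosine similarity. The three vectors are invented placeholders, not real Namsor output, and their four dimensions are arbitrary; the sketch assumes you have already retrieved embeddings for your names.

```python
import math

# Hypothetical 4-dimensional vectors, invented for illustration only.
# Real Namsor embeddings are retrieved from the service and have their own size.
EMBEDDINGS = {
    "Petrović": [0.91, 0.10, 0.05, 0.20],
    "Jovanović": [0.88, 0.12, 0.07, 0.25],
    "Tanaka": [0.05, 0.93, 0.15, 0.10],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, embeddings):
    """Return the other name whose vector is most similar to the query's vector."""
    q = embeddings[query]
    scored = [(name, cosine(q, vec)) for name, vec in embeddings.items() if name != query]
    return max(scored, key=lambda item: item[1])[0]

print(nearest("Petrović", EMBEDDINGS))  # Jovanović
```

The same `cosine` function is the usual building block for clustering or custom classifiers on top of the vectors.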
Custom models
Beyond standard features, Namsor builds custom AI models for specific industry needs, including fake name detection for KYC and compliance, romance scam detection, and name transliteration (e.g. Mandarin or Kanji to Latin).
What is onomastics and how does Namsor use it?
Onomastics is the scientific study of proper names: their origin, structure, meaning and cultural usage. It is a branch of linguistics that analyzes how names carry information about a person's gender, geographic heritage, language, religion or ethnic background.
How Namsor applies onomastics
Namsor uses computational onomastics, a discipline that combines morphological analysis of names with artificial intelligence. Rather than simply matching a name against a list, Namsor decodes the internal structure of a name to extract meaningful signals.
Morphological analysis in practice
Names contain morphemes (roots, prefixes, suffixes) that carry cultural and linguistic information. For example:
- The suffix "-ović" (Petrović, Jovanović) is a patronymic marker signaling South Slavic origin
- The prefix "Al-" (Al-Fayed) is the Arabic definite article, indicating Arab heritage
- The suffix "-ko" signals a Ukrainian family name (Shevchenko, Bondarenko) but a feminine Japanese given name (Hanako, Yoshiko)
This last example illustrates why lookup tables fail: the same suffix carries opposite gender signals depending on linguistic context. Onomastic analysis decodes these patterns. Lookup tables cannot.
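The contrast can be sketched in a few lines. This is a deliberately crude toy built only from the three affixes above, with a hypothetical `context` argument standing in for the linguistic context a real model infers from the full name; Namsor's models are learned from data, not hand-written rules like these.

```python
# Toy illustration only: a lookup table versus affix-based reasoning.
LOOKUP_TABLE = {"Petrović": "South Slavic", "Al-Fayed": "Arab"}

def lookup(name):
    """A lookup table only knows names it has already stored."""
    return LOOKUP_TABLE.get(name)

def morphological_guess(name, context=None):
    """Guess a signal from affixes; `context` (hypothetical) disambiguates shared suffixes."""
    lower = name.lower()
    if lower.endswith("ović"):
        return "South Slavic"
    if lower.startswith("al-"):
        return "Arab"
    if lower.endswith("ko"):
        # The same suffix carries different signals in different linguistic contexts.
        if context == "ukrainian":
            return "Ukrainian family name"
        if context == "japanese":
            return "Japanese feminine given name"
        return "ambiguous: -ko needs context"
    return None

print(lookup("Nikolović"))               # None: the name is not in the table
print(morphological_guess("Nikolović"))  # South Slavic
print(morphological_guess("Yoshiko", context="japanese"))
```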
Beyond human onomastics
The examples above are simplified illustrations of well-known morphological patterns. In practice, Namsor's AI models detect far more subtle signals in name structures, identifying micro-patterns across billions of names that go beyond what traditional onomastic analysis can capture. The result is a level of precision that no human expert or static rule set can replicate at scale.
Why it matters
This morphological approach is what allows Namsor to classify names it has never encountered before, including rare names, newly invented names, or names from underrepresented populations that do not appear in publicly available name lists.
Trust & validation
Which institutions have validated Namsor's accuracy?
Namsor's accuracy has been independently validated through peer-reviewed studies, institutional audits and large-scale scientific benchmarks. Namsor is cited over 1,200 times on Google Scholar and has contributed to more than 600 academic publications.
Elsevier and Science-Metrix ()
Namsor was judged the most accurate tool for name-based gender inference and selected to power the gender statistics in the European Commission's SheFigures reports. (Read the report)
Harvard University and the University of Chicago ()
Researchers from both universities validated Namsor on a dataset of 250,000 individuals from the North Carolina voter registry for origin and ethnicity classification. (Read the study)
Uber, ACM FAccT ()
Uber conducted an internal benchmark comparing name-based race and ethnicity inference tools and found that Namsor outperformed all alternatives tested. (Read the benchmark)
Journal of the Medical Library Association ()
This peer-reviewed study of 6,131 physicians in Switzerland confirmed Namsor as one of the most accurate gender detection tools, and the only one with zero unclassified names. (Read the study)
Internal and Emergency Medicine, Springer ()
This study compared three leading gender detection APIs on 11,999 runners from seven international marathons. Namsor achieved the lowest error rate and classified 100% of names. (Read the study)
PLOS ONE ()
This study evaluated Namsor on 88,699 researcher names and confirmed its precision for origin and ethnicity classification. (Read the study)
Columbia University
Benchmark currently in progress.
Is Namsor used in academic research?
Yes, extensively. Namsor is cited in over 1,200 Google Scholar publications and has contributed to more than 600 academic studies across disciplines.
Types of research
Researchers use Namsor in a wide range of studies, including:
- Gender gap analysis: measuring female representation in scientific authorship, editorial boards, grant allocations and career progression
- Bibliometrics: analyzing author demographics across large publication databases (Scopus, PubMed, Web of Science)
- Migration and diaspora studies: tracking population flows, immigrant integration and diaspora mapping
- Epidemiology and public health: studying demographic patterns in health outcomes and clinical trial participation
- Discrimination and bias research: detecting ethnic or racial disparities in hiring, citations, funding and peer review
Disciplines
Namsor is used across medicine, sociology, economics, political science, computer science and information science, among others.
Why researchers choose Namsor
Namsor is the reference solution used by leading scientific publishers. Elsevier and Springer Nature rely on Namsor for their own bibliometric analyses of author demographics. Research teams from Harvard, Columbia University, Yale, Oxford, HEC and other major universities use Namsor in their studies.
Namsor allows retroactive analysis of large datasets where self-reported demographics are unavailable. It is fast, cost-effective, and its accuracy has been independently validated in peer-reviewed studies, making it defensible in academic methodology sections.
Researcher support program
Namsor offers a dedicated support program for researchers and scientists preparing a publication. Contact Namsor to learn more.
Is Namsor used by governments and international organizations?
Yes. Namsor is trusted by governments, international organizations and public institutions for large-scale demographic analysis and policy research.
International organizations
Among many others, here are a few examples of international organizations using Namsor:
- European Commission: Namsor powers the gender statistics in the SheFigures reports, produced by Elsevier and Science-Metrix, to measure women's contribution to scientific research across Europe (read the report)
- United Nations: uses Namsor for demographic and digital inclusion research, including the EQUALS Research Report and the ECLAC study on the digital footprint in Latin America and the Caribbean
- World Bank: commissioned a custom Namsor model to estimate caste groupings from Indian names, enabling research on internal migration and social inequalities
- IOM (International Organization for Migration): partnered with the World Bank on the Indian caste model, and uses Namsor for diaspora mapping projects including the Armenian diaspora, the Georgian diaspora and the Azerbaijani diaspora
Government and public sector
Among many others, here are a few examples of government and public sector institutions using Namsor:
- Federal Reserve Bank of Chicago: used Namsor to classify the ethnic origin of authors in a working paper on cultural change in the economics profession (García-Jimeno & Parsa, )
- DARES (French Ministry of Labour): uses Namsor for labor market and demographic analysis in France (CNIS report, )
- Boston Planning & Development Agency: used Namsor to map the Brazilian scientific diaspora in Boston
Why the public sector trusts Namsor
Namsor's combination of accuracy, privacy controls and regulatory compliance (GDPR, CCPA, EU AI Act) makes it suitable for public sector use cases where data sensitivity is critical.
Is Namsor used by companies?
Yes. Namsor powers name analysis at scale for companies across a wide range of industries, from global enterprises to fast-growing startups. While most clients operate under confidentiality, the types of organizations using Namsor include:
Transportation and travel
- International airports
- Global airlines
- Business travel and tourism platforms
Financial services
- Neobanks
- Global money transfer and remittance leaders
Science and publishing
- Pharmaceutical companies
- Scientific publishers
Retail, e-commerce and marketing
- Global cosmetics brands
- E-commerce platforms
- Retail companies
- Marketing and advertising agencies
Technology and data
- AI and big data companies
- Recruitment and HR tech platforms
Security and intelligence
- Intelligence and risk analysis firms
Why companies choose Namsor
Namsor scales from thousands to billions of names with consistent accuracy, integrates through API, SDK, CSV/Excel tools and no-code platforms, and meets enterprise requirements for GDPR, CCPA and EU AI Act compliance.
Why is a specialized onomastic API better than a name lookup database?
Name lookup databases work by matching an input name against a precompiled list. When the name is in the list, the result can be correct. When it is not, the tool either returns no result or falls back on an approximate match with no guarantee of accuracy.
Coverage drops on real-world data
Lookup databases typically cover between 75% and 92% of names, depending on the solution. That gap is not random. The missing 8% to 25% of unrecognized names are disproportionately rare names, non-Western names, transliterated names and newly coined names. These are precisely the names that a morphological approach can still classify correctly, because analysis does not depend on having seen that exact name before.
No ability to distinguish typos from cultural nuances
Name databases treat "Muhammed", "Mohammed" and "Muhammad" as separate entries. A specialized onomastic API recognizes them as transliteration variants of the same Arabic root and classifies them consistently. Conversely, when a name contains a genuine typo, an onomastic model can still extract the morphological signal, while a database either mismatches or returns nothing.
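One crude way to see why these spellings are related at the letter level is to compare their sets of adjacent letter pairs (bigrams) with a Jaccard score. This is a generic string-similarity technique chosen for illustration, not Namsor's method; a trained model learns such equivalences rather than computing a fixed score.

```python
def bigrams(name):
    """Set of adjacent letter pairs in a lowercased name."""
    s = name.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def jaccard(a, b):
    """Share of bigrams two names have in common, from 0.0 to 1.0."""
    x, y = bigrams(a), bigrams(b)
    union = x | y
    return len(x & y) / len(union) if union else 0.0

for variant in ["Muhammed", "Mohammed", "Smith"]:
    print(variant, round(jaccard("Muhammad", variant), 2))
```

Muhammad scores 0.56 against Muhammed and 0.27 against Mohammed, while Smith shares no bigrams with it and scores 0.0.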
Shallow taxonomy and no contextual understanding
Most lookup databases only offer basic classifications: origin and sometimes location. They analyze first name and last name in isolation, missing the cultural signals that emerge from their combination. For example, the same first name paired with different last names can indicate completely different origins, genders or ethnicities. Only a model that understands name morphology and cultural context can capture these nuances.
Lookup databases also cannot distinguish a fake name from a rare one: both are simply absent from the list. A specialized onomastic API can detect structural anomalies in a fabricated name while still classifying a genuinely rare name correctly. This distinction is critical for KYC, fraud prevention and compliance workflows.
Sporadic updates
Lookup databases depend on periodic imports from public registries, census data or crowdsourced lists. Namsor's models are continuously updated with both new data and improved algorithms, adapting to evolving naming patterns across cultures.
Why is a specialized onomastic API better than a general-purpose LLM for name classification?
LLMs can appear accurate on name classification when tested on common names. In practice, on real-world data, they fall short in every critical dimension.
Accuracy collapses on real names
LLMs are trained on publicly available data, including lists of the most common names by country that appear on thousands of websites. When tested on these common names, their results are correct, precisely because those lists are heavily represented in their training data. This creates a dangerous bias: it gives a false sense of accuracy that collapses on real-world datasets.
When Namsor tested three major LLMs on a real-world dataset of 400,000 names submitted by actual API users, the results were very different. Namsor correctly classified over 92% of names. The best-performing LLM achieved approximately 62%, with 18% of names left unclassified, 8% assigned to the wrong taxonomy (confusing origin with diaspora, language with country), and 12% attributed to the wrong country.
Taxonomy confusion
Beyond missing names, LLMs frequently confuse classification categories. They mix linguistic origins (Latin, Greek, Cyrillic) with countries, and countries with diasporas. Some responses reference entities that no longer exist, such as the Persian Empire. A specialized onomastic API maintains strict, consistent taxonomies across every classification.
Syllable-level vs. letter-level analysis
LLMs process names as subword tokens, chunks that usually span several letters, which limits their ability to detect fine morphological signals. Namsor's models perform letter-by-letter morphological analysis, capturing micro-patterns that token-level processing misses entirely.
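Letter-level analysis can be pictured as splitting a name into overlapping character n-grams, with `^` and `$` marking its start and end so that prefixes and suffixes become explicit features. The sketch below shows this generic feature-extraction step as an assumed illustration; it is not a description of Namsor's internal architecture.

```python
def char_ngrams(name, n_max=3):
    """All letter n-grams of length 1..n_max, with '^' and '$' as boundary markers."""
    padded = f"^{name.lower()}$"
    grams = []
    for n in range(1, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

features = char_ngrams("Hanako")
print("ko$" in features)  # True: the name-final suffix is an explicit feature
```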
Non-deterministic results
The same name submitted twice to an LLM can produce different answers. For research, compliance or any use case requiring reproducibility, this is disqualifying. A specialized API returns the same result every time.
Latency and cost
An LLM takes 1 to 5 seconds per name. Namsor processes a name in under 30 ms (0.03 seconds). At scale, the difference is the gap between minutes and days.
Privacy risk
Depending on the provider and plan, LLM services may retain input data and use it for training, and names are submitted in raw form with no built-in anonymization. Namsor offers an anonymized mode with SHA hashing and an opt-out from machine learning.
But LLMs bring one thing
Despite their limitations on precision and taxonomy, LLMs can provide useful semantic context about names. This is why Namsor V3 integrates a semantic model alongside its morphological and statistical models, capturing the best of LLM capabilities without their weaknesses.
Coverage & capabilities
What happens if a name isn't in your dataset?
Namsor still classifies it. Unlike lookup-based tools that return no result when a name is absent from their list, Namsor does not depend on having seen a name before.
Morphological analysis, not lookup
Namsor analyzes the structure of a name letter by letter, extracting cultural, linguistic and geographic signals from its roots, prefixes, suffixes and phonetic patterns. This means Namsor can classify a rare name, a transliterated name, a misspelled name or even a completely invented name.
Proven in benchmarks
In two independent peer-reviewed studies, Namsor achieved zero unclassified names, while competing tools left up to 25% of names without a result (Sebo, ; Sebo, Shamsi & Wang, ).
99.99% classification rate
Namsor classifies virtually every name submitted, regardless of origin, writing system or frequency.
How many names can Namsor analyze and in which alphabets?
Namsor's models are trained on a proprietary database of 13 billion unique names, the largest in the industry. Over 12 billion names have been processed through the platform to date, covering individuals, companies and aliases from every region of the world.
22 writing systems supported
Namsor analyzes names written in Latin, Cyrillic, Arabic, Han (Chinese traditional and simplified, Kanji), Hangul (Korean), Hiragana, Katakana, Devanagari, Bengali, Georgian, Greek, Armenian, Thai, Hebrew, Kannada, Gujarati, Tamil, Telugu, Gurmukhi, Oriya, Myanmar and Malayalam.
99.99% classification rate
Unlike lookup-based tools that leave 8% to 25% of names unclassified, Namsor's morphological analysis ensures that virtually every name receives a classification, including rare names, transliterated names and newly invented names. In independent benchmarks, Namsor is the only tool that consistently achieves zero unclassified names (Sebo, ; Sebo, Shamsi & Wang, ).
Understanding results
Why does Namsor offer 4 different features for analyzing name origin?
Namsor offers four features for analyzing name origin because there are four different questions you can ask about a person, and each one requires a different answer. They are not redundant: a single name often returns four different but equally valid results.
The four questions and the four features
- Origin answers "Where does this person's family historically come from?" It returns a country code (ISO) and covers 131 countries.
- Ethnicity / Diaspora answers "What cultural identity does this person belong to?" It returns a named cultural group from 139 groups (e.g. Scottish, Catalan, Hispanic, Jewish, Tatar, AfricanAmerican).
- Country of Residence answers "Where does this person currently live?" It returns a country code and covers 247 countries and territories — the broadest geographic coverage of the four features.
- US Race / Ethnicity answers "Which US Census racial category does this person belong to?" It returns one of six Census categories: White, Black/African American, Hispanic/Latino, Asian, Native Hawaiian/Pacific Islander, American Indian/Alaska Native.
Why four features instead of one?
Because the four concepts genuinely do not overlap. A person can be ethnically Chinese, of Chinese ancestral origin, living in the United States, and classified as Asian under US Census categories — all at the same time. None of these four facts can be derived from any single other one.
A name like "García" tells you something about ancestral roots (Spanish), but not about where the person lives (could be Spain, Mexico, Colombia, the US, or anywhere else) and not about cultural identity (could be Spanish, Mexican, Hispanic-American, etc.). A name like "Smith" could belong to someone whose family has lived in the US for ten generations, or to someone who recently moved to London from Australia. One feature cannot answer all four questions correctly, so Namsor offers four specialized features instead of one approximate one.
One name, four answers: an example
For Wei Zhang living in San Francisco, the four features return:
| Feature | Returns | What it tells you |
|---|---|---|
| Origin | CN (China) | His family historically comes from China |
| Ethnicity | Chinese | His cultural identity is Chinese |
| Country of Residence | US (United States) | He currently lives in the United States |
| US Race | Asian | His US Census racial category |
All four answers are correct. They simply answer different questions. Choosing the right feature means knowing which question you are actually asking.
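To request all four answers for one person from code, you would call one endpoint per feature. The sketch below only builds the GET URLs; the endpoint paths are assumptions modelled on the v2 API naming (verify them in the API documentation), and each real request would also need your API key in a header, assumed here to be `X-API-KEY`.

```python
from urllib.parse import quote

BASE_URL = "https://v2.namsor.com/NamSorAPIv2"  # base URL from the API details

def feature_urls(first, last, residence_iso2=None):
    """Build one GET URL per origin-related feature for a single person.

    The endpoint paths are assumptions; check them against the official docs.
    """
    first_q, last_q = quote(first, safe=""), quote(last, safe="")
    full_q = quote(f"{first} {last}", safe="")  # assumed: Country of Residence takes a full name
    urls = {
        "origin": f"{BASE_URL}/api2/json/origin/{first_q}/{last_q}",
        "country_of_residence": f"{BASE_URL}/api2/json/country/{full_q}",
        "us_race": f"{BASE_URL}/api2/json/usRaceEthnicity/{first_q}/{last_q}",
    }
    if residence_iso2:
        # Ethnicity / Diaspora is most precise when given the country of residence.
        urls["ethnicity"] = f"{BASE_URL}/api2/json/diaspora/{residence_iso2}/{first_q}/{last_q}"
    return urls

for feature, url in feature_urls("Wei", "Zhang", "US").items():
    print(feature, url)
```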
Why coverage differs across features
The four features cover different numbers of countries or groups because each is built around a different concept:
- Origin (131 countries): limited to countries that are historically sources of population. Immigration countries like the US, Canada, Australia, Brazil, Argentina and most of Latin America are not in the taxonomy because there is no single "American origin" or "Brazilian origin."
- Ethnicity (139 groups): captures cultural identities that don't always align with country borders — including sub-national groups (Scottish, Catalan), transnational groups (Hispanic, Jewish) and communities defined by shared culture rather than geography.
- Country of Residence (247 countries and territories): the most geographically complete feature. Covers every country, including immigration destinations, newly formed states, overseas territories and micro-states.
- US Race (6 categories): strictly aligned with the US Census taxonomy, used for federal reporting and disparate impact analysis.
A common pitfall to know about
Origin will not return the United States, Canada, Brazil, Mexico, Colombia, Argentina, Australia or any other immigration country for people who live there. Because Origin reflects ancestral roots, it returns the country the family historically came from instead — typically Spain or Portugal for Latin America, or various European/African/Asian countries for the US, Canada and Australia.
If you need the country where the person actually lives, use Country of Residence instead of Origin. This is the single most common confusion among new Namsor users.
Quick decision guide
- You know where the person lives or works → use Ethnicity / Diaspora with the country code. Most precise option for multicultural countries.
- You only have a name with no context (social media aliases, anonymous lists) → use Origin. Works from the name alone.
- You need to know where someone currently lives → use Country of Residence. The only feature that covers immigration countries like the US, Canada, Australia and Latin America.
- You need US Census-aligned categories → use US Race, ideally with a ZIP code for neighborhood-level precision.
- You want both cultural detail and geographic distribution → combine Ethnicity + Country of Residence on the same dataset.
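The guide can also be written as a small helper, for example to document the choice inside a data pipeline. The function and its flags are hypothetical, and the order in which the checks run is an arbitrary choice made for this sketch.

```python
def choose_features(knows_residence_country=False, needs_current_residence=False,
                    needs_us_census=False, wants_culture_and_geography=False):
    """Return the feature(s) recommended by the quick decision guide (hypothetical helper)."""
    if wants_culture_and_geography:
        return ["Ethnicity / Diaspora", "Country of Residence"]
    if needs_us_census:
        return ["US Race"]  # ideally called with a ZIP code
    if needs_current_residence:
        return ["Country of Residence"]
    if knows_residence_country:
        return ["Ethnicity / Diaspora"]  # pass the country code as context
    return ["Origin"]  # name only, no context available

print(choose_features())                              # ['Origin']
print(choose_features(knows_residence_country=True))  # ['Ethnicity / Diaspora']
```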
Why does Namsor return Spain or Portugal instead of the actual country someone lives in?
Short answer: Namsor's Origin feature returns the country a person's family historically came from, not the country where they currently live. For someone living in Latin America, the United States, Canada, Australia or any other immigration country, Origin will return the ancestral country instead of the country of residence. This is by design, not a bug.
Why Origin works this way
Origin is built around 131 countries that are historically sources of population, not destinations. Countries built largely through immigration (United States, Canada, Australia, New Zealand, Brazil, Argentina, and most of Latin America) are not in the Origin taxonomy because there is no single ancestral origin shared by their populations.
For someone living in São Paulo, ancestral roots could be Portuguese, Italian, Japanese, Lebanese, German or African. There is no "Brazilian origin" in the historical sense Origin is designed to capture. The same is true for the US, where ancestral roots span every continent. Origin therefore returns the country the family historically came from, which is the only meaningful answer the feature can give within its taxonomy.
Examples across regions
Here are typical results that often surprise new users:
| Person | Origin returns | Why |
|---|---|---|
| Diego Hernández in Buenos Aires | ES (Spain) | Hernández is a Spanish surname, not Argentine |
| Ana Costa in Rio de Janeiro | PT (Portugal) | Costa is Portuguese, not Brazilian |
| John Smith in Boston | GB (United Kingdom) | Smith is a British surname, not American |
| Liam O'Connor in Sydney | IE (Ireland) | O'Connor is an Irish surname, not Australian |
| Mohammed Hassan in Toronto | EG (Egypt) or similar | Arabic name, not Canadian in origin |
| Hiroshi Tanaka in São Paulo | JP (Japan) | Japanese name (large Nikkei community in Brazil) |
| Wei Zhang in Vancouver | CN (China) | Chinese name, not Canadian in origin |
In all these cases, Origin is doing exactly what it is designed to do: identify ancestral roots. The "wrong" country is only wrong relative to a question Origin was never built to answer.
How to get the actual country of residence
Use Country of Residence instead of Origin. Country of Residence is built around a different question (where someone currently lives) and covers 247 countries and territories, including all immigration countries that Origin cannot return.
For the same examples above, Country of Residence returns:
- Diego Hernández in Buenos Aires → AR (Argentina)
- Ana Costa in Rio de Janeiro → BR (Brazil)
- John Smith in Boston → US (United States)
- Liam O'Connor in Sydney → AU (Australia)
- Mohammed Hassan in Toronto → CA (Canada)
If you need cultural identity rather than geography (for example, identifying the Hispanic, African American or Asian American community a person belongs to), use Ethnicity / Diaspora instead. Ethnicity can return groups like HispanoLatino, AfricanAmerican or AsianAmerican that neither Origin nor Country of Residence can represent.
When Origin is still the right feature
Origin remains the right choice in several cases:
- You have no context about where the person lives (anonymous lists, social media aliases, historical records). Origin is the only feature that works from a name alone.
- You specifically want ancestral roots for genealogy, family history research or migration studies.
- You are studying historical population movements, diasporas or migration patterns. In this context, ancestral country is exactly the signal you want.
- The names come from a country that is in the Origin taxonomy (most of Europe, Asia, Africa and the Middle East). For these populations, Origin and Country of Residence often return the same answer.
For most analytics, customer segmentation, compliance and localization use cases involving immigration countries, Country of Residence is the more appropriate feature.
Getting started & integration
Can I use Namsor without coding?
Yes. Namsor offers four no-code ways to analyze names at scale, with no technical skills required.
CSV and Excel tool
Upload a spreadsheet, choose the analysis type, map your columns and download the enriched file. Supports .xls, .xlsx, .csv, .txt and .ods files. Learn more about the CSV/Excel tool.
Google Sheets add-on
Analyze up to 500,000 names directly inside a Google Sheet. Install from the Google Workspace Marketplace and run analyses from the sidebar.
No-code automations
Connect Namsor to 8,000+ apps through Zapier, Make or n8n to automate name analysis in your existing workflows (CRM enrichment, form submissions, database sync). Learn more about no-code integrations.
Interactive forms on feature pages
Every feature page includes an interactive form at the top, so you can run small analyses directly from the Namsor website with no setup. This is useful for testing a feature before integrating it, validating a one-off result or showing the product to a colleague.
Which option to choose
Use the Google Sheets add-on for collaborative work, the CSV/Excel tool for one-off large batches, no-code automations for recurring workflows, and the forms on feature pages for quick tests.
What programming languages does Namsor support, and does it provide SDKs and a CLI?
Namsor provides official SDKs and a CLI for developers, all open-source on GitHub.
Supported languages
Native SDKs are available in four languages:
- Java
- Python
- JavaScript
- Go (Golang)
For languages without an official SDK, the Namsor REST API can be called directly from any language that supports HTTP requests.
SDKs
Each SDK wraps the Namsor REST API with typed methods, authentication handling and batch support, making integration straightforward in your application's data flow.
CLI (command-line tool)
Run name analyses from your terminal without writing code. Useful for quick tests, scripted pipelines and server-side automation.
How they're built
Namsor SDKs are generated through OpenAPI Generator from the official API specification. This guarantees consistency across languages and automatic updates when the API evolves.
Installation
Install via standard package managers:
- Java: Maven or Gradle (com.namsor:namsor-sdk2)
- Python: pip install namsor
- JavaScript: npm install namsor
- Go: go get github.com/namsor/namsor-go-sdk
Source code and documentation
All SDKs and the CLI are publicly available on the Namsor GitHub organization. Learn more about Namsor developer tools.
Is there API documentation?
Yes. Namsor publishes complete, interactive API documentation with code examples, endpoint references and authentication guides. Read the Namsor API documentation.
What's documented
Every endpoint across all features is documented (gender, origin, ethnicity, country of residence, US race, Indian name, split name, name type, phone number format), with request/response schemas, error codes and rate limits.
Code examples
Ready-to-copy snippets in JavaScript, Python, Java and Shell (curl) for each endpoint.
API details
- Base URL: https://v2.namsor.com/NamSorAPIv2
- Current version: 2.0.21
- Authentication: API key (header-based)
- Format: JSON
- Batch support: up to 100 names per POST request
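Putting these details together, the sketch below splits a list of names into batches of at most 100 and prepares one POST request per batch using only the Python standard library. The `genderBatch` path, the `X-API-KEY` header name and the `personalNames` body shape are assumptions, so check them against the API documentation; the code builds the requests without sending them.

```python
import json
import urllib.request

BASE_URL = "https://v2.namsor.com/NamSorAPIv2"
MAX_BATCH = 100  # documented limit: up to 100 names per POST request

def chunks(items, size=MAX_BATCH):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def build_gender_batch_request(batch, api_key):
    """Prepare (but do not send) one POST request for a batch of (id, first, last).

    Assumed: the 'genderBatch' path, the 'X-API-KEY' header and the
    'personalNames' body shape. Check them against the API documentation.
    """
    body = {"personalNames": [
        {"id": pid, "firstName": first, "lastName": last} for pid, first, last in batch
    ]}
    return urllib.request.Request(
        f"{BASE_URL}/api2/json/genderBatch",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )

names = [("Wei", "Zhang"), ("Ana", "Costa")] * 120  # 240 names
people = [(str(i), first, last) for i, (first, last) in enumerate(names)]  # ids unique across batches
reqs = [build_gender_batch_request(b, "YOUR_API_KEY") for b in chunks(people)]
print(len(reqs))  # 3 requests: 100 + 100 + 40 names
```

Sending a request is then a matter of passing it to `urllib.request.urlopen` and parsing the JSON body it returns. Assigning ids before chunking keeps them unique across batches, so results can be joined back to the source rows.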
Advanced topics covered
- Learnable mode: opt-out of machine learning on your data
- Anonymized mode: irreversibly anonymizes names with SHA before logging, so no raw name data is stored
- API Explainability: detailed reasoning output returned as Python code
- API Enumerators: full list of return values for alphabets, countries, diasporas, castes, religions, US races and name types
How fast is the Namsor API per name?
A single name can be processed in under 30 ms, and a batch of several hundred names typically completes in 80 ms to under 500 ms, depending on name complexity. Namsor is built for high-throughput name analysis at scale, with batch endpoints, persistent connections and a tuned inference layer.
Batch processing: 80ms to less than 500ms for hundreds of names
When you send a batch of names through a POST endpoint, Namsor processes them in parallel server-side and returns the full response in 80 ms to under 500 ms for several hundred names, depending on name complexity. Names in non-Latin scripts or with an ambiguous structure may sit at the higher end of that range. This is the recommended mode for production workloads.
GET vs POST endpoints
- GET endpoints: process one name per request, typically in under 30ms. Useful for quick tests, integration debugging and very low-volume workflows.
- POST endpoints: process up to 100 names per request. Use these for production, bulk enrichment and batch pipelines.
How to maximize throughput
- Use POST batch endpoints instead of looping GET calls
- Run parallel batch requests if you need to process millions of names
- For very large workloads, the CSV/Excel tool handles millions of names per file, compared to 500,000 for the Google Sheets add-on
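The first two recommendations combine naturally: split the input into POST-sized batches of 100, then send several batches concurrently. This is a minimal sketch that assumes a personalNames/id/firstName/lastName request body (a hypothetical shape to verify in the API documentation). The HTTP call itself is left to a caller-supplied send function, so the batching and ordering logic can be tested offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

MAX_BATCH = 100  # maximum names per POST request, per the API details

def chunk(items: Sequence, size: int = MAX_BATCH) -> List[list]:
    """Split items into consecutive batches of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]

def to_payload(batch: List[Tuple[str, str, str]]) -> Dict:
    """Build a batch body; the field names here are an assumption."""
    return {"personalNames": [
        {"id": name_id, "firstName": first, "lastName": last}
        for name_id, first, last in batch
    ]}

def run_parallel(names: Sequence[Tuple[str, str]],
                 send: Callable[[Dict], Dict],
                 max_workers: int = 4) -> List[Dict]:
    """Send all names in batches, several requests in flight at a time."""
    # Give every name a global id so results can be joined back to the input.
    indexed = [(str(i), first, last) for i, (first, last) in enumerate(names)]
    payloads = [to_payload(batch) for batch in chunk(indexed)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the payloads.
        return list(pool.map(send, payloads))

def fake_send(payload: Dict) -> Dict:
    """Offline stand-in for the HTTP POST to a batch endpoint."""
    return {"count": len(payload["personalNames"]),
            "first_id": payload["personalNames"][0]["id"]}

results = run_parallel([("Ada", "Lovelace")] * 250, fake_send)
print([r["count"] for r in results])  # [100, 100, 50]
```

In production, send would POST the payload with your API key and parse the JSON response; keep max_workers within the rate limits that apply to your account.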
Why this matters
For comparison, large language models (LLMs) typically take 1 to 5 seconds per name for similar classification tasks. At Namsor's batch speed, processing 1 million names takes minutes, not days.
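The "minutes, not days" comparison follows from simple arithmetic. The sketch below uses the figures quoted on this page (1 to 5 seconds per name for an LLM; 100 names per POST request and 80 ms to 500 ms per batch for Namsor, conservatively applying the batch time to 100-name requests). The degree of parallelism is a free parameter, not a documented value.

```python
import math

SECONDS_PER_DAY = 86_400

def llm_days(n_names: int, seconds_per_name: float) -> float:
    """Sequential LLM time, one name per call, in days."""
    return n_names * seconds_per_name / SECONDS_PER_DAY

def batch_minutes(n_names: int, seconds_per_batch: float,
                  batch_size: int = 100, parallel: int = 1) -> float:
    """Batch API time in minutes with `parallel` requests in flight at once."""
    batches = math.ceil(n_names / batch_size)
    return batches * seconds_per_batch / parallel / 60

N = 1_000_000
print(f"LLM: {llm_days(N, 1):.1f} to {llm_days(N, 5):.1f} days")
print(f"Batch API, sequential: {batch_minutes(N, 0.08):.0f} to {batch_minutes(N, 0.5):.0f} minutes")
print(f"Batch API, 4 requests in parallel: up to {batch_minutes(N, 0.5, parallel=4):.0f} minutes")
```

With these inputs the sketch gives about 11.6 to 57.9 days for the LLM, versus about 13 to 83 minutes for sequential 100-name batches, or at most about 21 minutes with four batches in flight.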
Pricing & plans
Is Namsor free to use?
Yes. Every Namsor account starts with 2,500 free credits per month, renewable every month, with no credit card required.
What you can do with 2,500 credits
The number of names you can analyze depends on the feature you use:
- 2,500 names for gender detection, name splitting or name type recognition (1 credit each)
- 250 names for origin, country of residence, US race or Indian name analysis (10 credits each)
No time limit on the free tier
Credits renew every month automatically. You can use Namsor for free indefinitely within the free quota.
Upgrade when you need more
Paid plans start at $19/month and unlock larger quotas, lower per-credit costs and premium features. See Namsor pricing plans.
How does Namsor pricing work?
Namsor uses a credit-based system: every name analysis consumes a defined number of credits, from 1 to 50 depending on the feature.
Credit cost by feature
- 1 credit: simple analyses (gender, name splitting, name type)
- 10 credits: mid-tier analyses (origin, country of residence, US race, Indian name)
- 20 credits: advanced analyses (ethnicity/diaspora)
Two ways to pay
- Monthly subscription (recommended): includes a monthly credit quota at a 30% discount versus one-time purchases. Plans range from Free (2,500 credits) to Enterprise (10 million credits).
- One-time credit packs: purchase credits as needed. Credits remain valid for 120 days.
Smart deduplication
On Ultra, Mega and Enterprise plans, a name repeated within the same batch is charged only once (for up to 10 or 20 repetitions of each name), which significantly reduces costs on large customer databases.
No lock-in
All subscriptions are monthly with no commitment. You can upgrade, downgrade or cancel anytime. See full pricing details and compare plans.
What happens if I exceed my monthly credits?
On a paid Namsor subscription, you continue to use the API without interruption. Additional credits are automatically billed at the end of the current billing period, at a per-credit rate that depends on your plan.
Additional credit pricing by plan
- Free: up to 200,000 additional credits/month at $0.005 per credit
- PRO: up to 500,000 additional credits/month at $0.003 per credit
- ULTRA: up to 2 million additional credits/month at $0.002 per credit
- MEGA: up to 10 million additional credits/month at $0.001 per credit
- ENTERPRISE: up to 100 million additional credits/month at $0.0005 per credit
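Because each plan pairs a cap with a per-credit rate, the end-of-period bill for additional credits is a single multiplication. A minimal sketch using the rates listed above; the function and its cap check are illustrative and not part of any Namsor SDK.

```python
from decimal import Decimal

# plan -> (maximum additional credits per month, USD per additional credit)
OVERAGE = {
    "FREE": (200_000, Decimal("0.005")),
    "PRO": (500_000, Decimal("0.003")),
    "ULTRA": (2_000_000, Decimal("0.002")),
    "MEGA": (10_000_000, Decimal("0.001")),
    "ENTERPRISE": (100_000_000, Decimal("0.0005")),
}

def overage_bill(plan: str, extra_credits: int) -> Decimal:
    """USD billed at period end for credits used beyond the plan quota."""
    cap, rate = OVERAGE[plan.upper()]
    if extra_credits > cap:
        # Going past the plan maximum requires contacting the Namsor team.
        raise ValueError(f"{plan}: {extra_credits:,} exceeds the {cap:,} maximum")
    # Decimal avoids binary rounding noise on money amounts.
    return (extra_credits * rate).quantize(Decimal("0.01"))

print(overage_bill("pro", 120_000))      # 360.00
print(overage_bill("ULTRA", 1_500_000))  # 3000.00
```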
Larger plans offer lower per-credit rates, so heavy users benefit from economies of scale.
Stay in control: soft and hard limits
Two configurable limits let you control exactly how much you want to spend on additional credits:
- Hard limit: caps your total monthly consumption. Once reached, your API key is automatically disabled until the next billing cycle. This prevents unexpected billing.
- Soft limit: a warning threshold. Once reached, you receive a notification email but the API continues to work. Useful to get early alerts without blocking production.
Both limits can be adjusted anytime in the Plan management section of your account.
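The interplay between the two limits can be expressed as a small state check. This is an illustrative sketch of the behavior described above, assuming a limit counts as reached when consumption is equal to or greater than it; it does not describe how Namsor implements the limits internally.

```python
from enum import Enum

class LimitState(Enum):
    OK = "ok"
    SOFT = "soft limit reached: warning email sent, API keeps working"
    HARD = "hard limit reached: API key disabled until next billing cycle"

def limit_state(used: int, soft_limit: int, hard_limit: int) -> LimitState:
    """Classify monthly consumption against the two configurable limits."""
    if soft_limit > hard_limit:
        # A soft limit above the hard limit could never trigger.
        raise ValueError("the soft limit should not exceed the hard limit")
    if used >= hard_limit:
        return LimitState.HARD
    if used >= soft_limit:
        return LimitState.SOFT
    return LimitState.OK

print(limit_state(80_000, soft_limit=100_000, hard_limit=150_000).name)   # OK
print(limit_state(120_000, soft_limit=100_000, hard_limit=150_000).name)  # SOFT
print(limit_state(150_000, soft_limit=100_000, hard_limit=150_000).name)  # HARD
```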
Need more than your plan allows?
To increase your hard limit beyond your plan's maximum additional credits, contact the Namsor team. We can also help you choose the most cost-effective plan for your expected volume.
Do credits roll over, and how long are they valid?
Credit validity depends on whether you are on a monthly subscription or bought credits one-time.
Subscription credits
Subscription credits do not roll over from month to month. Each billing cycle gives you a fresh allocation that must be used within that month. Unused credits expire at the end of the cycle and the next month starts with the full plan allocation. This keeps pricing simple and predictable.
One-time credit purchases
One-time credit purchases are valid for 120 days from the date of purchase. You can use them at your own pace within that window. If you exhaust them before 120 days, you can buy more credits at any time. The 120-day validity starts fresh with each new purchase.
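Because each purchase carries its own 120-day window, a balance made of several packs expires in stages. A minimal sketch, assuming credits are usable strictly before the purchase date plus 120 days (the exact cut-off time is not specified here):

```python
from datetime import date, timedelta
from typing import Iterable, Tuple

VALIDITY = timedelta(days=120)

def expiry(purchase_date: date) -> date:
    """First day on which a one-time pack is no longer usable (assumed cut-off)."""
    return purchase_date + VALIDITY

def usable_credits(packs: Iterable[Tuple[date, int]], today: date) -> int:
    """Sum the remaining credits of packs whose 120-day window is still open."""
    return sum(remaining for bought, remaining in packs if today < expiry(bought))

packs = [(date(2025, 1, 15), 1_000), (date(2025, 3, 1), 500)]
print(expiry(date(2025, 1, 15)))                 # 2025-05-15
print(usable_credits(packs, date(2025, 5, 20)))  # 500
```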
What happens when you downgrade your plan
When you downgrade your subscription plan, any credits you have not yet used from the previous plan are preserved and added to your account. They remain valid until the end of the original subscription period. After that date, the new (lower) plan quota applies normally.
This means a downgrade never causes you to lose credits you have already paid for.
In summary
- Subscription credits: reset each month, no carryover
- One-time credits: valid for 120 days
- Plan downgrade: unused credits preserved until the end of the old billing cycle
Why does Diaspora analysis cost more credits than Gender detection?
Credit cost reflects the computational complexity of each prediction, not an arbitrary price. The more possible outcomes a model has to choose from, the more resources each prediction requires.
Gender detection: a binary outcome
Gender detection classifies a name into two possible outcomes (male or female on a continuous scale). The underlying model is compact, trained on a simpler decision surface and returns a result quickly. Cost: 1 credit per name.
Diaspora: 139 cultural groups
Diaspora analysis classifies a name into 139 cultural groups. Each group carries distinct linguistic, morphological and cultural signals that the model must disentangle from potentially overlapping patterns. The model is larger, the training data is more diverse, and each prediction requires evaluating many possible outcomes simultaneously. Cost: 20 credits per name.
How pricing scales across features
The same logic applies across all Namsor features:
- 1 credit: simple classifications with binary or short taxonomies (Gender, Split Name, Name Type)
- 10 credits: mid-complexity features with national-level taxonomies (Origin with 131 countries, Country of Residence with 247 territories, US Race with 6 Census categories, Indian Name classifications)
- 11 credits: combined analysis (Phone Number Format, which parses a name and a phone number together)
- 20 credits: granular cultural classification (Diaspora with 139 groups)
- 50 credits: cross-entity analysis (Names Corridor, which analyzes the interaction between two names for cross-border dynamics)
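The table above is enough to budget a job before running it. In this small sketch the feature keys are hypothetical labels rather than API identifiers, and the Names Corridor unit is assumed to be one pair of names. Smart deduplication on eligible plans would lower the real figure.

```python
CREDITS_PER_UNIT = {
    "gender": 1, "split_name": 1, "name_type": 1,
    "origin": 10, "country_of_residence": 10, "us_race": 10, "indian_name": 10,
    "phone_number_format": 11,
    "diaspora": 20,
    "names_corridor": 50,  # assumed: one unit = one pair of names
}

def credits_needed(feature: str, units: int) -> int:
    """Credits consumed by `units` analyses of one feature."""
    return CREDITS_PER_UNIT[feature] * units

def units_affordable(feature: str, credits: int) -> int:
    """How many analyses a given credit balance covers."""
    return credits // CREDITS_PER_UNIT[feature]

print(units_affordable("gender", 2_500))  # 2500 (free tier)
print(units_affordable("origin", 2_500))  # 250
print(credits_needed("diaspora", 1_000))  # 20000
```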
Pricing is proportional to what you get
Choosing Diaspora instead of Gender means choosing a much deeper analysis, not the same analysis at a higher price. The extra credits reflect the extra signal, granularity and infrastructure required to deliver a classification across 139 groups instead of 2.
Is there a discount for researchers or academic use?
Yes. Namsor runs a dedicated research support program with discounts ranging from 40% to 99% on name analysis credits, designed to make rigorous onomastic methods accessible to academic teams, PhD students and research projects.
What determines your discount
The exact discount depends on several factors:
- Team: size and composition of the research group
- Project: nature, scope and scientific ambition of the research
- Publication target: the journals or conferences where results will appear
Larger projects destined for high-impact peer-reviewed venues typically qualify for the deepest discounts.
Hands-on methodology support
For complex research projects, Namsor's team can provide methodology support at no additional cost. This includes:
A proven track record in academia
Namsor is already used in over 600 academic publications and cited in more than 1,200 Google Scholar results. This includes studies published by Harvard, Columbia, Yale, HEC and in top-tier venues such as Nature, The Lancet Global Health, PLOS ONE, British Journal of Surgery, Journal of Medical Internet Research, Scientometrics (Springer), Journal of the Medical Library Association and Internal and Emergency Medicine (Springer).
Elsevier and Springer Nature use Namsor internally for bibliometric gender analyses, including for the European Commission's She Figures reports.
How to apply
Contact the Namsor team with a short description of your project, team, methodology and target publication. You will usually receive a tailored offer within a few working days.
Technology & models
What is Namsor V2?
Namsor V2 is the current production version of Namsor, available at namsor.app. It is the product used by researchers, enterprises and institutions worldwide.
A morphological engine with statistical refinement
Namsor V2 is built on a specialized morphological model that analyzes the internal structure of names letter by letter, detecting cultural, linguistic and geographic signals embedded in roots, prefixes and suffixes. A statistical layer refines the output probabilities based on a proprietary dataset of 5 billion unique names.
Purpose-built for onomastics
Unlike general-purpose tools, Namsor V2 is dedicated entirely to the analysis of proper names. It covers gender detection, geographic origin, ethnicity and diaspora, country of residence, US race/ethnicity, Indian name classification, name parsing, name type recognition and phone number formatting.
Transparent and privacy-first
Namsor V2 includes an Explainability API that details the reasoning behind each classification as Python code, as well as an anonymized mode (SHA hashing) and an opt-out from machine learning.
Independently validated
Namsor V2 is the version benchmarked by Elsevier, Harvard, the University of Chicago, Uber and in multiple peer-reviewed studies. It is cited in over 1,200 Google Scholar publications.
What is Namsor V3 and how does it differ from V2?
Namsor V3 is the next generation of Namsor's name analysis platform, available on request at namsor.ai. It represents a fundamental architectural evolution from V2.
From one model to three, on a massively expanded dataset
Namsor V2 relies on a single morphological model trained on 5 billion unique names. Namsor V3 moves to three models combined in a single pipeline, trained on a dataset massively expanded to 13 billion unique names:
- Improved morphological model: letter-by-letter analysis of name structure (roots, prefixes, suffixes), in a deeply reworked version compared to V2
- Statistical model (new): a brand new layer in V3, refining probabilities based on the largest proprietary name dataset in the industry, now expanded to 13 billion unique names
- Semantic model (new): a large language model that captures contextual and cultural meaning beyond what morphology and statistics alone can detect
Why add a semantic model?
Morphological and statistical models excel at precision and consistency but can miss contextual nuances that a semantic model captures, for example that a name is associated with a specific historical period, social class or regional dialect. The semantic layer adds this depth without sacrificing the speed, privacy and determinism that define Namsor.
What stays the same
Namsor V3 retains the core principles that make Namsor trusted by researchers and institutions: deterministic results, sub-second latency, anonymizable data and deactivatable machine learning.
What V3 unlocks beyond V2
Namsor V3 is a separate platform with its own API, opening capabilities that V2 does not offer:
- Name embeddings: numerical vector representations of names for integration into your own machine learning models
- Custom models: purpose-built solutions for fake name detection, fraud detection, romance scam detection, name transliteration and more
- Model enhancement: use Namsor's name intelligence to improve your own predictive models (churn prediction, customer lifetime value, forecasting)
Available on request
Namsor V3 is accessible at namsor.ai. Contact Namsor to discuss access and migration from V2.
Privacy, ethics & compliance
Is Namsor GDPR, CCPA and EU AI Act compliant?
Yes. Namsor is fully compliant with the three major regulatory frameworks governing data protection and artificial intelligence.
GDPR (EU General Data Protection Regulation)
Namsor applies data minimization principles, collecting only what is essential for model operation. Users retain full control over their data: the learnable option can be deactivated to prevent data from contributing to model training, and anonymized mode irreversibly hashes name data with SHA before it is stored. A Data Processing Agreement (DPA) is available for download.
CCPA (California Consumer Privacy Act)
Namsor's privacy architecture meets CCPA requirements for transparency, data access and deletion rights. The same anonymization and opt-out mechanisms that ensure GDPR compliance also satisfy CCPA obligations.
EU AI Act
Namsor is designed for compliance with the EU AI Act's requirements on algorithmic transparency and fairness. The Explainability API provides a detailed breakdown of how each classification is produced, enabling full traceability of origin, gender and ethnicity estimations. This level of transparency allows organizations to audit Namsor's reasoning and demonstrate compliance in regulated use cases.
What is anonymized mode and how does Namsor encrypt name data?
Namsor gives users full control over how their data is stored and used, through two independent privacy settings available in the account page or via API.
Anonymized mode
When set to true, all processed names are irreversibly hashed with SHA before being stored. The original name cannot be recovered from the hash. Namsor retains only the hashed version, which it uses for deduplication (smart processing) so that you are not billed multiple times for the same name. Smart processing of redundant queries therefore still works on anonymized data.
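The following sketch illustrates why deduplication still works on hashed names: identical inputs always produce the same digest, so repeats can be detected without keeping the raw string. It uses SHA-256 and a simple whitespace, Unicode and case normalization; the SHA variant and normalization Namsor actually applies are not documented here, so treat both as assumptions.

```python
import hashlib
import unicodedata

def name_digest(name: str) -> str:
    """Hex SHA-256 of a normalized name (normalization rules are assumed)."""
    collapsed = " ".join(name.split())  # collapse runs of whitespace
    canonical = unicodedata.normalize("NFC", collapsed).casefold()
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class RepeatDetector:
    """Remembers digests only, never the original names."""

    def __init__(self) -> None:
        self._seen: set = set()

    def is_repeat(self, name: str) -> bool:
        digest = name_digest(name)
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False

detector = RepeatDetector()
print(detector.is_repeat("Zoë Ng"))   # False (first time seen)
print(detector.is_repeat("ZOË  NG"))  # True (same name after normalization)
```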
Learnable mode
When set to false, the data processed through your API key does not feed Namsor's machine learning algorithm. Your data is used for classification only and does not contribute to model improvement.
Storage encryption
All data logs, whether anonymized or not, are secured using AES encryption before being stored.
Both settings are independent
You can disable machine learning while keeping full data logs, or enable anonymization while allowing machine learning. The two controls can be combined to match your organization's privacy requirements.
Is name analysis with Namsor privacy-safe compared to using LLMs?
Yes. Namsor is significantly more privacy-safe than sending names to a general-purpose LLM and offers controls that most LLM providers don't.
The problem with LLMs for name analysis
When you send names to a general-purpose LLM, the data typically:
- Leaves your infrastructure and travels to a third-party provider
- May be retained for model training, depending on the provider's terms
- Is processed by a model that wasn't designed for name analysis and has no dedicated privacy controls
- Is often logged in prompt history, accessible to employees of the LLM provider
How Namsor is different
- Purpose-built for name analysis. Namsor only processes names, not broader personal data or context. The scope of data exposure is minimal.
- Opt-out of machine learning. Set learnable=false and your data never feeds Namsor's algorithm. Your names are used for classification only.
- Anonymized mode. Set anonymized=true and Namsor irreversibly hashes names with SHA before logging. No raw name data is stored.
- AES encryption. All data logs are encrypted with AES at rest.
- Data Processing Agreement. A standard DPA is available for download and covers your GDPR and CCPA obligations.
Bottom line
Sending names to a general LLM exposes more data, with fewer controls. Namsor limits exposure to names only and gives you explicit controls over storage, training and anonymization.
What is Namsor's API Explainability feature and how does it ensure transparency?
API Explainability is a Namsor feature that returns a detailed explanation of how the AI arrived at each classification, in the form of a closed mathematical formula including both training data features and the complete model logic.
What it returns
When enabled, the API response includes an additional field containing the AI's reasoning as executable Python code. This code shows exactly which features, weights and decision paths produced the result for that specific name.
Why it matters for compliance
The EU AI Act requires bias detection and correction mechanisms in high-risk AI systems. Namsor's Explainability output can be stored as audit evidence, documenting how each inference was made. This is particularly valuable for regulated industries (finance, insurance, recruitment, healthcare) where decisions based on demographic inference must be defensible.
How it's delivered
The explanation is returned as Python logic. Namsor recommends removing tabs and line breaks for clean execution.
Cost and activation
- Additional cost: 50 credits per name processed
- Contact the Namsor team to enable Explainability on your account
- Add the header X-OPTION-EXPLANABILITY: true to your requests
- Namsor requires signed documentation and an NDA before activation, to protect the intellectual property of the underlying model
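Once Explainability has been enabled on your account, a request differs by only one header, and the credit cost adds 50 per name to the base feature cost. A minimal sketch; the X-API-KEY header name is an assumption, and both helper functions are hypothetical.

```python
EXPLAINABILITY_EXTRA = 50  # additional credits per name processed

def request_headers(api_key: str, explain: bool = False) -> dict:
    """Headers for a JSON request, optionally asking for the explanation."""
    headers = {"X-API-KEY": api_key, "Accept": "application/json"}
    if explain:
        # Header name spelled exactly as given in the activation steps above.
        headers["X-OPTION-EXPLANABILITY"] = "true"
    return headers

def credits_with_explanation(base_credits_per_name: int, n_names: int) -> int:
    """Total credits when every name also returns an explanation."""
    return (base_credits_per_name + EXPLAINABILITY_EXTRA) * n_names

print(request_headers("YOUR_API_KEY", explain=True)["X-OPTION-EXPLANABILITY"])  # true
print(credits_with_explanation(10, 1_000))  # origin: (10 + 50) x 1,000 = 60000
```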
Who it's for
Teams building high-risk AI systems, conducting algorithmic audits, preparing AI Act compliance documentation, or requiring detailed traceability for internal governance.
Use cases
How can I detect gender and ethnicity bias in a dataset using name analysis?
Name analysis lets you measure the gender and ethnic composition of any dataset where names are available, even when self-reported demographic data is missing or incomplete. Namsor powers bias detection in CRMs, hiring pipelines, scientific publications, customer bases and editorial boards.
The typical workflow
- Export your dataset (CSV, Excel, Google Sheet or API-accessible database)
- Run Namsor on the name column using the relevant feature: Gender, Ethnicity, Origin, or US Race
- Aggregate the results by the dimension you care about (team, department, year, region)
- Compare distributions against your reference benchmark (national population, industry average, target representation)
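The aggregation and comparison steps reduce to a grouped average and a difference. The sketch below assumes each row already carries a probability that the name is female (a hypothetical field derived from the Gender output) and averages those probabilities rather than thresholding them, which yields an expected share per group. The 0.5 benchmark is illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def female_share_by_group(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average p_female per group: the expected share of women in each group."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for group, p_female in rows:
        totals[group] += p_female
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in counts}

def gap_vs_benchmark(shares: Dict[str, float], benchmark: float) -> Dict[str, float]:
    """Positive = above the reference share, negative = below it."""
    return {group: round(share - benchmark, 3) for group, share in shares.items()}

rows = [("Engineering", 0.9), ("Engineering", 0.1), ("Engineering", 0.2),
        ("Sales", 0.8), ("Sales", 0.6)]
print(gap_vs_benchmark(female_share_by_group(rows), benchmark=0.5))
# {'Engineering': -0.1, 'Sales': 0.2}
```

Averaging probabilities keeps the uncertainty of each inference in the aggregate, which fits the group-level use recommended below.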
What you can measure
- Gender representation: share of women vs men in hiring shortlists, promotions, authorship, customer base, editorial boards
- Ethnic representation: share of each cultural background in the same contexts
- Regional origin: geographical diversity of a population
- US race/ethnicity distribution: for US-specific reporting aligned with Census categories
Why name analysis is the right tool
Self-reported demographic data is often missing, outdated or inconsistent. Name analysis reconstructs the distribution retroactively on any historical dataset, without asking individuals to disclose sensitive information. Namsor's output is meant for aggregated, statistical analysis and should never be used to label individuals.
Privacy and scope
Namsor returns probabilities, not certainties. Use name-based inference at the group level (statistics, reporting, audits), not to make individual decisions about people. This is both an ethical best practice and an EU AI Act requirement for systems that rely on inferred attributes.
Can Namsor detect fake names and bots?
Yes. Namsor detects fake names with two levels of accuracy, depending on your requirements and volume.
Basic level: Name Type Recognition combined with Ethnicity
Namsor flags potentially fake names by combining its Name Type Recognition feature (anthroponym, brand, toponym, pseudonym classification) with Ethnicity analysis. This combined approach delivers solid accuracy for screening, risk scoring and exploratory detection work, and is accessible on request by contacting the Namsor team.
Expert level: Namsor V3 embeddings and custom models
For production-grade fake name detection, Namsor V3 provides name embeddings and custom models that capture the fine morphological, phonetic and cultural patterns that distinguish real names from generated or synthetic ones. Two options are available:
- Embeddings: plug Namsor V3 embeddings (several thousand dimensions per name) into your own fraud detection model to significantly improve its performance
- Custom models: have the Namsor team build a fake name detection model trained on your data, delivered as an API endpoint
Continuous improvement with feedback loops. Custom V3 models can be enhanced with a feedback loop: as your team labels detected names as true or false positives, the model retrains on this signal and improves over time. This adaptive approach keeps detection accuracy high even as fraud patterns evolve.
Proven accuracy
In a test on real data from one of the global leaders in money transfer, a Namsor V3 custom model reached over 94% accuracy in detecting fake names.
Why it works
Fake names, bots and synthetic profiles leave linguistic traces: improbable phoneme sequences, cross-cultural inconsistencies, low-frequency morphological patterns. Namsor V3 was trained on 13 billion names and captures these signals in its embeddings, outperforming generic anti-fraud machine learning that relies on behavioral or network-based features only.
Who it's for
Trust and safety teams, fraud prevention units, KYC and onboarding flows, marketplaces, social platforms, neobanks, money transfer and remittance companies.
Get started
To discuss fake name detection for your use case, contact the Namsor team. To learn more about Namsor V3 embeddings and custom models, visit namsor.ai.
How can name analysis improve KYC and fraud prevention?
Name analysis strengthens KYC and fraud prevention by enriching identity data, scoring risk and flagging suspicious patterns at multiple points in the customer journey.
Where name analysis fits in a KYC workflow
- Onboarding: verify that a submitted name matches expected patterns for the declared country, language and cultural background. Catch inconsistencies before an account is opened.
- Risk scoring: incorporate name-derived features (origin, ethnicity, cultural consistency) into your risk engine to improve the signal without requiring additional PII.
- Ongoing monitoring: re-analyze names periodically to detect identity manipulation or gradual drift in a customer profile.
- Sanctions and PEP screening support: normalize and transliterate names across alphabets before matching them against watchlists, reducing false negatives on non-Latin names.
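As an illustration of the onboarding check, the sketch below flags a name when the declared country is neither among the top predicted countries nor above a minimum probability. It assumes a hypothetical dictionary of country probabilities (for example derived from the Country of Residence feature); the thresholds are placeholders to tune, and a flag is one signal for review, not a decision on its own.

```python
from typing import Dict

def flag_country_mismatch(country_probs: Dict[str, float], declared: str,
                          top_k: int = 3, min_prob: float = 0.05) -> bool:
    """True when the declared country looks unlikely for this name."""
    # Countries ranked from most to least probable, keeping the top_k.
    top = sorted(country_probs, key=country_probs.get, reverse=True)[:top_k]
    return declared not in top and country_probs.get(declared, 0.0) < min_prob

probs = {"FR": 0.60, "BE": 0.20, "CH": 0.10, "JP": 0.01}  # made-up values
print(flag_country_mismatch(probs, "FR"))  # False
print(flag_country_mismatch(probs, "JP"))  # True
```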
Fraud pattern detection
Namsor helps fraud teams detect patterns associated with several types of financial crime, including account takeover attempts, impersonation, romance scams and authorized push payment (APP) fraud. In these cases, analyzing the names involved in a transaction, alongside other risk signals, reveals anomalies that purely behavioral or network-based fraud models miss.
To protect the integrity of these detection systems, Namsor does not publish the specific linguistic or statistical markers used in its fraud models. Customers receive these details under NDA during integration.
Real-world use
Several global leaders in money transfer and remittance use Namsor to strengthen their fraud prevention stack, benefiting from Namsor's coverage of 22 alphabets, 99.99% classification rate and V3 custom models trained on industry-specific data.
Why name analysis complements traditional fraud models
Behavioral models (login patterns, device fingerprints, transaction velocity) detect what someone is doing. Name analysis helps detect who someone is claiming to be. Combined, they reduce false positives and catch sophisticated identity-based fraud that behavioral signals alone miss.
Get started
To discuss KYC and fraud prevention for your specific stack, contact the Namsor team. For production-grade custom fraud detection models, visit namsor.ai.
How can name analysis power marketing segmentation and audience analytics?
Name analysis lets marketing teams segment audiences, personalize campaigns and analyze customer or influencer bases by cultural origin, language, gender and ethnicity, all from data that most organizations already have: names.
International marketing segmentation
Split your contact database, email list or CRM by cultural origin, language group or country of residence to run targeted campaigns. Personalize message tone, language, imagery and offers by segment. Allocate media budget based on where your real audience is, not where you assumed it would be.
Audience analytics on your existing base
Understand the real composition of your customer base, newsletter subscribers, app users or community members. Namsor reconstructs the demographic distribution retroactively, even when self-reported data is missing or incomplete. Typical questions you can answer:
- What share of my customers comes from each cultural background?
- How does gender distribution vary across my product lines?
- Which regions are over- or under-represented versus my target market?
- How has my audience composition evolved over the past 3 years?
Influencer and partnership mapping
For influencer marketing, brand partnerships or community programs, Namsor helps identify and group creators by cultural origin, language and gender. This enables:
- Building diverse influencer rosters that reflect your target markets
- Matching creators to campaigns by linguistic or cultural fit
- Measuring the demographic reach of an influencer's follower base (when follower names are accessible)
Integration with your stack
Namsor connects to CRMs and marketing platforms through the Google Sheets add-on, CSV/Excel tool, Zapier, Make, n8n, or the REST API. Analyses run in real time on form submissions or in batch on existing databases.
Privacy and compliance
Use name-based segmentation at the aggregate level for campaign strategy, not to make individual decisions about consumers. Namsor is GDPR and CCPA compliant and offers anonymized mode for privacy-sensitive workflows.
How is Namsor used for AI bias detection and EU AI Act compliance?
The EU AI Act requires providers and deployers of high-risk AI systems to detect, document and mitigate discriminatory bias. Namsor helps on both sides of this requirement: auditing existing AI systems for bias, and documenting AI decisions with a verifiable audit trail.
Auditing an existing AI system for bias
When an AI system (hiring tool, credit scoring model, insurance pricing engine, fraud detector) makes decisions about individuals, the AI Act requires evidence that outcomes are not systematically biased against protected groups. Namsor lets you test this by:
- Running names from your training data or production logs through Namsor to infer gender, origin or ethnicity at the aggregate level
- Segmenting your AI system's decisions (accept/reject, approve/deny, high/low score) by these inferred demographic groups
- Measuring disparities in outcomes across groups and comparing them against fairness thresholds (disparate impact ratio, statistical parity, equalized odds)
This approach fits directly into the bias detection and correction duties defined in AI Act Article 10 (data governance) and Article 15 (accuracy, robustness and cybersecurity).
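Once decisions are segmented by inferred group, the first two fairness measures named above are simple functions of selection rates. A minimal sketch with made-up decision data; equalized odds additionally requires ground-truth outcomes and is not shown. The 0.8 reference in the comments is the US "four-fifths rule", a common rule of thumb rather than an AI Act threshold.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def selection_rates(decisions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of positive decisions (accept/approve) per inferred group."""
    positives, totals = Counter(), Counter()
    for group, accepted in decisions:
        totals[group] += 1
        positives[group] += int(accepted)
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact_ratio(rates: Dict[str, float], group: str, reference: str) -> float:
    """Ratio of selection rates; values below about 0.8 often prompt review."""
    return rates[group] / rates[reference]

def statistical_parity_difference(rates: Dict[str, float], group: str, reference: str) -> float:
    """Difference of selection rates; 0 means parity."""
    return rates[group] - rates[reference]

decisions = ([("A", True)] * 40 + [("A", False)] * 60 +
             [("B", True)] * 60 + [("B", False)] * 40)
rates = selection_rates(decisions)
print(round(disparate_impact_ratio(rates, "A", "B"), 3))         # 0.667
print(round(statistical_parity_difference(rates, "A", "B"), 3))  # -0.2
```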
Documenting AI decisions with Explainability
When your own AI system uses Namsor for inference, the API Explainability feature returns the complete reasoning behind each classification as executable Python code, including training features and model weights. This output can be stored as verifiable audit evidence for every decision, satisfying the transparency requirements of AI Act Article 13 (transparency and provision of information to deployers).
The sensitive data exception
The EU AI Act introduces a specific exception in Article 10(5): providers may process special categories of personal data (ethnic origin, gender identity) specifically to detect and correct bias in high-risk AI systems, as a matter of substantial public interest. Namsor's name-based inference is designed to support this lawful use case while respecting data minimization principles.
Industries using Namsor for AI Act preparation
- Finance and insurance: bias audits on credit scoring, pricing and underwriting models
- Recruitment and HR: fairness testing on CV screening and candidate ranking algorithms
- Healthcare: equity analysis of clinical decision support tools
- Public sector: audits of algorithmic decision systems used by administrations
Critical usage principle
Use Namsor's output only at the group level for bias detection and correction. Do not use name-based inference to make individual decisions about people: this would defeat the purpose of the AI Act and create new discrimination risks.
Get started
For Explainability activation and AI Act documentation support, contact the Namsor team. A signed NDA is required before Explainability can be enabled.
How can Namsor improve data quality and enrich CRM databases?
Namsor helps data teams clean, validate and enrich customer databases at scale by turning raw names into structured, actionable attributes: split first/last name, detect invalid entries, infer gender, origin, country of residence, ethnicity and more.
Data cleaning and validation
- Detect invalid entries: the Name Type Recognition feature flags non-person entries in your name fields (brand names, placeholders like "TEST" or "Customer", toponyms, nonsense strings). Filter these out before they pollute downstream processes.
- Split full names: when first name and last name are merged in a single field, the Split Name feature separates them correctly, including for names that don't follow Western conventions.
- Normalize across alphabets: Namsor handles names in 22 writing systems, reducing data inconsistencies in international databases.
Data enrichment
Add high-value attributes to every contact in your CRM:
- Gender: populate a gender field when it's missing, for analytics or personalization
- Origin: country of cultural origin (131 countries supported)
- Country of residence: infer where a contact currently lives (247 countries supported)
- Ethnicity / Diaspora: cultural background for segmentation (139 ethnicities supported)
- US Race: for US-specific reporting aligned with 6 Census categories
Advanced deduplication with Namsor V3
Traditional CRM deduplication fails on name variants. Namsor V3 custom models use name embeddings to compute semantic similarity between name variants, detecting duplicates that exact-match algorithms miss:
- Abbreviations and reordering: "Jean Dupont", "J. Dupont", "Jean Du Pont" and "Dupont, Jean" recognized as the same person
- Accents and diacritics: "François", "Francois" matched together
- Typos and misspellings: "Catherine", "Catherien" and "Cathrine" matched despite data entry errors
- Transliteration variants: "Mohammed", "Mohamed" and "Muhammad" recognized as the same Arabic name; "Владимир", "Vladimir" and "Wladimir" identified as the same name across scripts and transliteration conventions
This is particularly valuable for:
- Legacy databases: reconcile historical contacts entered under inconsistent formatting conventions
- International CRMs: unify customer records across languages, scripts and regional naming conventions
- Post-merger data consolidation: merge customer bases from multiple sources without losing or duplicating records
Advanced deduplication is available through custom V3 models. To discuss your specific use case, contact the Namsor team or visit namsor.ai.
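Embedding-based matching differs from exact matching because it compares names by meaning rather than by spelling. The minimal Python sketch below flags candidate duplicates when the cosine similarity between two name vectors reaches a threshold. Its assumptions:

- Each name has already been converted to an embedding.
- The three-dimensional vectors are invented toy values. Real V3 embeddings have several thousand dimensions and come from Namsor.
- The 0.9 threshold is an arbitrary illustration and would need tuning on labeled pairs.

```python
import math
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def duplicate_candidates(embeddings: dict[str, list[float]], threshold: float = 0.9):
    """Return (name_a, name_b, score) for every pair at or above threshold."""
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine(vec_a, vec_b)
        if score >= threshold:
            pairs.append((name_a, name_b, round(score, 3)))
    return pairs


# Toy vectors for illustration only; these are not real Namsor embeddings.
toy = {
    "Mohammed": [0.90, 0.10, 0.05],
    "Muhammad": [0.88, 0.14, 0.06],
    "Catherine": [0.05, 0.92, 0.30],
}
print(duplicate_candidates(toy))
```

Comparing every pair is quadratic in the number of records. At CRM scale, the candidate pairs would typically come from a blocking step or an approximate nearest-neighbor index rather than a full scan.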
Privacy and usage principle
Enriched attributes are statistical inferences, not certified facts. Use them at the aggregate level for segmentation, analytics and reporting. Avoid using inferred attributes to make individual decisions about consumers. Namsor is GDPR and CCPA compliant and offers an anonymized mode for privacy-sensitive workflows.
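One way to apply this principle is to report expected shares per segment instead of labeling individuals. Averaging each contact's inferred probability over a segment gives the expected share of, say, female contacts in that segment. The Python sketch below is minimal and rests on these assumptions:

- The record fields `segment` and `p_female` are hypothetical names for data stored after enrichment.
- The `min_group_size` suppression rule is an assumed reporting policy, not a Namsor setting.

```python
from collections import defaultdict


def expected_share_by_segment(records, min_group_size=20):
    """Aggregate per-contact probabilities into per-segment expected shares.

    Each record is assumed to look like {"segment": str, "p_female": float}
    with p_female in [0, 1]. Segments smaller than min_group_size are
    reported as None so that small groups do not expose individuals.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        totals[r["segment"]] += r["p_female"]
        counts[r["segment"]] += 1
    return {
        seg: (totals[seg] / counts[seg] if counts[seg] >= min_group_size else None)
        for seg in counts
    }
```

Summing probabilities rather than counting hard labels keeps uncertain inferences from being rounded into false certainty.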
How do international organizations use name analysis for migration and diaspora mapping?
International organizations use name analysis to map diasporas, track migration flows and estimate the size and composition of populations when traditional census or registration data is missing, incomplete or outdated. Namsor has powered several published studies for UN agencies, the World Bank and city governments.
The core workflow
- Collect name data from professional sources: researcher databases (ORCID), labor market intelligence platforms (LinkedIn, job boards), public registries or administrative data
- Run Namsor to infer origin, ethnicity or diaspora membership at the aggregate level
- Apply refinement filters: exclude false positives from related cultural groups (e.g. distinguish Brazilian from Portuguese or Angolan names), add keyword filters on cities or institutions of origin
- Enrich with professional and educational attributes: job titles, degree level, industry, employer, field of study
- Aggregate by geography or professional segment: measure the share of each diaspora group by country, region, city, industry or institution
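The workflow above maps naturally onto a filter-then-aggregate pipeline. The Python sketch below covers steps 2, 3 and 5 and makes these assumptions:

- Each profile is a dict that already carries an origin inference (`origin`, `probability`) plus attributes collected in step 4.
- The field names, the 0.7 cutoff and the keyword lists are hypothetical choices that each study would calibrate. They are not values from the published projects.

```python
from collections import Counter


def diaspora_counts(profiles, target_origin, min_probability=0.7,
                    exclude_terms=(), require_any_terms=(),
                    group_by="residence_country"):
    """Count likely members of a diaspora, grouped by one attribute.

    profiles: iterable of dicts with hypothetical keys "origin",
    "probability", the group_by key and an optional free-text "bio".
    Only aggregate counts are returned, never individual profiles.
    """
    counts = Counter()
    for p in profiles:
        # Step 2: keep confident matches for the target origin only.
        if p["origin"] != target_origin or p["probability"] < min_probability:
            continue
        text = p.get("bio", "").lower()
        # Step 3a: drop likely false positives from related groups.
        if any(t.lower() in text for t in exclude_terms):
            continue
        # Step 3b: optionally require a city or institution of origin.
        if require_any_terms and not any(t.lower() in text for t in require_any_terms):
            continue
        # Step 5: aggregate by geography or professional segment.
        counts[p[group_by]] += 1
    return counts
```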
What you can measure
- Size of a diaspora abroad: how many people of origin X currently live in country Y (e.g. the IOM study identified 26,945 researchers of Armenian origin living outside Armenia)
- Professional and educational composition: degree levels, fields of study, industries, seniority and employer types within a diaspora
- Geographic concentration: where diaspora communities settle within a host country, down to metropolitan area or neighborhood level
- Skills and industry specialization: which sectors a diaspora is concentrated in (healthcare, engineering, research, tech), enabling targeted knowledge transfer programs
Published examples
- IOM (International Organization for Migration) mapped the Armenian diaspora in the United States and France by running Namsor's onomastic analysis on the ORCID researcher database and ZoomInfo professional profiles, identifying 26,945 scientists of Armenian origin living outside Armenia. Namsor has also powered IOM diaspora mapping projects for Georgia and Azerbaijan.
- United Nations (ECLAC) used Namsor for its study "Tracking the digital footprint in Latin America and the Caribbean", applying name-based inference to understand population flows in the region.
- Boston Planning & Development Agency mapped the Brazilian scientific diaspora in Greater Boston by combining Namsor's diaspora and origin models with labor market data, applying filters to distinguish Brazilian professionals from other Lusophone groups (Portuguese, Angolan, Cabo Verdean).
Why name analysis is effective for this use case
Migration and diaspora research often faces the problem of missing self-reported data: people don't always register their ethnicity or origin in administrative systems, census coverage varies, and second-generation migrants are often invisible in traditional statistics. Name analysis reconstructs the demographic picture retroactively from data that is already available (names in professional registries, authorships, public records), without requiring new data collection.
Privacy and aggregation principle
Diaspora mapping with Namsor is always done at the aggregate level (populations, neighborhoods, professional groups), never to identify or track specific individuals. This is both an ethical requirement and a GDPR/CCPA compliance best practice.
Can Namsor auto-detect language or salutation from a contact name?
Yes. By combining gender, origin and phone number inference, Namsor lets you auto-detect language, salutation, phone prefix and country from basic contact data, without asking the user to fill in additional fields.
Salutation and language from a name
- Gender inference estimates whether the contact is likely male or female, enabling the correct title (Mr / Mrs / Ms)
- Origin or country of residence inference identifies the contact's cultural and linguistic background
- Combine both to generate a localized salutation: "Herr" for a German male, "Madame" for a French female, "Señor" for a Spanish male, "Dear Ms" for an English-speaking female
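Combining the two inferences into a greeting can be a simple table lookup. The Python sketch below is minimal and makes these assumptions:

- The table covers only the examples above and is illustrative.
- The `gender` values and ISO country codes are assumed input formats; check the fields the API actually returns.
- Falling back to a neutral greeting when gender is uncertain is a design choice made here, not Namsor behavior.

```python
# Illustrative title table: (ISO country code, inferred gender) -> title.
TITLES = {
    ("DE", "male"): "Herr",
    ("FR", "female"): "Madame",
    ("ES", "male"): "Señor",
    ("GB", "female"): "Dear Ms",
    ("US", "female"): "Dear Ms",
}


def salutation(first_name, last_name, gender, gender_probability,
               country, min_probability=0.9):
    """Build a localized salutation, or a neutral one when unsure."""
    title = TITLES.get((country, gender))
    if title is None or gender_probability < min_probability:
        # Neutral fallback avoids mis-gendering on low-confidence inferences
        # and covers (country, gender) pairs missing from the table.
        return f"Dear {first_name} {last_name}"
    return f"{title} {last_name}"
```

Example: `salutation("Hans", "Müller", "male", 0.98, "DE")` gives "Herr Müller", while a 0.55 gender probability falls back to the neutral form.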
Language detection from a name
Namsor's origin feature returns the most likely country of cultural origin from a name. Map this country to its primary language and you have a practical language preference signal (less dependable for multilingual countries such as Switzerland, Belgium or India), without asking the contact to fill in an additional field.
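The country-to-language step is a plain lookup. The Python sketch below is minimal and makes these assumptions:

- The table is a small hypothetical excerpt mapping ISO 3166-1 alpha-2 country codes to ISO 639-1 language codes.
- Returning `None` for multilingual countries, so the caller can ask the user or apply a default, is an assumed policy.

```python
# Hypothetical excerpt: ISO 3166-1 alpha-2 country -> ISO 639-1 language.
PRIMARY_LANGUAGE = {
    "DE": "de", "FR": "fr", "ES": "es", "IT": "it", "JP": "ja",
    "BR": "pt", "PT": "pt", "RU": "ru", "CN": "zh",
}
# Countries where a single "primary" language would be a poor guess.
MULTILINGUAL = {"CH", "BE", "IN", "CA"}


def preferred_language(country_code, default="en"):
    """Return a language code, None if the country is multilingual,
    or `default` if the country is not in the table."""
    code = country_code.upper()
    if code in MULTILINGUAL:
        return None
    return PRIMARY_LANGUAGE.get(code, default)
```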
Phone prefix and country from a name and phone number
When a contact provides a phone number alongside their name, Namsor's Phone Number Format feature identifies the international phone prefix, validates the number structure and infers the country code. This is particularly useful when:
- Users enter a phone number without the international prefix
- You need to validate that a phone number is consistent with the contact's name origin (a mismatch can be a fraud signal)
- You want to auto-route calls or SMS to the correct regional team
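The consistency check and the routing case above both come down to comparing two country codes. The Python sketch below is minimal and makes these assumptions:

- The phone country has already been obtained, for example from Phone Number Format.
- The name-based country comes from the origin or country of residence feature.
- The field names, team names and probability cutoff are hypothetical.

Because legitimate migrants and travelers often have mismatches, the function only raises a review flag and never blocks a contact on its own.

```python
# Hypothetical mapping from phone country to a regional team.
REGIONAL_TEAMS = {"FR": "emea-fr", "DE": "emea-de", "US": "na", "BR": "latam"}


def route_contact(contact, min_probability=0.8, default_team="global"):
    """Pick a team from the phone country and flag name/phone mismatches.

    contact is assumed to hold "phone_country" (from phone analysis) and
    "name_country" / "name_probability" (from name-based inference).
    """
    phone_country = contact["phone_country"].upper()
    team = REGIONAL_TEAMS.get(phone_country, default_team)
    # Only compare countries when the name-based inference is confident.
    confident = contact["name_probability"] >= min_probability
    review = confident and contact["name_country"].upper() != phone_country
    return {"team": team, "review": review}
```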
Where to use this
- Contact forms: auto-populate salutation, language and phone prefix fields as soon as the name and number are entered
- Email personalization: generate properly gendered and localized greetings at scale
- CRM onboarding: enrich new contacts with language and country fields for routing to the right support or sales team
- Call center routing: use the inferred language and phone country to connect callers to agents who speak their language
- Direct mail and print: produce correctly addressed and titled correspondence across markets
Cost
Gender costs 1 credit. Origin costs 10 credits. Phone Number Format costs 11 credits. For a full contact enrichment (salutation + language + phone validation), running all three features on a single contact costs 22 credits.
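For budgeting a batch, the per-contact figure scales linearly: 1 + 10 + 11 = 22 credits per contact, so 10,000 contacts need 220,000 credits. The Python sketch below uses only the credit costs stated above. The dictionary keys are labels chosen for this example, and any plan-specific or volume pricing is not modeled.

```python
# Credit cost per feature, as stated in this FAQ.
CREDITS = {"gender": 1, "origin": 10, "phone_number_format": 11}


def enrichment_cost(n_contacts, features=("gender", "origin", "phone_number_format")):
    """Total credits to run the given features on n_contacts contacts."""
    per_contact = sum(CREDITS[f] for f in features)
    return per_contact * n_contacts
```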
Custom models
Does Namsor build custom name analysis models?
Yes. Namsor builds custom AI models trained on your data and your specific classification needs, delivered as a dedicated API endpoint on the Namsor V3 platform.
What a custom model is
A custom model extends Namsor's standard features beyond the built-in classifications (gender, origin, ethnicity). Instead of a generic taxonomy, the model is trained to answer a question specific to your business:
- "Is this name likely fraudulent?"
- "What caste group does this Indian name belong to?"
- "Is this name transliterated from Mandarin?"
- "Are these two name records the same person?" (cross-script and cross-format deduplication)
- "Does this name match an entry on a sanctions or PEP list?" (sanctions screening optimization with fuzzy matching across alphabets and transliteration variants)
- "Which prospects in my database look most like my top customers?" (lookalike and audience expansion using vector search with cosine similarity on name embeddings to identify prospects with similar cultural and demographic profiles)
Types of custom models
- Classification models: assign names to categories specific to your domain (caste, religion, tribe, linguistic group, customer segment)
- Detection models: identify patterns in names (fake names, bots, synthetic profiles)
- Matching models: compare name records to detect duplicates across formats, scripts and transliterations, or to optimize sanctions and PEP list screening with fuzzy matching
- Scoring models: assign a probability or risk score to each name based on your criteria
- Lookalike models: use vector search (cosine similarity) on name embeddings to find prospects with cultural and demographic profiles similar to your best customers
- Transliteration models: convert names between writing systems (e.g. Mandarin to Latin, Arabic to Latin)
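The lookalike idea can be sketched as ranking prospects by cosine similarity to the average embedding of your best customers. The Python sketch below is minimal and makes these assumptions:

- Name embeddings are already available as NumPy arrays.
- The small arrays in the example are toy data, not real V3 vectors.
- Averaging seeds into one centroid is only one simple option. Another is to take each prospect's maximum similarity to any seed.
- A production system would query a vector index instead of scanning every prospect.

```python
import numpy as np


def rank_lookalikes(seed_vectors: np.ndarray, prospect_vectors: np.ndarray, top_k: int = 10):
    """Rank prospects by cosine similarity to the centroid of seed customers.

    seed_vectors: (n_seeds, dim) embeddings of best customers.
    prospect_vectors: (n_prospects, dim) embeddings of prospects.
    Returns (indices, scores) of the top_k most similar prospects.
    """
    centroid = seed_vectors.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    normed = prospect_vectors / np.linalg.norm(prospect_vectors, axis=1, keepdims=True)
    scores = normed @ centroid            # cosine similarity per prospect
    order = np.argsort(-scores)[:top_k]   # highest similarity first
    return order, scores[order]
```

Example with toy data: seeds `[[1.0, 0.0], [0.9, 0.1]]` and prospects `[[0.0, 1.0], [1.0, 0.05], [0.5, 0.5]]` rank prospect 1 first, then prospect 2.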
How the process works
- Scoping: define the classification objective and the target taxonomy with the Namsor team
- Data exchange: share labeled training data under NDA (Namsor provides guidance on data format and volume)
- Model training: Namsor trains a custom model using V3 embeddings (several thousand dimensions per name) and your labeled data
- Validation: review precision, recall and edge cases on a holdout test set
- Delivery: the model is deployed as a dedicated API endpoint, ready for production integration
- Continuous improvement: optionally, set up a feedback loop where your team labels predictions as correct or incorrect, and the model retrains periodically on this signal
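For a binary detection model such as fake-name detection, the validation step's precision and recall read as follows:

- Precision is the share of flagged names that are really fake.
- Recall is the share of fake names that the model catches.

The Python sketch below implements the standard textbook definitions with made-up labels. It is not Namsor's own evaluation code.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up holdout labels: 1 = fake name, 0 = genuine name.
print(precision_recall([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 1, 1, 0, 0, 1]))
```

Reporting both numbers matters because a model can trade one for the other by moving its decision threshold.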
What you get
- A dedicated API endpoint on the Namsor V3 platform (namsor.ai)
- Trained on your data, tuned to your taxonomy
- Batch and real-time inference
- Optional feedback loop for continuous improvement
- Documentation and integration support
Contact the Namsor team to discuss your custom model requirements.
What industries has Namsor built custom AI models for?
Namsor has built custom AI models for organizations across several industries. While most engagements are covered by NDAs, the following examples illustrate the range of domains where Namsor V3 embeddings and custom models deliver results.
International organizations and development
The World Bank and the IOM (International Organization for Migration) commissioned custom models for Indian names: caste group classification, religion estimation and sub-region identification, enabling research on internal migration and social inequality.
Namsor has also built models with different levels of geographic and ethnic granularity depending on the client's needs, including ethnicity classification in Australia and regional segmentation models adapted to specific national contexts.
Financial services and money transfer
Custom V3 models detect fake names, fraudulent identities and authorized push payment (APP) fraud patterns in transaction flows. In tests on real data from one of the global leaders in money transfer, a custom model reached over 94% accuracy in detecting fake names.
Transportation and aviation
Custom models built on name embeddings power passenger flow forecasting at international airports, using the cultural and geographic profile of passenger names to improve demand predictions by route and season.
Global identity verification
Namsor has built bidirectional transliteration models for a global identity verification provider: Latin to Mandarin and Mandarin to Latin, Latin to Kanji and Kanji to Latin. These models power an intelligent name translation engine used in KYC, electronic identity verification and PEP/sanctions screening across writing systems.
Security and intelligence
Custom models support risk analysis and intelligence workflows where name-based demographic inference is a critical signal.
Marketing and social listening
Namsor is currently developing custom models for synthetic account detection on social platforms, helping brands and agencies identify fake profiles and assess the authenticity of online audiences.
Discuss your industry
These examples represent a fraction of what Namsor V3 custom models can do. To explore a model for your specific domain, contact the Namsor team or visit namsor.ai.
