Disappointingly short article

http://mobile.nytimes.com/2016/10/16/jobs/a-linguist-who-cracks-the-code-in-names-to-predict-ethnicity.html

I wanted to know SO MUCH MORE about this.

Replies

1
By EVie
October 17, 2016 9:32 PM

I wonder where they got the data on name demographics in the first place?

2
October 18, 2016 9:10 AM

I wonder if they trained the model on lists based on name books' assessments, perhaps? Another option would be paying people to take a survey giving their name and their self-identified racial identity. New York City did use to release first-name data with race associated with it, but only on a very limited basis.

3
October 18, 2016 3:19 PM

My first guess for the source of the data would be the Census Bureau.

4
By EVie
October 20, 2016 1:57 PM

Is that data publicly available? I mean, sometimes researchers can get access to government data sets that aren't otherwise available to the public, but usually there are hoops to jump through, and this wasn't an academic researcher; she's in private industry.

5
October 20, 2016 2:40 PM

They probably had to buy the data, and it was very likely in some aggregate, anonymized form. But gathering the information from any other source would mean wasting a lot of energy reinventing the wheel: the census data contains exactly the sorts of correlations between name, ethnicity, and residence that the article mentions.