Research Notes: Name Entropy

Additional research notes for the blog post "The Names You Choose Mean More Today Than Ever Before"

Data source: I used the Social Security Administration collection of names used five or more times in a given year. This is the most complete historical collection of American baby names available, but the five-use cutoff does skew the data by lopping off the long tail of rare names. I don't think that invalidates the trend I found, though -- if anything, it should understate the real trend.

Sample size: The larger the number of babies in the SSA sample, the more different names you'd expect to see. That longer long tail should produce greater entropy. But the entropy scores do NOT simply follow the sample sizes. Here's another look at the graph, with sample sizes indicated at 50-year increments:

As you can see, the number of babies counted (i.e., SSN applications for people born in America in that year) rose by a factor of 7 from 1909 to the 1959 "baby boom" year, but entropy remained stable. From 1959 to 2009, the number of babies held steady but entropy soared. That points to qualitative changes in naming culture. Another dramatic illustration of the decline in name conformity: In the 1959 count, 11,766 different names were used 5 or more times. By 2009, that number had nearly tripled, to 34,440.
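
For anyone who wants to poke at the sample-size question themselves, here is a minimal sketch of an entropy calculation. These notes don't spell out the formula, so the sketch assumes Shannon entropy in bits over each name's share of that year's babies. The function name `name_entropy` and the toy counts are made up for illustration; they are not SSA figures.

```python
import math

def name_entropy(counts):
    """Shannon entropy, in bits, of a mapping from name to number of babies."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total)
                for n in counts.values() if n > 0)

# Hypothetical toy counts, not SSA data.
toy_year = {"Mary": 40, "John": 40, "Ada": 10, "Zed": 6, "Io": 2, "Bix": 2}

# Same name shares with seven times as many babies.
bigger_year = {name: 7 * n for name, n in toy_year.items()}

# An SSA-style cutoff: keep only names used five or more times.
truncated = {name: n for name, n in toy_year.items() if n >= 5}

print("toy year:      ", round(name_entropy(toy_year), 3))
print("7x the babies: ", round(name_entropy(bigger_year), 3))
print("after cutoff:  ", round(name_entropy(truncated), 3))
```

In this toy case, multiplying every count by 7 leaves the entropy where it was, because the formula only sees each name's share. Sample size can only matter through the names it pushes over the cutoff. Dropping the two rare names lowers the score here, which is the direction of skew described under "Data source" above, though one toy example doesn't prove the general case.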