Is your name related to how much you get paid?

Question

Is your name related to how much you get paid? How much more is person AAA paid compared to ZZZ?

Hypothesis

  • Salary is NOT correlated with the alphabetical order of one’s last name. That is, Professor ABC is paid about the same as Professor XYZ.

  • Male and female professors are paid the same for the same job.

Key findings

  • The correlation between the alphabetical order of one’s last name and one’s salary is positive and significant. Compared with professors whose last names start with B, professors with other last-name initials earn roughly 1,200 to 3,000 dollars less per year.

  • Female professors are paid 7,700 dollars less than male professors, controlling for job title, employer, and the alphabetical order of last name.

  • Compared with Assistant Professors, Associate Professors and Professors earn a salary premium of about 12,000 and 36,000 dollars per year, respectively.

More details and my R code (click CODE to display) follow.

Data source

library(tidyverse)  # provides read_rds() (readr), the dplyr verbs, and ggplot2 used below
d<-read_rds("pssd_1996_2017.rds")
d$Calendar.Year<-as.factor(d$Calendar.Year)

The Public Sector Salary Disclosure Act, 1996 (the act) makes Ontario’s public sector more open and accountable to taxpayers. The act requires organizations that receive public funding from the Province of Ontario to make public, by March 31 each year, the names, positions, salaries and total taxable benefits of employees paid 100,000 or more in the previous calendar year. The act applies to organizations such as the Government of Ontario, Crown Agencies, Municipalities, Hospitals, Boards of Public Health, School Boards, Universities, Colleges, Ontario Power Generation, and other public sector employers who receive a significant level of funding from the provincial government. All employers subject to the act are required to submit their disclosure records to their funding ministries.

The data in this study were downloaded from the Ontario public sector salary disclosure website. They include the names, positions, salaries and total taxable benefits of public sector employees paid 100,000 or more from 1996 to 2017. The data for each year were downloaded and combined. The full dataset contains 1,152,185 observations. The ten employees with the highest salaries are

arrange(d,desc(Salary.Paid))[1:10,]
Sector | Last.Name | First.Name | Salary.Paid | Taxable.Benefits | Employer | Job.Title | Calendar.Year
Hydro One and Ontario Power Generation | HANKINSON | JAMES F | 2475800 | 9145.85 | Ontario Power Generation | President/Chief Executive Officer | 2008
Hydro One and Ontario Power Generation | OSBORNE | RONALD | 2264311 | 7956.00 | Ontario Power Generation | President & CEO | 2002
Crown Agencies | HAGGIS | PAUL | 2240008 | 9551.80 | OMERS Administration Corporation | President/Chief Executive Officer, OMERS | 2007
Hydro One and Ontario Power Generation | HANKINSON | JAMES F. | 2150116 | 2632.50 | Ontario Power Generation | President & Chief Executive Officer | 2009
Crown Agencies | NOBREGA | MICHAEL | 1926358 | 77501.71 | OMERS Administration Corporation | President/Chief Executive Officer, OMERS | 2008
Hydro One & Ontario Power Generation | PRESTON | EUGENE | 1892276 | 2936.80 | Ontario Power Generation | EVP, Chief Nuclear Officer | 2000
Hydro One & Ontario Power Generation | OSBORNE | RONALD W. | 1874103 | 6534.00 | Ontario Power Generation | President & CEO | 2001
Hydro One and Ontario Power Generation | MITCHELL | THOMAS | 1824000 | 6768.00 | Ontario Power Generation | President & Chief Executive Officer | 2011
Hydro One and Ontario Power Generation | HANKINSON | JAMES F | 1788719 | 7572.46 | Ontario Power Generation | President/Chief Executive Officer | 2007
Hydro One & Ontario Power Generation | OSBORNE | RONALD W. | 1767870 | 6138.00 | Ontario Power Generation | President & CEO | 2000

This study focuses on professors, so only employees with the job title “Assistant Professor”, “Associate Professor”, or “Professor” are included.

In some cases, the job titles are abbreviated. For example, Associate Professor could be entered as “Assoc. Prof” or “Assoc. Professor”. I map these variants to a single standardized title.

# save the original job.title as job.title.original
d<-rename(d,Job.Title.Original=Job.Title)
d$Job.Title<-NA

prof.set<-c("Prof.","Prof","PROF", "Professor","Full Professor","Full Prof",
                      "Full Prof.")
for (i in 1: length(prof.set)){
condition <-grepl(prof.set[i],d$Job.Title.Original, fixed=TRUE)
condition2<-grepl("Professional",d$Job.Title.Original, fixed=TRUE)
d$Job.Title[condition & !condition2]<-"Professor"
}

associate.prof.set<-c("Assoc. Prof","Associate Prof.","Assoc. Prof.", "Assoc Prof","AssocProf","AssocProfessor",
                      "Associate Professor","ASSOC. Prof.")
for (i in 1: length(associate.prof.set)){
condition<-grepl(associate.prof.set[i],d$Job.Title.Original, fixed=TRUE)
d$Job.Title[condition]<-"Associate Professor"
}

assistant.prof.set<-c("Assis. Prof","Assistant. Prof.", "Assistant. Prof", "Assistant. Professor",
                      "Assistant Professor","Assistant.Prof","ASST. Prof.")
                      
for (i in 1: length(assistant.prof.set)){
condition<-grepl(assistant.prof.set[i],d$Job.Title.Original, fixed=TRUE)
d$Job.Title[condition]<-"Assistant Professor"
}


# management position
d$ManageP<-"No Mngt"
dean.set<-c("DEAN", "Dean","Dean.","Associate Dean", "AssocDean","AssociateDean", "Associ Dean","Associ.Dean")
for (i in 1: length(dean.set)){
condition<-grepl(dean.set[i],d$Job.Title.Original, fixed=TRUE)
d$ManageP[condition]<-"Dean"
}

# associate.dean.set<-c("Associate Dean", "AssocDean","AssociateDean", "Associ Dean","Associ.Dean")
# for (i in 1: length(associate.dean.set)){
# condition<-grepl(associate.dean.set[i],d$Job.Title.Original, fixed=TRUE)
# d$ManageP[condition]<-"Associate Dean"
# }

#Ass't. Dean, Exec. Dean

depart.hd.set<-c("Department Head", "Dept.Head","Chair","chair")
for (i in 1: length(depart.hd.set)){
condition<-grepl(depart.hd.set[i],d$Job.Title.Original, fixed=TRUE)
d$ManageP[condition]<-"Department Head"
}
d$ManageP<-factor(d$ManageP,levels=c("No Mngt","Department Head","Dean"))

d <- within(d, ManageP <- relevel(ManageP, ref = "No Mngt"))
#Chair


# director.set<-c("Dir", "Dir.","Acting Dir","Director")
# for (i in 1: length(director.set)){
# condition<-grepl(director.set[i],d$Job.Title.Original, fixed=TRUE)
# d$ManageP[condition]<-"Director"
# }
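The title clean-up above depends on literal substring matching and on the order of assignment: the generic "Professor" label is set first, and the more specific ranks overwrite it afterwards. A minimal sketch of that logic on a handful of made-up titles (the `titles` vector is hypothetical and not taken from the dataset):

```r
# Hypothetical raw titles, for illustration only
titles <- c("Assoc. Professor", "Professor of Finance",
            "Professional Engineer", "Assistant Professor")

# fixed = TRUE treats the pattern as a literal string, so "." is not a wildcard
is_prof         <- grepl("Prof", titles, fixed = TRUE)
is_professional <- grepl("Professional", titles, fixed = TRUE)

clean <- rep(NA_character_, length(titles))
# Step 1: generic label, excluding titles that only match via "Professional"
clean[is_prof & !is_professional] <- "Professor"
# Step 2: more specific ranks overwrite the generic label
clean[grepl("Assoc. Prof", titles, fixed = TRUE)]         <- "Associate Professor"
clean[grepl("Assistant Professor", titles, fixed = TRUE)] <- "Assistant Professor"

data.frame(titles, clean)
```

Because the specific ranks are assigned last, reordering the steps would leave every associate and assistant professor labelled as "Professor".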

I also generated two grouping variables, First_Name and Last_Name, which indicate whether someone’s first-name or last-name initial falls in A-E, F-J, K-O, P-T, or T-Z. Because T is already assigned to P-T, the group labelled T-Z effectively covers U-Z. I also generated variables holding the initial itself (First.Name.start and Last.Name.start).

d<-d %>% 
  mutate(First.Name.start=substr(First.Name,1,1),
         Last.Name.start=substr(Last.Name,1,1)) %>% 
  filter(First.Name.start %in% c(LETTERS) &
           Last.Name.start %in% c(LETTERS))     %>% 
  mutate(First_Name=case_when(First.Name.start %in% LETTERS[1:5] ~ "A-E",
                           First.Name.start %in% LETTERS[6:10] ~ "F-J",
                           First.Name.start %in% LETTERS[11:15] ~ "K-O",
                           First.Name.start %in% LETTERS[16:20] ~ "P-T",
                           First.Name.start %in% LETTERS[21:26] ~ "T-Z"  # "T" is already in P-T; label kept to match the output tables
                           ))                   %>% 
  mutate(Last_Name=case_when( Last.Name.start %in% LETTERS[1:5] ~ "A-E",
                           Last.Name.start %in% LETTERS[6:10] ~ "F-J",
                           Last.Name.start %in% LETTERS[11:15] ~ "K-O",
                           Last.Name.start %in% LETTERS[16:20] ~ "P-T",
                           Last.Name.start %in% LETTERS[21:26] ~ "T-Z"  # "T" is already in P-T; label kept to match the output tables
  ))

d.sub<-subset(d, Job.Title %in% c("Assistant Professor", "Associate Professor", "Professor"))

d.sub$Job.Title<-factor(d.sub$Job.Title,levels=c("Assistant Professor", "Associate Professor", "Professor"))
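The same five-letter grouping can also be expressed with base R's `cut()` on the position of the initial in the alphabet. This is only an alternative sketch, not the code used for the results; the `initials` vector is made up, and the last label is kept as "T-Z" to match the tables even though it effectively covers U-Z.

```r
# Hypothetical initials, for illustration only
initials <- c("A", "E", "F", "T", "U", "Z")

# Position of each initial in the alphabet (A = 1, ..., Z = 26)
idx <- match(initials, LETTERS)

# Intervals are (0,5], (5,10], (10,15], (15,20], (20,26]
group <- cut(idx,
             breaks = c(0, 5, 10, 15, 20, 26),
             labels = c("A-E", "F-J", "K-O", "P-T", "T-Z"))

data.frame(initials, group)
```

Here "T" (position 20) lands in P-T and "U" (position 21) in the last group, which is the same assignment `case_when()` produces because it uses the first matching condition.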

There are 167,846 observations on professors. The ten professors with the highest salaries are

arrange(d.sub,desc(Salary.Paid))[1:10,]
Sector | Last.Name | First.Name | Salary.Paid | Taxable.Benefits | Employer | Job.Title.Original | Calendar.Year | Job.Title | ManageP | First.Name.start | Last.Name.start | First_Name | Last_Name
Universities | Ganjavi | Ozhand | 686760.6 | 205.23 | Laurentian University of Sudbury | Full Professor | 2017 | Professor | No Mngt | O | G | K-O | F-J
Universities | WOMACK | KENT | 560928.0 | 611.76 | University of Toronto | Professor of Finance | 2011 | Professor | No Mngt | K | W | K-O | T-Z
Universities | Kelton | John | 514665.4 | 10515.79 | McMaster University | Professor / Executive Director | 2016 | Professor | No Mngt | J | K | F-J | K-O
Universities | ELBESTAWI | MOHAMED ABDELAZIZ | 506246.8 | 9921.46 | McMaster University | Vice–President Research/Professor | 2012 | Professor | No Mngt | M | E | K-O | A-E
Universities | Whyte | Glen | 504907.0 | 520.20 | University of Toronto | Professor of Organizational Behaviour and Human Resource Management | 2016 | Professor | No Mngt | G | W | F-J | T-Z
Universities | Dacin | Tina | 501570.8 | 280.80 | Queen’s University | Professor - Queen’s School of Business, Director (Centre for Responsible Leadership) - Queen’s School of Business | 2014 | Professor | No Mngt | T | D | P-T | A-E
Universities | CHAKMA | AMIT | 500000.0 | 0.00 | University of Waterloo | Professor, Adjunct | 2010 | Professor | No Mngt | A | C | A-E | A-E
Universities | MONAHAN | PATRICK J. | 488233.6 | 489.84 | York University | Professor | 2012 | Professor | No Mngt | P | M | P-T | K-O
Universities | Strong | Michael | 487126.0 | 91.80 | University of Western Ontario | Dean/Professor/Medical Doctor | 2015 | Professor | Dean | M | S | K-O | P-T
Universities | STRONG | MICHAEL | 482125.9 | 95.40 | University of Western Ontario | Dean / Professor / Medical Doctor | 2012 | Professor | Dean | M | S | K-O | P-T

Summary statistics of salary by year, by job title, and by management position are

summaryBy(Salary.Paid ~ Calendar.Year, data=d.sub, FUN=summary)
Calendar.Year Salary.Paid.Min. Salary.Paid.1st Qu. Salary.Paid.Median Salary.Paid.Mean Salary.Paid.3rd Qu. Salary.Paid.Max.
1996 100004.0 103400.2 108060.0 114853.0 119310.0 278683.0
1997 100002.9 103377.9 107589.4 115009.3 119122.3 285122.5
1998 100000.0 103703.5 108154.4 115147.9 118731.7 289808.5
1999 100005.5 103392.4 108569.1 115026.3 117604.9 302500.1
2000 100004.5 103759.7 109675.9 116011.5 119434.4 297500.0
2001 100016.2 104568.6 110507.2 117281.3 120982.5 300000.0
2002 100000.0 104865.6 111215.3 118105.0 122299.6 327521.3
2003 100000.0 106617.7 113868.7 120828.9 125510.7 350387.5
2004 100000.0 107304.0 115616.1 122486.1 127889.3 351553.8
2005 100000.0 108184.6 117982.6 124518.1 131278.9 368919.9
2006 100000.0 109592.3 120773.5 127215.2 135558.8 375392.5
2007 100000.1 111032.7 123210.3 130100.0 139156.5 410596.0
2008 100000.0 111592.5 125186.4 132324.4 142546.0 420608.2
2009 100000.0 110115.4 125906.9 132968.6 145457.2 476372.2
2010 100000.1 110794.6 128339.7 135069.4 148791.0 500000.0
2011 100000.0 107471.5 127085.4 134658.6 150054.3 560928.0
2012 100000.0 107671.1 128569.7 136644.9 152917.7 506246.8
2013 100000.1 108721.2 131833.5 139312.9 157010.9 475261.8
2014 100000.0 109628.0 133419.3 140975.9 159784.3 501570.8
2015 100000.0 109412.6 134490.4 142251.0 162064.9 487126.0
2016 100000.0 109957.0 137410.3 145387.2 167100.1 514665.4
2017 100000.0 125793.6 150316.2 155850.1 175827.7 686760.6
summaryBy(Salary.Paid ~ Job.Title, data=d.sub, FUN=summary)
Job.Title Salary.Paid.Min. Salary.Paid.1st Qu. Salary.Paid.Median Salary.Paid.Mean Salary.Paid.3rd Qu. Salary.Paid.Max.
Assistant Professor 1e+05 105964.5 114313.4 122723.8 129375.4 393014.0
Associate Professor 1e+05 113379.4 126643.4 131759.4 143247.2 419175.2
Professor 1e+05 107440.6 128402.1 139274.6 158607.6 686760.6
summaryBy(Salary.Paid ~ ManageP, data=d.sub, FUN=summary)
ManageP Salary.Paid.Min. Salary.Paid.1st Qu. Salary.Paid.Median Salary.Paid.Mean Salary.Paid.3rd Qu. Salary.Paid.Max.
No Mngt 1e+05 108264.5 124839.5 134244.6 149610.7 686760.6
Department Head 1e+05 125260.1 150424.8 156287.9 177381.4 406024.1
Dean 1e+05 125029.2 152469.4 167674.4 189839.7 487126.0

Preliminary Results

This preliminary analysis covers individuals with the job title Assistant Professor, Associate Professor, or Professor. Here are the regression results. Last names A-E are the reference group, and 1996 is the reference year. Compared with this group, people whose last names come later in the alphabet are paid less. Column (2) adds the first-name initial group as an explanatory variable. Only the K-O first-name group is significant; the other first-name groups are not, and the last-name groups remain significant.

md <- lm(data = d.sub,
         Salary.Paid ~ Last_Name+
         Job.Title+ManageP+Job.Title*ManageP+
         Employer+
         Calendar.Year
         
)

md.add.first.name <- lm(data = d.sub,
         Salary.Paid ~ Last_Name+Job.Title+ManageP+Job.Title*ManageP+
         Employer+
         Calendar.Year+First_Name
          )
         
stargazer(md,md.add.first.name,
          omit = "Employer",
          omit.labels = "Employer",
          intercept.bottom = F,
          df = FALSE,
          type="html")
Dependent variable:
Salary.Paid
(1) (2)
Constant 56,702.190*** 56,755.370***
(2,333.084) (2,335.388)
Last_NameF-J -1,661.109*** -1,663.511***
(215.599) (215.592)
Last_NameK-O -950.504*** -945.375***
(195.375) (195.409)
Last_NameP-T -702.380*** -699.413***
(198.817) (198.830)
Last_NameT-Z -557.689** -571.039**
(267.116) (267.264)
Job.TitleAssociate Professor 11,081.620*** 11,066.830***
(298.069) (298.048)
Job.TitleProfessor 34,075.960*** 34,034.370***
(294.807) (294.860)
ManagePDepartment Head -957.254 -986.372
(2,657.740) (2,657.537)
ManagePDean 26,735.720*** 26,882.760***
(5,116.582) (5,116.116)
Calendar.Year1997 -6,662.287 -6,661.233
(31,807.320) (31,803.950)
Calendar.Year1998 2,383.513* 2,386.027*
(1,244.969) (1,244.843)
Calendar.Year1999 2,127.845* 2,125.834*
(1,199.258) (1,199.139)
Calendar.Year2000 4,776.123*** 4,779.836***
(1,157.955) (1,157.850)
Calendar.Year2001 6,123.593*** 6,133.420***
(1,135.925) (1,135.818)
Calendar.Year2002 7,756.351*** 7,769.062***
(1,114.539) (1,114.435)
Calendar.Year2003 12,743.150*** 12,767.950***
(1,083.542) (1,083.454)
Calendar.Year2004 14,576.460*** 14,614.840***
(1,074.428) (1,074.356)
Calendar.Year2005 19,984.730*** 20,023.280***
(1,067.038) (1,066.973)
Calendar.Year2006 23,801.080*** 23,847.380***
(1,058.940) (1,058.888)
Calendar.Year2007 28,712.250*** 28,760.540***
(1,048.716) (1,048.674)
Calendar.Year2008 32,778.370*** 32,829.740***
(1,043.535) (1,043.496)
Calendar.Year2009 37,860.590*** 37,914.160***
(1,036.094) (1,036.056)
Calendar.Year2010 42,093.740*** 42,144.390***
(1,033.396) (1,033.358)
Calendar.Year2011 45,053.200*** 45,110.300***
(1,028.902) (1,028.875)
Calendar.Year2012 48,017.110*** 48,074.400***
(1,027.625) (1,027.599)
Calendar.Year2013 50,489.100*** 50,549.070***
(1,029.075) (1,029.059)
Calendar.Year2014 53,968.840*** 54,031.730***
(1,027.884) (1,027.873)
Calendar.Year2015 55,771.870*** 55,837.130***
(1,024.562) (1,024.557)
Calendar.Year2016 59,394.420*** 59,459.340***
(1,025.174) (1,025.171)
Calendar.Year2017 62,827.670*** 62,895.430***
(1,030.223) (1,030.223)
First_NameF-J 135.494
(199.171)
First_NameK-O -969.399***
(199.843)
First_NameP-T -21.156
(193.273)
First_NameT-Z 500.165
(325.044)
Job.TitleAssociate Professor:ManagePDepartment Head 8,111.406*** 8,145.501***
(2,774.424) (2,774.183)
Job.TitleProfessor:ManagePDepartment Head 15,196.400*** 15,180.860***
(2,685.595) (2,685.402)
Job.TitleAssociate Professor:ManagePDean -5,565.219 -5,655.219
(5,195.484) (5,194.975)
Job.TitleProfessor:ManagePDean 6,042.226 5,919.922
(5,144.550) (5,144.065)
Employer Yes Yes
Observations 167,846 167,846
R2 0.397 0.397
Adjusted R2 0.396 0.397
Residual Std. Error 28,440.350 28,437.340
F Statistic 502.116*** 493.431***
Note: *p<0.1; **p<0.05; ***p<0.01
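The significance stars can be checked by hand: each t-statistic is the coefficient divided by the standard error printed in parentheses beneath it. A small sketch using two rows from column (1) of the table above, with a normal approximation for the p-values (an assumption that seems reasonable with 167,846 observations):

```r
# Coefficients and standard errors copied from column (1)
est <- c(F_J = -1661.109, T_Z = -557.689)
se  <- c(F_J = 215.599,   T_Z = 267.116)

# t-statistic = estimate / standard error
t_stat <- est / se

# Two-sided p-value under the normal approximation
p_val <- 2 * pnorm(-abs(t_stat))

round(data.frame(est, se, t_stat, p_val), 4)
```

The F-J group has a t-statistic far beyond the 1% critical value, while the T-Z group sits between the 5% and 1% thresholds, which agrees with the three and two stars in the table.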

Test to see violation of model assumptions

  • Error should follow normal distribution

The normal Q-Q plot does not look linear, so the errors do not follow a normal distribution.

par(mfrow = c(2,2))
plot(md)

  • No autocorrelation

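The D-W statistic (1.89) is close to 2, so any first-order autocorrelation in the residuals is weak. Note, however, that the test below still rejects the null of no positive autocorrelation (p-value < 2.2e-16), which is not surprising with a sample this large, so the assumption holds only approximately.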

dwtest(md)
## 
##  Durbin-Watson test
## 
## data:  md
## DW = 1.8856, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0
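To make the "close to 2" rule concrete, the Durbin-Watson statistic can be computed by hand from the residuals as the sum of squared successive differences divided by the sum of squares, which is roughly 2(1 - rho) for lag-one residual correlation rho. The sketch below uses simulated data (all names and parameter values are hypothetical) rather than the salary model:

```r
set.seed(1)
n <- 500
x <- rnorm(n)

# Simulated AR(1) errors with autocorrelation 0.3 (illustrative value)
e <- as.numeric(stats::filter(rnorm(n), 0.3, method = "recursive"))
y <- 1 + 2 * x + e

fit <- lm(y ~ x)
r   <- resid(fit)

# Durbin-Watson statistic computed directly from the residuals
dw <- sum(diff(r)^2) / sum(r^2)

# Lag-one correlation of the residuals and the implied approximation
rho_hat <- cor(r[-1], r[-n])
c(dw = dw, approx = 2 * (1 - rho_hat))
```

A statistic of 2 corresponds to no first-order autocorrelation, and values below 2 indicate positive autocorrelation. The statistic also depends on the row order of the data, which matters when observations are not sorted in time.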

Why so?

Here is some related research on the last name premium:

Van Praag, C. Mirjam, and Bernard van Praag. “The benefits of being economics professor A (rather than Z).” Economica 75, no. 300 (2008): 782-796.

“Alphabetical name ordering on multi-authored academic papers, which is the convention in economics and various other disciplines, is to the advantage of people whose last name initials are placed early in the alphabet. Professor A, who has been a first author more often than Professor Z, will have published more articles and experienced a faster productivity rate over the course of her career as a result of reputation and visibility.”

Efthyvoulou, G. (2008). Alphabet economics: The link between names and reputation. The Journal of Socio-Economics, 37(3), 1266-1285.

“Overall, we find that faculty members with earlier last name initials are more likely to get employment at high standard research departments. Furthermore, we show that the relationship between alphabetical placement and academic success remains significant if we use as an alternative measure of reputation the number of people showing an interest in the papers of a particular academic.”

Adjust salary by CPI

The previous results do not adjust salary by CPI. I will do it here to see if that affects the main findings. All salaries are adjusted to the 2017 level.

cpi<-read.csv("CanadaCPI.csv")
cpi$year<-as.numeric(substr(cpi$Quarter,1,4))
d.cpi<-data.frame(year=1993:2017,CPI=c(by(cpi$CPI.Inflation,cpi$year,mean)))
d.cpi$pct<-(d.cpi$CPI/100+1)

d.cpi$to2017<-1
for (i in 1993:2016) {
d.cpi$to2017[d.cpi$year==i] <- prod(d.cpi$pct[d.cpi$year %in% i:2017])
}

d.merge<-merge(x = d,y = d.cpi,by.x ="Calendar.Year", by.y = "year",all.x = T )

d<-mutate(d.merge,
          Salary.Paid=Salary.Paid*to2017,
          Taxable.Benefits=Taxable.Benefits*to2017
          )
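The adjustment factor is a cumulative product of annual inflation rates. A toy version with made-up rates (these are NOT actual Canadian CPI figures) shows the arithmetic. It follows the same convention as the loop above, where the factor for year i multiplies the rates for years i through 2017 and the 2017 factor is fixed at 1; whether the base year's own rate belongs in the product depends on how the rate series is defined.

```r
# Hypothetical annual inflation rates in percent, for illustration only
toy <- data.frame(year = 2014:2017,
                  CPI  = c(2.0, 1.0, 1.5, 1.6))
toy$pct <- 1 + toy$CPI / 100

# Factor for year i = product of pct over years i..2017 (same convention as above)
toy$to2017 <- rev(cumprod(rev(toy$pct)))
toy$to2017[toy$year == 2017] <- 1   # 2017 dollars need no adjustment

# A 100,000 salary paid in 2014, expressed in 2017 dollars
salary_2014_in_2017 <- 100000 * toy$to2017[toy$year == 2014]

toy
```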

Here are the results with inflation-adjusted salaries.

d.sub<-subset(d, Job.Title %in% c("Assistant Professor", "Associate Professor", "Professor"))
md <- lm(data = d.sub,
         Salary.Paid ~ Last_Name+Job.Title+ManageP+Job.Title*ManageP+
         Employer+
         Calendar.Year
)

md.add.first.name <- lm(data = d.sub,
         Salary.Paid ~ Last_Name+Job.Title+ManageP+Job.Title*ManageP+
         Employer+
         Calendar.Year+First_Name)
         
stargazer(md,md.add.first.name,
          omit = "Employer",
          omit.labels = "Employer",
          intercept.bottom = F,
          df = FALSE,
          type="html")
Dependent variable:
Salary.Paid
(1) (2)
Constant 109,123.600*** 109,202.900***
(2,576.347) (2,578.877)
Last_NameF-J -1,818.655*** -1,821.368***
(238.079) (238.069)
Last_NameK-O -1,078.631*** -1,071.973***
(215.747) (215.782)
Last_NameP-T -747.700*** -743.906***
(219.548) (219.560)
Last_NameT-Z -630.293** -643.516**
(294.968) (295.129)
Job.TitleAssociate Professor 11,511.160*** 11,494.300***
(329.148) (329.123)
Job.TitleProfessor 36,235.280*** 36,188.210***
(325.545) (325.602)
ManagePDepartment Head -599.610 -636.544
(2,934.854) (2,934.613)
ManagePDean 28,116.310*** 28,279.210***
(5,650.073) (5,649.527)
Calendar.Year1997 -12,775.710 -12,774.490
(35,123.780) (35,119.860)
Calendar.Year1998 -2,528.662* -2,525.750*
(1,374.778) (1,374.631)
Calendar.Year1999 -4,405.067*** -4,407.385***
(1,324.301) (1,324.162)
Calendar.Year2000 -4,012.371*** -4,008.112***
(1,278.691) (1,278.569)
Calendar.Year2001 -6,479.527*** -6,468.584***
(1,254.364) (1,254.239)
Calendar.Year2002 -8,440.893*** -8,426.563***
(1,230.748) (1,230.627)
Calendar.Year2003 -5,888.918*** -5,860.621***
(1,196.520) (1,196.415)
Calendar.Year2004 -7,776.531*** -7,732.925***
(1,186.456) (1,186.369)
Calendar.Year2005 -4,404.242*** -4,360.476***
(1,178.295) (1,178.216)
Calendar.Year2006 -3,253.942*** -3,201.407***
(1,169.352) (1,169.288)
Calendar.Year2007 -712.656 -657.665
(1,158.062) (1,158.009)
Calendar.Year2008 620.557 678.973
(1,152.341) (1,152.292)
Calendar.Year2009 2,509.882** 2,570.917**
(1,144.125) (1,144.077)
Calendar.Year2010 6,734.345*** 6,792.048***
(1,141.145) (1,141.097)
Calendar.Year2011 7,188.268*** 7,253.309***
(1,136.183) (1,136.146)
Calendar.Year2012 6,139.287*** 6,204.602***
(1,134.772) (1,134.737)
Calendar.Year2013 6,481.421*** 6,549.871***
(1,136.373) (1,136.349)
Calendar.Year2014 8,881.642*** 8,953.395***
(1,135.059) (1,135.040)
Calendar.Year2015 7,970.827*** 8,045.267***
(1,131.390) (1,131.378)
Calendar.Year2016 10,034.610*** 10,108.670***
(1,132.066) (1,132.056)
Calendar.Year2017 8,503.031*** 8,580.343***
(1,137.641) (1,137.634)
First_NameF-J 152.448
(219.937)
First_NameK-O -1,116.692***
(220.678)
First_NameP-T -57.276
(213.424)
First_NameT-Z 489.830
(358.933)
Job.TitleAssociate Professor:ManagePDepartment Head 8,222.506*** 8,264.578***
(3,063.705) (3,063.421)
Job.TitleProfessor:ManagePDepartment Head 16,646.270*** 16,633.480***
(2,965.615) (2,965.384)
Job.TitleAssociate Professor:ManagePDean -5,398.025 -5,496.541
(5,737.202) (5,736.607)
Job.TitleProfessor:ManagePDean 9,594.258* 9,457.571*
(5,680.957) (5,680.389)
Employer Yes Yes
Observations 167,846 167,846
R2 0.349 0.349
Adjusted R2 0.348 0.348
Residual Std. Error 31,405.750 31,402.240
F Statistic 408.659*** 401.636***
Note: *p<0.1; **p<0.05; ***p<0.01

Male Names vs. Female Names

It would be interesting to see how gender is related to salary. The Ontario public sector salary data do not record gender, but they do include names. So I matched first names against the top 100 male names and top 100 female names in Canada according to this website. For names in neither top-100 list, the gender variable is set to “Gender Unknown”. I hope there are not many males using female names or vice versa.

top100names<-read.csv("top100names.csv",colClasses = c("integer","character","character"))
top100names<-top100names %>% 
mutate(Boy = tolower(Boy)) %>% 
  mutate(Girl = tolower(Girl)) 
set.top.male.name<-c(top100names$Boy)  
set.top.female.name<-c(top100names$Girl)  




d <-d %>% 
  mutate(First.Name = tolower(First.Name)) %>% 
  mutate(Last.Name = tolower(Last.Name)) %>% 
  mutate(top_100_male_name = case_when(First.Name %in%  set.top.male.name~ "YES",
                                       TRUE ~ "NO")) %>% 
  mutate(top_100_female_name = case_when(First.Name %in% set.top.female.name ~ "YES",
                                       TRUE ~ "NO")) %>% 
  mutate(Gender = case_when(top_100_male_name == "YES" ~ "Male",
                          top_100_female_name == "YES" ~ "Female",
                                       TRUE ~ "Gender Unknown"))
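The matching rule above can be illustrated on a few names in base R. The two name lists below are hypothetical stand-ins for the top-100 lists. The sketch also strips anything after the first space before matching, since entries such as "JAMES F" carry a middle initial; that step is a possible refinement and is not part of the code above.

```r
# Hypothetical stand-ins for the top-100 lists
male_names   <- c("james", "michael")
female_names <- c("mary", "tina")

first <- tolower(c("JAMES F", "Tina", "Ozhand", "Michael"))

# Optional refinement (not in the original code): keep only the first token
first_token <- sub(" .*$", "", first)

# Male list is checked first, then the female list, as in the case_when() above
gender <- ifelse(first_token %in% male_names, "Male",
          ifelse(first_token %in% female_names, "Female", "Gender Unknown"))

data.frame(first, first_token, gender)
```

Without the first-token step, "james f" would not match "james" and would fall into "Gender Unknown".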

Plots

Let’s plot the data.

d.sub<-subset(d, Job.Title %in% c("Assistant Professor", "Associate Professor", "Professor"))
ggplot(d.sub,aes(Job.Title,Salary.Paid))+
  geom_boxplot()

ggplot(d.sub,aes(ManageP,Salary.Paid))+
  geom_boxplot()

ggplot(d.sub,aes(Gender,Salary.Paid))+
  geom_boxplot()

d.sub$Calendar.Year<-as.numeric(as.character(d.sub$Calendar.Year))  # as.numeric() on a factor returns level codes, not years
ggplot(d.sub,aes(Calendar.Year,Salary.Paid,color=Gender))+
  geom_point(alpha=0.05)+
  stat_smooth(size=1)+
  facet_grid(~Job.Title)+
  theme(legend.position="bottom")

Results

Here are the regression results. Column (3) adds gender, with female as the reference group. The results suggest that males are paid 7,315 more than females after controlling for the alphabetical order of last names, job title, employer, and year. Of course, these results are preliminary, and the model explains only 41% of the variation in salary. Are women doing the same job as men but getting paid less? If so, then from an employer’s point of view, it would pay to hire more women. That would raise demand for female professors, and as employers competed for them by offering higher wages, female salaries would be pushed up. So what is causing this 7,315 salary gap per year between males and females?

Salary.Paid
(1) (2) (3)
Constant 97,527.900*** 97,604.440*** 90,340.790***
(2,357.120) (2,359.758) (2,425.523)
Last_NameF-J -1,792.617*** -1,795.633*** -1,799.626***
(238.598) (238.588) (238.461)
Last_NameK-O -1,075.633*** -1,069.225*** -1,059.909***
(216.222) (216.257) (216.141)
Last_NameP-T -747.906*** -744.385*** -759.773***
(220.031) (220.043) (219.923)
Last_NameT-Z -640.752** -654.715** -698.524**
(295.614) (295.775) (295.650)
Job.TitleAssociate Professor 11,461.390*** 11,444.760*** 11,453.430***
(329.765) (329.739) (329.555)
Job.TitleProfessor 35,814.980*** 35,767.290*** 35,698.320***
(325.699) (325.757) (325.627)
ManagePDepartment Head -659.121 -696.382 -1,151.923
(2,941.267) (2,941.020) (2,939.810)
ManagePDean 27,838.270*** 28,004.460*** 27,999.590***
(5,662.441) (5,661.883) (5,658.744)
First_NameF-J 165.732 -49.062
(220.412) (220.837)
First_NameK-O -1,119.370*** -1,186.550***
(221.160) (221.557)
First_NameP-T -57.103 -61.648
(213.894) (216.031)
First_NameT-Z 515.337 350.053
(359.720) (359.727)
GenderGender Unknown 7,002.208***
(573.632)
GenderMale 8,438.986***
(612.652)
Job.TitleAssociate Professor:ManagePDepartment Head 8,282.233*** 8,325.268*** 8,690.548***
(3,070.359) (3,070.069) (3,068.711)
Job.TitleProfessor:ManagePDepartment Head 16,924.680*** 16,910.910*** 17,343.130***
(2,972.057) (2,971.820) (2,970.569)
Job.TitleAssociate Professor:ManagePDean -5,331.880 -5,432.455 -5,417.916
(5,749.763) (5,749.156) (5,745.957)
Job.TitleProfessor:ManagePDean 10,154.000* 10,012.730* 10,069.650*
(5,693.195) (5,692.616) (5,689.444)
Year Dummies? Yes Yes Yes
Employer Dummies? Yes Yes Yes
Observations 167,846 167,846 167,846
R2 0.346 0.346 0.347
Adjusted R2 0.345 0.346 0.346
Residual Std. Error 31,475.220 31,471.650 31,453.960
F Statistic 443.746*** 435.351*** 432.534***
Notes: ***Significant at the 1 percent level.
**Significant at the 5 percent level.
*Significant at the 10 percent level.

A dummy for each last name initial

One of my favorite economists, Professor Marc F. Bellemare, suggests

one dummy for each letter and see if it’s generally monotonic.

Great suggestion! Let’s generate one dummy for each last name initial and repeat our main model. The results are reported in column (4).

Salary.Paid
(1) (2) (3) (4)
Constant 97,527.900*** 97,604.440*** 90,340.790*** 91,080.620***
(2,357.120) (2,359.758) (2,425.523) (2,455.053)
Last_NameF-J -1,792.617*** -1,795.633*** -1,799.626***
(238.598) (238.588) (238.461)
Last_NameK-O -1,075.633*** -1,069.225*** -1,059.909***
(216.222) (216.257) (216.141)
Last_NameP-T -747.906*** -744.385*** -759.773***
(220.031) (220.043) (219.923)
Last_NameT-Z -640.752** -654.715** -698.524**
(295.614) (295.775) (295.650)
Last.Name.startB -1,057.384**
(476.446)
Last.Name.startC -1,327.810***
(493.545)
Last.Name.startD -1,693.311***
(522.658)
Last.Name.startE 310.866
(697.114)
Last.Name.startF -2,724.239***
(574.184)
Last.Name.startG -3,773.459***
(530.269)
Last.Name.startH -1,510.284***
(509.304)
Last.Name.startI -3,559.857***
(972.096)
Last.Name.startJ -3,992.708***
(644.691)
Last.Name.startK -2,543.395***
(527.593)
Last.Name.startL -2,851.209***
(509.113)
Last.Name.startM -1,762.381***
(465.675)
Last.Name.startN -2,891.073***
(664.872)
Last.Name.startO 1,844.441**
(761.177)
Last.Name.startP -2,712.749***
(529.501)
Last.Name.startQ -5,262.311***
(1,673.228)
Last.Name.startR -809.547
(535.678)
Last.Name.startS -1,734.296***
(468.301)
Last.Name.startT -1,604.479***
(574.444)
Last.Name.startU -6,218.417***
(1,822.707)
Last.Name.startV -3,277.964***
(721.094)
Last.Name.startW -1,091.622**
(523.956)
Last.Name.startX -5,616.109***
(1,906.005)
Last.Name.startY -1,884.500**
(857.160)
Last.Name.startZ -612.824
(859.520)
Job.TitleAssociate Professor 11,461.390*** 11,444.760*** 11,453.430*** 11,463.900***
(329.765) (329.739) (329.555) (329.481)
Job.TitleProfessor 35,814.980*** 35,767.290*** 35,698.320*** 35,683.460***
(325.699) (325.757) (325.627) (325.586)
ManagePDepartment Head -659.121 -696.382 -1,151.923 -1,015.876
(2,941.267) (2,941.020) (2,939.810) (2,939.370)
ManagePDean 27,838.270*** 28,004.460*** 27,999.590*** 28,290.420***
(5,662.441) (5,661.883) (5,658.744) (5,658.021)
First_NameF-J 165.732 -49.062 -28.559
(220.412) (220.837) (220.884)
First_NameK-O -1,119.370*** -1,186.550*** -1,160.651***
(221.160) (221.557) (221.670)
First_NameP-T -57.103 -61.648 -67.210
(213.894) (216.031) (216.165)
First_NameT-Z 515.337 350.053 400.243
(359.720) (359.727) (360.494)
GenderGender Unknown 7,002.208*** 7,029.347***
(573.632) (573.666)
GenderMale 8,438.986*** 8,485.181***
(612.652) (612.779)
Job.TitleAssociate Professor:ManagePDepartment Head 8,282.233*** 8,325.268*** 8,690.548*** 8,603.174***
(3,070.359) (3,070.069) (3,068.711) (3,068.335)
Job.TitleProfessor:ManagePDepartment Head 16,924.680*** 16,910.910*** 17,343.130*** 17,213.910***
(2,972.057) (2,971.820) (2,970.569) (2,970.145)
Job.TitleAssociate Professor:ManagePDean -5,331.880 -5,432.455 -5,417.916 -5,798.780
(5,749.763) (5,749.156) (5,745.957) (5,745.116)
Job.TitleProfessor:ManagePDean 10,154.000* 10,012.730* 10,069.650* 9,738.049*
(5,693.195) (5,692.616) (5,689.444) (5,688.889)
Year Dummies? Yes Yes Yes No
Employer Dummies? Yes Yes Yes Yes
Observations 167,846 167,846 167,846 167,846
R2 0.346 0.346 0.347 0.348
Adjusted R2 0.345 0.346 0.346 0.347
Residual Std. Error 31,475.220 31,471.650 31,453.960 31,442.970
F Statistic 443.746*** 435.351*** 432.534*** 393.403***
Notes: ***Significant at the 1 percent level.
**Significant at the 5 percent level.
*Significant at the 10 percent level.
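Getting one dummy per initial does not require creating the columns by hand: when a factor (or character) variable enters an `lm()` formula, R expands it into one indicator column per level and drops the first level as the reference. A minimal sketch with a made-up three-letter factor shows the expansion via `model.matrix()`:

```r
# Hypothetical initials, for illustration only
initial <- factor(c("A", "B", "C", "B", "A"), levels = c("A", "B", "C"))

# Design matrix that lm() would build for a formula containing `initial`
X <- model.matrix(~ initial)

colnames(X)
X
```

Level "A" has no column of its own, so every reported coefficient is a difference relative to initial A, which is how the Last.Name.start rows in column (4) should be read.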

The estimated coefficients on last name initials compared to last name A are:

n.start=2
d.tmp<-data.frame(Last.Name.Initial=LETTERS[2:26],
                  Coefficient=coef(md.all.letters)[n.start:26],
                  p.value=summary(md.all.letters)$coefficients[n.start:26,4])
d.tmp$significant10<-"Not Significant"
d.tmp$significant10[d.tmp$p.value<=0.1]<-"Significant at 10%"

ggplot(data = d.tmp,aes(Last.Name.Initial,Coefficient))+
  geom_col(aes(fill =significant10))+
  ylab("Salary Drop Compared to the Reference Last Name Initial ($)")+
  xlab("Last Name Initial")
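Picking coefficients by position (rows 2 to 26) only works while the initial dummies come first and all 26 letters are present; if the model is fit on a subset of initials, those positions no longer line up with `LETTERS[2:26]`. A hedged alternative is to select the rows by name. The sketch below uses simulated data with hypothetical variable names (`init`, `x`, `y`):

```r
set.seed(42)
n <- 300
toy <- data.frame(init = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
                  x    = rnorm(n))
# Simulated outcome: B is 2 lower and C is 5 lower than A (illustrative values)
toy$y <- 100 - 2 * (toy$init == "B") - 5 * (toy$init == "C") + toy$x + rnorm(n)

fit <- lm(y ~ x + init, data = toy)
ct  <- summary(fit)$coefficients

# Select the rows whose names start with the factor's name
rows <- grep("^init", rownames(ct))
out  <- data.frame(initial     = sub("^init", "", rownames(ct)[rows]),
                   coefficient = ct[rows, "Estimate"],
                   p.value     = ct[rows, "Pr(>|t|)"],
                   stringsAsFactors = FALSE)
out
```

Here the dummies are not in positions 2 and 3 because `x` comes first in the formula, yet the name-based selection still returns the right rows with the right labels.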

Rare last names ranked lower in alphabetical order may coincide with race

Let’s look at the distribution of the last name initials.

plot(table(d$Last.Name.start),xlab="Last Name Initial", ylab="Number of Observations")

Sort the last name initials by the number of observations.

sort(table(d$Last.Name.start), decreasing = T)
M S B C H L D P G W R K F T A J N V E O Y Z I Q U X
128684 106039 101748 89644 67218 65279 63820 59601 57196 55843 53884 49970 41377 41136 38860 24291 21905 19934 17830 16360 8945 8832 7397 2888 2309 702

Some last-name initials (e.g. “U” and “X”) are rare and may coincide with a professor’s race and cultural background. This could bias the model, because we cannot control for race (race data are not available in our dataset).

Instead, we can restrict the sample to the more common last-name initials, which are less likely to be tied to a minority group, and see whether our findings remain valid.

Last name with more observations

Select the most populated last name initials. They are:

#set<-names(sort(table(d$Last.Name.start), decreasing = T))[1:14]# B-W

set<-names(sort(table(d$Last.Name.start), decreasing = T))[1:14]

set.n.1<-sort(set,decreasing = F)[2:length(set)]
set
##  [1] "M" "S" "B" "C" "H" "L" "D" "P" "G" "W" "R" "K" "F" "T"
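The selection step is a frequency count, a descending sort, and a slice of the first k names. A toy version with a made-up vector of initials and k = 2 (both hypothetical) shows each piece:

```r
# Hypothetical initials, for illustration only
init <- c("M", "S", "M", "B", "S", "M", "X", "B", "S", "M")

# Count each initial and sort from most to least frequent
counts <- sort(table(init), decreasing = TRUE)

# Keep the k most frequent initials
k    <- 2
top  <- names(counts)[1:k]

# Restrict the data to those initials
kept <- init[init %in% top]

counts
top
```

Note that the first initial of the retained set, in alphabetical order, becomes the reference level of the factor in any model fitted to the restricted sample.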

Here are the results. It appears that our findings remain valid. Your salary is related to the alphabetical order of your last name.

d.sub2<-subset(d.sub,Last.Name.start %in% set)

d.sub<-d.sub2
md <- lm(data = d.sub,
         Salary.Paid ~ Last_Name+Job.Title+ManageP+Job.Title*ManageP+
         Employer+
         Calendar.Year)

md.first.name <- lm(data = d.sub,
         Salary.Paid ~Last_Name+Job.Title+ManageP+Job.Title*ManageP+
         Employer+
         Calendar.Year+First_Name)

md.gender<- lm(data = d.sub,
         Salary.Paid ~ Last_Name+Job.Title+ManageP+Job.Title*ManageP+
         Employer+
         Calendar.Year+First_Name+Gender)

md.all.letters<- lm(data = d.sub,
         Salary.Paid ~Last.Name.start+Job.Title+ManageP+Job.Title*ManageP+
         Employer+
         Calendar.Year+First_Name+Gender)

md.all.letters.100k<- lm(data = d.sub,
         (Salary.Paid-100000) ~Last.Name.start+Job.Title+ManageP+Job.Title*ManageP+
         Employer+
         Calendar.Year+First_Name+Gender)

md.all.letters.log<- lm(data = d.sub,
         log(Salary.Paid) ~Last.Name.start+Job.Title+ManageP+Job.Title*ManageP+
         Employer+
         Calendar.Year+First_Name+Gender)

md.all.letters.log.100k<- lm(data = d.sub,
         log(Salary.Paid-100000) ~Last.Name.start+Job.Title+ManageP+Job.Title*ManageP+
         Employer+
         Calendar.Year+First_Name+Gender
         )


stargazer(md,
          md.first.name,
          md.gender,
          md.all.letters,
          md.all.letters.100k,
          md.all.letters.log,
          md.all.letters.log.100k,
          omit = c("Calendar.Year","Employer"),
          omit.labels = c("Year Dummies?","Employer Dummies?"),
          df = FALSE,
          type="html",
          intercept.bottom = F,
          style = "aer",covariate.labels = c())

| | (1) Salary.Paid | (2) Salary.Paid | (3) Salary.Paid | (4) Salary.Paid | (5) (Salary.Paid - 1e+05) | (6) log(Salary.Paid) | (7) log(Salary.Paid - 1e+05) |
|---|---|---|---|---|---|---|---|
| Constant | 98,854.710*** (2,817.535) | 99,027.990*** (2,819.652) | 91,713.300*** (2,885.176) | 91,815.450*** (2,890.290) | -8,184.546*** (2,890.290) | 11.543*** (0.016) | 9.476*** (0.050) |
| Last_NameF-J | -1,279.286*** (265.734) | -1,295.123*** (265.733) | -1,284.922*** (265.590) | | | | |
| Last_NameK-O | -964.347*** (238.756) | -974.910*** (238.820) | -947.646*** (238.704) | | | | |
| Last_NameP-T | -420.562* (234.266) | -429.584* (234.294) | -428.878* (234.157) | | | | |
| Last_NameT-Z | 270.860 (381.225) | 216.788 (381.362) | 172.196 (381.145) | | | | |
| Last.Name.startC | | | | -314.045 (392.574) | -314.045 (392.574) | -0.002 (0.002) | -0.008 (0.007) |
| Last.Name.startD | | | | -638.036 (428.134) | -638.036 (428.134) | -0.005** (0.002) | -0.024*** (0.007) |
| Last.Name.startF | | | | -1,684.619*** (490.327) | -1,684.619*** (490.327) | -0.010*** (0.003) | -0.036*** (0.009) |
| Last.Name.startG | | | | -2,744.926*** (438.028) | -2,744.926*** (438.028) | -0.016*** (0.002) | -0.042*** (0.008) |
| Last.Name.startH | | | | -489.251 (411.906) | -489.251 (411.906) | -0.005** (0.002) | -0.017** (0.007) |
| Last.Name.startK | | | | -1,495.744*** (434.657) | -1,495.744*** (434.657) | -0.007*** (0.002) | -0.012 (0.008) |
| Last.Name.startL | | | | -1,818.614*** (412.303) | -1,818.614*** (412.303) | -0.010*** (0.002) | -0.026*** (0.007) |
| Last.Name.startM | | | | -724.222** (356.530) | -724.222** (356.530) | -0.005** (0.002) | -0.015** (0.006) |
| Last.Name.startP | | | | -1,687.778*** (437.167) | -1,687.778*** (437.167) | -0.009*** (0.002) | -0.027*** (0.008) |
| Last.Name.startR | | | | 242.868 (444.374) | 242.868 (444.374) | 0.000 (0.002) | -0.006 (0.008) |
| Last.Name.startS | | | | -687.697* (359.822) | -687.697* (359.822) | -0.004** (0.002) | -0.012* (0.006) |
| Last.Name.startT | | | | -562.754 (490.849) | -562.754 (490.849) | -0.002 (0.003) | -0.014 (0.009) |
| Last.Name.startW | | | | -88.356 (430.456) | -88.356 (430.456) | -0.002 (0.002) | -0.006 (0.008) |
| Job.TitleAssociate Professor | 11,404.710*** (363.310) | 11,387.240*** (363.297) | 11,385.180*** (363.072) | 11,391.140*** (363.041) | 11,391.140*** (363.041) | 0.085*** (0.002) | 0.429*** (0.006) |
| Job.TitleProfessor | 36,040.070*** (358.872) | 35,999.730*** (358.970) | 35,909.620*** (358.818) | 35,894.800*** (358.809) | 35,894.800*** (358.809) | 0.234*** (0.002) | 0.873*** (0.006) |
| ManagePDepartment Head | 172.141 (3,361.074) | 105.919 (3,360.944) | -533.733 (3,359.609) | -281.844 (3,359.793) | -281.844 (3,359.793) | -0.001 (0.018) | 0.033 (0.059) |
| ManagePDean | 28,971.990*** (5,765.845) | 29,145.770*** (5,765.436) | 29,156.780*** (5,761.916) | 29,403.960*** (5,762.278) | 29,403.960*** (5,762.278) | 0.200*** (0.031) | 0.806*** (0.101) |
| First_NameF-J | | 12.623 (241.286) | -207.455 (241.707) | -199.670 (241.751) | -199.670 (241.751) | -0.001 (0.001) | -0.002 (0.004) |
| First_NameK-O | | -803.730*** (242.945) | -856.747*** (243.340) | -843.809*** (243.470) | -843.809*** (243.470) | -0.005*** (0.001) | -0.014*** (0.004) |
| First_NameP-T | | -211.174 (233.416) | -162.788 (235.754) | -169.460 (235.824) | -169.460 (235.824) | -0.001 (0.001) | -0.000 (0.004) |
| First_NameT-Z | | 1,259.309*** (395.792) | 1,105.883*** (395.730) | 1,172.243*** (396.171) | 1,172.243*** (396.171) | 0.007*** (0.002) | 0.022*** (0.007) |
| GenderGender Unknown | | | 7,078.564*** (631.075) | 7,093.705*** (631.049) | 7,093.705*** (631.049) | 0.043*** (0.003) | 0.130*** (0.011) |
| GenderMale | | | 8,819.144*** (672.466) | 8,847.023*** (672.545) | 8,847.023*** (672.545) | 0.053*** (0.004) | 0.162*** (0.012) |
| Job.TitleAssociate Professor:ManagePDepartment Head | 8,561.990** (3,493.507) | 8,642.629** (3,493.351) | 9,205.934*** (3,491.825) | 9,007.493*** (3,491.969) | 9,007.493*** (3,491.969) | 0.058*** (0.019) | 0.163*** (0.061) |
| Job.TitleProfessor:ManagePDepartment Head | 15,515.690*** (3,393.679) | 15,537.320*** (3,393.520) | 16,136.680*** (3,392.113) | 15,872.750*** (3,392.366) | 15,872.750*** (3,392.366) | 0.091*** (0.018) | 0.207*** (0.059) |
| Job.TitleAssociate Professor:ManagePDean | -5,518.253 (5,862.153) | -5,613.393 (5,861.677) | -5,607.395 (5,858.079) | -5,890.610 (5,858.378) | -5,890.610 (5,858.378) | -0.058* (0.031) | -0.370*** (0.102) |
| Job.TitleProfessor:ManagePDean | 8,984.689 (5,801.306) | 8,839.569 (5,800.875) | 8,864.663 (5,797.318) | 8,559.721 (5,797.908) | 8,559.721 (5,797.908) | -0.016 (0.031) | -0.394*** (0.101) |
| Year Dummies? | Yes | Yes | Yes | No | No | No | No |
| Employer Dummies? | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations | 140,412 | 140,412 | 140,412 | 140,412 | 140,412 | 140,412 | 140,412 |
| R2 | 0.345 | 0.346 | 0.346 | 0.347 | 0.347 | 0.425 | 0.529 |
| Adjusted R2 | 0.345 | 0.345 | 0.346 | 0.346 | 0.346 | 0.424 | 0.528 |
| Residual Std. Error | 31,519.990 | 31,517.150 | 31,497.610 | 31,493.190 | 31,493.190 | 0.169 | 0.550 |
| F Statistic | 389.559*** | 381.747*** | 379.218*** | 362.907*** | 362.907*** | 504.888*** | 768.339*** |

Notes: Standard errors in parentheses. ***Significant at the 1 percent level. **Significant at the 5 percent level. *Significant at the 10 percent level.

Here are the residual plots.

Residuals for the model where salary is unchanged.

par(mfrow = c(2,2))
plot(md.all.letters)

Residuals for the model where 100,000 is subtracted from salary. We subtract 100,000 because our dataset only includes salaries of 100,000 or more per year.

# par(mfrow = c(2,2))
# plot(md.all.letters.100k)
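Columns (4) and (5) of the regression table agree everywhere except the constant, and that is expected: subtracting a fixed amount from the response of a linear model only shifts the intercept. The sketch below illustrates this on simulated data; the data frame `toy` and its columns are hypothetical stand-ins, not the salary dataset.

```r
set.seed(1)

# Simulated stand-in data (hypothetical; not the salary disclosure data)
n <- 200
toy <- data.frame(
  initial = factor(sample(c("B", "C", "D"), n, replace = TRUE)),
  rank    = factor(sample(c("Assistant", "Professor"), n, replace = TRUE))
)
toy$salary <- 100000 + 5000 * (toy$initial == "C") +
  30000 * (toy$rank == "Professor") + rexp(n, rate = 1 / 20000)

# Same model, with and without the 100,000 shift in the response
m.raw   <- lm(salary ~ initial + rank, data = toy)
m.shift <- lm(I(salary - 100000) ~ initial + rank, data = toy)

# Slopes agree; only the intercept differs, by the amount subtracted
coef(m.raw) - coef(m.shift)

# Residuals match, so the four diagnostic plots would look the same
same.resid <- all.equal(unname(resid(m.raw)), unname(resid(m.shift)))
same.resid
```

The shift only starts to matter once the log is taken, which is why the last two models in the table differ from each other.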

Residuals for the model where salary is in the log form.

par(mfrow = c(2,2))
plot(md.all.letters.log)

Residuals for the model where 100,000 is subtracted from salary, and then in log form.

par(mfrow = c(2,2))
plot(md.all.letters.log.100k)

Let’s see whether the effect, as hypothesized by Professor Marc Bellemare, would be generally monotonic.

n.start=2
d.tmp<-data.frame(Last.Name.Initial=set.n.1,
                  Coefficient=coef(md.all.letters)[n.start:(length(set.n.1)+n.start-1)],
                  p.value=summary(md.all.letters)$coefficients[n.start:(length(set.n.1)+n.start-1),4])
d.tmp$significant10<-"Not Significant"
d.tmp$significant10[d.tmp$p.value<=0.1]<-"Significant at 10%"

ggplot(data = d.tmp,aes(Last.Name.Initial,Coefficient))+
  geom_col(aes(fill =significant10))+
  ylab("Salary Drop Compared to the Reference Last Name Initial ($)")+
  xlab("Last Name Initial")
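The code above picks the last name coefficients by position (`n.start=2`), which works as long as `Last.Name.start` is the first term in the formula. A possibly more robust alternative is to select them by name. This is only a sketch on simulated data; `toy`, `initial` and `year` are hypothetical stand-ins for the real variables.

```r
set.seed(2)

# Hypothetical toy data; `initial` plays the role of Last.Name.start
toy <- data.frame(
  initial = factor(sample(c("B", "C", "D", "F"), 300, replace = TRUE)),
  year    = factor(sample(2015:2017, 300, replace = TRUE))
)
toy$salary <- 120000 - 2000 * (toy$initial == "F") + rnorm(300, sd = 5000)

# Put another covariate first on purpose, so the initials no longer start
# at position 2 of the coefficient vector
m <- lm(salary ~ year + initial, data = toy)

letters.kept <- levels(toy$initial)[-1]        # drop the baseline "B"
rows <- paste0("initial", letters.kept)        # names lm() gives these dummies
est  <- summary(m)$coefficients[rows, , drop = FALSE]

d.toy <- data.frame(Last.Name.Initial = letters.kept,
                    Coefficient = est[, "Estimate"],
                    p.value     = est[, "Pr(>|t|)"])
d.toy
```

With the real models, the equivalent row names would be built as `paste0("Last.Name.start", set.n.1)`.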

Compared to professors whose last names start with B, professor Cs and Ds do not experience a significant salary drop. For professor Fs, the drop is about 1,685 per year. This adverse effect strengthens for professor Gs, reducing their salary by about 2,745 per year compared to Bs. After that, the significant effects are smaller: about 1,500 to 1,800 per year for K, L and P, and about 700 per year for M and S, while H, R, T and W are not significantly different from B. So we could conclude that the adverse last name effect broadly increases first and then decreases. This makes sense. Think of a case where we need to read something quickly: we would look at the first few pages, skip the middle, and then roughly read the last few pages. Thus, the adverse effect would be the largest if the last name is in the middle.

How about the last model, where we subtracted 100,000 from the salary variable and then took the log? The R squared increased to 53%, suggesting that this model fits the data better. Let’s plot the estimated coefficients of the impact of last name initial on salary. It shows a similar pattern.

# #n.start=89
d.tmp<-data.frame(Last.Name.Initial=set.n.1,
                  Coefficient=c(coef(md.all.letters.log.100k)[n.start:(length(set.n.1)+n.start-1)]),
                  p.value=summary(md.all.letters.log.100k)$coefficients[n.start:(length(set.n.1)+n.start-1),4])
d.tmp$significant10<-"Not Significant"
d.tmp$significant10[d.tmp$p.value<=0.1]<-"Significant at 10%"

ggplot(data = d.tmp,aes(Last.Name.Initial,Coefficient*100))+
  geom_col(aes(fill=significant10))+
  ylab("Drop on salary Compared to Reference Last Name (%)")+
  xlab("Last Name Initial")
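A note on reading these percentages. The plot multiplies the log-model coefficients by 100, which is a close approximation for small values. Also, because the response is `log(Salary.Paid-100000)`, the percentage refers to the part of the salary above 100,000, not to the whole salary. The sketch below uses three coefficients copied from column (7) of the table; the 150,000 salary is a hypothetical example.

```r
# Coefficients copied from column (7) of the table (initials F, G and L)
b <- c(F = -0.036, G = -0.042, L = -0.026)

approx.pct <- 100 * b               # what the plot displays
exact.pct  <- 100 * (exp(b) - 1)    # exact change implied by a log response

round(cbind(approx.pct, exact.pct), 2)

# Hypothetical professor paid 150,000: the percentage applies to the 50,000
# earned above the 100,000 threshold, not to the full salary
salary <- 150000
drop.G <- (salary - 100000) * (1 - exp(b[["G"]]))
round(drop.G)
```

For coefficients this small the approximate and exact figures are nearly the same, so the shape of the plot is unaffected.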

Well, my father may be mad. But next time you see me, Max Shang is no longer my name. I’m Max Aaaa.

Max Shang

Research Associate, University of Guelph (Ridgetown Campus)
