---
title: "genderreport"
author: "Jennifer Young"
date: "7/31/2021"
output: pdf_document
---

## Introduction ##

One important decision parents make is naming their children. In this study, we look at popular names and gender neutral names. A soon-to-be parent who is researching such an important decision may want to consider data on a name to see how neutral the name is considered to be. Choosing a name that is given almost equally to both sexes can be the goal for some parents. We consider several names that have been labeled gender neutral, look at how they have been used for both biological sexes historically, and use a model that predicts whether a name is considered male or female based on its use in the US. The babynames and ssa datasets were used for the analysis in this study, and three models (logistic regression, Random Forest, and Naive Bayes) were used to analyze the data.

```{r}
# Set a CRAN mirror so install.packages() works non-interactively
local({
  r <- getOption("repos")
  r["CRAN"] <- "http://cran.r-project.org"
  options(repos = r)
})

install.packages("remotes") # if necessary
remotes::install_github("lmullen/gender")
install.packages("rTool") # or install through RStudio
install.packages("plyr", repos = "http://cran.us.r-project.org")
install.packages("babynames")
install.packages("dplyr")
install.packages("tidyr")
install.packages("ggplot2")
install.packages("gridExtra")
install.packages("magrittr")
install.packages("devtools")
install.packages("tidyverse")
install.packages("caret")
install.packages("e1071")
install.packages("randomForest")
```

I had to install psych, naivebayes, gender, randomForest, tinytex, and genderdata through RStudio instead.

```{r}
library(tidyverse)
library(caret)
library(plyr)
library(naivebayes)
library(psych)
library(gender)
library(tibble)
library(devtools)
library(babynames)
library(dplyr)
library(tidyr)
library(ggplot2)
library(gridExtra)
library(magrittr)
library(e1071)
library(tinytex)

# Load the dataset and inspect the first and last rows
data(babynames)
head(babynames)
tail(babynames)
```

## Methods ##

Data visualization was used to look at specific names that various baby name websites often describe as gender neutral. Graphing each name's use for male and female babies shows how it has been used for either sex in a historical context.

A sample of names was taken from websites that identify gender neutral names, the kind of sites prospective parents could find using a Google search. The usage counts for those names are drawn from Social Security Administration data.

From the chart of each name, 7 names were chosen that seemed the most neutral based on the male and female trendlines.

Logistic regression, Random Forest, and Naive Bayes were used to build models that classify names as male, female, or gender neutral.

# Finding out how many people were given name X in year X (sample) #

```{r}
entered_name <- "Charlie"
entered_year <- 2017

# Total number of babies (both sexes) given the name in that year
result <- babynames %>%
  filter(name == entered_name) %>%
  filter(year == entered_year) %>%
  dplyr::summarize(count = sum(n))
result
```
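The same lookup can be wrapped in a small function so it is easy to repeat for other names and years. The sketch below uses base R on a tiny made-up data frame with the same columns as babynames (year, sex, name, n). The helper name `count_name_in_year` and the toy counts are assumptions for illustration only.

```r
# Toy data with the same column layout as babynames (counts are made up)
toy <- data.frame(
  year = c(2016, 2017, 2017, 2017),
  sex  = c("M", "F", "M", "F"),
  name = c("Charlie", "Charlie", "Charlie", "Quinn"),
  n    = c(10, 4, 6, 3)
)

# Hypothetical helper: total babies of either sex given name `nm` in year `yr`
count_name_in_year <- function(data, nm, yr) {
  rows <- data$name == nm & data$year == yr
  sum(data$n[rows])
}

count_name_in_year(toy, "Charlie", 2017)
```

A name that does not occur in the chosen year simply returns 0, because the sum runs over no rows.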

# Number of male and female names in dataset #

```{r}
# Split the data by sex, then count the distinct names in each part
babynames %$%
  split(., sex) %>%
  lapply(. %$% length(unique(name)))
```
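For readers less familiar with the magrittr operators, the count of distinct names per sex can also be sketched in base R. The example below uses a small made-up data frame rather than babynames, so the numbers are illustrative only.

```r
# Toy rows (made up); a name can appear under both sexes and in several rows
toy <- data.frame(
  sex  = c("F", "F", "F", "F", "M", "M"),
  name = c("Ava", "Ava", "Quinn", "Olivia", "Quinn", "Liam"),
  stringsAsFactors = FALSE
)

# For each sex, count how many distinct names occur
unique_by_sex <- tapply(toy$name, toy$sex, function(x) length(unique(x)))
unique_by_sex
```

Note that a name used for both sexes, such as Quinn here, is counted once under each sex.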

# Gender Neutral Names by Sex from 1880-2017 #

Each chart shows the popularity of a name for both biological sexes between 1880 and 2017. The names that were tested were taken from a few popular websites that identify gender neutral names, as those are likely the places expectant parents would find using a Google search. Some examples are:

https://www.popsugar.com/family/Gender-Neutral-Baby-Names-34485564

https://www.mother.ly/child/top-50-gender-neutral-baby-names-youll-obsess-over-

The name Kaelin seems to be used by both sexes but has fallen in popularity.

```{r}
babynames %>%
  filter(name == "Kaelin") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Kaelin, by Sex")
```
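The charts show raw counts, so the balance between the sexes has to be judged by eye. As a numeric complement, the base R sketch below computes the female share of a name per year on made-up counts. `female_share` is a hypothetical helper, and the toy data frame only mimics the babynames columns.

```r
# Toy counts (made up) in the same layout as babynames
toy <- data.frame(
  year = c(2000, 2000, 2001, 2001),
  sex  = c("F", "M", "F", "M"),
  name = "Kaelin",
  n    = c(30, 10, 20, 20)
)

# Hypothetical helper: percentage of babies with name `nm` that were female, per year
female_share <- function(data, nm) {
  d <- data[data$name == nm, ]
  # Fix the levels so both columns exist even if one sex never occurs
  d$sex <- factor(d$sex, levels = c("F", "M"))
  counts <- xtabs(n ~ year + sex, data = d) # rows: year, columns: F and M
  100 * counts[, "F"] / rowSums(counts)
}

female_share(toy, "Kaelin")
```

A share near 50 in a given year means the name was given about equally to both sexes in that year.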

Charlie is another name for Charles and was traditionally given to males. However, it has grown in popularity for both genders.

```{r}
babynames %>%
  filter(name == "Charlie") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Charlie, by Sex")
```

Shane is a name that was traditionally given to males but has decreased in popularity.

```{r}
babynames %>%
  filter(name == "Shane") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Shane, by Sex")
```

Quinn is a name that has been used by both sexes, but has grown in popularity among females.

```{r}
babynames %>%
  filter(name == "Quinn") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Quinn, by Sex")
```

Morgan is a name that has historically been used by both sexes, but its use rose sharply among females about 20 years ago. Female usage has since fallen to meet male usage.

```{r}
babynames %>%
  filter(name == "Morgan") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Morgan, by Sex")
```

Finley has grown in usage for both sexes, but more for females.

```{r}
babynames %>%
  filter(name == "Finley") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Finley, by Sex")
```

Leslie is a name that was historically used for both genders, although its use for males has decreased over the last 60 years. It was popular for females in the second half of the last century. It has fallen in popularity overall.

```{r}
babynames %>%
  filter(name == "Leslie") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Leslie, by Sex")
```

Jessie is a name that was historically used for both genders and has fallen in popularity.

```{r}
babynames %>%
  filter(name == "Jessie") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Jessie, by Sex")
```

Sidney is a name that was historically used for both genders and has fallen in popularity for both.

```{r}
babynames %>%
  filter(name == "Sidney") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Sidney, by Sex")
```

Skyler is a name that was historically used for both genders and has risen in popularity in the last two decades.

```{r}
babynames %>%
  filter(name == "Skyler") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Skyler, by Sex")
```

Clarke is a name that was historically used for males but has increased in popularity for females.

```{r}
babynames %>%
  filter(name == "Clarke") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Clarke, by Sex")
```

Jackie is a name that was historically used for both genders and has fallen in popularity for both.

```{r}
babynames %>%
  filter(name == "Jackie") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Jackie, by Sex")
```

Nicky is a name that was historically used for both genders and has fallen in popularity for both.

```{r}
babynames %>%
  filter(name == "Nicky") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Nicky, by Sex")
```

Ashley is a name that has generally been given to females. The male character Ashley in Gone With the Wind was an anomaly.

```{r}
babynames %>%
  filter(name == "Ashley") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Ashley, by Sex")
```

Oakley is the closest to gender neutral out of this data analysis and is extremely popular.

```{r}
babynames %>%
  filter(name == "Oakley") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Oakley, by Sex")
```

Frankie is a name that was historically used in both genders and is rising in popularity in females.

```{r}
babynames %>%
  filter(name == "Frankie") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Frankie, by Sex")
```

Justice is a name that was historically used for both genders and is a newer name compared to many others.

```{r}
babynames %>%
  filter(name == "Justice") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Justice, by Sex")
```

Royal is a name that was historically used for males but has risen among females in the past decade.

```{r}
babynames %>%
  filter(name == "Royal") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Royal, by Sex")
```

What name has been the most popular over time for males? For females?

```{r}
# Name with the highest median proportion for each sex
babynames %>%
  group_by(sex, name) %>%
  dplyr::summarize(median_prop = median(prop)) %>%
  top_n(1)

# Number of distinct names recorded in a given year
namesinyear <- function(myyear) {
  require(dplyr)
  yearnames <- babynames %>% filter(year == myyear) %>% distinct(name)
  yearnames <- sapply(yearnames[, "name"], as.character)
  return(length(yearnames))
}

library(reshape2)

namescount <- c()
for (year in 1880:2017) {
  namescount <- c(namescount, namesinyear(year))
}
namescount <- as.data.frame(namescount)
# Label each row with its calendar year rather than the row index
namescount$year <- 1880:2017
namescount <- melt(namescount, id.vars = "year")
```

The next chart shows the number of distinct names given each year in the US (1880-2017). The number is rising, which means later years contribute more distinct names to the data.

```{r}
ggplot(namescount, aes(x = year, y = value, group = "variable")) +
  geom_line(alpha = 0.4) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ggtitle(label = "Number of names in a given year") +
  geom_smooth(method = "loess")
```
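The per-year count of distinct names can also be computed without a loop. The base R sketch below does this on a small made-up data frame with the babynames column layout; the values are illustrative, not taken from the real data.

```r
# Toy rows (made up); a name given to both sexes appears twice in one year
toy <- data.frame(
  year = c(1880, 1880, 1880, 1881, 1881, 1881),
  sex  = c("F", "M", "M", "F", "M", "F"),
  name = c("Mary", "John", "Mary", "Mary", "John", "Anna")
)

# Count distinct names within each year
names_per_year <- tapply(toy$name, toy$year, function(x) length(unique(x)))

# Same shape as the plotting data: one row per year with its count
namescount_toy <- data.frame(
  year  = as.integer(names(names_per_year)),
  value = as.vector(names_per_year)
)
namescount_toy
```

Keeping year as an integer column gives a continuous x-axis when the result is plotted.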

We can look at the popular names and see how gender neutral they appear.

```{r}
babynames %>%
  filter(name == "Ava") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Ava, by Sex")

babynames %>%
  filter(name == "Liam") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Liam, by Sex")

babynames %>%
  filter(name == "Noah") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Noah, by Sex")

babynames %>%
  filter(name == "Olivia") %>%
  ggplot(aes(x = year, y = n)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born", title = "Olivia, by Sex")
```

The most popular names in 2017 are not considered gender neutral. A parent who is concerned about this would be unlikely to choose these names.

# Prediction of gender by name #

I used method = "ssa", which covers the United States from 1930 to 2012 and is drawn from Social Security Administration data. The names come from the sample taken from websites that identify gender neutral names, which prospective parents could find using a Google search, and which were graphed earlier.

From the earlier analysis of each name, I chose 7 names that seemed the most neutral based on the male and female trendlines in the charts.

```{r}
head(gender)

ssa_names <- c("Charlie", "Royal", "Morgan", "Skyler",
               "Frankie", "Oakley", "Justice")
ssa_years <- c(rep(c(2009, 2012), 3), 2012)
ssa_df <- tibble(first_names = ssa_names,
                 last_names = LETTERS[1:7],
                 years = ssa_years,
                 min_years = ssa_years - 3,
                 max_years = ssa_years + 3)
ssa_df
```

This dataset connects first names to years, with additional columns for the minimum and maximum years of a possible age range, since birth dates are not always exact. We pass it to the gender_df() function, specifying the method that we wish to use and the names of the columns that contain the names and the birth years. The result is a tibble of predictions.

```{r}
results <- gender_df(ssa_df, name_col = "first_names", year_col = "years",
                     method = "ssa")
results
```
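Conceptually, a usage-based prediction of this kind compares how often a name was given to each sex within a year range and reports the proportions. The base R sketch below illustrates that idea on made-up counts. It is not the gender package's implementation, and the helper name `predict_sex_from_counts`, the output column names, and the 0.5 cutoff are assumptions made for this example.

```r
# Toy counts (made up) for one name in two years
toy <- data.frame(
  name = c("Morgan", "Morgan", "Morgan", "Morgan"),
  year = c(2009, 2009, 2012, 2012),
  sex  = c("F", "M", "F", "M"),
  n    = c(60, 40, 45, 55)
)

# Hypothetical helper: share of each sex for `nm` between year_min and year_max.
# Assumes the name has at least one record in the range.
predict_sex_from_counts <- function(data, nm, year_min, year_max) {
  d <- data[data$name == nm & data$year >= year_min & data$year <= year_max, ]
  female <- sum(d$n[d$sex == "F"])
  male   <- sum(d$n[d$sex == "M"])
  prop_f <- female / (female + male)
  data.frame(
    name = nm,
    proportion_female = prop_f,
    proportion_male = 1 - prop_f,
    gender = if (prop_f > 0.5) "female" else "male", # assumed cutoff
    year_min = year_min,
    year_max = year_max
  )
}

predict_sex_from_counts(toy, "Morgan", 2009, 2012)
```

For a name with nearly balanced counts, the predicted label can flip depending on the year range, which is one way to see how neutral the name is.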

```{r}
# Attach the predictions to the original rows
ssa_df %>%
  left_join(results, by = c("first_names" = "name", "years" = "year_min"))
```

Now we use gender_df() to predict gender by passing it the columns that hold the minimum and maximum years to be used for each name. The same predictions can also be made by calling gender() row by row or once per year.

```{r}
# Predict using a range of years for each name
gender_df(ssa_df, name_col = "first_names",
          year_col = c("min_years", "max_years"), method = "ssa")

# Call gender() once per distinct name and year
ssa_df %>%
  distinct(first_names, years) %>%
  rowwise() %>%
  do(results = gender(.$first_names, years = .$years, method = "ssa")) %>%
  do(bind_rows(.$results))

# Call gender() once per year with all names for that year
ssa_df %>%
  distinct(first_names, years) %>%
  group_by(years) %>%
  do(results = gender(.$first_names, years = .$years[1], method = "ssa")) %>%
  do(bind_rows(.$results))
```

# Logistic Regression Model #

```{r}
neutral_names <- babynames %>%
  select(-prop) %>%
  # Filter only names between years 1930 and 2012
  filter(year >= 1930, year <= 2012) %>%
  # Get the number of female and male babies for each name per year
  spread(key = sex, value = n, fill = 0) %>%
  # Calculate the measure of gender-neutrality
  mutate(prop_F = 100 * F / (F + M), se = (50 - prop_F)^2) %>%
  group_by(name) %>%
  # Per name, find the total number of babies and the measure of gender-neutrality
  dplyr::summarise(n = n(), female = sum(F), male = sum(M), total = sum(F + M),
                   mse = mean(se)) %>%
  # Take only names that occur every year and more than 9000 times
  filter(n == 83, total > 9000) %>%
  # Sort by gender neutrality
  arrange(mse) %>%
  # Get only the top 10
  head(10)

neutral_names
```
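The neutrality measure above is the mean, over years, of the squared distance between the female percentage and 50. A value of 0 means the name was split evenly every year, and 2500 means it was given to only one sex every year. A small worked example in base R, using two hypothetical names with made-up counts, shows the arithmetic:

```r
# Made-up yearly counts for two hypothetical names
toy <- data.frame(
  name = c("Balanced", "Balanced", "Skewed", "Skewed"),
  year = c(1930, 1931, 1930, 1931),
  F    = c(50, 60, 100, 90),
  M    = c(50, 40, 0, 10)
)

toy$prop_F <- 100 * toy$F / (toy$F + toy$M) # female percentage per year
toy$se     <- (50 - toy$prop_F)^2           # squared distance from an even split

# Average the squared distance over the years for each name
mse <- tapply(toy$se, toy$name, mean)
mse
```

Here "Balanced" has yearly squared distances of 0 and 100, giving 50, while "Skewed" has 2500 and 1600, giving 2050.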

# Random Forest Classification #

```{r}
library(randomForest)

neutral_names <- babynames %>%
  select(-prop) %>%
  # Filter only names between years 1930 and 2012
  filter(year >= 1930, year <= 2012) %>%
  # Get the number of female and male babies for each name per year
  spread(key = sex, value = n, fill = 0) %>%
  # Calculate the measure of gender-neutrality
  mutate(prop_F = 100 * F / (F + M), se = (50 - prop_F)^2) %>%
  group_by(name) %>%
  # Find the total number of babies and measure of gender-neutrality per name
  dplyr::summarise(n = n(), female = sum(F), male = sum(M), total = sum(F + M),
                   mse = mean(se)) %>%
  # Take only names that occur every year and more than 9000 times
  filter(n == 83, total > 9000) %>%
  # Sort by gender neutrality
  arrange(mse) %>%
  # Add variable to represent gender neutral names. Assumes an mse <= 2000
  mutate(isNeutral = ifelse(mse <= 2000, 1, 0))

neutral_names$isNeutral <- as.factor(neutral_names$isNeutral)

# 70/30 split into training and validation sets
set.seed(100)
train <- sample(nrow(neutral_names), 0.7 * nrow(neutral_names), replace = FALSE)
TrainSet <- neutral_names[train, ]
ValidSet <- neutral_names[-train, ]
summary(TrainSet)
summary(ValidSet)

model1 <- randomForest(isNeutral ~ ., data = TrainSet, importance = TRUE)
model1

# Predict on the training data and compare with the true labels
predTrain <- predict(model1, TrainSet, type = "class")
caret::confusionMatrix(predTrain, TrainSet$isNeutral)
```

Training data accuracy is 100%, which indicates that all the values were classified correctly.

Predicting on the validation data:

```{r}
predTest <- predict(model1, ValidSet, type = "class")
caret::confusionMatrix(predTest, ValidSet$isNeutral)
```

Validation data accuracy is 100%, which indicates that all the values were classified correctly.
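Perfect accuracy on both sets is plausible here because isNeutral is defined directly from mse (mse <= 2000), and mse is also among the predictors in isNeutral ~ . so the classifier only has to recover that cutoff. The base R sketch below, which uses made-up mse values rather than the real names, shows that simply re-applying the rule reproduces the label exactly on a held-out 30% split.

```r
# Made-up mse values; the label is a deterministic function of mse
set.seed(100)
toy <- data.frame(mse = runif(200, min = 0, max = 2500))
toy$isNeutral <- factor(ifelse(toy$mse <= 2000, 1, 0), levels = c(0, 1))

# Same 70/30 split pattern as in the analysis
train <- sample(nrow(toy), 0.7 * nrow(toy), replace = FALSE)
valid_set <- toy[-train, ]

# "Model": re-apply the rule that created the label
pred <- factor(ifelse(valid_set$mse <= 2000, 1, 0), levels = c(0, 1))
accuracy <- mean(pred == valid_set$isNeutral)
accuracy
```

A stricter test of the models would drop mse (and the columns it is computed from) from the predictors, though that goes beyond what this report evaluates.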

# Naive Bayes Classification #

Here the Random Forest model (model1) is compared with a Naive Bayes model, starting with Naive Bayes predictions on the training data.

```{r}
model <- naive_bayes(isNeutral ~ ., data = TrainSet, usekernel = TRUE)
model
plot(model)

# Class probabilities for each training row
p <- predict(model, TrainSet, type = "prob")
head(cbind(p, TrainSet))
```

Confusion matrix for the training data, misclassification error, and model accuracy:

```{r}
p1 <- predict(model, TrainSet)
(tab1 <- table(p1, TrainSet$isNeutral))

# Percentage misclassified: everything off the diagonal of the table
miscalc <- (1 - sum(diag(tab1)) / sum(tab1)) * 100
accuracy <- (100 - miscalc)
accuracy
```

The model has an accuracy of 99.90357% on the training data for the correct classification of gender neutral names.
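The accuracy figure comes from the confusion table: correct predictions sit on the diagonal, so accuracy is the diagonal sum divided by the table total. A minimal base R example with made-up labels (not the model's actual predictions) illustrates the calculation:

```r
# Made-up true labels and predictions for eight cases
truth <- factor(c(1, 1, 0, 0, 1, 0, 1, 1), levels = c(0, 1))
pred  <- factor(c(1, 1, 0, 1, 1, 0, 1, 0), levels = c(0, 1))

# Rows are predictions, columns are true labels
tab <- table(pred, truth)

miscalc  <- (1 - sum(diag(tab)) / sum(tab)) * 100 # percentage misclassified
accuracy <- 100 - miscalc                         # percentage classified correctly
accuracy
```

Six of the eight toy cases match, so the misclassification rate is 25% and the accuracy is 75%.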

# Results #

We can use logistic regression to predict gender from a name, and we can use Random Forest classification and Naive Bayes to predict whether a name is gender neutral, with close to 100% and over 99% accuracy, respectively. These methods are effective in determining whether a name is considered gender neutral based on its historical usage between genders. The results indicate that classification with these methods is highly accurate.

# Conclusion #

The results give, for each name, the proportion of each biological sex given that name and a prediction of whether the name is generally considered male or female. Using this data, a prospective parent can consider how names are viewed regarding gender neutrality based on statistical data from the SSA dataset. A limitation of the dataset is that it only has data up to 2017 and is not current.
