Data Analysis

Introduction

This report presents a data analysis carried out with the R statistical package. The data set contains five variables: sales, calls, years, time, and type. The analysis first summarizes each variable on its own using graphical and numerical techniques. The graphical summaries considered include the stem-and-leaf diagram, frequency table, histogram, boxplot, dot plot, pie chart, and bar graph. The numerical summaries include appropriate measures of central tendency, measures of dispersion, and the shape of the distribution for each quantitative variable, together with the five-number summary (Min, Q1, Median, Q3, Max) where appropriate.

The analysis then examines relationships between the variables. Variables are paired to check whether a relationship exists, using both numerical summary measures and graphs, so that the pairs that show a relationship can be distinguished from those that do not. The variables are defined as follows:

Sales: represents the number of sales this week

Calls: represents the number of sales calls made this week

Time: represents the average time per call this week

Years: represents years of experience in the call center

Type: represents the type of training the employee received.

Presentation of the Results and Inferences

Numerical measures

Sales
Mean	43.35
Min	21
Q1	41.75
Median	44
Q3	47
Max	54
Standard Deviation	6.2
Variance	38.45

Numerical summary for Calls

Calls
Mean	158.9
Min	116
Q1	146
Median	157.5
Q3	173.2
Max	198
Standard Deviation	18.5
Variance	342.14

Numerical summary for Time

Time
Mean	15.53
Min	10
Q1	13.2
Median	15.35
Q3	17.55
Max	22
Standard Deviation	2.72
Variance	7.31

Numerical summary for Years

Years
Mean	2.22
Min	0
Q1	1
Median	2
Q3	3
Max	5
Standard Deviation	1.26
Variance	1.59
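The summary tables above can be reproduced for any numeric column with a few base R calls. The following is a minimal sketch, not the report's own code: the vector `sales_demo` and the helper `summarise_variable` are hypothetical names introduced for illustration, and the values are made up rather than taken from the data set. It also shows that the variance is the square of the standard deviation, which is why the tables report, for example, a standard deviation of 6.2 next to a variance of 38.45 for Sales.

```r
# Hypothetical illustration values, NOT the report's data
sales_demo <- c(21, 38, 41, 42, 43, 44, 45, 46, 47, 48, 50, 54)

# Hypothetical helper: collects the measures reported in the tables above
summarise_variable <- function(x) {
  # quantile() uses R's default (type 7) definition, the same one summary() uses
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  c(Mean = mean(x), Min = min(x), Q1 = q[1], Median = q[2], Q3 = q[3],
    Max = max(x), SD = sd(x), Variance = var(x))
}

demo_summary <- summarise_variable(sales_demo)
print(round(demo_summary, 2))

# The variance is the square of the standard deviation
print(all.equal(demo_summary[["Variance"]], demo_summary[["SD"]]^2))
```

Applied to a real column, the call would look like `summarise_variable(Project_Data_SALESCALL_2_1_$"Sales (Y)")`, assuming the column names used in the appendix.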

Type
Mode	online

Frequency
online	46
group	41
none	13
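For a categorical variable such as Type, the mode can be read directly from the frequency table. The short sketch below rebuilds the table from the counts reported above and extracts the modal category and the share of each type; the object names `type_counts`, `modal_type`, and `type_percent` are hypothetical and chosen only for this illustration.

```r
# Counts as reported in the frequency table above
type_counts <- c(online = 46, group = 41, none = 13)

# The mode is the category with the largest count
modal_type <- names(type_counts)[which.max(type_counts)]
print(modal_type)

# Share of each training type, in percent
type_percent <- round(100 * type_counts / sum(type_counts), 1)
print(type_percent)
```

On the full data set, `table(Type)` from the appendix would produce the same counts without typing them in by hand.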

Interpretation

The boxplots above show the following:

Sales, calls, and time are all approximately symmetric: in each boxplot the median line sits close to the center of the box and the whiskers are of similar length on both sides. This is consistent with distributions that are close to normal.

The boxplots were read as showing no outliers for any of the three variables, with no data point appearing unusually high or low. Note, however, that the reported Sales minimum of 21 lies below Q1 - 1.5 × IQR = 41.75 - 1.5 × 5.25 = 33.875, so that value would normally be flagged as a low outlier and should be rechecked against the plot.

The numerical summaries report the mean, median, lower and upper quartiles, variance, standard deviation, minimum, and maximum of each quantitative variable. They help the reader interpret the data clearly.
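The statements about symmetry and outliers can be cross-checked against the reported five-number summaries. The sketch below is an illustration rather than part of the original analysis: it computes the quartile (Bowley) skewness, (Q3 + Q1 - 2 × Median) / (Q3 - Q1), which is near zero for a symmetric box, and the usual 1.5 × IQR fences used to flag boxplot outliers. The data frame `five_num` and its column names are hypothetical; the numbers are those in the summary tables.

```r
# Five-number summaries as reported in the tables of this report
five_num <- data.frame(
  variable = c("Sales", "Calls", "Time"),
  min      = c(21, 116, 10),
  q1       = c(41.75, 146, 13.2),
  median   = c(44, 157.5, 15.35),
  q3       = c(47, 173.2, 17.55),
  max      = c(54, 198, 22)
)

iqr <- five_num$q3 - five_num$q1

# Quartile (Bowley) skewness: 0 when the median is centered in the box
five_num$bowley <- (five_num$q3 + five_num$q1 - 2 * five_num$median) / iqr

# 1.5 * IQR fences; values beyond them are conventionally drawn as outliers
five_num$lower_fence <- five_num$q1 - 1.5 * iqr
five_num$upper_fence <- five_num$q3 + 1.5 * iqr
five_num$min_flagged <- five_num$min < five_num$lower_fence
five_num$max_flagged <- five_num$max > five_num$upper_fence

print(five_num)
```

R's `boxplot()` builds its fences from hinges rather than from the quartiles returned by `summary()`, so the fences here are an approximation; any value this check flags is worth comparing with the corresponding plot.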

Scatter Plot for Sales against Calls.

The scatter diagram of sales against calls shows a positive linear relationship, with the points spread evenly on either side of the fitted line. This indicates that each variable is a useful linear predictor of the other. The correlation is positive and close to one. The fitted OLS line passes through the middle of the point cloud, with the points scattered uniformly around it, which supports a linear relationship. Employees who make more calls therefore tend to record more sales, and those who make fewer calls tend to record fewer sales.
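The link between the correlation coefficient and the fitted OLS line can be made concrete. The sketch below uses simulated values, not the report's data: `calls_demo` and `sales_demo2` are hypothetical vectors generated only so the example runs on its own. It computes Pearson's r with `cor()`, fits the line with `lm()`, and checks two standard identities for simple linear regression: the slope equals r × sd(y) / sd(x), and R-squared equals r squared.

```r
# Simulated illustration data, NOT the report's data set
set.seed(1)
calls_demo  <- round(runif(50, min = 116, max = 198))
sales_demo2 <- 10 + 0.2 * calls_demo + rnorm(50, sd = 3)

# Pearson correlation between the two variables
r <- cor(calls_demo, sales_demo2, method = "pearson")

# Ordinary least squares fit of sales on calls
fit   <- lm(sales_demo2 ~ calls_demo)
slope <- unname(coef(fit)["calls_demo"])

# Identities that hold for simple linear regression with an intercept
print(all.equal(slope, r * sd(sales_demo2) / sd(calls_demo)))
print(all.equal(summary(fit)$r.squared, r^2))

print(round(c(r = r, slope = slope), 3))
```

Printing r in this way gives the actual value behind a statement such as "close to one", and the sign of the slope always matches the sign of r.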

Boxplot by Type is as follows

Interpretation

The boxplots by type show no outliers in any of the variables, and all data points fall within the whiskers. They also indicate the skewness of each variable and, because ggboxplot draws one box per training type, show how sales, calls, and time vary across the types. For calls, the online type recorded the highest values, followed by the group type, with the none type recording the lowest. For time, the none type had the longest whiskers, followed by the group type and then the online type. For sales, the online type had the longest whiskers. Taken together, the plots suggest that the variables are related to one another.
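When reading boxplots by group, it helps to keep the size of each group separate from the location and spread of its values, since a boxplot shows only the latter. The sketch below illustrates this with a simulated data frame: `demo`, its columns, and the simulated sales values are hypothetical, and only the group sizes (46, 41, 13) come from the frequency table in this report.

```r
# Simulated illustration data; only the group sizes come from the report
set.seed(42)
demo <- data.frame(
  Type  = rep(c("online", "group", "none"), times = c(46, 41, 13)),
  Sales = round(rnorm(100, mean = 43, sd = 6))
)

# Group sizes: how many employees received each type of training
group_sizes <- table(demo$Type)

# Location and spread within each group: what the boxes actually display
group_medians <- tapply(demo$Sales, demo$Type, median)
group_iqr     <- tapply(demo$Sales, demo$Type, IQR)

by_type <- data.frame(
  n      = as.vector(group_sizes),
  median = as.vector(group_medians),
  IQR    = as.vector(group_iqr),
  row.names = names(group_medians)
)
print(by_type)
```

A table like `by_type` placed next to the plot makes it clear whether a statement refers to how many employees fall in a group or to how large their values are.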

Conclusion

The modal training type is online, which has the highest count in the frequency table (46 of 100). The boxplots of sales, calls, and time were read as showing no outliers, so extreme values are not expected to distort the summary statistics, although the low Sales minimum of 21 is worth rechecking. The scatter plot of sales against calls shows a positive linear relationship: the OLS line fits well, with roughly equal numbers of points on either side of it, and the correlation coefficient r is positive and close to one. This suggests that sales and calls move together, so that more calls go with more sales and fewer calls with fewer sales. The boxplots by type for the remaining variables also show no outliers, meaning no value appears unusually high or low.

The appendix below lists the R code used.

# Load the data from the Excel workbook
library(readxl)
Project_Data_SALESCALL_2_1_ <- read_excel("C:/Users/brayo-onyas/Desktop/New folder (11)/Project_Data_SALESCALL_2 (1).xlsx")
View(Project_Data_SALESCALL_2_1_)

# Make the columns available by name; backticks are needed because the names contain spaces
attach(Project_Data_SALESCALL_2_1_)

# Sales: boxplot and numerical summary
# (with horizontal = TRUE the values run along the x-axis, so the label goes in xlab)
boxplot(`Sales (Y)`, xlab = "Values", main = "Boxplot of the Sales Data", col = "yellow", horizontal = TRUE)
summary(`Sales (Y)`)
sd(`Sales (Y)`)
var(`Sales (Y)`)

# Calls: boxplot and numerical summary
boxplot(`Calls (X1)`, xlab = "Values", main = "Boxplot of the Calls Data", col = "purple", horizontal = TRUE)
summary(`Calls (X1)`)
sd(`Calls (X1)`)
var(`Calls (X1)`)

# Time: boxplot and numerical summary
boxplot(`Time (X2)`, xlab = "Values", main = "Boxplot of the Time Data", col = "blue", horizontal = TRUE)
summary(`Time (X2)`)
sd(`Time (X2)`)
var(`Time (X2)`)

# Scatter plot of Sales against Calls; ggpubr takes column names as strings
library("ggpubr")
ggscatter(Project_Data_SALESCALL_2_1_, x = "Calls (X1)", y = "Sales (Y)",
          add = "reg.line",   # Add regression line
          conf.int = TRUE,    # Add confidence interval
          add.params = list(color = "blue", fill = "lightgray")) +
  stat_cor(method = "pearson")   # Add correlation coefficient

# Pearson correlation; cor() takes two vectors, not a formula
cor(`Sales (Y)`, `Calls (X1)`)

# Boxplots of Sales, Calls, and Time by training type
# (label must name an existing column, so the Years column is given by its full name)
ggboxplot(Project_Data_SALESCALL_2_1_, x = "Type",
          y = c("Sales (Y)", "Calls (X1)", "Time (X2)"),
          combine = TRUE, color = "Type", palette = "jco", ylab = "Values",
          add = "jitter", add.params = list(size = 0.1, jitter = 0.2),
          label = "Years (X3)", label.select = list(top.up = 1, top.down = 1),
          font.label = list(size = 9, face = "italic"), repel = TRUE)

# Years: numerical summary
summary(`Years (X3)`)
sd(`Years (X3)`)
var(`Years (X3)`)

# Frequency tables; table() counts values, whereas frequency() is meant for time series
table(`Sales (Y)`)
table(Type)
