Data Analysis

Introduction

This report presents a data analysis carried out with the R statistical package. The data set contains five variables: sales, calls, time, years, and type. The analysis first summarizes the data, examining each variable on its own using graphical and numerical techniques. Candidate techniques include the stem-and-leaf diagram, frequency table, histogram, boxplot, dot plot, pie chart, and bar graph. For the quantitative variables, the summary also covers appropriate measures of central tendency, measures of dispersion, and the shape of the distribution, together with the five-number summary (Min, Q1, Median, Q3, Max) where appropriate.

Relationships between the variables are also analyzed. Variables are paired to identify whether any relationship exists between them, using numerical summary measures and graphs, so that the variables that show a relationship can be distinguished from those that do not. In the analysis the variables represent the following:

Sales: represents the number of sales this week

Calls: represents the number of sales calls made this week

Time: represents the average time per call this week

Years: represents years of experience in the call center

Type: represents the type of training the employee received.

 

 

Presentation of the Results and Inferences

Numerical measures

Sales
Mean: 43.35
Min: 21
Q1: 41.75
Median: 44
Q3: 47
Max: 54
Standard Deviation: 6.2
Variance: 38.45

 

Numerical summary

Calls
Mean: 158.9
Min: 116
Q1: 146
Median: 157.5
Q3: 173.2
Max: 198
Standard Deviation: 18.5
Variance: 342.14

 

 

 

Numerical summary

Time
Mean: 15.53
Min: 10
Q1: 13.2
Median: 15.35
Q3: 17.55
Max: 22
Standard Deviation: 2.72
Variance: 7.31

 

Numerical summary for Years

Years
Mean: 2.22
Min: 0
Q1: 1
Median: 2
Q3: 3
Max: 5
Standard Deviation: 1.26
Variance: 1.59

 

Type
Mode: online

Frequency
online: 46
group: 41
none: 13
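
As a rough illustration of how the mode follows from the frequency table, the sketch below rebuilds a vector of training types from the three reported counts and tabulates it. The variable names (type_counts, type, mode_type) are illustrative assumptions and do not come from the original analysis, which ran table() on the raw data.

```r
# Training-type counts as reported in the frequency table above.
type_counts <- c(online = 46, group = 41, none = 13)

# Rebuild a vector with one entry per employee from the counts.
type <- rep(names(type_counts), times = type_counts)

# Frequency table and relative frequencies.
freq <- table(type)
rel_freq <- prop.table(freq)

# The mode is the category with the highest count.
mode_type <- names(freq)[which.max(freq)]

print(freq)
print(round(rel_freq, 2))
print(mode_type)
```

Because table() sorts the categories alphabetically, the mode is looked up by name rather than by position.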

 

 

Interpretation

The boxplots above show the following:

All three variables (sales, calls, and time) are approximately symmetric, which is consistent with a roughly normal distribution: in each boxplot the median line sits near the center of the box and the whiskers are of about equal length.

The boxplots show no outliers for any of the three variables: no observation appears unusually high or unusually low.

The numerical summaries report the summary statistics for the three variables: mean, median, lower quartile, upper quartile, variance, standard deviation, maximum, and minimum. These help the reader gain a clear understanding of the data.
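
The outlier reading can be cross-checked against the reported summaries with the usual 1.5 × IQR rule, which is also the default whisker rule in R's boxplot(). The sketch below is an approximation under stated assumptions: it uses only the reported minimum, lower quartile, upper quartile (47, 173.2, and 17.55), and maximum, not the raw data, and boxplot() computes its hinges slightly differently from summary(). The data frame name stats and its columns are illustrative.

```r
# Quartiles and extremes as reported in the numerical summaries above.
stats <- data.frame(
  variable = c("Sales", "Calls", "Time"),
  min = c(21, 116, 10),
  q1  = c(41.75, 146, 13.2),
  q3  = c(47, 173.2, 17.55),
  max = c(54, 198, 22)
)

# Tukey fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR.
iqr <- stats$q3 - stats$q1
stats$lower_fence <- stats$q1 - 1.5 * iqr
stats$upper_fence <- stats$q3 + 1.5 * iqr

# Flag a variable if its minimum or maximum falls outside the fences.
stats$low_outlier  <- stats$min < stats$lower_fence
stats$high_outlier <- stats$max > stats$upper_fence

print(stats)
```

Under this rule the extremes of Calls and Time lie inside their fences. The reported Sales minimum of 21, however, falls below the Sales lower fence of 41.75 - 1.5 × 5.25 = 33.875, so that observation is worth re-checking against the Sales boxplot.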


Scatter Plot for Sales against Calls.

The scatter diagram of sales against calls shows a positive linear relationship, with the points spread evenly on both sides of the fitted line. This indicates that each variable is a linear predictor of the other. The correlation is positive and close to one. The fitted OLS line runs through the middle of the evenly spread points, which strongly supports the relationship. Therefore, as the number of calls increases, sales also increase, and as the number of calls decreases, sales decrease.
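
To make the link between the OLS line and the correlation coefficient concrete, the sketch below fits a line and computes Pearson's r in base R. The data here are simulated purely for illustration and are not the call-center data: the range of calls is borrowed from the Calls summary, while the slope, intercept, and noise level are assumptions.

```r
# Hypothetical data: these values are simulated, NOT the call-center data.
set.seed(1)
calls <- round(runif(100, min = 116, max = 198))  # range borrowed from the Calls summary
sales <- 5 + 0.24 * calls + rnorm(100, sd = 3)    # assumed linear trend plus noise

# Pearson correlation coefficient between the two variables.
r <- cor(calls, sales)

# Ordinary least squares fit of sales on calls.
fit <- lm(sales ~ calls)
slope <- unname(coef(fit)["calls"])

# In simple regression the slope and r are tied together:
# r = slope * sd(x) / sd(y), so they always share the same sign.
r_from_slope <- slope * sd(calls) / sd(sales)

print(r)
print(coef(fit))
print(r_from_slope)
```

In an interactive session, plot(calls, sales) followed by abline(fit) would draw the scatter plot with the fitted line. A positive slope and a positive r are two views of the same relationship, which is why the essay reads them together.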

 

 

Boxplot by Type is as follows

 

Interpretation

The boxplot above shows no outliers in any of the variables and indicates the degree of skewness of each. The ggboxplot also allows the three variables to be compared across training types. Most employees received online training, followed by group training, with the fewest employees receiving no training. For time, the none type has the longest whiskers, followed by the group type and then the online type. For sales, the online type has the longest whiskers. This suggests that the type of training is related to each of the three variables.

Conclusion

The modal training type is online, which has the highest count in the frequency table. The boxplots drawn for the three quantitative variables show no outliers, suggesting that the summary statistics are not distorted by extreme values. The scatter plot of sales against calls shows a positive linear relationship. The fitted OLS line fits well, with roughly equal numbers of points lying on either side of it. This indicates a positive correlation, with the r-value close to one: as the number of calls increases, the number of sales also increases, and as the number of calls decreases, the number of sales decreases. The boxplots by type for the remaining variables likewise show no outliers, with no values that appear too high or too low.

The appendix lists the code used.

library(readxl)

Project_Data_SALESCALL_2_1_ <- read_excel("C:/Users/brayo-onyas/Desktop/New folder (11)/Project_Data_SALESCALL_2 (1).xlsx")

View(Project_Data_SALESCALL_2_1_)

# Attach the data first so that the columns can be referenced by name
attach(Project_Data_SALESCALL_2_1_)

# With horizontal=TRUE the values run along the x-axis
boxplot(`Sales (Y)`,xlab="Values",main="Boxplots of the Sales Data",col="yellow",horizontal=TRUE)

summary(`Sales (Y)`)

sd(`Sales (Y)`)

var(`Sales (Y)`)

boxplot(`Calls (X1)`,xlab="Values",main="Boxplots of the Calls Data",col="purple",horizontal=TRUE)

summary(`Calls (X1)`)

sd(`Calls (X1)`)

var(`Calls (X1)`)

boxplot(`Time (X2)`,xlab="Values",main="Boxplots of the Time Data",col="blue",horizontal=TRUE)

summary(`Time (X2)`)

sd(`Time (X2)`)

var(`Time (X2)`)

library("ggpubr")

# Scatter plot of Sales against Calls; ggscatter takes column names as strings
ggscatter(Project_Data_SALESCALL_2_1_, x = "Calls (X1)", y = "Sales (Y)",

add = "reg.line", # Add regression line

conf.int = TRUE, # Add confidence interval

add.params = list(color = "blue", fill = "lightgray")

)+

stat_cor(method = "pearson") # Add correlation coefficient

# Pearson correlation between Sales and Calls
cor(`Sales (Y)`, `Calls (X1)`)

ggboxplot(Project_Data_SALESCALL_2_1_, x = "Type",y = c("Sales (Y)","Calls (X1)","Time (X2)"),combine = TRUE,color = "Type", palette = "jco",ylab = "Values",add = "jitter",add.params = list(size = 0.1, jitter = 0.2), label = "Years (X3)",label.select = list(top.up = 1, top.down = 1),font.label = list(size = 9, face = "italic"),repel = TRUE)

summary(`Years (X3)`)

sd(`Years (X3)`)

var(`Years (X3)`)

# Frequency table of the training type
table(Type)

 

 
