Introduction.
This data analysis report examines five variables: marital status, income, family size, housing costs, and electricity costs. The analysis makes it possible to determine which variables are most important, so that costs can be minimized. Variables that have no significance, or that can be replaced with better variables, are eliminated from the data set. Table 1 below describes the data used in the analysis.
Table 1: Data Description
Variable name in data set | Description | Type of variable (qualitative or quantitative) |
Variable 1: “SE-Marital Status” | The marital status of an individual, i.e., married or not married. | Qualitative |
Variable 2: “SE-Income.” | The amount of income that an individual receives in a year. | Quantitative. |
Variable 3: “SE-Family size.” | The count of people that make up the family. | Quantitative. |
Variable 4: “USD-Housing.” | The cost of the premises in which the family or individual lives, in US dollars per year. | Quantitative. |
Variable 5: “USD-Electricity.” | The cost of the electricity that the family or individual consumes, in US dollars. | Quantitative. |
Data set description and method used for Analysis.
The analysis in this case is driven by the need to get a feel for the data. Descriptive statistics allow an analyst to gain insight into the data before carrying out any other tests. The analysis in this section is carried out using the Analysis ToolPak add-in in Excel.
Results
Variable 1: SE-Marital Status.
This variable describes the marital status of any individual under analysis and is presented in a binary qualitative manner taking the options of “married” and “not married” as the only acceptable outcomes. The descriptive statistics for this variable will, therefore, consist mainly of frequencies.
Numerical Summary
Table 2 Marital status variable description
Variable | n | Measure(s) of Central Tendency | Measure(s) of Dispersion |
Variable: SE-Marital Status | n = 30 | Frequencies: Married: 15, Not married: 15 | Not applicable. The mean, the median, and measures of dispersion are not calculated for a binary qualitative variable, so frequencies are reported instead. |
The figure below shows the histogram for the data.
Figure 1 Histogram for frequency in marital status
The histogram in Figure 1 above shows the frequencies of the two categories of the variable. The data are evenly split between married and not married people: the frequency statistics show 15 married persons and 15 not married persons, which is reflected in the equal heights of the bars in the histogram.
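The frequency summary for a qualitative variable can be reproduced outside Excel with a simple count. The sketch below is only an illustration: it rebuilds the column from the frequencies reported in Table 2, and the label strings are assumptions because the raw data set is not shown here.

```python
from collections import Counter

# Rebuild the marital status column from the frequencies reported in Table 2.
# The label strings are assumptions; the data set may spell them differently.
marital_status = ["Married"] * 15 + ["Not married"] * 15

# Count each category and convert the counts to shares of the sample.
counts = Counter(marital_status)
n = sum(counts.values())
shares = {label: count / n for label, count in counts.items()}

for label, count in counts.items():
    print(f"{label}: {count} ({shares[label]:.0%})")
```

Equal counts for the two categories correspond to the equal bar heights described for Figure 1.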
Variable 2: SE-Income
This variable describes how much each individual under analysis earns. It is recorded in numerical form, which means that quantitative methods are suitable for the analysis. The method used is descriptive statistics, comprising measures of central tendency and measures of variability. From the analysis, it is possible to determine the distribution of salaries.
The table below shows descriptive statistics for the income levels.
Table 3 Descriptives for income
Variable | n | Measure(s) of Central Tendency | Measure(s) of Dispersion |
Variable: SE-Income | n = 30 | Mean: 100610.7667 Standard error: 1168.879276 Median: 96907 | Standard deviation: 6402.215466 Sample variance: 40988362.87 |
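The standard error and the sample variance in Table 3 are both derived from the standard deviation, so they can be cross-checked against it. The following is a minimal sketch that uses only the values reported in the table; the variable names are chosen for illustration.

```python
import math

# Values reported in Table 3 for SE-Income.
n = 30
std_dev = 6402.215466

# Standard error of the mean: standard deviation divided by the square root of n.
standard_error = std_dev / math.sqrt(n)

# Sample variance: the square of the sample standard deviation.
sample_variance = std_dev ** 2

print(f"Standard error:  {standard_error:.4f}")
print(f"Sample variance: {sample_variance:.2f}")
```

Both results should agree with the table up to rounding, which confirms that the three figures describe the same spread in different forms.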
The figure below shows a histogram for the income variable.
Figure 2: Histogram for distribution of income
The histogram in Figure 2 above shows that as the salary increases, the number of people in that salary bracket decreases. From Table 3, the mean salary is 100610.7667 while the median salary is 96907. Both values describe a typical salary in the sample: the mean is the arithmetic average, and the median is the value that splits the sample in half. The mean is about 3,700 higher than the median, which indicates that a few people earn very high amounts while at least half of the sample earns less than the mean. The standard deviation and the sample variance are large in absolute terms, which shows that salaries are widely spread around the mean. The standard error of the mean is calculated from the standard deviation and the sample size, so it reflects the same spread.
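The gap between the mean and the median reported in Table 3 can be expressed as a single number using Pearson's second (median) skewness coefficient, 3 × (mean − median) / standard deviation. This coefficient is not part of the Excel output used in this report; it is shown here only as a rough, hedged check on the direction of the skew, and the function name is hypothetical.

```python
def pearson_median_skewness(mean, median, std_dev):
    """Pearson's second skewness coefficient: 3 * (mean - median) / std_dev."""
    return 3 * (mean - median) / std_dev


# Values reported in Table 3 for SE-Income.
income_skew = pearson_median_skewness(100610.7667, 96907, 6402.215466)

# A positive value means the mean lies above the median (a longer right tail).
print(f"Income skewness (Pearson, median-based): {income_skew:.2f}")
```

A positive coefficient is consistent with a few high salaries pulling the mean above the median. The same function can be applied to the other quantitative variables using the figures in their tables.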
Variable 3: SE-Family Size.
This variable is the family size and is measured quantitatively, which allows descriptive statistics to be calculated.
The table below shows the measures of central tendency and dispersion for the family size variable.
Table 4: Descriptives for family size.
Variable | n | Measure(s) of Central Tendency | Measure(s) of Dispersion |
Variable: SE-Family size | n = 30 | Mean: 3.2 Standard error: 0.2368 Median: 3 Mode: 4 | Standard deviation: 1.29721186 Sample variance: 1.68275862 Range: 5 |
The figure below shows the histogram for the distribution of the number of members in the family.
Figure 3 Histogram for family size
From Table 4 above, the average number of family members for each individual is about three people (mean 3.2). The median is also equal to three people. Because the mean and median are close, the data show no sign of extremely large families. The most common family size (the mode) is four. The standard deviation from Table 4 is relatively low at about 1.3, with a sample variance of about 1.7. This suggests that most people have 2, 3, or 4 family members, which is supported by the histogram in Figure 3.
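The link between the standard deviation and the family sizes of 2, 3, and 4 can be made explicit by looking at the interval one standard deviation either side of the mean. The sketch below uses only the mean and standard deviation from Table 4. It shows which whole-number family sizes fall inside that interval; it does not show how many people fall there, which only the raw data or the histogram can confirm.

```python
import math

# Values reported in Table 4 for family size.
mean = 3.2
std_dev = 1.29721186

# Interval one standard deviation below and above the mean.
lower = mean - std_dev
upper = mean + std_dev

# Whole-number family sizes that lie inside the interval.
sizes_within = list(range(math.ceil(lower), math.floor(upper) + 1))

print(f"Mean +/- 1 SD: {lower:.2f} to {upper:.2f}")
print(f"Family sizes inside the interval: {sizes_within}")
```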
Variable 4: USD-Housing.
The next variable under analysis is the housing variable, measured in US dollars, which is numerical. Descriptive statistics comprising measures of central tendency and dispersion are used on the data.
Below is a table of the descriptive statistics.
Table 5: Descriptives for housing
Variable | n | Measure(s) of Central Tendency | Measure(s) of Dispersion |
Variable: USD-Housing | n = 30 | Mean: 21684.8667 Standard error: 632.326806 Median: 20607 | Standard deviation: 3463.3966 Sample variance: 11995115.7 Range: 9015 |
The figure below shows a histogram for the housing variable.
Figure 4 Histogram of housing costs
The histogram above indicates that housing costs are unevenly distributed. The mean amount spent on housing is 21684.8667 while the median is 20607. The mean exceeds the median by about 1,078, which supports the uneven distribution of housing expenditure among the people. According to Table 5 above, the standard deviation and sample variance are high. This implies that many housing costs lie far from the mean.
Variable 5: USD-Electricity.
This is the amount spent on electricity in US dollars. Table 6 below shows the descriptive statistics for the variable.
Table 6. Descriptives on spending on electricity
Variable | n | Measure(s) of Central Tendency | Measure(s) of Dispersion |
Variable: USD-Electricity | n = 30 | Mean: 1463.4 Standard error: 18.4935901 Median: 1466.5 | Standard deviation: 101.293564 Sample variance: 10260.3862 Range: 423 |
The figure below shows the histogram for the data.
Figure 5: Histogram for spending on electricity.
The average amount of money spent on electricity is 1463.4 while the median is 1466.5. The values are very close, implying that there is no extremely high spending on electricity. The standard deviation and sample variance from Table 6 are also low relative to the mean. This implies that most electricity costs are close to the mean, which is supported by the roughly normal shape of the histogram in Figure 5 above.
Discussion and conclusion.
The variable on marital status is the cheapest. However, because the number of family members is already included, the cost can be cut by eliminating the marital status variable. The income variable has the largest standard deviation of all the variables, which makes it the variable with the highest cost. To reduce cost, recording income brackets instead of specific incomes is recommended. The family size variable describes the number of people in a household, whether married or not married. It is quantitative and is the second cheapest. The amount spent on housing also shows high variation, making it more expensive. Spending on electricity shows little variation among the people. In conclusion, elimination of the marital status variable is recommended.
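Because the four quantitative variables are measured on very different scales, their standard deviations are not directly comparable. One common scale-free measure is the coefficient of variation (standard deviation divided by mean). It was not part of the Excel output and is offered here only as a supplementary sketch, using the means and standard deviations reported in Tables 3 to 6.

```python
# (mean, standard deviation) pairs as reported in Tables 3 to 6.
summary = {
    "SE-Income": (100610.7667, 6402.215466),
    "SE-Family size": (3.2, 1.29721186),
    "USD-Housing": (21684.8667, 3463.3966),
    "USD-Electricity": (1463.4, 101.293564),
}

# Coefficient of variation: standard deviation as a fraction of the mean.
cv = {name: sd / mean for name, (mean, sd) in summary.items()}

# List the variables from most to least variable in relative terms.
for name, value in sorted(cv.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.1%}")
```

A ranking by relative spread can differ from a ranking by absolute standard deviation, so it is worth stating which of the two is meant when variables are compared by how much they vary.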