HAVE TO CHOOSE YOUR OWN DATASET

You need to choose a dataset that lets you explore the concepts learned in your theory class, using R and the data management tools learned in the lab.

I expect you to choose a dataset with more than 500 observations. Pick a dataset simple enough that you can manipulate it and understand its content, but complex enough that you can use it for several different analyses and visualizations.

Here are the requirements and guidelines for this first deliverable:

  1. You have to submit an HTML document that is the result of knitting an R Markdown document.
  2. This document should contain an introduction explaining why you chose your dataset and what you plan to investigate with it (basically a short research proposal).
  3. You should state the source of your dataset. (I provide a list of good data sources below; however, you are free to choose a dataset from any source or topic of your interest.)
  4. You have to load your dataset into R and summarize it and/or show its structure, using summary(dataframe) or str(dataframe).
  5. Your document should contain your code and its output.
  6. At least 70% of your code should be commented, with the comments explaining what each line of code does.
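
Requirements 4 to 6 could look roughly like the sketch below. It is only an illustration under stated assumptions: to stay self-contained it first writes R's built-in `ChickWeight` data to a temporary CSV file, which stands in for a file you downloaded yourself. The names `csv_path` and `chicks` are placeholders, not part of the assignment.

```r
# Stand-in for a downloaded file: save the built-in ChickWeight data as a CSV.
# (Assumption for illustration; in your project csv_path points to your own file.)
csv_path <- tempfile(fileext = ".csv")
write.csv(as.data.frame(ChickWeight), csv_path, row.names = FALSE)

# Read the CSV file into a data frame.
chicks <- read.csv(csv_path)

# Check that the dataset has more than 500 observations (rows).
nrow(chicks) > 500

# Show the structure: variable names, types, and the first few values.
str(chicks)

# Show summary statistics for every variable.
summary(chicks)
```

Every statement has a comment directly above it, which is one simple way to meet the 70% commenting requirement.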


Some data sources (you don't need to choose your dataset from this list, but these are good suggestions):

U.S. Government data: https://www.data.gov/

The Data and Story Library: https://dasl.datadescription.com/

United Nations Data: http://data.un.org/

An article listing many more data resources: https://www.springboard.com/blog/free-public-data-sets-data-science-project/

UNICEF offers statistics on the situation of women and children worldwide.

World Health Organization offers world hunger, health, and disease statistics.

http://www.internationaleconomics.net/data.html

World Bank Data offers hundreds of datasets spanning many decades, sortable by topic or country. Data is downloadable in Excel or XML formats.

Here are the common mistakes that I have observed when students choose a dataset for the final project.

  • Data structure. Your dataset has to have a cross-sectional structure. That means observations (the individuals or subjects of analysis) in rows and attributes or variables in columns.
  • Your variables are mostly characters. There are two likely causes. Either the data was read into R incorrectly, so numeric variables are stored as characters (this often happens when a single row contains text in a numeric column, which makes R treat the whole variable as character), or your variables really are mostly categorical. In the second case, think about whether you want your regression analysis to use only categorical (factor) variables.
  • Guidelines. See the guidelines for the final project in the Rmd template that I am preparing, and use that template to create your project.
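
The character-variable problem described above can be diagnosed with base R. The sketch below uses a small made-up data frame (`raw`, with a hypothetical `income` column) in which one text entry forces the whole column to be read as character.

```r
# Hypothetical raw data: "income" is character because one cell holds text.
raw <- data.frame(
  id     = 1:5,
  income = c("52000", "61000", "n/a", "48000", "75500"),
  stringsAsFactors = FALSE
)

# Confirm the problem: income shows up as chr instead of num.
str(raw)

# Try the conversion; entries that are not numbers become NA.
converted <- suppressWarnings(as.numeric(raw$income))

# Find the rows that failed to convert so they can be inspected.
bad_rows <- which(is.na(converted) & !is.na(raw$income))
raw[bad_rows, ]

# Replace the character column with the numeric version.
raw$income <- converted
```

Inspecting `bad_rows` before overwriting the column matters: it shows whether the offending entries are missing-value codes or genuine data entry errors.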


Project Template:

# Introduction: Research Idea

Describe your research idea, give a broad overview of the data, state the statistical analysis that you want to carry out, and describe the dependent and independent variables.

# The data set

Briefly describe your data source and your data, and explain why you think this data is adequate to answer your research question. Do the required manipulations so that your variables have self-explanatory names; if that is not possible, briefly describe those variables.

Choose only the variables that are part of your analysis and create a dataset with those. Do not print your whole dataset in the R Markdown document.

If you have issues with your data structure, it might be because the data is not in the right format. Use the tidyr package, which lets you change the format of your dataset. Learning it will also show you how to change the names of your variables.
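
As a rough illustration of reshaping and renaming, the sketch below uses `pivot_longer()` from tidyr and `rename()` from dplyr. The table `wide`, its column names, and its values are invented for the example; they only mimic the layout of sources that store one column per year.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide table: one row per country, one column per year.
wide <- data.frame(
  Country.Name = c("A", "B", "C"),
  X2019 = c(1.2, 3.4, 5.6),
  X2020 = c(1.3, 3.1, 5.9)
)

# Reshape so that every row is one country-year observation.
long <- pivot_longer(
  wide,
  cols = c(X2019, X2020),    # the columns that hold the yearly values
  names_to = "year",         # the old column names go into "year"
  names_prefix = "X",        # drop the "X" that R added in front of the year
  values_to = "gdp_growth"   # the cell values go into "gdp_growth"
)

# Store the year as a number instead of text.
long$year <- as.integer(long$year)

# Give the country column a short, self-explanatory name.
long <- rename(long, country = Country.Name)

# Print the reshaped table.
long
```

If you need a strictly cross-sectional table afterwards, one option is to keep a single year, for example with `filter(long, year == 2020)`.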

# In this chunk of code load your data set into Rmarkdown

 

# str(data)

# use stargazer to do a table of the summary statistics of your dataset

# summary(data)

# Graphs

Explain some of the important descriptive statistics of your dataset using data visualization.

Remember to use ggplot2.
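
A minimal ggplot2 sketch is shown below. It assumes the built-in `ChickWeight` data as a stand-in for your own dataset; the object names (`chicks`, `final_day`, `p_hist`, `p_box`) are placeholders.

```r
library(ggplot2)

# Use the built-in ChickWeight data as a plain data frame.
chicks <- as.data.frame(ChickWeight)

# Histogram: the distribution of one numeric variable.
p_hist <- ggplot(chicks, aes(x = weight)) +
  geom_histogram(bins = 30) +
  labs(title = "Distribution of chick weight",
       x = "Weight", y = "Number of observations")

# Keep only the measurements from the last day of the experiment.
final_day <- chicks[chicks$Time == max(chicks$Time), ]

# Boxplot: compare a numeric variable across the groups of a factor.
p_box <- ggplot(final_day, aes(x = Diet, y = weight)) +
  geom_boxplot() +
  labs(title = "Final weight by diet", x = "Diet", y = "Weight")

# Print the plots so they appear in the knitted document.
p_hist
p_box
```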

# Your analysis

 

This is where you execute your research idea. Here you can show:

  • Correlations
  • Means
  • Variances
  • Standard deviations
  • Samples

Whatever you are planning to investigate with your data, do it here. Remember to use the dplyr package when possible.

Remember to use stargazer and/or kable if you make tables, if possible.
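
The statistics listed above could be computed with dplyr along the following lines. This is a sketch that again assumes the built-in `ChickWeight` data in place of your own dataset; `by_diet`, `weight_time_cor`, and `chick_sample` are placeholder names.

```r
library(dplyr)

# Use the built-in ChickWeight data as a plain data frame.
chicks <- as.data.frame(ChickWeight)

# Mean, variance, and standard deviation of final weight for each diet.
by_diet <- chicks %>%
  filter(Time == 21) %>%          # keep the last day of the experiment
  group_by(Diet) %>%              # one group per diet
  summarise(
    n           = n(),            # number of chicks in the group
    mean_weight = mean(weight),   # group mean
    var_weight  = var(weight),    # group variance
    sd_weight   = sd(weight)      # group standard deviation
  )

# Correlation between weight and time over all observations.
weight_time_cor <- cor(chicks$weight, chicks$Time)

# Fix the seed so the random sample is reproducible.
set.seed(123)

# Draw a simple random sample of 50 rows.
chick_sample <- slice_sample(chicks, n = 50)

# Print the summary table.
by_diet
```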

 

Note that if you use stargazer in a chunk of code, you need to set the chunk option results='asis', as in the chunk below:

```{r, results='asis'}
# your stargazer table goes here
```
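
A short stargazer sketch follows. It uses `type = "text"` so the tables can be read in the console; in a document knitted to HTML you would typically switch to `type = "html"` inside a chunk that sets results='asis'. The data (`ChickWeight`) and the model formula are stand-ins chosen for illustration, not a prescribed analysis.

```r
library(stargazer)

# stargazer expects a plain data frame, so convert the built-in data first.
chicks <- as.data.frame(ChickWeight)

# Summary statistics table for the numeric variables.
stargazer(chicks[, c("weight", "Time")],
          type = "text",
          title = "Summary statistics")

# Fit an example linear regression (illustrative model only).
fit <- lm(weight ~ Time + Diet, data = chicks)

# Regression table for the fitted model.
stargazer(fit, type = "text", title = "Regression results")
```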

 

# Inference

 

Do some hypothesis testing and build confidence intervals to support the claims you made about your data in the previous section.

Explain your results.
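
One possible shape for this section, using base R's `t.test()`, is sketched below. The data (`ChickWeight`) and the choice of comparing diets 1 and 4 are assumptions made for the example; your own hypotheses will depend on your research question.

```r
# Use the built-in ChickWeight data as a plain data frame.
chicks <- as.data.frame(ChickWeight)

# Keep only the measurements from day 21, the last day of the experiment.
final_day <- chicks[chicks$Time == 21, ]

# One-sample t test, used here to get a 95% confidence interval for the mean.
ci_test <- t.test(final_day$weight, conf.level = 0.95)

# Show the lower and upper bounds of the confidence interval.
ci_test$conf.int

# Keep two diets so that the grouping factor has exactly two levels.
two_diets <- droplevels(final_day[final_day$Diet %in% c("1", "4"), ])

# Welch two-sample t test: is the mean final weight the same for both diets?
diet_test <- t.test(weight ~ Diet, data = two_diets)

# Show the p-value of the test.
diet_test$p.value
```

When you explain the results, state the null hypothesis, the confidence level, and what the interval or p-value implies for your research question.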

 

 

 
