
AmoghKatwe/Exploratory-data-analysis-using-R


Exploratory-data-analysis-using-R

Analyzed the distribution of Indian census data using R, applied the central limit theorem and compared different sampling methods.

Abstract

This project takes a real-life dataset, prepares and pre-processes it, and then analyzes it with several methods and plots the results.

Introduction

A population census is the total process of collecting, compiling, analyzing and disseminating demographic, economic and social data pertaining, at a specific time, to all persons in a country or a well-defined part of a country. As such, the census provides a snapshot of the country’s population and housing at a given point in time.

The Census of India is a rich database that can tell the stories of over a billion Indians. This dataset has been extracted from the 2001 Census and includes data for 590 districts and 34 States, with around 80 variables each.

Source

The dataset is taken from Kaggle: https://www.kaggle.com/bazuka/census2001

Analysis for Categorical Data:

Categorical data is qualitative data associated with a property or a quality. The frequency of each category is usually shown with a bar plot or a pie chart. Here we examine the Religions.
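As a rough illustration of the categorical analysis described above, the sketch below tabulates a categorical vector and draws a bar plot and a pie chart. The `religion` vector is toy data invented for the example; the actual column names and values in the Kaggle file may differ.

```r
# Toy data (hypothetical): one majority-religion label per district.
religion <- c("Hindu", "Muslim", "Hindu", "Christian", "Sikh", "Hindu", "Muslim")

# Frequency of each category
religion_counts <- table(religion)

# Bar plot and pie chart of the same frequencies
barplot(religion_counts,
        main = "Districts by religion (toy data)",
        xlab = "Religion", ylab = "Number of districts")
pie(religion_counts, main = "Share of districts by religion (toy data)")
```

With the real dataset, `religion` would be replaced by the relevant column of the data frame read from the CSV file.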

Analyzing Numeric data:

Numeric data is quantitative data associated with a numeric measurement. It is usually represented graphically with a histogram, bar plot or dot chart. Here we examine the number of Males in each State.
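The numeric analysis above could be sketched as follows. The data frame and its column names `State` and `Males` are assumptions made for the example, not the names used in the Kaggle file.

```r
# Toy data (hypothetical): male population per district, with the parent state.
census <- data.frame(
  State = c("A", "A", "B", "B", "C"),
  Males = c(1200, 800, 1500, 500, 900)
)

# Total number of males per state
males_by_state <- tapply(census$Males, census$State, sum)

# Three common views of numeric data
barplot(males_by_state, main = "Males per state (toy data)",
        xlab = "State", ylab = "Males")
hist(census$Males, main = "District male counts (toy data)", xlab = "Males")
dotchart(as.numeric(males_by_state), labels = names(males_by_state),
         main = "Males per state (toy data)", xlab = "Males")
```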

Applying the Central Limit Theorem:

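The central limit theorem states that, for a sufficiently large sample size, the distribution of the sample means drawn from a population is approximately normal, regardless of the shape of the population distribution, provided the population has finite variance.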
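A small simulation can make the theorem concrete. The sketch below uses a simulated right-skewed population in place of a census column; the distribution, sample size and number of repetitions are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Simulated right-skewed "population" standing in for a census variable
population <- rexp(10000, rate = 1 / 50000)

# Draw many samples of size n and record the mean of each
n <- 50
sample_means <- replicate(2000, mean(sample(population, n)))

# The population is skewed, while the sample means look roughly bell-shaped
par(mfrow = c(1, 2))
hist(population, main = "Population", xlab = "Value")
hist(sample_means, main = "Sample means (n = 50)", xlab = "Mean")
```

Increasing `n` should make the histogram of sample means narrower and closer to a normal curve.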

Sampling Methods:

1. Simple Random Sampling

In simple random sampling, every item in the frame has the same chance of being selected for the sample as every other item.
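A minimal sketch of simple random sampling with base R's `sample()`, assuming a frame of 590 rows (one per district); the `id` column and the sample size of 20 are made up for the example.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical sampling frame: one row per district
frame <- data.frame(id = 1:590)

# Pick 20 row indices without replacement, each row equally likely
srs <- frame[sample(nrow(frame), size = 20), , drop = FALSE]
```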

2. Systematic Sampling

In systematic sampling, sample members are selected from a larger population at a fixed interval (every k-th item), starting from a randomly chosen point.
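One common way to implement systematic sampling is to pick a random start within the first interval and then step through the frame at a fixed interval. The sketch below assumes a frame of 590 items and a sample of 20; both numbers are illustrative.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

N <- 590               # frame size (assumed)
n <- 20                # desired sample size (assumed)
k <- floor(N / n)      # sampling interval

# Random starting point within the first interval, then every k-th item
start <- sample(k, 1)
idx <- seq(from = start, by = k, length.out = n)
```

`idx` holds the row indices to select from the frame.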

3. Stratified Sampling

In stratified sampling, the items from the frame are subdivided into separate subgroups called strata. Simple random samples are selected from each stratum and combined for the desired sample of size n.
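As a sketch of stratified sampling, the example below splits a toy frame by a hypothetical `State` column, draws a simple random sample of about 10% from each stratum, and combines the results. The 10% fraction is an assumption chosen for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Toy frame (hypothetical): 60 districts in three states of unequal size
frame <- data.frame(
  id = 1:60,
  State = rep(c("A", "B", "C"), times = c(30, 20, 10))
)

# Split into strata, then take a simple random sample from each stratum
strata <- split(frame, frame$State)
picked <- lapply(strata, function(s) {
  s[sample(nrow(s), size = round(nrow(s) / 10)), , drop = FALSE]
})

# Combine the per-stratum samples into one sample
stratified <- do.call(rbind, picked)
```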

4. Cluster Sampling

In cluster sampling, the data is divided into groups called clusters. Each cluster should be representative of the entire data. A random sample of these clusters is then selected, and all items in the selected clusters are analyzed.
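Cluster sampling can be sketched by treating each state as a cluster, choosing some clusters at random, and keeping every row in the chosen clusters. The toy frame and the `State` column below are assumptions made for the example.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Toy frame (hypothetical): 6 states (clusters) with 10 districts each
frame <- data.frame(
  id = 1:60,
  State = rep(paste0("S", 1:6), each = 10)
)

# Randomly choose 2 whole clusters
chosen <- sample(unique(frame$State), size = 2)

# Keep every district that belongs to a chosen cluster
cluster_sample <- frame[frame$State %in% chosen, ]
```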

5. Confidence Level

The confidence level is the long-run proportion of confidence intervals, built by the same procedure from repeated samples, that contain the true population mean.
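For illustration, a t-based confidence interval for a mean can be computed by hand and cross-checked against `t.test()`. The sample `x` is simulated, and the 95% level is an assumed choice.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Simulated sample standing in for values drawn from the census data
x <- rnorm(30, mean = 100, sd = 15)

conf_level <- 0.95
n <- length(x)
se <- sd(x) / sqrt(n)                             # standard error of the mean
t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)

# Interval: sample mean plus/minus the margin of error
ci <- mean(x) + c(-1, 1) * t_crit * se

# The same interval from the built-in t-test
ci_builtin <- as.numeric(t.test(x, conf.level = conf_level)$conf.int)
```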
