Data Analytics Assignment: Data Processing Of Life Expectancy Data From WHO

Question

Task:
Data Analytics Assignment Overview
A data analytics project starts with collecting data and ends with communicating the results drawn from it. Several steps lie in between, and data preprocessing is one of the most important. Preprocessing itself consists of multiple steps that depend on the nature, type, and values of the data.

Data visualisation, on the other hand, uses visual representations such as charts, graphs, and illustrations to explore, make sense of, and communicate data. Many large companies are now moving towards visualisation.

For this assignment, students are required to write a 1,500-word report on a specific case study and explain the use and applications of data preprocessing and data visualisation techniques on a selected data set. Students can choose any suitable data set that is publicly available on the internet.

Students are required to select a data set and answer the following questions:

  • What is the purpose of the data set, and what kind of insights can be extracted from the chosen data set?
  • Have you applied any data cleaning approaches (e.g., missing value handling, noisy data handling) for the chosen data set? Explain in your own words what data cleaning approaches you have performed or why none were required.
  • Have you applied any data transformation techniques (normalisation, attribute creation, discretisation etc.) for the chosen data set? What data transformation techniques have you performed, or why was no transformation required? Explain in your own words.
  • Have you applied any data reduction techniques (reduce dimension, reduce volume, balance data)? If yes, describe the data reduction technique(s) you have followed; otherwise, explain why no reduction techniques were required.
  • Design an interactive dashboard using 3-4 charts/graphs/illustrations to represent the data.

Answer

Introduction
Data mining, the subject of this assignment, is one of the leading domains of Information Technology. With technological advancement, almost anything can be converted into information. Collected data can be stored as tables with relevant attributes and values. These large tables are called datasets, and they are analysed in data mining to extract crucial information. This assignment is based on data mining, which takes in multiple datasets and uses them to predict useful information. It focuses primarily on data preprocessing, which comprises data cleaning, data transformation, and data reduction, the three most important elements of preprocessing. The assignment applies these techniques in Microsoft Excel to a practical dataset obtained from Kaggle.

Overview of the Data
The dataset lists countries and their life expectancy and was generated and prepared by the WHO. It is well suited to learning and practising data cleaning, transformation, and reduction techniques. The biggest challenge here is carrying out the preprocessing in MS Excel. The dataset contains 22 columns and 2938 rows of information. The main columns are:

Country: Name of the country to which the rest of the data refers.

Year: The year in which the rest of the data was recorded.

Status: Development status of the country (developed or developing).

Life Expectancy: Life expectancy in years.

Adult Mortality: The number of adults dying per 1,000 population in a year.

Infant Deaths: The number of infants dying per 1,000 population in a year.

Alcohol: Alcohol consumption per capita among people aged 15 and over (litres).

Percentage Expenditure: Expenditure on health as a percentage of GDP.

BMI: Average Body Mass Index of the population.

Under 5 Deaths: Number of deaths of children below 5 years per 1,000 population.

Total Expenditure: Government expenditure on health as a percentage of total government expenditure.

GDP: Gross Domestic Product of the country in that particular year.

Population: Population of the country in a particular year.

Thinness (1-19 years): Prevalence of thinness (%) among children and adolescents in a particular year.

Thinness (5-9 years): Prevalence of thinness (%) among children aged 5 to 9 in a particular year.

Income Composition of Resources: Human Development Index in terms of income composition of resources (index from 0 to 1).

Schooling: Average number of years of schooling.

[Figure 1]

Data Preprocessing

Data Cleaning
Data cleaning is the process of removing unwanted or unnecessary content from a dataset. It is done primarily to remove blank rows, blank cells, and duplicate values. Data cleaning also makes the dataset more presentable and less ambiguous. It lets the analyst tie up the loose ends that would otherwise increase the effort needed for further, deeper analysis (Kathuria, Gupta and Singla, 2021).

Data cleaning makes the dataset more readable and easier to understand, and therefore easier to interpret. It brings consistency to the data, yields more accurate results, and allows the analyst to make better decisions. There is no fixed set of tasks that must be carried out for data cleaning. The scope of cleaning is determined while working on the dataset, by observing and understanding it.

In this dataset, cleaning begins with auto-fitting the column widths, which makes the data more presentable. Select all the columns, click the ‘Format’ icon in the Cells section of the Home tab, and select ‘AutoFit Column Width’. The next step is to change the data type of the ‘Percentage Expenditure’ column to Percentage by clicking the (%) icon in the Number section of the Home tab. The same section has icons for increasing and decreasing the number of decimal places shown. The GDP column should be formatted with a dollar sign and reduced to three decimal places. The Population column should use the comma (,) as a thousands separator. The dataset also contains numerous blank cells, and every row that contains one has to be deleted. Click ‘Find & Select’ in the Editing section of the Home tab, select ‘Go To Special’, and then select ‘Blanks’; this selects all the blank cells. Then click the arrow under ‘Delete’ in the Cells section of the Home tab and click ‘Delete Sheet Rows’ (Woo, Kim and Lee, 2020).
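The blank-row deletion described above can also be sketched outside Excel. The following is a minimal Python illustration, not the method used in this report: the helper name `drop_rows_with_blanks` is hypothetical, and the sample rows are made-up values that only mimic the shape of the dataset.

```python
import csv
import io

def drop_rows_with_blanks(rows):
    """Keep only the rows in which every cell holds a non-empty value."""
    return [row for row in rows if all((v or "").strip() for v in row.values())]

# Made-up sample rows (illustrative only, not taken from the WHO dataset).
SAMPLE_CSV = """Country,Year,Life expectancy,GDP
Country A,2014,70.1,1200.5
Country A,2015,,1250.0
Country B,2014,64.9,980.25
Country B,2015,65.3,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = drop_rows_with_blanks(rows)
print(len(rows), "rows before,", len(cleaned), "rows after")
```

As with ‘Delete Sheet Rows’, a single blank cell removes the whole row, so this approach can discard a large share of the data when blanks are spread across many rows.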

[Figure 2: Dataset before data cleaning]

[Figure 3: Dataset after data cleaning]

Data Transformation
Data transformation is one of the prominent steps in data preprocessing. The idea is to modify the worksheet by changing certain elements, for example to reduce the magnitude of values or to bring outliers into range. Techniques for data transformation include normalization, discretisation, and attribute creation. What they have in common is that they change the format or the numerical values of the data. In essence, it is an ETL (Extract, Transform, Load) process, which can be carried out by loading an Excel workbook through Get Data. In this case, similar transformations can be done through Power Query (Keskar et al., 2021).

Normalization
The objective of normalization is to reduce the actual magnitude of the data. In other words, it rescales the data so that it falls within a smaller range.

Here we normalize the Population column because its values are very large. This can be done with the Excel function =STANDARDIZE(Population cell, Mean, Standard Deviation).
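For readers who want to check the arithmetic, STANDARDIZE computes a z-score, (x - mean) / standard deviation, so the transformed column is centred on 0 with a standard deviation of 1. The sketch below reproduces that calculation in Python. It assumes the population standard deviation is the one supplied, and the population figures are made-up illustrative values.

```python
from statistics import mean, pstdev

def standardize(x, mu, sigma):
    """Z-score of x: the same arithmetic as Excel's STANDARDIZE(x, mean, standard_dev)."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

# Made-up population figures (illustrative only).
populations = [2_000_000, 4_000_000, 6_000_000, 8_000_000]

mu = mean(populations)
sigma = pstdev(populations)  # assumption: population SD, as Excel's STDEV.P would give
z_scores = [standardize(p, mu, sigma) for p in populations]

for p, z in zip(populations, z_scores):
    print(f"{p:>10,d} -> {z:+.4f}")
```

If the sample standard deviation is preferred, `statistics.stdev` can be substituted for `pstdev`; the z-scores then shrink slightly.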

[Figure 4: Normalized Population column]

Manipulation
In the following table, a pivot table has been used to condense the data for a narrower statistical analysis. The pivot table has been created using the columns Country, Life Expectancy, Adult Mortality, Population, and BMI. The records are grouped by country, so there is no year-wise distribution. Every other column, on the other hand, is shown as an average (Rayat, 2018).

Average of each column by country (pivot table output):

Country                   Life Expectancy  Adult Mortality      Population     BMI
----------------------------------------------------------------------------------
Afghanistan                        58.194          269.063     9972259.813  15.519
Albania                            75.156           45.063      696911.625  49.069
Algeria                            74.209          102.818    24124739.273  48.873
Angola                             50.675          362.750    10107848.375  18.450
Argentina                          75.238          100.385    20847453.538  54.485
Armenia                            73.307          117.333     1063395.933  44.027
Australia                          81.907           62.429     3541690.500  54.929
Austria                            81.480           65.800     6330993.933  47.667
Azerbaijan                         71.146          119.846     1906076.077  43.408
Bangladesh                         69.967          135.667    45988342.500  14.442
Belarus                            69.747          220.267     6164016.867  54.240
Belgium                            80.653           69.933     2324699.000  50.040
Belize                             69.153          154.200      157799.867  39.793
Benin                              57.708          269.308     4143771.308  19.300
Bhutan                             65.920          231.533      472931.533  17.120
Bosnia and Herzegovina             76.182           63.545     1813739.455  48.509
Botswana                           55.407          460.933     1119512.067  31.867
Brazil                             73.273          151.267    93830194.800  46.460
Bulgaria                           72.740          124.733     5165119.467  53.753
Burkina Faso                       57.233          224.000     7680183.333  16.644
Burundi                            56.027          257.455     3842748.455  15.791
Cabo Verde                         72.623          110.615      281645.846  24.238
Cambodia                           66.433          154.778     6493229.556  16.489
Cameroon                           54.860          305.500     8580129.900  25.780
Canada                             82.233           66.750    14844926.833  54.267
Central African Republic           51.417          444.833     3072260.333  14.933
Chad                               52.286          322.143     7677454.571  17.500
Chile                              79.944           78.333    15251788.889  54.578
China                              74.140           73.000      334124.733  21.067
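
The grouping that the pivot table performs (one row per country, each numeric column averaged over the years) can be sketched in a few lines of Python. This is only an illustration of the idea, not the Excel procedure itself: the function name `average_by_country` is hypothetical and the records are made-up values.

```python
from collections import defaultdict
from statistics import mean

def average_by_country(records, columns):
    """Group records by country and average each requested column."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["Country"]].append(rec)
    return {
        country: {col: round(mean(r[col] for r in recs), 3) for col in columns}
        for country, recs in grouped.items()
    }

# Made-up yearly records (illustrative only).
records = [
    {"Country": "Country A", "Year": 2014, "Life expectancy": 60.0, "BMI": 20.0},
    {"Country": "Country A", "Year": 2015, "Life expectancy": 62.0, "BMI": 22.0},
    {"Country": "Country B", "Year": 2015, "Life expectancy": 80.0, "BMI": 50.0},
]

summary = average_by_country(records, ["Life expectancy", "BMI"])
for country, averages in summary.items():
    print(country, averages)
```

Averaging over years hides any trend within a country, which is the trade-off the pivot table makes in exchange for a shorter table.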


Data Reduction
The most common ways of reducing data during preprocessing are removing some portion of the data or grouping different data together. The important aspects of grouping are the dimension, attributes, and domain of the data. The primary objective of data reduction is to establish a relationship between different variables, whether they are dependent or independent. It also allows you to minimize the size of the worksheet so that the sheer volume of data does not slow it down. The idea of reducing data arose from concerns about storage. There are many ways to reduce data, such as compression and data deduplication (Zanna, et al.).

In the given dataset, certain attributes can be removed, since the primary objective of the dataset is to show the life expectancy of countries. This can be shown either country-wise or year-wise. The other columns will therefore be removed, and life expectancy will be presented first year-wise and then country-wise. This is achieved by selecting only the columns needed for the reduction: Country, Year, and Life Expectancy.
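
The column selection described above is a form of dimensionality reduction: every record keeps only the three attributes of interest. A minimal Python sketch of the idea follows. The column spellings and the sample record are assumptions for illustration, and `reduce_columns` is a hypothetical helper.

```python
# Assumed column names; the spelling in the actual worksheet may differ.
KEEP = ("Country", "Year", "Life expectancy")

def reduce_columns(records, keep=KEEP):
    """Return copies of the records that contain only the columns in `keep`."""
    return [{col: rec[col] for col in keep} for rec in records]

# Made-up record (illustrative only).
records = [
    {"Country": "Country A", "Year": 2015, "Life expectancy": 70.4,
     "GDP": 1250.0, "Population": 2_000_000, "BMI": 22.0},
]

reduced = reduce_columns(records)
print(reduced[0])
```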

Dashboard Design
The dashboard contains the important visualizations that result from the changes made in the data cleaning, transformation, and reduction processes. It also brings out important insights (Tiew, Lim and Sivagnanasithiyar, 2020).

Data Reduction (Year-wise Life Expectancy)
Countries are placed on the columns and years on the rows.

[Figure 5: Year-wise life expectancy]

Data Reduction (Country-wise Life Expectancy)
Countries are placed on the rows and years on the columns.

[Figure 6: Country-wise life expectancy]
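
The two layouts differ only in which field supplies the rows and which supplies the columns. The sketch below shows this with a small, hypothetical `pivot` helper in Python; the records are made-up values, and the code illustrates the layout rather than reproducing the Excel pivot tables.

```python
def pivot(records, row_key, col_key, value_key):
    """Build a nested dict: table[row][column] = value."""
    table = {}
    for rec in records:
        table.setdefault(rec[row_key], {})[rec[col_key]] = rec[value_key]
    return table

# Made-up records (illustrative only).
records = [
    {"Country": "Country A", "Year": 2014, "Life expectancy": 70.1},
    {"Country": "Country A", "Year": 2015, "Life expectancy": 70.4},
    {"Country": "Country B", "Year": 2014, "Life expectancy": 64.9},
    {"Country": "Country B", "Year": 2015, "Life expectancy": 65.3},
]

# Year-wise layout: years on the rows, countries on the columns.
by_year = pivot(records, "Year", "Country", "Life expectancy")
# Country-wise layout: countries on the rows, years on the columns.
by_country = pivot(records, "Country", "Year", "Life expectancy")

print(by_year[2014])
print(by_country["Country A"])
```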

Conclusion
This paper was based on the preprocessing of data, with a focus on real-world problems and challenges. The content and techniques of preprocessing must be well understood before analysis begins. Analysts and data scientists are heavily involved in data mining and analysis to standardize important information. The primary steps towards mining and analysis are data cleaning, transformation, and reduction. Each of these tasks has numerous techniques, and each is carried out for a particular purpose. The paper has used a dataset to demonstrate these methods in MS Excel. The objective of this assignment was to apply theoretical knowledge in practice to real-world data. Data preprocessing is extremely important because it reduces the burden on analysts to a great extent. The paper has applied the techniques successfully and appropriately.

References
Kathuria, A., Gupta, A. and Singla, R.K., 2021. A Review of Tools and Techniques for Preprocessing of Textual Data. Computational Methods and Data Engineering, pp.407-422. https://link.springer.com/chapter/10.1007/978-981-15-6876-3_31

Woo, H., Kim, J. and Lee, W., 2020. Validation of Text Data Preprocessing Using a Neural Network Model. Mathematical Problems in Engineering, 2020. https://www.hindawi.com/journals/mpe/2020/1958149/

Keskar, V., Abdufattokhov, S., Phasinam, K., Wenda, A., Jagtap, S.T. and Ventayen, R.J.M., 2021. Big Data Preprocessing Frameworks: Tools and Techniques. Design Engineering, pp.1738-1746. http://www.thedesignengineering.com/index.php/DE/article/view/1729

Rayat, C.S., 2018. Applications of Microsoft Excel in Statistical Methods. In Statistical Methods in Medical Research (pp. 139-146). Springer, Singapore. https://link.springer.com/chapter/10.1007/978-981-13-0827-7_15

Zanna, B., Ibrahim, U.A., Goni, A.A. and Sanda, I.G., Challenges of Microsoft Excel in statistical analysis on Education: A case study of Federal College of Freshwater Fisheries Technology, Baga, Borno State, Nigeria. https://www.allmultidisciplinaryjournal.com/archivesarticle/2021.v2.i4.338.pdf

Tiew, S., Lim, C. and Sivagnanasithiyar, T., 2020. Using an Excel spreadsheet to convert Snellen visual acuity to LogMAR visual acuity. Eye, 34(11), pp.2148-2149. https://www.nature.com/articles/s41433-020-0783-6
