Analysis of data refers to examining the data in the light of the hypotheses or research problems and the prevailing theories, and drawing conclusions that bear on theory. In data analysis, we may combine a number of questions to create a new variable and analyze the interdependence among questions, variables and goals. For example: how do new students select universities for admission?
We have to examine the following:
a) How institutions / colleges are perceived by the students.
b) Students may be asked how they rate a particular college and why they prefer one college over others; the reasons may point to specific attributes.
c) Students may be asked to evaluate the importance of each of these attributes. Such a study involves many variables or attributes, and many of them may be redundant.
The procedures involved in the analysis of data include:
i) Classification or Editing of Data
ii) Coding
iii) Tabulation of responses
iv) Statistical analysis of data
v) Inferences about causal relations among variables
EDITING: FIELD EDIT, OFFICE EDIT & PREVENTING ERRORS
Field Edit
A preliminary or field edit is a quick examination of completed data collection forms on the same day they are filled out.
In the lifestyle research study, even a cursory examination of completed questionnaires immediately after the researcher received them from the interviewers would have revealed the interviewer who was checking both male and female on some questionnaires. The errant interviewer could then have been re-instructed about the quota sampling requirements and asked to repeat the defective interviews using correct procedures. Even if repeating the interviews was not feasible, the mistake could have been prevented in later interviews, reducing the number of wasted interviews.
A field edit serves two objectives:
- It ensures that proper procedures are followed in selecting respondents, interviewing them and recording their responses.
- It remedies fieldwork deficiencies before they turn into major problems.
Indeed, in central-location, computer-based telephone or Internet interviewing, some field editing can and should be done by a supervisor as the interviews are taking place.
Typical problems a field edit can reveal include:
- inappropriate respondents,
- incomplete interviews,
- illegible or unclear responses.
Office Edit
An office edit:
- verifies the consistency and accuracy of responses,
- makes necessary corrections and
- determines whether some or all parts of a data collection form should be discarded.
Case 1: A respondent said he was 18 years old but indicated that he had a doctoral degree when asked for his highest level of education.
Case 2: On a questionnaire containing a mixture of positive and negative Likert scale items, a respondent “strongly agreed” with all of them.
Case 3: In response to the question “What is the most expensive purchase you have made in the last month?” three respondents gave the following answers: respondent 1, “a new motor car”; respondent 2, “a vacation in New York”; respondent 3, “water, gas and electricity for my house.”
In case 1, the responses to the age and education questions appear to be inconsistent. Case 2 involves a set of responses that are too consistent. Since the questionnaire contained a mixture of positive and negative items, a respondent who agreed with all such items was obviously being frivolous or inattentive and providing invalid answers. In this case, the office editor may have no alternative but to throw the entire questionnaire out.
Case 3 depicts a different type of editing problem, namely consistency or comparability of responses to the same question across questionnaires. Though the answers given by all three respondents are legitimate, they appear to be based on different frames of reference. The major issue facing the office editor is determining how these diverse responses should be coded. One way to improve the question would be to reframe it, since it is not specific as to the type of purchase (necessity versus luxury). The specific information objectives of the study play a critical role in coding responses. In fact, before beginning the editing process, the researcher should establish a detailed set of guidelines, preferably in writing, for interpreting and categorizing open-ended responses.
This sample of editing problems is certainly not exhaustive. A few final points about editing should be discussed.
- A number of potential editing problems can be avoided through careful planning before fieldwork begins. Preventing ambiguous, inappropriate or incorrect responses is better than attempting to correct data errors after they occur. Editing is not a panacea for all data quality issues and it is a serious flaw to view it as such.
- When the collected data are already in the computer, as in Internet or computer-assisted interviews, editing can be done thoroughly and efficiently. Editing tasks that are difficult or impossible to complete manually, especially in large-scale surveys, can be done easily through computer editing. For instance, a computer can be programmed to check whether response values are within pre-specified ranges, whether responses to key questions are consistent with those to related questions, or whether a respondent's pattern of answers deviates substantially from the average pattern (see the sketch after this list). Problem responses and respondents can thus be brought to the office editor's notice quickly.
- The role editing can play in improving data quality is much more confined in mail surveys than in personal interview or telephone surveys. A mail survey researcher has little control over data collection once the questionnaires are mailed out. Therefore, the only editing possible in mail surveys is a limited office edit.
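To make this concrete, here is a minimal sketch of such automated edit checks, assuming the responses have been loaded into a pandas DataFrame. The column names, valid ranges and sample data are hypothetical illustrations, not part of any particular package.

```python
# A minimal sketch of computer-assisted editing checks; all column
# names, valid ranges and data below are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "resp_id":   [1, 2, 3],
    "age":       [18, 34, 150],          # 150 is out of range
    "education": ["doctorate", "high school", "bachelor"],
    "q1":        [5, 3, 5],              # Likert items coded 1-5
    "q2":        [5, 2, 1],
})

# Range check: flag response values outside pre-specified limits.
bad_age = df[(df["age"] < 15) | (df["age"] > 99)]

# Consistency check: a doctorate at age 18 is suspect (Case 1 above).
inconsistent = df[(df["age"] < 21) & (df["education"] == "doctorate")]

# Pattern check: respondents whose answers never vary across Likert
# items (Case 2 above) may be answering frivolously.
straight_liners = df[df[["q1", "q2"]].nunique(axis=1) == 1]

for name, flagged in [("range", bad_age), ("consistency", inconsistent),
                      ("pattern", straight_liners)]:
    print(f"{name} check flagged respondents:", flagged["resp_id"].tolist())
```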
The process of editing is not limited to evaluating data collected through questionnaires. Editing can also check the quality of data collected through observation.
In most survey research projects, the process of editing, especially the office edit, goes hand in hand with the process of coding.
i) Classification of Data: Most studies involve a large number of responses of different kinds, verbal or non-verbal, to the questions asked of the sample. These responses must be grouped into a limited number of categories. For example, the responses may be grouped into ‘Yes’, ‘No’, ‘Do not know’ and ‘Did not reply’ categories, or may be categorized as ‘High’, ‘Middle’ and ‘Low’. In order to determine the categories, the researcher must choose some appropriate basis of classification, as the sketch below illustrates.
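For illustration, a classification along these lines might be sketched in Python with pandas; the response values and the score bands used for the High/Middle/Low split are hypothetical.

```python
# A small sketch of classifying raw responses into a limited set of
# categories; the data and score bands are hypothetical.
import pandas as pd

answers = pd.Series(["yes", "no", "don't know", "yes", None])
# Verbal responses collapse into fixed categories; an unanswered
# item becomes "did not reply".
categories = answers.fillna("did not reply")

scores = pd.Series([12, 55, 78, 91, 40])
# A numeric variable can be banded into Low / Middle / High.
bands = pd.cut(scores, bins=[0, 33, 66, 100],
               labels=["Low", "Middle", "High"])

print(categories.tolist())
print(bands.tolist())
```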
ii) CODING: Coding broadly refers to the set of all tasks associated with transforming edited responses into a form that is ready for analysis. Emphasis here will be on questionnaires used in conclusive research projects, which invariably rely on large sample sizes and computer data analyses. Exploratory research projects are characterized by fairly informal data collection and analysis procedures; hence a formal coding process is typically not necessary in such projects. Coding involves three steps:
1) Transforming responses to each question into a set of meaningful categories,
2) Assigning numerical codes to the categories, and
3) Creating a data set suitable for computer analysis.
Transforming Responses into Useful Categories
How difficult and time consuming this step is depends on the degree to which the questionnaire is structured. A structured question is pre-categorized; that is, it has a set of fixed response categories. Responses to a non-structured or open-ended question have to be grouped into a meaningful and manageable set of categories, a task that can be laborious if respondents' answers vary widely.
A special problem in coding responses to open-ended as well as structured questions relates to the treatment of “don’t know” responses. A “don’t know” might be a legitimate response; that is, the respondent could not honestly answer the question. Or it might represent an interviewing failure; that is, the respondent had an answer but for some reason did not divulge it. An editor or coder must ascertain which of these two interpretations of “don’t know” is correct. However, this task is not as simple as it appears, except in certain cases. For example, a “don’t know” answer to the query “Do you have any credit card?” is most probably an interviewing failure. But a “don’t know” answer to the question “Do you favor or oppose spending public funds to support certified abortion clinics?” may or may not be an interviewing failure.
There are no simple rules for treating “don’t know” responses. One approach is to infer a real response, that is, to make an educated guess about what the answer might have been on the basis of the answers to other questions. For example, a respondent’s likely income bracket might be subjectively estimated from his or her age, education level and occupation. However, this approach is fraught with questionable assumptions and hence is of dubious validity. A safer and more defensible approach is simply to treat “don’t know” as a separate response category for each question. If legitimate “don’t knows” can be distinguished from those that are interviewing failures, the researcher should report the latter separately as missing values.
A missing-value category codes questions for which answers should have been obtained but for some reason were not. A missing value can stem from a respondent’s refusal to answer a question, an interviewer’s failure to ask a question or record an answer, or a “don’t know” that does not seem legitimate. Sound questionnaire design, tight control over fieldwork and a thorough field edit can help reduce, but not necessarily eliminate, the occurrence of missing values. Questions affected by a large number of missing values, however, invariably indicate a poorly designed questionnaire and/or shoddy fieldwork. In such a case, researchers should be vigilant during subsequent analysis and interpretation of the data.
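A brief sketch of this approach, assuming pandas is used for coding; the question, the numerical codes and the data are hypothetical.

```python
# Keep "don't know" as its own category, but report a blank (an
# interviewing failure) as a missing value. Codes are hypothetical.
import pandas as pd

raw = pd.Series(["favor", "oppose", "don't know", None, "don't know"])

# A legitimate "don't know" gets its own code (3); a blank response
# falls outside the mapping and becomes a missing value (NaN).
coded = raw.map({"favor": 1, "oppose": 2, "don't know": 3})

# dropna=False reports the missing values as a separate row.
print(coded.value_counts(dropna=False))
```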
ASSIGNING NUMERICAL CODES
The next step is to assign appropriate numerical codes to responses that are not already in quantified form. The purpose of numerical coding is to facilitate computer manipulation and analysis of the responses. The researcher must keep the measurement levels of the variables in mind while analyzing and interpreting the quantified responses. Two things have to be kept in mind (see the sketch after this list):
- Each survey question has just one corresponding variable, as each question in the survey had one, and only one, possible response.
- The entries under the variable name column are the symbols used in the survey to identify the respective variables during computer analysis.
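The sketch below illustrates this step with a simple codebook in Python; the variable names and code assignments are hypothetical examples.

```python
# Assigning numerical codes to verbal responses via a codebook;
# the variables and codes below are hypothetical.
import pandas as pd

codebook = {
    "gender":  {"male": 1, "female": 2},
    "opinion": {"strongly disagree": 1, "disagree": 2, "neutral": 3,
                "agree": 4, "strongly agree": 5},
}

df = pd.DataFrame({
    "gender":  ["male", "female", "female"],
    "opinion": ["agree", "neutral", "strongly agree"],
})

# Replace each verbal response with its numerical code, variable by variable.
for var, mapping in codebook.items():
    df[var] = df[var].map(mapping)

print(df)
```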
PRELIMINARY DATA ANALYSIS USING BASIC DESCRIPTIVE STATISTICS
Before analyzing a data set using statistical techniques, a researcher should identify what the data are like. The aim of preliminary data analysis is to identify features of the basic composition of the data collected. It can also provide useful insights into the research objectives and suggest meaningful approaches for further analysis of the data.
Preliminary data analysis examines the central tendency and the dispersion of the data on each variable in the data set. The measurement level of a variable has a bearing on which measures of central tendency and dispersion are appropriate for it.
Measures of Central Tendency
The common measures of central tendency include the mode, the median and the mean.
Mode: The mode is the most frequently occurring value for a variable in a data set. It is an appropriate measure for data that are grouped into categories.
Median: The median is the value that divides an ordered set of responses into two equal halves. It is an appropriate measure for variables measured at the ordinal level or higher.
Table: Measures of Central Tendency and Dispersion for Different Types of Variables [table not reproduced].
Mean: The mean is the simple average of the various responses pertaining to a variable. It is by far the most widely used and easiest measure to work with. It is computed by summing all the values and dividing by the number of valid cases. For computing the mean, the variable must have at least interval measurement properties. The mean uses all the data pertaining to a variable, and therefore information is not lost as it is in computing the median or the mode. However, a few extreme responses or “outliers,” if present in a data set, can dominate the mean and result in a distorted picture of central tendency, as the short example below illustrates.
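A short example using Python's standard statistics module computes all three measures on the same hypothetical data and shows how a single outlier pulls the mean away from the mode and median.

```python
# Mode, median and mean on hypothetical survey responses.
import statistics

responses = [3, 4, 4, 5, 2, 4, 45]   # note the outlier, 45

print("mode:",   statistics.mode(responses))     # 4  - most frequent value
print("median:", statistics.median(responses))   # 4  - middle of ordered data
print("mean:",   statistics.mean(responses))     # ~9.57 - dragged up by 45
```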
Measures of Dispersion
Measures of dispersion describe how the data are clustered around the central value. Along with measures of central tendency, measures of dispersion provide a richer description of the data. The most commonly used measures of dispersion are the range and the standard deviation. These measures are appropriate only if the level of measurement is interval or ratio.
Variance and Standard Deviation: The variance of a set of data is a measure of the deviation of the data around the arithmetic mean. We calculate it as the average of the squared deviations around the mean. The standard deviation is the square root of the variance and is the most popular measure of variability. The standard deviation is easy to interpret because it is expressed in the same units as the mean.
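A small worked example, using hypothetical data and Python's statistics module, makes the definition concrete.

```python
# Variance as the average squared deviation from the mean, and the
# standard deviation as its square root; data are hypothetical.
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(data)                           # 5.0
var = sum((x - mean) ** 2 for x in data) / len(data)   # population variance
print(var, var ** 0.5)                                 # 4.0 and 2.0

# statistics.pvariance / pstdev give the same population figures:
print(statistics.pvariance(data), statistics.pstdev(data))
```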
Like the arithmetic mean, the standard deviation is also influenced by extreme values and should not be used if the distribution of responses to a question is highly skewed. Marketing researchers rely on the standard deviation most often in descriptive and inferential statistics. A simple way to uncover the central tendency and/or dispersion of data for virtually any variable is to construct a one-way table.
FREQUENCY DISTRIBUTION:
A one-way table is a table showing the distribution of data pertaining to the categories of a single variable. Virtually all computer analysis packages are capable of generating the frequency distribution for any variable in a data set. One-way tables are particularly appropriate for examining data on nominal- and ordinal-scaled variables, since these normally have only a limited set of discrete response categories.
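For example, a one-way table can be produced in one line with pandas; the variable and its codes below are hypothetical.

```python
# A one-way table (frequency distribution) for a nominal variable.
import pandas as pd

gender = pd.Series([1, 2, 2, 1, 2, 9])   # 1 = male, 2 = female (codebook)
table = gender.value_counts(dropna=False).sort_index()

print(table)                 # a count for code 9 flags a likely coding error
print(table / len(gender))   # relative frequencies
```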
In addition to revealing the general nature of the data, one-way tables offer some very specific benefits. First, they are helpful in detecting certain types of coding errors. Human errors can occur at various stages of the coding process. Thus, in general, a preliminary one-way tabulation can facilitate data cleaning by pointing out glaring coding errors, although it cannot detect less obvious errors.
Second, one-way tables can provide valuable insights through comparisons with other relevant distributions. They can be especially helpful in understanding the composition of the respondent group and in looking for evidence of non-response error. For instance, comparing the frequency distribution of respondents on key demographic variables with the corresponding distributions for the population as a whole, or for non-respondents, will indicate how representative the collected data are.
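A minimal sketch of such a comparison, assuming pandas; the age bands and population shares are hypothetical placeholders rather than real census figures.

```python
# Comparing the sample's distribution on a demographic variable with
# known population shares to look for non-response error.
import pandas as pd

sample_age_band = pd.Series(["18-34", "18-34", "35-54", "55+", "18-34"])
sample_share = sample_age_band.value_counts(normalize=True)

population_share = pd.Series({"18-34": 0.30, "35-54": 0.40, "55+": 0.30})

comparison = pd.DataFrame({"sample": sample_share,
                           "population": population_share}).fillna(0)
print(comparison)   # large gaps hint at non-response error
```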
In summary, one-way tabulations, while not as sophisticated as other data analysis techniques, can be just as insightful. Indeed, using complex techniques to analyze data without first understanding what the data look like may produce meaningless results and lead to erroneous conclusions. Even such a simple thing as computing the mean value for a variable, without having an idea of the variable’s response distribution, can be purposeless.