Friday, May 09, 2014

DATA ANALYSIS


Analysis of data refers to examining the data in the light of the hypotheses or research problems and the prevailing theories, and drawing conclusions that bear on those theories. In data analysis, we may combine a number of questions to create new variables and analyze the interdependence between questions, variables and goals, e.g., how do new students select universities for admission?


We have to examine the following:
a) How institutions/colleges are perceived by the students.
b) How students rate a particular college and why they prefer one college over others; the preference may be due to a number of attributes.
c) How important students consider each of those attributes.
A study of this kind involves a large number of variables or attributes, and many of these may be redundant.

The procedures involved in the analysis of data include:
i) Classification or Editing of Data
ii) Coding
iii) Tabulation of responses
iv) Statistical analysis of data
v) Inferences about causal relations among variables

EDITING–FIELD EDIT, OFFICE EDIT & PREVENTING ERRORS

Field Edit
A preliminary or field edit is a quick examination of completed data collection forms on the same day they are filled out.  
In the lifestyle research study, even a cursory examination of completed questionnaires immediately after the researcher received them from the interviewers would have revealed the interviewer who was checking both “male” and “female” on some questionnaires.
The errant interviewer could then have been re-instructed about the quota sampling requirements and asked to repeat the defective interviews using correct procedures. 
Even if repeating the interviews was not feasible, the mistake could have been prevented in later interviews and the number of wasted interviews kept to a minimum.

A field edit serves two objectives:
  • It ensures that proper procedures are followed in selecting respondents, interviewing them and recording their responses.
  • It remedies fieldwork deficiencies before they turn into major problems.
Speed is crucial for an effective field edit: it must be done while the study is in progress, preferably at the end of each interviewing day and especially at the end of the first day of interviewing.

Indeed, in central-location, computer-based telephone or Internet interviewing, some field editing can and should be done by a supervisor as the interviews are taking place.

Typical problems a field edit can reveal include:
  • inappropriate respondents, 
  • incomplete interviews, 
  • illegible or unclear responses.
On identifying problems of this nature, the field editor, who is usually the person in charge of supervising the fieldwork, must ask the interviewer for an explanation immediately, while the interview is still fresh in the interviewer’s memory.

Office Edit

An office edit: 
  • verifies responses’ consistency and accuracy,
  • makes necessary corrections and 
  • determines whether some or all parts of a data collection form should be discarded.
This stage of editing is conducted after all the field-edited data collection forms are available in a central location.  In computer-based telephone or Internet interviewing, there are typically no physical questionnaires and interviewers store the collected data directly in computer memory.  Final editing of such data can be done with the help of a computer.  An office edit is more thorough than a field edit, and the task of an office editor is somewhat more complex than that of a field editor. The following cases illustrate the sorts of problems an office editor may face.

Case 1:   A respondent said he was 18 years old but indicated that he had a doctoral degree when asked for his highest level of education.

Case 2:   On a questionnaire containing a mixture of positive and negative Likert scale items, a respondent “strongly agreed” with all of them.

Case 3:  In response to the question “What is the most expensive purchase you have made in the last month?” three respondents gave the following answers: respondent 1, “a new motor car”; respondent 2, “a vacation in New York”; respondent 3, “Water, gas and electricity for my house.”

In case 1, the responses to the age and education questions appear to be inconsistent. Case 2 involves a set of responses that are too consistent.  Since the questionnaire contained a mixture of positive and negative items, a respondent who agreed with all of them was obviously being frivolous or inattentive and was providing invalid answers.  In this case, the office editor may have no alternative but to throw the entire questionnaire out.

Case 3 depicts a different type of editing problem, namely consistency or comparability of responses across questionnaires to the same question.  Though the answers given by all three respondents are legitimate, they appear to be based on different frames of reference. The major issue facing the office editor is determining how these diverse responses should be coded. One way to improve upon the question would be to re-frame it, since it is not specific as to the type of purchase (necessity versus luxury). The specific information objectives of the study play a critical role in coding responses. In fact, before beginning the editing process, the researcher should establish a detailed set of guidelines, preferably in writing, for interpreting and categorizing open-ended responses.

This sample of editing problems is certainly not exhaustive.  A few final points about editing should be discussed.  
  1.  A number of potential editing problems can be avoided through careful planning before fieldwork begins. Preventing ambiguous, inappropriate or incorrect responses is better than attempting to correct data errors after they occur.  Editing is not a panacea for all data quality issues and it is a serious flaw to view it as such.
  2. When the collected data are already in a computer, as in the case of Internet or computer-assisted interviews, editing can be done thoroughly and efficiently.  Editing tasks that are difficult or impossible to complete manually, especially in large-scale surveys, can be handled easily through computer editing.  For instance, a computer can be programmed to check for such things as whether response values are within pre-specified ranges, whether responses to key questions are consistent with those to related questions, or whether a respondent’s pattern of answers deviates substantially from the average pattern; a minimal sketch of such checks follows this list. Problem responses and respondents can thus be brought to the office editor’s notice quickly.
  3. The role editing can play in improving data quality is much more confined in mail surveys than in personal interview or telephone surveys.  A mail survey researcher has little control over data collection once the questionnaires are mailed out. Therefore, the only editing possible in mail surveys is a limited office edit.
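To illustrate point 2, here is a minimal sketch, in Python with pandas, of the kind of automated checks described there; the column names, the allowed range and the consistency rule are hypothetical and would have to be tailored to the actual questionnaire.

    import pandas as pd

    # Hypothetical responses already stored electronically (e.g., from a computer-assisted or Internet survey).
    responses = pd.DataFrame({
        "resp_id":   [101, 102, 103, 104],
        "age":       [18, 34, 250, 27],              # 250 lies outside any plausible range
        "education": ["doctorate", "high school", "graduate", "secondary"],
    })

    # Range check: flag response values outside a pre-specified range.
    out_of_range = responses[~responses["age"].between(15, 99)]

    # Consistency check: a respondent under 21 reporting a doctorate is suspect
    # (compare Case 1 in the office-edit discussion above).
    inconsistent = responses[(responses["age"] < 21) &
                             (responses["education"] == "doctorate")]

    # Problem responses and respondents can then be brought to the office editor's notice.
    print(out_of_range[["resp_id", "age"]])
    print(inconsistent[["resp_id", "age", "education"]])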
The process of editing is not limited to evaluating data collected through questionnaires. Editing can also check the quality of data collected through observation.

In most survey research projects, the process of editing, especially the office edit, goes hand in hand with the process of coding.
i) Classification of Data: Most studies involve a large number of responses of different kinds, verbal or non-verbal, to questions asked of the sample. These responses must be grouped into a limited number of categories. The responses may be grouped into “Yes”, “No”, “Do not know” and “Did not reply” categories, or may be categorized as “High”, “Middle” and “Low”. In order to determine the categories, the researcher must choose some appropriate basis of classification.
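A minimal sketch of such grouping, assuming a hypothetical set of raw answers to a single question (Python/pandas):

    import pandas as pd

    # Hypothetical raw answers to one question.
    raw = pd.Series(["Yes", "yes ", "NO", "not sure", "", "No"])

    def classify(answer):
        # Group each raw answer into one of a limited number of categories.
        a = answer.strip().lower()
        if a == "yes":
            return "Yes"
        if a == "no":
            return "No"
        if a == "":
            return "Did not reply"
        return "Do not know"

    categories = raw.map(classify)
    print(categories.value_counts())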

ii)            CODING: Coding broadly refers to the set of all tasks associated with transforming edited responses into a form that is ready for analysis. Emphasis here will be on questionnaires used in conclusive research projects, which invariably rely on large sample sizes and computer data analyses.  Exploratory research projects are characterized by fairly informal data collection and analysis procedures. Hence a formal coding process is typically not necessary in such projects. 
Coding involves the following sequence of steps:

1) Transforming responses to each question into a set of meaningful categories.
2) Assigning numerical codes to the categories, and
3) Creating a data set suitable for computer analysis.


Transforming Responses into Useful Categories

How difficult and time consuming this step is depends on the degree to which the questionnaire is structured. A structured question is pre-categorized; that is, it has a set of fixed-response categories. Responses to a non-structured or open-ended question have to be grouped into a meaningful and manageable set of categories, a task that can be laborious if respondents' answers vary widely.
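For an open-ended question such as the “most expensive purchase” item in Case 3 above, grouping can be sketched with simple keyword rules; the categories and keywords below are hypothetical and would in practice come from the researcher's written coding guidelines.

    # Hypothetical keyword rules for categorizing open-ended answers.
    rules = {
        "durable good": ["car", "motor", "refrigerator", "television"],
        "leisure":      ["vacation", "holiday", "trip"],
        "utilities":    ["water", "gas", "electricity"],
    }

    def categorize(answer):
        text = answer.lower()
        for category, keywords in rules.items():
            if any(word in text for word in keywords):
                return category
        return "other"   # answers not covered by the rules are set aside for manual review

    answers = ["a new motor car", "a vacation in New York",
               "Water, gas and electricity for my house"]
    print([categorize(a) for a in answers])   # ['durable good', 'leisure', 'utilities']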

A special problem in coding responses to open-ended as well as structured questions relates to the treatment of “don’t know” responses.  A “don’t know” might be a legitimate response; that is, the respondent could not honestly answer the question.  Or it might represent an interviewing failure; that is, the respondent had an answer but for some reason did not divulge it. An editor/coder must ascertain which of these two interpretations of “don’t know” is correct.  However, this task is not as simple as it appears, except in certain cases. For example, a “don’t know” answer to the query “Do you have any credit card?” is most probably an interviewing failure. But a “don’t know” answer to the question “Do you favor or oppose spending public funds to support certified abortion clinics?” may or may not be an interviewing failure.

There are no simple rules for treating “don’t know” responses.  One approach is to infer a real response, that is, make an educated guess about what the answer might have been on the basis of the answers to other questions.  For example, a respondent’s likely income bracket might be subjectively estimated from his or her age, education level and occupation.  However, this approach is fraught with questionable assumptions and hence is of dubious validity. A safer and more defensible approach is to treat “don’t know” as a separate response category for each question. If legitimate “don’t know” responses can be distinguished from those that are interviewing failures, the researcher should report the latter separately as missing values.

A missing-value category is used for questions to which answers should have been obtained but for some reason were not.  A missing value can stem from a respondent’s refusal to answer a question, an interviewer’s failure to ask a question or record an answer, or a “don’t know” that does not seem legitimate.  Sound questionnaire design, tight control over fieldwork and a thorough field edit can help reduce, but not necessarily eliminate, the occurrence of missing values. Questions affected by a large number of missing values, however, invariably indicate a poorly designed questionnaire and/or shoddy fieldwork.  In such a case, researchers should be vigilant during subsequent analysis and interpretation of the data.
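A minimal sketch of keeping legitimate “don’t know” answers as their own category while reporting interviewing failures as missing values; the answer labels used here are hypothetical.

    import pandas as pd

    # Hypothetical raw answers: "DK" marks a "don't know"; None marks a question that was never answered.
    raw = pd.Series(["favor", "oppose", "DK", None, "favor"])

    # Keep the legitimate "don't know" as a separate response category,
    # and leave genuine interviewing failures as missing values (NaN).
    coded = raw.replace({"DK": "don't know"})

    # Missing values are reported separately when tabulating.
    print(coded.value_counts(dropna=False))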

ASSIGNING NUMERICAL CODES

Assigning appropriate numerical codes to responses that are not already in quantified form is the next step.  The purpose of numerical coding is to facilitate computer manipulation and analysis of the responses.  The researcher must keep the measurement levels of the variables in mind while analyzing and interpreting the quantified responses. Two things have to be kept in mind:
  1. Each survey question has just one corresponding variable, since each question in the survey had one, and only one, possible response.
  2. The entries under the Variable Name column are the symbols used in the survey to identify the respective variables during computer analysis.
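A minimal sketch of assigning numerical codes from a codebook; the variable names (GENDER, OPINION) and code values are hypothetical, and the measurement level of each variable must still be kept in mind during later analysis.

    import pandas as pd

    # Hypothetical codebook: each survey question maps to one variable, with one numerical code per category.
    codebook = {
        "GENDER":  {"male": 1, "female": 2},
        "OPINION": {"oppose": 1, "neutral": 2, "favor": 3, "don't know": 9},
    }

    data = pd.DataFrame({
        "GENDER":  ["male", "female", "female"],
        "OPINION": ["favor", "don't know", "oppose"],
    })

    # Replace category labels by their codes, producing a data set ready for computer analysis.
    coded = data.apply(lambda col: col.map(codebook[col.name]))
    print(coded)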
PRELIMINARY DATA ANALYSIS USING BASIC DESCRIPTIVE STATISTICS

Before analyzing a data set using statistical techniques, a researcher should identify what the data are like.  The aim of preliminary data analysis is to identify features of the basic composition of the data collected.  It can also provide useful insights into the research objectives and suggest meaningful approaches for further analysis of the data.
Preliminary data analysis examines the central tendency and the dispersion of the data on each variable in the data set. The measurement level of a variable has a bearing on which measures of central tendency and dispersion are appropriate for it.

Measures of Central Tendency

The common measures of central tendency are the mode, the median and the mean.

Mode: The mode is the most frequently occurring value for a variable in a data set.  It is an appropriate measure for data that are grouped into categories.

Median: The median is the value above and below which half of the responses fall when they are arranged in order. It requires at least ordinal measurement and, unlike the mean, is not unduly affected by extreme values.

Table:  Measures of Central Tendency and Dispersion for Different Types of Variables.
Mean: The mean is the simple average of the various responses pertaining to a variable. It is by far the most widely used and easiest measure to work with. It is computed by summing all the values and dividing by the number of valid cases.  For computing the mean, the variable must at least have interval measurement properties.  The mean uses all the data pertaining to a variable, and therefore information is not lost as it is in computing the median or the mode. However, a few extreme responses or “outliers,” if present in a data set, can dominate the mean and result in a distorted picture of central tendency.
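A minimal sketch of these three measures for a small, hypothetical variable, showing how a single outlier pulls the mean but not the median or the mode:

    import pandas as pd

    # Hypothetical responses (say, monthly spend); 200 is an extreme value.
    spend = pd.Series([20, 22, 22, 25, 27, 200])

    print(spend.mode().iloc[0])   # 22    - the most frequently occurring value
    print(spend.median())         # 23.5  - the middle value, unaffected by the outlier
    print(spend.mean())           # ~52.7 - pulled upward by the single extreme response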

Measures of Dispersion

Measures of dispersion describe how the data are spread around the central value. Along with measures of central tendency, measures of dispersion provide a richer description of the data.  The most commonly used measures of dispersion are the range and the standard deviation.  These measures are appropriate only if the level of measurement is interval or ratio.

Variance and Standard Deviation: The variance of a set of data is a measure of deviation of the data around the arithmetic mean. We calculate it as the average of squared deviations around the mean.  The standard deviation is the square root of the variance and is the most popular measure of variability. The standard deviation is easy to interpret because it is expressed in the same units as the mean.
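A minimal sketch of the variance and standard deviation as defined above, i.e. the average of squared deviations around the mean (ddof=0); note that pandas defaults to the sample version with a divisor of n-1, so the divisor is stated explicitly.

    import pandas as pd

    spend = pd.Series([20, 22, 22, 25, 27, 200])   # the same hypothetical data as above

    variance = spend.var(ddof=0)   # average of squared deviations around the mean
    std_dev = spend.std(ddof=0)    # square root of the variance, in the same units as the mean

    print(variance, std_dev)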

Like the arithmetic mean, the standard deviation is also influenced by extreme values and should not be used if the distribution of responses to a question is highly skewed. Marketing researchers rely on the standard deviation most often in descriptive and inferential statistics.  A simple way to uncover the central tendency and/or dispersion of data for virtually any variable is to construct a one-way table.



FREQUENCY DISTRIBUTION:

A one-way table is a table showing the distribution of data pertaining to categories of a single variable.  Virtually all computer analysis packages are capable of generating the frequency distribution for any variable in a data set. One-way tables are particularly appropriate for examining data on nominal and ordinal-scaled variables since they normally have only a limited set of discrete response categories.

In addition to revealing the general nature of the data, one-way tables offer some very specific benefits.  First, they are helpful in detecting certain types of coding errors. Human errors can occur at various stages of the coding process. Thus, in general, a preliminary one-way tabulation can facilitate data cleaning by pointing out glaring coding errors. However, less obvious errors cannot be detected in this way.
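A minimal sketch of a one-way table used for data cleaning; the variable and its legitimate codes (1 = male, 2 = female) are hypothetical.

    import pandas as pd

    # Hypothetical coded responses: only codes 1 (male) and 2 (female) are legitimate.
    gender = pd.Series([1, 2, 2, 1, 7, 2, 1])

    one_way = gender.value_counts().sort_index()
    print(one_way)   # the count for code 7 stands out as a glaring coding error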

Second, one-way tables can provide valuable insights through comparisons with other relevant distributions. They can be especially helpful in understanding the composition of the respondent group and in looking for evidence of non-response error.  For instance, comparing the frequency distribution of respondents on key demographic variables with appropriate distributions for the population as a whole or for non-respondents will indicate how representative the collected data are.
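A minimal sketch of such a comparison; the sample shares and the census shares below are hypothetical.

    import pandas as pd

    # Hypothetical age-group distribution of respondents versus the population as a whole.
    sample_share = pd.Series([0.45, 0.35, 0.20], index=["18-34", "35-54", "55+"])
    census_share = pd.Series([0.30, 0.35, 0.35], index=["18-34", "35-54", "55+"])

    # Large gaps (here, too few older respondents) hint at possible non-response error.
    print((sample_share - census_share).round(2))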
 
Another useful technique is to compare one-way tables across similar variables within a given survey, or across the same variable measured at different times. Third, one-way tables can suggest potentially useful variable transformations.

In summary, one-way tabulations, while not as sophisticated as other data analysis techniques, can be just as insightful. Indeed, using complex techniques to analyze data without first understanding what the data look like may produce meaningless results and lead to erroneous conclusions.  Even such a simple thing as computing the mean value for a variable, without having an idea about the variable’s response distribution, can be purposeless.

 