Generally, research projects start with in-depth interviews of the business heads to generate hypotheses about the topic being researched. At the end of this meeting, it is usually important to stress to the business heads that these are just hypotheses that need to be validated with research. The reason is that such hypotheses are often treated as facts without any evidence to support them; they are corporate assumptions. The fact that the business leaders articulated them can further ingrain their beliefs. It may seem obvious, but other benefits of involving the business leaders in the early hypothesis generation phase are to:
1. Ensure you are asking all the necessary questions to collect the data for testing relevant assumptions.
2. Increase business buy-in to the process as a full project partner, thereby dramatically increasing the likelihood of subsequent market action.
3. Improve the image of the research function as an integrated and valued contributor to the strategic direction and tactical program implementation of the business.
One example of an early assumption that became a testable hypothesis came
from bankers, who assumed that high-income earners carry higher balances and
pay more fees, and are therefore more profitable, than low-income earners.
Another example was the assumption that clients who carefully balance their
checkbooks every month and minimize overdraft fees are unprofitable checking
account customers. These are examples of oversimplified or incorrect
assumptions that need to be subjected to more formal hypothesis testing. In
these examples, the first step is to build cross-tabulation reports of
profitability versus income or profitability versus checking or savings
account balancing. Often the conclusion is that no relationship exists or, if
one does, it is not statistically significant enough to warrant action.
Though the process might seem tedious, you will conclude there must be
other factors in play. By continuing to
generate additional hypotheses, a meaningful and actionable business discovery
can be made. Discoveries such as this can help gain a competitive advantage.
Furthermore, this advantage can be sustainable and dramatic because the unique
knowledge can assist an organization to better isolate, communicate and serve
customers in new or more efficient ways.
In another case, we found that older
clients, compared with younger clients, were more likely to diminish cumulative
deposit balances by large amounts. This was non-intuitive because conventional
wisdom suggested that older clients have a larger portfolio of assets and seek
less risky investments. This triggered a small qualitative marketing research
validation project that determined high-balance CD prospects were the target
group for competitive financial planners selling mutual funds. The result was
a dramatic change in the marketing strategy for this segment. Such procedures
constitute descriptive analysis and can provide valuable insights. In fact,
certain research studies may require no more than descriptive analysis of the
data.
DESCRIPTIVE VERSUS INFERENTIAL ANALYSIS
In some studies, however, we must go
beyond descriptive analysis to verify specific statements, or hypotheses,
about the population(s) of interest. Data analysis aimed at testing specific
hypotheses is usually called inferential analysis. Here we describe the general procedure
involved in hypothesis testing, discuss the role of hypothesis testing in data
analysis, and outline several hypothesis tests frequently encountered in marketing.
HYPOTHESIS TESTING
A useful starting point for
discussing hypothesis testing is to consider the following situations, which
illustrate critical questions typically faced by decision makers.
SCENARIO 1. Karen, product manager for a line of apparel, is wondering whether
to introduce the product line into a new market area. A recent survey of a random sample of 400
households in that market showed a mean income per household of $30,000. On the basis of past experience and of
comprehensive studies in current market areas, Karen strongly believes the
product line will be adequately profitable only in markets where the mean
household income (across all households) is greater than $29,000. Should Karen
introduce the product line into the new market?
SCENARIO 2. Roni, advertising manager for a frozen-foods company, must
shortly decide between two TV commercials for a new product. Commercial X runs
for 20 seconds, and commercial Y runs for 30 seconds. Therefore, for a given number of
exposures, commercial Y will be more expensive than commercial X. Roni
believes commercial Y will also be more effective in creating awareness for the
new product, but he is not sure. Each commercial was recently shown during the
same TV program, but in two comparable test cities. After the broadcast, a
random sample of 200 adults was interviewed by telephone in each city. In the
city in which commercial X was shown, 40 of the 200 respondents were aware of
the new frozen food; that is, the awareness rate for commercial X was 20
percent. In the other city, the awareness rate for commercial Y was 25 percent.
Can Roni conclude that commercial Y will be more effective in the total market
for the new frozen food?
What features do scenarios 1 and 2
have in common? Clearly, to reach a
final decision, both Karen and Roni have to make a general inference from
sample data. However, making generalizations from sample data is a feature
implicit in virtually all conclusive research projects and hence is not unique
to scenarios 1 and 2. The purpose of any
sampling study is to learn something about the population. A more distinctive
feature of scenarios 1 and 2, one that is more directly relevant to hypothesis
testing, is that each implies a criterion on which the final decision depends. In scenario 1, the criterion is the mean
income across all households in the new market area under consideration. Specifically, if the mean population
household income is greater than $29,000, Karen should introduce the product
line into the new market. In scenario 2, the criterion is the relative degrees
of awareness likely to be created by the two commercials in the population of
all adult consumers. Specifically, Roni should conclude that commercial Y is
more effective than commercial X only if the anticipated population awareness
rate for Y is greater than that for X.
Stated differently, Karen’s decision in scenario 1 is equivalent to either
accepting or rejecting the following hypothesis: “The population mean
household income in the new market area is greater than $29,000.” Similarly,
Roni’s decision in scenario 2 is equivalent to either accepting or rejecting
the following hypothesis: “The potential awareness rate that commercial Y can
generate among the population of consumers is greater than that which
commercial X can generate.” A situation calling for formal hypothesis
testing will usually stipulate a specific criterion for choosing between
alternative inferences or courses of action. However, certain types of
hypothesis tests may not have a criterion as clear-cut as those in
scenarios 1 and 2. Furthermore, in many real-life situations, final decisions
may depend on several factors rather than on a single, clear-cut criterion; we
have simplified scenarios 1 and 2 to highlight the defining features of
hypothesis testing.
Null and Alternative Hypotheses
After recognizing that a particular
decision requires formal hypothesis testing, the first step is to state a null
hypothesis and an alternative hypothesis.
We use H0 and Ha to denote the null and alternative hypotheses,
respectively. Hypotheses always pertain to population parameters rather than to
sample characteristics. It is the population, not the sample, that we want to
make an inference about from limited data. Although this may seem obvious, it
is easy to become confused about it when formally stating and formulating
hypotheses.
Type I and Type II Errors
A Type I error is committed if the null hypothesis is rejected when it is true.
A Type II error is committed if the null hypothesis is not rejected when it is false.
Significance Level
- The significance level associated with a hypothesis-testing procedure is the maximum probability of rejecting H0 with that procedure when H0 is actually true.
- The term significance level means the upper-bound probability of a Type I error.
- The symbol α, the Greek letter alpha, denotes the significance level.
- The complement of the significance level, 1 – α, is the confidence level.
- The symbol β, the Greek letter beta, indicates the probability of committing a Type II error.
Decision Rule:
- A decision rule is a guideline that specifies the sample evidence necessary to reject the null hypothesis.
- The critical value to be incorporated in the decision rule depends on the significance level specified for the hypothesis test.
One-Tailed Versus Two-Tailed Tests
The procedure we used to set up a
decision rule for Karen in scenario 1 involved what is known as a one-tailed
hypothesis test, which signifies that all values that would cause Karen to
reject H0 are within just one tail of the sampling distribution. In
a one-tailed hypothesis test, values of the test statistic leading to
rejection of the null hypothesis fall in only one tail of the sampling
distribution curve.
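As a minimal sketch of such a one-tailed decision rule, the Python snippet below tests H0: mean income is at most $29,000 against Ha: mean income is greater than $29,000, using the survey figures from scenario 1 (400 households, sample mean of $30,000). The sample standard deviation and the 0.05 significance level are not given in the text; the values used here are hypothetical and purely illustrative, as is the function name.

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_z_test(sample_mean, hypothesized_mean, sample_sd, n, alpha):
    """Upper-tail z-test of H0: mu <= hypothesized_mean vs Ha: mu > hypothesized_mean."""
    standard_error = sample_sd / sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    # The whole significance level sits in the upper tail of the normal curve.
    critical_z = NormalDist().inv_cdf(1 - alpha)
    return z, critical_z, z > critical_z


# From the text: n = 400, sample mean = 30,000, threshold = 29,000.
# ASSUMPTIONS (not in the text): sample_sd = 8,000 and alpha = 0.05.
z, critical_z, reject_h0 = one_tailed_z_test(
    sample_mean=30_000, hypothesized_mean=29_000, sample_sd=8_000, n=400, alpha=0.05
)
print(f"z = {z:.2f}, critical z = {critical_z:.3f}, reject H0: {reject_h0}")
```

With a different assumed standard deviation or significance level, the conclusion could change, which is why both are passed in as explicit arguments.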
Whenever the null hypothesis
contains an inequality, we call it a directional hypothesis. The corresponding
hypothesis test will be one-tailed. If
the null hypothesis includes a strict equality (such as =), it is a
non-directional hypothesis. For instance, consider a null hypothesis stating
that the population mean equals a specified value, paired with an alternative
hypothesis stating that it does not.
Intuitively,
both very high and very low values of the sample mean should lead to rejection of H0.
Therefore, the decision rule for rejecting H0 will have two critical
values of the sample mean, one in each tail.
A two-tailed hypothesis test is one in which
values of the test statistic leading to rejection of the null hypothesis fall
in both tails of the sampling distribution curve. A two-tailed hypothesis test
has one special implication: the significance level specified for the test
must be allocated equally to each tail of the sampling distribution curve.
In other words, when the significance level is α, the two critical test
statistic values must be established in such a way that the tail portion of
the sampling distribution curve beyond each critical value corresponds to a
probability of α/2.
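To make the equal split of the significance level concrete, here is a small Python sketch that derives both critical values for a z-based two-tailed test. The function name and the 0.05 significance level are illustrative assumptions, not values taken from the text.

```python
from statistics import NormalDist


def two_tailed_critical_values(alpha):
    """Return (lower, upper) critical z-values, placing alpha / 2 in each tail."""
    upper = NormalDist().inv_cdf(1 - alpha / 2)
    # The standard normal curve is symmetric, so the lower value mirrors the upper.
    return -upper, upper


# ASSUMPTION: a significance level of 0.05, chosen only for illustration.
lower, upper = two_tailed_critical_values(0.05)
print(f"Reject H0 if z < {lower:.2f} or z > {upper:.2f}")
```

For the same significance level, the two-tailed critical values lie farther from zero than the single one-tailed critical value, because each tail holds only half of the rejection probability.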
In practice, whether a hypothesis
test should be one-tailed or two-tailed depends on the nature of the
problem. A one-tailed test is
appropriate when the decision maker’s interest centers primarily on one side of
the issue. For example, is the
proportion of customers who prefer our brand over competitors’ brands greater
than a specified value? Is customer response to our coupon
campaign greater in city A than in city B?
Is our current advertisement less effective than the proposed new advertisement? A two-tailed test is appropriate when the
decision maker has no a priori reason to focus on one side of the issue. For example, do consumers perceive the
average useful life for our appliance as different from the objectively
determined average life of ten years? Is
test market C different from test market D in terms of average household
incomes? Is the satisfaction level of
salespeople over 30 years of age different from that of salespeople 30 years of
age or younger?
Steps in Conducting a Hypothesis Test
The procedures
followed in the calculations are representative of hypothesis testing in
general. In summary, the sequence of tasks
involved in a typical hypothesis test is as follows:
- Set up H0 and Ha.
- Identify the nature of the sampling distribution curve and specify the appropriate test statistic. Note: in scenario 1, the sampling distribution was the normal curve and the test statistic was the z-variable. But as we will see later, the appropriate sampling distribution and test statistic will vary depending on the specific problem.
- Determine whether the hypothesis test is one-tailed or two-tailed.
- Taking into account the specified significance level, determine the critical value (two critical values for a two-tailed test) for the test statistic from the appropriate statistical table.
- State the decision rule for rejecting H0.
- Compute the value for the test statistic from the sample data.
- Using the decision rule, either reject H0 or do not reject H0.
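The steps above can be sketched in Python for Roni's decision in scenario 2, using the awareness figures given there (40 of 200 respondents for commercial X, and 25 percent of 200, that is 50 respondents, for commercial Y). This is one reasonable way to run the test, a pooled two-proportion z-test, and not necessarily the exact procedure the text has in mind; the 0.05 significance level and the function name are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(aware_x, n_x, aware_y, n_y, alpha):
    """One-tailed test of H0: p_y <= p_x against Ha: p_y > p_x."""
    p_x = aware_x / n_x
    p_y = aware_y / n_y
    # Pooled proportion: estimate of the common awareness rate if H0 holds.
    pooled = (aware_x + aware_y) / (n_x + n_y)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_x + 1 / n_y))
    # Test statistic computed from the sample data.
    z = (p_y - p_x) / standard_error
    # One-tailed test, so the whole significance level goes in the upper tail.
    critical_z = NormalDist().inv_cdf(1 - alpha)
    # Decision rule: reject H0 only if z exceeds the critical value.
    return z, critical_z, z > critical_z


# Counts from scenario 2; ASSUMPTION: alpha = 0.05 (not stated in the text).
z, critical_z, reject_h0 = two_proportion_z_test(40, 200, 50, 200, alpha=0.05)
print(f"z = {z:.3f}, critical z = {critical_z:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the test statistic falls short of the critical value, so the sample difference alone would not justify concluding that commercial Y is more effective in the total market.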
ROLE OF HYPOTHESIS TESTING
Two factors are crucial in choosing an
appropriate analysis procedure: the number of variables to be analyzed and the
nature of the data collected on each variable. Analysis procedures are broadly
classified as being univariate or multivariate. As the terms imply, univariate
analysis is appropriate when just one variable is the focus of the analysis,
and multivariate analysis is appropriate when two or more variables are to be
analyzed simultaneously. (The label bivariate analysis rather than multivariate
is often used when the analysis considers just two variables.)
The second factor affecting the
choice of analysis techniques is the nature of the data collected. Particularly relevant in this regard is the
measurement level of the data, that is, whether they are nominal, ordinal,
interval or ratio. Nominal and ordinal (non-metric) data are not as powerful or
versatile as interval and ratio (metric) data. Therefore, we can perform only
relatively crude statistical analyses with non-metric data.
The types of analyses and hypothesis
tests appropriate for non-metric data are typically labeled nonparametric
procedures. Statistical procedures that are nonparametric require only minimal
assumptions about the nature of the data, especially with respect to their measurement
level and the shape of their distribution. Analysis techniques suitable for
metric data are said to be parametric procedures. The use of most parametric
methods requires data with at least interval-scale properties and a
distribution that resembles the normal probability distribution.
In short, as a general rule,
nonparametric procedures are appropriate for nominal and ordinal data, and
parametric procedures are appropriate only for interval and ratio data. For
more details on nonparametric tests, we refer you to this textbook’s website.
SPECIFIC HYPOTHESIS TESTS
This section
deals with some hypothesis tests that are used quite frequently. Table 1 presents
an overview of the specific hypothesis tests we will discuss. The first technique we will look at is a
cross-tabulation procedure, also known as the chi-square contingency test.
Table 1
Cross-Tabulations:
The objectives of most research
studies include an examination of relationships among key variables. Two-way
tabulation is a useful preliminary step in understanding the nature of the
association between a pair of variables.
A two-way table shows the number of responses in each category of one
variable falling into the categories of a second variable.
For two-way tabulation to be
meaningful, the data on each variable must be coded into a fixed set of
categories, and the number of categories should not be large. Therefore,
two-way tables are particularly appropriate for categorical (nominal- or
ordinal-scaled) variables. Of course, two-way tables are also appropriate for
interval- or ratio-scaled variables that have been transformed into
ordinal-scaled variables with a limited number of categories.
Constructing a two-way table means
breaking down the number of responses in each category of one variable into the
categories of the second variable. This
process is the simplest form of cross-tabulation, the simultaneous tabulation
of data on two or more variables. Standard software programs capable of
cross-tabulating data on any combination of variables in a data set are readily
available.
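As a rough illustration of what such software does, the Python sketch below builds a two-way table from raw responses using only the standard library. The variable names echo the earlier income and profitability example, but the records themselves are invented purely for illustration, and the function name is hypothetical.

```python
from collections import Counter


def two_way_table(records, row_var, col_var):
    """Count records for every (row category, column category) pair."""
    counts = Counter((r[row_var], r[col_var]) for r in records)
    row_categories = sorted({r[row_var] for r in records})
    col_categories = sorted({r[col_var] for r in records})
    # Counter returns 0 for pairs that never occur, so empty cells are filled in.
    return {
        row: {col: counts[(row, col)] for col in col_categories}
        for row in row_categories
    }


# HYPOTHETICAL customer records, invented for illustration only.
records = [
    {"income": "High", "profitable": "Yes"},
    {"income": "High", "profitable": "Yes"},
    {"income": "High", "profitable": "No"},
    {"income": "Low", "profitable": "Yes"},
    {"income": "Low", "profitable": "No"},
    {"income": "Low", "profitable": "No"},
]

table = two_way_table(records, "income", "profitable")
for row_category, cells in table.items():
    print(row_category, cells)
```

Each inner dictionary is one row of the two-way table, and summing across a row or down a column gives the marginal frequencies.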
The chi-square contingency test is a
widely used technique for determining whether there is a statistically
significant relationship between two categorical (nominal or ordinal)
variables. (Though the chi-square test
requires only nominal data, it can also be used to analyze associations between
two ordinal-scaled variables or between one nominal-scaled and one
ordinal-scaled variable.) A mere visual inspection of a
two-way tabulation of data can suggest whether or not the variables are
associated with each other. The
chi-square contingency test is a means of formally checking the relationship
between such variables. To illustrate
the chi-square contingency test, let us consider the following example.
Example. The marketing manager of a telecommunications company is reviewing
the results of a study of potential users of a new cell phone. The study used a random sample of 200
respondents and was conducted in a metropolitan area representative of the
company’s target market area. The marketing manager is intrigued by one table,
which is a cross-tabulation of data on whether target consumers would buy the
phone (Yes or No) and whether the cell phone
had access to the Internet (Yes or No). Table presents the cross-tabulation.
Can the marketing manager infer that an association exists between Internet
access and buying the cell phone?
The percentage breakdowns in Table
do suggest an association between the two variables. And the association does
appear to be somewhat intriguing, with more respondents willing to buy the cell
phone when it has Internet access than when it does not. However, is this
result trustworthy, or could the association have occurred in this sample
purely by chance? A chi-square
contingency test of the following hypotheses can answer this question:
H0:
There is no association between Internet access
and buying the cell phone (the two variables are independent of each other).
Ha:
There is some association between Internet access
and buying the cell phone (the two variables are not independent of each
other).
Conducting the Test
Computing the test statistic in the
chi-square contingency test requires comparison of the actual, or
observed, cell frequencies within the cross-tabulation with a corresponding
set of expected cell frequencies. The
expected cell frequencies are generated under the assumption that the null
hypothesis is true. The expected cell frequency for any cell defined by the ith
row and jth column in the contingency table is given by
E_ij = (n_i × n_j) / n
where n_i and n_j are the marginal frequencies, that is, the total numbers of
sample units in category i of the row variable and category j of
the column variable, respectively, and n is the total sample size.
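The expected-frequency calculation can be sketched in a few lines of Python. Because the cross-tabulation from the example is not reproduced here, the observed counts below are hypothetical (they merely sum to 200 respondents). The snippet also computes the standard Pearson chi-square statistic, the sum of (observed − expected)² / expected over all cells, which the text has not yet stated, and compares it with 3.841, the tabulated critical value for 1 degree of freedom at an assumed 0.05 significance level.

```python
def expected_frequencies(observed):
    """E_ij = (row total i * column total j) / n, assuming H0 (independence) is true."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Pearson chi-square: sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(observed, expected)
        for o, e in zip(observed_row, expected_row)
    )


# HYPOTHETICAL counts for 200 respondents, invented for illustration only.
# Rows: Internet access (Yes, No); columns: would buy the phone (Yes, No).
observed = [[80, 20],
            [60, 40]]

expected = expected_frequencies(observed)
chi_square = chi_square_statistic(observed)
# A 2 x 2 table has (2 - 1) * (2 - 1) = 1 degree of freedom.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841  # from a standard chi-square table
reject_h0 = chi_square > CRITICAL_VALUE_DF1_ALPHA_05
print(expected, round(chi_square, 3), reject_h0)
```

With these made-up counts the statistic exceeds the critical value, so H0 would be rejected; the real table from the study may of course lead to a different result.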