What variables allow one to empirically and scientifically quantify trends for learning curves?

Specifically, I am trying to quantify trends in learning for certain mediums of audio-visual communication, and what I've gathered so far suggests that there are four distinguishable types: linear, logarithmic (or possibly asymptotic), exponential and ogive. How can I go from people's qualitative observations to an actual mathematical function that shows these patterns for a specific medium? Would one measure… success rate over time and that's it? Or what?


Short answer
In psychophysical tests, performance is typically expressed as a percent-correct (%correct) rate, so training effects are usually measured by tracking %correct rates across repeated runs or sessions. The ultimate outcome measures can vary widely, as they depend on the physical characteristics of the stimulus (visual, auditory, tactile, gustatory, etc.).

Background
Learning curves can be obtained by repeatedly measuring performance on a certain task.

From what I understand of your question you are:

… trying to quantify trends in learning for certain mediums of audio-visual communication…

and you are asking:

Would one measure… success rate over time and that's it? Or what?

Taking a personal vantage point here, I have measured learning effects using various auditory, tactile and visual psychophysical tests (though not a combined audio-visual test such as you are planning). Below I describe a few tests I have used to look at training effects, along with some basic background information on psychophysics. Please follow the links if you wish to learn more about specific subjects. I have measured the following, among others:

  • Speech understanding, by measuring the speech-recognition threshold (SRT) in noise using the Dutch Matrix test (Houben & Dreschler, 2015). The SRT is basically the signal-to-noise ratio at which speech understanding is 50% correct; in other words, it shows how much noise a listener can handle while still understanding 50% of the words in the sentences heard. I administered this test 12 times over four separate sessions; within-session and between-session learning effects were observed, as well as within-run training effects (unpublished observations);
  • Vibrotactile detection threshold. Basically, we asked the subject in a yes/no task whether they felt a stimulus, and the outcome measure was the stimulus level at which the correct rate was 50%. No learning effect was observed within or between sessions. A within-run training effect was observed, which may have been due to procedural learning (unpublished observations);
  • Tactile spatial acuity (two-point discrimination). Here the person was asked whether one or two stimuli were felt and, again, a percent-correct score was determined, ultimately expressed as the distance at which the correct rate was 62.5%. No training effects other than procedural learning were observed (Stronks et al., 2017);
  • A vibrotactile intensity-difference (JND) task, in which the subject was asked to indicate whether they could feel a difference in intensity between two stimuli. Again, correct rates were measured and expressed as the intensity difference at which the %correct score equaled a certain threshold (Stronks et al., 2017);
  • Visual acuity, measured with a grating task. Again, percent correct is measured, but here the outcome is visual acuity, namely the angle of resolution at which the %correct rate exceeds a certain threshold. Procedural learning was observed (unpublished observations).

Note that most of the above tasks were alternative forced choice (AFC) tasks, in which the threshold %correct depends on the number of response alternatives (chance performance is 1/n for an n-alternative task).
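
If you want to go from %correct scores to one of the four curve shapes mentioned in the question (linear, logarithmic, exponential, ogive), one option is to fit each candidate function to the per-session scores and compare goodness of fit. The sketch below uses made-up data and assumes NumPy and SciPy are available; it illustrates the general approach rather than a standard analysis pipeline.

    # Sketch: fit four candidate learning-curve shapes to %correct-per-session
    # data and compare them. The scores below are invented for illustration only.
    import numpy as np
    from scipy.optimize import curve_fit

    sessions = np.arange(1, 13)                       # e.g. 12 test sessions
    pct_correct = np.array([52, 58, 66, 71, 75, 78,
                            80, 82, 83, 84, 84, 85], dtype=float)

    def linear(x, a, b):
        return a + b * x

    def logarithmic(x, a, b):                         # asymptotic-like growth
        return a + b * np.log(x)

    def exponential(x, a, c, k):                      # exponential approach to asymptote c
        return c - (c - a) * np.exp(-k * (x - 1))

    def ogive(x, lo, hi, k, x0):                      # S-shaped (logistic) curve
        return lo + (hi - lo) / (1.0 + np.exp(-k * (x - x0)))

    candidates = {
        "linear":      (linear,      (50.0, 3.0)),
        "logarithmic": (logarithmic, (50.0, 15.0)),
        "exponential": (exponential, (50.0, 85.0, 0.3)),
        "ogive":       (ogive,       (50.0, 85.0, 1.0, 4.0)),
    }

    for name, (f, p0) in candidates.items():
        params, _ = curve_fit(f, sessions, pct_correct, p0=p0, maxfev=10000)
        rss = np.sum((pct_correct - f(sessions, *params)) ** 2)
        n, k = len(sessions), len(params)
        aic = n * np.log(rss / n) + 2 * k             # lower AIC = better fit/complexity trade-off
        print(f"{name:12s} RSS={rss:7.2f} AIC={aic:7.2f} params={np.round(params, 2)}")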

References
- Houben & Dreschler, Trends Hear (2015); 11(19): 1-10
- Stronks et al, Artif Organs (2017); in press


The Difference Between Descriptive and Inferential Statistics

The field of statistics is divided into two major branches: descriptive and inferential. Each is important, offering techniques that accomplish different objectives. Descriptive statistics describe what is going on in a population or data set. Inferential statistics, by contrast, allow scientists to take findings from a sample group and generalize them to a larger population. The two types of statistics have some important differences.
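
As a minimal illustration of the distinction, the sketch below computes descriptive summaries of a made-up sample and then a t-based confidence interval, which is an inferential statement about the wider population; the numbers are invented and SciPy is assumed to be available.

    # Sketch: the same made-up sample described two ways.
    import numpy as np
    from scipy import stats

    sample = np.array([72, 85, 90, 68, 77, 95, 81, 74, 88, 79], dtype=float)

    # Descriptive statistics: summarize the data you actually have.
    print("mean:", sample.mean(), "std:", sample.std(ddof=1), "median:", np.median(sample))

    # Inferential statistics: generalize from the sample to a population,
    # e.g. a 95% confidence interval for the population mean.
    ci = stats.t.interval(0.95, df=len(sample) - 1,
                          loc=sample.mean(), scale=stats.sem(sample))
    print("95% CI for the population mean:", ci)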


Ask a Question

The first step of the scientific method is to ask a question, describe a problem, and identify the specific area of interest. The topic should be narrow enough to study within a geography and time frame. “Are societies capable of sustained happiness?” would be too vague. The question should also be broad enough to have universal merit. “What do personal hygiene habits reveal about the values of students at XYZ High School?” would be too narrow. That said, happiness and hygiene are worthy topics to study. Sociologists do not rule out any topic, but would strive to frame these questions in better research terms.

That is why sociologists are careful to define their terms. In a hygiene study, for instance, hygiene could be defined as “personal habits to maintain physical appearance (as opposed to health),” and a researcher might ask, “How do differing personal hygiene habits reflect the cultural value placed on appearance?” When forming these basic research questions, sociologists develop an operational definition, that is, they define the concept in terms of the physical or concrete steps it takes to objectively measure it. The operational definition identifies an observable condition of the concept. By operationalizing a variable of the concept, all researchers can collect data in a systematic or replicable manner.

The operational definition must be valid, appropriate, and meaningful. And it must be reliable, meaning that results will be close to uniform when tested on more than one person. For example, “good drivers” might be defined in many ways: those who use their turn signals, those who don’t speed, or those who courteously allow others to merge. But these driving behaviors could be interpreted differently by different researchers and could be difficult to measure. Alternatively, “a driver who has never received a traffic violation” is a specific description that will lead researchers to obtain the same information, so it is an effective operational definition.


Positive and Negative Correlation

Correlation between variables can be positive or negative. A positive correlation means that as one quantity increases, the other tends to increase as well; with a negative correlation, an increase in one variable is associated with a decrease in the other. Correlation alone does not establish that one variable causes the other to change.

It is important to understand the relationship between variables to draw the right conclusions. Even the best scientists can get this wrong, and there are many instances of studies that mix up correlation and causation.
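
Here is a small sketch of both cases, using invented numbers; the correlation coefficients are computed with NumPy, and neither one licenses a causal claim.

    # Sketch: positive vs negative correlation on made-up data.
    import numpy as np

    hours_studied = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
    exam_score    = np.array([52, 55, 61, 64, 70, 72, 78, 83], dtype=float)  # rises together
    hours_of_tv   = np.array([9, 8, 8, 6, 5, 4, 3, 1], dtype=float)          # moves the other way

    r_pos = np.corrcoef(hours_studied, exam_score)[0, 1]
    r_neg = np.corrcoef(hours_studied, hours_of_tv)[0, 1]
    print(f"positive correlation: r = {r_pos:+.2f}")
    print(f"negative correlation: r = {r_neg:+.2f}")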


Types of Descriptive Research

Some of the forms of descriptive research most commonly used by social psychologists include:

Surveys

Surveys are probably one of the most frequently used types of descriptive research. Such surveys usually rely on self-report inventories in which people fill out questionnaires about their own behaviors or opinions.

The advantage of the survey method is that it allows social psychology researchers to gather a large amount of data relatively quickly, easily, and cheaply.

The Observational Method

This involves watching people and describing their behavior. Sometimes referred to as field observation, this can involve creating a scenario in a lab and then watching how people respond or performing naturalistic observation in the subject's own environment.

Each type of observation has its own strengths and weaknesses. Researchers might prefer using observational methods in a lab in order to gain greater control over possible extraneous variables, while they might prefer using naturalistic observation in order to obtain greater ecological validity. However, lab observations tend to be more costly and difficult to implement than naturalistic observations.

Case Studies

A case study involves the in-depth observation of a single individual or group. Case studies can allow researchers to gain insight into things that are very rare or even impossible to reproduce in experimental settings.

The case study of Genie, a young girl who was horrifically abused and deprived of learning language during the critical period, is one example of how a case study can allow social scientists to study phenomena that they otherwise could not reproduce in a lab.


The Research Hypothesis

Every true experimental design must have a research hypothesis at its core, since testing that hypothesis is the ultimate aim of any experiment.

The hypothesis is generated via a number of means, but is usually the result of a process of inductive reasoning where observations lead to the formation of a theory. Scientists then use a large battery of deductive methods to arrive at a hypothesis that is testable, falsifiable and realistic.

The precursor to a hypothesis is a research problem, usually framed as a question. It might ask what, or why, something is happening.

For example, we might wonder why the stocks of cod in the North Atlantic are declining. The problem question might be ‘Why are the numbers of Cod in the North Atlantic declining?’

This is too broad as a statement and is not testable by any reasonable scientific means. It is merely a tentative question arising from literature reviews and intuition. Many people would think that instinct and intuition are unscientific, but many of the greatest scientific leaps were a result of ‘hunches’.

The research hypothesis is a paring down of the problem into something testable and falsifiable. In the above example, a researcher might speculate that the decline in fish stocks is due to prolonged over-fishing. Scientists must generate a realistic and testable hypothesis around which they can build the experiment.

This might be a question, a statement or an ‘If/Then’ statement. Some examples could be:

Over-fishing affects the stocks of cod.

If over-fishing is causing a decline in the numbers of cod, reducing the number of trawlers will increase cod stocks.

These are acceptable statements, and they all give the researcher a focus for constructing a research experiment. The last example formalizes things and uses an ‘If/Then’ statement, measuring the effect that manipulating one variable has upon another. Though the first is perfectly acceptable, an ideal research hypothesis should contain a prediction, which is why the more formal version is favored.

A scientist who becomes fixated on proving a research hypothesis loses their impartiality and credibility. Statistical tests often uncover trends, but rarely give a clear-cut answer, with other factors often affecting the outcome and influencing the results.

Whilst gut instinct and logic tell us that fish stocks are affected by over-fishing, it is not necessarily true, and the researcher must consider that possibility. Perhaps environmental factors or pollution are the causal factors influencing fish stocks.

A hypothesis must be testable, taking into account current knowledge and techniques, and be realistic. If the researcher does not have a multi-million dollar budget then there is no point in generating complicated hypotheses. A hypothesis must be verifiable by statistical and analytical means, to allow a verification or falsification.

In fact, a hypothesis is never proved, and it is better practice to use the terms ‘supported’ or ‘verified’. This means that the research showed that the evidence supported the hypothesis and further research is built upon that.

A good research hypothesis should:

  • Be written in clear, concise language

  • Have both an independent and dependent variable

  • Be falsifiable – is it possible to prove or disprove the statement?

  • Make a prediction or speculate on an outcome

  • Be practicable – can you measure the variables in question?

  • Hypothesize about a proposed relationship between two variables, or an intervention into this relationship

A research hypothesis that stands the test of time eventually becomes a theory, such as Einstein’s General Relativity. Even then, as with Newton’s Laws, it can still be falsified or adapted.

The research hypothesis is often also called H1, and it opposes the current view, called the null hypothesis (H0).
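
To make the H0/H1 distinction concrete, here is a minimal sketch of a two-sample test on invented cod-catch numbers (SciPy assumed); the hypothetical H0 is “no difference between areas” and H1 is “the means differ”.

    # Sketch: H0 vs H1 in a simple two-sample t-test, with made-up numbers.
    import numpy as np
    from scipy import stats

    heavily_fished = np.array([12, 15, 11, 9, 14, 10, 13, 12], dtype=float)
    restricted     = np.array([18, 21, 17, 22, 19, 20, 16, 23], dtype=float)

    t_stat, p_value = stats.ttest_ind(heavily_fished, restricted, equal_var=False)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
    if p_value < 0.05:
        print("Reject H0: the data are inconsistent with 'no difference'.")
    else:
        print("Fail to reject H0: the data do not show a reliable difference.")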

Consider the following hypotheses. Are they likely to lead to sound research and conclusions, and if not, how could they be improved?

Adding mica to a plastic compound will decrease its viscosity.

Those who drink a cup of green tea daily experience enhanced wellness.

Prolonged staring into solar eclipses confers extrasensory powers.

A decline in family values is lowering the marriage rate.

Children with insecure attachment style are more likely to engage in political dissent as adults.

Sub-Saharan Africa experiences more deaths due to Tuberculosis because the HIV rate is higher there.

This is an ideal hypothesis statement. It is well-phrased, clear, falsifiable and merely by reading it, one gets an idea of the kind of research design it would inspire.

This hypothesis is less clear, and the problem is with the dependent variable. Cups of green tea can be easily quantified, but how will the researchers measure “wellness”? A better hypothesis might be: those who drink a cup of green tea daily display lower levels of inflammatory markers in the blood.

Though this hypothesis looks a little ridiculous, it is actually quite simple, falsifiable and easy to operationalize. The obvious problem is that scientific research seldom occupies itself with supernatural phenomena and, worse, putting this research into action would likely harm its participants. When it comes to hypotheses, not all questions need to be answered!

Provided the researchers have a solid method for quantifying “family values” this hypothesis is not too bad. However, scientists should always be alert for their own possible biases creeping into research, and this can occur right from the start. Normative topics with moral elements are seldom neutral. A better hypothesis will remove any contentious, subjective elements. A better hypothesis: decrease in total discretionary income corresponds to lower marriage rate in people 20 – 30 years of age.

This hypothesis may yield very interesting and useful results, but practically, how will the researchers gather the data? Even if research is logically sound, it may not be feasible in the real world. A researcher might instead choose to make a more manageable hypothesis: high scores on an insecure attachment style questionnaire will correlate with high scores on a political dissent questionnaire.

Though complex, this is a good hypothesis. It is falsifiable, has clearly identified variables and can be supported or rejected using the right statistical methods.


The Median

The median is the value at the middle of a distribution of data when those data are organized from the lowest to the highest value. This measure of central tendency can be calculated for variables that are measured with ordinal, interval or ratio scales.

Calculating the median is also rather simple. Let’s suppose we have the following list of numbers: 5, 7, 10, 43, 2, 69, 31, 6, 22. First, we must arrange the numbers in order from lowest to highest. The result is this: 2, 5, 6, 7, 10, 22, 31, 43, 69. The median is 10 because it is the exact middle number. There are four numbers below 10 and four numbers above 10.

If your data distribution has an even number of cases, there is no single middle value, so you take the average of the two middle scores. For example, if we add the number 87 to the end of our list of numbers above, we have 10 total numbers in our distribution and therefore no single middle number. In our new list, the two middle numbers are 10 and 22, so we take their average: (10 + 22) / 2 = 16. Our median is now 16.
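
The same worked example can be reproduced in a few lines; the sketch below uses Python's statistics module and the exact numbers from the paragraph above.

    # Sketch: the median for an odd- and an even-sized list.
    import statistics

    odd_list = [5, 7, 10, 43, 2, 69, 31, 6, 22]
    print(sorted(odd_list))             # [2, 5, 6, 7, 10, 22, 31, 43, 69]
    print(statistics.median(odd_list))  # 10: the middle value of nine numbers

    even_list = odd_list + [87]         # now ten numbers, no single middle value
    print(statistics.median(even_list)) # 16.0: the average of 10 and 22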


Part 1.2 Variation in temperature over time

Learning objectives for this part

  • summarize data in a frequency table, and visualize distributions with column charts
  • describe a distribution using mean and variance.

Aside from changes in the mean temperature, the government is also worried that climate change will result in more frequent extreme weather events. The island has experienced a few major storms and severe heat waves in the past, both of which caused serious damage and disruption to economic activity.

Will weather become more extreme and vary more as a result of climate change? A New York Times article uses the same temperature dataset you have been using to investigate the distribution of temperatures and temperature variability over time. Read through the article, paying close attention to the descriptions of the temperature distributions.

We can use the mean and median to describe distributions, and we can use deciles to describe parts of distributions. To visualize distributions, we can use column charts in Google Sheets. (For some practice on using these concepts and creating column charts in Google Sheets, see Section 1.3 of Economy, Society, and Public Policy.) We are now going to create charts of temperature distributions similar to the ones in the New York Times article, and look at different ways of summarizing distributions.

Frequency table: a record of how many observations in a dataset have a particular value, fall within a particular range of values, or belong to a particular category.

In order to create a column chart using the temperature data we have, we first need to summarize the data using a frequency table. Instead of using deciles to group the data, we use intervals of 0.05, so that temperature anomalies with a value from −0.3 up to −0.25 will be in one group, values greater than −0.25 up to −0.2 in another group, and so on. The frequency table shows us how many values belong to each group.

  1. Using the monthly data for June, July, and August (columns G to I in your spreadsheet), create two frequency tables similar to Figure 1.5 for the years 1951–1980 and 1981–2010, respectively. The values in the first column should range from −0.3 to 1.05, in intervals of 0.05.

Figure 1.5 A frequency table.

Google Sheets walk-through 1.3 Creating a frequency table

Figure 1.6 How to create a frequency table in Google Sheets.

Create a table

In this example, we will make a frequency table for the years 1951–1980 in Columns A and B. It’s a good idea to put all the tables in a separate place from the data.

Create a table

After step 2, your table will look like Figure 1.5.

Filter the data

It is easier to make a frequency table if you have filtered the data to show only the values you need for the table (the years 1951–1980 in this case).

Use the FREQUENCY function to fill in the rest of the table

Now that the data is filtered, we will use Google Sheets’ FREQUENCY function to fill in Column B. First, select the cells that need to be filled in.

Use the FREQUENCY function to fill in the rest of the table

The values in the cells you selected will be used to fill in the frequency table.

Use the FREQUENCY function to fill in the rest of the table

After step 11, you will have calculated the first entry in Column B.

Use the FREQUENCY function to fill in the rest of the table

The full formula will be: =FREQUENCY('1.3'!G74:I103,A3:A30). Note: The values you get may be slightly different to those shown here if you are using the latest data.
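
If you prefer to check your spreadsheet work outside Google Sheets, the sketch below builds an analogous frequency table with NumPy. The anomaly values are random placeholders standing in for the June–August data for 1951–1980, so only the binning logic carries over.

    # Sketch: an analogous frequency table built with numpy.
    import numpy as np

    rng = np.random.default_rng(0)
    anomalies = rng.normal(loc=0.0, scale=0.2, size=90)   # placeholder data

    bin_edges = np.linspace(-0.3, 1.05, 28)               # -0.30, -0.25, ..., 1.05 (steps of 0.05)
    counts, _ = np.histogram(anomalies, bins=bin_edges)

    for lower, upper, count in zip(bin_edges[:-1], bin_edges[1:], counts):
        print(f"{lower:+.2f} to {upper:+.2f}: {count}")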

  2. Using the frequency tables from Question 1:
  • Plot two separate column charts for 1951–1980 and 1981–2010 to show the distribution of temperatures, with frequency on the vertical axis and the range of temperature anomaly on the horizontal axis. Your charts should look similar to those in the New York Times article.
  • Using your charts, describe the similarities and differences (if any) between the distributions of temperature anomalies in 1951–1980 and 1981–2010.

Now we will use our data to look at different aspects of distributions. First, we will learn how to use deciles to determine which observations are ‘normal’ and ‘abnormal’, and then learn how to use variance to describe the shape of a distribution.

  3. The New York Times article considers the bottom third (the lowest or coldest one-third) of temperature anomalies in 1951–1980 as ‘cold’ and the top third (the highest or hottest one-third) of anomalies as ‘hot’. In decile terms, temperatures in the 1st to 3rd decile are ‘cold’ and temperatures in the 7th to 10th decile are ‘hot’ (rounded to the nearest decile). Use the PERCENTILE function to determine which values correspond to the 3rd and 7th deciles, across all months in 1951–1980.

Google Sheets walk-through 1.4 Calculating percentiles

Figure 1.7 How to use Google Sheets’ PERCENTILE function.

The data

We will be using the same data as in walk-through 1.3.

Use PERCENTILE to get the value for the 3rd decile

The PERCENTILE function will find the value corresponding to the chosen percentile in the cells you selected. The value 0.3 refers to the 30th percentile, also known as the 3rd decile.

Use PERCENTILE to get the value for the 7th decile

Repeat step 3 to calculate the value corresponding to the 7th decile. Note: The values you get may be slightly different to those shown here if you are using the latest data.
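
An analogous calculation can be done with NumPy, using placeholder data in place of the 1951–1980 monthly anomalies; the 30th and 70th percentiles correspond to the 3rd and 7th deciles.

    # Sketch: 3rd and 7th deciles with numpy, analogous to Sheets' PERCENTILE.
    import numpy as np

    rng = np.random.default_rng(1)
    anomalies_5180 = rng.normal(loc=0.0, scale=0.2, size=360)   # placeholder data

    cold_cutoff = np.percentile(anomalies_5180, 30)   # 3rd decile: coldest third lies below this
    hot_cutoff  = np.percentile(anomalies_5180, 70)   # 7th decile: hottest third lies above this
    print(f"3rd decile: {cold_cutoff:+.3f}, 7th decile: {hot_cutoff:+.3f}")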

  4. Based on the values you found in Question 3, count the number of anomalies that are considered ‘hot’ in 1981–2010, and express this as a percentage of all the temperature observations in that period. Does your answer suggest that we are experiencing hotter weather more frequently in 1981–2010? (Remember that each decile represents 10% of observations, so 30% of temperatures were considered ‘hot’ in 1951–1980.)

Google Sheets walk-through 1.5 Using Google Sheets’ COUNTIF function

Figure 1.8 How to use Google Sheets’ COUNTIF function.

Filter the data, keeping the years 1981–2010

We will be using the years 1981–2010 only. To make the data easier to view, we will filter the data so that only the years 1981–2010 are visible.

Use COUNTIF to get the number of cells with a value less than the 3rd decile of 1951–1980

The COUNTIF function counts the number of cells you selected that satisfy a given condition (in this case, having a value less than or equal to the value of the 3rd decile in 1951–1980).

Use COUNTIF to get the number of cells with a value greater than the 7th decile of 1951–1980

Now, the condition is that values should be greater than or equal to the value of the 7th decile in 1951–1980.

Use the numbers obtained to calculate percentages

COUNTIF gives us numbers, but to convert these into percentages we need to divide the numbers from COUNTIF by the total number of observations. Note: The values you get may be slightly different to those shown here if you are using the latest data.
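
The same counting logic can be sketched outside Sheets, again on placeholder data and with placeholder cutoffs that you would replace with the decile values from Question 3.

    # Sketch: counting 'cold' and 'hot' anomalies and turning them into percentages.
    import numpy as np

    rng = np.random.default_rng(2)
    anomalies_8110 = rng.normal(loc=0.3, scale=0.25, size=360)  # placeholder data
    cold_cutoff, hot_cutoff = -0.10, 0.10                       # replace with your Question 3 values

    n_total = anomalies_8110.size
    n_cold = np.sum(anomalies_8110 <= cold_cutoff)
    n_hot  = np.sum(anomalies_8110 >= hot_cutoff)
    print(f"cold: {100 * n_cold / n_total:.1f}%  hot: {100 * n_hot / n_total:.1f}%")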

  5. The New York Times article discusses whether temperatures have become more variable over time. One way to measure temperature variability is by calculating the variance of the temperature distribution. For each season (‘DJF’, ‘MAM’, ‘JJA’, and ‘SON’):
  • Calculate the mean (average) and variance separately for the following time periods: 1921–1950, 1951–1980, and 1981–2010.
  • For each season, compare the variances in different periods, and explain whether or not temperature appears to be more variable in later periods.

Google Sheets walk-through 1.6 Calculating and understanding the variance

Figure 1.9 How to calculate variance.

The temperature data

This data is temperature anomalies for 1981–2010, showing the months March to May only. The column chart shows what the data looks like. Each column shows how many values fall within the ranges shown on the horizontal axis. For example, the leftmost column tells us that 25 values are less than 0.3. Note: The values in your dataset may be slightly different to those shown here, if you are using the latest data.

Made up data that is less spread out

This data on the right is made up. From this chart, you can see that the values are all quite close together, with the smallest value being 0.6 and the largest being 0.8. Comparing the charts, the real temperature data looks more spread out than the made up data.

Calculating and interpreting the variance

The formula shown calculates the variance of the real temperature data. As expected, it is much larger than the variance of the made up data.
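
Here is a minimal sketch of the mean-and-variance comparison for one season, using invented anomaly values; ddof=1 gives the sample variance, which is what the spreadsheet VAR function computes.

    # Sketch: mean and variance of one season's anomalies in two periods (made-up data).
    import numpy as np

    mam_5180 = np.array([-0.12, 0.05, -0.20, 0.10, 0.02, -0.07, 0.15, -0.03])
    mam_8110 = np.array([0.18, 0.40, 0.05, 0.55, 0.30, 0.22, 0.48, 0.35])

    for label, data in [("1951-1980", mam_5180), ("1981-2010", mam_8110)]:
        print(f"{label}: mean = {data.mean():+.3f}, variance = {data.var(ddof=1):.4f}")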

  6. Using the findings of the New York Times article and your answers to Questions 1 to 5, discuss whether temperature appears to be more variable over time. Would you advise the government to spend more money on mitigating the effects of extreme weather events?

Conclusion

Learning is a property of all living organisms. They can trace improvement patterns characteristic of themselves. Since organized groups can be looked upon as living entities, they can be expected to exhibit learning and to trace such patterns. In the aircraft industry, for example, they commonly do.

Such performance does not just happen. It is the result of continued seeking and resourceful striving. Study of a number of operations which are important components of major industries reveals that they have traced improvement patterns with learning curve characteristics.

In a way, such findings should not be surprising. Unremitting competition has provided a continuing incentive for companies to look for new and better ways of doing things, and the resulting progressive improvements are merely consistent with the common experience that a thing can always be done more efficiently each succeeding time by trying.

Nevertheless, discovering such performance for operations previously considered unresponsive does provide additional tangible evidence that learning can be an underlying, natural characteristic of organized activity. It does not merely extend the catalog of learning curves. Instead, it can help to breed the conviction that such performance should be found elsewhere, and thereby lead not only to scrutinizing all operations to see which additional ones are susceptible, but to assuming that all operations have learning curve potential and to devising ways of making this potential a reality. Thus, it is prudent to reflect learning potential in plans and forecasts.

The most important ingredients in learning curve performance are vision and leadership. Continued improvement is a chain of influences which starts with the conviction that progress is possible, continues with the creation of an environment and support of work which promote it, and results in a flexibility and willingness to change established practices for more efficient ones as they continually evolve. Furthering this chain is part of the practice of management. Consequently, the learning curve can be regarded as a primary tool of management.
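
For readers who want the standard quantitative form behind such industrial learning curves, the classic log-linear model holds that unit cost falls by a fixed percentage every time cumulative output doubles. The sketch below uses an illustrative 80% curve and an arbitrary first-unit cost; both values are placeholders.

    # Sketch: the classic log-linear learning curve, cost(n) = a * n**b,
    # where b = log(learning_rate) / log(2). An "80% curve" means each
    # doubling of cumulative output cuts the unit cost to 80% of its prior value.
    import math

    def unit_cost(n, first_unit_cost=100.0, learning_rate=0.80):
        b = math.log(learning_rate) / math.log(2)   # 0.80 -> b of about -0.322
        return first_unit_cost * n ** b

    for n in [1, 2, 4, 8, 16]:
        print(f"unit {n:2d}: {unit_cost(n):6.1f}")   # 100.0, 80.0, 64.0, 51.2, 41.0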



Conclusions

Among the many challenges facing studies that quantify behavior are the measurement errors in the behavioral and other measures, the need to formulate multiple equations to characterize the behavioral system, and the desire to understand the direct and indirect effects of variables as they work their way through the equation system. Latent variable SEMs provide the tools to address these challenges. They have the capability to allow quantification and testing of the hypothesized relationships among latent and observed variables. They provide tests of the consistency and plausibility of the assumed model compared with the observed data. Additionally, they enable a researcher to analyze direct as well as mediated relationships. Although SEMs cannot replace sound substantive knowledge in formulating a model, they can provide information on the match between the model and the data, and they do provide tools to further trace the implications of this structure.
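
For orientation, a generic latent-variable SEM can be written as a measurement model plus a structural model; the notation below is the standard LISREL-style formulation and is not tied to any particular study discussed here.

    % Measurement model: observed indicators load on latent variables, plus error.
    y = \Lambda_y \eta + \varepsilon, \qquad x = \Lambda_x \xi + \delta
    % Structural model: relationships among the latent variables themselves.
    \eta = B \eta + \Gamma \xi + \zeta
    % An indirect (mediated) effect is the product of the coefficients along the path,
    % e.g. \xi_1 \to \eta_1 \to \eta_2 contributes \gamma_{11} \beta_{21}.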


The Handling Problem

Turning from the conceptual problems that beset current attempts to use behavioral neurogenetic screening to the practical problems, the methods in current use do not lend themselves to large-scale high throughput screening. Most of them require the handling of the subjects in the course of the training and testing. This is doubly undesirable. It consumes large amounts of experimenter and technician time in the obtaining of small amounts of data. And, it seriously stresses the subjects. Most strains of mice react badly to handling, although the extent, duration and manifestations of handling stress vary greatly between strains. Moreover, the skill with which the mice are handled varies greatly between laboratories and even between personnel within laboratories. Reactions to having been handled and the anticipation of soon being handled again may take a long time to subside once a mouse has been placed in a test environment. These reactions to handling interfere with and contaminate almost every kind of behavioral measurement and observation.

