Cognitive Load Theory

I will look at how cognitive load theory (CLT) can help us to teach computer science, and particularly coding. CLT is a theory of how we learn and gain knowledge. The principal figure in this field is John Sweller, an Australian educational psychologist. It took a while for his ideas to gain acceptance in the scientific community, but they are now uncontroversial and are filtering down to secondary schools in the UK, though more slowly than they perhaps should. CLT is important for teachers because it identifies the most effective teaching methods for learning.

Before I write about CLT further, I need to present a simple model of how human memory works. Memory is split into short term and long term memory. Short term memory, also referred to here as working memory, has limited capacity and can hold between three and seven chunks of information. We are consciously aware of what we are holding in short term memory. Long term memory has unlimited capacity. For us to learn anything, it first needs to be transferred from short term memory into long term memory. But just because something is stored in long term memory does not mean that we remember it forever. How we remember something is the subject of another article that I have written, so I will not go into it here.

Knowledge is stored and organised in long term memory as complex structures called schemas. When we learn, we either create new schemas, if the subject area is completely new to us, or add to existing schemas, if we already have some prior knowledge. As we learn more and more, the schemas become increasingly complex, allowing us to develop automatic skills and to perform complex tasks with minimal effort. By way of example, let us consider the task of driving a car. This is a complex task where we must coordinate the clutch and gear stick, while controlling our speed and the steering wheel, and all the time being aware of other vehicles, pedestrians and signs and constantly checking mirrors. There is certainly lots to think about. Yet once we have built up enough experience, we can drive with very little conscious effort, especially if it is a journey we make regularly. So in essence we have a "driving a car" schema which we can access when we need to. It takes little space in short term memory, allowing us to perform other tasks like having a conversation with a passenger, planning what we are going to have for dinner that evening, or singing along to our favourite pop song on the radio!

If you will indulge me, here is a memory exercise to demonstrate the effect of schemas. Spend thirty seconds remembering the following letters: N M S G A O. The sequence is not important. Now cover up the letters with your hand or a piece of paper and try to recall them. How did you do? My guess is that even if you did correctly get all the letters, it took you some cognitive effort (hard thinking!) to remember them.

Now let us do the same exercise again, but spend thirty seconds remembering the following letters: M A N G O S. Now recall those letters. How did you do this time? You probably had no difficulty remembering the letters that time, because all you had to do was remember the word mangos. Even though I used the same letters in both exercises, the second was far easier than the first. So why was this? The reason is that in the first exercise, unless you spotted the anagram, you will have needed to remember each of the six letters individually as six separate chunks. Recall that we can only store between three and seven chunks of information at a time in short term memory. Six chunks would have taken up precious space in working memory and possibly even overwhelmed it. You may even have come up with an elaborate mnemonic like: Naughty Monkeys Sit Gloomily Around Oliver. This certainly would have helped reduce the burden on short term memory, because you would only have had to remember the mnemonic as a single chunk, but you would have had to come up with it in the first place, which would have taken some cognitive effort. In the second exercise a schema for the word "mangos" already existed, so you only had to remember that word, which could easily be held as a single chunk.

Thus, we can circumvent the limits of short term memory by developing schemas of complex knowledge in long term memory. A schema constitutes only a single element in working memory. When we need to, we can access those potentially enormous sources of information as a single chunk without burdening our working memory. This allows us to perform a task or solve a problem without much conscious effort, and even to perform another task at the same time. Because our short term memory can easily be overwhelmed with information, the principle of CLT is to ensure that we do not overload our working memory but at the same time build up our schemas in long term memory. Therefore, when we teach, we need to bear in mind the limits of working memory. If working memory is overloaded then learning does not take place. Consider what this means for the children in your class when this happens. They will find the work difficult, and issues around confidence and behaviour can then manifest.

There are three types of cognitive load that can be imposed on working memory: intrinsic load, extraneous load and germane load. Total load in working memory is the sum of these three loads. Let us look at each of these in turn. Firstly, intrinsic load is the inherent difficulty of learning something new. For instance, quantum physics is an innately difficult subject. There is nothing we can do to influence this load, and it is seen as necessary load. Secondly, extraneous load is load caused by the instructional method used to deliver the material. This is seen as bad load. Poor teaching approaches will increase extraneous load; good teaching approaches will reduce it. Therefore we need to use the instructional methods that best reduce extraneous load, and later in this article we will look at appropriate approaches. Finally, germane load is the actual process of learning, that is, transferring information into long term memory. Unlike extraneous load, germane load is desirable. Thus when designing instructional methods we wish to find approaches that increase germane load and reduce extraneous load.

CLT favours instructional methods that are designed not to overload working memory, as opposed to constructivist approaches where students learn things for themselves. This goes against much of what Papert wrote about learning to code. CLT develops domain specific problem solving as opposed to generic problem-solving skills. That is, we first build surface structures across a range of topics and examples, and these in turn let us build the deep structures that underpin transferable skills.
A range of instructional techniques based on CLT have been developed and tested using randomised controlled experiments, and so are robust. These techniques seek either to increase germane load, which is desirable, and / or to reduce extraneous load, which is not.
Worked examples - Worked examples are problems that have already been solved, showing the steps. Worked examples have been demonstrated to reduce cognitive load. In coding we might give a live demonstration to show how a particular concept works. As we are demonstrating, we write and explain each line of code. For instance, write a program that asks a user for their age to determine if they are a child or an adult. So we might write something like this.
age = int(input())  # input() returns text, so convert it to a number
if age < 18:
    print("child")
else:
    print("adult")

Expertise reversal - In contrast to the worked example effect, the expertise reversal effect shows that for experts worked examples are less effective and independent problem solving is more beneficial. So as learners develop, we need to move away from worked examples and give students opportunities to solve problems independently.
Completion problems – In the same way as presenting worked examples, solving partially completed solutions also reduces cognitive load. Here we present a partially complete solution and ask students to modify the code to complete it. For instance, write a program that determines the grade a student achieves based on the mark attained, where A is 90 or above, B is 70 or above, C is 50 or above and anything below 50 is a fail. The partially completed code might look like this.
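score = int(input())  # input() returns text, so convert it to a number
if score >= 90:
    print("A")
elif score < 90 and score >= 70:
    print("B")
# students complete the remaining branches for a C and a fail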
The great thing about partially completed code is that it allows for differentiation by using code with different stages of completeness.
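For comparison, one way the finished program might look is sketched below. It is only a possible solution under stated assumptions: the function name grade and the letter "F" for a fail are chosen here for illustration, and the marks are passed in directly rather than typed by the user so that the boundaries are easy to check.

```python
def grade(score):
    """Return the grade for a mark, using the boundaries from the task."""
    if score >= 90:
        return "A"
    elif score >= 70:   # marks of 90 or above were already handled
        return "B"
    elif score >= 50:
        return "C"
    else:
        return "F"      # assumption: "F" stands for a fail


# Try one mark on each side of every boundary.
for mark in [95, 90, 89, 70, 69, 50, 49]:
    print(mark, grade(mark))
```

Checking marks on either side of each boundary is a useful habit to model for students, because off-by-one mistakes in the comparisons are easy to make.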

Split attention effect - The split attention effect occurs when we have to process multiple sources of information in order to understand the material. This is considered extraneous load. Commonly this occurs where a diagram is presented with a written explanation and the student needs to refer to both to be able to understand the material. There is no way to understand the material from a single source. The student constantly needs to switch between the written explanation and the diagram, thereby placing a burden on cognition. By integrating information we can reduce the extraneous load. For instance, Cerpa et al. showed that during instruction on a computer all material should be presented on screen, rather than split between a manual and the computer, especially when there is a high intrinsic load. Having a separate manual and computer screen leads to split attention. They found learning to be more effective if the instructional material was integrated on the computer screen.

Redundancy effect – Students do not learn effectively if there is superfluous information that is not needed for the learning objective. Often this takes the form of the same information presented in different ways, for instance a diagram accompanied by text that repeats the same information. This is the opposite of the split attention effect, in that a single source of information is enough for students to understand the material. Another mistake that teachers make, and I include myself in this, is to verbally present a slide that has lots of writing on it. Redundancy can be detrimental to learning because students are overwhelmed with information, even if they are being presented with the same information. This is demonstrated in a study by Lazonder and van der Meij, who showed that minimal manuals were more effective in teaching students to use a word processor than standard instruction manuals. Redundancy implies that less is more. However, presenting the information in series may well be beneficial. For instance, a teacher may get pupils to read the slides on the projector and then give a verbal explanation to reinforce the learning, instead of doing both at the same time.

Collective working memory effect – The collective working memory effect shows that when students work in groups, the cognitive burden on each individual is reduced. Pair programming, where pupils are teamed up in pairs to write code, is one means by which we can make use of this effect.

Modality effect – The modality effect is not to be confused with the redundancy effect. Where the redundancy effect increases extraneous load, the modality effect increases germane load by using both auditory and visual modes at the same time. These are two modes which, when taken together, enhance one another rather than competing for the same cognitive resources. So instead of pairing a diagram with written text, diagrams should be presented with a verbal explanation. The modality effect supports the use of multimedia and therefore flipped learning, where students can access the learning before the lesson.

Variability effect – The variability effect supports the idea that students need to be introduced to a variety of question types. Mixing questions up with other topics will also help to enhance learning and increase germane load.

Goal free principle – The goal free principle involves setting problems without a specific goal in mind. Most often these are mathematical problems, but the principle can also apply in computer science. This reduces extraneous load because learners do not have to concern themselves with the goal, which can take up some of the cognitive load. Sweller noticed that just because learners can solve a problem, it does not necessarily mean that they have understood what you would like them to learn, because they have been focused on solving the problem and have not identified the underlying pattern.
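As an illustration in a coding context, a goal free task might hand students a short program and ask them to work out as many values as they can, rather than asking for one specific answer. The sketch below is an assumed example and is not taken from Sweller's work; the variable names and the wording of the prompt are hypothetical.

```python
# Goal free prompt (hypothetical wording):
# "Read this program and work out as many values as you can."
width = 4
height = 3
area = width * height               # 4 * 3
perimeter = 2 * (width + height)    # 2 * 7
is_square = width == height         # compares the two sides

# Students can then run the program to check their predictions.
print(area, perimeter, is_square)
```

The goal-specific version of the same task would be "write a program that prints the perimeter of the rectangle", which pushes learners to work towards a single answer rather than to explore how the values relate to one another.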

Isolated element effect – Difficult concepts need to be introduced in small isolated chunks as far as possible. We need to introduce concepts starting with simple ideas and progressing on to more difficult and complex ideas. This allows us to manage the intrinsic load. For programming we start with basic ideas like sequencing, moving onto variables and selection then onto more tricky concepts like iteration and procedures.
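A minimal sketch of such a progression is shown below, with each concept introduced on its own before the next is added. The specific examples, values and names (name, score, total, greet) are assumptions chosen for illustration, and in practice each step would be its own short activity rather than one program.

```python
# Step 1: sequencing - instructions run one after another
print("Hello")
print("Welcome to the quiz")

# Step 2: variables - store a value and use it later
name = "Sam"  # hypothetical pupil name
print("Hello " + name)

# Step 3: selection - choose between two paths
score = 7
if score >= 5:
    result = "pass"
else:
    result = "fail"
print(result)

# Step 4: iteration - repeat an instruction
total = 0
for number in range(1, 6):   # the numbers 1 to 5
    total = total + number
print(total)                 # 1 + 2 + 3 + 4 + 5

# Step 5: procedures - name a block of code so it can be reused
def greet(person):
    return "Hello " + person

print(greet(name))
```

Each step reuses only ideas from the steps before it, so the number of new elements a learner has to hold in working memory at any one time stays small.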

The effects presented here are not exhaustive, and there is a range of other effects that have been shown to support learning, including the imagination effect, human movement effect, self-management effect, guidance fading effect and element interactivity effect.

References

Cerpa et al, 1996, Some Conditions Under Which Integrated Computer-Based Training Software Can Facilitate Learning, Journal of Educational Computing Research

Lazonder A.W. and van der Meij, H. 1993, The minimal manual: is less really more?, International journal of man-machine

Papert, Mindstorms

Sweller, J et al 2019, Cognitive Architecture and Instructional Design: 20 Years Later, Educational Psychology Review, 31, 261-292

Teach Computer Science
