Music psychology in videogames


Video games have undergone a rapid technological evolution alongside advances in data storage and processing power. The role and function of music in games has likewise evolved with the technology, while the underlying principles of music as a conveyor of information and emotion have remained relatively unchanged.

As shown by Grimshaw et al. (2013), music can be a source of obscurity in a game and jeopardize the player’s ability to enter into a state of flow[1]. It should therefore be in the interest of music producers for games to understand how to use modern resources in a way that effectively achieves the function of the music, while remaining acutely aware of the context in which it is meant to be presented. To understand this, we can look at what psychologists tell us about music and emotions (Juslin, 2019) and cross-reference it with ludomusicology[2] (Summers, 2018) and communication theory (Al-Fedaghi, 2012).

What can psychology and communication theory teach us about producing music for a videogame?

Music as information about states

As a basic principle, music is used in videogames to signify and reinforce aspects of the game, both seen and unseen (Summers, 2018). The mechanics, dynamics, and aesthetics[3] of a videogame can all be informed through music, and Summers proposes a musicological theory for describing the relationship music can have with a game.

A consequence of creating rules for a game is that the rules naturally produce a set of states within the game. These states can in turn be categorized by their relationships with each other – win conditions, time constraints, dramatic narrative, etc. – and thus we have sources of information that the player might benefit from being aware of.

In Musical Emotions Explained, Patrik Juslin (2019) explains that listeners tend to agree on which basic emotions[4] a piece of music is meant to convey, and that more complex expressions of emotion within a style of music are created with a degree of similarity that a “qualified listener” will be able to interpret accurately.

Further, Juslin (2019) explains how music is especially effective at expressing the transformation between emotional states over time. For instance, you can imagine how music can express a change from calm to stressful by adding rapid rhythmic elements over time. Juslin refers to this category of psychological experience as vitality affects: states of emotional change. He also argues that the reason music is used so pervasively in human culture, from religious rituals to commercial advertising, is its contagious tendency to communicate these transformations.

[1] Csikszentmihalyi (1990).

[2] A term used to describe a field within musicology that focuses on the context of videogames (Summers, 2018).

[3] Mechanics, Dynamics, and Aesthetics are the three core principles through which a game is experienced by the player, as formalized by Hunicke, LeBlanc & Zubek (2004).

[4] The basic emotions that Juslin lists are: Happiness, sadness, anger, fear, and tenderness.

Order effects

It can be argued that vitality affects are one of the most important results of musical expression. Let us therefore look at what implications this might have on the way we perceive music when listening.

Juslin (2019, p. 124) points out that one of the challenges when conducting empirical studies in music psychology is that the order in which a listener hears two consecutive pieces of music has a substantial effect on how each piece is perceived. This is referred to as order effects, and while it is a problematic phenomenon when trying to achieve reliable results in empirical studies, it also tells us something very fundamental about creating music.

Juslin explains that a genre of music has a set of moderating variables (2019, p. 96) that is unique to that specific style of music, and argues that these moderating variables are what distinguish genres from each other. For example, a speed-metal song played in a slow tempo for effect might still be significantly faster, in absolute terms, than a piano ballad played faster than is common for piano ballads.
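To make the idea concrete, a moderating variable such as tempo can be expressed relative to a genre baseline rather than in absolute terms. The sketch below is a minimal illustration; the genre names, baseline BPM values, and song tempos are invented for the example.

```python
# Hypothetical genre baselines (typical tempo in BPM), invented for illustration.
GENRE_BASELINE_BPM = {"speed metal": 200, "piano ballad": 70}

def relative_tempo(bpm, genre):
    """Tempo expressed relative to what is typical for the genre.
    Below 1.0 means slow for the genre; above 1.0 means fast for it."""
    return bpm / GENRE_BASELINE_BPM[genre]

slow_metal_bpm = 150   # slow *for speed metal*
fast_ballad_bpm = 90   # fast *for a piano ballad*

print(relative_tempo(slow_metal_bpm, "speed metal"))    # below 1.0: slow within its genre
print(relative_tempo(fast_ballad_bpm, "piano ballad"))  # above 1.0: fast within its genre
print(slow_metal_bpm > fast_ballad_bpm)                 # yet absolutely faster
```

The point of the sketch is that the perceived tempo depends on the genre’s baseline, not the raw BPM value alone.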

When analyzing music for videogames it becomes apparent that, despite the fact that videogames have a non-linear aesthetic, all music is played back in a linear fashion (Summers, 2018, p. 51). This means that the music played just before a given piece has substantial implications for how that piece is perceived.

Let us take an example of how order effects can have unintended consequences in a game.

Suppose a game developer is convinced that a piece perfectly communicates an emotion for a game event, say “hectic”. If the game’s mechanics happen to cue that piece right after music containing similar moderating variables, the “hectic” music will not arouse the level of emotion that was intended. The result is a phenomenon called ludonarrative dissonance,[5] where the narrative intention of the game is obscured by the rules and requirements that communicate the narrative.

Thus, it would be helpful for game developers and music producers to use an analytical method that can map out and unveil instances of music that could lead to some problematic sequences during gameplay.
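As a minimal sketch of such an analytical method, one could represent each music cue by a few moderating variables (here, tempo and a rhythmic-density score), map which cues the game can play consecutively, and flag transitions where adjacent cues are too similar to preserve the intended contrast. All cue names, feature values, and the similarity threshold below are invented assumptions for illustration, not part of any published method.

```python
# Hypothetical feature vectors for music cues: (tempo in BPM, rhythmic density 0-1).
CUES = {
    "explore": (90, 0.3),
    "chase":   (170, 0.9),   # the intended "hectic" cue
    "boss":    (165, 0.85),  # similar moderating variables to "chase"
    "menu":    (80, 0.2),
}

# Which cue can directly follow which, derived from the game's state graph.
TRANSITIONS = [("explore", "chase"), ("boss", "chase"), ("menu", "explore")]

def similarity(a, b):
    """Crude closeness measure in [0, 1]: 1.0 means identical features."""
    tempo_diff = abs(a[0] - b[0]) / max(a[0], b[0])
    density_diff = abs(a[1] - b[1])
    return 1.0 - (tempo_diff + density_diff) / 2

def flag_order_effect_risks(cues, transitions, threshold=0.9):
    """Return transitions where consecutive cues are so similar that the
    second cue may lose its intended emotional contrast (an order effect)."""
    return [(prev, nxt) for prev, nxt in transitions
            if similarity(cues[prev], cues[nxt]) >= threshold]

print(flag_order_effect_risks(CUES, TRANSITIONS))  # → [('boss', 'chase')]
```

In this toy data, “chase” played after “explore” keeps its contrast, but “chase” cued right after “boss” is flagged, because both cues share fast tempo and dense rhythm.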

[5] Clint Hocking popularized this term when he criticized the game Bioshock for symbolizing its protagonist as a pacifist through the game’s narrative, while having a game design centered around killing hordes of enemies without addressing this contradiction (Hocking, 2007).

The 2.5 principle

When music is communicated to the player of a game, it comes packaged in a way that transmits the information. This packaging is defined by Al-Fedaghi (2012) as a carrier: whatever information is sent to the receiver travels together with the semantic codes necessary to decipher it. However, the information becomes obscured when the level of noise in the transmission grows too high.

Sound designer and film editor Walter Murch, who worked on films such as Apocalypse Now and The Godfather trilogy, provides an anecdotal explanation of how much audio information an audience is capable of deciphering, which he has coined “the two-and-a-half principle” (Murch, 2012). Murch’s generalization is that, at each level of abstraction, the listener can handle at most 2.5 different elements at the same time before the whole signal becomes “noise” and indiscernible.
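As a rough sketch, the principle can be treated as a budget check per abstraction level: count the simultaneously active sound elements in each level and flag those that exceed the limit. The level names, elements, and grouping below are hypothetical, invented to illustrate the rule of thumb.

```python
# Hypothetical mix snapshot: active sound elements grouped by abstraction level.
active_elements = {
    "dialogue": ["protagonist", "radio chatter"],
    "music":    ["strings", "percussion", "synth pad"],
    "effects":  ["footsteps", "rain", "distant gunfire", "engine"],
}

MURCH_LIMIT = 2.5  # elements per level before the signal tends toward "noise"

def over_budget(levels, limit=MURCH_LIMIT):
    """Return the levels whose simultaneous element count exceeds the limit."""
    return {name: elems for name, elems in levels.items() if len(elems) > limit}

for level, elems in over_budget(active_elements).items():
    print(f"{level}: {len(elems)} concurrent elements may read as noise")
```

Here the dialogue level stays within budget, while the music and effects levels would be flagged for thinning out.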

The code in a signal relies on the receiver’s ability to understand the connotations and denotations being transmitted, and that code is continuously updated as new technology allows for new ways (carriers) of transmitting information such as music. Walter Murch explains that when he worked on the sound design for Apocalypse Now, there was a strong push from the production team to use new surround technology that allowed the audience to be immersed in a spatialized soundscape. Grimshaw et al. (2013, p. 306) show that studies confirm how playing a game with a 5.1 surround system, versus a stereo system, enhances the player’s feeling of immersion and makes the game more fun to play.

It should therefore be in the interest of game developers and music producers alike to understand which connotations in music and sounds will translate to new immersive audio technologies, such as Ambisonics and Dolby Atmos, and what new types of codes and associations they will spawn. Likewise, it should also be in their interest to understand how “noise” is generated during the creation and transmission of this higher level of abstraction.


We can see that context governs how music is perceived, regardless of its “inner workings”. It can be argued that music for videogames should be produced and played back in a way that takes advantage of order effects, instead of letting them undermine the musical intention. Therefore, it should be in the interest of game developers to explore methods that can unveil circumstances where ludonarrative dissonance arises from the game music, so that the music instead fulfills its purpose.

It also becomes clear that the medium which transmits the music adds some level of noise to the signal, because of the limitations of the medium’s carriers. I think this is an interesting problem when looking at modern immersive audio solutions, such as Ambisonics and Dolby Atmos, that retain some level of backwards compatibility: even if a product has been created for the highest level of abstraction in these technologies, the same audio will remain compatible with older technologies, such as stereo and mono. By Walter Murch’s principle (2012), the two-and-a-half limit should then apply at every level of abstraction in immersive audio.


Al-Fedaghi, S. (2012). A conceptual foundation for the Shannon-Weaver model of communication. Medwell Journals.

Csikszentmihalyi, M. (1990). Flow: The Psychology of Optimal Experience. Harper & Row.

Grimshaw, M., Tan, S. L., & Lipscomb, S. D. (2013). Playing with sound: The role of music and sound effects in gaming. In Tan, S. L., Cohen, A. J., Lipscomb, S. D., & Kendall, R. A. (Eds.), The psychology of music in multimedia (1st edition, pp. 289–314). Oxford University Press.

Hocking, C. (October 7, 2007). Ludonarrative Dissonance in Bioshock. Click Nothing: Design from a long time ago.

Juslin, P. (2019). Musical Emotions Explained (1st edition). Oxford University Press.

Murch, W. (January 25, 2012). Walter Murch: Hollywood sound design [Video]. YouTube.

Summers, T. (2018). Understanding video game music (1st edition). Cambridge University Press.