In Defense of Cutscenes
The technique of cutscenes, as typically found in story-based action games, is placed within a wider discursive problematic, focusing on the role of pre-written narratives in general. Within a theoretical framework raised by Espen Aarseth, Markku Eskelinen and Marie-Laure Ryan, I discuss the relations between the ergodic and the representational, and between play and narration. I argue that any game event is also a representational event, a part of a typical and familiar symbolic action, in which cutscenes often play a crucial part. Through cutscenes, the ergodic effort acquires typical meanings from the generic worlds of popular culture.
Ergodics, narrative, rhetoric, representation, genre, popular culture.
In his excellent article about configurative mechanisms in games, The Gaming Situation, Markku Eskelinen rightly points out, drawing on Espen Aarseth’s well-known typology of cybertexts, that playing a game is predominantly a configurative practice, not an interpretative one like film or literature (pages 1-2). However, the deeply problematic claim following from this is that stories "are just uninteresting ornaments or gift-wrappings to games, and laying any emphasis on studying these kind of marketing tools is just waste of time and energy" (page 10). This is a radical ludological argument: Everything other than the pure game mechanics of a computer game is essentially alien to its true aesthetic form.
The theoretical premise of this argument was introduced in Espen Aarseth’s ground-breaking Cybertext, the first book to suggest a theory of play and narration as two distinct modes of discourse, found not only in literature but as a dialectic fundamental to human activity in general. Through his concept of the ergodic, Aarseth has provided an invaluable tool for investigating games as a unique form of expression, a distinct category of cultural activity not reducible to other, more established categories.
The ergodic signifies the general principle of having to work with the materiality of a text, the need to participate in the construction of its material structure. Some ergodic works lead us towards a fixed solution (a jigsaw puzzle); others can be unpredictable and open-ended (an experimental hypertext novel). As a discursive mode, the ergodic can be contrasted with narrative discourse, where the user is invited only to engage in the semantics of the text and does not have to worry about its material configuration. Reading narrative is, as Eskelinen says, a purely interpretative practice. In narrative discourse the user is only a reader, not a co-constructor, not a player.
In any game, the ergodic is the defining discursive mode, not the narrative. This means that the user is basically involved as a player (doing ergodic work on the materiality of the text), not as a reader (interpreting on a semantic level). This may sound obvious (games are games), but it is an important theoretical premise if we are to avoid studying computer games as if they were just another narrative genre. Following the general perspective raised by Aarseth, both Jesper Juul and Gonzalo Frasca have developed more specifically game-oriented ideas about how to understand this basic distinction, centred on the Latin term ludus, both as a mode of textuality and as a mode of activity on a more general level. Correspondingly, all game research, including the study of computer games, would be labelled ludology.
There are good reasons why the ludic dimension of computer games deserves considerable theoretical attention. The field is still at an early stage of development, and it is important not to leave it open to affirmative appropriation by established disciplines and theories. Also, play and games as a cultural activity have received remarkably little attention, except as a sub-category within children’s development studies. On the other hand, does it follow that other modes of discourse in a computer game are accidental to the gaming experience and hence less interesting to computer game theorists? Should computer game studies be a sub-category of general game theory?
Radical ludology only takes us so far, mainly for two reasons:
a) A computer game is a computing game. Play is transformed by computer technology, producing distinctive new forms of challenge and attraction that cannot be understood through concepts and theories developed to investigate non-computerized play. Although they are not the subject of this brief paper, both the procedural logic and the spectacular responsiveness of the computer as a media technology have indeed created unique, although not entirely new, textual attractions.
b) A computer game (in the narrow, ludological sense of the term) frequently uses conventions of popular culture. In fact, game genres offering ergodic challenges within a fictional universe known from other media make up a large portion of the games that people actually buy and play today (sport and driving games being the other major commercial category). The marketing of these genres addresses the buyer primarily as a reader, packing the games with heavy intertextual references, most often based on expensive licences from the film industry. Narration of events within this fictional universe is typically conveyed through cutscenes, which are already a standard convention: cinematic sequences addressing the reader and putting the player on hold.
Within the radical perspective raised by Markku Eskelinen (inspired by Aarseth) this category of games can be nothing but a bastard discourse, an impure commercial practice that may well be appreciated by mainstream consumers, but cannot be taken seriously by computer game studies, other than as a discursive misunderstanding that probably will go away as games mature (or, admittedly, will live on due to the inherent corruption of mainstream entertainment). Expanding this logic, one could say that not only cutscenes, but any pre-written narrative, fixed path, scripted event or movie-based character is a sign of immaturity, a dependence on film parallel to the way much early film was dependent on the conventions of staged drama. A mature, involving gameplay would not need any "You are James Bond" or the like, especially not when forced upon the player through elaborate and game-spoiling cinematic narration.
The problem with this line of argument is that the (necessary) theoretical project of articulating the principles of a specific discursive conflict seems to be confused with an ideology of ‘pure’ gaming. This ideology prescribes an ideal mode of receiving games, a strictly no-nonsense, gameplay-oriented attitude typical among the real (or ‘hard-core’) gamers, the ‘cineastes’ of the game world. This counter-establishment ideology of gaming, partly rooted in the dark arcades of the late 1970s and early 1980s and partly in hacker culture, is instinctively sanctioned by a new breed of oppositional scholars, who vaguely identify mainstream players and mainstream commercial games with established theory.
There is a deliberate confusion of ‘game’ as a discursive mode and ‘computer game’ as an actual cultural product. This implies, rather conveniently, that the relevance of narration in any given computer game can be denounced simply by referring to the fact that ‘games’ and narratives are two different things. Originally suggested as tools for the study of computer game aesthetics, the concepts of ergodics and ludology turn into self-contained arguments for advocating the purity of games, targeting a broad category of games (story-based, single player action games) as unworthy of serious attention.
Alternatively, the purist can be less categorical and, like Espen Aarseth, seriously doubt the feasibility of games that try to integrate filmic narration:
"But there seems to be a limit to the usefulness of these kinds of modal crossovers, in that an audience will want the work to perform as either one or the other, and their own role to be either that of player or observer." (page 35).
Arguing like this, with no empirical evidence, or even indications, of what different kinds of audiences will actually want, is not as hazardous as it may seem. It is based on the assumption that two distinct discursive modes, whose basic theoretical principles have been soundly established, cannot be mixed into new, stable, meaningful and enjoyable cultural practices. Consequently, a game (read: the cultural product) should stick to being a game (read: the discursive mode), in order to avoid being a confusing half-game. Therefore, we should not even bother to understand story-based action games as a phenomenon, as they are, and probably always will be, an artistic failure (even if consumers continue to enjoy ‘modal crossovers’ like Metal Gear Solid, for empirical reasons we do not know).
Contrary to the project of the die-hard ludologists, my general concern is how the ‘alien’ dimension of cultural conventions can work as an integrated part of the gaming experience. In this paper, this is not an empirical question about modes of reception among actual users (however interesting), but rather a call for a stronger interest in a typical textual practice, at once configurative and interpretative, both unique and intertextual. The most accentuated expression of such an impure duality is found in the oscillation between cutscenes and play in typical story-based action games. These games offer highly structured, linear and progressive gameplay, framed by a pre-written story.
THE GAMEPLAY OF CUTSCENES
What can possibly be the reason for cutting up the player’s configurative activities with close-to-parodic, B-movie-type cinematic sequences? Let me first briefly look at some gameplay considerations, questioning the assumption that cutscenes are irrelevant or destructive to gameplay.
Framing gameplay in a single, linear story is convenient. A game within this genre needs a system of progression (with a clear goal), a reward structure, and the regular introduction of new elements (levels, enemies, weapons, skills). A simple, action-based story takes care of all this, offering a narrative project as a unifying logic. This narrative is pragmatic, insofar as it serves as a plausible excuse for the construction of interesting gameplay. The cutscene is an efficient tool for conveying this story, being more visually interesting than purely verbal narration, and less complicated than distributing the necessary information through scripted events. But cutscenes also have strengths of their own, serving gameplay functions that cannot be taken care of through other means.
A cutscene does not cut off gameplay. It is an integral part of the configurative experience. Even if the player is denied any active input, this does not mean that the ergodic experience and effort are paused. A cutscene is never truly ‘cinematic’, no matter how poorly implemented it may be. In any case, it cannot avoid affecting the rhythm of the gameplay. This need not be in a negative sense. For example, in the arcade-inspired James Bond in Agent Under Fire (a fun game that makes up in spectacle and atmosphere for what it lacks in gameplay), the numerous but short cutscenes provide regular moments of release from intense action. They create a characteristic rhythm in which the regular interruption/release is always expected. As a player you quickly learn the code, constantly being thrown rapidly in and out of bodily ergodic effort.
Still, a good cutscene has qualities other than just being ‘rhythmically’ well-implemented. Notably, it may work as a surveillance or planning tool, providing the player with helpful or crucial visual information. Another rather well-established convention is the ‘gameplay catapult’, building up suspense and creating a situation, only to drop the player directly into fast and demanding action-gameplay.
Both techniques are elegantly implemented in the gangster-themed Grand Theft Auto III, a game successfully combining a story-based mission structure with more open-ended gameplay. This unusual mix is enabled through the impressive simulation of a big, populated city for the player to play around in. The game also illustrates a significant gameplay function of good cutscenes: reward by entertainment. The short, stylish and humorous mission briefings in GTA III become a part of the gameplay’s reward structure, independently of which new missions, items or weapons they may introduce. Some of them are good, some of them are not so good, but you will never know before you get there. This may not be a very sophisticated technique, but it adds extra motivation and satisfaction to the game. Chasing new cutscenes can be more fun than chasing bigger guns.
GTA III also features an interesting kind of in-game ‘hybrid’ car-jump sequences, actually generated in real-time but looking much like a spectacular cutscene, a result of the triggered slow-motion effect and change of camera angle. Being both a simulation (run by the physics engine) and a sheer spectacle to sit back and watch, these jumps provide a striking illustration of the duality of computer games: At once representation and action, reading and configuration, communication and event, mediation and play.
Doing away with the communicative dimension of computer games can only be a provisional, pragmatic tool, intended to highlight ergodic mechanisms. Neglecting reading and mediation altogether leads to an unnecessary pessimism towards the collaboration between narrative and the ludic as discursive modes. When Markku Eskelinen points to the fact that playing with a ball and telling stories are two different things [5, page 1], he is certainly making a relevant argument as far as discursive modes are concerned, but his choice of example very conveniently hides the very contradiction (and sometimes dilemma) that makes computer games so fascinating as a peculiar textual practice: Unlike, for example, a game of football, they are representational events. A ball is not a sign, it is a ball. Football is not narrated, because it is not an utterance in the first place.
The easiest way to write off narration in a computer game, then, is to deny its relevance as an utterance. Being an act of signification, a computer game is what Kenneth Burke calls a symbolic action. Much in line with Wayne C. Booth’s general argument in the classic Rhetoric of Fiction, Burke claims that all utterances (including literature) are rhetorical, in the sense that they testify to a motivation, a purpose of some kind. Because they are symbolic actions, holding pre-configured, rhetorical meanings, computer game events are not events like any other events in the world. The actions I perform when I play, because they also have meanings within a pre-configured fictional world, are a part of a symbolic action of someone else. I may not pay any attention to it (being too busy playing), but my own actions speak to me in a voice which is not mine.
Espen Aarseth, although stressing that game events are a mode of textuality, nonetheless constructs a non-representational event-space within computer games. Given this premise, he can argue, as Eskelinen has done after him, that narration and play cannot co-exist on the same level in discourse. He claims that narration can only be about the events in a game, and that thinking otherwise would be to confuse the representation of an event with the event itself [2, page 35]. He is obviously unwilling to grant any significance to the fact that events in a non-abstract computer game are already representational, and therefore communicative, as they happen. Symbolic action is inscribed in all representational events. In story-based games, this symbolic act includes a narrative act. Narrative meaning does not depend on the user performing a rhetorical reconstruction.
THE PARADOX OF MAKE-BELIEVE
My interest in the pre-configured textuality of computer games is partly based on an empirical speculation: People buy and play computer games because they want the illusion of playing in fantastic, but familiar worlds. When they play, people do not generally want to be artists, expressing themselves in new ways. We do not want to make our own toys, even if our parents tell us to. Playing with a home-made (or imaginary) revolver is fine, but playing with an exact replica of a Colt .45 is much cooler.
Computer games presenting elaborate pre-written universes, containing typical narratives, are rhetorical-ludological bastards because we want them to be. We do not just want to play (as in football, chess or Tetris); we also want to play make-believe. A ‘story-game’, as Aarseth calls it, offers a complete cultural configuration of a world, as much as it offers a specific ludic challenge. It is not just a set-up for play, but also an object of desire, a rhetorically structured illusion.
Any story-game is, of course, a contradiction. We want freedom of action, and we want to do the same as the hero from the movies does. The illusion of potent agency in a mythical world — as any representational event — is a paradox, creating conflict when we play. I remember playing "police and bank-robbers" when we were kids, and my younger brother caught me and my sister before we robbed the bank. He ruined the play. There is a pre-written narrative. Yes, we want to be free, to play, to master and to conquer, but we also want our actions to be meaningful within a mythical fictional universe. This is the paradox of make-believe, the contradiction between the given and the agency.
The inherent paradox of mimetic games is dramatically amplified by the computer as a toy, due to its strictly rule-based regime and immediate response, coupled with its ever-increasing representational powers. This is what creates the typical oscillation between cutscenes and play. Oscillation is a standard convention in story-based computer games, and my guess is that this form will not go away. On the contrary, it is becoming a new kind of artistic language, developing its own rules.
Not trying to understand this hybrid form (because games should not be like this) is to disregard computer gaming as a unified practice. If we (eventually) want to bring aesthetic analysis together with reception studies, we need speculative concepts and theories which make relevant hypotheses about what is actually going on when people play, theories addressing questions of understanding, identity and ideology. We must try to understand what happens when play meets mediation. The puristic ludological approach will leave us relatively helpless, forcing us to conclude that players are stupid, that they have been duped by the industry, or that they do not really like games.
THE DIEGESIS OF DRAMATIC EVENTS
Suggesting the classical concept of mimesis as a relevant tool for research on event-spaces, Marie-Laure Ryan is more open to the question of narrative meanings in computer games than Eskelinen (and, possibly, Aarseth). However, she seems to agree with them about the crucial role of any possible re-telling of computer game events. Using Plato’s generic distinction between mimesis and diegesis, she searches for symbolic meaning in the player’s diegetic act of narration. As a result, the relevance of narrative in computer games rather disappointingly hinges on the possibility of a diegetic re-telling that may never take place (page 15).
The concepts of representational event and symbolic action imply that we should focus less on diegesis as a method of narrative presentation (that is, presenting by telling instead of imitating), and instead take a cue from Gérard Genette’s narratological adaptation of the term. To Genette, the diegesis is a fictional world, created by discourse. The term comes in especially handy when there are fictional worlds within fictional worlds. This diegesis is not a method of presentation, but a level in discourse. Narration, as a mode of discourse, is the act of creating this diegesis. This narration may be a patchwork of dramatic and diegetic methods of presentation.
Narrative theory traditionally tries to explain a dramatic narrative from the spectator’s point of view. As an actor in a play, enacting the events, you would relate to the narrative very differently. Also, a play may only be scripted on a general level, so that you would have to improvise the details. But still, as long as there is some kind of script limiting the range of events, the dramatic narrative would be a part of a narrative situation, establishing a diegesis in which certain events may take place. Actors do indeed act, do configure mimetic events, but they also interpret the symbolic action of an implied author.
The concept of implied author, according to Wayne C. Booth, is not about the physical and historical author, but signifies the author-in-the-text, the rhetorical voice implied by the text, a unifying focus of the reader’s interpretation. In a computer game, there is also an implied author speaking, creating the diegetic world through general descriptions, through simulations, and through the pre-written events. The ‘implied designer’ may occasionally reveal signs of individuality, but as a general rule, he takes the form of a familiar, generic voice. The cutscene is a part of this typified symbolic action.
The meaning of a representational event is partly established through the descriptive characteristics of the representation. Sniping a Colombian gangster in the head in GTA III is one thing. Doing the same to, say, a little girl would not be the same (consequently, there are no children in Liberty City). The difference between these two representations is partly rooted in their respective real-world references — a mean-looking adult male versus a pretty little girl — but also linked to a specific, typified universe constructed by the game. Within this familiar fictional ready-made, the mean-looking guy turns into (through a few, stylised hints) a very familiar gangster of the most ruthless, drug-dealing kind.
Also, the cartoonish, over-the-top style of this game-version of the gangster genre adds to the general feeling that aimless killing (for no ludic reason) is somehow permitted and cannot be taken very seriously. The cutscenes are highly stylised in terms of characters, dialogue, setting and cinematics, leaving the fictional world somewhere between the parody and the real thing.
In GTA III, it is very hard to define a representational level of ‘description’, independent of the narration that frames it. The gangster universe as a setting, and as a set-up for play, is partly founded on the formulaic stories told within it. The actions performed by the player, being representational events, become meaningful within the genre-based universe as a whole. ‘Story’ and ‘fictional world’ are two sides of the same coin: a pre-written, typified symbolic action, defining a typical identity of the playground. Even though we can imagine a similar fictional world without a single, overarching story framing the gameplay, there would at least have to be recognisable narrative elements that could give some more genre-specific substance to an otherwise vague atmosphere of urban crime. In genre fiction, description evokes implied narratives, and narration evokes implied descriptions.
CONSTRUCTING THE REPRESENTATIONAL EVENT
The events taking place in a computer game are not just representational. As ergodic actions, they are also real events, establishing meanings which, by abstraction, can be imagined independently from the particular fictional universe in which they take place. Puzzle solving, exploring, confusion, dead ends, fragmentation, construction, destruction, search, loops, randomness, backtracking etc. are formal categories, bearing cultural significances irrespective of different actualisations in specific game-worlds. Also, this concept of de-contextualized ergodic events is a very useful tool, enabling us to conceptualise the workings of a representational event.
A representational event is established through an internal relation between the pre-written and the ludic event. When there is only an external relation, there is no representational event. In the latter case, the ergodic effort is all about the configuration of the material discourse, revealing no other relation to the semantics of this discourse. A jigsaw puzzle is the classic example: The puzzle completes the picture, and that’s all there is to it. There is no other relation between the puzzle-gameplay as such and the idea of building the Eiffel Tower.
Also, the gameplay-functions of the cutscene mentioned above, like the ‘surveillance’ or the ‘catapult’, are external relations. They do not contribute to the representational event as a symbolic action, but alter the structure of the ludic action. Similarly, the shapes of the Eiffel Tower provide recognisable patterns to the jigsaw-puzzle, making the ergodic challenge more accessible.
In any representational event, there is a metaphoric relation, an analogy, between the event and the representation. This internal relation between configuration and interpretation is not only found in games, but also in ergodic literature. In Michael Joyce’s classic Afternoon: a story, the ergodic effort of the reader is an actualisation of central ideas as they are (somewhat aphoristically) expressed in the lexia. An unstable cybernetic feedback loop is operating in a text celebrating the blessings of unstable textuality. This analogy makes the ergodic work-path a representational path. The representation is not an addition to the event, but absorbs it, enabling representational action jointly performed by the user and the machine.
Similarly, in a computer game, the cybernetic feedback loop between the player and the computer is also a representation of an action in a fictional world. The game event has a double function: it is both configurative and representational, operating on the material level as well as on the semantic level, referring to the machine (the toy) as well as to the fictional world.
In a computer game, a space of possible representational events is typically enabled through a simulation. The simulation is a procedural representation, representing rules, not events. In a strategy game like Sim City, the simulation establishes a characteristic analogy between the player-machine-relation and the player-world-relation: balancing parameters is like rational management of a city. System A (the computer program) is analogous to system B (the city), both systems being a specific interpretation of the other. When system B is interpreted in terms of system A, playing with the machine is the attraction. When system A is interpreted in terms of system B, playing with a fictional world is the attraction.
In action games, because of real-time procedural representation of physical laws, the feedback loop of action-response-action is not operating only on the intellectual level. In GTA III, the ergodic involvement is crucially a matter of bodily (and partly automatized) interaction. When you play with the machine, it is as if, by analogy, you are a body in a world. A cutscene is a part of the more general strategy of providing a particularity to this body, and to this world. By inviting established fictional genres into the game, the cutscene places you as a typical subject in a typical world.
Fictional genre-worlds are not the only meaningful analogies enabling attractive representational events. A lot of computer games, from fishing-simulators to sport games, work wonderfully without them. Nevertheless, given that the typical stories of popular culture play a part in modern people’s lives, addressing our dreams and anxieties, they will also play a part in the favourite worlds we design for mimetic play.
A DREAM COME TRUE
The cutscene may indeed be a narrative of re-telling, as Ryan might say, but more importantly: It is a narrative of pre-telling, paving the way for the mimetic event, making it a part of a narrative act which takes place not after, but before the event. The cutscene casts its meanings forward, strengthening the diegetic, rhetorical dimension of the event to come.
In GTA III, narration always takes place as it is enacted — whether it is re-told or not. When the boss of the Italian family tells me how important this next mission is, and how I am going to earn his trust on my way to mobster stardom if all goes well, the event-to-come is placed within the generic world of Goodfellas and Miller’s Crossing. It is much like when you day-dream before a football match, imagining that the special girl you like is going to be in the audience and that you score the winning goal, saving the day and winning her heart, just like in the movies. And then, it turns out she is actually there, and everything actually happens that way. The match would not be the same without the previous narration of your day-dreaming. It would not have been a dramatic moment. Because of your day-dreams, this particular match turned out to be a dream come true.
Just like day-dreaming, the fictional genre gives vague expectations a form. Ergodic effort acquires new meaning through typical stories evoked by the pre-written. The cutscenes of GTA III play well on the genre. They do not tell elaborate back-stories, or try to explain complicated conspiracies. Style, setting, characters and simple stereotypical events bring the mobster stories to life. As a player-reader you are not just guided, you are spoken to. A recognisable rhetoric meets you; the voice of a genre. This voice is your dialogical partner, in a mythical world especially made for you. The distinct rhetoric of a fictional genre is perfectly suited to the single-player experience.
Playing story-games is an option. Many games do not cast any specific narrative expectations. Defending the importance of the pre-written is not based on epistemological claims about the all-encompassing narrative. It is based on an assumption about the role of stereotypes in our serious lives, about how the myths of popular culture play a part in our ergodic pleasures.
The conflict between narration and play is not a question of discursive levels (as if the first can only be about the other), but a conflict of agency. There is a balancing, and a struggle, between the agency of the story-game and the agency of the player. The mutual project of make-believe binds the two movements together. This project is a very persistent paradox, insisting on the combined pleasures of ergodic operation and symbolic seduction.