Species difference in the timing of gaze movement between chimpanzees and humans.

Fumihiro Kano, Masaki Tomonaga
DOI: 10.1007/s10071-011-0422-5

Abstract

How do humans and their closest relatives, chimpanzees, differ in their fundamental abilities for seeing the visual world? In this study, we directly compared the gaze movements of humans and chimpanzees using an eye-tracking system. During free viewing of naturalistic scenes, chimpanzees made more fixations per second (up to four) than did humans (up to three). This species difference was independent of the semantic content of the presented scenes. The gap–overlap paradigm revealed that the species difference did not result from differential sensitivity to peripherally presented stimuli per se; rather, it reflected the particular strategy each species employed to resolve the rivalry between central (fixated) and peripheral stimuli in the visual field. Finally, when presented with a movie in which small images successively appeared and disappeared at random positions at a fixed presentation rate, chimpanzees kept those images at the point of fixation for longer than did humans, outperforming humans in scanning speed. Our results suggest that chimpanzees and humans differ quantitatively in their visual strategies involving the timing of gaze movement. We discuss the functional reasons why each species may employ its particular strategy.

Keywords

Chimpanzee, Eye-tracking, Fixation duration, Gap-overlap

Introduction

The visual strategy common to human and non-human primates is the alternation of fixations and saccades. During a fixation, a particular part of the visual field is held on the fovea, where retinal acuity and colour sensitivity are highest; a saccade is a rapid eye movement that brings a new part of the visual field onto the fovea. Although we have the impression that we see all parts of the visual field with full clarity and resolution, in fact we obtain high-quality information only from the fovea. There is therefore a trade-off between holding the gaze still to analyse the currently fixated region and shifting the gaze to bring competing information onto the fovea for analysis. Hence, the timing of gaze movements reflects the way in which visual information is retrieved and analysed.
Human adults typically make about three fixations per second, so the average fixation lasts roughly 330 ms. However, the timing of gaze movements varies considerably around this mean, both within and across individuals (Henderson and Hollingworth 1999). Studies have shown that the timing of gaze movements depends on ongoing cognitive processing (Findlay and Walker 1999). For example, when reading text, humans lengthen their fixations as the text becomes more semantically demanding (Rayner 1998). Individuals also show different timings of gaze movement depending on their developmental stage and neurological condition. For example, human infants at earlier developmental stages are slower to disengage their gaze from a fixated object in response to peripheral stimuli (Hood and Atkinson 1993). Autistic children also have more difficulty disengaging their gaze from a fixated object than do typically developing children (Landry and Bryson 2004).
Several studies have been conducted in non-human primates. During free viewing of naturalistic dynamic scenes, macaque monkeys shifted their gaze sooner than did humans and thereby scanned the scenes more rapidly (Berg et al. 2009; Shepherd et al. 2010). Kano and Tomonaga (2009) found similar results in chimpanzees (Pan troglodytes), the species closest to humans: during free viewing of naturalistic static scenes, chimpanzees scanned the scenes more rapidly than did humans. Humans may therefore differ from other primate lineages in the timing of their gaze movements.
However, in contrast to the long history of eye-movement studies in macaques (e.g. Fuchs 1967), few studies have examined gaze dynamics in the non-human great apes. Such studies are needed to fill the phylogenetic gap between macaques and humans and thus to place this topic in a comparative evolutionary perspective. Given the established evidence from humans and macaques that the timing of gaze movement reflects neurological and cognitive processes (e.g. Findlay and Walker 1999), studies of the primates closest to humans should provide insight into perceptual and cognitive evolution in human and non-human primates.
Chimpanzees are genetically the closest species to humans and are known to resemble humans in a number of visual abilities, including visual acuity (Matsuzawa 1990) and the perception of colour (Matsuno et al. 2004; Matsuzawa 1985), form (Matsuzawa 1990) and faces (Parr et al. 1998, 2009; Tomonaga 2007; Tomonaga and Imura 2009). Recent evidence also suggests that chimpanzees are comparable to humans in tests involving short-term memory and the functional visual field (Inoue and Matsuzawa 2007).
Previous studies that compared the gaze movements of chimpanzees and humans using an eye-tracking system (Kano and Tomonaga 2009, 2010, 2011) found that, during free viewing of naturalistic scenes, the two species were very similar in the locations they fixated within a scene ('where' to shift gaze). For example, both species concentrated their fixations on socially informative regions such as the faces of animals. A subsequent study (Kano and Tomonaga 2011) found that chimpanzees and humans did not differ significantly in the extent to which low-level visual properties (e.g. colour, contrast, orientation) influenced the distribution of their fixations.
On the other hand, as mentioned above, these previous studies found a species difference in the duration of each fixation ('when' to shift gaze): chimpanzees made shorter fixations at each location and therefore scanned a scene more rapidly than did humans. However, this finding, like the corresponding findings in macaque monkeys, is somewhat confounded, because the scenes contained various kinds of local objects (faces, bodies, trees, food, etc.) and chimpanzees and humans showed different fixation durations for different kinds of objects (e.g. humans fixated faces for longer than did chimpanzees). In this study, we therefore prepared stimuli depicting various subject matter and asked whether the species difference reflects responses specific to particular scenes or objects or a more general response to visual stimuli. In addition, it remains unclear whether the species difference in fixation duration results from differential sensitivity to peripherally presented stimuli per se or from specific behavioural strategies for dealing with the competition between central and peripheral stimuli in the visual field (i.e. the trade-off between maintaining fixation and initiating a saccade). We addressed this issue using a version of the gap–overlap paradigm, which was originally devised to examine visual development in human infants (Hood and Atkinson 1993). Finally, we asked whether there is a type of scene in which the chimpanzees' gaze strategy is more effective than the humans' strategy for fixating local objects. Specifically, we asked whether chimpanzees track local objects at the point of fixation better than humans do when presented with a dynamic scene in which local objects successively appear and disappear at random positions at a fixed presentation rate.

General method

We conducted four experiments in this study. Chimpanzees and humans were tested using the same experimental procedure (with a few exceptions, see below) to enable a direct comparison. They were allowed to freely explore stimuli presented on a screen. Previous studies (Hattori et al. 2010; Hirata et al. 2010; Kano and Tomonaga 2009, 2010) have established that eye-tracking recordings in chimpanzees and humans are both accurate and comparable (for the details, see below).

Subjects

Six chimpanzees (five females, one male; aged 8–31 years) and 18 humans (12 women, six men; aged 18–31 years) participated in Experiments 1, 2 and 4. Experiment 3 was conducted as a follow-up to Experiment 2, with the same six chimpanzees and 10 of the same humans. The chimpanzees were members of a social group of 14 individuals living in an enriched environment with a 700-m² outdoor compound and an attached indoor residence (Matsuzawa et al. 2006). The outdoor compound was equipped with 15-m-high climbing frames, small streams and various species of trees (Ochiai and Matsuzawa 1997). Each individual had daytime access to the outdoor compound every other day. Daily meals included a wide variety of fresh fruits and vegetables fed throughout the day, supplemented with nutritionally balanced biscuits (fed twice daily), and water was available ad libitum. The chimpanzees were not deprived of food or water at any time during the study. Care and use of the chimpanzees adhered to the 2002 version of the Guidelines for the Care and Use of Laboratory Primates of the Primate Research Institute, Kyoto University. The experimental protocol was approved by the Animal Welfare and Care Committee of the Institute and by the Animal Research Committee of Kyoto University. Informed consent was obtained from all human participants. The chimpanzees had extensive experience observing images displayed on a computer screen and performing perceptual and cognitive tasks on a touch-sensitive screen (Matsuzawa et al. 2006). However, neither the chimpanzees nor the humans had been explicitly trained to scan scenes or to shift their gaze rapidly.

Apparatus

Both species were tested with the same apparatus to allow direct comparison. Participants sat still and unrestrained in an experimental booth, separated from the eye-tracking apparatus and the experimenter by transparent acrylic panels (see Fig. 1). A table-mounted eye tracker measured participants' eye movements using infrared corneal reflection (60 Hz; Tobii X120, Tobii Technology AB, Stockholm, Sweden). This eye tracker has wide-angle lenses (±40° in a semicircle above the camera), which obviates the need to restrain participants. The eye tracker and a 17-inch LCD monitor (1,280 × 1,024 pixels) were mounted on a movable platform, and the distance between the platform and the participant was adjusted to the point at which gaze was recorded most accurately (approx. 60 cm). This flexible adjustment enabled us to record the gaze movements of chimpanzees with their heads unrestrained. Gaze was recorded as coordinates relative to the monitor (i.e. not as gaze angles). One degree of visual angle corresponds to approximately 1 cm on the screen at a typical 60-cm viewing distance.
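The pixel-to-degree values quoted throughout (e.g. 1,000 × 750 pixels ≈ 26 × 20°) follow from the monitor geometry and the viewing distance. The sketch below reproduces them under assumptions not stated in the text: a 17-inch diagonal spread over 1,280 × 1,024 square pixels and a 60-cm viewing distance. It also contrasts the "1 cm ≈ 1°" rule of thumb with the exact visual angle, which is noticeably smaller for large extents.

```python
import math

# Assumed geometry (not given explicitly in the text): 17-inch diagonal,
# 1,280 x 1,024 square pixels, 60-cm viewing distance.
DIAGONAL_CM = 17 * 2.54
RES_X, RES_Y = 1280, 1024
VIEW_CM = 60.0

CM_PER_PX = DIAGONAL_CM / math.hypot(RES_X, RES_Y)


def px_to_cm(px: float) -> float:
    """On-screen extent of `px` pixels, in centimetres."""
    return px * CM_PER_PX


def rule_of_thumb_deg(px: float) -> float:
    """Visual angle using the approximation 1 cm on screen ~ 1 degree at 60 cm."""
    return px_to_cm(px)


def exact_deg(px: float, distance_cm: float = VIEW_CM) -> float:
    """Exact visual angle of an extent centred on the line of sight."""
    half_cm = px_to_cm(px) / 2.0
    return math.degrees(2.0 * math.atan(half_cm / distance_cm))


if __name__ == "__main__":
    for label, px in [("stimulus width", 1000), ("stimulus height", 750),
                      ("central-peripheral distance", 340), ("Exp. 4 image", 140)]:
        print(f"{label:28s} {px:5d} px = {px_to_cm(px):5.1f} cm "
              f"~ {rule_of_thumb_deg(px):4.1f} deg (exact {exact_deg(px):4.1f} deg)")
```

Under these assumptions, 1,000 pixels span about 26.3 cm, i.e. about 26° by the rule of thumb (matching the text), whereas the exact angle is about 24.8°.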

Fig. 1 A chimpanzee on an eye tracker
Owing to training received during a previous study (Kano and Tomonaga 2009), the chimpanzees were already skilled at sitting still in front of the eye tracker and looking, on request, at a fixation point on the screen. Five-point calibration was used for humans; for chimpanzees, the number of calibration points was reduced to two to shorten each calibration, but the calibration was repeated until maximum accuracy was obtained. Accuracy was then checked in both species by presenting five fixation points on the screen. With these procedures, six participants of each species were tested for accuracy, and the errors were small and comparable between the species (average errors of 0.62 ± 0.06 and 0.52 ± 0.05 cm (mean ± s.e.m.) on the monitor for chimpanzees and humans, respectively). Drift (calibration error due to changes in the eye surface) was checked occasionally by presenting the fixation points again.
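The accuracy check compares recorded gaze with each of the five known check points. A minimal sketch of how an average error ± s.e.m. could be obtained, assuming that error is the Euclidean distance in centimetres, averaged first within each participant and then across participants (the text does not state the aggregation). The function names and the per-participant values in the example are hypothetical.

```python
import math
from statistics import mean, stdev

Point = tuple[float, float]  # (x, y) position on the monitor, in cm


def participant_error(targets: list[Point], gaze: list[Point]) -> float:
    """Mean Euclidean distance (cm) between each check point and the gaze recorded on it."""
    if len(targets) != len(gaze):
        raise ValueError("need one gaze estimate per check point")
    return mean(math.dist(t, g) for t, g in zip(targets, gaze))


def mean_and_sem(values: list[float]) -> tuple[float, float]:
    """Group mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


if __name__ == "__main__":
    # Invented per-participant errors (cm) for six participants, for illustration only.
    errors = [0.5, 0.7, 0.6, 0.8, 0.55, 0.6]
    m, sem = mean_and_sem(errors)
    print(f"average error {m:.2f} +/- {sem:.2f} cm (mean +/- s.e.m.)")
```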

Procedure

Procedural differences between the testing of chimpanzees and humans were minimised to allow direct comparison of the two species. In each trial, an image was presented after the participant had fixated a point that appeared at the centre of the screen. Participants were then allowed to view the image freely. Participants of both species rarely kept their gaze on the fixation point after image onset (i.e. they almost always scanned the images spontaneously). Unlike some well-known reports of eye tracking in macaques (Gothard et al. 2004; Mendelson et al. 1982), we never observed fear responses to the images in the chimpanzees, either in this study or in previous ones (Kano and Tomonaga 2010). The entire session was conducted on a single day for humans but was spread over 12 days for the chimpanzees to keep each daily session short. On each day, the chimpanzees were presented with six stimuli. Daily sessions lasted 10–15 min for the chimpanzees and 20 min for the humans. After completing the session, the human participants received book coupons as payment, whereas the chimpanzees received a small piece of apple after each trial. The chimpanzees' reward was given for the initial fixation at the beginning of the trial and was independent of their viewing behaviour during image presentation. These procedural differences were intended to keep both species motivated to participate from day to day and to sustain their interest during the presentation of each image.
A fixation was scored if the gaze remained stationary within a radius of 50 pixels for at least 75 ms (more than five measurement samples); otherwise, the recorded sample was classified as part of a saccade. We excluded samples recorded during the first 200 ms of each presentation, because spontaneous gaze responses to stimuli in primates occur only after this period (as also shown in Experiment 2).
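The fixation criterion can be sketched as a dispersion-style filter over the 60-Hz samples. This is a minimal sketch, not the eye tracker's actual filter: it assumes that "stationary within a radius of 50 pixels" means every sample lies within 50 pixels of the running centroid of the current fixation, that lost samples (None) end a fixation, and that sample i is taken at i × 1000/60 ms after image onset. Because duration here is the number of samples × 1000/60 ms, the 75-ms threshold admits runs of five or more samples in this sketch.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional, Sequence

SAMPLE_HZ = 60  # Tobii X120 recording rate used in the study


@dataclass
class Fixation:
    start_ms: float
    duration_ms: float
    x: float  # centroid, pixels
    y: float


def _close_run(run, min_dur_ms):
    """Turn a run of (index, x, y) samples into a Fixation if it lasts long enough."""
    if not run:
        return None
    duration = len(run) * 1000.0 / SAMPLE_HZ
    if duration < min_dur_ms:
        return None  # too short: these samples count as part of a saccade
    xs = [x for _, x, _ in run]
    ys = [y for _, _, y in run]
    return Fixation(run[0][0] * 1000.0 / SAMPLE_HZ, duration,
                    sum(xs) / len(xs), sum(ys) / len(ys))


def detect_fixations(samples: Sequence[Optional[tuple[float, float]]],
                     radius_px: float = 50.0, min_dur_ms: float = 75.0,
                     skip_ms: float = 200.0) -> list[Fixation]:
    """Group gaze samples (None = lost) into fixations, ignoring the first skip_ms."""
    fixations, run = [], []
    for i, s in enumerate(samples):
        if i * 1000.0 / SAMPLE_HZ < skip_ms:
            continue  # samples in the first 200 ms are excluded
        if s is not None and run:
            cx = sum(x for _, x, _ in run) / len(run)
            cy = sum(y for _, _, y in run) / len(run)
            if hypot(s[0] - cx, s[1] - cy) <= radius_px:
                run.append((i, s[0], s[1]))
                continue
        # Sample is lost, starts the first run, or leaves the radius: close the current run.
        fix = _close_run(run, min_dur_ms)
        if fix is not None:
            fixations.append(fix)
        run = [(i, s[0], s[1])] if s is not None else []
    fix = _close_run(run, min_dur_ms)
    if fix is not None:
        fixations.append(fix)
    return fixations
```

For example, ten samples at one location followed by three at another (50 ms) and six at a third yields two fixations: the three-sample run is too short and is treated as saccadic.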

Experiment 1

We first presented naturalistic scenes containing various kinds of subject matter (Fig. 2b) including social (animal), ecologically meaningful (fruit), neutral (object) and meaningless (texture) stimuli to examine how chimpanzees and humans differ in their patterns of scanning the scenes and whether semantic categories might influence differences between the two species.

Fig. 2 Species differences in fixation duration. a Scan patterns in a chimpanzee and a human during free viewing, superimposed on the presented scene (toned down for clarity). The circles represent fixations (scaled in size to their durations, which are shown in milliseconds), and the lines represent saccades. The picture was presented for 3 s. b Examples of still images presented in Experiment 1. c Average fixation duration (ms) for chimpanzees (n = 6) and humans (n = 18). Error bars denote the standard error of the mean (s.e.m.)

Method

Stimulus types consisted of animal (a scene containing several animals, either chimpanzees or humans), fruit (a scene containing several kinds of fruit), object (a scene containing several nonliving objects) and texture (a textured image of a natural object). Six stimuli were prepared for each stimulus type (24 stimuli in total). Stimuli measured 1,000 × 750 pixels (26 × 20° at a typical 60-cm viewing distance) and were shown for 3 s. The presentation order of stimulus types was randomised for each participant. Areas of interest (AOIs) were defined as the face regions in the animal scenes; to allow for errors in gaze estimation, each AOI polygon was drawn slightly (approx. 20 pixels) larger than the actual outline. After the whole session was completed, we repeated any trial in which gaze data had been lost for longer than 600 ms because the participant looked away from the monitor or blinked more than twice, and replaced it with the new trial if that trial was completed satisfactorily; otherwise, the trial was excluded from the analysis. The data loss totalled 3.7 and 5.0% of all trials for chimpanzees and humans, respectively (with no bias towards a particular stimulus type). In successful trials, chimpanzees and humans viewed the monitor for equal durations.
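The AOI analysis reduces to a point-in-polygon test on fixation centroids, and the trial-repeat rule to finding the longest run of lost samples. A minimal sketch using the standard even–odd ray-casting rule, assuming the face polygons already include the ~20-pixel margin and that lost samples are coded as None at 60 Hz; the blink-count part of the criterion is not modelled. All names are hypothetical.

```python
Point = tuple[float, float]


def in_polygon(p: Point, poly: list[Point]) -> bool:
    """Even-odd rule: count polygon edges crossed by a ray going right from p."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def face_fixation_share(fixations: list[Point], faces: list[list[Point]]) -> float:
    """Proportion of fixations whose centroid lies inside any face AOI."""
    if not fixations:
        return 0.0
    hits = sum(any(in_polygon(f, face) for face in faces) for f in fixations)
    return hits / len(fixations)


def longest_gap_ms(samples, hz: int = 60) -> float:
    """Longest run of lost samples (None), converted to milliseconds."""
    longest = current = 0
    for s in samples:
        current = current + 1 if s is None else 0
        longest = max(longest, current)
    return longest * 1000.0 / hz


def needs_repeat(samples, max_gap_ms: float = 600.0, hz: int = 60) -> bool:
    """True if gaze data were lost for longer than max_gap_ms in a row."""
    return longest_gap_ms(samples, hz) > max_gap_ms
```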

Results and discussion

We found that chimpanzees scanned the scenes more rapidly; that is, the average fixation duration was shorter in chimpanzees than in humans (ANOVA, F(1, 22) = 16.68, P < 0.001, η² = 0.43; Table 1, Fig. 2c). This difference was independent of the subject matter of the scene: there was no significant effect of stimulus type on average fixation duration (F(3, 66) = 1.13, P = 0.34, η² = 0.04) or on the species difference (species × stimulus type, F(3, 66) = 0.87, P = 0.46, η² = 0.03). This does not imply that chimpanzees and humans ignored the subject matter of the scenes, as the fixations of both species were concentrated on socially informative regions, such as faces in the animal category (Fig. 3). Both chimpanzees and humans showed a high proportion of face fixations (mean ± s.e.m. of 4.7 ± 0.47 and 4.5 ± 0.20 fixations on faces, corresponding to 49 ± 5.3 and 63 ± 3.0% of all fixations on the scene, respectively). We therefore concluded that the observed species difference in fixation duration reflects a general tendency in the way each species scans visual scenes.
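The effect sizes reported alongside the F statistics can be checked by hand. For every effect with one numerator degree of freedom reported in this paper, the value agrees with partial eta squared computed from F as F·df_effect / (F·df_effect + df_error); the sketch below checks a subset. This is a consistency check on the reported numbers, not a statement of how the authors computed them.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


# (F, df_effect, df_error, reported effect size) taken from Experiments 1-4.
REPORTED = [
    (16.68, 1, 22, 0.43),  # Exp. 1, species
    (43.81, 1, 17, 0.72),  # Exp. 2, condition (humans)
    (9.07, 1, 5, 0.64),    # Exp. 2, condition (chimpanzees)
    (9.52, 1, 22, 0.30),   # Exp. 2, species x condition
    (14.25, 1, 13, 0.52),  # Exp. 3, stimulus pair
    (14.95, 1, 22, 0.40),  # Exp. 4, number of images fixated
]

if __name__ == "__main__":
    for f, d1, d2, reported in REPORTED:
        print(f"F({d1}, {d2}) = {f}: partial eta^2 = "
              f"{partial_eta_squared(f, d1, d2):.2f} (reported {reported})")
```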
Table 1 Species differences in eye-movement characteristics between chimpanzees and humans. Values are means, with standard deviation by trial and individual variance in parentheses.

| Experiment | Measure | Chimpanzees | Humans | Difference | t | P |
|---|---|---|---|---|---|---|
| Exp. 1 | Average fixation duration (ms) | 234 (20.0; 16.9) | 323 (32.3; 67.9) | 89** | 3.07 | 0.006 |
| | Total number of fixations | 9.8 (1.08; 0.43) | 7.8 (0.65; 0.95) | 2*** | 4.88 | <0.001 |
| | Number of grid cells fixated | 7.4 (1.15; 0.50) | 6.1 (0.68; 0.84) | 1.3** | 3.68 | 0.001 |
| | Average number of fixations in each grid cell | 1.31 (0.14; 0.06) | 1.28 (0.08; 0.09) | 0.03 | 0.74 | 0.46 |
| | Saccade size (pixels) | 252 (37.8; 12.5) | 241 (26.5; 30.7) | 11 | 0.86 | 0.39 |
| Exp. 2 | Response time under gap condition (ms) | 216 (10.9; 22.7) | 229 (12.2; 29.0) | 13 | 1.00 | 0.32 |
| | Response time under overlap condition (ms) | 232 (20.5; 28.3) | 334 (45.4; 82.2) | 102** | 2.92 | 0.008 |
| | Gap effect (ms) | 16 (20.8; 11.8) | 105 (47.6; 67.6) | 89** | 3.14 | 0.005 |
| Exp. 3 | Response time under gap condition (ms) | 187 (19.9; 16.9) | 202 (24.0; 39.7) | 15 | 0.91 | 0.38 |
| | Response time under overlap condition (ms) | 186 (19.2; 20.4) | 296 (63.6; 100.7) | 110* | 0.22 | 0.02 |
| | Gap effect (ms) | 1 (25.3; 18.0) | 94 (55.7; 91.9) | 95* | 0.17 | 0.027 |
| Exp. 4 | Number of images fixated | 9.3 (0.38; 0.76) | 6.1 (0.61; 1.97) | 3.2*** | 3.86 | 0.001 |
| | Total viewing time for images (ms) | 2,051 (173; 144) | 1,498 (128; 368) | 553** | 3.54 | 0.002 |
| | Average fixation duration (ms) | 213 (11.6; 22.0) | 294 (22.5; 72.3) | 81* | 2.67 | 0.014 |
| | Total number of fixations | 11.4 (0.39; 1.01) | 8.8 (0.47; 1.54) | 2.6** | 3.78 | 0.001 |
| | Average number of fixations on each image | 1.08 (0.02; 0.03) | 1.10 (0.04; 0.08) | 0.02 | 0.94 | 0.35 |
| | Saccade size (pixels) | 405 (32.1; 15.8) | 337 (29.8; 42.5) | 68** | 3.71 | 0.001 |

Note: df = 22 for Exp. 1, 2 and 4; df = 13 for Exp. 3. ***P < 0.001, **P < 0.01, *P < 0.05

Fig. 3 Spatial distribution of fixations by chimpanzees and humans for an animal scene. Fixations by six members of each species are superimposed on the presented scene (toned down for clarity)
We also confirmed that, by making shorter fixations, chimpanzees scanned the scenes more widely than did humans during the presentation period (Figs. 2a, 3). This was established by dividing the scene into an 8 × 6 grid and counting the number of cells covered by fixations (Table 1). Because the number of repeated fixations in each cell was the same for chimpanzees and humans (Table 1), the shorter (and therefore more numerous) fixations of chimpanzees covered a greater number of cells than did those of humans. The benefit of this scanning pattern for chimpanzees was examined further in Experiment 4.
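The grid analysis can be sketched as follows, assuming the 8 × 6 grid tiles the 1,000 × 750-pixel stimulus into 125 × 125-pixel cells and that fixation coordinates are expressed relative to the stimulus. Within a trial, the total number of fixations equals the number of cells fixated multiplied by the mean number of fixations per fixated cell, so if the per-cell figure is the same in both species, the species with more fixations must cover more cells.

```python
from collections import Counter


def grid_coverage(fixations, width=1000, height=750, cols=8, rows=6):
    """Return (number of distinct cells fixated, mean fixations per fixated cell)."""
    counts = Counter()
    for x, y in fixations:
        if 0 <= x < width and 0 <= y < height:  # ignore fixations off the stimulus
            counts[(int(x * cols // width), int(y * rows // height))] += 1
    if not counts:
        return 0, 0.0
    return len(counts), sum(counts.values()) / len(counts)
```

For example, four fixations, two of which land in the same cell, cover three cells with 4/3 fixations per cell; the product recovers the total of four.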

Experiment 2

In this experiment, we hypothesised that the species difference in fixation duration was due to different strategies for coping with the competition between two activities: maintaining fixation and initiating a saccade. That is, chimpanzees make shorter fixations because their strategy is biased towards the latter rather than the former, so they initiate saccades sooner than do humans (hypothesis 1). A possible alternative explanation is that chimpanzees perceive peripheral visual stimuli as more salient than humans do and therefore initiate saccades more readily (hypothesis 2), although this explanation seems unlikely given the similar perceptual capacities of the two species (see above). To test these hypotheses, we used a version of the gap–overlap paradigm, which was originally devised to examine visual development in human infants (Hood and Atkinson 1993) (Fig. 4a). Two conditions, gap and overlap, were used. In both, a central (fixated) stimulus and a peripheral target stimulus appeared in that order (Fig. 4a). Under the gap condition, the central fixation stimulus disappeared shortly before target presentation, whereas under the overlap condition it remained on the screen. The time between target presentation and initiation of a saccade to the target (i.e. saccade latency) was measured. It is well known that in humans, saccade latency to peripheral stimuli tends to be longer under the overlap than under the gap condition, because under the overlap condition maintaining fixation competes with initiating a saccade (the 'gap effect') (Findlay and Walker 1999).
Findlay and Walker's model, which is based on extensive evidence from macaque and human studies, assumes that resolving such competition is a time-consuming process and that various perceptual and cognitive events occurring centrally or peripherally (such as processing of the central stimulus and perceiving the sudden appearance of a peripheral stimulus) influence this process at levels ranging from automatic through automated (or habitual) to voluntary. During competition between fixation and initiation of a saccade, a saccade is generated when the latter overcomes the former. Therefore, if the two species differ in how they deal with competition between central and peripheral stimuli (i.e. between the activities involved in fixation and those involved in initiating a saccade), they are expected to differ in the size of the gap effect, whatever level of processing is involved (hypothesis 1). If, instead, the species difference results from differential sensitivity to the peripheral target per se, chimpanzees and humans are expected to differ in saccade latency under the gap condition (hypothesis 2).
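The two hypotheses predict differences in different quantities. A minimal sketch of those contrasts, assuming per-participant lists of saccade latencies for each condition; function names are hypothetical, and significance testing (the ANOVAs reported below) is not shown.

```python
from statistics import mean


def gap_effect(gap_ms: list[float], overlap_ms: list[float]) -> float:
    """Gap effect: mean saccade latency under overlap minus mean latency under gap."""
    return mean(overlap_ms) - mean(gap_ms)


def species_contrasts(chimp: dict, human: dict) -> dict:
    """Each argument maps 'gap' and 'overlap' to lists of saccade latencies (ms).

    Hypothesis 1 (different handling of central-peripheral competition) predicts a
    non-zero 'gap_effect_difference' (a species x condition interaction).
    Hypothesis 2 (different sensitivity to the peripheral target) predicts a
    non-zero 'gap_latency_difference'.
    """
    return {
        "gap_latency_difference": mean(human["gap"]) - mean(chimp["gap"]),
        "gap_effect_difference": gap_effect(human["gap"], human["overlap"])
        - gap_effect(chimp["gap"], chimp["overlap"]),
    }


if __name__ == "__main__":
    # Group means from Table 1 (Experiment 2), used here as single-value inputs.
    chimp = {"gap": [216], "overlap": [232]}
    human = {"gap": [229], "overlap": [334]}
    print(gap_effect(chimp["gap"], chimp["overlap"]), gap_effect(human["gap"], human["overlap"]))
    print(species_contrasts(chimp, human))
```

Fed the Experiment 2 group means from Table 1, this gives gap effects of 16 and 105 ms, a gap-effect difference of 89 ms and a gap-condition latency difference of 13 ms.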

Fig. 4 Species differences in saccade latency. a Gap–overlap paradigm in Experiment 2. b Saccade latency (ms) under the gap and overlap conditions. Error bars denote s.e.m
In Experiment 2, two stimulus types, face and object, were used so that the effect of stimulus type on saccade latency could be separated from that of condition (gap vs overlap). Face and object were treated as different stimulus types because both species attended to them differently, as shown in Experiment 1 and previous studies (Kano and Tomonaga 2009; Tomonaga and Imura 2009): faces were predominantly fixated when presented together with objects within a scene. The same stimulus type (either face or object) was presented at the central and peripheral locations within a trial. Following the results of Experiment 1, we expected the species difference to appear for both stimulus types.

Method

A central fixation stimulus and a target stimulus (approx. 180 × 180 pixels) were presented in that order (approx. 340 pixels apart from each other; 9° at a typical 60-cm viewing distance), and the target was presented randomly to the left or right 560 ms after the onset of the trial (Fig. 4a). The time between target presentation and initiation of a saccade to the target was measured (i.e. saccade latency). Under the gap condition, the central fixation stimulus disappeared 260 ms before target presentation, but under the overlap condition, the central fixation stimulus remained. Each trial lasted for 1.5 s in total. The stimulus type was either face or object within a trial. More than 50 exemplars each were prepared for face (equal number of exemplars for chimpanzee or human faces) and object, and different exemplars were used for central and peripheral locations within a trial. Each exemplar was randomly selected from the entire exemplar pool. Six trials were prepared for each condition and stimulus type (24 trials in total). The presentation order of conditions and stimulus types was randomised for each participant. After the completion of the whole session, we repeated trials in which participants shifted their gaze before the presentation of the peripheral target or failed to shift their gaze to the target. We then replaced them with new trials if those were satisfactory; if not, we excluded these trials from the analysis. The data loss totalled 2.7 and 1.1% of all trials for chimpanzees and humans, respectively (no bias for a particular stimulus type or condition).
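The trial timing above can be written as an event schedule, and saccade latency as the time from target onset to the first gaze sample that leaves the central stimulus on the target's side. The schedule follows the stated timings; the latency criterion (horizontal gaze more than 90 pixels, half the central stimulus width, from the centre) is an assumption for illustration, since the text does not state how saccade onset was detected. At 60 Hz this gives a latency resolution of about 16.7 ms.

```python
TARGET_ONSET_MS = 560  # target appears 560 ms after trial onset
GAP_MS = 260           # gap condition: central stimulus removed 260 ms before the target
TRIAL_MS = 1500        # total trial length


def schedule(condition: str) -> dict[str, int]:
    """Stimulus on/off times (ms from trial onset) for a 'gap' or 'overlap' trial."""
    if condition not in ("gap", "overlap"):
        raise ValueError(f"unknown condition: {condition}")
    central_off = TARGET_ONSET_MS - GAP_MS if condition == "gap" else TRIAL_MS
    return {"central_on": 0, "central_off": central_off,
            "target_on": TARGET_ONSET_MS, "trial_end": TRIAL_MS}


def saccade_latency_ms(gaze_x, target_side, hz=60, threshold_px=90.0):
    """Latency (ms) from target onset to the first sample on the target side.

    gaze_x: horizontal gaze offset from the central stimulus (px), one value per
    sample from trial onset, None for lost samples. target_side: +1 right, -1 left.
    Returns None if gaze left the centre before target onset (an anticipatory
    shift, which would be repeated) or never reached the target side.
    """
    for i, x in enumerate(gaze_x):
        if x is None:
            continue
        t = i * 1000.0 / hz
        if t < TARGET_ONSET_MS:
            if abs(x) > threshold_px:
                return None
        elif x * target_side > threshold_px:
            return t - TARGET_ONSET_MS
    return None
```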

Results and discussion

As previously reported, saccade latency was longer under the overlap than under the gap condition in humans (Fig. 4b; 106 ms, F(1, 17) = 43.81, P < 0.001, η² = 0.72). In chimpanzees, however, this effect was much smaller (17 ms, F(1, 5) = 9.07, P = 0.030, η² = 0.64), resulting in a significantly greater gap effect in humans than in chimpanzees (species × condition, F(1, 22) = 9.52, P = 0.005, η² = 0.30). Moreover, a significant species difference was found under the overlap condition (F(1, 22) = 8.18, P = 0.009, η² = 0.27) but not under the gap condition (F(1, 22) = 0.95, P = 0.33, η² = 0.04). No significant effect of stimulus type was found (P > 0.05).
Therefore, the species difference appeared in the difference between the gap and overlap conditions (i.e. the gap effect) rather than under the gap condition, supporting hypothesis 1. That is, chimpanzees and humans differ in their behavioural strategy for dealing with the competition between central and peripheral stimuli in the visual field rather than in their sensitivity to peripheral stimuli per se. No significant effect of stimulus type was observed in Experiment 2. This may be because central and peripheral stimuli of the same type competed equally with each other and thus did not strongly influence saccade latency. An alternative explanation is that both species attended to faces and objects similarly, although this is unlikely given the results of Experiment 1. Experiment 3 was conducted to rule out this possibility more thoroughly.

Experiment 3

In Experiment 3, a face was presented centrally along with an object at the periphery (face–object pair) or vice versa (object–face pair). As faces are known to attract the attention of both species more strongly than objects (see above), the competition is expected to be biased centrally in the former and peripherally in the latter pair. We thus expected that saccade latency would increase in the former and decrease in the latter pair. A critical question with regard to species difference is whether the species difference in the gap effect (i.e. an interaction between species and condition) would remain even when the competition between central and peripheral stimuli was experimentally biased in such a way.

Method

In Experiment 3, a face was presented centrally along with an object image at the periphery (face–object) or vice versa (object–face). Six trials were prepared for each condition (gap and overlap) and stimulus pair (face–object and object–face; 24 trials in total). The other procedures were the same as in Experiment 2. The data loss totalled 3.4 and 1.3% of all trials for chimpanzees and humans, respectively (no bias for a particular stimulus pair or condition).

Results and discussion

As in Experiment 2, a significant interaction between species and condition (gap vs overlap) was found (Fig. 5; F(1, 13) = 4.66, P = 0.050, η² = 0.264). In addition, there was a significant main effect of stimulus pair (F(1, 13) = 14.25, P = 0.002, η² = 0.52). However, stimulus pair did not interact with species, condition or both (P > 0.05).

Fig. 5 The effect of stimulus pair on saccade latency (ms) in the gap–overlap paradigm (Experiment 3). Error bars denote s.e.m
We thus confirmed that stimulus pair had a main effect, but that this effect was largely independent of the species difference in the gap effect: it did not interact with species or condition. We also confirmed that a face dominates the competition between central and peripheral vision in both species. The absence of stimulus-type effects in Experiment 2 was thus indeed due to central and peripheral locations competing equally with each other. Nonetheless, the robustness of the species effect to the manipulation of stimulus type is somewhat surprising, given that the effect of a particular stimulus type (e.g. face) on saccade latency was itself robust; further study is needed to determine whether any manipulation of stimulus type can offset the effects of species and condition. One possible explanation for this robust species effect is that the species difference involves competition at a relatively low level of processing (e.g. the automatic or well-automated perceptual/cognitive processes in Findlay and Walker's model).

Experiment 4

Experiments 2 and 3 showed that chimpanzees shifted their gaze to peripheral visual stimuli more rapidly than did humans when central (fixated) and peripheral stimuli competed in the visual field. The visual strategy of chimpanzees may therefore be advantageous in terms of scanning speed. Experiment 1 also pointed to this: chimpanzees fixated each location for less time and thereby scanned a wider area of the visual scene more rapidly than did humans. Experiment 4 tested this more directly under controlled conditions by modifying the task used in Experiment 2: we increased the number of local targets in the image (Fig. 6a), and these targets appeared and disappeared successively at random locations on the screen.

Fig. 6 Species differences in object tracking performance. a Sequence from a video presented in Experiment 4 (also see SI for the examples). b Number of fixated images (the number of images on which participants fixated at least once) and total viewing time (ms) for the images (the sum of the durations of fixations on any image). Error bars denote s.e.m

Method

At intervals of 260 ms, small images (approx. 140 × 140 pixels; 3.7 × 3.7° at a typical 60-cm viewing distance) were presented for 700 ms each at random, non-overlapping locations on the screen (within an area of 1,000 × 750 pixels; 26 × 20° at a typical 60-cm viewing distance). Each presentation lasted 3 s and consisted of 12 images in total. Two stimulus types, face and object, were prepared, and a single stimulus type (either face or object) was used within a trial. The same exemplars as in Experiments 2 and 3 were used, and a different exemplar was presented at each location within a trial. Each exemplar was randomly selected from the entire exemplar pool. Twelve trials were prepared for each stimulus type (24 trials in total). The presentation order of stimulus types was randomised for each participant. After the whole session was completed, we repeated trials in which gaze data had been lost for longer than 600 ms because the participant looked away from the monitor or blinked more than twice, and replaced them with the new trials if these were completed satisfactorily; otherwise, the trials were excluded from the analysis. The data loss totalled 2.7 and 0.0% of all trials for chimpanzees and humans, respectively (with no bias towards a particular stimulus type). In successful trials, chimpanzees and humans viewed the monitor for equal durations.

Results and discussion

Under this condition, chimpanzees shifted their gaze sooner (i.e. fixated more briefly) and thereby fixated on more images than did humans (Table 1, Fig. 6b; F(1, 22) = 14.95, P = 0.001, η² = 0.40). Chimpanzees also viewed the fixated images for a longer total time than did humans (F(1, 22) = 12.55, P = 0.002, η² = 0.36). Because both species viewed the monitor for equal durations, this suggests that humans spent more time than chimpanzees fixating on the grey background. Neither measure showed a significant effect of stimulus type (P > 0.05), which is consistent with Experiments 1 and 2.
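The effect sizes reported above match partial eta squared recovered from each F statistic and its degrees of freedom. The sketch below assumes that the reported η² is this partial form, η²p = F·df_effect / (F·df_effect + df_error); under that assumption it reproduces both reported values to two decimals.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


# Species effects reported for Experiment 4, F(1, 22)
print(f"{partial_eta_squared(14.95, 1, 22):.2f}")  # 0.40 (number of fixated images)
print(f"{partial_eta_squared(12.55, 1, 22):.2f}")  # 0.36 (total viewing time)
```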

Fig. 7 Species differences in the frequency distributions of fixation duration (Experiments 1 and 4; a and c, respectively) and saccade latency (Experiment 2; b). The data were pooled across all participants and stimuli. Consistent with previous reports in humans, both chimpanzees and humans showed a skewed distribution of fixation durations, with the mode between 200 and 300 ms. The distribution of fixation durations was shifted further towards shorter durations (leftward) in chimpanzees than in humans, and fixations longer than 300 ms were more frequent in humans than in chimpanzees. As shown in b, the distribution of saccade latency was similar for chimpanzees and humans under the gap condition. Under the overlap condition, however, it was shifted further leftward in chimpanzees than in humans

Fig. 8 Individual and species differences in the timing of gaze movements (Experiments 1, 2 and 4). From a to f, Pearson’s r was 0.64 (P = 0.001), 0.18 (P = 0.39), 0.49 (P = 0.014), 0.66 (P < 0.001), 0.81 (P < 0.001) and 0.83 (P < 0.001), respectively
These results replicated, and combined, the findings of Experiments 1 and 2. A somewhat surprising result was that chimpanzees not only fixated more of the images but also viewed those images for a longer total time than did humans. That is, chimpanzees held those images in the fovea for longer than did humans.
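The two measures in Fig. 6b can be computed from fixation and stimulus records roughly as follows. This is a minimal sketch under stated assumptions, not the authors' analysis code: the record layouts, the rule that a fixation counts for an image only if it starts inside that image's 140 × 140-pixel area while the image is on screen, and all names and sample values are hypothetical. Fixations not assigned to any image are summed as background time, the quantity the Results infer to be larger in humans.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

IMAGE_SIZE_PX = 140  # square images, from the Method


@dataclass
class Image:          # hypothetical stimulus record
    image_id: int
    x: float          # left edge (px)
    y: float          # top edge (px)
    onset_ms: float
    offset_ms: float


@dataclass
class Fixation:       # hypothetical fixation record from the eye tracker
    x: float
    y: float
    start_ms: float
    duration_ms: float


def image_under(fix: Fixation, images: List[Image]) -> Optional[int]:
    """Id of the on-screen image containing the fixation at its start, else None (background)."""
    for im in images:
        visible = im.onset_ms <= fix.start_ms < im.offset_ms
        inside = (im.x <= fix.x < im.x + IMAGE_SIZE_PX
                  and im.y <= fix.y < im.y + IMAGE_SIZE_PX)
        if visible and inside:
            return im.image_id  # on-screen images never overlap, so this is the only hit
    return None


def fig6b_measures(fixations: List[Fixation],
                   images: List[Image]) -> Tuple[int, float, float]:
    """(number of fixated images, total viewing time on images, time on background), ms."""
    hits = [(image_under(f, images), f.duration_ms) for f in fixations]
    n_fixated = len({i for i, _ in hits if i is not None})
    viewing_ms = sum(d for i, d in hits if i is not None)
    background_ms = sum(d for i, d in hits if i is None)
    return n_fixated, viewing_ms, background_ms


# Hypothetical trial fragment: two images and four fixations
images = [Image(0, 100, 100, 0, 700), Image(1, 500, 300, 260, 960)]
fixations = [Fixation(150, 150, 100, 200.0), Fixation(400, 400, 320, 150.0),
             Fixation(560, 350, 500, 250.0), Fixation(120, 120, 800, 200.0)]
print(fig6b_measures(fixations, images))  # (2, 450.0, 350.0)
```

The last fixation lands where image 0 used to be, but after its offset, so it counts as background; this is one reason the visibility check matters when targets appear and disappear.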
Note that Experiments 1–4 were designed to minimise the effects of instruction or training on the participants’ control of gaze movements. Participants therefore received no instructions, and in Experiments 2–4 the location at which a target would appear was unpredictable. Furthermore, participants’ gaze movements did not change significantly over the course of the sessions in Experiments 1–4. We examined this by dividing each session into four blocks and including block in the analysis of each dependent variable; no effect of block was found in any experiment (ps > 0.05).
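The block check described above amounts to splitting each session into four consecutive blocks in presentation order and comparing a dependent variable across them. The sketch below shows only the grouping step, with hypothetical names and values; the statistical test with block as a factor is not reproduced, and because this section does not say how sessions whose trial count is not divisible by four were split, the sketch simply keeps block sizes within one trial of each other.

```python
from statistics import mean
from typing import List, Sequence


def assign_blocks(n_trials: int, n_blocks: int = 4) -> List[int]:
    """Block index (0 .. n_blocks-1) for each trial in presentation order.
    Blocks are consecutive and their sizes differ by at most one trial."""
    return [i * n_blocks // n_trials for i in range(n_trials)]


def block_means(values: Sequence[float], n_blocks: int = 4) -> List[float]:
    """Mean of a per-trial dependent variable within each block."""
    if len(values) < n_blocks:
        raise ValueError("need at least one trial per block")
    blocks = assign_blocks(len(values), n_blocks)
    return [mean(v for v, b in zip(values, blocks) if b == k) for k in range(n_blocks)]


# Hypothetical per-trial mean fixation durations (ms) for a 24-trial session
durations = [240.0, 255.0, 230.0, 248.0, 260.0, 238.0] * 4
print(block_means(durations))
```

With 24 trials, as in Experiment 4, each block holds six consecutive trials.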

General discussion

In this study, we directly compared the gaze movements of humans and their closest living relatives, chimpanzees, using an eye-tracking system. During free viewing of a naturalistic scene, chimpanzees made more fixations per second (up to four) than did humans (up to three). This species difference was independent of the semantic variability of the presented scenes (Experiment 1). The gap–overlap paradigm revealed that, rather than resulting from differential sensitivity to the peripherally presented stimuli per se, the species difference reflected the particular strategy each species used to resolve the rivalry between central (fixated) and peripheral stimuli in the visual field (Experiment 2). Again, this species difference was independent of stimulus type (Experiments 2 and 3). Finally, when presented with a movie in which small images successively appeared and disappeared at random positions, chimpanzees held those images at the point of fixation for longer than did humans, outperforming humans in scanning speed (Experiment 4). Our results demonstrate that chimpanzees and humans differ quantitatively in their visual strategies involving the timing of gaze movement (Figs. 7, 8).
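As a rough worked conversion, a fixation rate r can be read as an average time per fixation cycle (a fixation plus the saccade that ends it). This assumes continuous viewing with no blinks or data loss, so it is only approximate:

```latex
\bar{T} = \frac{1}{r}, \qquad
r_{\mathrm{chimp}} = 4\,\mathrm{s}^{-1} \;\Rightarrow\; \bar{T}_{\mathrm{chimp}} = 250\,\mathrm{ms}, \qquad
r_{\mathrm{human}} = 3\,\mathrm{s}^{-1} \;\Rightarrow\; \bar{T}_{\mathrm{human}} \approx 333\,\mathrm{ms}
```

Because four and three fixations per second are the highest rates reported, these values are lower bounds on the average cycle in each species; at those rates the cycles differ by roughly 80 ms.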
In this study, we were interested in the spontaneous (‘natural’ or ‘habitual’) gaze movements of chimpanzees and humans rather than in their ability to control gaze movements. First, therefore, chimpanzees were rewarded independently of their viewing behaviour, i.e. even when they did not view the screen (although chimpanzees spontaneously viewed the screen in most trials; see “General method”). Second, in both species, heads were not restrained, so participants were not forced to view the screen. Third, neither species received instruction or training in scene scanning, and neither had been trained in gaze movements before this study (except for the fixation training mentioned above). Fourth, participants’ gaze movements did not change significantly over the course of the sessions, showing no indication of effects of uninstructed training. Finally, the chimpanzees were not in a heightened state of vigilance: they showed no fear response to the presented images, even in the first trial, unlike monkeys in some well-known reports (Gothard et al. 2004; Mendelson et al. 1982). Nor were chimpanzees more easily distracted than humans by the environment outside the monitor, because both species shifted their attention to events occurring on the monitor (particularly in Experiment 4). Although these results suggest that chimpanzees and humans viewed the stimuli comparably and spontaneously, we cannot completely exclude the possibility that the two species interpreted the current task (free viewing) differently. Ruling this out will require further studies in more naturalistic settings (e.g. head-mounted eye tracking; Land et al. 1999). It would also be worthwhile to train humans to move their gaze more frequently (like chimpanzees) and to observe the effect of such training on their overall perceptual and cognitive abilities.
Do the differential gaze strategies of chimpanzees and humans reflect their differential evolutionary strategies? In answering this question, we first consider the rearing histories of participants from both species. Our chimpanzee participants have lived in a safe captive environment from a young age, with a secure supply of food. Thus, it is unlikely that differences in the harshness of the rearing environment shaped the differential patterns of gaze movement between the two species. Second, with respect to cultural effects on gaze movement, people from Eastern cultures have been reported to scan scenes more broadly, and thus somewhat more rapidly, than people from Western cultures (Chua et al. 2005). As our human participants were from an Eastern culture (Japan), their cultural background would, if anything, have moved their scanning towards the more rapid chimpanzee pattern; it is therefore unlikely that cultural experience had a critical effect on the species difference. Third, as for the effect of task experience, video action game players have been reported to have better visual spatial resolution than non-players in a task requiring discrimination of symbols in peripheral vision (Green and Bavelier 2007). As our chimpanzee participants were highly experienced with computer screens, such an effect of task experience may have emerged more strongly in them than it would have in other groups of chimpanzees or, possibly, in the human participants (although the chimpanzees had never been explicitly trained to scan scenes rapidly). However, because our data suggest that chimpanzees and humans differ in their strategies for dealing with competition between central and peripheral stimuli, but not in their sensitivity to peripheral stimuli per se, this concern does not seem critical to our conclusion regarding the species difference. Nevertheless, to control more thoroughly for the effect of task experience, it will be necessary to compare chimpanzee groups that differ in their experience of using computer screens.
At least three lines of research, relating to the development, mechanism and phylogeny of gaze-movement patterns, are worth pursuing in future studies. First, as mentioned in the Introduction, the earlier their stage of development, the more slowly human infants detach their gaze from a fixated object in response to peripheral stimuli; i.e. younger infants show a larger gap effect (Hood and Atkinson 1993). In this study, chimpanzees showed the opposite tendency to that of human infants in this regard. Thus, humans may have acquired a neotenous pattern through development and evolution. Future studies could usefully compare the development of gaze-movement patterns in chimpanzees and humans to determine when the two species begin to differ. Second, autistic children are known to have difficulty in detaching their gaze from fixated objects (Landry and Bryson 2004), a tendency opposite to that of the chimpanzees in this study. It would, therefore, be worthwhile to conduct further studies to clarify the mechanism underlying such differences in the timing of gaze shifts. Finally, macaque monkeys have been reported to scan dynamic scenes more rapidly and broadly than humans do (Berg et al. 2009; Shepherd et al. 2010), similar to chimpanzees. We therefore suspect that rapid scene scanning is relatively common among non-human primates and that humans have acquired a gaze-movement strategy that diverges from that of their ancestors; further studies are necessary to test this.
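The gap effect referred to above is conventionally computed as the difference in mean saccade latency between the overlap condition (the central stimulus stays on when the peripheral target appears) and the gap condition (the central stimulus is removed before the target appears); a larger value indicates slower disengagement from the fixated stimulus. The sketch below assumes this conventional definition, since this section does not spell out the exact computation, and uses hypothetical latencies. By this measure, the pattern described for Fig. 7b (similar gap latencies in the two species, shorter overlap latencies in chimpanzees) implies a smaller gap effect in chimpanzees, the opposite of young infants.

```python
from statistics import mean
from typing import Sequence


def gap_effect_ms(overlap_latencies: Sequence[float],
                  gap_latencies: Sequence[float]) -> float:
    """Mean overlap latency minus mean gap latency (ms).
    Larger values mean the viewer disengages more slowly from the fixated stimulus."""
    if not overlap_latencies or not gap_latencies:
        raise ValueError("both conditions need at least one latency")
    return mean(overlap_latencies) - mean(gap_latencies)


# Hypothetical saccade latencies (ms) for one viewer
overlap = [300.0, 320.0, 340.0]
gap = [200.0, 210.0, 220.0]
print(gap_effect_ms(overlap, gap))  # 110.0
```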
In Experiment 4, chimpanzees fixated more of the local images than humans did and tracked those images in the fovea for longer. There are at least two possible functional interpretations of why the chimpanzees’ gaze strategy is superior to that of humans in terms of scanning speed. First, quick scanning of the scene may be an adaptation to the chimpanzees’ species-specific visual environment. For example, chimpanzees typically live in dense forest, where the appearance of conspecifics or dangerous animals is less predictable. In addition, chimpanzees live in social groups in which the hierarchy among individuals and competition over food are occasionally severe (e.g. Pusey et al. 2008). Therefore, frequent scanning of a wider area of a scene may be more beneficial for chimpanzees than for humans. It would be valuable for future studies to examine the gaze-movement patterns of primate species from various ecological and social backgrounds.
Second, rather than reflecting adaptation to specific environments, the species difference in gaze movements may reflect alternative strategies by which each species copes with its own limitations in retrieving and processing visual information. We first ask whether chimpanzees and humans retrieve visual information differently while fixating (i.e. without eye movement) and thereby employ different gaze-movement strategies. There are two possibilities. One is that humans have higher resolution in their peripheral visual field than chimpanzees, and therefore prefer to retrieve visual information from the periphery while keeping their gaze relatively static. However, this seems unlikely because in Experiments 2 and 3 we observed similar timings of gaze shifts in the two species under the gap condition (i.e. the condition that measures the viewer’s sensitivity to peripheral stimuli per se). In addition, when given the opportunity to freely explore an image, chimpanzees and humans showed a similar distribution of fixations across the scene (Experiment 1; see also Kano and Tomonaga 2009). Furthermore, in Inoue and Matsuzawa (2007), chimpanzees and humans were presented tachistoscopically (for less time than the duration of a typical fixation) with multiple numerals in both central and peripheral visual fields and were subsequently required to recognise their numerical order. Chimpanzees and humans showed comparable accuracy on this task (some chimpanzees even outperformed the human participants). The other possibility is that humans retrieve visual information from the fovea more slowly than chimpanzees do and therefore need to hold their gaze on the fixated spot for longer. This also seems unlikely because, within a typical human fixation (250–300 ms), less than 100 ms is spent retrieving the visual information necessary for object or word recognition (Rayner et al. 1981; van Diepen et al. 1998); the remainder of the fixation time is therefore spent processing the retrieved information.
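The time-budget argument above can be written out explicitly. Taking the figures cited in the text (a typical human fixation of 250–300 ms, with less than 100 ms needed for retrieval), the time left for processing is

```latex
t_{\mathrm{proc}} = t_{\mathrm{fix}} - t_{\mathrm{retr}}
  > 250\,\mathrm{ms} - 100\,\mathrm{ms} = 150\,\mathrm{ms}
  \qquad (\text{and } > 200\,\mathrm{ms} \text{ for a } 300\text{-ms fixation})
```

so more than half of a typical human fixation (over 60% at 250 ms, over 67% at 300 ms) is available for processing the retrieved information rather than for retrieving it.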
Thus, it is possible that chimpanzees and humans process the retrieved information differently and thereby employ different gaze-movement strategies. In humans, during both text reading (Rayner 1998) and visual search (Gould 1973; Hooge and Erkelens 1999), fixations tend to lengthen as the processing load for foveal information increases. Similarly, in scene viewing, fixations have been shown to be longer when the fixated object carries more semantic information (De Graef et al. 1990; Henderson et al. 1999). We therefore speculate that the longer fixations of humans reflect time allocated to higher-level processing (e.g. information integration, categorical processing, language processing). We suggest that, because of limits on processing speed (i.e. humans have no greater capacity than chimpanzees to process visual information), humans strategically retrieve a limited amount of visual information by keeping their gaze relatively static, thereby reserving time for information processing. However, although our data are consistent with this hypothesis, we did not directly examine how chimpanzees and humans process (e.g. recognise and integrate) the retrieved information. This is a limitation of the present study that should be addressed in future studies.
Our hypothesis, which involves a cost–benefit trade-off between information retrieval and high-level information processing, is similar to (but not the same as) those proposed by Humphrey (2002) and Matsuzawa (2009), which contrast short-term memory with symbolic representation. An important implication of these hypotheses is that increased brain size in humans may not necessarily have led to a general enhancement of perceptual and cognitive abilities, and that certain types of processing interference may appear more strongly in humans than in their closest living relatives, chimpanzees. Further studies are necessary to examine this issue.
In conclusion, this study directly compared the gaze movements of chimpanzees and humans using an eye-tracking system and found that the species differ in the timing of their gaze shift when scanning a visual scene. As gaze movement is known to be a sensitive indicator of cognitive processes and neurological conditions, it would be worthwhile to examine this finding more thoroughly in the future.


References

  • Berg DJ, Boehnke SE, Marino RA, Munoz DP, Itti L (2009) Free viewing of dynamic stimuli by humans and monkeys. J Vision 9(5):1–15
  • Chua HF, Boland JE, Nisbett RE (2005) Cultural variation in eye movements during scene perception. Proc Natl Acad Sci USA 102(35):12629–12633
  • De Graef P, Christiaens D, d’Ydewalle G (1990) Perceptual effects of scene context on object identification. Psychol Res 52(4):317–329
  • Findlay JM, Walker R (1999) A model of saccade generation based on parallel processing and competitive inhibition. Behav Brain Sci 22(4):661–674
  • Fuchs AF (1967) Saccadic and smooth pursuit eye movements in the monkey. J Physiol 191(3):609–631
  • Gothard KM, Erickson CA, Amaral DG (2004) How do rhesus monkeys (Macaca mulatta) scan faces in a visual paired comparison task? Anim Cogn 7(1):25–36
  • Gould JD (1973) Eye movements during visual search and memory search. J Exp Psychol 98(1):184–195
  • Green CS, Bavelier D (2007) Video-action-game experience alters spatial resolution of vision. Psychol Sci 18(1):88–94
  • Hattori Y, Kano F, Tomonaga M (2010) Differential sensitivity to conspecific and allospecific cues in chimpanzees and humans: a comparative eye-tracking study. Biol Lett 6(5):610–613
  • Henderson JM, Hollingworth A (1999) High-level scene perception. Annu Rev Psychol 50:243–271
  • Henderson JM, Weeks PA Jr, Hollingworth A (1999) The effects of semantic consistency on eye movements during complex scene viewing. J Exp Psychol Hum Percept Perform 25(1):210–228
  • Hirata S, Fuwa K, Sugama K, Kusunoki K, Fujita S (2010) Facial perception of conspecifics: chimpanzees (Pan troglodytes) preferentially attend to proper orientation and open eyes. Anim Cogn 13(5):679–688
  • Hood BM, Atkinson J (1993) Disengaging visual attention in the infant and adult. Infant Behav Dev 16(4):405–422
  • Hooge ITC, Erkelens CJ (1999) Peripheral vision and oculomotor control during visual search. Vision Res 39(8):1567–1575
  • Humphrey N (2002) The deformed transformed. In: The mind made flesh: essays from the frontiers of psychology and evolution. Oxford University Press, New York, pp 165–199
  • Inoue S, Matsuzawa T (2007) Working memory of numerals in chimpanzees. Curr Biol 17(23):1004–1005
  • Kano F, Tomonaga M (2009) How chimpanzees look at pictures: a comparative eye-tracking study. Proc Roy Soc B 276(1664):1949–1955
  • Kano F, Tomonaga M (2010) Face scanning in chimpanzees and humans: continuity and discontinuity. Anim Behav 79:227–235
  • Kano F, Tomonaga M (2011) Perceptual mechanism underlying gaze guidance in chimpanzees and humans. Anim Cogn 14(3):377–386
  • Land M, Mennie N, Rusted J (1999) The roles of vision and eye movements in the control of activities of daily living. Perception 28(11):1311–1328
  • Landry R, Bryson SE (2004) Impaired disengagement of attention in young children with autism. J Child Psychol Psychiat 45(6):1115–1122
  • Matsuno T, Kawai N, Matsuzawa T (2004) Color classification by chimpanzees (Pan troglodytes) in a matching-to-sample task. Behav Brain Res 148(1–2):157–165
  • Matsuzawa T (1985) Colour naming and classification in a chimpanzee (Pan troglodytes). J Hum Evol 14(3):283–291
  • Matsuzawa T (1990) Form perception and visual acuity in a chimpanzee. Folia Primatol 55(1):24–32
  • Matsuzawa T (2009) Symbolic representation of number in chimpanzees. Curr Opin Neurobiol 19(1):92–98
  • Matsuzawa T, Tomonaga M, Tanaka M (2006) Cognitive development in chimpanzees. Springer, Tokyo
  • Mendelson MJ, Haith MM, Goldman-Rakic PS (1982) Face scanning and responsiveness to social cues in infant rhesus monkeys. Dev Psychol 18(2):222–228
  • Parr LA, Dove T, Hopkins WD (1998) Why faces may be special: evidence of the inversion effect in chimpanzees. J Cogn Neurosci 10(5):615–622
  • Parr LA, Hecht E, Barks SK, Preuss TM, Votaw JR (2009) Face processing in the chimpanzee brain. Curr Biol 19(1):50–53
  • Pusey A, Murray C, Wallauer W, Wilson M, Wroblewski E, Goodall J (2008) Severe aggression among female Pan troglodytes schweinfurthii at Gombe National Park, Tanzania. Int J Primatol 29(4):949–973
  • Rayner K (1998) Eye movements in reading and information processing: 20 years of research. Psychol Bull 124(3):372–422
  • Rayner K, Inhoff AW, Morrison RE, Slowiaczek ML, Bertera JH (1981) Masking of foveal and parafoveal vision during eye fixations in reading. J Exp Psychol Hum Percept Perform 7(1):167–179
  • Shepherd SV, Steckenfinger SA, Hasson U, Ghazanfar AA (2010) Human–monkey gaze correlations reveal convergent and divergent patterns of movie viewing. Curr Biol 20(7):649–656
  • Tomonaga M (2007) Visual search for orientation of faces by a chimpanzee (Pan troglodytes): face-specific upright superiority and the role of facial configural properties. Primates 48(1):1–12
  • Tomonaga M, Imura T (2009) Faces capture the visuospatial attention of chimpanzees (Pan troglodytes): evidence from a cueing experiment. Front Zool 6(1):14
  • van Diepen PMJ, Wampers M, d’Ydewalle G (1998) Functional division of the visual field: moving masks and moving windows. In: Underwood GDM (ed) Eye guidance in reading and scene perception. Elsevier, Oxford, pp 337–355
Article Information
Kano, F., & Tomonaga, M. (2011). Species difference in the timing of gaze movement between chimpanzees and humans. Animal Cognition, 14, 879–892. 10.1007/s10071-011-0422-5