
Introduction

A. The problem

The human ear has a remarkable ability to assign the components of a complex waveform to their corresponding sound sources, even when several sources are active simultaneously. For example, we can effortlessly distinguish and separate human speech from simultaneous background interference such as that produced by cars or other people. This ability has been likened to watching the motion of the waves entering a harbour and using it to describe the actions of boats out in the ocean (Darwin and Carlyon, 1995). The visual analogy shows clearly how complex the auditory task can be.

It is widely assumed that the peripheral auditory system splits the input waveform into a large number of components on the basis of frequency. The levels of the individual components vary over time. Grouping is the mechanism whereby components (or portions of components) are assigned to a particular sound source. When considering source segregation, it is important to distinguish between simultaneous and sequential processing (sequential segregation is also referred to as stream segregation). Stream segregation is the mechanism by which sources are tracked through time. This means that the ear must attempt to group the components of each source even though those components are not necessarily present at the same time. For instance, a violin playing a series of notes that are widely separated in frequency can be heard as two or more separate melodies, a phenomenon used to great effect by composers such as Vivaldi. Stream segregation has been studied extensively by Bregman (1978, 1990) and his co-workers (Dannenbring and Bregman, 1976, 1978; Bregman and Campbell, 1971). Experiments in this field are often very direct: the subject may, for example, be asked to report the number of sources perceived. These experimental paradigms will not be discussed further.

Simultaneous segregation is the mechanism by which components, presented at the same time, are grouped into a number of sources. For instance, components which are at harmonic frequencies of a fundamental are likely to have come from the same source and so will be grouped together. Another cue for grouping might be correlated amplitude changes over time. Even though such a cue has an across-time aspect, it is still a valid cue for simultaneous segregation as the components are presented simultaneously. Experiments investigating simultaneous segregation are generally much less direct than stream segregation experiments. The subjects are usually not asked about the perceived number of sources. Workers such as Hall, Wright, Fantini, Schooneveldt and their co-workers (e.g. Hall et al., 1984; Wright and McFadden, 1986; Wright, 1990; Schooneveldt and Moore, 1987, 1989a, 1989b; Fantini and Moore, 1992, 1994) have concentrated on simultaneous masking experiments. Even though many of these are essentially masking experiments, the results have been described with reference to sound source separation. This thesis will concentrate on such experimental work.

B. The concept of a channel

To understand how the auditory system groups components, it is necessary to discuss the basis of these components. How is the complex input waveform split into a series of components? A simple mechanism would be to separate the input into a number of frequency channels. The peripheral auditory system is traditionally viewed as containing a bank of quasi-linear, overlapping bandpass filters that would perform such a separation. The output of each filter can be regarded as a separate channel. It is often assumed that, when detecting a sinusoidal signal in noise, the listener attends to the single filter with the highest signal-to-noise ratio (SNR), and that detection takes place when a criterion SNR is exceeded. (In reality the criterion SNR is not a single fixed value; Signal Detection Theory (Green and Swets, 1974) is concerned with the nature of the detection process.) These assumptions are collectively known as the 'Power Spectrum' model (Fletcher, 1940).
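The Power Spectrum model can be made concrete with a short numerical sketch. It assumes a symmetric rounded-exponential (roex(p)) filter shape and the commonly used ERB approximation ERB = 24.7(4.37F + 1), with F in kHz; neither is specified in this chapter. The criterion SNR K = -4 dB and all function names are hypothetical. The predicted threshold is simply K plus the noise level at the output of the filter centred on the signal.

```python
import math

def erb_hz(f_hz: float) -> float:
    """Equivalent rectangular bandwidth, using the common approximation
    ERB = 24.7 * (4.37 * F + 1) with F in kHz (an assumption here)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def roex_weight(f_hz: float, cf_hz: float) -> float:
    """Power weighting of a symmetric roex(p) filter: W(g) = (1 + p*g) * exp(-p*g)."""
    p = 4.0 * cf_hz / erb_hz(cf_hz)   # for roex(p) the ERB equals 4 * cf / p
    g = abs(f_hz - cf_hz) / cf_hz      # deviation from the CF as a proportion of the CF
    return (1.0 + p * g) * math.exp(-p * g)

def filtered_noise_db(n0_db: float, lo_hz: float, hi_hz: float,
                      cf_hz: float, step_hz: float = 1.0) -> float:
    """Level (dB) of a flat-spectrum noise band [lo_hz, hi_hz] at the filter
    output, integrated numerically with the midpoint rule."""
    n0 = 10.0 ** (n0_db / 10.0)        # spectrum level as power per Hz
    n_steps = int(round((hi_hz - lo_hz) / step_hz))
    total = sum(roex_weight(lo_hz + (i + 0.5) * step_hz, cf_hz)
                for i in range(n_steps)) * n0 * step_hz
    return 10.0 * math.log10(total)

def predicted_threshold_db(n0_db: float, lo_hz: float, hi_hz: float,
                           signal_hz: float, k_db: float = -4.0) -> float:
    """Power Spectrum model: the listener attends to the filter centred on the
    signal, and threshold is reached when the SNR at its output equals K."""
    return k_db + filtered_noise_db(n0_db, lo_hz, hi_hz, signal_hz)

# Hypothetical spectrum level of 40 dB, 1-kHz signal.
wide = predicted_threshold_db(40.0, 0.0, 2000.0, 1000.0)
narrow = predicted_threshold_db(40.0, 950.0, 1050.0, 1000.0)
```

With a wideband masker the prediction is close to K + N0 + 10log10(ERB); narrowing the noise to 100 Hz around the signal lowers it, because less noise passes through the filter.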

The assumptions that constitute the Power Spectrum model seem reasonable in many situations, but there are occasions when the model breaks down. It is assumed that the filters are linear, i.e. that they obey the conditions of superposition and homogeneity. This assumption is not valid, as the filters change shape with input level (Lutfi and Patterson, 1984; Moore and Glasberg, 1987). This gives rise to the 'upward spread of masking', whereby the masking produced by a low-frequency tone on a higher-frequency signal grows more rapidly than the level of the low-frequency tone. It is also assumed that the auditory filter attended to by the observer is tuned close to the signal frequency. There are, however, many situations in which the filter with the optimal SNR has a centre frequency (CF) different from the signal frequency. This is known as off-frequency listening (Johnson-Davies and Patterson, 1979; Patterson, 1976; Patterson and Moore, 1986). Many experiments that require (and assume) the use of a single filter therefore employ stimuli designed to counter the effects of off-frequency listening, such as notched noise (Patterson and Nimmo-Smith, 1980). Non-linear distortion products generated by the peripheral auditory system, such as cubic difference tones (Greenwood, 1971), and physical interactions such as beats can provide additional cues for detection and so alter the effective SNR.
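Off-frequency listening follows from the same kind of calculation. The sketch below assumes a symmetric, level-independent roex(p) filter, which is a simplification since real filters change shape with level, together with the common ERB approximation; the function names are hypothetical. It scans candidate CFs for a 1-kHz signal masked by noise confined below 950 Hz and returns the CF with the highest output SNR.

```python
import math

def erb_hz(f_hz):
    # Common ERB approximation (assumed): 24.7 * (4.37 * F + 1), F in kHz
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def roex_weight(f_hz, cf_hz):
    # Symmetric roex(p) power weighting, assumed level-independent
    p = 4.0 * cf_hz / erb_hz(cf_hz)
    g = abs(f_hz - cf_hz) / cf_hz
    return (1.0 + p * g) * math.exp(-p * g)

def output_snr_db(cf_hz, signal_hz, masker_lo_hz, masker_hi_hz, step_hz=1.0):
    """SNR at the output of a filter centred on cf_hz, for a unit-power signal
    and a unit-spectrum-level noise band [masker_lo_hz, masker_hi_hz]."""
    n_steps = int(round((masker_hi_hz - masker_lo_hz) / step_hz))
    noise = sum(roex_weight(masker_lo_hz + (i + 0.5) * step_hz, cf_hz)
                for i in range(n_steps)) * step_hz
    signal = roex_weight(signal_hz, cf_hz)
    return 10.0 * math.log10(signal / noise)

def best_listening_cf(signal_hz, masker_lo_hz, masker_hi_hz, candidates):
    """Return the candidate CF giving the highest output SNR."""
    return max(candidates,
               key=lambda cf: output_snr_db(cf, signal_hz, masker_lo_hz, masker_hi_hz))

cfs = [900.0 + 10.0 * i for i in range(41)]   # candidate CFs from 900 to 1300 Hz
best = best_listening_cf(1000.0, 0.0, 950.0, cfs)
```

Under these assumptions the best CF lies above the signal frequency, because shifting the filter upwards attenuates the low-frequency noise faster than it attenuates the signal.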

Correlated temporal fluctuations across different frequency regions can also alter the SNR required for detection. This has been shown with paradigms such as comodulation masking release (CMR, after Hall et al., 1984) and comodulation detection differences (CDD, after McFadden, 1987), which will be discussed further later. CMR experiments demonstrate that in signal detection tasks we are able to attend to more than one filter at once. Indeed, it would be unprofitable if we could not, as much useful information would be ignored.

While the assumptions made in the Power Spectrum model have been shown to be invalid on occasion, this does not necessarily mean that the model should be discarded. In particular, the concept that the ear contains an array of bandpass filters is useful and widely accepted. Rather, these results show where the potential pitfalls lie, so that experiments can be designed to counter them. One must be aware of how the chosen stimuli and the peripheral auditory system may interact to produce misleading results.

C. Within- vs across-channel effects.

The output of each auditory filter is assumed to be a separate channel. If the Power Spectrum model is valid, then the output of a single filter is responsible for detection of a given signal. The signal could be detected in a variety of ways; for example, the mean level at the output of that filter could increase when the signal is present. Even though the auditory filters are a theoretical concept, the frequency-resolving capability of the basilar membrane (BM) is beyond doubt. Rows of hair cells contact the BM and are sensitive to shearing motion. When the BM is displaced towards the tectorial membrane, the resulting motion of the hair cells causes the neurones attached to them to fire. Detection could therefore depend on an increased firing rate. Additionally, at frequencies below 4-5 kHz, neural spikes are approximately phase-locked to the dominant frequency at the corresponding place on the BM. An increase in SNR can therefore increase phase locking to the signal, which could also be used as a cue for detection. The firing pattern of one neurone can be likened to the output of a physical filter. When only one filter is used, detection can be described as within-channel processing.
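Phase locking is commonly quantified by the vector strength: each spike is treated as a unit vector at its phase relative to the stimulus period, and the length of the mean vector is taken. The sketch below uses synthetic spike trains; the function name, the 500-Hz frequency, the 0.1-ms jitter and the firing probability are illustrative assumptions.

```python
import cmath
import math
import random

def vector_strength(spike_times_s, freq_hz):
    """Length of the mean phase vector: 1 = perfect phase locking, near 0 = none."""
    if not spike_times_s:
        return 0.0
    total = sum(cmath.exp(2j * math.pi * freq_hz * t) for t in spike_times_s)
    return abs(total) / len(spike_times_s)

rng = random.Random(1)
f = 500.0  # hypothetical stimulus frequency (Hz)
# Hypothetical phase-locked train: a spike on about half the cycles, jittered by 0.1 ms.
locked = [k / f + rng.gauss(0.0, 1e-4) for k in range(200) if rng.random() < 0.5]
# Hypothetical unlocked train: spike times spread uniformly over the same 0.4 s.
unlocked = [rng.uniform(0.0, 0.4) for _ in range(100)]

vs_locked = vector_strength(locked, f)
vs_unlocked = vector_strength(unlocked, f)
```

A spike train that follows the signal more closely as the SNR rises would show a corresponding rise in vector strength, which is the kind of within-channel cue described above.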

It is also possible, as shown by CMR studies amongst others, for the outputs of auditory filters to be compared. This can mean that even if there is not a sufficiently large cue for detection in the channel corresponding to the signal frequency, the across-channel comparison can be used for detection. In many circumstances, both across- and within-channel effects play a role in signal detection.

D. Comodulation Masking Release

Hall et al. (1984) showed that we can combine information across auditory filters. They used an extension of Fletcher's classic band-widening experiment, in which a sinusoidal signal is masked by a bandpass random (Gaussian) noise centred at the signal frequency. In the original experiment, as the band of noise was widened with the spectrum level held constant, the threshold increased as more noise passed through the filter, up to a certain point (Figure 1.1, dashed line). Beyond a certain width, known as the critical bandwidth (CBW), the threshold remained constant. Fletcher used this finding as the basis for the Power Spectrum model. Hall et al. introduced an extra condition in which they multiplied the Gaussian noise by a very low-frequency narrow band noise, producing a noise with slow random amplitude fluctuations that were the same in different frequency regions. This comodulation across frequency produced a decrease in the masked threshold as the band of noise was widened (Figure 1.1, solid line), an effect termed Comodulation Masking Release (CMR). CMR is not predicted by the Power Spectrum model since, according to that model, energy far outside the critical bandwidth should not affect the threshold. It has therefore been interpreted as an 'across-channel' effect.
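Fletcher's account of the band-widening result can be sketched with an idealised rectangular critical band: only noise inside the band contributes, so the predicted threshold rises by about 3 dB per doubling of bandwidth until the bandwidth reaches the CBW, and is constant thereafter. The 130-Hz CBW, the 40-dB spectrum level, the criterion K = -4 dB and the function name are assumptions for illustration.

```python
import math

def band_widening_threshold_db(bw_hz, n0_db, cbw_hz, k_db=-4.0):
    """Threshold predicted with an idealised rectangular critical band: only the
    part of the noise band that falls inside the CBW contributes to masking."""
    effective_bw = min(bw_hz, cbw_hz)
    return k_db + n0_db + 10.0 * math.log10(effective_bw)

# Spectrum level held at 40 dB (hypothetical); CBW assumed to be 130 Hz.
for bw in (25, 50, 100, 200, 400, 800):
    print(bw, round(band_widening_threshold_db(bw, 40.0, cbw_hz=130.0), 1))
```

This reproduces only the shape of the dashed curve; nothing in such a model can produce the fall in threshold seen with comodulated noise.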

Figure 1.1. Detection thresholds for a 1 kHz signal centred in either random noise or noise amplitude modulated at a slow, irregular rate, as a function of the noise bandwidth. Masker spectrum level is held constant (From Hall et al. 1984)

Since then, most work on CMR has used a narrow band of noise of constant bandwidth centred at the signal frequency (the on-frequency band, OFB), together with one or more flanking narrow bands of noise centred at other frequencies. The envelope of each flanking band can be manipulated individually: it can be correlated with that of the OFB, or independent of it. Using narrow bands of noise therefore permits much greater control over the stimuli. In a band-widening experiment, increasing the bandwidth of the masking noise can only add correlated modulation to the filters centred at frequencies remote from the signal frequency. In noise-band experiments, remote filters can very easily be presented with either correlated or uncorrelated modulation.

In noise-band CMR experiments, there are three simple conditions: no flanking bands (R, or reference, condition), flanking bands correlated with the OFB (C, or correlated, condition) and flanking bands uncorrelated with the OFB (U, or uncorrelated, condition). The CMR can be defined either as the difference in threshold between the R and C conditions or as the difference in threshold between the U and C conditions, denoted CMR(R-C) and CMR(U-C) respectively. The two measures give similar values, though CMR(U-C) tends to be slightly larger than CMR(R-C). This appears to reflect an across-channel interference effect, whereby uncorrelated flanking bands raise the threshold (Schooneveldt and Moore, 1987).
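The two CMR measures, and the way they differ, can be written out explicitly. The thresholds below are hypothetical numbers chosen only to show the arithmetic, and the function name is illustrative.

```python
def cmr_db(reference_threshold_db, correlated_threshold_db):
    """Masking release: reference threshold minus threshold with comodulated flankers."""
    return reference_threshold_db - correlated_threshold_db

# Hypothetical thresholds (dB SPL) for the R, U and C conditions.
t_r, t_u, t_c = 60.0, 62.0, 50.0
cmr_rc = cmr_db(t_r, t_c)          # CMR(R-C)
cmr_uc = cmr_db(t_u, t_c)          # CMR(U-C)
interference = t_u - t_r           # elevation caused by the uncorrelated flankers
```

The difference between CMR(U-C) and CMR(R-C) is exactly the threshold elevation produced by the uncorrelated flankers relative to the OFB alone.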

There are two common methods of producing such narrow band noises. A low-pass noise can be multiplied by a sinusoid at the centre frequency of the desired band, or a true Gaussian narrow band noise can be constructed (e.g. by adding together many closely spaced sinusoids with random phases and Rayleigh-distributed amplitudes). Narrow band noises have inherent slow random amplitude fluctuations and so, unlike the noise used by Hall et al. (1984), do not need to be modulated. Gaussian noise and multiplied noise differ both in their fine structure and in their envelope statistics (Rice, 1954; Schooneveldt and Moore, 1989a). Gaussian noise tends to produce slightly larger CMRs than 'multiplied' noise, because thresholds in the OFB-alone (R) condition are lower in multiplied noise (Moore et al., 1990b). The envelope of multiplied noise falls to zero more often (at every zero crossing of the low-pass noise) and is thus more 'peaky'. When the signal is added, these deep troughs tend to be filled in, so the number of near-zero envelope values is reduced. Gaussian noise has fewer such deep minima, hence adding the signal changes its envelope statistics less than it does for multiplied noise. The smoothing caused by the signal can be used as a cue for detection, and this cue is more prominent for multiplied noise. Such a cue gives rise to a within-channel effect. Because thresholds are lower in the 'reference' condition (OFB alone), CMR(R-C) is smaller when multiplied noise is used.
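The sketch below generates both kinds of narrow band noise with the two methods just described and compares how often their envelopes fall close to zero. Only the envelopes are computed (the carrier is assumed to lie well above half the bandwidth), so the 500-Hz sampling rate applies to the envelope alone. The 100-Hz bandwidth, 0.5-Hz component spacing, seed, 0.2 × RMS criterion and function names are illustrative assumptions. In theory about 4% of samples fall below 0.2 × RMS for a Gaussian band, whose envelope is Rayleigh-distributed, and about 16% for multiplied noise, whose envelope is the magnitude of a Gaussian low-pass noise.

```python
import cmath
import math
import random

def gaussian_band_envelope(bw_hz, spacing_hz, times, rng):
    """Envelope of a Gaussian narrow band built from closely spaced sinusoids with
    random phases and Rayleigh-distributed amplitudes. Component frequencies are
    expressed relative to the centre frequency, so the magnitude of the complex
    sum is the envelope."""
    n = int(round(bw_hz / spacing_hz))
    comps = [(2 * math.pi * (-bw_hz / 2 + (k + 0.5) * spacing_hz),
              math.hypot(rng.gauss(0, 1), rng.gauss(0, 1)),   # Rayleigh amplitude
              rng.uniform(0, 2 * math.pi))                    # random phase
             for k in range(n)]
    return [abs(sum(a * cmath.exp(1j * (w * t + ph)) for w, a, ph in comps))
            for t in times]

def multiplied_band_envelope(bw_hz, spacing_hz, times, rng):
    """Envelope of low-pass noise (cut-off bw/2) multiplied by a sinusoidal
    carrier. The envelope is |low-pass noise|, which reaches zero at every zero
    crossing of the low-pass noise."""
    n = int(round(bw_hz / 2 / spacing_hz))
    comps = [(2 * math.pi * (k + 0.5) * spacing_hz,
              math.hypot(rng.gauss(0, 1), rng.gauss(0, 1)),
              rng.uniform(0, 2 * math.pi))
             for k in range(n)]
    return [abs(sum(a * math.cos(w * t + ph) for w, a, ph in comps)) for t in times]

def fraction_near_zero(env, frac=0.2):
    """Proportion of samples at which the envelope is below frac * its RMS value."""
    rms = math.sqrt(sum(e * e for e in env) / len(env))
    return sum(e < frac * rms for e in env) / len(env)

rng = random.Random(7)
times = [i / 500.0 for i in range(2000)]   # 4 s of envelope sampled at 500 Hz
gauss_frac = fraction_near_zero(gaussian_band_envelope(100.0, 0.5, times, rng))
mult_frac = fraction_near_zero(multiplied_band_envelope(100.0, 0.5, times, rng))
```

The larger proportion of near-zero envelope values for multiplied noise corresponds to the deeper troughs that an added signal can fill in.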

Using separate bands of noise allows selected frequency regions to be tested and means that the temporal relationship between the bands can be altered. It is found that if the flanking (or cue) bands are comodulated (either randomly or sinusoidally) with the on-frequency band (also known as the masker band), then signal thresholds decrease relative to the condition where there are no flanking bands present. If the flanking bands have a different envelope from the masker band, then thresholds do not decrease. Indeed, they may increase due to across-channel interference effects (Schooneveldt and Moore, 1987), as noted earlier.

One explanation for the results of CMR-type experiments is framed in terms of component grouping by correlated amplitude changes. In 1963, von Békésy reported that two tones presented to opposite ears were fused when common amplitude modulation, at a rate of between 5 and 50 Hz, was applied to the two components. As the tones were presented to opposite ears, there could be no within-channel processing. Bregman et al. (1985) reported that pitch discrimination of the upper tone in a two-tone complex was more difficult when the tones were modulated in phase (0° phase condition) than when they were modulated in opposite phase (180° phase condition). This was taken as evidence of an across-channel process grouping the two tones. It is possible, however, that the results of Bregman et al. were mediated by a within-channel cue. The two tones were presented in quiet (unlike the experiments of Strickland et al., 1989, and Yost et al., 1989), which meant that the outputs of filters with CFs between those of the two tones would show a pattern of beating. In the 180° phase condition, the output of a simulated auditory filter centred between the two tones is alternately dominated by each of the two components. It would therefore be possible for the listener to attend to those half-cycles of modulation when the response of a filter with a CF between the two tones was dominated by the target component (Darwin and Carlyon, 1995). This is not the case if the tones are modulated coherently. By analogy, in a CMR-type task it is possible that flanking and masking bands with correlated modulation are grouped together and heard as one sound source. This is reasonable behaviour, as the components of a complex sound tend to change in amplitude in a correlated way: their onsets are roughly simultaneous, as are their offsets.
If a sound source suddenly increases in overall intensity, then it is likely that all the components of the complex sound will also increase in intensity. If the flanking bands and the on-frequency band are correlated and the signal is not present, the output of a filter centred on the signal frequency will have the same modulation pattern as the output of a filter centred on one of the flanking bands, so the corresponding channels would be grouped together. When the signal is presented, the low-level troughs in the output of the filter centred on the signal frequency would be 'filled in', introducing a degree of decorrelation that indicates the presence of the signal. If the bands are not correlated, then the on-frequency and flanking bands will not be grouped whether or not the signal is present. Thus, when the flanking band is remote from the signal frequency, there should be little difference between conditions in which the flanking bands have a different envelope from the masker band and conditions in which no flanking bands are present, as in both cases the signal channel will not be grouped with anything else. This is generally the case, apart from the small interference effects discussed above.

There are many factors that can influence the magnitude of CMR. CMR does not vary much with signal frequency over a wide range (Schooneveldt and Moore, 1989b), and it can even be demonstrated when the flanking bands are presented to the opposite ear from the masker band and signal (Cohen and Schubert, 1987b; Schooneveldt and Moore, 1987). CMRs are sometimes greater for flanking bands that are close to the signal frequency (Hall et al., 1990), although this effect may be partly due to within-channel cues (Schooneveldt and Moore, 1987). A masking release can be seen when the masker bandwidth is less than the bandwidth of the auditory filter (in band-widening experiments) or when the masker and cue bands fall within the same critical band (in experiments with separate bands; Schooneveldt and Moore, 1987, 1989a; Moore et al., 1992). The masking release in such cases does not arise from comparing the outputs of an array of filters, and so is not a 'true' CMR, as it is not an across-channel effect. Figure 1.1 shows that, even at the critical bandwidth, the threshold is lower for the amplitude-modulated noise than for the random noise. Several explanations have been advanced to account for this. Adding the signal to the masker tends to fill in the dips in the modulated envelope, so the envelope at the filter output will be flatter (Schooneveldt and Moore, 1989b), and this change in modulation depth could be used as a detection cue. Another possible within-channel cue (when flanking bands are present) is the pattern of phase locking in neurones tuned close to the signal frequency. When the masker is at a peak, the output will be predominantly phase-locked to the masker; when the masker is close to a minimum, the neurones will phase lock partly to the signal.
If a flanking band is present within the same critical band as the masker band, the two bands will interact to give a waveform whose centre frequency corresponds to the mean frequency of the two bands, and thus differs from the signal frequency. Note that this is not true for symmetrically placed flanking bands or wideband noises, and so cannot explain the results of Hall et al. (1984). Such within-channel cues mean that it is possible to overestimate the 'true' CMR. As there are no such interactions with dichotically presented stimuli, it has been suggested that this paradigm may give the 'true' magnitude of CMR.

E. Models to account for CMR

When attempting to describe the mechanisms behind CMR, it is necessary to move beyond vague explanations in terms of grouping. While grouping explanations may adequately predict the general form of the results, their essentially 'black box' nature means that they cannot be taken as rigorous explanations. Therefore, quantifiable mechanisms that are not directly related to grouping must be sought.

One class of theory to account for CMR assumes that the auditory system compares the envelopes of the outputs of different auditory filters (Hall et al., 1984; Buus, 1985; Schooneveldt and Moore, 1987). The outputs of filters centred away from the signal frequency provide a standard modulation pattern to compare to the output of the filter centred at the signal frequency. A difference in modulation pattern indicates the presence of the signal. An alternative 'dip-listening' model assumes that the flanking bands indicate the best times to listen for the signal at the output of the filter tuned to the signal frequency, i.e. during the dips in the masker envelope.

Buus (1985) proposed a specific mechanism for detecting across-filter disparities, similar to Durlach's (1963) 'Equalization-Cancellation' (EC) model for detecting signals and maskers that come from different spatial locations. The auditory system is assumed to 'equalize' the envelopes of the outputs of filters tuned away from the signal frequency to the envelope of the output of the filter at the signal frequency when the signal is not present. This requires the signal to be intermittent, so that the equalization can be established while the signal is absent. The equalized envelope is then subtracted from the envelope of the output of the auditory filter centred at the signal frequency, so that if no signal is present there is no output. Any remaining output therefore indicates the presence of the signal.
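A minimal sketch of the equalization-cancellation idea as described above, working directly on hypothetical envelope samples. The envelope shapes, the least-squares gain used for equalization, and the assumption that the signal adds in power to the masker envelope (filling in its troughs) are simplifications introduced here, not details of Buus's (1985) model; the function names are hypothetical.

```python
import math

def equalise_gain(on_env, flank_env):
    """Equalization: least-squares gain mapping the flanking-band envelope onto the
    on-frequency envelope, estimated in a signal-absent interval."""
    return sum(o * f for o, f in zip(on_env, flank_env)) / sum(f * f for f in flank_env)

def cancel_residual(on_env, flank_env, gain):
    """Cancellation: mean-square envelope remaining after subtracting the
    equalized flanking envelope. A non-zero residual indicates a signal."""
    return sum((o - gain * f) ** 2 for o, f in zip(on_env, flank_env)) / len(on_env)

# Hypothetical comodulated envelopes (1 s sampled at 1 kHz).
t = [i / 1000.0 for i in range(1000)]
masker = [1.0 + 0.9 * math.sin(2 * math.pi * 3.0 * x) * math.cos(2 * math.pi * 0.7 * x)
          for x in t]
flanker = [2.0 * m for m in masker]                   # same shape, 6 dB higher
independent = [2.0 + 1.8 * math.sin(2 * math.pi * 8.3 * x + 1.0) for x in t]

# Assumption: a steady signal adds in power to the masker envelope.
signal_amp = 0.3
on_with_signal = [math.sqrt(m * m + signal_amp ** 2) for m in masker]

g = equalise_gain(masker, flanker)                    # learnt without the signal
residual_no_signal = cancel_residual(masker, flanker, g)
residual_with_signal = cancel_residual(on_with_signal, flanker, g)

# With an independent flanker, cancellation fails even without a signal.
g_ind = equalise_gain(masker, independent)
residual_independent = cancel_residual(masker, independent, g_ind)
```

With comodulated bands the residual is zero until the signal is added, whereas an independent flanker leaves a large residual whether or not the signal is present, so it offers no help.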

Richards (1987) proposed another model, in which envelopes are extracted from the filter outputs and the correlation between the envelope at the signal frequency and that at the flanking frequency is determined; a low correlation indicates the presence of a signal. Moore and Emmerich (1990) have argued that CMR and envelope correlation discrimination are probably based on different mechanisms. They noted that decreasing the noise bandwidth produces larger CMRs (because the slower amplitude fluctuations are easier to follow) but makes envelope correlation discrimination harder.
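The decision statistic of a correlation-based model can be sketched with a Pearson correlation between sampled envelopes. This is a simplified stand-in for Richards' model: the envelope shapes, the signal level and the assumption that the signal adds in power to the on-frequency envelope are hypothetical.

```python
import math

def envelope_correlation(x, y):
    """Pearson correlation between two sampled envelopes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical comodulated envelopes (1 s sampled at 1 kHz).
t = [i / 1000.0 for i in range(1000)]
flank = [1.0 + 0.9 * math.sin(2 * math.pi * 3.0 * x) * math.cos(2 * math.pi * 0.7 * x)
         for x in t]
on_no_signal = list(flank)                                      # comodulated masker band
on_with_signal = [math.sqrt(e * e + 0.7 ** 2) for e in flank]   # troughs filled in (assumed)

r_without = envelope_correlation(on_no_signal, flank)
r_with = envelope_correlation(on_with_signal, flank)
```

Filling in the troughs lowers the correlation only slightly here, which illustrates why such a model needs a fine criterion on the correlation to detect the signal.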

The results of much work (Hall and Grose, 1988; Moore, 1990; Moore and Schooneveldt, 1990) suggest that several different cues may be used to detect a signal in a comodulated masker. Grose and Hall (1989) used a sinusoidally amplitude-modulated (SAM) masker, as this allows the modulation depth to be varied, and showed that some listeners appeared to use a dip-listening strategy whereas others appeared to use an EC mechanism. The masker band had a fixed modulation depth of 63%, and the flanking bands had a modulation depth of either 63% or 100%. A dip-listening model predicts better performance with 100%-modulated flankers, as they would provide clearer cues as to when to listen for the signal. An EC model predicts better performance with flankers modulated identically to the masker (i.e. at 63% depth). The subjects tended to split into two groups: one group performed better with 63% depth and the other with 100% modulation depth.
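The competing predictions can be put in numbers. For a SAM envelope 1 + m sin(2πf_m t), the peak-to-trough ratio is 20log10((1 + m)/(1 - m)): about 12.9 dB at 63% depth and unbounded at 100%, which is why fully modulated flankers mark the dips more sharply. Conversely, an EC-style comparison works best when the flanker envelope can be scaled to match the masker envelope exactly, which only happens when the depths are equal. The function names and the least-squares matching rule are illustrative assumptions.

```python
import math

def peak_to_trough_db(m):
    """Peak-to-trough ratio of a SAM envelope 1 + m*sin(2*pi*fm*t), 0 <= m <= 1."""
    if m >= 1.0:
        return math.inf                  # 100% modulation: the troughs reach zero
    return 20.0 * math.log10((1.0 + m) / (1.0 - m))

def equalisation_mismatch(masker_depth, flank_depth, n=1000):
    """Mean-square error left after scaling a flanker envelope of one depth to
    best match a masker envelope of another depth over one modulation cycle."""
    phases = [2 * math.pi * k / n for k in range(n)]
    masker = [1 + masker_depth * math.sin(p) for p in phases]
    flank = [1 + flank_depth * math.sin(p) for p in phases]
    gain = sum(a * b for a, b in zip(masker, flank)) / sum(b * b for b in flank)
    return sum((a - gain * b) ** 2 for a, b in zip(masker, flank)) / n
```

Under this simplification a 63%-modulated flanker can be matched to the masker perfectly, while a 100%-modulated one leaves a residual mismatch.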

Many workers have tested the relative contributions of energy at masker envelope maxima and minima (Grose and Hall, 1989; Moore et al., 1990a; Hall and Grose, 1988, 1991). Grose and Hall (replicated by Moore et al., 1990a) added the signal selectively to either the masker peaks or the masker dips, i.e. the signal consisted of a number of tone bursts. The masker was a 100% SAM tone, to which the signal was added in phase. No CMR was observed when the signal was added only at masker peaks, but very large CMRs occurred when it was added at masker dips. This supports the dip-listening model, as it shows that information in the dips is more important.
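A rough calculation shows why the dips carry more information for a listener who can time its listening. The sketch computes the mean SNR for a fixed tone burst gated on around a peak or around a dip of a 100%-modulated masker envelope. The burst width (a quarter of the modulation cycle), the amplitudes and the function name are hypothetical, and the calculation ignores filtering and the in-phase summation of signal and masker; it does not model CMR itself.

```python
import math

def burst_snr_db(signal_amp, masker_amp, centre_phase, half_width=math.pi / 4, n=1000):
    """Mean SNR (dB) for a steady tone burst gated on over part of one cycle of a
    100%-modulated SAM masker whose amplitude is masker_amp * (1 + sin(phase))."""
    phases = [centre_phase - half_width + 2 * half_width * (k + 0.5) / n
              for k in range(n)]
    masker_power = sum((masker_amp * (1.0 + math.sin(p))) ** 2 for p in phases) / n
    return 10.0 * math.log10(signal_amp ** 2 / masker_power)

peak_snr = burst_snr_db(0.1, 1.0, math.pi / 2)        # burst centred on an envelope peak
dip_snr = burst_snr_db(0.1, 1.0, 3 * math.pi / 2)     # burst centred on an envelope dip
```

Under these assumptions the same burst enjoys an SNR more than 20 dB higher in the dip than at the peak.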

Moore et al. (1990a) showed that adding the signal out of phase with the carrier, so that it decreased the level, did not produce a CMR whether it was presented in the peak region or in the dip region. Such changes produce significant across-channel decorrelation and so cannot be explained by an EC or correlation-comparison model. Unfortunately, they are also difficult to reconcile with a dip-listening model, even if one assumes that subjects are predisposed to listen for an increase in level rather than a decrease. If subjects listened for increases, they would effectively detect a signal in the non-signal interval (a two-alternative forced-choice technique was used). Moore et al. provided feedback in the hope that subjects would be able to overcome this problem. It appears that across-frequency comparisons only improve performance when the level in the on-frequency band is momentarily greater than that in the flanking bands. This makes sense, as subtractive conditions do not occur in real life, and so it may be assumed that the auditory system ignores such cues.

When an in-phase signal is presented only at the peak region of the masker envelope, there is an increase in overall level, yet, as discussed above, no CMR is seen. This is surprising in view of the results of Hall and Grose (1988), who showed that large CMRs can be produced by adding a copy of the masker to itself as the signal, which also increases the overall level. However, this sort of experiment should be considered a special case, similar to Profile Analysis experiments, in which the signal is a pure tone added to an identical component of a complex of background tones (Green, 1988). Envelope subtraction without prior equalization could account for such a CMR. A dip-listening model cannot, as the SNR at the signal frequency is the same in the peak and dip portions of the envelope.

It is possible that CMR involves the same mechanisms that allow us to hear out two separate sources whose components fall in overlapping frequency ranges. Hall and Grose (1990) measured CMR in the presence of six comodulated 20-Hz-wide flanking bands. The large CMR obtained (15 dB) was reduced to roughly 3 dB when two 'deviant' flanking bands (uncorrelated with the other bands) were added on either side of the signal frequency. However, the CMR was partly restored when more co-deviant flanking bands (i.e. correlated with each other, but not with the on-frequency band) were added. The CMR was not restored when independent 'multideviant' bands (i.e. uncorrelated with each other and with the on-frequency band) were added. The deviant bands were interspersed with the correlated bands. When all the deviant bands were correlated with each other, they were effectively heard out as a separate source, even though they were interspersed in frequency with the background bands. CMR could therefore be due to the partitioning of filter output waveforms between the masker and signal sources, in varying proportions. The results of Hall and Grose (1988) described previously do not fit this view of CMR, as the signal cannot really be considered to come from a separate source.

In conclusion, the results of a large number of experiments show that there is no single mechanism that alone can account for CMR.

F. Comodulation Detection Differences

Another experimental paradigm is based on a slightly different approach: the Comodulation Detection Difference (CDD), a term coined by McFadden (1987). In CMR, the signal is usually a sinusoid masked by an on-frequency narrow band noise. In CDD, the signal to be detected is itself a narrow band noise and there is no on-frequency masker. Again, flanking bands are present that can be comodulated with the signal band. Because CDD was originally an extension of CMR and the conditions for the two are similar, the terminology can be confusing. Some workers (McFadden, 1987) refer to the masking narrow band noises as 'cue' bands. I will, however, refer to the 'signal' band and the 'masking' or 'flanking' bands, as my experiments are essentially pure masking experiments. The interesting difference between CMR and CDD is that, whereas in CMR comodulated flanking bands decrease the signal threshold relative to uncorrelated flanking bands, in CDD such comodulation increases the signal threshold. This difference is not surprising if we assume that object segregation mechanisms are involved, as both types of experiment study the 'capture' of information in one frequency region by similarly modulated excitation in another. In CMR, the on-frequency band is captured by the comodulated flanking bands and so is, in effect, less able to mask the signal. In CDD, the signal band is captured by the comodulated masker bands and so is harder to detect.

As described earlier, there has been much discussion about the choice of reference condition in CMR experiments. Some workers have taken the CMR to be the difference between thresholds with comodulated and non-comodulated flankers; others have taken it to be the difference between thresholds with comodulated flankers and with no flankers (i.e. masker band only). These two measures often differ, as non-comodulated (uncorrelated) flanking bands often raise the threshold relative to that produced by the masker band alone. The reference condition is much more clear-cut in CDD experiments: a condition with no maskers simply gives the absolute threshold, so the uncorrelated-masker condition is the natural reference. The CDD is therefore defined as the threshold with comodulated (correlated) maskers minus the threshold with uncorrelated maskers. Some workers perform this subtraction the other way round, giving negative CDDs, to emphasise the difference between CMR and CDD experiments. In many CDD experiments the masking bands are always correlated with each other, so when the signal band is uncorrelated with the maskers, the maskers are referred to as 'co-uncorrelated'.
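Because the sign conventions differ between CMR and CDD, it may help to write the CDD definition out explicitly. The thresholds below are hypothetical and the function name is illustrative.

```python
def cdd_db(threshold_correlated_db, threshold_uncorrelated_db):
    """CDD as defined here: threshold with correlated maskers minus threshold with
    (co-)uncorrelated maskers. Positive values mean comodulation made the signal
    band harder to detect."""
    return threshold_correlated_db - threshold_uncorrelated_db

# Hypothetical thresholds (dB SPL) for illustration only.
t_corr, t_uncorr = 55.0, 48.0
cdd = cdd_db(t_corr, t_uncorr)
cdd_reversed = -cdd   # the opposite sign convention used by some workers
```

Note that a CMR is positive when comodulation lowers the threshold, whereas a CDD defined this way is positive when comodulation raises it.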

There is one major difference between the stimuli used in studies of CDD and those used in studies of CMR. In studies of CMR, the masker and flanking bands are usually at about the same level. Indeed, if the levels differ too much, the CMR decreases (McFadden, 1986); a 15- to 20-dB difference is sufficient to reduce CMR to zero (Hall, 1986). Moore and Shailer (1991) showed that with dichotic presentation of the stimuli (i.e. flanking bands presented to the opposite ear from the masker and signal), altering the level of the on-frequency masking band while holding the level of the flanking bands constant had virtually no effect on the CMR. However, when the masking band was held at a constant level and the level of the flanking bands was altered, there was a significant effect. Unfortunately, they did not carry out the experiment monaurally with the flanking bands held constant and the on-frequency band varied. CDDs cannot be measured effectively with dichotic presentation, as contralateral (across-ear) masking is very small. In CDD experiments, the signal band (equivalent to the on-frequency band in CMR) is, by definition, close to masked threshold and is usually much lower in level than the flanking bands. This may mean that, for monaural presentation of the stimuli, similar level differences would produce no CMR. It is not clear how to interpret this discrepancy.

There has been little published work on CDDs compared with the amount published on various aspects of CMR. McFadden (1987) and Cohen and Schubert (1987b) observed CDDs independently. Cohen and Schubert (1987b) used a very simple paradigm with two 100-Hz-wide bands of noise, each created by multiplying a low-pass noise by a sinusoid. The low-pass noise was either the same for the two noise bands, resulting in correlated temporal envelopes, or different, resulting in independent envelopes. One band was presented as a continuous masker with a centre frequency of 1000 Hz and an overall level of 73 dB SPL. The other band was used as the signal, with centre frequencies ranging from 200 to 6000 Hz. The results are shown in Figure 1.2.

Figure 1.2. Difference between detection thresholds of correlated and independent signals as a function of signal centre frequency. The masker was fixed at 1000 Hz. (From Cohen and Schubert, 1987b)

A significant CDD is seen only when the signal is centred at a higher frequency than the masker. Cohen and Schubert (1987b) advanced a possible explanation of their data based on the results of Schroeder and Mehrgardt (1982), who showed that the detection threshold of an 800-Hz sinusoid masked by a highpass harmonic complex is lowered by as much as 30 dB when harmonics below the signal frequency are added to the stimulus, but only if those harmonics are in cosine phase with the rest of the complex. Most of that drop is due to a within-channel effect: adding sinusoids in cosine phase produces a 'peaky' waveform, with lower-level intervals between the peaks ('time windows') in which the signal is easier to detect. This is essentially a within-channel dip-listening explanation, which Cohen and Schubert adopted to explain their results. They reasoned that when the envelopes of the signal and masker bands are independent, there will be times when the masker level is low and the signal level is high, so detection will be easier than when the masker and signal envelopes are correlated. This implies that the CDD is due to a facilitation of detection when the noise bands are uncorrelated. It was not explained how the masker and signal were distinguished within the single channel; presumably the difference in phase locking between the two frequencies might be used. Cohen and Schubert (1987b) challenged other workers to formulate an explanation.

McFadden (1987) used flanking bands which varied in frequency, with a signal fixed at 2500 Hz. It was found that a single flanking band below the signal frequency gave a slightly higher CDD than when the flanking band was above. This was a very small asymmetry compared with that apparent in the results of Cohen and Schubert (1987b). If multiple flanking bands were used, there was little change in the size of the CDD if the levels of the flanking bands were 'scrambled' rather than being identical. In the 'identical' condition, all the cue bands were presented at roughly 70 dB overall. In the 'scrambled' condition, one band was presented at roughly 70 dB overall and the other band(s) were either 2, 4, 6 or 8 dB lower in level. The 8-dB range was chosen as McFadden (1986) observed that a larger difference in level abolished CMR. McFadden noted that the CDD generally increased by a small amount as the number of flanking bands increased. McFadden (1987) did not attempt to explain the difference between the thresholds obtained with correlated and uncorrelated bands, though he suggested that Richards' (1986) decorrelation explanation of CMR may be applicable.

The increase in threshold observed by McFadden (1987) for both the correlated and uncorrelated conditions when he changed the number of flanking bands from one to two was much larger than expected. If the threshold of the signal were determined simply by the power of the masker at the output of the auditory filter centred at the signal frequency, one would predict a 3-dB increase in threshold (i.e. a doubling of the masker power) when two flanking bands were used instead of one. Instead, the increase in threshold reported by McFadden was about 15 dB. This was probably because performance with only one band was 'too good', owing to additional detection strategies such as off-frequency listening (Patterson and Nimmo-Smith, 1980). McFadden (1987) also discussed the possibility of combination bands. Combination bands are bands that are absent from the original stimulus but are produced by a non-linear process in the cochlea (Greenwood, 1971, 1972a,b). The cubic-difference combination band or tone (2f1 - f2, where f1 and f2 are the lower and upper frequencies of the source bands respectively) is the most prominent and has been shown to be audible even at moderate levels (Plomp, 1965; Goldstein, 1967). Such combination bands would be present in frequency regions well away from the signal and masker bands (e.g. with a 1.5-kHz masker and a 2.5-kHz signal, the cubic-difference band would be centred at 500 Hz) and so would provide an extra cue for detection. It is perhaps surprising that, despite such large variations in threshold with the number of bands, the relative thresholds for correlated and uncorrelated bands (i.e. the CDD) change so little; however, this is indeed the case.
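The two calculations in this paragraph can be written out as a short sketch. It assumes that every flanking band contributes equal power at the output of the signal filter. It also assumes a 100-Hz width for both source bands; this width is illustrative only, since McFadden's band widths are not given here. The function names are hypothetical. Letting f1 and f2 vary across their bands spreads the 2f1 - f2 product over three times the source bandwidth. That range is where a masking noise would need to be placed to remove the cue.

```python
import math


def power_sum_increase_db(n_before, n_after):
    """Predicted threshold change (dB) if threshold tracks total masker power at the
    output of the signal filter and every band contributes equal power (assumption)."""
    return 10 * math.log10(n_after / n_before)


def cubic_difference_band(f1_hz, f2_hz, width_hz=0.0):
    """(lower edge, centre, upper edge) of the 2*f1 - f2 combination band for two
    bands of equal width centred on f1 < f2. As f1 and f2 range over their bands,
    2*f1 - f2 spans three times the source bandwidth."""
    centre = 2 * f1_hz - f2_hz
    half = 1.5 * width_hz
    return centre - half, centre, centre + half


increase = power_sum_increase_db(1, 2)                  # one flanking band -> two
edges = cubic_difference_band(1500.0, 2500.0, 100.0)   # 1.5-kHz masker, 2.5-kHz signal
```

Doubling the masker power raises the predicted threshold by about 3.01 dB, far short of the 15 dB reported. The combination band for the example in the text is centred at 500 Hz and, under the assumed 100-Hz widths, spans 350 to 650 Hz.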

McFadden (1987) used flanking bands that were mutually correlated, even if the signal was uncorrelated. Wright and McFadden (1986) explored the effects of varying the correlation across five flanking bands. They found that if no bands were correlated, performance was essentially the same as when all bands including the signal were correlated (similar results were found by Wright, 1990). This could be explained by proposing that if all the bands have distinct envelopes, then the auditory system segregates the bands into six separate perceived sources. Under such conditions, it may be difficult to detect any one source.

As mentioned previously, CMR can be produced by adding the on-frequency band to itself as a signal (Hall and Grose, 1988). This task is very similar to profile analysis, as the task is to detect an increment in level of a single component or band relative to the flanking bands. Because performance differs between conditions in which the flanking bands are correlated and uncorrelated with the signal band, detection cannot be based on the long-term spectrum, which would be the same for both types of flanking band. Increasing the noise bandwidth increases the rate of the envelope fluctuations, which are then followed less well. Fantini and Moore (1992) showed that for bandwidths between 4 and 16 Hz, the just-detectable increment in the signal band is smaller when correlated flanking bands are present than when the flanking bands are uncorrelated or absent. For bandwidths of 32 and 64 Hz, the presence of flanking bands improved performance irrespective of whether they were correlated or not. Fantini and Moore (1992) randomised the overall level to reduce the effects of within-channel cues. In CMR studies such as that of Hall and Grose (1988), the overall level remains constant, whereas in profile analysis the overall level of the whole stimulus is randomly varied, so that subjects cannot use the within-channel cue of a change in level. Fantini and Moore (1994) investigated profile-analysis conditions in which the overall level was held constant, in an attempt to discover whether across-channel comparisons are used only when the within-channel cue of a change in level of the target band is unreliable. Performance was very similar in the constant-level and roving-level conditions. This is consistent with the auditory system using across-channel comparisons even when it is not forced to do so by unreliable within-channel cues.
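The advantage of an across-channel comparison under a roving level can be shown with a deliberately idealised two-interval simulation. The sketch assumes no internal noise and no envelope fluctuations. It also assumes a 2-dB increment and an independent overall-level rove of up to ±10 dB in each interval. All values and function names are illustrative and are not the parameters used by Fantini and Moore.

```python
import random


def two_interval_trial(rng, base_db=60.0, increment_db=2.0, rove_db=10.0):
    """One noiseless two-interval trial (assumed model). The target band is
    incremented in one interval, and the whole stimulus in each interval is roved
    independently by up to +/-rove_db. Returns (within_correct, across_correct)."""
    rove_std = rng.uniform(-rove_db, rove_db)
    rove_sig = rng.uniform(-rove_db, rove_db)
    std_target = std_flank = base_db + rove_std
    sig_target = base_db + increment_db + rove_sig
    sig_flank = base_db + rove_sig
    # Within-channel cue: choose the interval in which the target band is louder.
    within = sig_target > std_target
    # Across-channel cue: choose the interval with the larger target-minus-flank level.
    across = (sig_target - sig_flank) > (std_target - std_flank)
    return within, across


def proportion_correct(n_trials=5000, seed=3, **trial_kwargs):
    """Proportion of trials answered correctly with each cue."""
    rng = random.Random(seed)
    results = [two_interval_trial(rng, **trial_kwargs) for _ in range(n_trials)]
    pc_within = sum(w for w, _ in results) / n_trials
    pc_across = sum(a for _, a in results) / n_trials
    return pc_within, pc_across


pc_within, pc_across = proportion_correct()
pc_within_fixed, pc_across_fixed = proportion_correct(rove_db=0.0)
```

Under these assumptions, the across-channel cue is always correct whether or not the level is roved. Choosing the interval with the louder target band is always correct at a fixed level. With the rove it falls to about 59.5% correct, because the rove cancels the increment whenever the two roves differ by more than 2 dB in the wrong direction.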
There were two uncorrelated conditions: co-uncorrelated, in which all the flanking bands were correlated with each other, and all-uncorrelated, in which all the bands were modulated independently of each other. Performance was much better when all the bands were correlated than when they were either all-uncorrelated or co-uncorrelated. Performance was independent of the bandwidth of the noises in the correlated condition, whereas performance in both uncorrelated conditions improved with increasing bandwidth. This is consistent with an across-channel view: in the correlated condition, the comparison of level across channels provides a reliable cue even though the instantaneous levels may vary owing to modulation or roving of level. In other words, spectral-shape information is present in both the short-term and long-term spectra of the stimuli. In the uncorrelated conditions, the envelope disparity between the signal and masker bands means that the instantaneous relative levels in the signal and masker channels are not a reliable indicator of the increment in relative level of the signal band. The slow fluctuations of the masker and signal cause stimulus uncertainty. As the noise bandwidth increases, the fluctuations are followed less well owing to the limited temporal resolution of the ear, so the misleading short-term spectral-shape information produced by uncorrelated modulation is much reduced. The long-term spectral differences in overall noise-band level then become much easier to detect.

Such CMR and noise-band profile-analysis experiments are similar to CDD tasks, as the CDD task can be viewed as detecting an increment in level of the signal from a sub-threshold value (or absence) to a higher value. However, in CDD experiments performance is degraded if the flanking bands are correlated with the signal band. Fantini and Moore (1994) considered various potential explanations for the differences between the two paradigms. As described earlier, perceptual grouping gives a possible explanation for both CDD and CMR. In CMR experiments, an on-frequency masker may be grouped with correlated flanking bands, effectively leaving the signal as a separate source and hence making it easier to detect than when the flanking bands are uncorrelated. In CDD experiments, the signal may be grouped with maskers that are correlated with it, making the signal harder to detect than when it is uncorrelated with the flanking bands.

Another possible explanation of differences between CMR and CDD comes from consideration of within-channel cues. Simulated auditory filters provide a way to assess the role of within-channel cues in CDD, by allowing inspection of the output of a single filter. Using a gammatone filter with an Equivalent Rectangular Bandwidth (ERB) of 230 Hz, Fantini and Moore (1994) studied the output in a CDD-type paradigm. The signal band was fixed at a level 30 dB lower than the flanking bands, as this gave the approximate relative levels of the bands at threshold in their experiment. They concluded that if the flanking bands were co-uncorrelated (i.e. correlated with each other but not with the signal), then a within-channel masking explanation based on differences in fine structure could account for the improvement in performance relative to the correlated condition. They found that the signal-to-masker ratio varied over time in the co-uncorrelated condition, giving rise to alternations between a complex fine structure (when the masker was dominant) and a quasi-sinusoidal fine structure (when the signal was dominant). In the correlated condition, or when the flanking bands were presented alone, the waveform was complex throughout. This cue is similar to that described by Schooneveldt and Moore (1987) for a CMR task. The gammatone filtering method is only a rough approximation to the filtering on the BM and does not take into account the upward spread of masking or the possibility that the instantaneous short-term structure of the flanking bands affects filter width. The within-channel fine-structure theory cannot easily be extended to produce quantitative results. I will discuss more detailed simulations later in this thesis.
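The kind of filter used in such simulations can be sketched as below. It uses the common fourth-order gammatone impulse response, t^3 exp(-2πbt) cos(2πfc t), with b = 1.019 × ERB. It also uses the Glasberg and Moore (1990) ERB formula, ERB = 24.7(4.37F + 1) with F in kHz. These are standard choices assumed here, not necessarily those of Fantini and Moore (1994). Under that formula an ERB of 230 Hz corresponds to a centre frequency of about 1.9 kHz. The function names, the 20-kHz sampling rate and the 25-ms impulse-response length are hypothetical. The last function checks numerically that the filter's equivalent rectangular bandwidth, the integral of |H(f)|² divided by |H(fc)|², is close to the target.

```python
import cmath
import math


def erb_hz(fc_hz):
    """Glasberg and Moore (1990) ERB (Hz) at centre frequency fc_hz."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def cf_for_erb(erb):
    """Centre frequency (Hz) at which erb_hz() returns the given ERB."""
    return (erb / 24.7 - 1.0) / 4.37 * 1000.0


def gammatone_ir(fc_hz, erb, fs, dur_s=0.025, order=4):
    """Sampled gammatone impulse response t^(order-1) exp(-2*pi*b*t) cos(2*pi*fc*t),
    with b = 1.019 * ERB (the usual choice for a fourth-order filter)."""
    b = 1.019 * erb
    return [(i / fs) ** (order - 1) * math.exp(-2 * math.pi * b * i / fs)
            * math.cos(2 * math.pi * fc_hz * i / fs)
            for i in range(int(dur_s * fs))]


def magnitude(ir, fs, f_hz):
    """|H(f)| estimated by a direct discrete Fourier sum over the impulse response."""
    w = -2j * math.pi * f_hz / fs
    return abs(sum(x * cmath.exp(w * i) for i, x in enumerate(ir))) / fs


def numeric_erb(ir, fs, fc_hz, span_hz=1500.0, step_hz=5.0):
    """Integral of |H(f)|^2 over fc +/- span_hz, divided by |H(fc)|^2 (Riemann sum)."""
    peak = magnitude(ir, fs, fc_hz) ** 2
    n_steps = int(2 * span_hz / step_hz) + 1
    total = sum(magnitude(ir, fs, fc_hz - span_hz + k * step_hz) ** 2
                for k in range(n_steps))
    return total * step_hz / peak


fs = 20000
fc = cf_for_erb(230.0)          # about 1902 Hz under the assumed ERB formula
ir = gammatone_ir(fc, 230.0, fs)
erb_est = numeric_erb(ir, fs, fc)
```

The numerically integrated bandwidth comes out within a few hertz of 230 Hz. Components 500 Hz away from the centre frequency are attenuated by roughly 30 dB, which shows how narrowly a single simulated channel isolates the region around the signal.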

Wright (1990) also considered the possibility that within-channel effects may account for CDD. However, she concluded that within-channel processes could not account for the results entirely, as decreasing the spacing between the flanking bands and the signal did not consistently increase the CDD (Cohen and Schubert, 1987b; McFadden, 1987), as it appears to do in the case of CMR (Schooneveldt and Moore, 1987). This argument does not take within-channel interactions such as beats into account. It is also not obvious why the CDD should increase consistently; the thresholds in correlated and uncorrelated conditions are expected to increase with decreased spacing, but the difference between them need not.

In CMR tasks, the on-frequency band is responsible for most of the masking of the signal and the flanking bands mainly provide modulation information. In CDD tasks, the flanking bands are responsible for all of the masking, so across-channel processes would require listening to the same flanking band through at least two different filters: one that contains primarily the flanking band alone and one that contains both the flanking and signal bands. Within-channel effects are much more likely in CDD, and it would be prudent to assess the extent to which they are responsible for the results before invoking across-channel explanations. The experiments discussed later attempted to replicate and refine some of the experiments performed by other workers. The possibility of spurious cues such as combination bands was reduced by adding masking noise in the frequency regions where such combination bands would be expected to occur. The possible benefits of off-frequency listening were eliminated by using symmetrically placed masker bands. As it is possible that CDD is mainly due to conventional masking, I have attempted to approach the experimental design from a masking point of view. Other work, detailed later, includes mathematical modelling of the auditory filter, in order to predict thresholds on the assumption that only conventional masking is responsible for CDD.

G. Summary

There is a strong case for arguing that the results of CMR-type experiments can be explained in part by an across-channel mechanism. Schooneveldt and Moore (1987) described a 'true' CMR which was seen when flanking bands were not close in frequency to the signal or when the flanking bands were presented to the opposite ear. The total CMR appeared to be greater when the flanking bands and signal were close in frequency and presented to the same ear. This increase over the 'true' CMR was described in terms of a within-channel effect. They also noted that the data from some previous work (McFadden, 1986; Cohen and Schubert, 1987a; Hall, 1987) were obtained under conditions likely to be affected by within-channel cues. Presenting the flanking bands to the opposite ear is a simple way to eliminate within-channel effects. The CMR is then roughly constant as a function of the frequency separation between the flanking bands and the signal (Schooneveldt and Moore, 1987).

CDD-type experiments have often been described in the same manner as CMR-type experiments. McFadden (1987) discussed the results purely in terms of grouping (i.e. across-channel processes). Cohen and Schubert (1987b) discussed the similarity to CMR, but did not mention grouping. They briefly covered a dip-listening strategy, but any in-depth discussion of within- versus across-channel mechanisms was avoided.

Later work (Wright, 1990) discussed both within- and across-channel explanations for CDD, but did not offer any estimate as to the relative contribution of the two. Without knowing how the two mechanisms interact, it is difficult to characterize either. This is especially true in the case of CDD-type experiments as there is no simple way to stop within-channel effects. If flanking bands (which are also masking bands, of course) are presented to the opposite ear to the signal band, then the threshold for detection will be very close to absolute threshold.

In the experiments described here, the contributions of both within- and across-channel effects will be examined. The experiments are mainly geared towards paradigms and conditions that allow such an assessment. They should also provide independent evidence on the nature of CDD.
