The Stroop paradigm is a useful experimental tool for assessing the extent to which task-irrelevant but target-related distractors influence target identification in a variety of contexts. In particular, it has been applied beyond the traditional visual modality (e.g., to auditory and audiovisual stimuli). However, audiovisual studies using Stroop-like tasks have reported conflicting results. Importantly, these bimodal studies assessed only group-level mean differences. They did not investigate whether the degree of bimodal conflict is greater than expected from two unimodal distractors that are inhibited in an unlimited-capacity, independent, and parallel fashion. In this research, we relied on cognitive models of individual participants' performance to estimate audiovisual conflict. We then directly compared the influence of two types of bimodal distractors on performance: 1) the same conflicting information was presented in both modalities, and 2) different conflicting information was presented in each modality. We found that unimodal visual distractors, but not auditory ones, significantly influenced target processing. Most interestingly, despite the lack of a unimodal auditory influence, some participants' performance indicated that bimodal distractors were harder (or easier) to inhibit than our model predicted. Both the direction of this deviation (limited or super capacity) and its degree depended on cross-modal distractor similarity.