Objects are always situated within a scene context and stand in specific relationships to their environment. Understanding how scene context, and the relationships between objects and that context, affect object identification is therefore crucial. Previous studies have indicated that scene-incongruent objects are detected faster than scene-congruent ones, and that "context cueing" can enhance object identification. However, no study has directly tested this relationship while considering the effects of bottom-up and top-down attention processes on object judgment. In the present research, we explored the influence of context and of object-context relationships by incorporating "context cueing" into an object judgment task and dividing these relationships into two types: semantic and syntactic. The behavioral results of Experiment 1 revealed that syntactically incongruent objects were recognized more accurately and with shorter response times. Eye-tracking data indicated that, under semantic congruence, first fixation durations on syntactically incongruent objects were shorter; conversely, under semantic incongruence, first fixation durations on syntactically congruent objects were longer. In Experiment 2, which introduced context cueing, semantically congruent objects were recognized more accurately and received more fixations. Notably, under syntactic incongruence, first fixation durations on semantically congruent objects were longer. These findings suggest that, in the absence of context cueing, syntactic processing has priority in scene processing. We interpret these results as evidence that top-down attention biases object processing, leading to reduced processing of scene-congruent objects compared to scene-incongruent ones. Thus, "context cueing" activates top-down attention and thereby plays a pivotal role in object identification.