Although humans communicate in various ways, the most natural forms of expression are speech and gesture. This paper describes a pilot study of the relationship between these two modalities in spontaneous Chinese speech, with the aim of improving the interactive capability of human-computer interaction (HCI) systems. A speech and gesture production model, together with a multimodal coding scheme, is used to annotate four video and audio clips. Speech stress is then correlated with hand gesture amplitude, and the correlation between gesture boundaries and prosodic boundaries is analyzed statistically. The results show that stressed speech is usually accompanied by stronger hand gestures, with hand and head gestures playing compensatory roles. No temporal correspondence was found between prosodic boundaries and gesture boundaries, although the two are significantly correlated. All of the results support the interface hypothesis.