Object interaction recommendation in the Internet of Things (IoT) is a crucial basis for IoT-related applications. Although many efforts have been devoted to suggesting objects for interaction, most models rigidly infer relationships from human social networks. They overlook both the neighbor information in the objects' own social network and the correlation among multiple heterogeneous features, and they ignore the multi-scale structure of the network. To tackle these challenges, this work focuses on the object social network, formulates object interaction recommendation as multi-modal object ranking, and proposes the Multi-Modal Attention-based Hierarchical Graph Neural Network (MM-AHGNN). MM-AHGNN describes each object with multiple kinds of action knowledge and pair-wise interaction features, and encodes heterogeneous actions with a multi-modal encoder. It integrates neighbor information and fuses correlated multi-modal features through intra-modal hybrid-attention graph convolution and an inter-modal transformer encoder. It then employs a multi-modal multi-scale encoder to integrate multi-level information, so that object interactions can be suggested more flexibly. Extensive experiments on real-world datasets show that MM-AHGNN achieves better recommendation results than the most advanced baseline, improving HR@3 by 3-4% and NDCG@3 by 4-5%. To our knowledge, MM-AHGNN is the first work to design a GNN for object interaction recommendation. Source code is available at: https://github.com/gaosaroma/MM-AHGNN.
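The reported gains are measured with HR@3 (hit ratio) and NDCG@3 (normalized discounted cumulative gain). The abstract does not state the evaluation protocol. The sketch below therefore assumes a common leave-one-out setup in which each query has exactly one held-out positive object, ranked among a set of candidate objects. Under that assumption the ideal DCG is 1, so NDCG reduces to 1/log2(rank + 2) for a 0-based rank. The function names are hypothetical and are not taken from the MM-AHGNN repository.

```python
import math
from typing import List, Sequence, Tuple


def hit_ratio_at_k(ranked: Sequence[int], target: int, k: int) -> float:
    """Return 1.0 if the held-out target object appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked: Sequence[int], target: int, k: int) -> float:
    """NDCG@k assuming exactly one relevant object per query (so IDCG = 1)."""
    top_k = list(ranked[:k])
    if target not in top_k:
        return 0.0
    rank = top_k.index(target)  # 0-based position of the target in the top-k list
    return 1.0 / math.log2(rank + 2)


def evaluate(rankings: List[Sequence[int]], targets: List[int], k: int = 3) -> Tuple[float, float]:
    """Average HR@k and NDCG@k over all queries (hypothetical helper)."""
    hr = sum(hit_ratio_at_k(r, t, k) for r, t in zip(rankings, targets)) / len(targets)
    ndcg = sum(ndcg_at_k(r, t, k) for r, t in zip(rankings, targets)) / len(targets)
    return hr, ndcg


if __name__ == "__main__":
    # Each list holds candidate object IDs sorted by predicted interaction score.
    rankings = [[5, 2, 9, 1], [7, 3, 4, 8], [6, 0, 2, 5]]
    targets = [2, 8, 6]  # held-out object each query actually interacted with
    hr, ndcg = evaluate(rankings, targets, k=3)
    print(f"HR@3={hr:.4f} NDCG@3={ndcg:.4f}")
```

In this toy run, two of the three targets land in the top 3 (at positions 1 and 0), so HR@3 is 2/3. NDCG@3 additionally rewards the target ranked first more than the one ranked second.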