The widespread adoption of smart speakers such as Amazon Echo and Google Home has embedded them deeply into everyday life. These devices offer advanced features, including emergency services and smart home capabilities, through environmental sound source detection. However, they also face cybersecurity risks such as backdoor and adversarial attacks, which exploit server-side vulnerabilities. We first conducted a user survey (n=97) to gauge user perspectives on security concerns related to voice assistants. To counter these threats, our study investigates a secure multi-party homomorphic neural network for audio classification that operates directly on encrypted audio data. We collected data from 20 homes, extracted time-series features, and encrypted them as input to the Homomorphic Neural Network (HNN), which produces encrypted predictions. The model achieved an accuracy of 93.18% in identifying 11 types of audio objects. This research not only assesses accuracy but also contrasts the multi-party homomorphic method with conventional neural networks across various performance indicators, highlighting the efficiencies and potential challenges of using encrypted models.
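To illustrate the encrypted-inference workflow described above, the sketch below shows a minimal encrypted forward pass, assuming the CKKS scheme via the TenSEAL library as a stand-in for the paper's multi-party homomorphic setup; the feature dimension, layer sizes, square activation, and encryption parameters are illustrative assumptions, not the architecture or configuration reported in the study.

```python
# Minimal sketch of encrypted audio-feature inference (assumptions: TenSEAL/CKKS,
# a single hidden layer with a square activation, random weights for illustration).
import numpy as np
import tenseal as ts

# Client side: create a CKKS context; the secret key stays with the data owner.
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=16384,
    coeff_mod_bit_sizes=[60, 40, 40, 40, 60],  # enough levels for 3 multiplications
)
context.global_scale = 2 ** 40
context.generate_galois_keys()

features = np.random.randn(16)                       # stand-in time-series features from one clip
enc_features = ts.ckks_vector(context, features.tolist())

# Server side: plaintext model weights applied to the encrypted feature vector.
W1 = (np.random.randn(16, 8) * 0.1).tolist()         # hidden layer weights (illustrative)
b1 = np.zeros(8).tolist()
W2 = (np.random.randn(8, 11) * 0.1).tolist()         # 11 output classes, as in the study
b2 = np.zeros(11).tolist()

hidden = enc_features.matmul(W1) + b1                 # encrypted linear layer
hidden = hidden * hidden                              # square activation (HE-friendly)
enc_logits = hidden.matmul(W2) + b2                   # encrypted output layer

# Client side: only the secret-key holder can decrypt the prediction.
logits = enc_logits.decrypt()
predicted_class = int(np.argmax(logits))
print("predicted class index:", predicted_class)
```

In this setup the server never sees raw audio features or the decrypted class label, which is the property the study leverages to defend against server-side attacks; the multi-party variant additionally splits decryption authority across parties, a detail omitted from this single-key sketch.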