QueryMintAI: Multipurpose Multimodal Large Language Models for Personal Data

Cited by: 0
Authors
Ghosh, Ananya [1 ]
Deepa, K. [1 ]
Affiliations
[1] Vellore Institute of Technology (VIT), School of Computer Science and Engineering, Vellore 632014, India
Keywords
Context modeling; Accuracy; Videos; Natural language processing; Computational modeling; Adaptation models; Deep learning; Large language models; Generative AI; Open source software; Multimodal large language models; private database; LangChain; OpenAI
DOI
10.1109/ACCESS.2024.3468996
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
QueryMintAI is a versatile multimodal large language model (LLM) framework designed to address the challenges of processing diverse user inputs and generating corresponding outputs across modalities. The proliferation of data formats, including text, images, videos, documents, URLs, and audio recordings, calls for an intelligent system that can understand and respond to user queries effectively, yet existing models often handle multimodal inputs poorly and struggle to produce coherent outputs across modalities. The proposed QueryMintAI framework leverages state-of-the-art models such as GPT-3.5 Turbo, DALL-E 2, TTS-1, and Whisper v2, among others, to enable seamless interaction with users across multiple modalities. By integrating advanced natural language processing (NLP) techniques with domain-specific models, QueryMintAI offers a comprehensive solution for text-to-text, text-to-image, text-to-video, and text-to-audio conversion. The system also supports document processing, URL analysis, image description, video summarization, audio transcription, and database querying, catering to diverse user needs and preferences. QueryMintAI addresses several limitations of existing approaches, including restricted modality support, poor adaptability to varied data formats, and limited response generation, by combining advanced NLP algorithms, deep learning architectures, and multimodal fusion techniques.
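The record does not reproduce the paper's implementation, but the abstract's description of routing text, image, speech, and transcription requests to GPT-3.5 Turbo, DALL-E 2, TTS-1, and Whisper suggests a dispatch layer over hosted model APIs. The Python sketch below illustrates that idea under stated assumptions only: it uses the OpenAI Python SDK (v1.x), the hosted model identifiers "gpt-3.5-turbo", "dall-e-2", "tts-1", and "whisper-1", and a deliberately naive route() helper; none of these names or the routing heuristic are taken from the paper itself.

# Minimal sketch of a QueryMintAI-style modality dispatcher (illustrative, not the authors' code).
# Assumes the OpenAI Python SDK v1.x and an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

def text_to_text(prompt: str) -> str:
    # Plain text queries go to a chat model (GPT-3.5 Turbo in the paper).
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def text_to_image(prompt: str) -> str:
    # Image generation via DALL-E 2; returns a URL to the generated image.
    resp = client.images.generate(model="dall-e-2", prompt=prompt, n=1, size="512x512")
    return resp.data[0].url

def text_to_audio(text: str, out_path: str = "speech.mp3") -> str:
    # Speech synthesis via TTS-1; writes an MP3 file and returns its path.
    resp = client.audio.speech.create(model="tts-1", voice="alloy", input=text)
    with open(out_path, "wb") as f:
        f.write(resp.read())
    return out_path

def audio_to_text(audio_path: str) -> str:
    # Transcription via the Whisper API model (exposed as "whisper-1").
    with open(audio_path, "rb") as f:
        resp = client.audio.transcriptions.create(model="whisper-1", file=f)
    return resp.text

def route(query: str, target: str = "text"):
    # Naive output-modality dispatch; the paper's actual routing and fusion logic is not shown here.
    if target == "image":
        return text_to_image(query)
    if target == "audio":
        return text_to_audio(query)
    return text_to_text(query)

For example, route("Summarize my meeting notes") would return a chat completion, while route("A watercolor of the VIT campus", target="image") would return an image URL. The document, URL, and database querying features mentioned in the abstract would sit behind similar handlers (for instance, LangChain retrieval chains over a private database), which are omitted from this sketch.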
Pages: 144631-144651
Number of pages: 21