LLMs Boost Activity Recognition Without Training

November 21, 2025 · 2 min read

In a recent study, large language models achieved 12-class zero- and one-shot classification F1-scores significantly above chance for activity recognition, with no task-specific training required.
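To make "above chance" concrete: with 12 equally likely classes, uniform random guessing yields an expected accuracy (equal to micro-F1) of 1/12, about 8.3%. A minimal simulation, purely illustrative and not from the paper:

```python
import random

# Illustrative only: estimate chance-level accuracy for a 12-class task
# by comparing uniformly random guesses against uniformly random labels.
random.seed(0)
n = 100_000
labels = [random.randrange(12) for _ in range(n)]
guesses = [random.randrange(12) for _ in range(n)]
accuracy = sum(l == g for l, g in zip(labels, guesses)) / n
print(f"chance-level accuracy: {accuracy:.3f}")  # close to 1/12 ~= 0.083
```

Any F1 meaningfully above this baseline indicates the models are extracting real signal from the sensor streams.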

This finding highlights how LLMs can effectively fuse multimodal sensor data, such as audio and motion time series, to classify diverse activities like household tasks and sports.

The researchers curated a subset from the Ego4D dataset, focusing on varied contexts to test the models' ability to integrate complementary information without aligned training data.

They employed a late fusion approach, where modality-specific models processed audio and motion streams separately, and an LLM combined these outputs for final classification.
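The late-fusion step above can be sketched as follows. This is a hypothetical illustration of the general pattern, not the paper's actual prompt or label set: each modality-specific model emits its top-k labels with confidence scores, and those are formatted into a prompt for the LLM to produce a final label.

```python
# Hypothetical sketch of late fusion via an LLM prompt.
# The label set, function names, and prompt wording are illustrative
# assumptions, not taken from the study.
ACTIVITY_LABELS = ["cooking", "cleaning", "gardening", "playing_sports"]

def build_fusion_prompt(audio_top_k, motion_top_k, labels=ACTIVITY_LABELS):
    """Format per-modality predictions into a zero-shot classification prompt.

    audio_top_k / motion_top_k: lists of (label, score) pairs from the
    modality-specific models.
    """
    audio_str = ", ".join(f"{lbl} ({p:.2f})" for lbl, p in audio_top_k)
    motion_str = ", ".join(f"{lbl} ({p:.2f})" for lbl, p in motion_top_k)
    return (
        "You are classifying a person's activity from sensor evidence.\n"
        f"Audio model's top guesses: {audio_str}\n"
        f"Motion model's top guesses: {motion_str}\n"
        f"Choose exactly one label from: {', '.join(labels)}.\n"
        "Answer with the label only."
    )

prompt = build_fusion_prompt(
    audio_top_k=[("cooking", 0.62), ("cleaning", 0.21)],
    motion_top_k=[("cooking", 0.48), ("gardening", 0.30)],
)
print(prompt)
```

The prompt string would then be sent to any off-the-shelf LLM; because fusion happens in text, no aligned multimodal training data is needed.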

The study showed that this approach not only surpassed random guessing but also offered a practical solution for applications with limited labeled data, reducing the need for extensive model retraining.

This approach can enable deployment in scenarios where computational resources are constrained, as it avoids the memory and processing demands of custom multimodal models.

However, the authors note limitations, including potential challenges in scaling to more complex activity sets and dependencies on the quality of the modality-specific inputs.

Overall, this work demonstrates a step toward more adaptable AI systems that leverage pre-trained models for real-world sensor fusion tasks.