This is cheap as there is no labeling cost, but the synthetic images may not be realistic enough, resulting in poor generalization on real test images.
It then uses a temporal integration process to compute a confidence score that the phrase you uttered was "Hey Siri".
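As a rough illustration of what temporal integration over per-frame scores can look like, the sketch below accumulates frame-level probabilities along the best left-to-right path through the phrase and reports the geometric mean as a confidence value. This is a minimal sketch under stated assumptions, not the detector described in the article: the function name `phrase_confidence`, the three-state phrase model, the stay-or-advance transition rule, and the 0.5 threshold are all hypothetical choices made for the example.

```python
import math

def phrase_confidence(frame_probs):
    """Integrate per-frame state probabilities into one confidence score.

    frame_probs[t][i] is the (assumed, strictly positive) probability that
    audio frame t belongs to state i of the target phrase. States must be
    visited in order; each frame either stays in its state or advances by one.
    Returns the geometric mean of the probabilities along the best path that
    starts in the first state and ends in the last, or 0.0 if no such path exists.
    """
    n_states = len(frame_probs[0])
    neg_inf = float("-inf")

    # best[i] = best accumulated log-probability of any path ending in state i.
    best = [neg_inf] * n_states
    best[0] = math.log(frame_probs[0][0])

    for frame in frame_probs[1:]:
        updated = [neg_inf] * n_states
        for i in range(n_states):
            stay = best[i]
            advance = best[i - 1] if i > 0 else neg_inf
            previous = max(stay, advance)
            if previous > neg_inf:
                updated[i] = previous + math.log(frame[i])
        best = updated

    # Normalize by the number of frames so long and short utterances compare.
    return math.exp(best[-1] / len(frame_probs))

# Hypothetical frame scores for a three-state phrase, in the expected order.
matching = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.05, 0.05, 0.9]]
THRESHOLD = 0.5  # assumed value, chosen only for this example
detected = phrase_confidence(matching) >= THRESHOLD
```

Working in log space keeps the accumulated score numerically stable over many frames, and dividing by the frame count before exponentiating yields a value between 0 and 1 that can be compared against a fixed threshold.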
This article describes how we met those challenges to achieve real-time performance on iPhone, iPad, and Apple Watch (in Scribble mode).
We faced significant challenges in developing the framework so that we could preserve user privacy and run efficiently on-device.
Vol. 1, Issue 8, December 2017, by Differential Privacy Team.
When it detects "Hey Siri", the rest of Siri parses the following speech as a command or query.