Photo: Alexey Vitvitsky / RIA Novosti
Silero is a tiny, open-source model (around 2MB) that can quickly determine whether a short chunk of audio contains speech. Turn-taking is a much harder problem than speech detection, but VAD is still a useful primitive, especially for deciding whether audio should be forwarded to more expensive downstream systems.
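The gating idea described above can be sketched as follows. This is a minimal illustration, not Silero's API: `gate_chunks` and `energy_detector` are hypothetical names, and the energy threshold is a crude stand-in for a real VAD model. In practice the `is_speech` callable would wrap the model's per-chunk speech decision.

```python
from typing import Callable, Iterable, Iterator, List

Chunk = List[float]  # one short chunk of mono audio samples in [-1.0, 1.0]


def energy_detector(threshold: float = 0.01) -> Callable[[Chunk], bool]:
    """Stand-in detector (NOT Silero): flags a chunk when its mean energy is high.

    The threshold is an assumed value chosen only for illustration.
    """
    def is_speech(chunk: Chunk) -> bool:
        if not chunk:
            return False
        energy = sum(s * s for s in chunk) / len(chunk)
        return energy >= threshold

    return is_speech


def gate_chunks(
    chunks: Iterable[Chunk],
    is_speech: Callable[[Chunk], bool],
) -> Iterator[Chunk]:
    """Yield only the chunks the detector flags as speech.

    Chunks judged to be silence are dropped, so the more expensive
    downstream system never sees them.
    """
    for chunk in chunks:
        if is_speech(chunk):
            yield chunk
```

Passing the detector in as a callable keeps the gate independent of any particular model, so the stand-in can be swapped for a real VAD without changing the forwarding logic.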
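In fact, once the Doubao phone took off overseas, some users began to imagine how broad the prospects would be if Google brought this technology to Pixel and other Android phones.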
And dynamic types everywhere, yes.