Conversation

Masanori Ogino (omasanori@mstdn.maud.io)'s status on Sunday, 22-Sep-2019 08:33:03 JST
- Masanori Ogino
The Japanese version of Common Voice has launched! https://voice.mozilla.org/ja
In conversation Sunday, 22-Sep-2019 08:33:03 JST from mstdn.maud.io permalink
Attachments
1. Common Voice by Mozilla
  
  Common Voice is a project to help make voice recognition open to everyone. Now you can donate your voice to help us build an open-source voice database that anyone can use to make innovative apps for devices and the web.
- Himawari Prodromou repeated this.
- Masanori Ogino (omasanori@mstdn.maud.io)'s status on Sunday, 22-Sep-2019 08:37:34 JST
  in reply to
  
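  Common Voice is a project to build an open training dataset for speech recognition. You can contribute by reading the displayed sentences aloud and submitting the recordings, or by listening to recordings that others have submitted and rating whether the sentence was read correctly.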
  
  In conversation Sunday, 22-Sep-2019 08:37:34 JST permalink
  
  Himawari Prodromou repeated this.
- Masanori Ogino (omasanori@mstdn.maud.io)'s status on Sunday, 22-Sep-2019 08:40:35 JST
  in reply to
  
  Incidentally, alongside Common Voice, Mozilla has implemented Baidu's Deep Speech algorithm https://arxiv.org/abs/1412.5567 in TensorFlow and published it at https://github.com/mozilla/DeepSpeech
  In conversation Sunday, 22-Sep-2019 08:40:35 JST permalink
  Attachments
  1. Deep Speech: Scaling up end-to-end speech recognition
    
    from arXiv.org
    
    We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a "phoneme." Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called Deep Speech, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set. Deep Speech also handles challenging noisy environments better than widely used, state-of-the-art commercial speech systems.
  2. mozilla/DeepSpeech
    
    from GitHub
    
    A TensorFlow implementation of Baidu's DeepSpeech architecture - mozilla/DeepSpeech
  Himawari Prodromou repeated this.
