Releases: PaddlePaddle/PaddleSpeech

PaddleSpeech r1.5.0

05 Mar 07:11
9c01a0b

Highlight

New Features

  • Add support for the AudioTools toolkit, which is used in DAC (Descript-Audio-Codec) training and inference.
  • Reproduce the losses required for the DAC model: MultiScaleSTFTLoss, GANLoss, and SISDRLoss.

Version Adaptation

Upgrade and adapt PaddleSpeech from Paddle 2.5.1 to Paddle 3.0.0-beta. This release resolves the incompatibilities introduced by the Paddle upgrade and covers adaptation work and regression testing for the models in PaddleSpeech, so that the suite runs normally with no loss of model functionality or accuracy.

  • Ensure the adaptation of 80+ existing models in the demo and example directories.
  • Ensure the adaptation and accuracy alignment of 10+ core models in the example directory.
  • Support the re-export of 20+ dynamic-to-static models using the PIR + predictor approach and ensure successful inference.

More Detail

New Features

Version Adaptation

Installation Adaptation

Hardware Support

Docs

Bug Fix

CI

Acknowledgements

Special thanks to contributors including @wanx7130, @warrentdrew, @DrRyanHuang, @cchenhaifeng, @undefined-ux, @zxcd, @GreatV, @yinfan98, @Liyulingyue, @megemini, @SuiYunsy, @Netrvin, @enkilee, @tianshuo78520a, @guspan-tanadi, @co63oc, @Echo-Nie and others for their support.

New Contributors

PaddleSpeech r1.4.2

27 Jun 03:45
7b78036

S2T

T2S

Server

Docs

Others

Acknowledgements

Special thanks to @jiamingkong @Zth9730 @yeyupiaoling @zxcd @zh794390558 @lemondy @lym0302 @longRookie @ljhzxc @yt605155624 @USTCKAY @mattheliu @Gsonovb @twoDogy @Coloryr @kk-2000 @mmglove @Yulv-git @46319943 @luyao-cv

New Contributors

Full Changelog: r1.4.1...r1.4.2

PaddleSpeech r1.4.1

14 Apr 09:36
9d61b8c

Others

PaddleSpeech r1.4.0

15 Mar 08:10
d103cb8

S2T

T2S

Server

Engine

Audio

Demos

Docs

Others

  • Remove fluid API in ASR. #2944 #2859 #2852 by @zxcd
  • Add python simple adadelta optimizer. #2925 by @zxcd
  • Add encoding=utf-8 for text. #2896 by @zxcd, #2865 by @yt605155624
  • Replace Tensor.numpy()[0] with float(Tensor) to adapt to 0-D tensors. #2884 by @zhouwei25
  • Fix libsndfile.so not found in ubuntu18-cpu/Dockerfile. #2763 by @linkec
  • Fix AttributeError "module 'distutils' has no attribute 'ccompiler'" in setup.py in ctc_decoders. #2745 by @GreatV
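
The `encoding=utf-8` item above matters because Python's `open()` otherwise falls back to a locale-dependent default encoding, which can break on Chinese transcripts. The following is a minimal sketch of the pattern, not the code from those PRs; the helper name `write_and_read` and the sample transcript line are made up for illustration.

```python
import os
import tempfile


def write_and_read(text, path):
    """Write text and read it back, naming the encoding explicitly both times."""
    # Without encoding=..., open() uses the locale's preferred encoding,
    # which is not guaranteed to be UTF-8 on every platform.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    transcript = "utt_0001 今天天气很好\n"  # hypothetical transcript line
    path = os.path.join(tmp, "text")
    round_trip = write_and_read(transcript, path)
    print(round_trip == transcript)
```

Passing the encoding on both the write and the read side keeps the round trip independent of the machine's locale settings.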

New Contributors

Full Changelog: r1.3.0...r1.4.0

PaddleSpeech r1.3.0

14 Dec 06:38
c54c950

Highlight

S2T

T2S

Audio

Demo

New Contributors

Full Changelog: r1.2.0...r1.3.0

PaddleSpeech r1.2.0

10 Oct 03:31
15ca007

S2T

T2S

Text

  • Create preprocess.py for Punctuation Restoration. #2295 by @THUzyt21

Demo

Server

  • Add num_decoding_left_chunks in streaming_asr_server's config. #2337 by @THUzyt21
  • Remove unused spk_id in speech_server and streaming_tts_server; support Chinese-English mixed TTS server engine. #2380 by @WongLaw
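
To give a feel for what an option like `num_decoding_left_chunks` usually controls in chunk-based streaming ASR, here is a rough sketch. It assumes the common convention in which each frame may attend to its own chunk plus a limited number of earlier chunks, with a negative value meaning "all earlier chunks". This is an assumption about the semantics, not code taken from PaddleSpeech, and the function name `chunk_attention_mask` is hypothetical.

```python
def chunk_attention_mask(size, chunk_size, num_left_chunks=-1):
    """Return a size x size boolean mask; mask[i][j] is True if frame i may attend to frame j.

    Assumed convention: num_left_chunks < 0 means unlimited left context.
    """
    mask = []
    for i in range(size):
        chunk_idx = i // chunk_size
        if num_left_chunks < 0:
            start = 0
        else:
            start = max((chunk_idx - num_left_chunks) * chunk_size, 0)
        # A frame never sees beyond the end of its own chunk.
        end = min((chunk_idx + 1) * chunk_size, size)
        mask.append([start <= j < end for j in range(size)])
    return mask


full = chunk_attention_mask(6, 2, num_left_chunks=-1)
limited = chunk_attention_mask(6, 2, num_left_chunks=1)
for row in limited:
    print("".join("x" if visible else "." for visible in row))
```

Under this convention, a smaller left-chunk count bounds the attention cache size and compute per chunk, usually at some cost in accuracy.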

Doc

Test

  • Add barrier for distributed training using multiple machines. #2309 #2311 by @sneaxiy
  • Fix prepare.sh for PWGAN TIPC. #2376 by @yuehuayingxueluo

Other

Acknowledgements

Special thanks to @yt605155624 @lym0302 @THUzyt21 @iftaken @Zth9730 @zhoupc2015 @WongLaw @david-95 @pengzhendong @kslz @HighCWu @yuehuayingxueluo @sneaxiy @SmileGoat

New Contributors

Full Changelog: r1.1.0...r1.2.0

PaddleSpeech r1.1.0

19 Aug 10:58
aab5412

S2T

  • Add wer tools. #1709
  • Optimize the attention cache; use 0-dim tensors for model export. #2124
  • Fix cnn cache dy2st shape. #2168
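
For readers unfamiliar with the metric behind the WER tools item above (and the CER figures quoted in other releases), the sketch below computes word and character error rate as edit distance divided by reference length. It is a generic illustration, not the tool added in #1709; the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, list(hypothesis.replace(" ", ""))) / len(ref_chars)


print(wer("the cat sat on the mat", "the cat sit on mat"))
print(cer("今天天气", "今天气"))
```

CER is the usual choice for Chinese corpora such as AISHELL, where word boundaries are not marked in the text.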

TTS

Speechx

  • Add custom asr script. #1946
  • Refactor frontend. #2003
  • DeepSpeech2 to ONNX. #2034
  • Refactor audio/data/feature cache. #1638
  • Frontend refactor. #1640
  • Fix nnet itf header. #1641
  • Refactor speech egs. #1707
  • Refactor egs and more egs for TLG wfst graph build. #1715
  • Speed up ngram building. #1729
  • Update speechx install doc. #1736
  • Fix nnet input and output name. #1740
  • Update wfst graph. #1742
  • Fix model params path name. #1750
  • Remove fluid tools for onnx export. #2116

Audio

  • Refactor paddleaudio to paddlespeech.audio. #2007
  • Add webdataset in paddlespeech.audio. #2062

Server

  • Remove extra logs. #2111 #2113
  • Change the streaming TTS servers' sample rate (fs) from a fixed 24k to each model's fs. #2121
  • Fix bug in engine_warmup. #2171 by @Betterman-qs
  • Replace the default vocoder in server with mb_melgan. #2214
  • Fix bug in streaming_asr_server with punctuation restoration. #2244
  • Rename time_s and time_ns to time_b and time_nb. #2133
  • More accuracy decoding something. #2128

CLI

  • Add paddlespeech.resource module. #1917
  • Dynamic cli commands registration. #1959
  • Fix unnecessary download. #2103
  • Remove extra logs. #2084 #2085 #2107
  • Add Chinese English mixed TTS CLI. #2249
  • Add onnxruntime infer for CLI. #2222

Demo

  • Add speech web demo. #2039 #2080
  • Add kws cli and demo. #2063
  • Use paddle web for streaming asr. #2105
  • Add custom asr script. #1946
  • More cli for speech demos. #2138

Doc

  • Add API doc. #2075
  • Format tts doc string for read the docs. #2115

Others

Acknowledgements

Special thanks to @buchongyu2 @BrightXiaoHan @BarryKCL @Betterman-qs @david-95 @jerryuhoo @QingshuChen @iftaken @zh794390558 @Jackwaterveg @lym0302 @SmileGoat @yt605155624

New Contributors

Full Changelog: r1.0.0...r1.1.0

PaddleSpeech r1.0.0

13 May 10:25
44b7e51

Highlight

More

ASR

  • DeepSpeech2 streaming model aishell cer 6.66%
  • DeepSpeech2 streaming model wenetspeech cer: 15.2% (test_net, w/o LM), 24.17% (test_meeting, w/o LM), 5.3% (aishell, w/ LM)
  • Conformer aishell cer 4.64%
  • Conformer streaming model aishell cer 5.44%
  • Conformer streaming model wenetspeech cer: 11.0% (test_net), 18.79% (test_meeting)

Speechx

KWS

Audio

  • [Audio] Rename paddleaudio to audio, since it conflicts with the pkg name. by @zh794390558 in #1758
  • [Audio] Fix mcd issue. by @KPatr1ck in #1658
  • [Audio] Remove mcd. by @KPatr1ck in #1659
  • [Audio] Add VoxCeleb dataset for speaker recognition.
  • [Audio] Add HeySnips dataset for keyword spotting.

What's Changed

Full Changelog: r1.0.0a...r1.0.0

PaddleSpeech r1.0.0a

28 Apr 04:59
b5fb276

Highlight

  • Release Streaming ASR and Streaming TTS system for industrial application.
  • Support KWS model
  • Deepspeech2 streaming model aishell cer 6.66%
  • Conformer aishell cer 4.64%
  • Conformer streaming model aishell cer 5.44%
  • SpeechX Deepspeech2 streaming with WFST

What's Changed

PaddleSpeech r0.2.0

01 Apr 07:48
05b8ba8

S2T

  • Replace kaldi_fbank with paddleaudio. #1612
  • Support online CTC decoder. #821 #1626
  • Improve accuracy of Conformer. Support using kaiming Uniform as default initialization. #1577

TTS

  • Add SpeedySpeech multi-speaker support for synthesize_e2e.py. #1370 by @jerryuhoo
  • Add WaveRNN for CSMSC dataset. #1379
  • Add Tacotron2 for CSMSC / LJSpeech datasets. #1314 / #1416
  • Add GE2E Tacotron2 Voice Cloning for AISHELL3 dataset. #1419
  • Update text frontend. #1506
  • Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets. #1549 / #1581 / #1587
  • Add NPU support for TransformerTTS. #1593 by @windstamp
  • Add CNN Decoder for Streaming Fastspeech2. #1634

Audio

  • Add paddleaudio.compliance modules that offer audio feature APIs aligned with Kaldi and Librosa. #1518
  • Unittest and benchmark for audio feature APIs. #1548
  • [Audio] Refactor audio arch. #1494 by @zh794390558
  • [Audio] DTW metric. #1493 by @zh794390558
  • [Audio] Fix compliance bug. #1597 by @zh794390558
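
As background for the DTW metric item above, here is a minimal sketch of classic dynamic time warping between two 1-D sequences with absolute difference as the local cost. It illustrates the general algorithm only and is not the implementation from #1493; the function name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two non-empty sequences of numbers."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(dtw_distance([1, 2, 3], [2, 3, 4]))
```

Because DTW allows one sequence to stretch against the other, it can compare feature sequences of different lengths, which is why it is often paired with spectral distance measures.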

Deployment

Server

Vector

  • [Vector] ECAPA-TDNN on VoxCeleb. #1523 by @honei

CLI

  • Batch input supported. #1460
  • TTS: Add WaveRNN for CSMSC dataset.
  • TTS: Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets.
  • Vector: add speaker verification demo and doc #1605 by @honei

Demo

  • [Demo] [vec][search] Update client image url. #1628 by @qingen
  • [Demo] [server] Add server demo. #1480 by @lym0302
  • [Demo] [vec][search] Add audio similarity search. #1609 by @qingen

Acknowledgements

Special thanks to @WilliamZhang06 @yt605155624 @windstamp @Jackwaterveg @honei @SmileGoat @KPatr1ck @zh794390558 @lym0302 @qingen