Releases: PaddlePaddle/PaddleSpeech
PaddleSpeech r1.5.0
Highlight
New Features
- Add the AudioTools toolkit, used in DAC (Descript Audio Codec) training and inference.
- Reproduce the losses required by the DAC model: MultiScaleSTFTLoss, GANLoss, and SISDRLoss.
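SI-SDR (scale-invariant signal-to-distortion ratio) works in three steps. It projects the estimate onto the reference, then compares the energy of that projection with the energy of the residual. Rescaling the estimate therefore does not change the score. A training loss is usually the negative of this value.

The NumPy sketch below follows that standard definition. It is an illustration only, not PaddleSpeech's or AudioTools' SISDRLoss. The function names si_sdr and si_sdr_loss and the zero_mean/eps parameters are hypothetical.

```python
import numpy as np

def si_sdr(reference, estimate, zero_mean=True, eps=1e-8):
    """Scale-invariant SDR in dB along the last axis (hypothetical helper)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if zero_mean:
        # Remove DC offsets, as in the usual SI-SDR definition.
        reference = reference - reference.mean(axis=-1, keepdims=True)
        estimate = estimate - estimate.mean(axis=-1, keepdims=True)
    # Optimal scale: alpha = <estimate, reference> / ||reference||^2
    alpha = np.sum(estimate * reference, axis=-1, keepdims=True) / (
        np.sum(reference ** 2, axis=-1, keepdims=True) + eps
    )
    target = alpha * reference      # part of the estimate explained by the reference
    residual = estimate - target    # everything else counts as distortion
    ratio = (np.sum(target ** 2, axis=-1) + eps) / (np.sum(residual ** 2, axis=-1) + eps)
    return 10.0 * np.log10(ratio)

def si_sdr_loss(reference, estimate, **kwargs):
    """Negative mean SI-SDR: minimizing this loss maximizes SI-SDR."""
    return -np.mean(si_sdr(reference, estimate, **kwargs))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.1 * rng.standard_normal(16000)  # about 20 dB SNR
print(si_sdr(clean, noisy))
print(si_sdr(clean, 5.0 * noisy))  # same value: rescaling does not change SI-SDR
```

The projection step is what separates SI-SDR from plain SNR. A louder or quieter copy of the reference is not penalized; only components orthogonal to the reference count as distortion.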
Version Adaptation
Upgrade PaddleSpeech from Paddle 2.5.1 to Paddle 3.0.0-beta. This release resolves the incompatibilities introduced by the Paddle upgrade, adapts the models in PaddleSpeech to the new version, and regression-tests them so that the suite keeps working without loss of model functionality or accuracy.
- Adapt the 80+ existing models in the demo and example directories.
- Adapt the 10+ core models in the example directory and verify that their accuracy is aligned.
- Support re-exporting 20+ dynamic-to-static models with the PIR + predictor approach, with successful inference verified.
More Detail
New Features
- Add AudioTools toolkit #3900 (@DrRyanHuang)
- Add FFT convolution layer implementation #3947 (@DrRyanHuang)
- Implement loss functions required for DAC training #3988 (@cchenhaifeng)
- Add support for quantifiers and unit symbols #3837 (@undefined-ux)
- Add multiple PIR models #3956, #3982 (@zxcd)
- Add chunk configuration for tal_cs #3936 (@zxcd)
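FFT convolution computes a linear convolution as a pointwise product in the frequency domain. Both inputs are zero-padded to at least len(signal) + len(kernel) - 1 samples, so the circular convolution implied by the DFT does not wrap around.

Below is a minimal NumPy sketch of the technique. fft_convolve is a hypothetical helper, not the layer added in #3947.

```python
import numpy as np

def fft_convolve(signal, kernel):
    """Full linear convolution via FFT; should match np.convolve(signal, kernel)."""
    signal = np.asarray(signal, dtype=np.float64)
    kernel = np.asarray(kernel, dtype=np.float64)
    out_len = len(signal) + len(kernel) - 1
    # Pad to a power of two >= out_len so the circular convolution does not wrap.
    n_fft = 1 << (out_len - 1).bit_length()
    spectrum = np.fft.rfft(signal, n_fft) * np.fft.rfft(kernel, n_fft)
    return np.fft.irfft(spectrum, n_fft)[:out_len]

x = np.random.default_rng(0).standard_normal(1000)
h = np.hanning(64)
print(np.allclose(fft_convolve(x, h), np.convolve(x, h)))  # True
```

For long kernels this costs O(n log n) instead of the O(n * k) of direct convolution. That is the usual reason to prefer an FFT-based layer.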
Version Adaptation
- Enhance NumPy compatibility #3907 (@GreatV)
- Fix Whisper model support under Paddle 3.0 #3880 (@yinfan98)
- Remove dependency on paddlepaddle-gpu #3898 (@Liyulingyue)
- Support new inference interface #3927 (@zxcd)
- Modify inference to be compatible with Paddle 3.0 #3963 (@megemini)
- Fix cls static model inference error #3856 (@zxcd)
- Remove parser.add_argument #3878 (@Liyulingyue)
- Add strtobool implementation #3877 (@Liyulingyue)
- Fix view to shape for wav2vec2 #3904 (@Liyulingyue)
- Fix 0D tensor to 1D issue #3913 (@megemini)
- Fix type promotion issues #3817, #3883, #3944, #3943 (@megemini @GreatV)
- Fix shape error in layer normalization #3884 (@Liyulingyue)
- Resolve scipy import error #3874 (@GreatV)
- Fix VITS type promotion and 0-D tensor issues #3920 (@Liyulingyue)
- Fix FastSpeech2 0-D tensor issue #3951 (@megemini)
- Fix embedding initialization #3962 (@megemini)
- Replace view with reshape #3887, #3939 (@GreatV, @megemini)
- Fix max between int and value #3903 (@megemini)
- Fix duplicated argument #3934 (@megemini)
- Fix asr5 test.sh script path error #3941 (@megemini)
- Fix vctk spk_emb dimension issue #3916 (@megemini)
- Fix type promotion for aishell3/vctk vc0/ernie #3928 (@Liyulingyue)
- Use numpy for transpose #3933 (@megemini)
- Fix shape issues in opencpop svs1 #3912 (@enkilee)
- Fix deepspeech2online export issue #3935 (@Liyulingyue)
- Adapt whisper list for paddle 3.0 #4018 (@Liyulingyue)
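One item above adds a local strtobool implementation (#3877). Python's distutils module, which provided distutils.util.strtobool, was deprecated in Python 3.10 and removed in 3.12 (PEP 632).

A minimal drop-in sketch that mirrors the old distutils behavior (returning 1 or 0) follows. PaddleSpeech's own version may differ, and the --use_gpu flag is hypothetical.

```python
import argparse

def strtobool(value: str) -> int:
    """Return 1 for true-like strings and 0 for false-like ones, as distutils did."""
    value = value.lower()
    if value in ("y", "yes", "t", "true", "on", "1"):
        return 1
    if value in ("n", "no", "f", "false", "off", "0"):
        return 0
    raise ValueError(f"invalid truth value {value!r}")

# Typical use: parse boolean command-line flags (the flag name is hypothetical).
parser = argparse.ArgumentParser()
parser.add_argument("--use_gpu", type=strtobool, default=0)
args = parser.parse_args(["--use_gpu", "True"])
print(args.use_gpu)  # 1
```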
Installation Adaptation
- Optimize Python version compatibility #3965, #3967, #3969, #3970, #3972 (@Liyulingyue)
- Add hints for installing with the -e option #3979 (@Liyulingyue)
- Move audiotools requirements to setup.py #3999 (@zxcd)
- Lower installation requirements #3985 (@Liyulingyue)
- Remove paddleaudio from PaddleSpeech #3986 (@zxcd)
- Update install_openblas.sh #3876 (@GreatV)
- Update setup.py #3964, #3995 (@Liyulingyue)
- Adapt for librosa #3989 (@Liyulingyue)
- Define PythonDetermine in setup.py #3975 (@Liyulingyue)
- Add Paddle 3.0 beta1 CPU Docker image #4000 (@Liyulingyue)
Hardware Support
- Add GCU Backend support #3875 (@wanx7130)
- SpeedySpeech code adaptation for NPU #3804 (@warrentdrew)
- SpeedySpeech code adaptation for MLU #3828 (@warrentdrew)
Docs
- Add Squeezeformer information to README #3860 (@zxcd)
- Add README documentation for TIMIT/ASR1 #3930 (@enkilee)
- Fix multiple examples and demos #3830, #3872 (@zxcd @Liyulingyue)
- Fix tess readme #3882 (@megemini)
- Update README.md #3890 (@Liyulingyue)
- Fix Example/tiny documentation errors #3892 (@Liyulingyue)
- Update tal_cs readme #3911 (@megemini)
- Fix librispeech asr readme #3917 (@megemini)
- Fix CSMSC voc1 readme.md #3915 (@enkilee)
- Fix s2t example errors #3950 (@megemini)
- Fix led_en_zh st1 example #3955 (@GreatV)
- Text frontend intended links #3958 (@guspan-tanadi)
- Update Tiny README.md #3896 (@Liyulingyue)
- Fix acs demo #3826 (@zxcd)
- Fix g2p run.sh #3886 (@megemini)
- Fix asr4 test_wav redundant arguments #3940 (@megemini)
- Add synthesize_e2e.sh for csmsc/voc1, fix run.sh #3945 (@enkilee)
- Add synthesize_e2e.sh for csmsc/voc5, fix run.sh #3959 (@enkilee)
- Fix CSMSC Voc5/Jets/TTS2 #3906 (@Liyulingyue)
- Update utility script paths #3942 (@GreatV)
- Remove non-existent folders and add existing folders #3881 (@Liyulingyue)
- Fix file name #3895 (@zxcd)
- Fix missing ' #3869 (@Liyulingyue)
- Fix typos #3980, #3981, #3984, #4021, #4024 (@co63oc), #4011 (@rich04lin)
- Fix csmsc/voc3 script #3960 (@enkilee)
- Fix runtime docs to match the code #4057, #4042, #4045, #4050, #4051, #4037, #4043, #4044, #4049, #4047, #4038, #4013 (@Echo-Nie), #4068 (@zxcd)
- Fix g2p model link #4040 (@zxcd)
Bug Fix
- Fix streaming TTS server issues #3865 (@SuiYunsy)
- Fix matplotlib version incompatibility #3841 (@zxcd)
- Fix pydantic dependency issues #3715 (@Netrvin)
- Fix audiotools file path #3968 (@zxcd)
- Add missing keywords for aishell3/vits-vc #3932 (@yinfan98)
- Fix data traversal error caused by empty folders without *.npy files #3948 (@megemini)
- Fix package dependency issues in opencpop svs1 #3889 (@enkilee)
- Separate paddle.logsumexp #3897 (@zxcd)
- Fix audiotools model save and load #3994 (@zxcd)
- Fix TimeDomainSpecAugment import error #3919 (@megemini)
- Fix print_arguments import error #3918 (@megemini)
- Fix panns predict.py for pir json model path #3914 (@megemini)
- Complete missing parameters in synthesis series scripts #3998 (@enkilee)
- Fix tests/unit/tts/test_pwg.py #3974 (@co63oc)
- Fix sinc api accuracy issue #4061 (@zxcd)
CI
- Add server CI #3857 (@tianshuo78520a)
- Add unit tests #3835, #3836 (@zxcd, @tianshuo78520a)
- Close test_expand.py #3971 (@co63oc)
- Close test_snapshot.py #3976 (@co63oc)
Acknowledgements
Special thanks to contributors including @wanx7130, @warrentdrew, @DrRyanHuang, @cchenhaifeng, @undefined-ux, @zxcd, @GreatV, @yinfan98, @Liyulingyue, @megemini, @SuiYunsy, @Netrvin, @enkilee, @tianshuo78520a, @guspan-tanadi, @co63oc, @Echo-Nie and others for their support.
New Contributors
- @wanx7130 made their first contribution in #3875
- @cchenhaifeng made their first contribution in #3988
- @undefined-ux made their first contribution in #3837
- @DrRyanHuang made their first contribution in #3900
- @SuiYunsy made their first contribution in #3865
- @Netrvin made their first contribution in #3715
- @guspan-tanadi made their first contribution in #3958
- @enkilee made their first contribution in #3889
- @co63oc made their first contribution in #3971
- @Echo-Nie made their first contribution in #4057
PaddleSpeech r1.4.2
S2T
- Add WavLM ASR-en, WavLM fine-tuning for ASR on LibriSpeech. #3242 by @jiamingkong
- Add HuBERT ASR-en, HuBERT fine-tuning for ASR on LibriSpeech. #3088 by @Zth9730
- Add Squeezeformer model. #2755 by @yeyupiaoling
- Add AMP for U2 conformer. #3167 by @zxcd
- Move dataset code into paddlespeech.dataset. #3183 #3189 by @zh794390558
- Fix example/aishell local/train.sh if condition bug. #3146 by @lemondy
- Fix cli args to config. #3194 by @zh794390558
- Fix scaler save, load, unscale_ blow, grad_clip. by @zxcd
T2S
- Add SVS (Singing Voice Synthesis) examples on the Opencpop dataset, including DiffSinger, PWGAN (#3031 by @lym0302), and HiFiGAN (#3038 by @lym0302); quality is being continuously improved.
- Add SVS frontend. #3062 by @lym0302
- Add TTS iSTFTNet (#3006 by @longRookie), TTS JETS (#3109 by @ljhzxc)
- Starganv2: by @yt605155624
- Support for LITE: by @yt605155624
- Add XPU support for SpeedySpeech and FastSpeech2. #3502 #3514 by @USTCKAY
- Fix some preprocess bugs. #3155 by @yt605155624
- Fix bug of merge_yi function. #3786 by @mattheliu
Server
- Add code-switch conformer_talcs support. #3230 by @Gsonovb
- Add subtitle file (.srt format) generation example. #3123 by @twoDogy
- Fix: add file read encoding. #3606 by @Coloryr
Install & Benchmark
- Update paddle2onnx to the newest version. #3084 by @yt605155624
- Update to Python 3.8 and pin librosa==0.8.1 and numpy==1.23.5 for paddleaudio. by @zh794390558
- Fix transformation import error. #3779 by @kk-2000
- Adapt view behavior change, fix KeyError. #3794 by @zxcd
- Fix profiler, fix gpu_mem unit, add max_mem_reserved for benchmark. #3323 #3634 #3604 by @mmglove
Docs
- Fix some typos. #3178 by @Yulv-git
- Update svs_music_score.md. #3085 #3070 by @lym0302
- Update quick_start.md. #3175 #3176 by @46319943
- Add cli test readme. #3784 by @zxcd
- Update bug-report-tts.md. #3120 by @yt605155624
Others
- Fix 0-D tensor issues introduced by the upgrade to paddlepaddle==2.5. #3214 by @zxcd, #3334 by @zh794390558
- Add dtype param for arange API. #3302 by @zxcd
- Fix develop-branch bug: replace view with reshape. #3633 by @luyao-cv
- Fix progress bar unit. #3177 by @46319943
- Remove unused dependency. #3097 by @lym0302
Acknowledgements
Special thanks to @jiamingkong @Zth9730 @yeyupiaoling @zxcd @zh794390558 @lemondy @lym0302 @longRookie @ljhzxc @yt605155624 @USTCKAY @mattheliu @Gsonovb @twoDogy @Coloryr @kk-2000 @mmglove @Yulv-git @46319943 @luyao-cv
New Contributors
- @jiamingkong made their first contribution in #3242
- @yeyupiaoling made their first contribution in #2755
- @lemondy made their first contribution in #3146
- @longRookie made their first contribution in #3006
- @ljhzxc made their first contribution in #3109
- @USTCKAY made their first contribution in #3502
- @mattheliu made their first contribution in #3786
- @Gsonovb made their first contribution in #3230
- @twoDogy made their first contribution in #3123
- @Coloryr made their first contribution in #3606
- @kk-2000 made their first contribution in #3779
- @Yulv-git made their first contribution in #3178
- @46319943 made their first contribution in #3175
- @luyao-cv made their first contribution in #3633
Full Changelog: r1.4.1...r1.4.2
PaddleSpeech r1.4.1
Others
- Fix typeguard version. #3056 by @yt605155624
PaddleSpeech r1.4.0
S2T
- Add wav2vec2-zh finetune pipeline. #3012 #2916 by @zxcd
- Fix some bugs in Whisper. #2900 #2825 by @zxcd
- Add code-switch asr tal_cs recipe. #2816 #2796 by @zxcd
T2S
- Add dygraph-to-static, Paddle Inference, Paddle2ONNX, and ONNXRuntime inference for Cantonese TTS. #2990 by @JiehangXie
- Add Cantonese test examples. #2937 by @JiehangXie
- Add VITS inference pipeline. #3002 #2972 #2883 by @yt605155624
- Rearrange the parameter order of encoder_infer. #2983 by @443127316
- Add male speaker and Chinese-English mix ONNXRuntime infer in CLI. #2945 by @lym0302
- Add Cantonese TTS example. #2950 #2927 #2924 #2907 #2899 by @WongLaw
- Fix PWGAN TIPC. #2882 by @yt605155624
- Add a case in not_erhua. #2863 by @QuanZ9
- Fix data prepare for PaddleSlim PTQ of TTS. #2862 by @yt605155624
- Avoid using variable "attn_loss" before assignment. #2860 by @hopingZ
- Add soft links for shell scripts in examples; add skip_copy_wave to the normalization stage of GAN vocoders to save disk space. #2851 by @yt605155624
- Optimize the training of VITS. #2843 #2809 #2791 #2770 by @WongLaw
- Add StarGANv2-VC model scripts and synthesize scripts. #2842 by @yt605155624
- Add diffusion module for training diffsinger. #2868 #2832 by @HighCWu
- Fix some Text Frontend bugs. #2831 by @yt605155624
- For mixed Chinese and English speech synthesis, add SSML support for Chinese. #2830 by @jindongyi011039
- Add mkldnn and trt config for TTS Inference. #2748 by @yt605155624
- Fix dygraph to static for tacotron2. #2426 by @yt605155624
Server
Engine
- Add wfst decoder. #2886 by @SmileGoat
- Add batch recognizer decode. #2866 by @SmileGoat
- Add nnet probability cache and make two-thread decoding work. #2769 by @SmileGoat
- Engine directory refactor. #2746 by @SmileGoat
- Fix openfst download error. #2742 by @SmileGoat
Audio
- Replace kaldi fbank with kaldi-native-fbank in paddleaudio. #2799 by @SmileGoat
- Fix load paddleaudio fail. #2815 by @SmileGoat
- Update paddleaudio readme. #2801 by @SmileGoat
Demos
- Add TTS ARM Linux C++ Demo. #2991 by @SwimmingTiger
- Add Cantonese TTS in CLI. #2977 by @WongLaw
- Add ONNXRuntime infer for Cantonese TTS in CLI. #2990 by @JiehangXie
Docs
- Add u2pp_wenetspeech_static_quant to released_model.md. #2973 by @zxcd
- Remove redundant dependencies and Fix some bugs in setup.py. #2970 #2871 #2867 #2853 #2771 #2767 #2764 by @yt605155624
Others
- Remove fluid API in ASR. #2944 #2859 #2852 by @zxcd
- Add python simple adadelta optimizer. #2925 by @zxcd
- Add encoding=utf-8 for text. #2896 by @zxcd #2865 by @yt605155624
- Fix Tensor.numpy()[0] to float(Tensor) to adapt 0D. #2884 by @zhouwei25
- Fix libsndfile.so not found in ubuntu18-cpu/Dockerfile. #2763 by @linkec
- Fix AttributeError "module 'distutils' has no attribute 'ccompiler'" in setup.py in ctc_decoders. #2745 by @GreatV
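#2884 replaces Tensor.numpy()[0] with float(Tensor). When a reduction returns a 0-D (scalar) tensor, its .numpy() result has no axis to index.

The NumPy sketch below reproduces the failure, with a 0-d array standing in for a 0-D Paddle tensor. to_scalar is a hypothetical helper. It uses .item(), which accepts both shape () and shape (1,).

```python
import numpy as np

def to_scalar(value):
    """Convert a one-element array (0-D or 1-D) to a Python float."""
    # .item() works for any array with exactly one element, whatever its rank.
    return float(np.asarray(value).item())

zero_d = np.array(0.25)   # shape (): what a 0-D loss looks like after .numpy()
one_d = np.array([0.25])  # shape (1,): the older 1-D convention

try:
    zero_d[0]             # indexing a 0-d array raises IndexError
except IndexError as err:
    print("indexing failed:", err)

print(to_scalar(zero_d), to_scalar(one_d))  # 0.25 0.25
```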
New Contributors
- @GreatV made their first contribution in #2745
- @linkec made their first contribution in #2763
- @cxumol made their first contribution in #2828
- @jindongyi011039 made their first contribution in #2830
- @QuanZ9 made their first contribution in #2863
- @hopingZ made their first contribution in #2860
- @zhouwei25 made their first contribution in #2884
- @EscaticZheng made their first contribution in #2915
- @chinobing made their first contribution in #2922
- @lance6716 made their first contribution in #2924
- @443127316 made their first contribution in #2983
Full Changelog: r1.3.0...r1.4.0
PaddleSpeech r1.3.0
Highlight
S2T
- Support U2/U2++ Conformer dy2static, and U2/U2++ C++ High Performance Streaming ASR Deployment. @zh794390558
- Add Wav2vec2ASR-en, wav2vec2.0 fine-tuning for ASR on LibriSpeech. @Zth9730
- Add Whisper CLI and demos, supporting multilingual recognition and translation. @zxcd
- Add Wav2vec2 CLI and Demos, support ASR and Feature Extraction. @Zth9730
- Add whisper. #2640 #2704 by @zxcd
- Fix gpu training hang. #2478 by @Zth9730
- Support u2++ based cli and server. #2489 #2510 by @Zth9730
- Add wav2vec2-en. #2518 #2527 #2637 by @Zth9730
- Add wav2vec2-zh cli. #2697 by @Zth9730
T2S
- Add seek for BytesIO. #2484 by @ZapBird
- Add mix finetune. #2525 #2647 by @lym0302
- Add streaming TTS fastdeploy serving. #2528 by @HexToString
- Add SSML for Chinese Text Frontend. #2531 by @david-95
- Add end-to-end Prosody Prediction pipeline (including using prosody labels in Acoustic Model). #2548 #2615 #2693 by @WongLaw
- Add Adversarial Loss for Chinese English mixed TTS. #2588 by @lym0302
- Fix frontend bugs. #2539 #2606 by @yt605155624
- Add TN for English unit. #2629 by @WongLaw
- Add male voice for TTS. #2660 by @lym0302
- Add double-byte character support for zh normalization. #2661 by @david-95
- Add TTS Paddle-Lite x86 inference. #2636 #2667 by @yt605155624
- Add Greek characters and fix #2571. #2683 by @david-95
- Add Slim for TTS. #2729 by @yt605155624
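The changelog does not describe #2484 in detail. The sketch below only illustrates the general pitfall a "seek for BytesIO" change usually addresses. After audio is written into an io.BytesIO, the stream position sits at the end of the buffer, so a reader sees no data unless the buffer is rewound with seek(0).

wav_bytes is a hypothetical helper built on the standard-library wave module. It is not PaddleSpeech code.

```python
import io
import wave

def wav_bytes(samples: bytes, sample_rate: int = 24000) -> io.BytesIO:
    """Wrap raw mono 16-bit PCM bytes in an in-memory WAV file, rewound for reading."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:  # wave does not close a file object it did not open
        writer.setnchannels(1)
        writer.setsampwidth(2)            # 16-bit PCM
        writer.setframerate(sample_rate)
        writer.writeframes(samples)
    buf.seek(0)  # after writing, the position is at the end; rewind before handing it out
    return buf

pcm = b"\x00\x00" * 2400  # 0.1 s of 16-bit silence at 24 kHz
buf = wav_bytes(pcm)
with wave.open(buf, "rb") as reader:
    print(reader.getnframes(), reader.getframerate())  # 2400 24000
```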
Audio
- Move paddlespeech/audio to paddleaudio. #2706 by @SmileGoat
Demo
- Add TTSAndroid demo. #2703 by @yt605155624
New Contributors
- @ZapBird made their first contribution in #2484
- @HexToString made their first contribution in #2528
- @dahu1 made their first contribution in #2554
- @kFoodie made their first contribution in #2664
- @zxcd made their first contribution in #2640
- @michael-skynorth made their first contribution in #2666
- @heyudage made their first contribution in #2688
Full Changelog: r1.2.0...r1.3.0
PaddleSpeech r1.2.0
S2T
- Fix conformer/transformer multi GPU training. #2327 #2334 #2336 #2372 by @Zth9730
- Fix deepspeech2 decode_wav. #2351 by @Zth9730
- Support BiTransformer decoder. #2415 by @Zth9730
T2S
- Update VITS to support VITS and its voice cloning training on AISHELL-3. #2268 by @HighCWu
- Add ERNIE-SAT synthesize_e2e. #2287 #2316 #2355 #2378 #2432 by @yt605155624
- Specify the input data type of G2PW. #2288 by @kslz
- Add TTS finetune example. #2297 #2385 #2418 #2430 by @lym0302
- Fix Chinese English mixed TTS frontend. #2299 #2493 by @lym0302
- Add words into polyphonic.yaml for g2pW. #2300 by @david-95
- Update the quantifier unit in Text Normalization. #2308 by @pengzhendong
- Fix Chinese frontend bugs. #2312 #2323 by @david-95
- Add AISHELL-3 Voice Cloning with ECAPA-TDNN speaker encoder. #2359 #2429 by @yt605155624
- Add pre-install doc for G2P and TN, update version of pypinyin. #2364 by @WongLaw
- Add tools to compare two test results of G2P to show differences. #2367 by @david-95
- Revise must_neural_tone_words. #2370 by @WongLaw
- Add type-hint for g2pW. #2390 by @yt605155624
- Replace fixed path with path variable in MFA. #2416 by @WongLaw
- Solve "unknown format: 3" for wavfile.write(). #2422 by @zhoupc2015
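Format tag 3 in a WAV header denotes IEEE floating-point samples. Python's standard wave module reads only integer PCM and raises "unknown format: 3" on such files. Float audio written with a tool such as scipy.io.wavfile.write can therefore fail when it is later read with wave.

A common workaround is to convert float samples in [-1, 1] to 16-bit PCM before writing. This sketch assumes that is the kind of fix #2422 makes; the changelog does not say. float_to_pcm16 and write_pcm16_wav are hypothetical helpers.

```python
import io
import wave
import numpy as np

def float_to_pcm16(audio):
    """Map float samples in [-1, 1] to int16, clipping anything outside that range."""
    audio = np.clip(np.asarray(audio, dtype=np.float32), -1.0, 1.0)
    return (audio * 32767.0).astype(np.int16)

def write_pcm16_wav(file_obj, audio, sample_rate):
    """Write mono 16-bit PCM (format tag 1), which the wave module can read back."""
    pcm = float_to_pcm16(audio)
    with wave.open(file_obj, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(sample_rate)
        writer.writeframes(pcm.astype("<i2").tobytes())  # WAV sample data is little-endian

buf = io.BytesIO()
write_pcm16_wav(buf, [0.0, 0.5, -0.5, 1.5], 16000)
buf.seek(0)
with wave.open(buf, "rb") as reader:
    print(reader.getsampwidth(), reader.getnframes())  # 2 4
```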
Text
Demo
Server
- Add num_decoding_left_chunks in streaming_asr_server's config. #2337 by @THUzyt21
- Remove unused spk_id in speech_server and streaming_tts_server; support a Chinese-English mixed TTS server engine. #2380 by @WongLaw
Doc
- Add Chinese doc and language switcher for metaverse, style_fs2 and story_talker. #2357 by @WongLaw
- Update API docs. #2406 by @yt605155624
- Add finetune demos in readthedocs. #2411 by @yt605155624
Test
- Add barrier for distributed training using multiple machines. #2309 #2311 by @sneaxiy
- Fix prepare.sh for PWGAN TIPC. #2376 by @yuehuayingxueluo
Other
- Format paddlespeech with pre-commit. #2331 by @yt605155624
Acknowledgements
Special thanks to @yt605155624 @lym0302 @THUzyt21 @iftaken @Zth9730 @zhoupc2015 @WongLaw @david-95 @pengzhendong @kslz @HighCWu @yuehuayingxueluo @sneaxiy @SmileGoat
New Contributors
- @HighCWu made their first contribution in #2268
- @pengzhendong made their first contribution in #2308
- @Zth9730 made their first contribution in #2327
- @WongLaw made their first contribution in #2357
- @yuehuayingxueluo made their first contribution in #2376
- @zhoupc2015 made their first contribution in #2422
Full Changelog: r1.1.0...r1.2.0
PaddleSpeech r1.1.0
S2T
- Add WER tools. #1709
- Optimize the attention cache; use 0-dim tensors for model export. #2124
- Fix cnn cache dy2st shape. #2168
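WER and CER, the metrics behind the ASR numbers quoted in these notes, are the Levenshtein edit distance between reference and hypothesis divided by the reference length. The distance counts substitutions, deletions, and insertions. WER measures it over words and CER over characters.

The sketch below implements that standard definition. It is not the code of the WER tools added in #1709. The helper names are hypothetical, and CER here simply ignores spaces.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference token
                cur[j - 1] + 1,          # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (free when tokens match)
            )
        prev = cur
    return prev[-1]

def wer(reference, hypothesis):
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    ref_chars = list(reference.replace(" ", ""))
    return edit_distance(ref_chars, list(hypothesis.replace(" ", ""))) / len(ref_chars)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 0.3333333333333333
print(cer("今天天气很好", "今天天汽很好"))  # 0.16666666666666666
```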
TTS
- Fix random speaker embedding bug in voice clone. #1828 by @jerryuhoo
- Add VITS model. #1855 #1957 #2040
- Add Kunlun support for SpeedySpeech. #1879 by @QingshuChen
- Normalize wav max value to 1 in preprocess. #1887 by @jerryuhoo
- Remove fluid dependence in TTS. #1940
- Add onnx models for aishell3/ljspeech/vctk's tts3/voc1/voc5. #2068
- Add TTS static/onnx models in pretrained_models.py. #2074
- Add Ernie SAT model. #2052 #2117
- Add Chinese English mixed TTS frontend. #2143
- Add Chinese English mixed TTS example. #2234
- Fix English text frontend bug. #2235 by @david-95
- Add g2pW to Chinese frontend. #2230 by @BarryKCL
- Fix text frontend bugs. #1912 #2250 #2254 #2255 #2272
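"Normalize wav max value to 1" describes peak normalization: the waveform is divided by its largest absolute sample so that the peak becomes exactly 1.

A minimal sketch follows. The eps guard that leaves near-silent clips unchanged is an assumption, not necessarily what #1887 does, and peak_normalize is a hypothetical name.

```python
import numpy as np

def peak_normalize(wav, eps=1e-8):
    """Scale a waveform so its largest absolute sample becomes 1 (near-silence is left as-is)."""
    wav = np.asarray(wav, dtype=np.float32)
    peak = np.max(np.abs(wav))
    if peak < eps:  # avoid dividing by zero on silent input
        return wav
    return wav / peak

print(peak_normalize([0.1, -0.5, 0.25]))
```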
Speechx
- Add custom ASR script. #1946
- Refactor frontend. #2003
- Convert DeepSpeech2 to ONNX. #2034
- Refactor audio/data/feature cache. #1638
- Refactor frontend. #1640
- Fix nnet itf header. #1641
- Refactor speech egs. #1707
- Refactor egs and add more egs for TLG WFST graph building. #1715
- Speed up n-gram building. #1729
- Update speechx install doc. #1736
- Fix nnet input and output name. #1740
- Update wfst graph. #1742
- Fix model params path name. #1750
- Remove fluid tools for onnx export. #2116
Audio
Server
- Remove extra logs. #2111 #2113
- Change the streaming TTS servers' sample rate (fs) from a fixed 24k to each model's own fs. #2121
- Fix bug in engine_warmup. #2171 by @Betterman-qs
- Change the default vocoder in the server to mb_melgan. #2214
- Fix bug in streaming_asr_server with punctuation restoration. #2244
- Rename time_s and time_ns to time_b and time_nb. #2133
- Improve decoding accuracy. #2128
CLI
- Add paddlespeech.resource module. #1917
- Dynamic cli commands registration. #1959
- Fix unnecessary download. #2103
- Remove extra logs. #2084 #2085 #2107
- Add Chinese English mixed TTS CLI. #2249
- Add onnxruntime infer for CLI. #2222
Demo
- Add speech web demo. #2039 #2080
- Add kws cli and demo. #2063
- Use paddle web for streaming asr. #2105
- Add custom ASR script. #1946
- More cli for speech demos. #2138
Doc
Others
- Fix CPU Dockerfile. #2172 by @BrightXiaoHan
- Add PaddleSpeech Dockerfile for hard mode of installation. #2127 by @buchongyu2
Acknowledgements
Special thanks to @buchongyu2 @BrightXiaoHan @BarryKCL @Betterman-qs @david-95 @jerryuhoo @QingshuChen @iftaken @zh794390558 @Jackwaterveg @lym0302 @SmileGoat @yt605155624
New Contributors
- @QingshuChen made their first contribution in #1879
- @Zhangjingyu06 made their first contribution in #1951
- @ryanrussell made their first contribution in #1976
- @freeliuzc made their first contribution in #2044
- @vpegasus made their first contribution in #2043
- @dependabot made their first contribution in #2061
- @raycool made their first contribution in #2109
- @YDX-2147483647 made their first contribution in #2125
- @chenkui164 made their first contribution in #2130
- @0x45f made their first contribution in #2162
- @Doubledongli made their first contribution in #2167
- @Betterman-qs made their first contribution in #2171
- @BrightXiaoHan made their first contribution in #2172
- @THUzyt21 made their first contribution in #2202
- @david-95 made their first contribution in #2235
- @BarryKCL made their first contribution in #2230
Full Changelog: r1.0.0...r1.1.0
PaddleSpeech r1.0.0
Highlight
- Release PP-ASR: streaming ASR with timestamps and punctuation restoration, using the WenetSpeech streaming Conformer and DeepSpeech2 ASR models.
- Release PP-TTS: Streaming TTS system for industrial application.
- Release PP-VPR: Industrial Voiceprint Recognition system and ECAPA-TDNN model.
- Custom ASR: transportation reimbursement application demo
- Support MDTC KWS model
More
ASR
- DeepSpeech2 streaming model aishell cer 6.66%
- DeepSpeech2 streaming model wenetspeech cer: 15.2% (test_net, w/o LM), 24.17% (test_meeting, w/o LM), 5.3% (aishell, w/ LM)
- Conformer aishell cer 4.64%
- Conformer streaming model aishell cer 5.44%
- Conformer streaming model wenetspeech cer: 11.0% (test_net), 18.79% (test_meeting)
Speechx
- [SpeechX] DeepSpeech2 streaming with WFST in streaming asr example
- [SpeechX] Add WebSocket example
- [SpeechX] Custom ASR demo for transportation reimbursement applications
KWS
- [KWS] Add kws example on HeySnips dataset. by @KPatr1ck in #1558
- [KWS] Update KWS example. by @KPatr1ck in #1783
Audio
- [Audio] Rename paddleaudio to audio, since it conflicts with the package name by @zh794390558 in #1758
- [Audio] Fix mcd issue. by @KPatr1ck in #1658
- [Audio] Remove mcd. by @KPatr1ck in #1659
- [Audio] Add VoxCeleb dataset for speaker recognition.
- [Audio] Add HeySnips dataset for keyword spotting.
What's Changed
- [R1.0][asr][server]add vector server by @honei in #1845
- [R1.0][asr][server]join streaming asr and punc server by @honei in #1846
- [R1.0]asr streaming server add time stamp by @honei in #1850
- [R1.0][tts][server] update readme by @lym0302 in #1852
- [R1.0] update cli by @Jackwaterveg in #1854
- [r1.0] update version to r1.0.0 by @zh794390558 in #1857
- [R1.0] Add doc for wenetspeech model (ds2 online, conformer online) by @Jackwaterveg in #1862
- [R1.0][server] improve server code by @lym0302 in #1866
- [R1.0][asr][server]update the streaming asr readme by @honei in #1871
- [R1.0] Update released model info (Wenetspeech ds2 online, conformer online) by @Jackwaterveg in #1869
- [R1.0]fix server doc and decode_method by @Jackwaterveg in #1889
- [speechx] add custom_streaming_asr @SmileGoat #1891
- [speechx] speedup ngram building @zh794390558 #1729
- [speechx] refactor egs and more egs for TLG wfst graph build @zh794390558 #1715
- [speechx]add aishell test script & json parser & no db norm linear feature & json2kaldi type cmvn @SmileGoat #1676
- [speechx] Add websocket & make it work @SmileGoat #1720
- [speechx] Frontend refactor @SmileGoat #1640
- [Speechx] add tlg decoder @SmileGoat #1599
Full Changelog: r1.0.0a...r1.0.0
PaddleSpeech r1.0.0a
Highlight
- Release Streaming ASR and Streaming TTS system for industrial application.
- Support KWS model
- Deepspeech2 streaming model aishell cer 6.66%
- Conformer aishell cer 4.64%
- Conformer streaming model aishell cer 5.44%
- SpeechX Deepspeech2 streaming with WFST
What's Changed
- [speechx] refactor audio/data/feature cache by @zh794390558 in #1638
- [speechx] Frontend refactor by @zh794390558 in #1640
- [speechx] fix nnet itf header by @zh794390558 in #1641
- [TTS]add license and reference for some models by @yt605155624 in #1642
- [Doc] supplement note by @Jackwaterveg in #1643
- [vec][search] update search demo README by @qingen in #1644
- [speechx]refactor linear feature:unify vector & remove redundant function & add remained_wav cache shift wav by @SmileGoat in #1649
- [Audio] Fix mcd issue. by @KPatr1ck in #1658
- [Audio] Remove mcd. by @KPatr1ck in #1659
- [vec]update the speaker verification model by @honei in #1663
- [ASR] update ds2 online model by @Jackwaterveg in #1668
- [TTS]fix preprocess bug, test=tts by @yt605155624 in #1660
- update README, test=doc by @iftaken in #1672
- [Punc] Update RESULTS.md. by @KPatr1ck in #1675
- [CLI] update ds2 online model in cli by @Jackwaterveg in #1674
- [CLI] ASR: Add duration limitation for asr by @Jackwaterveg in #1666
- [vec]add speaker verification score method by @honei in #1646
- [TTS]add onnx inference for fastspeech2 + hifigan/mb_melgan by @yt605155624 in #1665
- [doc]update readme by @yt605155624 in #1680
- [WebSocket] fixed online model md5 error , test=doc by @WilliamZhang06 in #1682
- [speechx]add aishell test script & json parser & no db norm linear feature & json2kaldi type cmvn by @SmileGoat in #1676
- [server] add stream tts server by @lym0302 in #1652
- [speechx]remove mutable in audio_cache by @SmileGoat in #1687
- [Doc] update readme for aishell/asr0 by @Jackwaterveg in #1677
- [vec] add speaker diarization pipeline by @ccrrong in #1651
- [vec]voxceleb convert dataset format to paddlespeech by @honei in #1630
- [Speechx] add tlg decoder by @SmileGoat in #1599
- [vec]add vector necessary note, test=doc by @honei in #1690
- Revert "[WebSocket] fixed online model md5 error , test=doc" by @zh794390558 in #1691
- [WebSocket] added online web client, test=doc by @WilliamZhang06 in #1692
- Fix misspelling of the word "speech" in the example/aishell directory by @buchongyu2 in #1694
- Fix misspelling of the word "hack" by @buchongyu2 in #1697
- [TTS]change NLC to NCL in speedyspeech, test=tts by @yt605155624 in #1693
- [doc]fix typo, test=doc by @yt605155624 in #1698
- [doc]add pwgan onnx model, test=doc by @yt605155624 in #1700
- [WebSocket] added online asr doc and online asr command line, test=doc by @WilliamZhang06 in #1701
- [vec][server] vpr demo support by @qingen in #1696
- [speechx] refactor speech egs by @zh794390558 in #1707
- [asr]add wer tools by @zh794390558 in #1709
- [asr][websocket]fix the ws send bug, cache buffer, text=doc by @honei in #1710
- [TTS]add fastspeech2 cnndecoder onnx model by @yt605155624 in #1712
- [speechx] refactor egs and more egs for TLG wfst graph build by @zh794390558 in #1715
- [vec][score] add plda model by @qingen in #1681
- [CLI]update cli, test=doc by @yt605155624 in #1716
- [server] add streaming am infer by @lym0302 in #1713
- [speechx] Add websocket & make it work by @SmileGoat in #1720
- [asr][websocket] add asr conformer websocket server by @honei in #1704
- [vec][loss] add NCE Loss from RNNLM by @qingen in #1719
- [vec][loss] add FocalLoss to deal with class imbalances by @qingen in #1722
- [TTS]restructure syn_utils.py, test=tts by @yt605155624 in #1723
- [TTS]add paddle device set for ort and inference by @yt605155624 in #1727
- [vec] add GRL to domain adaptation by @qingen in #1725
- [speechx] speedup ngram building by @zh794390558 in #1729
- [asr] Add new cer tools by @Jackwaterveg in #1673
- [speechx]add websocket lib by @SmileGoat in #1732
- [speechx]update speechx install doc by @zh794390558 in #1736
- [Doc] improve the packing scripts by @Jackwaterveg in #1735
- [Doc] renew the released model by @Jackwaterveg in #1739
- [asr][websocket]add streaming asr demo by @honei in #1737
- [speechx] fix nnet input and output name by @zh794390558 in #1740
- [ASR] remove redundant log by @Jackwaterveg in #1741
- [speechx] update wfst graph by @zh794390558 in #1742
- [speechx] Add recognizer_test_main script by @SmileGoat in #1743
- [vec][doc]update the voxceleb readme.md, test=doc by @honei in #1744
- [ASR] fix CER tools by @Jackwaterveg in #1747
- [Doc] Fix release_model info by @Jackwaterveg in #1746
- [Doc] Update released model info by @Jackwaterveg in #1748
- Update released model info by @Jackwaterveg in #1749
- [speechx] fix model params path name by @zh794390558 in #1750
- [speechx] fix linear-spectrogram-wo-db-norm-ol read feature issue by @SmileGoat in #1751
- [TTS]fix wavernn white noise bug for paddle develop(2.3) by @yt605155624 in #1752
- [server] add onnx tts engine by @lym0302 in #1733
- [TTS]Update paddle2onnx by @yt605155624 in #1754
- [Setup] to r1.0.0a by @Jackwaterveg in #1759
- [audio] rename paddleaudio to audio, since it conflicts with the package name by @zh794390558 in #1758
- [speechx] to_float32, fix shell script by @zh794390558 in #1757
- [vec] bug fix to adapt VUE by @qingen in #1760
- [asr][websocket]fix the streaming asr server bug, server client by @honei in #1761
- [speechx] fbank and mfcc by @zh794390558 in #1765
- format code by @zh794390558 in #1764
- [CLI] Add conformer_aishell, conformer_online_aishell by @Jackwaterveg in #1767
- [speechx]make cmvn global in run.sh by @SmileGoat in #1768
- [ASR] ds2: add log_interval and fix lr problem when resume training by @Jackwaterveg in #1766
- [speechx] set nnet param by flags by @zh794390558 in #1769
- [server] add streaming tts demos by @lym0302...
PaddleSpeech r0.2.0
S2T
- Replace kaldi_fbank with paddleaudio #1612
- Support CTC decoder online #821 #1626
- Improve accuracy of Conformer. Use Kaiming uniform as the default initialization. #1577
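Kaiming (He) uniform initialization draws weights from U(-b, b) with b = gain * sqrt(3 / fan_in), which gives each weight variance gain^2 / fan_in. For ReLU layers the usual gain is sqrt(2), so b = sqrt(6 / fan_in).

The gain and weight layout PaddleSpeech actually uses in #1577 are not stated here. The [fan_in, fan_out] shape and the kaiming_uniform helper below are assumptions for illustration.

```python
import numpy as np

def kaiming_uniform(fan_in, fan_out, gain=np.sqrt(2.0), seed=0):
    """Sample a [fan_in, fan_out] weight from U(-b, b), b = gain * sqrt(3 / fan_in)."""
    bound = gain * np.sqrt(3.0 / fan_in)
    rng = np.random.default_rng(seed)
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))

w = kaiming_uniform(256, 1024)
print(w.shape)  # (256, 1024)
```

Scaling the bound by 1 / sqrt(fan_in) keeps the variance of each layer's pre-activations roughly constant as width changes. That is the motivation for using it as a default.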
TTS
- Add SpeedySpeech multi-speaker support for synthesize_e2e.py. #1370 by @jerryuhoo
- Add WaveRNN for CSMSC dataset. #1379
- Add Tacotron2 for CSMSC / LJSpeech datasets. #1314 / #1416
- Add GE2E Tacotron2 Voice Cloning for AISHELL3 dataset. #1419
- Update text frontend. #1506
- Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets. #1549 / #1581 / #1587
- Add NPU support for TransformerTTS. #1593 by @windstamp
- Add CNN Decoder for Streaming Fastspeech2. #1634
Audio
- Add paddleaudio.compliance modules that offer audio feature APIs aligned with Kaldi and Librosa. #1518
- Add unit tests and benchmarks for audio feature APIs. #1548
- [audio] refactor audio arch #1494 by @zh794390558
- [audio] dtw metric #1493 by @zh794390558
- [audio] fix compliance bug #1597 by @zh794390558
Deployment
- [speechx] high performance inference of speech task #1496 by @SmileGoat @zh794390558
- [Speechx]fix normalizer bug #1600 #1621 #1619 #1633 #1635 by @SmileGoat
- [speechx] refactor speechx #1631 #1616 #1576 #1572 #1541 by @zh794390558
- [speechx] simplify cmake compiler #1538 #1536 #1535 by @zh794390558
server
- [websocket] added online asr engine #1627 by @WilliamZhang06
- [server] added engine type and asr inference #1475 by @WilliamZhang06
- [Server] added asr engine #1413 by @WilliamZhang06
- [Server] added engine factory and config #1399 by @WilliamZhang06
- [server] added engine framework #1383 by @WilliamZhang06
- [server] update readme #1604 by @lym0302
- [server] add server cls #1554 by @lym0302
- [server] add paddlespeech_server stats #1510 by @lym0302
- [server] add cli #1466 by @lym0302
- [server] add tts postprocess #1411 by @lym0302
- [server] tts server #1386 by @lym0302
vector
CLI
- Batch input supported. #1460
- TTS: Add WaveRNN for CSMSC dataset.
- TTS: Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets.
- Vector: add speaker verification demo and doc #1605 by @honei
Demo
- [vec][search] update client image url #1628 by @qingen
- [server] add server demo #1480 by @lym0302
- [vec][search] add audio similarity search #1609 by @qingen
Acknowledgements
Special thanks to @WilliamZhang06 @yt605155624 @windstamp @Jackwaterveg @honei @SmileGoat @KPatr1ck @zh794390558 @lym0302 @qingen