@DrRyanHuang

Highlight

New Features

Add AudioTools toolkit support used in DAC (Descript-Audio-Codec) training and inference.
Reproduce the losses required for DAC model: MultiScaleSTFTLoss, GANLoss, and SISDRLoss.
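To make the SISDRLoss item more concrete, here is a minimal sketch of the scale-invariant signal-to-distortion ratio (SI-SDR) that such a loss is usually built on. It is a pure-Python illustration under stated assumptions, not the PaddleSpeech or AudioTools implementation: the function name `si_sdr`, the `eps` parameter, and the mean-removal step are choices made for this example. A training loss would typically minimise the negative of this value.

```python
import math


def si_sdr(reference, estimate, eps=1e-8):
    """Scale-invariant SDR in dB for two equal-length sequences of floats.

    Hypothetical helper for illustration; not the PaddleSpeech API.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")

    # Remove the mean so that a constant offset does not affect the score.
    ref_mean = sum(reference) / len(reference)
    est_mean = sum(estimate) / len(estimate)
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]

    # Project the estimate onto the reference to obtain the target component.
    dot = sum(r * e for r, e in zip(ref, est))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]

    # Whatever is left after removing the target counts as distortion.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is the property that distinguishes SI-SDR from plain SDR.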

Version Adaptation

Upgrade and adapt PaddleSpeech from Paddle 2.5.1 to Paddle 3.0.0-beta. This resolves the incompatibilities introduced by the new Paddle version and covers adaptation work and regression testing for the models in PaddleSpeech, so that the suite runs normally with no loss of model functionality or accuracy.

Ensure the adaptation of 80+ existing models in the demo and example directories.
Ensure the adaptation and accuracy alignment of 10+ core models in the example directory.
Support the re-export of 20+ dynamic-to-static models using the PIR + predictor approach and ensure successful inference.

More Detail

New Features

Add AudioTools toolkit #3900 (@DrRyanHuang)
Add FFT convolution layer implementation #3947 (@DrRyanHuang)
Implement loss functions required for DAC training #3988 (@cchenhaifeng)
Add quantifiers and unit symbols support #3837 (@undefined-ux)
Add multiple PIR models #3956, #3982 (@zxcd)
Add chunk configuration for tal_cs #3936 (@zxcd)

Version Adaptation

Enhance NumPy compatibility #3907 (@GreatV)
Fix Whisper model support under Paddle 3.0 #3880 (@yinfan98)
Remove dependency on paddlepaddle-gpu #3898 (@Liyulingyue)
Support new inference interface #3927 (@zxcd)
Modify inference to be compatible with Paddle 3.0 #3963 (@megemini)
Fix cls static model infer error #3856 (@zxcd)
Remove parser.add_argument #3878 (@Liyulingyue)
Add strtobool implementation #3877 (@Liyulingyue)
Fix view to shape for wav2vec2 #3904 (@Liyulingyue)
Fix 0D tensor to 1D issue #3913 (@megemini)
Fix type promotion issues #3817, #3883, #3944, #3943 (@megemini @GreatV)
Fix shape error in layer normalization #3884 (@Liyulingyue)
Resolve scipy import error #3874 (@GreatV)
Fix vits type promotion and 0D #3920 (@Liyulingyue)
Fix fastspeech2 0d issue #3951 (@megemini)
Fix emb initialization #3962 (@megemini)
Replace view with reshape #3887, #3939 (@GreatV, @megemini)
Fix max between int and value #3903 (@megemini)
Fix duplicated argument #3934 (@megemini)
Fix asr5 test.sh script path error #3941 (@megemini)
Fix vctk spk_emb dimension issue #3916 (@megemini)
Fix type promotion for aishell3/vctk vc0/ernie #3928 (@Liyulingyue)
Use numpy for transpose #3933 (@megemini)
Fix shape issues in opencopop svs1 #3912 (@enkilee)
Fix deepspeech2online export issue #3935 (@Liyulingyue)
Adapt whisper list for paddle 3.0 #4018 (@Liyulingyue)
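The strtobool item in the list above is needed because `distutils`, which used to provide `distutils.util.strtobool`, was removed from the standard library in Python 3.12. The following is a minimal sketch of a drop-in replacement, assuming the same accepted spellings as the old helper. It is not the code from #3877: returning `bool` rather than `1`/`0`, stripping whitespace, and the `--use-gpu` flag are assumptions made for this example.

```python
import argparse

_TRUE_VALUES = {"y", "yes", "t", "true", "on", "1"}
_FALSE_VALUES = {"n", "no", "f", "false", "off", "0"}


def strtobool(value: str) -> bool:
    """Convert a truthy/falsy string to bool, raising ValueError otherwise."""
    normalized = value.strip().lower()
    if normalized in _TRUE_VALUES:
        return True
    if normalized in _FALSE_VALUES:
        return False
    raise ValueError(f"invalid truth value {value!r}")


# Typical use: parsing boolean command-line flags.
# "--use-gpu" is a hypothetical flag name used only for this sketch.
parser = argparse.ArgumentParser()
parser.add_argument("--use-gpu", type=strtobool, default=False)
args = parser.parse_args(["--use-gpu", "no"])
```

Passing the function as `type=` lets argparse report a clean usage error for unrecognised values, since argparse converts a `ValueError` raised by the type callable into an argument error.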

Installation Adaptation

Optimize Python version compatibility #3965, #3967, #3969, #3970, #3972 (@Liyulingyue)
Add hints for installing with -e option #3979 (@Liyulingyue)
Move audiotools requirements to setup.py #3999 (@zxcd)
Update install_openblas.sh #3876 (@GreatV)
Update setup.py #3964, #3995 (@Liyulingyue)
Adapt for librosa #3989 (@Liyulingyue)
Lower installation requirements #3985 (@Liyulingyue)
Remove paddleaudio from PaddleSpeech #3986 (@zxcd)
Define PythonDetermine in setup.py #3975 (@Liyulingyue)
Add paddle3.0 beta1 cpu docker #4000 (@Liyulingyue)

Hardware Support

Add GCU Backend support #3875 (@wanx7130)
SpeedySpeech code adaptation for NPU #3804 (@warrentdrew)
SpeedySpeech code adaptation for MLU #3828 (@warrentdrew)

Docs

Add Squeezeformer information to README #3860 (@zxcd)
Add README documentation for TIMIT/ASR1 #3930 (@enkilee)
Fix multiple examples and demos #3830, #3872 (@zxcd @Liyulingyue)
Fix tess readme #3882 (@megemini)
Update README.md #3890 (@Liyulingyue)
Fix Example/tiny documentation errors #3892 (@Liyulingyue)
Update tal_cs readme #3911 (@megemini)
Fix librispeech asr readme #3917 (@megemini)
Fix CSMSC voc1 readme.md #3915 (@enkilee)
Fix s2t example errors #3950 (@megemini)
Fix led_en_zh st1 example #3955 (@GreatV)
Text frontend intended links #3958 (@guspan-tanadi)
Update Tiny README.md #3896 (@Liyulingyue)
Fix acs demo #3826 (@zxcd)
Fix g2p run.sh #3886 (@megemini)
Fix asr4 test_wav redundant arguments #3940 (@megemini)
Add synthesize_e2e.sh for csmsc/voc1, fix run.sh #3945 (@enkilee)
Add synthesize_e2e.sh for csmsc/voc5, fix run.sh #3959 (@enkilee)
Fix CSMSC Voc5/Jets/TTS2 #3906 (@Liyulingyue)
Update utility script paths #3942 (@GreatV)
Remove non-existent folders and add existing folders #3881 (@Liyulingyue)
Fix file name #3895 (@zxcd)
Fix missing ' #3869 (@Liyulingyue)
Fix typos #3980, #3981, #3984, #4021, #4024 (@co63oc) #4011 (@rich04lin)
Fix csmsc/voc3 script #3960 (@enkilee)
Fix runtime doc to suit code #4057, #4042, #4045, #4050, #4051, #4037, #4043, #4044, #4049, #4047, #4038, #4013 (@Echo-Nie) #4068 (@zxcd)
Fix g2p model link #4040 (@zxcd)

Bug Fix

Fix streaming TTS server issues #3865 (@SuiYunsy)
Fix matplotlib version incompatibility #3841 (@zxcd)
Fix pydantic dependency issues #3715 (@Netrvin)
Fix audiotools file path #3968 (@zxcd)
Add missing keywords for aishell3/vits-vc #3932 (@yinfan98)
Fix data traversal error caused by empty folders without *.npy files #3948 (@megemini)
Fix package dependency issues in opencopop svs1 #3889 (@enkilee)
Separate paddle.logsumexp #3897 (@zxcd)
Fix audiotools model save and load #3994 (@zxcd)
Fix TimeDomainSpecAugment import error #3919 (@megemini)
Fix print_arguments import error #3918 (@megemini)
Fix panns predict.py for pir json model path #3914 (@megemini)
Complete missing parameters in synthesis series scripts #3998 (@enkilee)
Fix tests/unit/tts/test_pwg.py #3974 (@co63oc)
Fix sinc api accuracy issue #4061 (@zxcd)

CI

Add server CI #3857 (@tianshuo78520a)
Add unit tests #3835, #3836 (@zxcd, @tianshuo78520a)
Close test_expand.py #3971 (@co63oc)
Close test_snapshot.py #3976 (@co63oc)

Acknowledgements

Special thanks to contributors including @wanx7130, @warrentdrew, @DrRyanHuang, @cchenhaifeng, @undefined-ux, @zxcd, @GreatV, @yinfan98, @Liyulingyue, @megemini, @SuiYunsy, @Netrvin, @enkilee, @tianshuo78520a, @guspan-tanadi, @co63oc, @Echo-Nie and others for their support.

New Contributors

@wanx7130 made their first contribution in #3875
@cchenhaifeng made their first contribution in #3988
@undefined-ux made their first contribution in #3837
@DrRyanHuang made their first contribution in #3900
@SuiYunsy made their first contribution in #3865
@Netrvin made their first contribution in #3715
@guspan-tanadi made their first contribution in #3958
@enkilee made their first contribution in #3889
@co63oc made their first contribution in #3971
@Echo-Nie made their first contribution in #4057

@jiamingkong

S2T

Add WavLM ASR-en, WavLM fine-tuning for ASR on LibriSpeech. #3242 by @jiamingkong
Add HuBERT ASR-en, HuBERT fine-tuning for ASR on LibriSpeech. #3088 by @Zth9730
Add Squeezeformer model. #2755 by @yeyupiaoling
Add AMP for U2 conformer. #3167 by @zxcd
Move dataset into paddlespeech.dataset. #3183 #3189 by @zh794390558
Fix example/aishell local/train.sh if condition bug. #3146 by @lemondy
Fix cli args to config. #3194 by @zh794390558
Fix scaler save, load, unscale_ blow, grad_clip. by @zxcd

T2S

Add SVS (Singing Voice Synthesis) examples with the Opencpop dataset, including DiffSinger, PWGAN (#3031 by @lym0302) and HiFiGAN (#3038 by @lym0302); synthesis quality is still being optimized.
Add SVS frontend. #3062 by @lym0302
Add TTS iSTFTNet (#3006 by @longRookie), TTS JETS (#3109 by @ljhzxc)
Starganv2: by @yt605155624
- Clean starganv2 vc model code and add docstring. #2987
- Add starganv2 vc trainer. #3143 #3182
- Add StarGANv2VC preprocess. #3163
- Fix losses of StarGAN v2 VC. #3184
Support for LITE: by @yt605155624
- Fix elementwise_floordiv's fill_constant. #3075
- Fix VITS lite infer. #3098
- Fix vits reduce_sum's input/output dtype. #3028
- Fix dtype diff of last expand_v2 op of VITS. #3041
- Fix input dtype of elementwise_mul op from bool to int64. #3054
Add XPU support for SpeedySpeech and FastSpeech2. #3502 #3514 by @USTCKAY
Fix some preprocess bugs. #3155 by @yt605155624
Fix bug of merge_yi function. #3786 by @mattheliu

Server

Add code-switch conformer_talcs support. #3230 by @Gsonovb
Add subtitle file (.srt format) generation example. #3123 by @twoDogy
Fix: add file read encoding. #3606 by @Coloryr

Install & Benchmark

Update paddle2onnx to newest install version. #3084 by @yt605155624
Update to py3.8, fix librosa==0.8.1 numpy==1.23.5 for paddleaudio. by @zh794390558
Fix transformation import error. #3779 by @kk-2000
Adapt view behavior change, fix KeyError. #3794 by @zxcd
Fix profiler, fix gpu_mem unit, add max_mem_reserved for benchmark. #3323 #3634 #3604 by @mmglove

Docs

Fix some typos. #3178 by @Yulv-git
Update svs_music_score.md. #3085 #3070 by @lym0302
Update quick_start.md. #3175 #3176 by @46319943
Add cli test readme. #3784 by @zxcd
Update bug-report-tts.md. #3120 by @yt605155624

Others

Fix 0-D tensor handling; with the upgrade to paddlepaddle==2.5, the problem of modifying 0-D tensors is resolved. #3214 by @zxcd, #3334 by @zh794390558
Add dtype param for arange API. #3302 by @zxcd
Fix a bug with Paddle develop: change view to reshape. #3633 by @luyao-cv
Fix progress bar unit. #3177 by @46319943
Remove unused dependency. #3097 by @lym0302

Acknowledgements

Special thanks to @jiamingkong @Zth9730 @yeyupiaoling @zxcd @zh794390558 @lemondy @lym0302 @longRookie @ljhzxc @yt605155624 @USTCKAY @mattheliu @Gsonovb @twoDogy @Coloryr @kk-2000 @mmglove @Yulv-git @46319943 @luyao-cv

New Contributors

@jiamingkong made their first contribution in #3242
@yeyupiaoling made their first contribution in #2755
@lemondy made their first contribution in #3146
@longRookie made their first contribution in #3006
@ljhzxc made their first contribution in #3109
@USTCKAY made their first contribution in #3502
@mattheliu made their first contribution in #3786
@Gsonovb made their first contribution in #3230
@twoDogy made their first contribution in #3123
@Coloryr made their first contribution in #3606
@kk-2000 made their first contribution in #3779
@Yulv-git made their first contribution in #3178
@46319943 made their first contribution in #3175
@luyao-cv made their first contribution in #3633

Full Changelog: r1.4.1...r1.4.2

@yt605155624

Others

Fix typeguard version. #3056 by @yt605155624

@zxcd

S2T

Add wav2vec2-zh finetune pipeline. #3012 #2916 by @zxcd
Fix some bugs in Whisper. #2900 #2825 by @zxcd
Add code-switch asr tal_cs recipe. #2816 #2796 by @zxcd

T2S

Add dygraph to static, PaddleInference, Paddle2ONNX and ONNXRuntime Infer for Cantonese TTS. #2990 by @JiehangXie
Add Cantonese test examples. #2937 by @JiehangXie
Add VITS inference pipeline. #3002 #2972 #2883 by @yt605155624
Rearrange encoder_infer param's order. #2983 by @443127316
Add male speaker and Chinese-English mix ONNXRuntime infer in CLI. #2945 by @lym0302
Add Cantonese TTS example. #2950 #2927 #2924 #2907 #2899 by @WongLaw
Fix PWGAN TIPC. #2882 by @yt605155624
Add a case in not_erhua. #2863 by @QuanZ9
Fix data prepare for PaddleSlim PTQ of TTS. #2862 by @yt605155624
Avoid using variable "attn_loss" before assignment. #2860 by @hopingZ
Add soft link for shell in example; add skip_copy_wave in norm stage of GANVocoders to save disk. #2851 by @yt605155624
Optimize the training of VITS. #2843 #2809 #2791 #2770 by @WongLaw
Add StarGANv2-VC model scripts and synthesize scripts. #2842 by @yt605155624
Add diffusion module for training diffsinger. #2868 #2832 by @HighCWu
Fix some Text Frontend bugs. #2831 by @yt605155624
For mixed Chinese and English speech synthesis, add SSML support for Chinese. #2830 by @jindongyi011039
Add mkldnn and trt config for TTS Inference. #2748 by @yt605155624
Fix dygraph to static for tacotron2. #2426 by @yt605155624

Server

Add static infer for multi-spk tts. #2779 by @lym0302

Engine

Add wfst decoder. #2886 by @SmileGoat
Add batch recognizer decode. #2866 by @SmileGoat
Add nnet prob cache && make 2 thread decode work. #2769 by @SmileGoat
Engine directory refactor. #2746 by @SmileGoat
Fix openfst download error. #2742 by @SmileGoat

Audio

Replace kaldi fbank with kaldi-native-fbank in paddleaudio. #2799 by @SmileGoat
Fix load paddleaudio fail. #2815 by @SmileGoat
Update paddleaudio readme. #2801 by @SmileGoat

Demos

Add TTS ARM Linux C++ Demo. #2991 by @SwimmingTiger
Add Cantonese TTS in CLI. #2977 by @WongLaw
Add ONNXRuntime infer for Cantonese TTS in CLI. #2990 by @JiehangXie

Docs

Add u2pp_wenetspeech_static_quant to released_model.md. #2973 by @zxcd
Remove redundant dependencies and Fix some bugs in setup.py. #2970 #2871 #2867 #2853 #2771 #2767 #2764 by @yt605155624

Others

Remove fluid API in ASR. #2944 #2859 #2852 by @zxcd
Add python simple adadelta optimizer. #2925 by @zxcd
Add encoding=utf-8 for text. #2896 by @zxcd #2865 by @yt605155624
Fix Tensor.numpy()[0] to float(Tensor) to adapt 0D. #2884 by @zhouwei25
Fix libsndfile.so not found in ubuntu18-cpu/Dockerfile. #2763 by @linkec
Fix AttributeError "module 'distutils' has no attribute 'ccompiler'" in setup.py in ctc_decoders. #2745 by @GreatV

New Contributors

@GreatV made their first contribution in #2745
@linkec made their first contribution in #2763
@cxumol made their first contribution in #2828
@jindongyi011039 made their first contribution in #2830
@QuanZ9 made their first contribution in #2863
@hopingZ made their first contribution in #2860
@zhouwei25 made their first contribution in #2884
@EscaticZheng made their first contribution in #2915
@chinobing made their first contribution in #2922
@lance6716 made their first contribution in #2924
@443127316 made their first contribution in #2983

Full Changelog: r1.3.0...r1.4.0

@zh794390558

Highlight

S2T

Support U2/U2++ Conformer dy2static, and U2/U2++ C++ High Performance Streaming ASR Deployment. @zh794390558
Add Wav2vec2ASR-en, wav2vec2.0 fine-tuning for ASR on LibriSpeech. @Zth9730
Add Whisper CLI and Demos, support multi language recognition and translation. @zxcd
Add Wav2vec2 CLI and Demos, support ASR and Feature Extraction. @Zth9730
Add whisper. #2640 #2704 by @zxcd
Fix gpu training hang. #2478 by @Zth9730
Support u2++ based cli and server. #2489 #2510 by @Zth9730
Add wav2vec2-en. #2518 #2527 #2637 by @Zth9730
Add wav2vec2-zh cli. #2697 by @Zth9730

T2S

Add seek for BytesIO. #2484 by @ZapBird
Add mix finetune. #2525 #2647 by @lym0302
Add streaming TTS fastdeploy serving. #2528 by @HexToString
Add SSML for Chinese Text Frontend. #2531 by @david-95
Add end-to-end Prosody Prediction pipeline (including using prosody labels in Acoustic Model). #2548 #2615 #2693 by @WongLaw
Add Adversarial Loss for Chinese English mixed TTS. #2588 by @lym0302
Fix frontend bugs. #2539 #2606 by @yt605155624
Add TN for English unit. #2629 by @WongLaw
Add male voice for TTS. #2660 by @lym0302
Add double byte char for zh normalization. #2661 by @david-95
Add TTS Paddle-Lite x86 inference. #2636 #2667 by @yt605155624
Add greek char and fix #2571. #2683 by @david-95
Add Slim for TTS. #2729 by @yt605155624

Audio

Move paddlespeech/audio to paddleaudio. #2706 by @SmileGoat

Demo

Add TTSAndroid demo. #2703 by @yt605155624

New Contributors

@ZapBird made their first contribution in #2484
@HexToString made their first contribution in #2528
@dahu1 made their first contribution in #2554
@kFoodie made their first contribution in #2664
@zxcd made their first contribution in #2640
@michael-skynorth made their first contribution in #2666
@heyudage made their first contribution in #2688

Full Changelog: r1.2.0...r1.3.0

@Zth9730

S2T

Fix conformer/transformer multi GPU training. #2327 #2334 #2336 #2372 by @Zth9730
Fix deepspeech2 decode_wav. #2351 by @Zth9730
Support BiTransformer decoder. #2415 by @Zth9730

T2S

Update VITS to support VITS and its voice cloning training on AISHELL-3. #2268 by @HighCWu
Add ERNIE-SAT synthesize_e2e. #2287 #2316 #2355 #2378 #2432 by @yt605155624
Specify the input data type of G2PW. #2288 by @kslz
Add TTS finetune example. #2297 #2385 #2418 #2430 by @lym0302
Fix Chinese English mixed TTS frontend. #2299 #2493 by @lym0302
Add words into polyphonic.yaml for g2pW. #2300 by @david-95
Update the quantifier unit in Text Normalization. #2308 by @pengzhendong
Fix Chinese frontend bugs. #2312 #2323 by @david-95
Add AISHELL-3 Voice Cloning with ECAPA-TDNN speaker encoder. #2359 #2429 by @yt605155624
Add pre-install doc for G2P and TN, update version of pypinyin. #2364 by @WongLaw
Add tools to compare two test results of G2P to show differences. #2367 by @david-95
Revise must_neural_tone_words. #2370 by @WongLaw
Add type-hint for g2pW. #2390 by @yt605155624
Replaced fixed path with path variable in MFA. #2416 by @WongLaw
Solve "unknown format: 3" for wavfile.write(). #2422 by @zhoupc2015

Text

Create preprocess.py for Punctuation Restoration. #2295 by @THUzyt21

Demo

Add Voice Cloning, TTS finetune, and ERNIE-SAT in speech_web. #2412 #2451 by @iftaken

Server

Add num_decoding_left_chunks in streaming_asr_server's config. #2337 by @THUzyt21
Remove unused spk_id in speech_server and streaming_tts_server; support Chinese English mixed TTS server engine. #2380 by @WongLaw

Doc

Add Chinese doc and language switcher for metaverse, style_fs2 and story_talker. #2357 by @WongLaw
Update API docs. #2406 by @yt605155624
Add finetune demos in readthedocs. #2411 by @yt605155624

Test

Add barrier for distributed training using multiple machines. #2309 #2311 by @sneaxiy
Fix prepare.sh for PWGAN TIPC. #2376 by @yuehuayingxueluo

Other

Format paddlespeech with pre-commit. #2331 by @yt605155624

Acknowledgements

Special thanks to @yt605155624 @lym0302 @THUzyt21 @iftaken @Zth9730 @zhoupc2015 @WongLaw @david-95 @pengzhendong @kslz @HighCWu @yuehuayingxueluo @sneaxiy @SmileGoat

New Contributors

@HighCWu made their first contribution in #2268
@pengzhendong made their first contribution in #2308
@Zth9730 made their first contribution in #2327
@WongLaw made their first contribution in #2357
@yuehuayingxueluo made their first contribution in #2376
@zhoupc2015 made their first contribution in #2422

Full Changelog: r1.1.0...r1.2.0

@jerryuhoo

S2T

Add wer tools. #1709
Optimize the attention cache; use 0-dim tensor for model export. #2124
Fix cnn cache dy2st shape. #2168

TTS

Fix random speaker embedding bug in voice clone. #1828 by @jerryuhoo
Add VITS model. #1855 #1957 #2040
Add kunlun support for speedyspeech. #1879 by @QingshuChen
Normalize wav max value to 1 in preprocess. #1887 by @jerryuhoo
Remove fluid dependence in TTS. #1940
Add onnx models for aishell3/ljspeech/vctk's tts3/voc1/voc5. #2068
Add TTS static/onnx models in pretrained_models.py. #2074
Add Ernie SAT model. #2052 #2117
Add Chinese English mixed TTS frontend. #2143
Add Chinese English mixed TTS example. #2234
Fix English text frontend bug. #2235 by @david-95
Add g2pW to Chinese frontend. #2230 by @BarryKCL
Fix text frontend bugs. #1912 #2250 #2254 #2255 #2272

Speechx

Add custom ASR script. #1946
Refactor frontend. #2003
Export DeepSpeech2 to ONNX. #2034
Refactor audio/data/feature cache. #1638
Frontend refactor. #1640
Fix nnet itf header. #1641
Refactor speech egs. #1707
Refactor egs and more egs for TLG wfst graph build. #1715
Speed up ngram building. #1729
Update speechx install doc. #1736
Fix nnet input and output name. #1740
Update wfst graph. #1742
Fix model params path name. #1750
Remove fluid tools for onnx export. #2116

Audio

Refactor paddleaudio to paddlespeech.audio. #2007
Add webdataset in paddlespeech.audio. #2062

Server

Remove extra logs. #2111 #2113
Change streaming tts servers' fs from 24k to models' fs. #2121
Fix bug in engine_warmup. #2171 by @Betterman-qs
Replace default vocoder in server with mb_melgan. #2214
Fix bug in streaming_asr_server with punctuation restoration. #2244
Rename time_s and time_ns to time_b and time_nb. #2133
Improve decoding accuracy. #2128

CLI

Add paddlespeech.resource module. #1917
Dynamic cli commands registration. #1959
Fix unnecessary download. #2103
Remove extra logs. #2084 #2085 #2107
Add Chinese English mixed TTS CLI. #2249
Add onnxruntime infer for CLI. #2222

Demo

Add speech web demo. #2039 #2080
Add kws cli and demo. #2063
Use paddle web for streaming asr. #2105
Add custom ASR script. #1946
More cli for speech demos. #2138

Doc

Add API doc. #2075
Format tts doc string for read the docs. #2115

Others

Fix CPU Dockerfile. #2172 by @BrightXiaoHan
Add PaddleSpeech Dockerfile for hard mode of installation. #2127 by @buchongyu2

Acknowledgements

Special thanks to @buchongyu2 @BrightXiaoHan @BarryKCL @Betterman-qs @david-95 @jerryuhoo @QingshuChen @iftaken @zh794390558 @Jackwaterveg @lym0302 @SmileGoat @yt605155624

New Contributors

@QingshuChen made their first contribution in #1879
@Zhangjingyu06 made their first contribution in #1951
@ryanrussell made their first contribution in #1976
@freeliuzc made their first contribution in #2044
@vpegasus made their first contribution in #2043
@dependabot made their first contribution in #2061
@raycool made their first contribution in #2109
@YDX-2147483647 made their first contribution in #2125
@chenkui164 made their first contribution in #2130
@0x45f made their first contribution in #2162
@Doubledongli made their first contribution in #2167
@Betterman-qs made their first contribution in #2171
@BrightXiaoHan made their first contribution in #2172
@THUzyt21 made their first contribution in #2202
@david-95 made their first contribution in #2235
@BarryKCL made their first contribution in #2230

Full Changelog: r1.0.0...r1.1.0

@KPatr1ck

Highlight

Release PP-ASR: Streaming ASR with timestamp and punctuation restoration, uses WenetSpeech Streaming Conformer and DeepSpeech2 ASR model.
Release PP-TTS: Streaming TTS system for industrial application.
Release PP-VPR: Industrial Voiceprint Recognition system and ECAPA-TDNN model.
Custom ASR: transportation reimbursement application demo
Support MDTC KWS model

More

ASR

DeepSpeech2 streaming model aishell cer 6.66%
DeepSpeech2 streaming model wenetspeech cer: 15.2% (test_net, w/o LM), 24.17% (test_meeting, w/o LM), 5.3% (aishell, w/ LM)
Conformer aishell cer 4.64%
Conformer streaming model aishell cer 5.44%
Conformer streaming model wenetspeech cer: 11.0% (test_net), 18.79% (test_meeting)
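The figures above are character error rates (CER). As a reminder of how such a number is usually obtained, here is a minimal sketch that divides the character-level edit distance between hypothesis and reference by the reference length. The function names are hypothetical, and PaddleSpeech's own scoring tools may normalise text (punctuation, spacing, casing) differently before scoring.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference symbols
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, a six-character reference with one substituted character gives a CER of 1/6, or about 16.7%. Note that CER can exceed 100% when the hypothesis contains many insertions.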

Speechx

[SpeechX] DeepSpeech2 streaming with WFST in streaming asr example
[SpeechX] Add websocket example
[SpeechX] Custom ASR: transportation reimbursement application demo

KWS

[KWS] Add kws example on HeySnips dataset. by @KPatr1ck in #1558
[KWS] Update KWS example. by @KPatr1ck in #1783

Audio

[Audio] rename paddleaudio to audio, since it conflicts with the pkg name by @zh794390558 in #1758
[Audio] Fix mcd issue. by @KPatr1ck in #1658
[Audio] Remove mcd. by @KPatr1ck in #1659
[Audio] Add VoxCeleb dataset for speaker recognition.
[Audio] Add HeySnips dataset for keyword spotting.

What's Changed

[R1.0][asr][server]add vector server by @honei in #1845
[R1.0][asr][server]join streaming asr and punc server by @honei in #1846
[R1.0]asr streaming server add time stamp by @honei in #1850
[R1.0][tts][server] update readme by @lym0302 in #1852
[R1.0] update cli by @Jackwaterveg in #1854
[r1.0] update version to r1.0.0 by @zh794390558 in #1857
[R1.0] Add doc for wenetspeech model (ds2 online, conformer online) by @Jackwaterveg in #1862
[R1.0][server] improve server code by @lym0302 in #1866
[R1.0][asr][server]update the streaming asr readme by @honei in #1871
[R1.0] Update released model info (Wenetspeech ds2 online, conformer online) by @Jackwaterveg in #1869
[R1.0]fix server doc and decode_method by @Jackwaterveg in #1889
[speechx] add custom_streaming_asr @SmileGoat #1891
[speechx] speedup ngram building @zh794390558 #1729
[speechx] refactor egs and more egs for TLG wfst graph build @zh794390558 #1715
[speechx]add aishell test script & json parser & no db norm linear feature & json2kaldi type cmvn @SmileGoat #1676
[speechx] Add websocket & make it work @SmileGoat #1720
[speechx] Frontend refactor @SmileGoat #1640
[Speechx] add tlg decoder @SmileGoat #1599

Full Changelog: r1.0.0a...r1.0.0

@zh794390558

Highlight

Release Streaming ASR and Streaming TTS system for industrial application.
Support KWS model
Deepspeech2 streaming model aishell cer 6.66%
Conformer aishell cer 4.64%
Conformer streaming model aishell cer 5.44%
SpeechX Deepspeech2 streaming with WFST

What's Changed

[speechx] refactor audio/data/feature cache by @zh794390558 in #1638
[speechx] Frontend refactor by @zh794390558 in #1640
[speechx] fix nnet itf header by @zh794390558 in #1641
[TTS]add license and reference for some models by @yt605155624 in #1642
[Doc] supplement note by @Jackwaterveg in #1643
[vec][search] update search demo README by @qingen in #1644
[speechx]refactor linear feature:unify vector & remove redundant function & add remained_wav cache shift wav by @SmileGoat in #1649
[Audio] Fix mcd issue. by @KPatr1ck in #1658
[Audio] Remove mcd. by @KPatr1ck in #1659
[vec]update the speaker verification model by @honei in #1663
[ASR] update ds2 online model by @Jackwaterveg in #1668
[TTS]fix preprocess bug, test=tts by @yt605155624 in #1660
update README, test=doc by @iftaken in #1672
[Punc] Update RESULTS.md. by @KPatr1ck in #1675
[CLI] update ds2 online model in cli by @Jackwaterveg in #1674
[CLI] ASR: Add duration limitation for asr by @Jackwaterveg in #1666
[vec]add speaker verification score method by @honei in #1646
[TTS]add onnx inference for fastspeech2 + hifigan/mb_melgan by @yt605155624 in #1665
[doc]update readme by @yt605155624 in #1680
[WebSocket] fixed online model md5 error , test=doc by @WilliamZhang06 in #1682
[speechx]add aishell test script & json parser & no db norm linear feature & json2kaldi type cmvn by @SmileGoat in #1676
[server] add stream tts server by @lym0302 in #1652
[speechx]remove mutable in audio_cache by @SmileGoat in #1687
[Doc] update readme for aishell/asr0 by @Jackwaterveg in #1677
[vec] add speaker diarization pipeline by @ccrrong in #1651
[vec]voxceleb convert dataset format to paddlespeech by @honei in #1630
[Speechx] add tlg decoder by @SmileGoat in #1599
[vec]add vector necessary note, test=doc by @honei in #1690
Revert "[WebSocket] fixed online model md5 error , test=doc" by @zh794390558 in #1691
[WebSocket] added online web client, test=doc by @WilliamZhang06 in #1692
Fix misspelling of the word "speech" in the example/aishell directory by @buchongyu2 in #1694
Fix misspelling of the word "hack" by @buchongyu2 in #1697
[TTS]change NLC to NCL in speedyspeech, test=tts by @yt605155624 in #1693
[doc]fix typo, test=doc by @yt605155624 in #1698
[doc]add pwgan onnx model, test=doc by @yt605155624 in #1700
[WebSocket] added online asr doc and online asr command line, test=doc by @WilliamZhang06 in #1701
[vec][server] vpr demo support by @qingen in #1696
[speechx] refactor speech egs by @zh794390558 in #1707
[asr]add wer tools by @zh794390558 in #1709
[asr][websocket]fix the ws send bug, cache buffer, text=doc by @honei in #1710
[TTS]add fastspeech2 cnndecoder onnx model by @yt605155624 in #1712
[speechx] refactor egs and more egs for TLG wfst graph build by @zh794390558 in #1715
[vec][score] add plda model by @qingen in #1681
[CLI]update cli, test=doc by @yt605155624 in #1716
[server] add streaming am infer by @lym0302 in #1713
[speechx] Add websocket & make it work by @SmileGoat in #1720
[asr][websocket] add asr conformer websocket server by @honei in #1704
[vec][loss] add NCE Loss from RNNLM by @qingen in #1719
[vec][loss] add FocalLoss to deal with class imbalances by @qingen in #1722
[TTS]restructure syn_utils.py, test=tts by @yt605155624 in #1723
[TTS]add paddle device set for ort and inference by @yt605155624 in #1727
[vec] add GRL to domain adaptation by @qingen in #1725
[speechx] speedup ngram building by @zh794390558 in #1729
[asr] Add new cer tools by @Jackwaterveg in #1673
[speechx]add websocket lib by @SmileGoat in #1732
[speechx]update speechx install doc by @zh794390558 in #1736
[Doc] perfect the packing scripts by @Jackwaterveg in #1735
[Doc]renew the released mode by @Jackwaterveg in #1739
[asr][websocket]add streaming asr demo by @honei in #1737
[speechx] fix nnet input and output name by @zh794390558 in #1740
[ASR] remove redundant log by @Jackwaterveg in #1741
[speechx] update wfst graph by @zh794390558 in #1742
[speechx] Add recognizer_test_main script by @SmileGoat in #1743
[vec][doc]update the voxceleb readme.md, test=doc by @honei in #1744
[ASR] fix CER tools by @Jackwaterveg in #1747
[Doc] Fix release_model info by @Jackwaterveg in #1746
[Doc] Update released model info by @Jackwaterveg in #1748
Update released model info by @Jackwaterveg in #1749
[speechx] fix model params path name by @zh794390558 in #1750
[speechx] fix linear-spectrogram-wo-db-norm-ol read feature issue by @SmileGoat in #1751
[TTS]fix wavernn white noise bug for paddle develop(2.3) by @yt605155624 in #1752
[server] add onnx tts engine by @lym0302 in #1733
[TTS]Update paddle2onnx by @yt605155624 in #1754
[Setup] to r1.0.0a by @Jackwaterveg in #1759
[audio] rename paddleaudio to audio, since it conflicts with the pkg name by @zh794390558 in #1758
[speechx] to_float32, fix shell script by @zh794390558 in #1757
[vec] bug fix to adapt VUE by @qingen in #1760
[asr][websocket]fix the streaming asr server bug, server client by @honei in #1761
[speechx] fbank and mfcc by @zh794390558 in #1765
format code by @zh794390558 in #1764
[CLI] Add conformer_aishell, conformer_online_aishell by @Jackwaterveg in #1767
[speechx]make cmvn global in run.sh by @SmileGoat in #1768
[ASR] ds2: add log_interval and fix lr problem when resume training by @Jackwaterveg in #1766
[speechx] set nnet param by flags by @zh794390558 in #1769
[server] add streaming tts demos by @lym0302...

@jerryuhoo

S2T

Replace kaldi_fbank with paddleaudio. #1612
Support CTC decoder online. #821 #1626
Improve accuracy of Conformer. Support using Kaiming uniform as default initialization. #1577

TTS

Add SpeedySpeech multi-speaker support for synthesize_e2e.py. #1370 by @jerryuhoo
Add WaveRNN for CSMSC dataset. #1379
Add Tacotron2 for CSMSC / LJSpeech datasets. #1314 / #1416
Add GE2E Tacotron2 Voice Cloning for AISHELL3 dataset. #1419
Update text frontend. #1506
Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets. #1549 / #1581 / #1587
Add NPU support for TransformerTTS. #1593 by @windstamp
Add CNN Decoder for Streaming Fastspeech2. #1634

Audio

Add paddleaudio.compliance modules that offer audio feature APIs aligned with Kaldi and Librosa. #1518
Unittest and benchmark for audio feature APIs. #1548
[Audio] - [audio] refactor audio arch #1494 by @zh794390558
[Audio] - [audio] dtw metric #1493 by @zh794390558
[Audio] - [audio] fix compliance bug #1597 by @zh794390558

Deployment

[Deployment] - [speechx] high performance inference of speech task #1496 by @SmileGoat @zh794390558
[Deployment] - [Speechx]fix normalizer bug #1600 #1621 #1619 #1633 #1635 by @SmileGoat
[Deployment] - [speechx] refactor speechx #1631 #1616 #1576 #1572 #1541 by @zh794390558
[Deployment] - [speechx] simplify cmake compiler #1538 #1536 #1535 by @zh794390558

server

[server] - [websocket] added online asr engine #1627 by @WilliamZhang06
[server] - [server] added engine type and asr inference #1475 by @WilliamZhang06
[server] - [Server] added asr engine #1413 by @WilliamZhang06
[server] - [Server] added engine factory and config #1399 by @WilliamZhang06
[server] - [server] added engine framework #1383 by @WilliamZhang06
[server] - [server] update readme #1604 by @lym0302
[server] - [server] add server cls #1554 by @lym0302
[server] - [server] add paddlespeech_server stats #1510 by @lym0302
[server] - [server] add cli #1466 by @lym0302
[server] - [server] add tts postprocess #1411 by @lym0302
[server] - [server] tts server #1386 by @lym0302

vector

[vector] - [vector] ecapa-tdnn on voxceleb #1523 by @honei

CLI

Batch input supported. #1460
TTS: Add WaveRNN for CSMSC dataset.
TTS: Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets.
Vector: add speaker verification demo and doc #1605 by @honei

Demo

[Demo] - [vec][search] update client image url #1628 by @qingen
[Demo] - [server] add server demo #1480 by @lym0302
[Demo] - [vec][search] add audio similarity search #1609 by @qingen

Acknowledgements

Special thanks to @WilliamZhang06 @yt605155624 @windstamp @Jackwaterveg @honei @SmileGoat @KPatr1ck @zh794390558 @lym0302 @qingen

Releases: PaddlePaddle/PaddleSpeech

PaddleSpeech r1.5.0
PaddleSpeech r1.4.2
PaddleSpeech r1.4.1
PaddleSpeech r1.4.0
PaddleSpeech r1.3.0
PaddleSpeech r1.2.0
PaddleSpeech r1.1.0
PaddleSpeech r1.0.0