dalinvip/corpus_process_script

Latest commit: bc5b22e · Jan 22, 2019 (22 commits)

Introduction

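This repository collects scripts for processing Chinese and English corpora. The scripts may be written in any programming language, and each one comes with a detailed README.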

Script Lists

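  1. Traditional Chinese to Simplified Chinese conversion
  2. Wikipedia data processing
  3. Single-character (unigram) feature extraction
  4. Two-character (bigram) feature extraction
  5. Chinese character stroke information extraction
  6. Removal of non-Chinese characters
  7. Conversion of Chinese money expressions to numeric amounts
  8. Full-width/half-width character conversion
  9. Batch conversion of Python 2 code to Python 3
  10. NER tag conversion (BIO, BMESO)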
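
To give a feel for what a few of the listed scripts do, here is a minimal Python sketch of items 6, 8 and 10. It is not taken from the repository: the function names `full_to_half`, `remove_non_chinese` and `bio_to_bmeso` are hypothetical, "Chinese character" is assumed to mean the CJK Unified Ideographs block (U+4E00 to U+9FFF), and the input tags are assumed to be well-formed BIO. The actual scripts may handle more cases.

```python
import re

# Assumption: "Chinese" means U+4E00..U+9FFF (CJK Unified Ideographs).
# The repository's own script may use a wider or narrower range.
_NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")


def full_to_half(text):
    """Map full-width forms (U+FF01..U+FF5E) and the ideographic space
    (U+3000) to their half-width ASCII equivalents."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:            # ideographic space -> ASCII space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:
            code -= 0xFEE0            # fixed offset between the two blocks
        out.append(chr(code))
    return "".join(out)


def remove_non_chinese(text):
    """Drop every character outside the assumed Chinese range."""
    return _NON_CHINESE.sub("", text)


def bio_to_bmeso(tags):
    """Convert a well-formed BIO tag sequence to BMESO.

    A span of length one becomes S-, otherwise B- (first), M- (middle)
    and E- (last). O tags are copied unchanged.
    """
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == "I-" + label   # does the span go on?
        if prefix == "B":
            out.append(("B-" if continues else "S-") + label)
        else:                             # prefix == "I"
            out.append(("M-" if continues else "E-") + label)
    return out
```

For example, `bio_to_bmeso(["B-PER", "I-PER", "I-PER", "O", "B-LOC"])` yields `["B-PER", "M-PER", "E-PER", "O", "S-LOC"]`. Looking one tag ahead is enough to decide between B/S and M/E, so the conversion runs in a single pass.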

Question

  • If you have any questions, you can open an issue or email bamtercelboo@{gmail.com, 163.com}.

  • If you have any good suggestions, you can open a PR or email me.

About

Chinese and English corpus processing scripts in Python, C++, and Java.
