README.md

# Han

Utils for processing chinese.

This module provides three core functionalities related to chinese:

1. translate: translate between tranditional chinese to simplified chinese based on [Wikipedia's conversion data]("https://raw.githubusercontent.com/wikimedia/mediawiki/master/languages/data/ZhConversion.php").
2. pinyin: translate chinese words to pinyin. It is based on the data from [janx/ruby-pinyin](https://github.com/janx/ruby-pinyin).
3. slugify: slugify chinese words.

## Installation

First, add Han to your `mix.exs` dependencies:

```elixir
def deps do
  [{:han, "~> 0.3.0"}]
end
```

Then, run `$ mix deps.get` to get the dependencies.

> This module will compile over 133, 000 functions by default (compile all the 2-char phrases and 1-char chanracters). Due to this, compilation time is around 30 minutes. So be patient! You can set environment variable `MAX_WORD_LEN` to tune the compilation:

```bash
# This will compile around 40, 000 functions
$ MAX_WORD_LEN=1 mix compile
```

## Update database

This module has a built-in mix task - update database:

```bash
$ mix han.update_database
```

The downloaded file will be placed into `priv/`.

## Usage

Han is very easy to use, as follows:

### Translate

```sh
iex> Han.translate("中国")
"中國"

iex> Han.translate("中国", :simplified)
"中國"

iex> Han.translate("中國", :traditional)
"中国"
```

### Pinyin

```sh
iex> Han.pinyin("中国")
"zhōng guó"

iex> Han.pinyin("中国", :simplified)
"zhōng guó"

iex> Han.pinyin("中國", :traditional)
"zhōng guó"
```

### Slugify

```sh
iex> Han.slugify("中国")
"zhong-guo"

iex> Han.slugify("中國", :traditional)
"zhong-guo"

iex> Han.slugify(" *& 46 848 中 ----- 国")
"46-848-zhong-guo"

iex> Han.slugify("关于 Elixir 的 HTML5 页面")
"guan-yu-elixir-de-html5-ye-mian"
```

## Performance

```text
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Number of Available Cores: 12
Available memory: 16 GB
Elixir 1.8.1
Erlang 21.3.3

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 56 s

Name                                                  ips        average  deviation         median         99th %
translate a simplified chinese character        3242.74 K        0.31 μs  ±8355.96%           0 μs           1 μs
translate a traditional chinese character       3062.79 K        0.33 μs ±11885.74%           0 μs           1 μs
pinyin a sentence in simplified chinese           82.29 K       12.15 μs    ±58.57%          12 μs          22 μs
translate a sentence in simplified chinese        77.82 K       12.85 μs    ±38.33%          12 μs          29 μs
translate a sentence in traditional chinese       77.69 K       12.87 μs    ±51.61%          13 μs          18 μs
pinyin a sentence in traditional chinese          36.19 K       27.63 μs    ±14.74%          27 μs          36 μs
slugify a sentence in simplified chinese           5.59 K      178.85 μs     ±9.50%         176 μs         256 μs
slugify a sentence in traditional chinese          5.09 K      196.59 μs     ±6.97%         193 μs         272 μs
```

## License

MIT