又双叒叕: Stacked Chinese Characters
In [ ]:
from google.colab import drive
drive.mount('/content/drive')
work_dir = '/content/drive/MyDrive/Colab Notebooks/colab_data/叠字'
Mounted at /content/drive
In [ ]:
import pandas as pd
from functools import reduce
In [ ]:
# Read 2.txt, 3.txt and 4.txt (tab-separated) into one DataFrame each
data = [pd.read_table(f'{work_dir}/{i}.txt') for i in range(2, 5)]
In [ ]:
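# '单' holds the base character; '双', '三' and '四' hold its
# two-, three- and four-fold stacked forms.
def merge_out(x, y):
    # Outer join on the base character, so a character missing from one table is kept
    return pd.merge(x, y, on='单', how='outer')

# Fold the three tables into a single wide table
jointed = reduce(merge_out, data)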
Characters with several stacking forms
Characters with double, triple, and quadruple stacks
In [ ]:
jointed.dropna().head()
Out[ ]:
| | 单 | 双 | 三 | 四 |
|---|---|---|---|---|
| 3 | 一 | 二 | 三 | 亖 |
| 20 | 人 | 仌 | 众 | 𠈌 |
| 21 | 人 | 仌 | 㐺 | 𠈌 |
| 22 | 人 | 从 | 众 | 𠈌 |
| 23 | 人 | 从 | 㐺 | 𠈌 |
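The four 人 rows follow from the outer join: when a base character has several forms in more than one table, `pd.merge` pairs every combination. A minimal sketch of that effect, using hypothetical toy tables that stand in for `2.txt`, `3.txt`, and `4.txt` (the real file contents are not shown here, so the rows below are assumptions for illustration):

```python
from functools import reduce

import pandas as pd

# Hypothetical excerpts standing in for 2.txt, 3.txt and 4.txt
double = pd.DataFrame({'单': ['人', '人', '女'], '双': ['仌', '从', '奻']})
triple = pd.DataFrame({'单': ['人', '人', '女'], '三': ['众', '㐺', '姦']})
quad = pd.DataFrame({'单': ['人'], '四': ['𠈌']})

def merge_out(x, y):
    # Same outer join on the base character as in the notebook
    return pd.merge(x, y, on='单', how='outer')

toy = reduce(merge_out, [double, triple, quad])

# 人 has 2 double forms x 2 triple forms x 1 quadruple form = 4 rows;
# 女 has no quadruple form here, so dropna() removes its row.
full = toy.dropna()
print(full)
```

Because of this pairing, row counts in `jointed` reflect combinations of forms rather than distinct characters, which is why the later cells call `drop_duplicates()`.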
Base characters that have double, triple, and quadruple stacks
In [ ]:
def single_word(table):
    # Keep rows with no missing form, then join the distinct base characters into one string
    return ''.join(table.dropna()['单'].drop_duplicates().tolist())
In [ ]:
single_word(jointed)
Out[ ]:
'一人厶又口土屮日木朿果水火牛田石老言車金風魚龍'
金木水火土 (metal, wood, water, fire, earth)
- 金 鍂 鑫 𨰻
- 木 林 森 𣛧 𣡽
- 土 圭 垚 㙓
- 水 沝 淼 㵘
- 火 炏 焱 燚
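A list like the one above can be pulled out of the wide table by filtering on the base character. The sketch below is one possible way to do it, not necessarily how the list was produced; `toy` is a hypothetical stand-in for `jointed`, filled with the rows from the list plus one 人 row from the earlier output.

```python
import pandas as pd

# Rows copied from the list above; 木 has two four-fold forms, so it takes two rows.
rows = [
    ('金', '鍂', '鑫', '𨰻'),
    ('木', '林', '森', '𣛧'),
    ('木', '林', '森', '𣡽'),
    ('土', '圭', '垚', '㙓'),
    ('水', '沝', '淼', '㵘'),
    ('火', '炏', '焱', '燚'),
    ('人', '从', '众', '𠈌'),  # not one of the five elements; filtered out below
]
toy = pd.DataFrame(rows, columns=['单', '双', '三', '四'])

# Keep only the five elements
five = toy[toy['单'].isin(list('金木水火土'))]

lines = []
for base, group in five.groupby('单', sort=False):
    forms = [base]
    for column in ['双', '三', '四']:
        # Collapse repeated forms caused by the row-per-combination layout
        forms.extend(group[column].drop_duplicates())
    lines.append(' '.join(forms))

print('\n'.join(lines))
```

`sort=False` keeps the groups in order of first appearance, so the output follows the row order of the table rather than code-point order.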
Base characters with only double and triple stacks
In [ ]:
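# Base characters that have both a double and a triple form
S3 = set(single_word(jointed[['单', '双', '三']]))
# Base characters that have double, triple, and quadruple forms
S4 = set(single_word(jointed))
# Difference: double and triple forms exist, but no quadruple form (set order is arbitrary)
''.join(S3 - S4)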
Out[ ]:
'香生ㄑ白飞虫心耳女犬舌太弓目㔾力大欠山瓜面户子吉馬手毛隹'
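The characters in the output above come out in arbitrary order because Python sets are unordered, and string hashing is randomized between interpreter runs. If a stable order matters, one option is to filter the first string against a set built from the second. A minimal sketch; `ordered_difference` is a hypothetical helper, and the two input strings are made-up stand-ins for the `single_word` results:

```python
def ordered_difference(left, right):
    """Return the characters of `left` that do not occur in `right`, in `left`'s order."""
    exclude = set(right)
    return ''.join(ch for ch in left if ch not in exclude)

# Hypothetical stand-ins for single_word(jointed[['单', '双', '三']]) and single_word(jointed)
with_2_3 = '一人女木'
with_2_3_4 = '一人木'

result = ordered_difference(with_2_3, with_2_3_4)
print(result)  # 女
```

In the notebook this would be called as `ordered_difference(single_word(jointed[['单', '双', '三']]), single_word(jointed))`, which keeps the row order of `jointed`.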
Double-stacked characters
In [ ]:
jointed[['单','双']].drop_duplicates().dropna().head()
Out[ ]:
| | 单 | 双 |
|---|---|---|
| 0 | ㄑ | 巜 |
| 1 | 㔾 | 𠨎 |
| 2 | 㣇 | 㣈 |
| 3 | 一 | 二 |
| 7 | 串 | 丳 |
Triple-stacked characters
In [ ]:
jointed[['单','三']].drop_duplicates().dropna().head()
Out[ ]:
| | 单 | 三 |
|---|---|---|
| 0 | ㄑ | 巛 |
| 1 | 㔾 | 𠨕 |
| 3 | 一 | 三 |
| 4 | 七 | 㐂 |
| 6 | 个 | 𠁭 |
Quadruple-stacked characters
In [ ]:
jointed[['单','四']].drop_duplicates().dropna().head()
Out[ ]:
| | 单 | 四 |
|---|---|---|
| 3 | 一 | 亖 |
| 5 | 且 | 𠁠 |
| 8 | 丶 | 灬 |
| 10 | 乂 | 㸚 |
| 20 | 人 | 𠈌 |
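The double, triple, and quadruple queries differ only in the column they select, so they could be folded into one helper. A small sketch under that assumption; `stacked_pairs` is a hypothetical name, and `toy` is a made-up stand-in for `jointed`:

```python
import pandas as pd

def stacked_pairs(table, column):
    """Distinct (base character, stacked form) pairs for one stacking column."""
    return table[['单', column]].drop_duplicates().dropna()

# Hypothetical stand-in for the merged table
toy = pd.DataFrame({
    '单': ['人', '人', '人', '人', '女'],
    '双': ['仌', '仌', '从', '从', '奻'],
    '三': ['众', '㐺', '众', '㐺', '姦'],
    '四': ['𠈌', '𠈌', '𠈌', '𠈌', None],  # 女 has no quadruple form in this toy table
})

# Number of distinct pairs per stacking type
counts = {column: len(stacked_pairs(toy, column)) for column in ['双', '三', '四']}
print(counts)
```

Calling `stacked_pairs(jointed, '四').head()` would then reproduce the cell above.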