又双叒叕: Stacked Forms of Chinese Characters¶
In [ ]:
from google.colab import drive
drive.mount('/content/drive')
work_dir = '/content/drive/MyDrive/Colab Notebooks/colab_data/叠字'
Mounted at /content/drive
In [ ]:
import pandas as pd
from functools import reduce
In [ ]:
# Read 2.txt, 3.txt, and 4.txt (tab-separated) into one DataFrame each
data = [pd.read_table(f'{work_dir}/{i}.txt') for i in range(2, 5)]
In [ ]:
def merge_out(x, y):
    # Outer join on the base character ('单') so that no table's rows are lost
    return pd.merge(x, y, on='单', how='outer')

jointed = reduce(merge_out, data)
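To see what the chained outer merge does, here is a minimal sketch on toy data. The three small DataFrames are hypothetical stand-ins for the contents of `2.txt`, `3.txt`, and `4.txt` (the real files are not shown on this page); their values are taken from the outputs further down. A base character with two doubled forms and two tripled forms yields 2 × 2 = 4 rows after merging, which is why 人 appears four times in the merged table.

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for the three input tables (not the real files).
double = pd.DataFrame({'单': ['人', '人', '串'], '双': ['仌', '从', '丳']})
triple = pd.DataFrame({'单': ['人', '人', '七'], '三': ['众', '㐺', '㐂']})
quad = pd.DataFrame({'单': ['人', '且'], '四': ['𠈌', '𠁠']})


def merge_out(x, y):
    # Outer join on the base character keeps rows that exist in only one table.
    return pd.merge(x, y, on='单', how='outer')


toy = reduce(merge_out, [double, triple, quad])

# Rows without any missing value are the characters that have all three forms.
full = toy.dropna()
print(toy)
print(full)
```

In this toy table only 人 survives `dropna()`, while 串, 七, and 且 each keep a row with missing values in the columns where they have no stacked form.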
Characters with multiple stacked forms¶
Doubled, tripled, and quadrupled forms all present¶
In [ ]:
jointed.dropna().head()
Out[ ]:
| | 单 | 双 | 三 | 四 |
|---|---|---|---|---|
| 3 | 一 | 二 | 三 | 亖 |
| 20 | 人 | 仌 | 众 | 𠈌 |
| 21 | 人 | 仌 | 㐺 | 𠈌 |
| 22 | 人 | 从 | 众 | 𠈌 |
| 23 | 人 | 从 | 㐺 | 𠈌 |
Base characters with doubled, tripled, and quadrupled forms¶
In [ ]:
def single_word(table):
    # Drop rows with any missing form, then join the unique base characters into one string
    return ''.join(table.dropna()['单'].drop_duplicates().tolist())
In [ ]:
single_word(jointed)
Out[ ]:
'一人厶又口土屮日木朿果水火牛田石老言車金風魚龍'
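The behaviour of `single_word` can be checked on a small table. The sketch below assumes a hypothetical three-column DataFrame: 七 has a tripled form but no doubled form, so its row is dropped, and the repeated 人 rows collapse into a single character.

```python
import pandas as pd


def single_word(table):
    # Drop rows with any missing form, then join the unique base characters.
    return ''.join(table.dropna()['单'].drop_duplicates().tolist())


# Hypothetical toy table; None marks a missing stacked form.
toy = pd.DataFrame({
    '单': ['一', '人', '人', '七'],
    '双': ['二', '仌', '从', None],
    '三': ['三', '众', '众', '㐂'],
})

result = single_word(toy)
print(result)  # 一人
```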
金木水火土 (metal, wood, water, fire, earth)¶
- 金 鍂 鑫 𨰻
- 木 林 森 𣛧 𣡽
- 土 圭 垚 㙓
- 水 沝 淼 㵘
- 火 炏 焱 燚
Base characters with only doubled and tripled forms¶
In [ ]:
S3=set(single_word(jointed[['单','双','三']]))
S4=set(single_word(jointed))
''.join(S3-S4)
Out[ ]:
'香生ㄑ白飞虫心耳女犬舌太弓目㔾力大欠山瓜面户子吉馬手毛隹'
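Because a Python `set` has no defined order, and string hashing is randomized between interpreter runs by default, the order of the characters in the output above can change from run to run. A minimal sketch of an order-preserving variant, using short hypothetical input strings in place of the real `single_word` results:

```python
# Hypothetical stand-ins for the two single_word() results.
s3 = '一人女山木'  # base characters with doubled and tripled forms
s4 = '一人木'      # base characters that also have a quadrupled form

# Filter s3 in its original order instead of subtracting sets.
s4_set = set(s4)
only_23 = ''.join(ch for ch in s3 if ch not in s4_set)
print(only_23)  # 女山
```

The membership test still uses a set for speed, but the result follows the order of `s3`, so repeated runs print the same string.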
Doubled characters¶
In [ ]:
jointed[['单','双']].drop_duplicates().dropna().head()
Out[ ]:
| | 单 | 双 |
|---|---|---|
| 0 | ㄑ | 巜 |
| 1 | 㔾 | 𠨎 |
| 2 | 㣇 | 㣈 |
| 3 | 一 | 二 |
| 7 | 串 | 丳 |
Tripled characters¶
In [ ]:
jointed[['单','三']].drop_duplicates().dropna().head()
Out[ ]:
| | 单 | 三 |
|---|---|---|
| 0 | ㄑ | 巛 |
| 1 | 㔾 | 𠨕 |
| 3 | 一 | 三 |
| 4 | 七 | 㐂 |
| 6 | 个 | 𠁭 |
Quadrupled characters¶
In [ ]:
jointed[['单','四']].drop_duplicates().dropna().head()
Out[ ]:
| | 单 | 四 |
|---|---|---|
| 3 | 一 | 亖 |
| 5 | 且 | 𠁠 |
| 8 | 丶 | 灬 |
| 10 | 乂 | 㸚 |
| 20 | 人 | 𠈌 |
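The three queries above differ only in the column they select, so they could be folded into one small helper. The function name `stacked` and the toy table below are hypothetical, used only to illustrate the pattern; the rows mirror the shape of the merged table, where 人 is repeated once per combination of forms and 且 has only a quadrupled form.

```python
import pandas as pd


def stacked(table, column):
    # Hypothetical helper: unique (base, stacked) pairs for one stacking level.
    return table[['单', column]].drop_duplicates().dropna()


# Hypothetical toy version of the merged table; None marks a missing form.
toy = pd.DataFrame({
    '单': ['人', '人', '人', '人', '且'],
    '双': ['仌', '仌', '从', '从', None],
    '三': ['众', '㐺', '众', '㐺', None],
    '四': ['𠈌', '𠈌', '𠈌', '𠈌', '𠁠'],
})

pairs = {col: stacked(toy, col) for col in ['双', '三', '四']}
for col, frame in pairs.items():
    print(col, len(frame))
```

Selecting the two columns before `drop_duplicates()` is what removes the repetition introduced by the merge: the four 人 rows reduce to two doubled pairs, two tripled pairs, and one quadrupled pair.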