2018-07-17

02. 「パトカー」＋「タクシー」＝「パタトクカシーー」

Natural Language Processing, 100 Knocks, Python

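Introduction

言語処理100本ノック 2015 (Language Processing 100 Knocks 2015)

To study Python, I am working through 言語処理100本ノック 2015, a problem set published by Professor Okazaki of Tokyo Institute of Technology.

To understand each problem more deeply, I also write up alternative solutions and notes on the libraries I used.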

Environment

Python 3.6

OS: macOS
lint: pep8

Problem

02. 「パトカー」＋「タクシー」＝「パタトクカシーー」
Obtain the string "パタトクカシーー" by concatenating the characters of "パトカー" and "タクシー" alternately, starting from the first character.

Solution: for loop

word_p = "パトカー"
word_t = "タクシー"

result = ""
for i in range(len(word_p)):
    result += word_p[i] + word_t[i]
print(result)

Output

パタトクカシーー

Explanation

A for loop takes the characters out one at a time and joins them. Strings can be concatenated with +, and += appends to the existing string.
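Because strings are immutable, each += conceptually builds a new string. As a minimal sketch (the name `pieces` is mine, not part of the original solution), the same index-based loop can collect the pairs in a list and join them once at the end:

```python
word_p = "パトカー"
word_t = "タクシー"

# Collect each two-character pair, then join once at the end.
pieces = []
for i in range(len(word_p)):
    pieces.append(word_p[i] + word_t[i])

result = "".join(pieces)
print(result)  # パタトクカシーー
```

Like the original loop, this assumes word_t is at least as long as word_p; otherwise word_t[i] raises IndexError.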

Alternative solution: zip()

word_p = "パトカー"
word_t = "タクシー"

result = ""
for p, t in zip(word_p, word_t):
    result += p + t
print(result)

Output

パタトクカシーー

Explanation

With zip(), you can take one character at a time from several strings in parallel.

zip() with different lengths

word_h = "ヘリコプター"
word_p = "パトカー"

result = ""
for h, p in zip(word_h, word_p):
    result += h + p
print(result)
>>> ヘパリトコカプー

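When strings of different lengths are passed to zip(), it stops at the end of the shortest one, and the remaining characters of the longer string are dropped.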
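If silently dropping characters is not what you want, one option is to check the lengths up front. The following is only a sketch; `interleave_strict` is a hypothetical helper name, not something from the original post or the standard library:

```python
def interleave_strict(a, b):
    """Interleave two strings, refusing inputs of different lengths."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return "".join(x + y for x, y in zip(a, b))


print(interleave_strict("パトカー", "タクシー"))  # パタトクカシーー

try:
    interleave_strict("ヘリコプター", "パトカー")
except ValueError as err:
    print("rejected:", err)  # rejected: strings must have the same length
```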

Structure of zip()

What does zip() actually contain?

word_p = "パトカー"
word_t = "タクシー"

print(zip(word_p, word_t))
>>> <zip object at 0x103b4d108>

Printed as-is, it only shows a zip object. Let's turn it into a list with list().

print(list(zip(word_p, word_t)))
>>> [('パ', 'タ'), ('ト', 'ク'), ('カ', 'シ'), ('ー', 'ー')]

It turned out to be a list of tuples, each holding the elements at the same index.
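One detail worth knowing: in Python 3 the zip object is an iterator, so it can be consumed only once. A small sketch (the variable names `first_pass` and `second_pass` are mine):

```python
word_p = "パトカー"
word_t = "タクシー"

pairs = zip(word_p, word_t)
first_pass = list(pairs)   # consumes the iterator
second_pass = list(pairs)  # nothing is left to yield

print(first_pass)   # [('パ', 'タ'), ('ト', 'ク'), ('カ', 'シ'), ('ー', 'ー')]
print(second_pass)  # []
```

If the pairs are needed more than once, keep the list rather than the zip object.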

Alternative solution: zip() + list comprehension

word_p = "パトカー"
word_t = "タクシー"

print("".join([''.join(pair) for pair in zip(word_p, word_t)]))

Output

パタトクカシーー

Explanation

Since zip() yields tuples of the elements at the same index, let's try a list comprehension.

Hmm. It works, at least.
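The two nested join() calls can also be folded into a single one. This is just one possible variant, using a generator expression with two for clauses:

```python
word_p = "パトカー"
word_t = "タクシー"

# Outer loop walks the pairs, inner loop walks the characters in each pair.
result = "".join(char for pair in zip(word_p, word_t) for char in pair)
print(result)  # パタトクカシーー
```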

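Alternative solutions: other things I tried that worked

Here I write down the various solutions I tried.

I hope they are useful to someone, somewhere...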

zip() + map()

print("".join(map(lambda x: ''.join(x), zip(word_p, word_t))))
>>> パタトクカシーー

This solution uses the higher-order function map(). It gets rid of the for loop.
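map() also accepts several iterables directly, which makes zip() and the lambda unnecessary. A sketch using operator.add from the standard library (my choice, not the original author's):

```python
from operator import add

word_p = "パトカー"
word_t = "タクシー"

# map() pulls one character from each string and passes both to add().
result = "".join(map(add, word_p, word_t))
print(result)  # パタトクカシーー
```

As with zip(), map() stops as soon as the shortest input is exhausted.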

zip() + functools.reduce()

from functools import reduce

word_p = "パトカー"
word_t = "タクシー"

print("".join(reduce(lambda p, t: p + t, zip(word_p, word_t))))
>>> パタトクカシーー

This uses reduce(), a higher-order function that was a built-in up through Python 2; in Python 3 it is imported from functools.
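In the reduce() call above, both lambda arguments are tuples, so what gets built up is one long tuple that join() then turns into a string. The sketch below makes that intermediate value visible and, as an alternative of my own, passes "" as the initial value so the accumulator is a string from the start:

```python
from functools import reduce

word_p = "パトカー"
word_t = "タクシー"

# Without an initial value, reduce() concatenates the tuples themselves.
flat = reduce(lambda acc, pair: acc + pair, zip(word_p, word_t))
print(flat)  # ('パ', 'タ', 'ト', 'ク', 'カ', 'シ', 'ー', 'ー')

# With "" as the initial value, the accumulator is already a string.
result = reduce(lambda acc, pair: acc + pair[0] + pair[1],
                zip(word_p, word_t), "")
print(result)  # パタトクカシーー
```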

zip() + itertools.chain()

from itertools import chain

word_p = "パトカー"
word_t = "タクシー"

print("".join(map(lambda x: "".join(chain(x)), zip(word_p, word_t))))
>>> パタトクカシーー
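chain() called on a single tuple just iterates over that tuple, so the map() and the inner join() are still doing most of the work. A shorter variant, offered as a sketch, is chain.from_iterable(), which flattens all the pairs in one step:

```python
from itertools import chain

word_p = "パトカー"
word_t = "タクシー"

# chain.from_iterable() yields every character of every pair in order.
result = "".join(chain.from_iterable(zip(word_p, word_t)))
print(result)  # パタトクカシーー
```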

itertools

itertools.zip_longest()

import itertools

word_h = "ヘリコプター"
word_p = "パトカー"

result = ""
for h, p in itertools.zip_longest(word_h, word_p, fillvalue='*'):
    result += h + p
print(result)
>>> ヘパリトコカプータ*ー*

zip() stops at the shortest input, but itertools.zip_longest() runs to the end of the longest one. Missing values are filled with None by default; fillvalue lets you specify any fill value you like.
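Note that with the default None, an expression like h + p would raise TypeError once the shorter string runs out. The sketch below shows the default fill value and then uses an empty string as fillvalue (my choice) so the tail of the longer string is kept unchanged:

```python
from itertools import zip_longest

word_h = "ヘリコプター"
word_p = "パトカー"

# The default fill value is None once the shorter string runs out.
last_pair = list(zip_longest(word_h, word_p))[-1]
print(last_pair)  # ('ー', None)

# An empty-string fill value keeps the tail of the longer string as-is.
result = "".join(h + p for h, p in zip_longest(word_h, word_p, fillvalue=""))
print(result)  # ヘパリトコカプーター
```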

numpy

Let's solve it with a matrix transpose. numpy, the array and matrix computation library, provides one, so I will try it as practice too.

Installation

pip install numpy

Solution

import numpy as np

word_p = "パトカー"
word_t = "タクシー"


def zip_by_numpy(str1, str2):
    arr = np.asarray([list(map(ord, str1)), list(map(ord, str2))])
    return [(chr(s1), chr(s2)) for s1, s2 in arr.T.tolist()]


result = ''
for p, t in zip_by_numpy(word_p, word_t):
    result += p + t

print(result)
>>> パタトクカシーー

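zip_by_numpy() is a function that imitates zip().

Here the characters are converted to integer code points with ord() so that they can be stored in a numeric array (numpy can also hold strings, but this version works on integers). chr() converts the numbers back into characters.

The transpose is available as arr.T.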

"パタパタトクトクカシカシーーーー" with numpy

The pure-Python equivalent of zip() shown in the Python documentation uses a for loop.

def zip(*iterables):
    # zip('ABCD', 'xy') --> Ax By
    sentinel = object()
    iterators = [iter(it) for it in iterables]
    while iterators:
        result = []
        for it in iterators:
            elem = next(it, sentinel)
            if elem is sentinel:
                return
            result.append(elem)
        yield tuple(result)

The way it uses iterators is elegant.
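The key trick in that code is calling next() with a default value. A freshly created object() is used as the default because no real element can be the same object, so `elem is sentinel` is true only when the iterator is exhausted. A small sketch of that behavior on its own (the variable names are mine):

```python
sentinel = object()          # unique marker that no real element can be
it = iter("xy")

first = next(it, sentinel)   # 'x'
second = next(it, sentinel)  # 'y'
third = next(it, sentinel)   # iterator exhausted, so the default comes back

print(first, second, third is sentinel)  # x y True
```

Without the default argument, the third call would raise StopIteration instead.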

For interleaving strings, I have a feeling that a matrix transpose might be faster, so I wrote one right away.

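import numpy as np

word_p = "パトカー"
word_t = "タクシー"


def str_to_int_array(string):
    # Return a tuple of the string's characters converted to integers.
    return tuple(map(ord, string))


def alternately(*args):
    # Return a string that interleaves the characters of several strings.
    # Assumes all strings have the same length.
    # alternately('ABC', 'xyz') --> 'AxByCz'
    arr = np.asarray(list(map(str_to_int_array, args)))
    return "".join(tuple(map(chr, arr.T.flatten().tolist())))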

print(alternately(word_p, word_t))
>>> パタトクカシーー

# Many arguments
print(alternately(word_p, word_t, word_p, word_t))
>>> パタパタトクトクカシカシーーーー
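For comparison, zip(*args) performs the same kind of transpose using only the standard library. The following is a sketch of a numpy-free counterpart; `alternately_py` is a hypothetical name of mine, and I have not measured which version is faster:

```python
from itertools import chain


def alternately_py(*args):
    """Interleave the characters of several strings without numpy."""
    # zip(*args) groups the i-th characters together, like a transpose.
    return "".join(chain.from_iterable(zip(*args)))


word_p = "パトカー"
word_t = "タクシー"

print(alternately_py(word_p, word_t))
# パタトクカシーー
print(alternately_py(word_p, word_t, word_p, word_t))
# パタパタトクトクカシカシーーーー
```

Unlike the numpy version, this one accepts strings of different lengths and simply stops at the shortest, because it is built on zip().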