Automatically saving photos posted to Slack into Google Drive

Lately I've been using Slack to keep in touch with my partner. Being able to split topics into channels is handy. The well-known post about using Slack in private life is this one.

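Indeed, the fun part is that you can build whatever features you like: posting the weather forecast every morning, making a channel that streams news about keywords you care about, playing with bots, and so on. (The hubot with docomo's chat API wired in is cute.)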

But here's the thing: Slack has no equivalent of LINE's album feature.
(It doesn't, right? If it does, please tell me ＼(^o^)／)

So, using Slack's APIs, the goal is to:

automatically fetch images from a Slack channel
store them in a shared Google Drive folder
run this automatically on a schedule

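In the end it does work, more or less, but a few points are not very elegant...

It runs periodically from a Raspberry Pi set up as a server (you need your own server)
Using Google Drive from the Raspberry Pi costs money ($4.99)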

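If you can live with these, read on ^^ Incidentally, it seems the Google API lets you use Google Drive from a Raspberry Pi without paying, but there was so little information that I gave up orz I'll look into it again when I find the time...!!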

The result ↓↓ (it doesn't look like much is happening). Each photo's id is part of the file name, so I blurred them just in case ^^

The steps are as follows. At the bottom there is a Python script that implements steps 2 to 5.

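1. Get a Slack API token
2. Get the list of image files with the Slack API files.list
3. Get a public URL with the Slack API files.sharedPublicURL
4. Save the image fetched from the public URL to Google Drive
5. Revoke the public URL with the Slack API files.revokePublicURL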

1. Get a Slack API token

As always, lean on the wisdom of those who went before. This one is easy.

2. Get the list of image files with the Slack API files.list

This uses the Slack API.
https://api.slack.com/methods/files.list

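In the Tester tab, set the parameters, including the token obtained above, and it generates a request URL that returns the list of matching files as JSON. You can filter by files created after or before a given time, by file type, by source channel, and so on. In the JSON returned from this URL, the alphanumeric value after "id": is the key used to fetch the image file.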
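
As a rough illustration of what the Tester tab produces, here is a minimal Python 3 sketch that assembles a files.list request URL. The parameter names (token, channel, types, ts_from) are the ones used by the script at the bottom of this post; the helper name build_files_list_url and the sample values are made up for this example.

```python
from urllib.parse import urlencode


def build_files_list_url(token, channel, ts_from=None):
    """Build a files.list request URL similar to the one the Tester tab shows."""
    params = {"token": token, "channel": channel, "types": "images"}
    if ts_from is not None:
        # Only list files created after this Unix timestamp.
        params["ts_from"] = int(ts_from)
    return "https://slack.com/api/files.list?" + urlencode(params)


# Placeholder values, same as in the script's parameter section.
url = build_files_list_url("Slack_API_token", "channel_ID", ts_from=1500000000)
print(url)
```

Using urlencode instead of string formatting means values with special characters are escaped properly.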

3. Get a public URL with the Slack API files.sharedPublicURL

This uses the Slack API.
https://api.slack.com/methods/files.sharedPublicURL

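Pass the id obtained above as a parameter to this API to build the request URL that makes the image public. The public URL of the image is in the JSON returned by that request. The public URL gets revoked afterwards, but be aware that for a while anyone can access it (that said, it isn't announced anywhere, so I don't think you need to be too nervous about it).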

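The URL following "permalink_public": in the JSON is the public URL of the image. It points to an HTML page, and the URL of the image itself has to be taken from that page's HTML (confusing...). The URL after img src= in the HTML is the URL of the image itself.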
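
The two lookups described here can be sketched in Python 3 with the standard json and html.parser modules instead of regular expressions. This assumes the response nests permalink_public under a "file" object and that the first img tag on the public page is the photo; the sample JSON and HTML below are made up, not real Slack output.

```python
import json
from html.parser import HTMLParser


class FirstImgSrc(HTMLParser):
    """Remember the src attribute of the first <img> tag encountered."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src")


def public_page_url(response_text):
    """Return permalink_public from a files.sharedPublicURL response (assumed layout)."""
    return json.loads(response_text)["file"]["permalink_public"]


def image_url_from_html(html_text):
    """Return the src of the first <img> tag in the public page, or None."""
    parser = FirstImgSrc()
    parser.feed(html_text)
    return parser.src


# Made-up sample data for illustration only.
sample_json = '{"ok": true, "file": {"id": "F0000000", "permalink_public": "https:\\/\\/example.com\\/public-page"}}'
sample_html = '<html><body><a href="#"><img src="https://example.com/image.jpg"></a></body></html>'
print(public_page_url(sample_json))
print(image_url_from_html(sample_html))
```

json.loads also undoes the escaped slashes, so no manual backslash stripping is needed.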

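4. Save the image fetched from the public URL to Google Drive

Once you know the image URL the rest is easy... or so I thought, but this is actually where I got stuck the most... Fetching the image itself is easy (open the URL in a browser and save the image, or use requests.get in Python, and so on). Now all that's left is to access Google Drive from the Raspberry Pi...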

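And then I died

The Google API is way too hard to understand... orz With overGrive you can use Google Drive from a Raspberry Pi for the time being, but it requires payment once the trial period ends. If saving locally is good enough for you, or your server runs Windows or Mac, you can ignore this part.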
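
For the "save locally and let a sync client do the rest" route, a minimal Python 3 sketch of the saving step might look like this. The file naming (timestamp, then file id, then .jpg) follows the script at the bottom of this post; build_file_name and save_image are hypothetical helper names, and a temporary directory stands in for the folder that is synced to Google Drive.

```python
import datetime
import os
import tempfile


def build_file_name(output_folder, timestamp, file_id):
    """Return <output_folder>/<YYYY-mm-dd-HH-MM-SS>-<file_id>.jpg (local time)."""
    stamp = datetime.datetime.fromtimestamp(float(timestamp))
    name = "%s-%s.jpg" % (stamp.strftime("%Y-%m-%d-%H-%M-%S"), file_id)
    return os.path.join(output_folder, name)


def save_image(image_bytes, path):
    """Write the downloaded bytes to disk and return the path."""
    with open(path, "wb") as f:
        f.write(image_bytes)
    return path


# A temporary directory stands in for the synced folder.
folder = tempfile.mkdtemp()
path = build_file_name(folder, 1500000000, "F0000000")
save_image(b"not really a JPEG", path)
print(path)
```

In real use image_bytes would be the body of the download, for example requests.get(img_url).content.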

5. Revoke the public URL with the Slack API files.revokePublicURL

This uses the Slack API.
https://api.slack.com/methods/files.revokePublicURL
The procedure is the same as in 3., so I'll skip the details. All you need to do is access the request URL.
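
Since the revoke step only works if the request actually succeeds, it may be worth checking the response. The Python 3 sketch below builds the request URL and checks the "ok" field that Slack Web API responses carry; build_revoke_url and check_ok are hypothetical helper names, and the sample responses are made up.

```python
import json
from urllib.parse import urlencode


def build_revoke_url(token, file_id):
    """Build the files.revokePublicURL request URL for one file id."""
    query = urlencode({"token": token, "file": file_id})
    return "https://slack.com/api/files.revokePublicURL?" + query


def check_ok(response_text):
    """Parse an API response and raise if it reports a failure."""
    data = json.loads(response_text)
    if not data.get("ok"):
        raise RuntimeError("Slack API error: %s" % data.get("error", "unknown"))
    return data


print(build_revoke_url("Slack_API_token", "F0000000"))
print(check_ok('{"ok": true}'))
```

A failed revoke would otherwise leave the image publicly reachable without anyone noticing.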

That got long, but here is a Python script (Python 2) that fetches every image posted so far.

# coding:utf-8

import time
import datetime
import urllib2
import shutil
import requests
import re


#################################################
#
#   parameters
#
#################################################

output_folder   = '/path/to/directory'
channel         = 'channel_ID'
slack_API_token = 'Slack_API_token'

#################################################
#
#   def
#
#################################################

def download_imgurl(url, file_name):
    
    res = requests.get(url, stream=True)
    
    if res.status_code == 200:
        with open(file_name, 'wb') as file:
            shutil.copyfileobj(res.raw, file)
        return 1
    else:
        return -1


def revoke_all(file_list):
    
    for file in file_list:
        # revoke the shared public url.
        revoke_url = "https://slack.com/api/files.revokePublicURL?token=%s&file=%s&pretty=1" % (slack_API_token, file)
        response4 = urllib2.urlopen(revoke_url).read()


#################################################
#
#   main
#
#################################################

# get image file ids.
file_list_url = "https://slack.com/api/files.list?token=%s&channel=%s&types=images&pretty=1" % (slack_API_token, channel)
response1  = urllib2.urlopen(file_list_url).read()
id_pattern = re.compile(r'\"id\": \"([a-zA-Z0-9]+)\",\n')
file_list  = id_pattern.findall(response1)

# revoke public url (if present).
revoke_all(file_list)

# show information
print '[INFO] ' + str(len(file_list)) + ' files will be downloaded from Slack.'

# download images to the specified local folder.
for file in file_list:
    
    try:

        # get shared public url.
        public_url = "https://slack.com/api/files.sharedPublicURL?token=%s&file=%s&pretty=1" % (slack_API_token, file)
        response2 = urllib2.urlopen(public_url).read()
        pubhtml_pattern = re.compile(r'\"permalink_public\": \"([a-zA-Z0-9!-/:-@\[-`{-~]+)\",\n')
        img_html_url  = pubhtml_pattern.findall(response2)[0].replace('\\', '')

        # get timestamp
        timestamp_pattern = re.compile(r'\"timestamp": ([0-9]+),\n')
        timestamp = timestamp_pattern.findall(response2)[0]
        timestamp = datetime.datetime.fromtimestamp(float(timestamp))
        timestamp = timestamp.strftime('%Y-%m-%d-%H-%M-%S')

        # get image url.
        response3 = urllib2.urlopen(img_html_url).read()
        puburl_pattern = re.compile(r'<img src=\"([a-zA-Z0-9!-/:-@\[-`{-~]+)\">\n')
        img_url  = puburl_pattern.findall(response3)[0]


        # download image
        # output_folder has no trailing slash, so add the separator here
        download_imgurl(img_url, output_folder + '/' + str(timestamp) + '-' + file + '.jpg')


        # revoke the shared public url.
        revoke_url = "https://slack.com/api/files.revokePublicURL?token=%s&file=%s&pretty=1" % (slack_API_token, file)
        response4 = urllib2.urlopen(revoke_url).read()

    except Exception:
        # ids that do not belong to an image file make the steps above fail; skip them
        pass
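
The script above pulls ids out of the raw response text with a regular expression, which picks up every "id" key it finds. As an alternative sketch in Python 3, parsing the response as JSON and reading only the ids of the entries under "files" avoids collecting ids that belong to something else. The response layout and the nested "initial_comment" object in the sample are assumptions made for illustration, not captured Slack output.

```python
import json


def extract_file_ids(response_text):
    """Return only the ids of the entries in the top-level "files" list."""
    data = json.loads(response_text)
    return [f["id"] for f in data.get("files", [])]


# Made-up sample response: the second file carries a nested object with its own id.
sample = '''{
  "ok": true,
  "files": [
    {"id": "F111", "name": "a.jpg"},
    {"id": "F222", "name": "b.jpg", "initial_comment": {"id": "Fc333", "comment": "nice"}}
  ]
}'''
print(extract_file_ids(sample))  # ['F111', 'F222']
```

With this approach the nested id never reaches the public URL step in the first place.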

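By the way, the channel is passed as a parameter because otherwise image files posted with a comment could not be found (a Slack API bug?). Also, the information for an image with a comment (in the JSON) seems to contain id values other than the image file id (I don't know what they identify...), and running step 3 with one of those causes an error, so I work around it with try/except. See the reference below for how to get the channel ID~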

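While I'm at it, here is a script that runs every day at 3 a.m., searches for images posted to Slack in the past 24 hours, and downloads them. It is meant to run on a server.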

# coding:utf-8

import time
import datetime
import urllib2
import shutil
import requests
import re


#################################################
#
#   parameters
#
#################################################

output_folder   = '/path/to/directory'
channel         = 'channel_ID'
slack_API_token = 'Slack_API_token'

#################################################
#
#   def
#
#################################################

def download_imgurl(url, file_name):
    
    res = requests.get(url, stream=True)
    
    if res.status_code == 200:
        with open(file_name, 'wb') as file:
            shutil.copyfileobj(res.raw, file)
        return 1
    else:
        return -1


def revoke_all(file_list):
    
    for file in file_list:
        # revoke the shared public url.
        revoke_url = "https://slack.com/api/files.revokePublicURL?token=%s&file=%s&pretty=1" % (slack_API_token, file)
        response4 = urllib2.urlopen(revoke_url).read()


#################################################
#
#   main
#
#################################################

while(1):
    
    # get current datetime.
    dt = datetime.datetime.now()

    if (dt.hour == 3 and dt.minute == 0):
        ts_current = int(time.mktime(dt.timetuple()))
        ts_1daybef = int(ts_current - 86400)

        # get image file ids.
        file_list_url = "https://slack.com/api/files.list?token=%s&ts_from=%d&channel=%s&types=images&pretty=1" % (slack_API_token, ts_1daybef, channel)
        response1  = urllib2.urlopen(file_list_url).read()
        id_pattern = re.compile(r'\"id\": \"([a-zA-Z0-9]+)\",\n')
        file_list  = id_pattern.findall(response1)

        if len(file_list) == 0:
            print '[INFO] No update was found. Datetime: ' +  dt.strftime('%Y-%m-%d %H:%M:%S')
            time.sleep(84100)
            continue
        
        # revoke public url (if present).
        revoke_all(file_list)

        # download images to the specified local folder.
        for file in file_list:
            
            try:

                # get shared public url.
                public_url = "https://slack.com/api/files.sharedPublicURL?token=%s&file=%s&pretty=1" % (slack_API_token, file)
                response2 = urllib2.urlopen(public_url).read()
                pubhtml_pattern = re.compile(r'\"permalink_public\": \"([a-zA-Z0-9!-/:-@\[-`{-~]+)\",\n')
                img_html_url  = pubhtml_pattern.findall(response2)[0].replace('\\', '')

                # get timestamp
                timestamp_pattern = re.compile(r'\"timestamp": ([0-9]+),\n')
                timestamp = timestamp_pattern.findall(response2)[0]
                timestamp = datetime.datetime.fromtimestamp(float(timestamp))
                timestamp = timestamp.strftime('%Y-%m-%d-%H-%M-%S')

                # get image url.
                response3 = urllib2.urlopen(img_html_url).read()
                puburl_pattern = re.compile(r'<img src=\"([a-zA-Z0-9!-/:-@\[-`{-~]+)\">\n')
                img_url  = puburl_pattern.findall(response3)[0]


                # download image
                # output_folder has no trailing slash, so add the separator here
                download_imgurl(img_url, output_folder + '/' + str(timestamp) + '-' + file + '.jpg')
                print '[INFO] %s-%s.jpg was downloaded from Slack.' % (str(timestamp), file)

                # revoke the shared public url.
                revoke_url = "https://slack.com/api/files.revokePublicURL?token=%s&file=%s&pretty=1" % (slack_API_token, file)
                response4 = urllib2.urlopen(revoke_url).read()
        
            except Exception:
                # ids that do not belong to an image file make the steps above fail; skip them
                pass

        time.sleep(86100)

    else:
        time.sleep(60)
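
The loop above checks the clock once a minute and then sleeps for most of a day. As an alternative sketch in Python 3, the wait until the next 3 a.m. can be computed directly and passed to time.sleep. The helper name seconds_until is made up for this example, and the fixed datetimes are only there to show the arithmetic.

```python
import datetime


def seconds_until(hour, now=None):
    """Seconds from `now` until the next occurrence of hour:00:00."""
    now = now or datetime.datetime.now()
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed (or is right now): use tomorrow's.
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()


print(seconds_until(3, datetime.datetime(2020, 1, 1, 2, 0, 0)))  # 3600.0
print(seconds_until(3, datetime.datetime(2020, 1, 1, 3, 0, 0)))  # 86400.0
```

In a server loop this would be used as time.sleep(seconds_until(3)) before each run, so the schedule does not drift with the time the download takes.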

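Hmm, I'd like to do this a bit more elegantly, but this will do for now. For the time being it works fine and the images are showing up in Google Drive. I'd really like to do it with the Google API, but I'll give that a try when I have the time (famous last words...).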

kikuの四苦Hack

Playing around with gadgets and programming