Hardly a beginner's book, O'Reilly's Mining the Social Web (入門 ソーシャルデータ), part 3: Chapter 1 done 2013/08/01 03:10

Finally finished Chapter 1.

Ran Example 1-12 to create snl_search_results.dot,
then rendered a PNG graphic from that data.
Quite a few nodes show up, which is fun.

$ circo -Tpng -Osnl_search_results snl_search_results.dot
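If you want to kick off the same render from Python instead of the shell, a minimal sketch (my own addition; it just shells out, assuming Graphviz's circo is on your PATH):

import subprocess
# Render the DOT file to PNG with Graphviz's circo layout engine
subprocess.call(['circo', '-Tpng', '-o', 'snl_search_results.png',
                 'snl_search_results.dot'])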

The Python source for Example 1-12:
# --- sample source; the authentication credentials are masked out
import twitter

# Go to http://twitter.com/apps/new to create an app and get these items
# See https://dev.twitter.com/docs/auth/oauth for more information on Twitter's OAuth implementation

CONSUMER_KEY = '***' # <- fill in your own
CONSUMER_SECRET = '***' # <- fill in your own
OAUTH_TOKEN = '***' # <- fill in your own
OAUTH_TOKEN_SECRET = '***' # <- fill in your own

auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                           CONSUMER_KEY, CONSUMER_SECRET)

twitter_api = twitter.Twitter(domain='api.twitter.com',
                              api_version='1.1',
                              auth=auth)

# <markdowncell>

# **Example 1-3. Retrieving Twitter trends**

# <codecell>

# With an authenticated twitter_api in existence, you can now use it to query Twitter resources as usual.
# However, the trends resource is cleaned up a bit in v1.1, so requests are a bit simpler than in the latest
# printing. See https://dev.twitter.com/docs/api/1.1/get/trends/place

# The Yahoo! Where On Earth ID for the entire world is 1
WORLD_WOE_ID = 1

# Prefix id with the underscore for query string parameterization.
# Without the underscore, it's appended to the URL itself
world_trends = twitter_api.trends.place(_id=WORLD_WOE_ID)
print world_trends

# <markdowncell>

# IPython Notebook didn't display the output because the result of the API call was captured as a variable. Here's how you could print a readable version of the response.

# <codecell>

import json
print json.dumps(world_trends, indent=1)

# <markdowncell>

# Now that you're authenticated and understand a bit more about making requests with Twitter's new API, let's look at some examples that involve search requests, which are a bit different with v1.1 of the API.

# <markdowncell>

# **Example 1-4. Paging through Twitter search results**

# <codecell>

# Like all other APIs, search requests now require authentication and have a slightly different request and
# response format. See https://dev.twitter.com/docs/api/1.1/get/search/tweets

q = "SNL"
count = 100

search_results = twitter_api.search.tweets(q=q, count=count)
statuses = search_results['statuses']

# v1.1 of Twitter's API provides a value in the response for the next batch of results that needs to be parsed out
# and passed back in as keyword args if you want to retrieve more than one page. It appears in the 'search_metadata'
# field of the response object and has the following form:'?max_id=313519052523986943&q=NCAA&include_entities=1'
# The tweets themselves are encoded in the 'statuses' field of the response


# Here's how you would grab five more batches of results and collect the statuses as a list
for _ in range(5):
    try:
        next_results = search_results['search_metadata']['next_results']
    except KeyError, e: # No more results when next_results doesn't exist
        break

    kwargs = dict([ kv.split('=') for kv in next_results[1:].split("&") ]) # Create a dictionary from the query string params
    search_results = twitter_api.search.tweets(**kwargs)
    statuses += search_results['statuses']

import json
print json.dumps(statuses[0:2], indent=1)


tweets = [ status['text'] for status in statuses ]

print tweets[0]

words = []
for t in tweets:
    words += [ w for w in t.split() ]

# total words
print len(words)

# unique words
print len(set(words))

# lexical diversity
print 1.0*len(set(words))/len(words)

# avg words per tweet
print 1.0*sum([ len(t.split()) for t in tweets ])/len(tweets)

import nltk

freq_dist = nltk.FreqDist(words)
print freq_dist.keys()[:50] # 50 most frequent tokens
print freq_dist.keys()[-50:] # 50 least frequent tokens

import re
rt_patterns = re.compile(r"(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)
example_tweets = ["RT @SocialWebMining Justin Bieber is on SNL 2nite. w00t?!?",
                  "Justin Bieber is on SNL 2nite. w00t?!? (via @SocialWebMining)"]
for t in example_tweets:
    print rt_patterns.findall(t)

import networkx as nx
import re
g = nx.DiGraph()

def get_rt_sources(tweet):
    rt_patterns = re.compile(r'(RT|via)((?:\b\W*@\w+)+)', re.IGNORECASE)
    return [ source.strip()
             for tuple in rt_patterns.findall(tweet)
             for source in tuple
             if source not in ("RT", "via") ]

for status in statuses:
    rt_sources = get_rt_sources(status['text'])
    if not rt_sources: continue
    for rt_source in rt_sources:
        g.add_edge(rt_source, status['user']['screen_name'], {'tweet_id' : status['id']})

print nx.info(g)
print g.edges(data=True)[0]
print len(nx.connected_components(g.to_undirected()))
print sorted(nx.degree(g).values())
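# My own addition, not in the book: edges point from the retweeted account
# to the retweeter, so out-degree shows who got retweeted by the most
# distinct users (assumes the dict-returning networkx 1.x API used above)
out_degree = g.out_degree()
print sorted(out_degree.items(), key=lambda kv: kv[1], reverse=True)[:10]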

#import ex1_11
OUT = "snl_search_results.dot"
try:
    nx.drawing.write_dot(g, OUT)
except ImportError, e:
    dot = ['"%s" -> "%s" [tweet_id=%s]' % (n1, n2, g[n1][n2]['tweet_id'])
           for n1, n2 in g.edges()]
    f = open(OUT, 'w')
    f.write('strict digraph {\n%s\n}' % (';\n'.join(dot),))
    f.close()
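One thing the paging loop above ignores is Twitter's rate limiting (HTTP 429 in API v1.1). Here is a sketch of a retry wrapper I'd add on top, assuming the twitter package raises TwitterHTTPError on HTTP errors:

import time
from twitter import TwitterHTTPError

def search_with_retry(twitter_api, max_retries=3, **kwargs):
    # Retry search.tweets with exponential backoff on HTTP errors
    for i in range(max_retries):
        try:
            return twitter_api.search.tweets(**kwargs)
        except TwitterHTTPError, e:
            if i == max_retries - 1:
                raise
            wait = 15 * 2 ** i
            print 'HTTP error (%s), retrying in %d seconds' % (e, wait)
            time.sleep(wait)

# e.g. search_results = search_with_retry(twitter_api, q='SNL', count=100)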

Hardly a beginner's book, O'Reilly's Mining the Social Web (入門 ソーシャルデータ), part 4: starting Chapter 2 2013/08/05 04:35

There is so much I don't understand that it's taken me until now to reach the first part of Chapter 2.

First I pulled down the example code with git:

git clone https://github.com/ptwobrussell/Mining-the-Social-Web

Also needed: NLTK and related tools.

Since XFN supposedly lets you pull out the human relationships behind a blog,
I installed the necessary packages (a sketch of what the scraper does follows the install commands):

easy_install nltk
easy_install BeautifulSoup
easy_install networkx
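The actual scraping is done by the book's microformats__xfn_scrape.py; since I only paste its output below, here is a rough sketch of the idea (my own reconstruction, not the book's script), using the BeautifulSoup 3 API installed above: collect every <a> tag whose rel attribute carries XFN values and print the anchor text, href, and relationships.

import sys
import urllib2
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 API

# A subset of the XFN relationship values defined by the spec
XFN_VALUES = set(['me', 'friend', 'co-worker', 'acquaintance',
                  'contact', 'colleague', 'met'])

page = urllib2.urlopen(sys.argv[1]).read()
soup = BeautifulSoup(page)

for a in soup.findAll('a', rel=True):
    # rel can hold several space-separated tokens; keep only XFN ones
    tokens = [t for t in a['rel'].split() if t.lower() in XFN_VALUES]
    if tokens:
        print a.string, a['href'], tokens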

python microformats__xfn_scrape.py http://ajaxian.com/
Dion Almaer http://www.almaer.com/blog/ [u'me']
Ben Galbraith http://weblogs.java.net/blog/javaben/ [u'co-worker']
Rey Bango http://reybango.com/ [u'friend']
Michael Mahemoff http://softwareas.com/ [u'friend']
Chris Cornutt http://blog.phpdeveloper.org/ [u'friend']
Rob Sanheim http://www.robsanheim.com/ [u'friend']
Dietrich Kappe http://blogs.pathf.com/agileajax/ [u'friend']
Chris Heilmann http://wait-till-i.com/ [u'friend']
Brad Neuberg http://codinginparadise.org/about/ [u'friend']

python microformats__xfn_scrape.py http://www.almaer.com
getBlog() http://almaer.com/blog [u'me']
getOpenWebPodcast() http://openwebpodcast.com/ [u'me']
getTwitter() http://twitter.com/dalmaer [u'me']
getFriendFeed() http://friendfeed.com/dion [u'me']
getLinkedIn() http://www.linkedin.com/in/dalmaer [u'me']
getFacebook() http://facebook.com/dalmaer [u'me']
getFlickr() http://flickr.com/dalmaer [u'me']
getEmail() mailto:dion@almaer.com [u'me']
getResume() http://almaer.com/dion/cv [u'me']
getBook() http://www.pragprog.com/titles/ajax [u'me']
getArticles() http://almaer.com/dion/articles/ [u'me']
getTools() http://almaer.com/dion/tools/ [u'me']
getEmily() http://almaer.com/dion/personal/private/emily/ [u'me']
getWedding() http://almaer.com/dion/personal/private/wedding/ [u'me']
getInlaws() http://almaer.com/dion/personal/private/rock/ [u'me']
getRedbook() http://almaer.com/dion/personal/red_book/ [u'me']
getOldPhotos() http://almaer.com/dion/personal/private/photos/ [u'me']
blog http://almaer.com/blog/ [u'me']
ajaxian http://ajaxian.com/ [u'me']
owp http://openwebpodcast.com/ [u'me']
twitter http://twitter.com/dalmaer [u'me']
ff http://friendfeed.com/dion [u'me']
linkedin http://www.linkedin.com/in/dalmaer [u'me']
facebook http://facebook.com/dalmaer [u'me']
flickr http://flickr.com/dalmaer [u'me']
resume http://almaer.com/dion/cv [u'me']
mybook http://www.pragprog.com/titles/ajax [u'me']
articles http://almaer.com/dion/articles [u'me']
tools http://almaer.com/dion/tools [u'me']
email mailto:dion@almaer.com [u'me']
_emily http://almaer.com/dion/personal/private/emily [u'me']
_wedding http://almaer.com/dion/personal/private/wedding [u'me']
_inlaws http://almaer.com/dion/personal/private/rock [u'me']
_redbook http://almaer.com/dion/personal/red_book [u'me']
_oldphotos http://almaer.com/dion/personal/private/photos [u'me']

While I was at it, I tried to see whether the same data could be pulled from Asablo,
but no luck. I still need to check what Asablo's XML actually exposes (a quick probe sketch follows).
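To check quickly whether an Asablo page exposes any rel attributes at all, a probe like this should do (the URL below is a placeholder, not a real blog):

import urllib2
from BeautifulSoup import BeautifulSoup

# Hypothetical Asablo URL; substitute a real blog address
page = urllib2.urlopen('http://example.asablo.jp/blog/').read()
soup = BeautifulSoup(page)
print set(t for a in soup.findAll(rel=True) for t in a['rel'].split())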

Moleskine bleed-through test with the Parker 5th 2013/08/26 08:50

Bleed-through test: Moleskine

Until now I'd been using a Pilot fountain pen with my Moleskine notebooks, but the ink was a bit slow to dry,

so I bought a Parker.

For notes I'd lately been using the erasable Frixion ballpoint plus a Lamy for its nice writing feel, but the Lamy blobs (maybe it's this summer's heat).

Parker's sales pitch for the 5th is that the excessive ink flow is gone.

So I gave it a quick try to see whether it works on bleed-prone Moleskine paper.

The verdict: borderline. Writing slowly, blue ink shows through noticeably; black ink hides against the ruled lines. The Parker's writing feel is as good as the Tombow PROGRAPH I used to love.

Don't write off the Hamada Note: a notebook that doesn't bleed through 2013/08/26 22:08

Hamada Note

What a surprise.

The retro-style Hamada Note, sold cheap at Sekaido, doesn't bleed through at all. Far more usable than a Kokuyo Campus notebook.