[python] Text Mining News Articles: Tokenization, Sentiment Analysis, and Word Clouds

 

In this post I collect newspaper articles and run text mining (text analysis) on them.

 

Text mining is generally carried out in the following steps.

 

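  • Select the data to mine (for example articles, comments, documents, or wikis)
  • Decide what meaningful information you want to extract through text mining
  • Clean up the collected articles and delete meaningless data (at the document level)
  • Tokenize the documents (for example, split them into words) and remove stopwords (meaningless words, special characters between words, and so on)
  • Run the analyses
    • Extract keywords with TF-IDF or a similar measure, or vectorize words as word2vec does
    • Classify documents, or work with n-grams
    • Visualize the text with a word cloud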

 

Put simply, the work can be broken down into the steps above.
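
To make the keyword-extraction step a little more concrete, here is a minimal TF-IDF sketch using only the standard library. The function name `tf_idf` and the toy corpus are made up for illustration, and the weighting (length-normalized term frequency times log(N / df)) is just one common variant, not necessarily what a given library uses.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    tf  = term count / document length
    idf = log(number of documents / documents containing the term)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy corpus, already tokenized and lowercased.
toy_docs = [
    ["gold", "rises", "dollar", "weak"],
    ["asian", "shares", "rise", "dollar", "weak"],
    ["korea", "factory", "activity", "growth"],
]
toy_scores = tf_idf(toy_docs)
for doc_scores in toy_scores:
    # Show the highest-scoring term of each document.
    print(max(doc_scores, key=doc_scores.get))
```

Terms that appear in every document get a score of zero under this variant, which is why frequent but uninformative words drop out of the keyword list.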

 

 

This time I collected financial articles written in English and set out to analyze their tone.

 

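Practice datasets of articles are available, but I collected real articles instead.

I gathered a subset of articles from Reuters.

They were collected with a separate crawler. (I hope to cover it in a later post if I get the chance.)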

 

These days it seems harder than it used to be to get data from overseas news outlets, for reasons such as the load that frequent crawling puts on their systems and the rising value of text data.

 

The collected file looks like this.

 

<?xml version="1.0"?>

<xml>


<article>

<title>South Korea December factory activity returns to growth as export orders expand most in 18 months: PMI</title>

<date>2020-01-02</date>

<contents>By Reuters Staff3 Min ReadSEOUL (Reuters) - South Korea’s factory activity returned to growth in December, snapping seven straight months of contraction, helped by improving demand, especially from abroad, a private business survey showed on Thursday.FILE PHOTO: Cargo containters are stacked at a yard in Busan, the country's biggest and the world's No. 5 container port, about 420 km (262 miles) southeast of Seoul, in this July 21, 2010 file photo. REUTERS/Truth Leem/Files (SOUTH KOREA)The Nikkei/Markit purchasing managers’ index (PMI) in December rose to 50.1, from 49.4 in November.It stood just above the 50-point level that separates growth from contraction and was the highest reading since April, when it marginally stood above the threshold at 50.2.Manufacturing output expanded for the first time in 14 months, helped by new product launches and general boost in demand conditions, the survey showed.New export orders index jumped to 51.3 in December, from 49.9 in the previous month and the highest reading since June 2018. Panellists reported greater sales to Asian markets such as Japan, China and Vietnam.Total new orders, which snapped a 13th month of contraction, rose to 50.7, on stronger overall demand and penetration into new overseas markets.“Perhaps most important was growth in overseas demand, the strongest increase in foreign workloads since mid-2018,” IHS Markit economist Joe Hayes said.“Sustained growth in exports will be key to ensuring that South Korea’s manufacturing sector can positively contribute to overall economic output,” Hayes added.South Korean exports in the first 20 days of December slid 2.0% on-year in value, marking the slowest fall in a year, as recovery in demand from China and stabilizing chip prices offered signs that a year-long run of declines may be nearing its end.Unlike most sub-indexes where overall improvements were spotted, the pace of job losses accelerated to 47.5 in December, from 49.4 a month earlier. 
The contraction, the fastest since May 2018, extended the streak into an eight consecutive month, though this had little impact on work efficiency as backlogs of work were also reduced during the month.The PMI survey showed business sentiment for the next 12 months was cheered by optimism on greater demand in new products and hopes for the trend in the global manufacturing industry to pick up.Reporting by Joori Roh; Editing by Sam HolmesOur Standards: The Thomson Reuters Trust Principles.</contents>

</article>


<article>

<title>PRECIOUS-Gold rises back towards 3-month high, markets eye Fed minutes</title>

<date>2020-01-02</date>

<contents>By K. Sathya Narayanan0 Min Read (Updates prices) * Fed December meeting minutes due at 1900 GMT on Friday * Dollar index holds near six-month low By K. Sathya Narayanan Jan 2 (Reuters) - Gold firmed on Thursday, edging backtowards the three-month peak it reached earlier in the week onthe back of dollar weakness, with the market focusing on minutesof the U.S. Federal Reserve's December policy meeting. Spot gold was up 0.4% to $1,523.12 per ounce as of1248 GMT, having touched its highest since Sept. 25 at $1,525.20on Tuesday. U.S. gold futures were up 0.2% at $1,525.80. "We are seeing a bit of a bounce-back in the dollar but ifyou look at the movements that we saw (in the past few days), itis probably supporting gold in the interim," said OANDA analystCraig Erlam. The negative correlation between the dollar and bullion iswhat really propelled gold from $1,480 to $1,520, he said, butfurther upside in the U.S. currency could put pressure on gold. Against key rivals, the dollar was up 0.3% thissession, but was trading not far from the six-month low ittouched on Tuesday. Beijing's decision to ease monetary policy further supportedbullion, Erlam added. China's central bank on Wednesday said it was cutting theamount of cash that all banks must hold as reserves, releasingfunds to shore up the slowing economy. Bullion prices posted their biggest annual rise in nearly adecade in 2019, boosted by the drawn-out trade war between theUnited States and China that dragged on global economic growth. Many analysts said prices were likely to rise further in2020, with shaky growth and global stock markets potentiallylooking unsustainable at record highs. "A key thing to look out for is stock markets, which havebeen setting new highs," said Brian Lan, managing director atdealer GoldSilver Central in Singapore. "In case there is somecorrection, we (could) see some capital flows into gold." Brexit, the U.S. 
presidential election, protests in HongKong and tensions with North Korea would be the other keyfactors for the market this year, he said. Investor focus has now turned to the minutes of the FederalReserve's Dec. 10-11 policy meeting, due at 1900 GMT on Friday.Lower interest rates encourage the buying of non-interest-payingbullion. "Friday's U.S. manufacturing ISM and the December FederalOpen Market Committee (FOMC) minutes could provide an impulse,"Stephen Innes, a market strategist at AxiTrader said in a note. Among other precious metals, silver gained 0.5% to$17.92 per ounce, while platinum rose 1.7% to $979.65 andpalladium edged up 0.4% to $1,947.37 per ounce. (Reporting by K. Sathya Narayanan and Sumita Layek inBengaluru; Editing by Jan Harvey) Our Standards: The Thomson Reuters Trust Principles.</contents>

</article>


<article>

<title>GLOBAL MARKETS-Asian shares rise on China's policy easing, trade deal hopes</title>

<date>2020-01-02</date>

<contents>By Andrew Galbraith4 Min Read* MSCI Asia ex-Japan +0.35%* China blue-chips jump after PBOC announces RRR cut* Trump says Phase 1 trade deal to be signed Jan. 15.* Asian stock markets: tmsnrt.rs/2zpUAr4SHANGHAI, Jan 2 (Reuters) - Asian shares kicked off the new decade higher on Thursday, after global stocks ended the previous one at record highs, and buoyed by Chinese markets after Beijing eased monetary policy to support slowing growth.Investors also cheered news that the United States and China will sign a trade pact soon after a year of volatile negotiations between the world’s two largest economies.MSCI’s broadest index of Asia-Pacific shares outside Japan was up 0.35% in morning trade after rising 5.6% in December.U.S. President Donald Trump said on Tuesday that Phase 1 of trade deal with China would be signed on Jan. 15 at the White House, though uncertainty surrounds details about the agreement.Rising hopes for a resolution to the U.S.-China trade war helped propel global equities to record highs late last year and depress the value of the U.S. dollar.MSCI’s all-country world index of stock performance in 49 nations touched an all-time high of 567.80 on Dec. 27. It was last quoted at 565.46, off 0.41% from that peak.In China, the blue-chip CSI300 index, one of the world’s best-performing indexes last year, was 1.34% higher in early trade.China’s central bank on Wednesday that it would cut the amount of cash that banks must hold as reserves, releasing around 800 billion yuan in funds effective Jan. 6.“I think the monetary angle in terms of what it means for the companies, is not that important,” said Jim McCafferty, head of Asia ex-Japan equity research at Nomura in Hong Kong.“However for what it means for the consumer point of view, then clearly if there’s easy money and ... 
individuals can borrow cheaply, repay debt quickly, then that of course is going to help the economy and the companies.”McCafferty said he expects a memory up-cycle and new handset development prompted by the rollout of 5G mobile technology could help to lift tech-heavy markets like Korea and Taiwan this year.Australian shares flicked between small gains and losses, and were last up 0.2%. Seoul’s Kospi began the year down 0.85%, while shares in Taiwan added 0.51%.Markets in Japan are closed for a national holiday.The gains in Asia follow a bullish end to the year on Wall Street on Tuesday. The Dow Jones Industrial Average rose 0.27% to 28,538.44 and the S&P 500 gained 0.29% to 3,230.78. The Nasdaq Composite added 0.3% to 8,972.60.In currency markets on Thursday, the dollar continued to weaken slightly against major peers as investors bet on a better outlook for global growth and trade.The dollar was 0.06% weaker against the yen at 108.64 while the euro gained 0.11% to 1.1222.The dollar index, which tracks the greenback against a basket of six rivals, was little changed, rising 0.04% to 96.427.U.S. crude was up 0.36% to $61.28 and global benchmark Brent crude rose to $66.24 per barrel, building on a rise that gave oil its biggest annual gain in three years in 2019.Gold, which has benefited from a weaker greenback, was up 0.18% on the spot market, fetching $1,519.64 per ounce.Reporting by Andrew Galbraith; Editing by Sam HolmesOur Standards: The Thomson Reuters Trust Principles.</contents>

</article>


<article>

<title>GLOBAL MARKETS-Asian shares jump on China policy easing, trade deal hopes</title>

<date>2020-01-02</date>

<contents>By Andrew Galbraith4 Min Read* MSCI Asia ex-Japan +0.43%* Euro Stoxx 50 futures point to higher open in Europe* China blue chips jump after PBOC announces RRR cut* Trump says Phase 1 trade deal to be signed Jan. 15.* Asian stock markets: tmsnrt.rs/2zpUAr4SHANGHAI, Jan 2 (Reuters) - Asian shares kicked off 2020 on a strong note on Thursday, spurred by Chinese markets after Beijing eased monetary policy to support the slowing economy.Investors also cheered news that the United States and China will sign a trade pact soon after months of volatile negotiations between the world’s two largest economies.European equities were set to follow Asia higher in their first trading session of the new decade. Pan-region Euro Stoxx 50 futures rose 0.62% and FTSE futures were up 0.31%, though German DAX futures fell 0.18%.U.S. stock futures also suggested a bright start on Wall Street, with S&P 500 e-minis up 0.28%.MSCI’s broadest index of Asia-Pacific shares outside Japan rose 0.43%, after rising 5.6% in December.U.S. President Donald Trump said on Tuesday that Phase 1 of trade deal with China would be signed on Jan. 15 at the White House, though uncertainty surrounds details about the agreement.Rising hopes for a resolution to the U.S.-China trade war helped propel global equities to record highs late last year and depressed the value of the U.S. dollar.MSCI’s all-country world index of stock performance in 49 nations touched an all-time high of 567.80 on Dec. 27. It was last quoted at 565.28, off 0.44% from that peak.In China, the blue-chip CSI300 index, one of the world’s best-performing indexes last year, jumped as much as 1.86% on Thursday to its highest level since Feb. 7, 2018. It was last up 1.35%.Hong Kong’s Hang Seng added 1.05%.Investors were cheered after China’s central bank on Wednesday said that it would cut the amount of cash that banks must hold as reserves, releasing around 800 billion yuan ($114.9 billion) in funds for lending, effective Jan. 
6.Though China’s economy has started to show some signs of bottoming out, analysts say it is not out of the woods yet and expect further growth boosting moves in coming months.“I think the monetary angle in terms of what it means for the companies, is not that important,” said Jim McCafferty, head of Asia ex-Japan equity research at Nomura in Hong Kong.“However, for what it means from the consumer point of view, then clearly if there’s easy money and ... individuals can borrow cheaply and repay debt quickly, then that of course is going to help the economy and the companies.”McCafferty said a recovery in memory chips and new handset development prompted by the rollout of 5G mobile technology could help lift tech-heavy equity markets like South Korea and Taiwan this year.Australian shares flicked between small gains and losses before ending up 0.1%. Seoul’s Kospi lost 0.85%, while shares in Taiwan added 0.86%.Markets in Japan were closed for a national holiday.The gains in Asia followed a bullish end to the year on Wall Street on Tuesday. The Dow Jones Industrial Average rose 0.27% and the S&P 500 gained 0.29%. The Nasdaq Composite added 0.3%.In currency markets, the dollar was slightly stronger against major peers, but gains were capped as investors continued to expect a better outlook for global growth and trade as well as an end to U.S. economic outperformance.The dollar was 0.03% stronger against the yen at 108.73 while the euro shaved off 0.04% to 1.1205.The dollar index, which tracks the greenback against a basket of six rivals, was up 0.23% to 96.606.U.S. crude was up 0.28% to $61.23 and global benchmark Brent crude rose 0.33% to $66.22 per barrel, building on a rise that gave oil its biggest annual gain in three years in 2019.Gold, which has benefited from a weaker greenback, was up 0.23% on the spot market despite the slightly strong dollar. 
It last fetched $1,520.37 per ounce.$1 = 6.9633 Chinese yuan renminbi Reporting by Andrew Galbraith; Editing by Sam Holmes & Kim CoghillOur Standards: The Thomson Reuters Trust Principles.</contents>

</article>

</xml>

 

I ran the analysis using this file as the base data.

I will explain the code in pieces, in the order it runs.

 

First, import the libraries and load the XML data.

 

import nltk
import numpy as np
import xml.etree.ElementTree as ET
from nltk import sent_tokenize
from nltk.tokenize import word_tokenize
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from glove import Corpus, Glove

# One-time downloads of the NLTK resources used below
# nltk.download('punkt')          # 'punkt_tab' on recent NLTK releases
# nltk.download('stopwords')
# nltk.download('vader_lexicon')

# Parse the collected XML file; ET.parse() accepts a path and closes the file itself
tree = ET.parse('./2020/20200102.xml')
root = tree.getroot()

# root[2] is the third <article>; its third child ([2]) is <contents>
cont = root[2][2].text

print(cont)
trade_docs = cont
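
The code above picks a single article by index. As a rough sketch of how every article could be read instead, the example below walks all `<article>` elements by tag name. The inline `SAMPLE_XML` string and the `load_articles` helper are hypothetical stand-ins; only the element names (`article`, `title`, `date`, `contents`) come from the file shown earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample using the same element names as the collected file.
SAMPLE_XML = """<?xml version="1.0"?>
<xml>
  <article>
    <title>First headline</title>
    <date>2020-01-02</date>
    <contents>First body text.</contents>
  </article>
  <article>
    <title>Second headline</title>
    <date>2020-01-02</date>
    <contents>Second body text.</contents>
  </article>
</xml>"""


def load_articles(xml_text):
    """Return a list of dicts holding the title, date and contents of each article."""
    root = ET.fromstring(xml_text)
    articles = []
    for node in root.iter("article"):
        articles.append({
            # findtext() returns the default when a child element is missing.
            "title": node.findtext("title", default=""),
            "date": node.findtext("date", default=""),
            "contents": node.findtext("contents", default=""),
        })
    return articles


articles = load_articles(SAMPLE_XML)
for article in articles:
    print(article["date"], article["title"])
```

Looking elements up by tag name rather than by position keeps the code working even if the order of the child elements changes.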

 

Next, to preprocess the article, I tokenized it into sentences.

 

## Sentence tokenization

sentences = sent_tokenize(trade_docs)
print('Number of sentences: ', len(sentences))
print()

for i, sentence in enumerate(sentences, start=1):
    print('Sentence {}'.format(i))
    print(sentence, '\n')

# Word-tokenize each sentence

word_tokens = [word_tokenize(sentence) for sentence in sentences]
print('Sentences split into words:')
print()

for tokens in word_tokens:
    print('Word count:', len(tokens))
    print(tokens, '\n')
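
One thing to watch for: in the crawled text many sentence-final periods are not followed by a space (for example `growth.Investors`), which can make a sentence tokenizer merge several sentences into one. Below is a minimal sketch of a regex that restores the missing space before tokenizing. The pattern and the helper name `restore_sentence_spaces` are my own assumptions, and as a heuristic it can misfire on abbreviations such as `Ph.D`.

```python
import re

# Assumed heuristic: a period preceded by a lowercase letter, digit or '%'
# and immediately followed by an uppercase letter or an opening quote.
_MISSING_SPACE = re.compile(r'(?<=[a-z0-9%])\.(?=[A-Z“"])')


def restore_sentence_spaces(text):
    """Insert the space that was lost after sentence-final periods."""
    return _MISSING_SPACE.sub(". ", text)


sample = (
    "Beijing eased monetary policy to support slowing growth.Investors also "
    "cheered news of a trade pact.MSCI’s index was up 0.35% in December.U.S. "
    "President Donald Trump said Phase 1 would be signed on Jan. 15."
)
fixed = restore_sentence_spaces(sample)
print(fixed)
```

Decimal numbers such as `0.35` and abbreviations such as `U.S.` are left alone because the character before the period is not followed by an uppercase letter, or is itself uppercase.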

 

Next, each tokenized word is compared against a stopword list and removed if it matches.

Punctuation marks and repeated or meaningless words such as said and reuters were removed.

 

 

# Inspect the stopwords (articles and the like) that will be removed
#nltk.download('stopwords')
#print('Number of English stopwords: ', len(nltk.corpus.stopwords.words('english')))
#print(nltk.corpus.stopwords.words('english'))

stopwords = nltk.corpus.stopwords.words('english')
newStopWords = ['``','`','’','\'','','%','.','\'\'',',',':',';','(',')','said','reuters']
stopwords.extend(newStopWords)


## Compare every word of every tokenized sentence against the stopword list and drop matches

all_tokens = []        # flat list of all remaining words
all_tokens_list = []   # one list of remaining words per sentence

print('Stopwords removed from the split sentences:')
print()

for sentence in word_tokens:
    filtered_words = []
    for word in sentence:
        # Lowercase the word
        word = word.lower()

        # Keep the word only if it is not a stopword
        if word not in stopwords:
            filtered_words.append(word)
            all_tokens.append(word)

    all_tokens_list.append(filtered_words)

for tokens in all_tokens_list:
    print('Word count:', len(tokens))
    print(tokens, '\n')


## all_tokens_list holds the text split by sentence and word, with stopwords removed

print(all_tokens_list)
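
Because the stopword list is matched by exact string, symbols that are not listed (such as `*`, `$`, `-` or curly quotes) stay in the token lists. As a possible follow-up step, the sketch below drops every token that contains no letter or digit. The helper names are hypothetical, and whether numbers such as `0.35` should also be dropped depends on the analysis.

```python
def has_alnum(token):
    """True if the token contains at least one letter or digit."""
    return any(ch.isalnum() for ch in token)


def drop_symbol_tokens(tokenized_sentences):
    """Remove tokens made up only of punctuation or symbols."""
    return [
        [tok for tok in sentence if has_alnum(tok)]
        for sentence in tokenized_sentences
    ]


# Tokens of the kind that remain after the stopword step.
sample_tokens = [
    ["*", "asian", "stock", "markets", "-", "0.35"],
    ["crude", "$", "61.28", "“", "think", "...", "&", "p", "500"],
]
cleaned = drop_symbol_tokens(sample_tokens)
print(cleaned)
```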

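After that I tried several kinds of text analysis.

 

- Used GloVe to find the words most similar to a given word (here, the words most closely related to korea).

- Ran sentiment analysis with VADER, which reports polarity scores for negative, neutral, and positive sentiment along with a combined compound score.

- Visualized the text as a word cloud using wordcloud.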
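
VADER's `polarity_scores()` returns a dict with the keys `neg`, `neu`, `pos` and `compound`. As a small sketch of how the compound score might be turned into a label, the helper below uses a cut-off of 0.05, which is a commonly used convention rather than a fixed rule. The function name and the example scores are made up for illustration and are not results from the article.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label.

    The default threshold of 0.05 is a convention, not a fixed rule.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical scores in the shape returned by polarity_scores().
example_scores = {"neg": 0.03, "neu": 0.85, "pos": 0.12, "compound": 0.98}
print(label_from_compound(example_scores["compound"]))
```

For long texts such as a full article, the compound score tends to saturate near the extremes, so scoring sentence by sentence and averaging may give a more informative picture.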

 

 

# Build the co-occurrence matrix that GloVe trains on
corpus = Corpus()
corpus.fit(all_tokens_list, window=10)
glove = Glove(no_components=100, learning_rate=0.05)

# Train with 4 threads for 20 epochs
glove.fit(corpus.matrix, epochs=20, no_threads=4, verbose=True)
glove.add_dictionary(corpus.dictionary)


# Words most similar to a given word
model_result1 = glove.most_similar("korea")

print('######## Words closely related to [korea] ########')
print(model_result1)


# Save the word vectors and their labels as TSV files
np.savetxt('./glove-vector.tsv', glove.word_vectors, delimiter='\t')
with open('./glove-metadata.tsv', 'w', encoding='utf-8') as f:
    for key in glove.dictionary.keys():
        f.write(f"{key}\n")


print(all_tokens)

# Token statistics and a frequency plot of the 20 most common tokens
test1_en = nltk.Text(all_tokens)

print(len(test1_en.tokens))        # total number of tokens
print(len(set(test1_en.tokens)))   # number of distinct tokens

test1_en.vocab()

test1_en.plot(20)

# Sentiment analysis with VADER (needs the vader_lexicon resource)
vader = SentimentIntensityAnalyzer()

print(vader.polarity_scores(trade_docs))


### Word cloud visualization

from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Generate a word cloud image
wordcloud = WordCloud().generate(trade_docs)

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

 

The output of the code above is as follows. (This is also split into parts.)

 

By Andrew Galbraith4 Min Read* MSCI Asia ex-Japan +0.35%* China blue-chips jump after PBOC announces RRR cut* Trump says Phase 1 trade deal to be signed Jan. 15.* Asian stock markets: tmsnrt.rs/2zpUAr4SHANGHAI, Jan 2 (Reuters) - Asian shares kicked off the new decade higher on Thursday, after global stocks ended the previous one at record highs, and buoyed by Chinese markets after Beijing eased monetary policy to support slowing growth.Investors also cheered news that the United States and China will sign a trade pact soon after a year of volatile negotiations between the world’s two largest economies.MSCI’s broadest index of Asia-Pacific shares outside Japan was up 0.35% in morning trade after rising 5.6% in December.U.S. President Donald Trump said on Tuesday that Phase 1 of trade deal with China would be signed on Jan. 15 at the White House, though uncertainty surrounds details about the agreement.Rising hopes for a resolution to the U.S.-China trade war helped propel global equities to record highs late last year and depress the value of the U.S. dollar.MSCI’s all-country world index of stock performance in 49 nations touched an all-time high of 567.80 on Dec. 27. It was last quoted at 565.46, off 0.41% from that peak.In China, the blue-chip CSI300 index, one of the world’s best-performing indexes last year, was 1.34% higher in early trade.China’s central bank on Wednesday that it would cut the amount of cash that banks must hold as reserves, releasing around 800 billion yuan in funds effective Jan. 6.“I think the monetary angle in terms of what it means for the companies, is not that important,” said Jim McCafferty, head of Asia ex-Japan equity research at Nomura in Hong Kong.“However for what it means for the consumer point of view, then clearly if there’s easy money and ... 
individuals can borrow cheaply, repay debt quickly, then that of course is going to help the economy and the companies.”McCafferty said he expects a memory up-cycle and new handset development prompted by the rollout of 5G mobile technology could help to lift tech-heavy markets like Korea and Taiwan this year.Australian shares flicked between small gains and losses, and were last up 0.2%. Seoul’s Kospi began the year down 0.85%, while shares in Taiwan added 0.51%.Markets in Japan are closed for a national holiday.The gains in Asia follow a bullish end to the year on Wall Street on Tuesday. The Dow Jones Industrial Average rose 0.27% to 28,538.44 and the S&P 500 gained 0.29% to 3,230.78. The Nasdaq Composite added 0.3% to 8,972.60.In currency markets on Thursday, the dollar continued to weaken slightly against major peers as investors bet on a better outlook for global growth and trade.The dollar was 0.06% weaker against the yen at 108.64 while the euro gained 0.11% to 1.1222.The dollar index, which tracks the greenback against a basket of six rivals, was little changed, rising 0.04% to 96.427.U.S. crude was up 0.36% to $61.28 and global benchmark Brent crude rose to $66.24 per barrel, building on a rise that gave oil its biggest annual gain in three years in 2019.Gold, which has benefited from a weaker greenback, was up 0.18% on the spot market, fetching $1,519.64 per ounce.Reporting by Andrew Galbraith; Editing by Sam HolmesOur Standards: The Thomson Reuters Trust Principles.
Number of sentences:  9

Sentence 1
By Andrew Galbraith4 Min Read* MSCI Asia ex-Japan +0.35%* China blue-chips jump after PBOC announces RRR cut* Trump says Phase 1 trade deal to be signed Jan. 

Sentence 2
15. 

Sentence 3
* Asian stock markets: tmsnrt.rs/2zpUAr4SHANGHAI, Jan 2 (Reuters) - Asian shares kicked off the new decade higher on Thursday, after global stocks ended the previous one at record highs, and buoyed by Chinese markets after Beijing eased monetary policy to support slowing growth.Investors also cheered news that the United States and China will sign a trade pact soon after a year of volatile negotiations between the world’s two largest economies.MSCI’s broadest index of Asia-Pacific shares outside Japan was up 0.35% in morning trade after rising 5.6% in December.U.S. 

Sentence 4
President Donald Trump said on Tuesday that Phase 1 of trade deal with China would be signed on Jan. 15 at the White House, though uncertainty surrounds details about the agreement.Rising hopes for a resolution to the U.S.-China trade war helped propel global equities to record highs late last year and depress the value of the U.S. dollar.MSCI’s all-country world index of stock performance in 49 nations touched an all-time high of 567.80 on Dec. 27. 

Sentence 5
It was last quoted at 565.46, off 0.41% from that peak.In China, the blue-chip CSI300 index, one of the world’s best-performing indexes last year, was 1.34% higher in early trade.China’s central bank on Wednesday that it would cut the amount of cash that banks must hold as reserves, releasing around 800 billion yuan in funds effective Jan. 6.“I think the monetary angle in terms of what it means for the companies, is not that important,” said Jim McCafferty, head of Asia ex-Japan equity research at Nomura in Hong Kong.“However for what it means for the consumer point of view, then clearly if there’s easy money and ... individuals can borrow cheaply, repay debt quickly, then that of course is going to help the economy and the companies.”McCafferty said he expects a memory up-cycle and new handset development prompted by the rollout of 5G mobile technology could help to lift tech-heavy markets like Korea and Taiwan this year.Australian shares flicked between small gains and losses, and were last up 0.2%. 

Sentence 6
Seoul’s Kospi began the year down 0.85%, while shares in Taiwan added 0.51%.Markets in Japan are closed for a national holiday.The gains in Asia follow a bullish end to the year on Wall Street on Tuesday. 

Sentence 7
The Dow Jones Industrial Average rose 0.27% to 28,538.44 and the S&P 500 gained 0.29% to 3,230.78. 

Sentence 8
The Nasdaq Composite added 0.3% to 8,972.60.In currency markets on Thursday, the dollar continued to weaken slightly against major peers as investors bet on a better outlook for global growth and trade.The dollar was 0.06% weaker against the yen at 108.64 while the euro gained 0.11% to 1.1222.The dollar index, which tracks the greenback against a basket of six rivals, was little changed, rising 0.04% to 96.427.U.S. 

Sentence 9
crude was up 0.36% to $61.28 and global benchmark Brent crude rose to $66.24 per barrel, building on a rise that gave oil its biggest annual gain in three years in 2019.Gold, which has benefited from a weaker greenback, was up 0.18% on the spot market, fetching $1,519.64 per ounce.Reporting by Andrew Galbraith; Editing by Sam HolmesOur Standards: The Thomson Reuters Trust Principles. 

Sentences split into words:

Word count: 32
['By', 'Andrew', 'Galbraith4', 'Min', 'Read', '*', 'MSCI', 'Asia', 'ex-Japan', '+0.35', '%', '*', 'China', 'blue-chips', 'jump', 'after', 'PBOC', 'announces', 'RRR', 'cut', '*', 'Trump', 'says', 'Phase', '1', 'trade', 'deal', 'to', 'be', 'signed', 'Jan', '.'] 

Word count: 2
['15', '.'] 

Word count: 101
['*', 'Asian', 'stock', 'markets', ':', 'tmsnrt.rs/2zpUAr4SHANGHAI', ',', 'Jan', '2', '(', 'Reuters', ')', '-', 'Asian', 'shares', 'kicked', 'off', 'the', 'new', 'decade', 'higher', 'on', 'Thursday', ',', 'after', 'global', 'stocks', 'ended', 'the', 'previous', 'one', 'at', 'record', 'highs', ',', 'and', 'buoyed', 'by', 'Chinese', 'markets', 'after', 'Beijing', 'eased', 'monetary', 'policy', 'to', 'support', 'slowing', 'growth.Investors', 'also', 'cheered', 'news', 'that', 'the', 'United', 'States', 'and', 'China', 'will', 'sign', 'a', 'trade', 'pact', 'soon', 'after', 'a', 'year', 'of', 'volatile', 'negotiations', 'between', 'the', 'world', '’', 's', 'two', 'largest', 'economies.MSCI', '’', 's', 'broadest', 'index', 'of', 'Asia-Pacific', 'shares', 'outside', 'Japan', 'was', 'up', '0.35', '%', 'in', 'morning', 'trade', 'after', 'rising', '5.6', '%', 'in', 'December.U.S', '.'] 

Word count: 80
['President', 'Donald', 'Trump', 'said', 'on', 'Tuesday', 'that', 'Phase', '1', 'of', 'trade', 'deal', 'with', 'China', 'would', 'be', 'signed', 'on', 'Jan.', '15', 'at', 'the', 'White', 'House', ',', 'though', 'uncertainty', 'surrounds', 'details', 'about', 'the', 'agreement.Rising', 'hopes', 'for', 'a', 'resolution', 'to', 'the', 'U.S.-China', 'trade', 'war', 'helped', 'propel', 'global', 'equities', 'to', 'record', 'highs', 'late', 'last', 'year', 'and', 'depress', 'the', 'value', 'of', 'the', 'U.S.', 'dollar.MSCI', '’', 's', 'all-country', 'world', 'index', 'of', 'stock', 'performance', 'in', '49', 'nations', 'touched', 'an', 'all-time', 'high', 'of', '567.80', 'on', 'Dec.', '27', '.'] 

Word count: 198
['It', 'was', 'last', 'quoted', 'at', '565.46', ',', 'off', '0.41', '%', 'from', 'that', 'peak.In', 'China', ',', 'the', 'blue-chip', 'CSI300', 'index', ',', 'one', 'of', 'the', 'world', '’', 's', 'best-performing', 'indexes', 'last', 'year', ',', 'was', '1.34', '%', 'higher', 'in', 'early', 'trade.China', '’', 's', 'central', 'bank', 'on', 'Wednesday', 'that', 'it', 'would', 'cut', 'the', 'amount', 'of', 'cash', 'that', 'banks', 'must', 'hold', 'as', 'reserves', ',', 'releasing', 'around', '800', 'billion', 'yuan', 'in', 'funds', 'effective', 'Jan.', '6.', '“', 'I', 'think', 'the', 'monetary', 'angle', 'in', 'terms', 'of', 'what', 'it', 'means', 'for', 'the', 'companies', ',', 'is', 'not', 'that', 'important', ',', '”', 'said', 'Jim', 'McCafferty', ',', 'head', 'of', 'Asia', 'ex-Japan', 'equity', 'research', 'at', 'Nomura', 'in', 'Hong', 'Kong.', '“', 'However', 'for', 'what', 'it', 'means', 'for', 'the', 'consumer', 'point', 'of', 'view', ',', 'then', 'clearly', 'if', 'there', '’', 's', 'easy', 'money', 'and', '...', 'individuals', 'can', 'borrow', 'cheaply', ',', 'repay', 'debt', 'quickly', ',', 'then', 'that', 'of', 'course', 'is', 'going', 'to', 'help', 'the', 'economy', 'and', 'the', 'companies.', '”', 'McCafferty', 'said', 'he', 'expects', 'a', 'memory', 'up-cycle', 'and', 'new', 'handset', 'development', 'prompted', 'by', 'the', 'rollout', 'of', '5G', 'mobile', 'technology', 'could', 'help', 'to', 'lift', 'tech-heavy', 'markets', 'like', 'Korea', 'and', 'Taiwan', 'this', 'year.Australian', 'shares', 'flicked', 'between', 'small', 'gains', 'and', 'losses', ',', 'and', 'were', 'last', 'up', '0.2', '%', '.'] 

Word count: 43
['Seoul', '’', 's', 'Kospi', 'began', 'the', 'year', 'down', '0.85', '%', ',', 'while', 'shares', 'in', 'Taiwan', 'added', '0.51', '%', '.Markets', 'in', 'Japan', 'are', 'closed', 'for', 'a', 'national', 'holiday.The', 'gains', 'in', 'Asia', 'follow', 'a', 'bullish', 'end', 'to', 'the', 'year', 'on', 'Wall', 'Street', 'on', 'Tuesday', '.'] 

Word count: 22
['The', 'Dow', 'Jones', 'Industrial', 'Average', 'rose', '0.27', '%', 'to', '28,538.44', 'and', 'the', 'S', '&', 'P', '500', 'gained', '0.29', '%', 'to', '3,230.78', '.'] 

Word count: 76
['The', 'Nasdaq', 'Composite', 'added', '0.3', '%', 'to', '8,972.60.In', 'currency', 'markets', 'on', 'Thursday', ',', 'the', 'dollar', 'continued', 'to', 'weaken', 'slightly', 'against', 'major', 'peers', 'as', 'investors', 'bet', 'on', 'a', 'better', 'outlook', 'for', 'global', 'growth', 'and', 'trade.The', 'dollar', 'was', '0.06', '%', 'weaker', 'against', 'the', 'yen', 'at', '108.64', 'while', 'the', 'euro', 'gained', '0.11', '%', 'to', '1.1222.The', 'dollar', 'index', ',', 'which', 'tracks', 'the', 'greenback', 'against', 'a', 'basket', 'of', 'six', 'rivals', ',', 'was', 'little', 'changed', ',', 'rising', '0.04', '%', 'to', '96.427.U.S', '.'] 

Word count: 75
['crude', 'was', 'up', '0.36', '%', 'to', '$', '61.28', 'and', 'global', 'benchmark', 'Brent', 'crude', 'rose', 'to', '$', '66.24', 'per', 'barrel', ',', 'building', 'on', 'a', 'rise', 'that', 'gave', 'oil', 'its', 'biggest', 'annual', 'gain', 'in', 'three', 'years', 'in', '2019.Gold', ',', 'which', 'has', 'benefited', 'from', 'a', 'weaker', 'greenback', ',', 'was', 'up', '0.18', '%', 'on', 'the', 'spot', 'market', ',', 'fetching', '$', '1,519.64', 'per', 'ounce.Reporting', 'by', 'Andrew', 'Galbraith', ';', 'Editing', 'by', 'Sam', 'HolmesOur', 'Standards', ':', 'The', 'Thomson', 'Reuters', 'Trust', 'Principles', '.'] 

Stopwords removed from the split sentences:

Word count: 26
['andrew', 'galbraith4', 'min', 'read', '*', 'msci', 'asia', 'ex-japan', '+0.35', '*', 'china', 'blue-chips', 'jump', 'pboc', 'announces', 'rrr', 'cut', '*', 'trump', 'says', 'phase', '1', 'trade', 'deal', 'signed', 'jan'] 

Word count: 1
['15'] 

Word count: 61
['*', 'asian', 'stock', 'markets', 'tmsnrt.rs/2zpuar4shanghai', 'jan', '2', '-', 'asian', 'shares', 'kicked', 'new', 'decade', 'higher', 'thursday', 'global', 'stocks', 'ended', 'previous', 'one', 'record', 'highs', 'buoyed', 'chinese', 'markets', 'beijing', 'eased', 'monetary', 'policy', 'support', 'slowing', 'growth.investors', 'also', 'cheered', 'news', 'united', 'states', 'china', 'sign', 'trade', 'pact', 'soon', 'year', 'volatile', 'negotiations', 'world', 'two', 'largest', 'economies.msci', 'broadest', 'index', 'asia-pacific', 'shares', 'outside', 'japan', '0.35', 'morning', 'trade', 'rising', '5.6', 'december.u.s'] 

Word count: 51
['president', 'donald', 'trump', 'tuesday', 'phase', '1', 'trade', 'deal', 'china', 'would', 'signed', 'jan.', '15', 'white', 'house', 'though', 'uncertainty', 'surrounds', 'details', 'agreement.rising', 'hopes', 'resolution', 'u.s.-china', 'trade', 'war', 'helped', 'propel', 'global', 'equities', 'record', 'highs', 'late', 'last', 'year', 'depress', 'value', 'u.s.', 'dollar.msci', 'all-country', 'world', 'index', 'stock', 'performance', '49', 'nations', 'touched', 'all-time', 'high', '567.80', 'dec.', '27'] 

Word count: 108
['last', 'quoted', '565.46', '0.41', 'peak.in', 'china', 'blue-chip', 'csi300', 'index', 'one', 'world', 'best-performing', 'indexes', 'last', 'year', '1.34', 'higher', 'early', 'trade.china', 'central', 'bank', 'wednesday', 'would', 'cut', 'amount', 'cash', 'banks', 'must', 'hold', 'reserves', 'releasing', 'around', '800', 'billion', 'yuan', 'funds', 'effective', 'jan.', '6.', '“', 'think', 'monetary', 'angle', 'terms', 'means', 'companies', 'important', '”', 'jim', 'mccafferty', 'head', 'asia', 'ex-japan', 'equity', 'research', 'nomura', 'hong', 'kong.', '“', 'however', 'means', 'consumer', 'point', 'view', 'clearly', 'easy', 'money', '...', 'individuals', 'borrow', 'cheaply', 'repay', 'debt', 'quickly', 'course', 'going', 'help', 'economy', 'companies.', '”', 'mccafferty', 'expects', 'memory', 'up-cycle', 'new', 'handset', 'development', 'prompted', 'rollout', '5g', 'mobile', 'technology', 'could', 'help', 'lift', 'tech-heavy', 'markets', 'like', 'korea', 'taiwan', 'year.australian', 'shares', 'flicked', 'small', 'gains', 'losses', 'last', '0.2'] 

Word count: 23
['seoul', 'kospi', 'began', 'year', '0.85', 'shares', 'taiwan', 'added', '0.51', '.markets', 'japan', 'closed', 'national', 'holiday.the', 'gains', 'asia', 'follow', 'bullish', 'end', 'year', 'wall', 'street', 'tuesday'] 

Word count: 13
['dow', 'jones', 'industrial', 'average', 'rose', '0.27', '28,538.44', '&', 'p', '500', 'gained', '0.29', '3,230.78'] 

Word count: 42
['nasdaq', 'composite', 'added', '0.3', '8,972.60.in', 'currency', 'markets', 'thursday', 'dollar', 'continued', 'weaken', 'slightly', 'major', 'peers', 'investors', 'bet', 'better', 'outlook', 'global', 'growth', 'trade.the', 'dollar', '0.06', 'weaker', 'yen', '108.64', 'euro', 'gained', '0.11', '1.1222.the', 'dollar', 'index', 'tracks', 'greenback', 'basket', 'six', 'rivals', 'little', 'changed', 'rising', '0.04', '96.427.u.s'] 

Word count: 43
['crude', '0.36', '$', '61.28', 'global', 'benchmark', 'brent', 'crude', 'rose', '$', '66.24', 'per', 'barrel', 'building', 'rise', 'gave', 'oil', 'biggest', 'annual', 'gain', 'three', 'years', '2019.gold', 'benefited', 'weaker', 'greenback', '0.18', 'spot', 'market', 'fetching', '$', '1,519.64', 'per', 'ounce.reporting', 'andrew', 'galbraith', 'editing', 'sam', 'holmesour', 'standards', 'thomson', 'trust', 'principles'] 

[['andrew', 'galbraith4', 'min', 'read', '*', 'msci', 'asia', 'ex-japan', '+0.35', '*', 'china', 'blue-chips', 'jump', 'pboc', 'announces', 'rrr', 'cut', '*', 'trump', 'says', 'phase', '1', 'trade', 'deal', 'signed', 'jan'], ['15'], ['*', 'asian', 'stock', 'markets', 'tmsnrt.rs/2zpuar4shanghai', 'jan', '2', '-', 'asian', 'shares', 'kicked', 'new', 'decade', 'higher', 'thursday', 'global', 'stocks', 'ended', 'previous', 'one', 'record', 'highs', 'buoyed', 'chinese', 'markets', 'beijing', 'eased', 'monetary', 'policy', 'support', 'slowing', 'growth.investors', 'also', 'cheered', 'news', 'united', 'states', 'china', 'sign', 'trade', 'pact', 'soon', 'year', 'volatile', 'negotiations', 'world', 'two', 'largest', 'economies.msci', 'broadest', 'index', 'asia-pacific', 'shares', 'outside', 'japan', '0.35', 'morning', 'trade', 'rising', '5.6', 'december.u.s'], ['president', 'donald', 'trump', 'tuesday', 'phase', '1', 'trade', 'deal', 'china', 'would', 'signed', 'jan.', '15', 'white', 'house', 'though', 'uncertainty', 'surrounds', 'details', 'agreement.rising', 'hopes', 'resolution', 'u.s.-china', 'trade', 'war', 'helped', 'propel', 'global', 'equities', 'record', 'highs', 'late', 'last', 'year', 'depress', 'value', 'u.s.', 'dollar.msci', 'all-country', 'world', 'index', 'stock', 'performance', '49', 'nations', 'touched', 'all-time', 'high', '567.80', 'dec.', '27'], ['last', 'quoted', '565.46', '0.41', 'peak.in', 'china', 'blue-chip', 'csi300', 'index', 'one', 'world', 'best-performing', 'indexes', 'last', 'year', '1.34', 'higher', 'early', 'trade.china', 'central', 'bank', 'wednesday', 'would', 'cut', 'amount', 'cash', 'banks', 'must', 'hold', 'reserves', 'releasing', 'around', '800', 'billion', 'yuan', 'funds', 'effective', 'jan.', '6.', '“', 'think', 'monetary', 'angle', 'terms', 'means', 'companies', 'important', '”', 'jim', 'mccafferty', 'head', 'asia', 'ex-japan', 'equity', 'research', 'nomura', 'hong', 'kong.', '“', 'however', 'means', 'consumer', 'point', 'view', 
'clearly', 'easy', 'money', '...', 'individuals', 'borrow', 'cheaply', 'repay', 'debt', 'quickly', 'course', 'going', 'help', 'economy', 'companies.', '”', 'mccafferty', 'expects', 'memory', 'up-cycle', 'new', 'handset', 'development', 'prompted', 'rollout', '5g', 'mobile', 'technology', 'could', 'help', 'lift', 'tech-heavy', 'markets', 'like', 'korea', 'taiwan', 'year.australian', 'shares', 'flicked', 'small', 'gains', 'losses', 'last', '0.2'], ['seoul', 'kospi', 'began', 'year', '0.85', 'shares', 'taiwan', 'added', '0.51', '.markets', 'japan', 'closed', 'national', 'holiday.the', 'gains', 'asia', 'follow', 'bullish', 'end', 'year', 'wall', 'street', 'tuesday'], ['dow', 'jones', 'industrial', 'average', 'rose', '0.27', '28,538.44', '&', 'p', '500', 'gained', '0.29', '3,230.78'], ['nasdaq', 'composite', 'added', '0.3', '8,972.60.in', 'currency', 'markets', 'thursday', 'dollar', 'continued', 'weaken', 'slightly', 'major', 'peers', 'investors', 'bet', 'better', 'outlook', 'global', 'growth', 'trade.the', 'dollar', '0.06', 'weaker', 'yen', '108.64', 'euro', 'gained', '0.11', '1.1222.the', 'dollar', 'index', 'tracks', 'greenback', 'basket', 'six', 'rivals', 'little', 'changed', 'rising', '0.04', '96.427.u.s'], ['crude', '0.36', '$', '61.28', 'global', 'benchmark', 'brent', 'crude', 'rose', '$', '66.24', 'per', 'barrel', 'building', 'rise', 'gave', 'oil', 'biggest', 'annual', 'gain', 'three', 'years', '2019.gold', 'benefited', 'weaker', 'greenback', '0.18', 'spot', 'market', 'fetching', '$', '1,519.64', 'per', 'ounce.reporting', 'andrew', 'galbraith', 'editing', 'sam', 'holmesour', 'standards', 'thomson', 'trust', 'principles']]
Performing 20 training epochs with 4 threads
Epoch 0
Epoch 1
Epoch 2
Epoch 3
Epoch 4
Epoch 5
Epoch 6
Epoch 7
Epoch 8
Epoch 9
Epoch 10
Epoch 11
Epoch 12
Epoch 13
Epoch 14
Epoch 15
Epoch 16
Epoch 17
Epoch 18
Epoch 19

 

 

########List of words closely related to [korea]########
[('seoul', 0.32610274393218963), ('andrew', 0.26986231430825436), ('industrial', 0.24519823799252374), ('one', 0.23288489335190174)]
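The scores above look like cosine similarities between word vectors, which is how embedding libraries typically rank "related" words. A minimal sketch of that ranking follows, using made-up three-dimensional vectors; the `vectors` dict and the `most_similar` helper are hypothetical and are not the model trained above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, vectors, number=4):
    """Rank every other word by cosine similarity to `word` (hypothetical helper)."""
    target = vectors[word]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:number]

# Toy vectors invented for illustration only.
vectors = {
    "korea": [0.9, 0.1, 0.3],
    "seoul": [0.8, 0.2, 0.4],
    "dollar": [0.1, 0.9, 0.2],
    "crude": [0.0, 0.7, 0.6],
}

ranked = most_similar("korea", vectors, number=2)
print(ranked)
```

With only one article (368 tokens) as the corpus, low scores and odd neighbours such as 'andrew' or 'one' are to be expected; a larger collection of articles would likely give more meaningful results.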
['andrew', 'galbraith4', 'min', 'read', '*', 'msci', 'asia', 'ex-japan', '+0.35', '*', 'china', 'blue-chips', 'jump', 'pboc', 'announces', 'rrr', 'cut', '*', 'trump', 'says', 'phase', '1', 'trade', 'deal', 'signed', 'jan', '15', '*', 'asian', 'stock', 'markets', 'tmsnrt.rs/2zpuar4shanghai', 'jan', '2', '-', 'asian', 'shares', 'kicked', 'new', 'decade', 'higher', 'thursday', 'global', 'stocks', 'ended', 'previous', 'one', 'record', 'highs', 'buoyed', 'chinese', 'markets', 'beijing', 'eased', 'monetary', 'policy', 'support', 'slowing', 'growth.investors', 'also', 'cheered', 'news', 'united', 'states', 'china', 'sign', 'trade', 'pact', 'soon', 'year', 'volatile', 'negotiations', 'world', 'two', 'largest', 'economies.msci', 'broadest', 'index', 'asia-pacific', 'shares', 'outside', 'japan', '0.35', 'morning', 'trade', 'rising', '5.6', 'december.u.s', 'president', 'donald', 'trump', 'tuesday', 'phase', '1', 'trade', 'deal', 'china', 'would', 'signed', 'jan.', '15', 'white', 'house', 'though', 'uncertainty', 'surrounds', 'details', 'agreement.rising', 'hopes', 'resolution', 'u.s.-china', 'trade', 'war', 'helped', 'propel', 'global', 'equities', 'record', 'highs', 'late', 'last', 'year', 'depress', 'value', 'u.s.', 'dollar.msci', 'all-country', 'world', 'index', 'stock', 'performance', '49', 'nations', 'touched', 'all-time', 'high', '567.80', 'dec.', '27', 'last', 'quoted', '565.46', '0.41', 'peak.in', 'china', 'blue-chip', 'csi300', 'index', 'one', 'world', 'best-performing', 'indexes', 'last', 'year', '1.34', 'higher', 'early', 'trade.china', 'central', 'bank', 'wednesday', 'would', 'cut', 'amount', 'cash', 'banks', 'must', 'hold', 'reserves', 'releasing', 'around', '800', 'billion', 'yuan', 'funds', 'effective', 'jan.', '6.', '“', 'think', 'monetary', 'angle', 'terms', 'means', 'companies', 'important', '”', 'jim', 'mccafferty', 'head', 'asia', 'ex-japan', 'equity', 'research', 'nomura', 'hong', 'kong.', '“', 'however', 'means', 'consumer', 'point', 'view', 'clearly', 
'easy', 'money', '...', 'individuals', 'borrow', 'cheaply', 'repay', 'debt', 'quickly', 'course', 'going', 'help', 'economy', 'companies.', '”', 'mccafferty', 'expects', 'memory', 'up-cycle', 'new', 'handset', 'development', 'prompted', 'rollout', '5g', 'mobile', 'technology', 'could', 'help', 'lift', 'tech-heavy', 'markets', 'like', 'korea', 'taiwan', 'year.australian', 'shares', 'flicked', 'small', 'gains', 'losses', 'last', '0.2', 'seoul', 'kospi', 'began', 'year', '0.85', 'shares', 'taiwan', 'added', '0.51', '.markets', 'japan', 'closed', 'national', 'holiday.the', 'gains', 'asia', 'follow', 'bullish', 'end', 'year', 'wall', 'street', 'tuesday', 'dow', 'jones', 'industrial', 'average', 'rose', '0.27', '28,538.44', '&', 'p', '500', 'gained', '0.29', '3,230.78', 'nasdaq', 'composite', 'added', '0.3', '8,972.60.in', 'currency', 'markets', 'thursday', 'dollar', 'continued', 'weaken', 'slightly', 'major', 'peers', 'investors', 'bet', 'better', 'outlook', 'global', 'growth', 'trade.the', 'dollar', '0.06', 'weaker', 'yen', '108.64', 'euro', 'gained', '0.11', '1.1222.the', 'dollar', 'index', 'tracks', 'greenback', 'basket', 'six', 'rivals', 'little', 'changed', 'rising', '0.04', '96.427.u.s', 'crude', '0.36', '$', '61.28', 'global', 'benchmark', 'brent', 'crude', 'rose', '$', '66.24', 'per', 'barrel', 'building', 'rise', 'gave', 'oil', 'biggest', 'annual', 'gain', 'three', 'years', '2019.gold', 'benefited', 'weaker', 'greenback', '0.18', 'spot', 'market', 'fetching', '$', '1,519.64', 'per', 'ounce.reporting', 'andrew', 'galbraith', 'editing', 'sam', 'holmesour', 'standards', 'thomson', 'trust', 'principles']
368
293
{'neg': 0.056, 'neu': 0.836, 'pos': 0.108, 'compound': 0.9776}
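In the VADER output above, `neg`, `neu`, and `pos` are proportions of the text that sum to 1 (0.056 + 0.836 + 0.108), while `compound` is an overall score squeezed into the range -1 to 1. VADER's reference implementation normalizes the summed word valences as x / sqrt(x² + 15), and its documentation suggests ±0.05 as the usual cut-offs. A minimal sketch under those assumptions follows; the function names `normalize` and `label` are hypothetical and are not part of the NLTK API.

```python
import math

def normalize(valence_sum, alpha=15):
    """Squash an unbounded valence sum into the open range (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)

def label(compound, threshold=0.05):
    """Map a compound score to a coarse tone label (conventional cut-offs)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Scores copied from the output above.
scores = {'neg': 0.056, 'neu': 0.836, 'pos': 0.108, 'compound': 0.9776}

print(label(scores['compound']))  # positive
print(normalize(4.0))             # a moderate valence sum already lands well above 0.05
```

Because the normalization saturates, a long article with many mildly positive words can reach a compound score close to 1 even though only about 11% of the text is rated positive. Scoring sentence by sentence may give a finer picture of the tone.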

News article text analysis
Word count graph from the article analysis
News article word analysis
Word cloud analysis of the article data
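Both the word count graph and the word cloud are built from token frequencies. The two numbers printed earlier, 368 and 293, are consistent with the total token count (the per-sentence counts sum to 368) and, presumably, the number of distinct tokens. A minimal sketch of that counting step follows, using a short hypothetical token sample rather than the full article.

```python
from collections import Counter

# Short hypothetical sample; in practice this would be the flattened token list.
tokens = ['trade', 'china', 'trade', 'dollar', 'year', 'trade', 'china', 'index']

counts = Counter(tokens)      # token -> frequency
total = len(tokens)           # total number of tokens
unique = len(counts)          # number of distinct tokens
top = counts.most_common(2)   # most frequent tokens first

print(total, unique)
print(top)
```

The `counts` mapping can feed a bar chart directly, and the `wordcloud` package accepts the same kind of frequency dict through `WordCloud.generate_from_frequencies`.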

 
