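Motivation: Since we are fairly likely to work in software-related jobs in the future, we take a detailed look at PTT's Soft_Job board.
Goal: Observe whether closely related words can be grouped under the same topic.
Difficulties encountered and solutions: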
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import re
import jieba
import jieba.analyse
import math
from nltk import ngrams, FreqDist
from collections import Counter, namedtuple
import networkx as nx
from sklearn.feature_extraction.text import CountVectorizer,TfidfTransformer
from sklearn.metrics.pairwise import cosine_similarity
# # Set a Chinese font for plots (if it does not display, try the 'Microsoft JhengHei' font)
# # See also: https://pyecontech.com/2020/03/27/python_matplotlib_chinese/
# plt.rcParams['font.sans-serif'] = ['Arial Unicode Ms'] # display Chinese characters in plots
# plt.rcParams['axes.unicode_minus'] = False # display minus signs correctly
# Microsoft JhengHei
# !apt-get -y install fonts-noto-cjk
# plt.rcParams['font.sans-serif'] = ['Noto Sans CJK TC']
# plt.rcParams['axes.unicode_minus'] = False
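The approach from the course assignment could not display Chinese characters, so instead we download a Chinese font file directly from the web and register it with matplotlib.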
import matplotlib
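# Colab font setup
# Quote the URL: unquoted, the shell treats & as a command separator, so wget runs in the background and the URL loses &export=download
!wget -O taipei_sans_tc_beta.ttf "https://drive.google.com/uc?id=1eGAsTN1HBpJAkeVM57_C7ccp7hbgSz3_&export=download"
# Register the downloaded font file with matplotlib
matplotlib.font_manager.fontManager.addfont('taipei_sans_tc_beta.ttf')
# Set font-family to Taipei Sans TC Beta
# After this, all subsequent plots can display Chinese
matplotlib.rc('font', family='Taipei Sans TC Beta')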
--2025-04-17 07:02:48-- https://drive.google.com/uc?id=1eGAsTN1HBpJAkeVM57_C7ccp7hbgSz3_
Resolving drive.google.com (drive.google.com)... 64.233.179.138, 64.233.179.101, 64.233.179.100, ...
Connecting to drive.google.com (drive.google.com)|64.233.179.138|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://drive.usercontent.google.com/download?id=1eGAsTN1HBpJAkeVM57_C7ccp7hbgSz3_ [following]
--2025-04-17 07:02:48-- https://drive.usercontent.google.com/download?id=1eGAsTN1HBpJAkeVM57_C7ccp7hbgSz3_
Resolving drive.usercontent.google.com (drive.usercontent.google.com)... 74.125.69.132, 2607:f8b0:4001:c01::84
Connecting to drive.usercontent.google.com (drive.usercontent.google.com)|74.125.69.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20659344 (20M) [application/octet-stream]
Saving to: ‘taipei_sans_tc_beta.ttf’
taipei_sans_tc_beta 100%[===================>] 19.70M 129MB/s in 0.2s
2025-04-17 07:02:51 (129 MB/s) - ‘taipei_sans_tc_beta.ttf’ saved [20659344/20659344]
# Download the dataset from GitHub
!git clone https://github.com/leo85741/dataset.git
%cd dataset
!unzip softjob_23_25.csv.zip
Cloning into 'dataset'...
remote: Enumerating objects: 25, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 25 (delta 2), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (25/25), 7.19 MiB | 14.56 MiB/s, done.
Resolving deltas: 100% (2/2), done.
/content/dataset
Archive: softjob_23_25.csv.zip
inflating: softjob_23_25.csv
inflating: __MACOSX/._softjob_23_25.csv
# Load the data
df = pd.read_csv('softjob_23_25.csv', encoding = 'UTF-8')
df.head(3)
| | system_id | artUrl | artTitle | artDate | artPoster | artCatagory | artContent | artComment | e_ip | insertedDate | dataSource
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | appleboy46 | Soft_Job | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | [{"cmtStatus": "→", "cmtPoster": "loadingN", "... | 123.110.136.13 | 2023-01-02 02:02:59 | ptt |
1 | 2 | https://www.ptt.cc/bbs/Soft_Job/M.1672559293.A... | [請益]北漂Offer金融vs假外商 | 2023-01-01 15:48:11 | carsun00 | Soft_Job | 背景:\n 私立資工學士\n 軟體經驗5Y,後端為主,可以支援前端/CICD\n\n ... | [{"cmtStatus": "推", "cmtPoster": "xyzb", "cmtC... | 111.252.104.32 | 2023-01-02 02:02:59 | ptt |
2 | 3 | https://www.ptt.cc/bbs/Soft_Job/M.1672571470.A... | [請益]有人的公司也沒有提供API文件的嗎 | 2023-01-01 19:11:08 | cv123741 | Soft_Job | 安安\n\n小弟剛轉前端,進到一家接案公司寫網頁,工作大概9成都在接API,\n但公司內部沒... | [{"cmtStatus": "推", "cmtPoster": "newbout", "c... | 112.78.88.96 | 2023-01-02 02:03:00 | ptt |
MetaData = df.copy()
# Drop columns we do not need
MetaData = MetaData.drop(['artPoster', 'artCatagory', 'artComment', 'e_ip', 'insertedDate', 'dataSource'], axis=1)
# Keep only Chinese characters (CJK range U+4E00-U+9FFF); English terms such as API are removed too (artContent -> sentence)
MetaData['sentence'] = MetaData['artContent'].apply(lambda x: re.sub('[^\u4e00-\u9fff]+', '', x) if isinstance(x, str) else '')
MetaData.head(3)
| | system_id | artUrl | artTitle | artDate | artContent | sentence
---|---|---|---|---|---|---|
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... |
1 | 2 | https://www.ptt.cc/bbs/Soft_Job/M.1672559293.A... | [請益]北漂Offer金融vs假外商 | 2023-01-01 15:48:11 | 背景:\n 私立資工學士\n 軟體經驗5Y,後端為主,可以支援前端/CICD\n\n ... | 背景私立資工學士軟體經驗後端為主可以支援前端是碩士價廢牡蠣假外商單位產險體系資融金融顧問部分... |
2 | 3 | https://www.ptt.cc/bbs/Soft_Job/M.1672571470.A... | [請益]有人的公司也沒有提供API文件的嗎 | 2023-01-01 19:11:08 | 安安\n\n小弟剛轉前端,進到一家接案公司寫網頁,工作大概9成都在接API,\n但公司內部沒... | 安安小弟剛轉前端進到一家接案公司寫網頁工作大概成都在接但公司內部沒有提供規格文件讓我參考導致... |
# Use the Traditional Chinese dictionary
jieba.set_dictionary('./dict/dict.txt.big')
# Load stopwords (a set keeps the membership tests below fast)
with open('./dict/stopwords.txt', encoding="utf-8") as f:
    stopWords = {line.strip() for line in f}
# Tokenization function
def getToken(row):
    seg_list = jieba.lcut(row)
    exclude_words = {'一三五', '一一列舉', '真的', '</s>'}
    seg_list = [
        w for w in seg_list
        if w not in stopWords
        and len(w) > 1                    # drop single-character words
        and not re.match(r'(.)\1+$', w)   # drop words made of one repeated character
        and w not in exclude_words        # custom exclusions
        # and not w.startswith('一')
    ]
    return seg_list
data = MetaData.copy()
# Tokenize, remove stopwords, and explode the word column into one row per word
# data['word'] = data.sentence.apply(getToken).explode('word')
data['word'] = data['sentence'].apply(getToken)
data = data.explode('word')
data.head(3)
| | system_id | artUrl | artTitle | artDate | artContent | sentence | word
---|---|---|---|---|---|---|---|
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 文字 |
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 教學 |
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 教學 |
data
| | system_id | artUrl | artTitle | artDate | artContent | sentence | word
---|---|---|---|---|---|---|---|
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 文字 |
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 教學 |
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 教學 |
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 影片 |
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 範例 |
... | ... | ... | ... | ... | ... | ... | ... |
1546 | 1547 | https://www.ptt.cc/bbs/Soft_Job/M.1743354201.A... | [心得]8年的博弈業工作心得 | 2025-03-31 01:03:18 | 受惠軟體開發板各位大大分享,我也來分享一些在博弈軟體開發工作的心得\n\n網頁好讀版\nht... | 受惠軟體開發板各位大大分享我也來分享一些在博弈軟體開發工作的心得網頁好讀版從事軟體開發也有年... | 猶豫 |
1546 | 1547 | https://www.ptt.cc/bbs/Soft_Job/M.1743354201.A... | [心得]8年的博弈業工作心得 | 2025-03-31 01:03:18 | 受惠軟體開發板各位大大分享,我也來分享一些在博弈軟體開發工作的心得\n\n網頁好讀版\nht... | 受惠軟體開發板各位大大分享我也來分享一些在博弈軟體開發工作的心得網頁好讀版從事軟體開發也有年... | 博弈 |
1546 | 1547 | https://www.ptt.cc/bbs/Soft_Job/M.1743354201.A... | [心得]8年的博弈業工作心得 | 2025-03-31 01:03:18 | 受惠軟體開發板各位大大分享,我也來分享一些在博弈軟體開發工作的心得\n\n網頁好讀版\nht... | 受惠軟體開發板各位大大分享我也來分享一些在博弈軟體開發工作的心得網頁好讀版從事軟體開發也有年... | 軟體 |
1546 | 1547 | https://www.ptt.cc/bbs/Soft_Job/M.1743354201.A... | [心得]8年的博弈業工作心得 | 2025-03-31 01:03:18 | 受惠軟體開發板各位大大分享,我也來分享一些在博弈軟體開發工作的心得\n\n網頁好讀版\nht... | 受惠軟體開發板各位大大分享我也來分享一些在博弈軟體開發工作的心得網頁好讀版從事軟體開發也有年... | 開發 |
1546 | 1547 | https://www.ptt.cc/bbs/Soft_Job/M.1743354201.A... | [心得]8年的博弈業工作心得 | 2025-03-31 01:03:18 | 受惠軟體開發板各位大大分享,我也來分享一些在博弈軟體開發工作的心得\n\n網頁好讀版\nht... | 受惠軟體開發板各位大大分享我也來分享一些在博弈軟體開發工作的心得網頁好讀版從事軟體開發也有年... | 工作 |
168854 rows × 7 columns
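TF-IDF (term frequency-inverse document frequency) is a statistical measure of how important a word is to one document within a collection of documents: it grows with the word's count in that document and shrinks as more documents contain the word.
We use scikit-learn's CountVectorizer (term counts) and TfidfTransformer (TF-IDF) classes.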
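As a sketch of what `TfidfTransformer` computes with its default settings (`smooth_idf=True`, `norm='l2'`), the snippet below recomputes TF-IDF by hand with NumPy: idf(t) = ln((1 + n) / (1 + df(t))) + 1, each raw count is multiplied by its term's idf, and each document row is then scaled to unit length. The small count matrix and the helper name `tfidf_like_sklearn` are hypothetical and not taken from the Soft_Job data.

```python
import numpy as np

def tfidf_like_sklearn(counts: np.ndarray) -> np.ndarray:
    """Recompute TF-IDF the way TfidfTransformer does by default
    (smooth_idf=True, norm='l2'); an illustrative sketch only."""
    n_docs = counts.shape[0]
    # document frequency: number of documents in which each term occurs
    df = np.count_nonzero(counts, axis=0)
    # smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = np.log((1 + n_docs) / (1 + df)) + 1
    weighted = counts * idf
    # scale each document (row) to unit Euclidean length
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return weighted / norms

# toy DTM: 3 documents x 3 terms (hypothetical counts)
counts = np.array([[2, 1, 0],
                   [0, 1, 1],
                   [1, 0, 0]], dtype=float)
print(np.round(tfidf_like_sklearn(counts), 4))
```

Running `TfidfTransformer().fit_transform(counts)` on the same matrix should give the same values. Because every row ends up with length 1, averaging columns (as `TFIDF_df.mean()` does below) compares words on a common per-document scale.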
DTM
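A document-term matrix (DTM) is a matrix used in natural language processing to record how often each term occurs in each document of a collection. Each row represents a document, each column represents a term, and each cell holds the number of times that term appears in that document.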
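To make the row/column layout concrete, here is a minimal pure-Python sketch that builds a DTM from documents that are already segmented and joined with spaces (the same input format as the `word` column used below). The two example documents and the helper `build_dtm` are hypothetical. Like `CountVectorizer`, it sorts the vocabulary; unlike `CountVectorizer`, whose default token pattern ignores single-character tokens, it simply splits on whitespace.

```python
from typing import List, Tuple

def build_dtm(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return (vocabulary, matrix) where matrix[i][j] is how many times
    vocabulary[j] occurs in docs[i]. Tokens are separated by whitespace."""
    tokenized = [doc.split() for doc in docs]
    # sorted vocabulary, mirroring CountVectorizer's column order
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    col = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocabulary)   # one row per document
        for tok in toks:
            row[col[tok]] += 1        # one column per term
        matrix.append(row)
    return vocabulary, matrix

# two hypothetical, already-segmented posts
docs = ["公司 面試 工作 面試", "工程師 工作 公司"]
vocab, dtm = build_dtm(docs)
print(vocab)
for row in dtm:
    print(row)
```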
softjob_df = MetaData.copy()
softjob_df.head(3)
| | system_id | artUrl | artTitle | artDate | artContent | sentence
---|---|---|---|---|---|---|
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... |
1 | 2 | https://www.ptt.cc/bbs/Soft_Job/M.1672559293.A... | [請益]北漂Offer金融vs假外商 | 2023-01-01 15:48:11 | 背景:\n 私立資工學士\n 軟體經驗5Y,後端為主,可以支援前端/CICD\n\n ... | 背景私立資工學士軟體經驗後端為主可以支援前端是碩士價廢牡蠣假外商單位產險體系資融金融顧問部分... |
2 | 3 | https://www.ptt.cc/bbs/Soft_Job/M.1672571470.A... | [請益]有人的公司也沒有提供API文件的嗎 | 2023-01-01 19:11:08 | 安安\n\n小弟剛轉前端,進到一家接案公司寫網頁,工作大概9成都在接API,\n但公司內部沒... | 安安小弟剛轉前端進到一家接案公司寫網頁工作大概成都在接但公司內部沒有提供規格文件讓我參考導致... |
# Keep only the columns we need
softjob_df = softjob_df.loc[:,["system_id", "sentence"]]
# Join the tokens with spaces so CountVectorizer can split them back into words
softjob_df['word'] = softjob_df.sentence.apply(getToken).map(' '.join)
softjob_df.head()
| | system_id | sentence | word
---|---|---|---|
0 | 1 | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 文字 教學 教學 影片 範例 程式 系統 架構圖 本篇 取消 執行 工作 系統 內有 資源 ... |
1 | 2 | 背景私立資工學士軟體經驗後端為主可以支援前端是碩士價廢牡蠣假外商單位產險體系資融金融顧問部分... | 背景 私立 資工 學士 軟體 經驗 支援 前端 碩士 價廢 牡蠣 外商 單位 產險 體系 資... |
2 | 3 | 安安小弟剛轉前端進到一家接案公司寫網頁工作大概成都在接但公司內部沒有提供規格文件讓我參考導致... | 小弟 剛轉 前端 一家 接案 公司 網頁 工作 成都 公司 內部 提供 規格 文件 參考 導... |
3 | 4 | 幾種可能做法寫在假如有用開給後端做請後端完成後將他測試時的貼上去這對新應該沒什麼問題舊有的就... | 幾種 做法 有用 開給後端 請後端 將他 測試 貼上去 這對 沒什麼 舊有 記在 帳號 下次... |
4 | 5 | 從網路上的資訊得知如果是從事韌體開發則用的都是如果是從事開發則用的都是之類的語言那麼的發展空... | 網路上 資訊 得知 韌體 開發 則用 開發 則用 語言 發展 空間 領域 有人 |
# Bag of words: count how often each word appears in each document
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(softjob_df["word"])
vocabulary = vectorizer.get_feature_names_out()
# Convert to a DataFrame (rows: documents, columns: words)
DTM_df = pd.DataFrame(columns = vocabulary, data = X.toarray())
DTM_df
| | 一下下 | 一下全部 | 一下子 | 一不小心 | 一不鳥 | 一並 | 一串 | 一事無成 | 一二 | 一二一 | ... | 齊聚一堂 | 龍年 | 龍心 | 龍滑 | 龍潭 | 龍頭 | 龐大 | 龜大 | 龜山 | 龜毛
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1542 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1543 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1544 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1545 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1546 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1547 rows × 24580 columns
transformer = TfidfTransformer()
# Convert the count matrix X into TF-IDF values
tfidf = transformer.fit_transform(X)
# Convert to a DataFrame
TFIDF_df = pd.DataFrame(columns = vocabulary, data = tfidf.toarray())
TFIDF_df
| | 一下下 | 一下全部 | 一下子 | 一不小心 | 一不鳥 | 一並 | 一串 | 一事無成 | 一二 | 一二一 | ... | 齊聚一堂 | 龍年 | 龍心 | 龍滑 | 龍潭 | 龍頭 | 龐大 | 龜大 | 龜山 | 龜毛
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1542 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1543 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1544 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1545 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1546 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1547 rows × 24580 columns
softjob_tfidf = TFIDF_df.mean().to_frame().reset_index()  # average TF-IDF of each word over all documents
softjob_tfidf.columns = ["word", "avg"]
softjob_tfidf.sort_values('avg', ascending = False).head(10)
| | word | avg
---|---|---|
3484 | 公司 | 0.041759 |
8525 | 工作 | 0.030058 |
23816 | 面試 | 0.025904 |
23082 | 開發 | 0.020176 |
8582 | 工程師 | 0.018448 |
10634 | 技術 | 0.018176 |
16857 | 程式 | 0.016935 |
12423 | 時間 | 0.016600 |
17469 | 系統 | 0.015202 |
21324 | 軟體 | 0.015185 |
toptens = TFIDF_df.copy()
toptens.insert(0, 'doc_id', toptens.index+1)
toptens
| | doc_id | 一下下 | 一下全部 | 一下子 | 一不小心 | 一不鳥 | 一並 | 一串 | 一事無成 | 一二 | ... | 齊聚一堂 | 龍年 | 龍心 | 龍滑 | 龍潭 | 龍頭 | 龐大 | 龜大 | 龜山 | 龜毛
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1542 | 1543 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1543 | 1544 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1544 | 1545 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1545 | 1546 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1546 | 1547 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1547 rows × 24581 columns
toptens = toptens.melt(id_vars = "doc_id", var_name = "word", value_name = 'tfidf')
toptens
| | doc_id | word | tfidf
---|---|---|---|
0 | 1 | 一下下 | 0.0 |
1 | 2 | 一下下 | 0.0 |
2 | 3 | 一下下 | 0.0 |
3 | 4 | 一下下 | 0.0 |
4 | 5 | 一下下 | 0.0 |
... | ... | ... | ... |
38025255 | 1543 | 龜毛 | 0.0 |
38025256 | 1544 | 龜毛 | 0.0 |
38025257 | 1545 | 龜毛 | 0.0 |
38025258 | 1546 | 龜毛 | 0.0 |
38025259 | 1547 | 龜毛 | 0.0 |
38025260 rows × 3 columns
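(
    # Pick the 10 words with the highest TF-IDF in each document
    # (selecting the columns after groupby avoids pandas' DeprecationWarning)
    toptens.groupby("doc_id")[["word", "tfidf"]].apply(lambda x: x.nlargest(10, "tfidf")).reset_index(drop=True)
    # Count how many documents selected each word
    .groupby(['word'], as_index=False).size()
).sort_values('size', ascending=False).head(10)  # sort and show the top 10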
| | word | size
---|---|---|
1334 | 公司 | 111 |
8086 | 面試 | 76 |
0 | 一下下 | 53 |
7891 | 開發 | 53 |
3160 | 工作 | 48 |
6156 | 系統 | 45 |
1 | 一下全部 | 43 |
3924 | 技術 | 42 |
3193 | 工程師 | 35 |
2023 | 台灣 | 34 |
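The results above show that the discussion centers on software engineers and jobs, with words such as 面試 (interview), 工作 (work), 程式 (programming), 系統 (system), and 開發 (development). The words 一下下 and 一下全部 in the second table are not real keywords: documents with fewer than ten non-zero TF-IDF words are padded with zero-score words, and nlargest breaks those ties by taking the first columns, i.e. the first words in the vocabulary's sort order. Keeping only rows with tfidf > 0 before picking the top ten would remove them.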
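A plain-Python sketch of the same "top ten per document, then count" step, with zero scores dropped before ranking so that short documents cannot contribute padding words. The helper `top_k_counts` and the sample scores are hypothetical; in the notebook the scores would come from `TFIDF_df`.

```python
import heapq
from collections import Counter
from typing import Dict

def top_k_counts(doc_scores: Dict[int, Dict[str, float]], k: int = 10) -> Counter:
    """For each document keep its k highest-scoring words (ignoring zero
    scores) and count how many documents selected each word."""
    counts: Counter = Counter()
    for scores in doc_scores.values():
        nonzero = {w: s for w, s in scores.items() if s > 0}
        top = heapq.nlargest(k, nonzero, key=nonzero.get)
        counts.update(top)
    return counts

# hypothetical TF-IDF scores for three documents
doc_scores = {
    1: {"公司": 0.5, "面試": 0.3, "一下下": 0.0},
    2: {"公司": 0.2, "工作": 0.6, "一下下": 0.0},
    3: {"面試": 0.9, "一下下": 0.0},
}
print(top_k_counts(doc_scores, k=2).most_common())
```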
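An N-gram is a sequence of n consecutive words in a text.
N-grams show which words frequently appear together, which helps us decide whether those combinations should be added to a custom dictionary.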
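The n-gram step can be sketched without NLTK: zipping a token list with shifted copies of itself yields the consecutive n-tuples that `nltk.ngrams` produces for a list input (without padding). The token lists and the helper `ngram_strings` are hypothetical; joining each tuple with a space matches the output format of `ngram_getToken` below.

```python
from collections import Counter
from typing import List

def ngram_strings(tokens: List[str], n: int) -> List[str]:
    """Consecutive n-grams of `tokens`, each joined with a single space."""
    # zip(tokens[0:], tokens[1:], ..., tokens[n-1:]) stops at the shortest slice,
    # so a list with fewer than n tokens yields no n-grams at all
    return [" ".join(gram) for gram in zip(*(tokens[i:] for i in range(n)))]

# two hypothetical, already-filtered token lists
docs = [
    ["員工", "自備", "工具", "薪資", "保證", "最低年薪"],
    ["員工", "自備", "工具"],
]
bigram_counts = Counter(g for toks in docs for g in ngram_strings(toks, 2))
print(ngram_strings(docs[0], 3))
print(bigram_counts.most_common(2))
```

A document with fewer than n tokens yields an empty list, which `DataFrame.explode` turns into a NaN row.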
# N-gram tokenization function
def ngram_getToken(row, n):
    # Tokenize
    seg_list = jieba.lcut(row)
    # Remove stopwords and words shorter than two characters
    seg_list = [w for w in seg_list if w not in stopWords and len(w) > 1]
    # Build n-grams and join the words of each n-gram with a space
    seg_list = [" ".join(w) for w in ngrams(seg_list, n)]
    return seg_list
softjob_bigram = MetaData.copy()
softjob_bigram["word"] = softjob_bigram['sentence'].apply(lambda row: ngram_getToken(row, 2))
softjob_bigram = softjob_bigram.explode('word')
softjob_bigram.head(3)
| | system_id | artUrl | artTitle | artDate | artContent | sentence | word
---|---|---|---|---|---|---|---|
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 文字 教學 |
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 教學 教學 |
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 教學 影片 |
# Count how often each combination occurs
softjob_bigram_count = softjob_bigram['word'].value_counts().reset_index()
softjob_bigram_count.sort_values('count', ascending=False).head(10)
| | word | count
---|---|---|
0 | 軟體 開發 | 83 |
1 | 自備 工具 | 82 |
2 | 員工 自備 | 81 |
3 | 加班費 制度 | 79 |
5 | 保證 最低年薪 | 72 |
4 | 能力 經歷 | 72 |
7 | 徵才 聯絡方式 | 71 |
6 | 薪資 保證 | 71 |
9 | 最低年薪 必填 | 68 |
8 | 必填 項目 | 68 |
softjob_bigram_count.sort_values('count', ascending=False).head(60)  # show more rows to help build the lexicon
| | word | count
---|---|---|
0 | 軟體 開發 | 83 |
1 | 自備 工具 | 82 |
2 | 員工 自備 | 81 |
3 | 加班費 制度 | 79 |
5 | 保證 最低年薪 | 72 |
4 | 能力 經歷 | 72 |
7 | 徵才 聯絡方式 | 71 |
6 | 薪資 保證 | 71 |
9 | 最低年薪 必填 | 68 |
8 | 必填 項目 | 68 |
10 | 每日 工作時間 | 66 |
11 | 職缺 能力 | 64 |
12 | 工作 福利 | 62 |
14 | 接案 公司 | 60 |
13 | 計算 方式 | 60 |
17 | 詳細 至號 | 59 |
15 | 公司名稱 統編 | 59 |
16 | 填寫 詳細 | 59 |
18 | 公司 介紹 | 58 |
19 | 薪資 月薪 | 57 |
20 | 每周 工作時間 | 57 |
21 | 註冊 可免 | 56 |
22 | 中華民國 註冊 | 56 |
23 | 統編 中華民國 | 56 |
24 | 年終獎金 計算 | 56 |
25 | 人資 徵才 | 55 |
30 | 工作環境 職缺 | 54 |
29 | 工具 薪資 | 54 |
28 | 職缺 團隊 | 54 |
27 | 團隊 介紹 | 54 |
26 | 公司地址 填寫 | 54 |
31 | 公司 分紅 | 50 |
32 | 資深 工程師 | 50 |
33 | 超過 小時 | 49 |
34 | 分紅 獎金 | 49 |
36 | 公司 公司 | 46 |
35 | 項目 年終獎金 | 46 |
37 | 面試 過程 | 44 |
38 | 公司 面試 | 43 |
39 | 面試 流程 | 41 |
40 | 面試 面試 | 41 |
41 | 第一份 工作 | 38 |
42 | 前端 工程師 | 38 |
43 | 新創 公司 | 37 |
44 | 相關 經驗 | 37 |
45 | 全薪 計算 | 37 |
46 | 邀約 面試 | 36 |
47 | 工時 每日 | 36 |
48 | 系統 設計 | 35 |
49 | 開發 工程師 | 34 |
50 | 中午 休息 | 34 |
51 | 這件 事情 | 34 |
53 | 科技 公司 | 33 |
54 | 系統 開發 | 33 |
52 | 程式 設計師 | 33 |
55 | 工作時間 加班費 | 32 |
56 | 技術 面試 | 32 |
57 | 開發 流程 | 32 |
58 | 公司 工作 | 32 |
59 | 內部 系統 | 31 |
softjob_trigram = MetaData.copy()
softjob_trigram["word"] = softjob_trigram.sentence.apply(lambda row: ngram_getToken(row, 3))
softjob_trigram = softjob_trigram.explode('word')
softjob_trigram.head(3)
| | system_id | artUrl | artTitle | artDate | artContent | sentence | word
---|---|---|---|---|---|---|---|
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 文字 教學 教學 |
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 教學 教學 影片 |
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 教學 影片 範例 |
# Count how often each combination occurs
softjob_trigram_count = softjob_trigram['word'].value_counts().reset_index()
softjob_trigram_count.sort_values('count', ascending=False).head(10)
| | word | count
---|---|---|
0 | 員工 自備 工具 | 81 |
1 | 薪資 保證 最低年薪 | 71 |
2 | 最低年薪 必填 項目 | 68 |
3 | 保證 最低年薪 必填 | 68 |
4 | 職缺 能力 經歷 | 64 |
5 | 填寫 詳細 至號 | 59 |
6 | 統編 中華民國 註冊 | 56 |
7 | 中華民國 註冊 可免 | 56 |
8 | 年終獎金 計算 方式 | 56 |
9 | 公司名稱 統編 中華民國 | 56 |
softjob_trigram_count.sort_values('count', ascending=False).head(40)  # show more rows to help build the lexicon
| | word | count
---|---|---|
0 | 員工 自備 工具 | 81 |
1 | 薪資 保證 最低年薪 | 71 |
2 | 最低年薪 必填 項目 | 68 |
3 | 保證 最低年薪 必填 | 68 |
4 | 職缺 能力 經歷 | 64 |
5 | 填寫 詳細 至號 | 59 |
6 | 統編 中華民國 註冊 | 56 |
7 | 中華民國 註冊 可免 | 56 |
8 | 年終獎金 計算 方式 | 56 |
9 | 公司名稱 統編 中華民國 | 56 |
10 | 人資 徵才 聯絡方式 | 55 |
11 | 公司地址 填寫 詳細 | 54 |
12 | 工作環境 職缺 團隊 | 54 |
13 | 自備 工具 薪資 | 54 |
14 | 職缺 團隊 介紹 | 54 |
15 | 公司 分紅 獎金 | 49 |
16 | 必填 項目 年終獎金 | 46 |
17 | 項目 年終獎金 計算 | 45 |
18 | 工時 每日 工作時間 | 35 |
19 | 工作時間 加班費 制度 | 31 |
20 | 工具 薪資 月薪 | 29 |
21 | 加班費 制度 勞基法 | 28 |
22 | 每周 工作時間 加班費 | 27 |
24 | 中午 休息 每周 | 27 |
23 | 休息 每周 工作時間 | 27 |
28 | 平日 小時 工資額 | 24 |
30 | 小時 以內 平日 | 24 |
29 | 以內 平日 小時 | 24 |
27 | 小時 工資額 加給 | 24 |
26 | 工作時間 小時 以內 | 24 |
25 | 延長 工作時間 小時 | 24 |
31 | 工時 加班費 必填 | 23 |
32 | 八小時 中午 休息 | 23 |
33 | 加班費 必填 不填 | 23 |
34 | 必填 不填 刪文 | 23 |
35 | 每日 工作時間 八小時 | 23 |
36 | 不填 刪文 水桶 | 23 |
37 | 刪文 水桶 工時 | 22 |
38 | 工作時間 八小時 中午 | 22 |
39 | 計算 方式 全薪 | 22 |
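From the bigram and trigram results above, we found that some words frequently appear together and could be treated as single terms, such as "軟體 開發" and "保證 最低年薪", so we add a custom dictionary to make word segmentation more accurate. Many of the most frequent trigrams (for example "最低年薪 必填 項目" and "不填 刪文 水桶") appear to come from the board's job-posting template rather than from free discussion.
We saved the terms in softjob_lexicon.txt in the dict folder.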
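One way to turn frequent n-grams into dictionary entries is sketched below. It assumes the jieba user-dictionary format of one entry per line (the word, optionally followed by a frequency and a part-of-speech tag, separated by spaces). The threshold, the output filename `lexicon_candidates.txt`, and the helper `lexicon_candidates` are hypothetical, and the candidates still need a manual review, since not every frequent n-gram (for example "公司 公司") is a real term.

```python
from pathlib import Path
from typing import Dict, List

def lexicon_candidates(ngram_counts: Dict[str, int], min_count: int) -> List[str]:
    """Merge each space-separated n-gram seen at least `min_count` times
    into a single candidate word, most frequent first."""
    frequent = sorted(
        (item for item in ngram_counts.items() if item[1] >= min_count),
        key=lambda item: item[1],
        reverse=True,
    )
    return [ngram.replace(" ", "") for ngram, _ in frequent]

# a few bigram/trigram counts copied from the tables above
ngram_counts = {"軟體 開發": 83, "員工 自備 工具": 81, "保證 最低年薪": 72, "面試 流程": 41}
words = lexicon_candidates(ngram_counts, min_count=70)
# one word per line: the simplest form of a jieba user-dictionary entry
Path("lexicon_candidates.txt").write_text("\n".join(words) + "\n", encoding="utf-8")
print(words)
```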
# Load a custom dictionary of software-job terms
jieba.load_userdict('./dict/softjob_lexicon.txt')
Tokenize and count again using the custom dictionary
# The earlier tokenization did not use the new dictionary, so we tokenize again
data2 = MetaData.copy()
data2['word'] = data2.sentence.apply(getToken)
data2 = data2.explode('word')
data2.head(3)
| | system_id | artUrl | artTitle | artDate | artContent | sentence | word
---|---|---|---|---|---|---|---|
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 文字 |
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 教學 |
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 教學 |
Recompute the bigrams with the updated dictionary
bigramfdist = MetaData.copy()
bigramfdist["word"] = bigramfdist['sentence'].apply(lambda row: ngram_getToken(row, 2))
bigramfdist = bigramfdist.explode('word')
bigramfdist.head(3)
| | system_id | artUrl | artTitle | artDate | artContent | sentence | word
---|---|---|---|---|---|---|---|
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 文字 教學 |
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 教學 教學 |
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 教學 影片 |
# Drop documents that produced no bigrams (explode leaves NaN for them)
bigramfdist = bigramfdist.dropna(subset=['word'])
# Use FreqDist to count each bigram (as a tuple of two words)
bigramfdist = FreqDist(bigramfdist['word'].apply(lambda x: tuple(x.split(' '))))
bigramfdist.most_common(5)
[(('軟體', '開發'), 83), (('自備', '工具'), 82), (('員工', '自備'), 81), (('加班費', '制度'), 79), (('能力', '經歷'), 72)]
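We visualize the 50 most frequent bigrams after re-tokenization to see the key word pairs in the posts. Note that the top five bigrams and their counts are identical to those before the custom dictionary was loaded, so the dictionary does not appear to affect these pairs.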
# Build a dictionary of bigram -> count
# Keep only the 50 most frequent bigrams
d = {k: v for k, v in bigramfdist.most_common(50)}
# Create network plot
G = nx.Graph()
# Add an edge between the two words of each bigram
for k, v in d.items():
    G.add_edge(k[0], k[1], weight=v)  # nodes: words, weight: bigram frequency
# Scale edge weights down to usable line widths
weights = [w[2]['weight']*0.01 for w in G.edges(data=True)]
fig, ax = plt.subplots(figsize=(12, 10))
pos = nx.spring_layout(G, k=1.5)
# networks
nx.draw_networkx(G, pos,
                 font_size=16,
                 width=weights,
                 edge_color='grey',
                 node_color='purple',
                 with_labels = False,
                 ax=ax)
# Add labels, offset slightly from each node
for key, value in pos.items():
    x, y = value[0]+.07, value[1]+.045
    ax.text(x, y,
            s=key,
            bbox=dict(facecolor='red', alpha=0.25),
            horizontalalignment='center', fontsize=13)
plt.show()
Compute the correlation between pairs of words (Pearson correlation)
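`np.corrcoef(DTM_df.T)` below computes the Pearson correlation between every pair of word columns, treating each document as one observation. For two count vectors x and y, r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² · Σ(y − ȳ)²). The sketch computes this directly for two hypothetical word-count columns so the formula can be checked against `np.corrcoef`; the counts and the helper `pearson` are made up for illustration.

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation of two equal-length vectors."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# hypothetical counts of two words across five documents
work_hours = np.array([2, 0, 1, 0, 3], dtype=float)  # e.g. "工作時間"
daily = np.array([1, 0, 1, 0, 2], dtype=float)       # e.g. "每日"
r = pearson(work_hours, daily)
print(round(r, 4), round(float(np.corrcoef(work_hours, daily)[0, 1]), 4))
```

If a column is constant (for example all zeros), the denominator is zero and the correlation is undefined (`np.corrcoef` returns NaN for it); the `min_df=5` filter below keeps every retained word present in at least five documents.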
data_cor = MetaData.copy()
# Join the tokens with spaces again so CountVectorizer can split them
data_cor['word'] = data_cor.sentence.apply(getToken).map(' '.join)
data_cor.head(3)
| | system_id | artUrl | artTitle | artDate | artContent | sentence | word
---|---|---|---|---|---|---|---|
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 文字 教學 教學 影片 範例 程式 系統 架構圖 本篇 取消 執行 工作 系統 內有 資源 ... |
1 | 2 | https://www.ptt.cc/bbs/Soft_Job/M.1672559293.A... | [請益]北漂Offer金融vs假外商 | 2023-01-01 15:48:11 | 背景:\n 私立資工學士\n 軟體經驗5Y,後端為主,可以支援前端/CICD\n\n ... | 背景私立資工學士軟體經驗後端為主可以支援前端是碩士價廢牡蠣假外商單位產險體系資融金融顧問部分... | 背景 私立 資工 學士 軟體 經驗 支援 前端 碩士 價廢 牡蠣 外商 單位 產險 體系 資... |
2 | 3 | https://www.ptt.cc/bbs/Soft_Job/M.1672571470.A... | [請益]有人的公司也沒有提供API文件的嗎 | 2023-01-01 19:11:08 | 安安\n\n小弟剛轉前端,進到一家接案公司寫網頁,工作大概9成都在接API,\n但公司內部沒... | 安安小弟剛轉前端進到一家接案公司寫網頁工作大概成都在接但公司內部沒有提供規格文件讓我參考導致... | 小弟 剛轉 前端 一家 接案 公司 網頁 工作 成都 公司 內部 提供 規格 文件 參考 導... |
# Bag of words
# Keep words that appear in at least 5 documents, limited to the 300 most frequent words
vectorizer = CountVectorizer(min_df = 5, max_features = 300)
X = vectorizer.fit_transform(data_cor["word"])
vocabulary = vectorizer.get_feature_names_out()
# Convert to a DataFrame
DTM_df = pd.DataFrame(columns = vocabulary, data = X.toarray())
DTM_df
| | 一位 | 一個月 | 一堆 | 一家 | 一年 | 三個 | 上班 | 上課 | 下班 | 不好 | ... | 雲端 | 電腦 | 需求 | 面試 | 面試官 | 項目 | 領域 | 題目 | 類似 | 體驗
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1542 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1543 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
1544 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1545 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1546 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1547 rows × 300 columns
# Pearson correlation between words (each row of DTM_df.T, i.e. each word, is one variable)
corr_matrix = np.corrcoef(DTM_df.T)
# Convert to a DataFrame
Cor_df = pd.DataFrame(corr_matrix, index = DTM_df.columns, columns = DTM_df.columns)
Cor_df.insert(0, 'word1', Cor_df.columns)
Cor_df.reset_index(inplace = True, drop = True)
Cor_df
| | word1 | 一位 | 一個月 | 一堆 | 一家 | 一年 | 三個 | 上班 | 上課 | 下班 | ... | 雲端 | 電腦 | 需求 | 面試 | 面試官 | 項目 | 領域 | 題目 | 類似 | 體驗
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 一位 | 1.000000 | 0.159846 | -1.765706e-03 | 0.104942 | 0.033851 | 0.099022 | 0.032960 | -0.014102 | -0.008807 | ... | 0.065139 | 0.051655 | 0.199537 | 0.415864 | 0.334704 | -0.012056 | 0.028774 | 0.270608 | 0.177241 | 6.448712e-02 |
1 | 一個月 | 0.159846 | 1.000000 | 3.571678e-02 | 0.127704 | 0.145364 | 0.224112 | 0.049390 | 0.011228 | 0.071302 | ... | -0.025550 | 0.079238 | 0.111684 | 0.171787 | 0.112428 | 0.029049 | 0.013733 | 0.185146 | 0.127701 | 9.947623e-02 |
2 | 一堆 | -0.001766 | 0.035717 | 1.000000e+00 | 0.037403 | 0.026721 | 0.028623 | -0.019040 | 0.038307 | 0.067070 | ... | -0.033117 | 0.004233 | 0.074356 | 0.009608 | 0.003247 | -0.062307 | 0.037858 | 0.018902 | 0.160758 | -1.053095e-16 |
3 | 一家 | 0.104942 | 0.127704 | 3.740258e-02 | 1.000000 | 0.093270 | 0.121934 | 0.033317 | -0.013980 | -0.005988 | ... | 0.021367 | 0.026858 | 0.067133 | 0.153141 | 0.071573 | 0.001990 | 0.068545 | 0.086312 | 0.083040 | 5.701817e-02 |
4 | 一年 | 0.033851 | 0.145364 | 2.672114e-02 | 0.093270 | 1.000000 | 0.252865 | 0.113868 | 0.026349 | 0.157107 | ... | 0.053547 | 0.035168 | 0.061017 | 0.123327 | 0.043871 | 0.041406 | 0.057836 | 0.078071 | 0.168020 | 2.412222e-01 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
295 | 項目 | -0.012056 | 0.029049 | -6.230745e-02 | 0.001990 | 0.041406 | 0.004227 | 0.092823 | -0.014001 | 0.000815 | ... | 0.032305 | 0.043758 | 0.087745 | 0.005791 | -0.019543 | 1.000000 | 0.060876 | 0.041924 | 0.030155 | 1.582617e-02 |
296 | 領域 | 0.028774 | 0.013733 | 3.785825e-02 | 0.068545 | 0.057836 | 0.086005 | 0.038893 | 0.008542 | 0.054025 | ... | 0.095111 | 0.028211 | 0.064299 | 0.083822 | 0.068510 | 0.060876 | 1.000000 | 0.046278 | 0.159528 | 8.207269e-02 |
297 | 題目 | 0.270608 | 0.185146 | 1.890225e-02 | 0.086312 | 0.078071 | 0.153779 | -0.004477 | 0.013957 | 0.007473 | ... | 0.039196 | 0.075840 | 0.209068 | 0.478958 | 0.447179 | 0.041924 | 0.046278 | 1.000000 | 0.227189 | 1.148807e-01 |
298 | 類似 | 0.177241 | 0.127701 | 1.607578e-01 | 0.083040 | 0.168020 | 0.140712 | 0.038829 | 0.004475 | 0.070476 | ... | 0.036193 | 0.058158 | 0.116715 | 0.126621 | 0.067306 | 0.030155 | 0.159528 | 0.227189 | 1.000000 | 7.717720e-03 |
299 | 體驗 | 0.064487 | 0.099476 | -1.053095e-16 | 0.057018 | 0.241222 | 0.145803 | 0.017102 | -0.012292 | 0.045976 | ... | 0.090822 | 0.044098 | 0.104110 | 0.283397 | 0.209098 | 0.015826 | 0.082073 | 0.114881 | 0.007718 | 1.000000e+00 |
300 rows × 301 columns
word_cor_df = Cor_df.melt(id_vars = 'word1', var_name = 'word2', value_name = 'cor')
# Drop pairs of a word with itself (every other pair appears twice, as (a, b) and (b, a))
word_cor_df = word_cor_df[word_cor_df["word1"] != word_cor_df["word2"]]
word_cor_df.sort_values('cor', ascending=False).head(10)
| | word1 | word2 | cor
---|---|---|---|
50793 | 工作時間 | 每日 | 0.911766 |
28069 | 每日 | 工作時間 | 0.911766 |
27947 | 加班費 | 工作時間 | 0.857897 |
14193 | 工作時間 | 加班費 | 0.857897 |
74313 | 結束 | 詢問 | 0.812984 |
64147 | 詢問 | 結束 | 0.812984 |
50747 | 加班費 | 每日 | 0.800851 |
14269 | 每日 | 加班費 | 0.800851 |
10725 | 聯絡方式 | 公司名稱 | 0.781778 |
67535 | 公司名稱 | 聯絡方式 | 0.781778 |
# Top-10 words most correlated with 語言 (language) and with 工時 (working hours)
language_sum = word_cor_df[(word_cor_df["word1"] == "語言")].sort_values(by = ['cor'], ascending = False).head(10)
workinghours_sum = word_cor_df[(word_cor_df["word1"] == "工時")].sort_values(by = ['cor'], ascending = False).head(10)
plt.figure(figsize=(12,8))  # figure size (width, height)
plt.style.use("ggplot")  # use the ggplot style
plt.subplot(121)
plt.title('語言')
plt.xlabel('cor')
plt.barh(language_sum['word2'],language_sum['cor'])
plt.gca().invert_yaxis()
plt.subplot(122)
plt.title('工時')
plt.xlabel('cor')
plt.barh(workinghours_sum['word2'],workinghours_sum['cor'],color="darkblue")
plt.gca().invert_yaxis()
plt.show()
# Use the DTM to find the 60 most frequent words
most_freq_df = DTM_df.sum().sort_values(ascending=False).head(60).reset_index().rename(columns={'index':'word', 0:'count'})
most_freq_word = most_freq_df['word'].tolist()
# Keep only pairs in which both words are among the 60 most frequent words
filtered_df = word_cor_df[(word_cor_df['word1'].isin(most_freq_word)) & (word_cor_df['word2'].isin(most_freq_word))]
# Keep pairs whose correlation is greater than 0.3
filtered_df = filtered_df[filtered_df['cor'] > 0.3]
filtered_df.reset_index(inplace=True, drop=True)
filtered_df
| | word1 | word2 | cor |
---|---|---|---|
0 | 系統 | 主管 | 0.328367 |
1 | 面試 | 主管 | 0.318299 |
2 | 團隊 | 公司 | 0.354824 |
3 | 工作 | 公司 | 0.432953 |
4 | 工程師 | 公司 | 0.304760 |
... | ... | ... | ... |
203 | 流程 | 面試 | 0.317183 |
204 | 簡單 | 面試 | 0.454456 |
205 | 經歷 | 面試 | 0.526000 |
206 | 經驗 | 面試 | 0.553045 |
207 | 英文 | 面試 | 0.362818 |
208 rows × 3 columns
# Create network plot
g = nx.Graph()
# Add one edge per word pair, weighted by its correlation
for i in range(len(filtered_df)):
    g.add_edge(filtered_df["word1"][i], filtered_df["word2"][i], weight=filtered_df["cor"][i])
# Edge widths: correlation scaled by 5 so differences are visible
weights = [w[2]['weight']*5 for w in g.edges(data=True)]
fig, ax = plt.subplots(figsize=(12, 8))
pos = nx.spring_layout(g, k=0.3)
# Draw nodes and edges (labels are added manually below)
nx.draw_networkx(g, pos,
                 font_size=16,
                 width=weights,
                 edge_color='grey',
                 node_color='lightblue',
                 with_labels=False,
                 ax=ax)
# Add labels, offset slightly from each node
for key, value in pos.items():
    x, y = value[0]+.07, value[1]+.045
    ax.text(x, y,
            s=key,
            bbox=dict(facecolor='red', alpha=0.25),
            horizontalalignment='center', fontsize=12)
plt.show()
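The graph makes it easy to see which words are strongly connected to one another.
Using each article's TF-IDF vector as its representation, compute cosine similarity to find similar articles.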
data_cos = data_cor.copy()
data_cos.head(3)
| | system_id | artUrl | artTitle | artDate | artContent | sentence | word |
---|---|---|---|---|---|---|---|
0 | 1 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... | [分享]系統設計:如何取消正在執行的工作任務 | 2023-01-01 09:39:03 | 文字教學:\nhttps://bit.ly/3jFMwvS\n教學影片:\nhttps://... | 文字教學教學影片範例程式系統架構圖本篇來聊聊如何取消正在執行的工作任務當系統內有需要處理比較... | 文字 教學 教學 影片 範例 程式 系統 架構圖 本篇 取消 執行 工作 系統 內有 資源 ... |
1 | 2 | https://www.ptt.cc/bbs/Soft_Job/M.1672559293.A... | [請益]北漂Offer金融vs假外商 | 2023-01-01 15:48:11 | 背景:\n 私立資工學士\n 軟體經驗5Y,後端為主,可以支援前端/CICD\n\n ... | 背景私立資工學士軟體經驗後端為主可以支援前端是碩士價廢牡蠣假外商單位產險體系資融金融顧問部分... | 背景 私立 資工 學士 軟體 經驗 支援 前端 碩士 價廢 牡蠣 外商 單位 產險 體系 資... |
2 | 3 | https://www.ptt.cc/bbs/Soft_Job/M.1672571470.A... | [請益]有人的公司也沒有提供API文件的嗎 | 2023-01-01 19:11:08 | 安安\n\n小弟剛轉前端,進到一家接案公司寫網頁,工作大概9成都在接API,\n但公司內部沒... | 安安小弟剛轉前端進到一家接案公司寫網頁工作大概成都在接但公司內部沒有提供規格文件讓我參考導致... | 小弟 剛轉 前端 一家 接案 公司 網頁 工作 成都 公司 內部 提供 規格 文件 參考 導... |
transformer = TfidfTransformer()
print(transformer)
# Convert the term-count matrix X into TF-IDF weights
tfidf = transformer.fit_transform(X)
# Wrap the result in a DataFrame (one row per article, one column per word)
TFIDF_df = pd.DataFrame(columns = vocabulary, data = tfidf.toarray())
TFIDF_df
TfidfTransformer()
| | 一位 | 一個月 | 一堆 | 一家 | 一年 | 三個 | 上班 | 上課 | 下班 | 不好 | ... | 雲端 | 電腦 | 需求 | 面試 | 面試官 | 項目 | 領域 | 題目 | 類似 | 體驗 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.063611 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 |
1 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.098792 | 0.0 | 0.0 | 0.000000 | ... | 0.108983 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 |
2 | 0.0 | 0.0 | 0.0 | 0.244478 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 |
3 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.222714 | 0.0 |
4 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.388807 | 0.0 | 0.000000 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1542 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 |
1543 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.310894 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.334192 | ... | 0.000000 | 0.0 | 0.000000 | 0.230891 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 |
1544 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 |
1545 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 |
1546 | 0.0 | 0.0 | 0.0 | 0.021037 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.031363 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 |
1547 rows × 300 columns
Compute the cosine similarity between every pair of articles
cosine_matrix = cosine_similarity(tfidf.toarray(), tfidf.toarray())
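For reference, the cosine similarity of two vectors is their dot product divided by the product of their lengths. A small NumPy sketch on a made-up 3-document matrix shows the computation and how to rank the most similar documents while skipping the document itself (which always scores 1.0). Since TfidfTransformer's default is `norm='l2'`, the rows of `tfidf` are already unit length, so on this notebook's data the normalisation step is effectively a no-op.

```python
import numpy as np

# Made-up TF-IDF-like matrix: 3 documents x 4 terms.
docs = np.array([
    [0.0, 1.0, 2.0, 0.0],
    [0.0, 2.0, 4.0, 0.0],   # same direction as doc 0, twice as long
    [3.0, 0.0, 0.0, 1.0],   # shares no terms with doc 0
])

# cos(a, b) = a.b / (|a| |b|): scale each row to unit length, then take dot products.
unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
cos = unit @ unit.T

# Documents ranked by similarity to doc 0, excluding doc 0 itself.
order = np.argsort(-cos[0])
top = [int(i) for i in order if i != 0]
print(np.round(cos, 3))
print(top)
```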
Inspect the articles most similar to the first article (index 0)
cos_df = pd.DataFrame(cosine_matrix[0], columns = ['cos_similarity'])
cos_df
| | cos_similarity |
---|---|
0 | 1.000000 |
1 | 0.053813 |
2 | 0.046821 |
3 | 0.012354 |
4 | 0.047206 |
... | ... |
1542 | 0.000000 |
1543 | 0.000000 |
1544 | 0.077998 |
1545 | 0.126623 |
1546 | 0.033367 |
1547 rows × 1 columns
cos_df = cos_df.merge(data_cos, how = 'left', left_index=True, right_index=True)
cos_df.loc[:,["cos_similarity", "artTitle", "artUrl"]].sort_values(by=['cos_similarity'], ascending=False).head(10)
| | cos_similarity | artTitle | artUrl |
---|---|---|---|
0 | 1.000000 | [分享]系統設計:如何取消正在執行的工作任務 | https://www.ptt.cc/bbs/Soft_Job/M.1672537150.A... |
1156 | 0.569822 | Re:[討論]國泰、玉山的IT部門是不是皮皮剉? | https://www.ptt.cc/bbs/Soft_Job/M.1722141381.A... |
133 | 0.552698 | [心得]自動更新執行中的Docker容器解決方案 | https://www.ptt.cc/bbs/Soft_Job/M.1677928855.A... |
800 | 0.542625 | [討論]Youtube的SLA是幾個9? | https://www.ptt.cc/bbs/Soft_Job/M.1703210676.A... |
1272 | 0.450926 | [討論]AWS帳戶被盜 | https://www.ptt.cc/bbs/Soft_Job/M.1727630526.A... |
132 | 0.440927 | [討論]關於挖角跳槽一事。 | https://www.ptt.cc/bbs/Soft_Job/M.1677851305.A... |
1073 | 0.428680 | [討論].NETFramework跨平台是不是假議題 | https://www.ptt.cc/bbs/Soft_Job/M.1718260802.A... |
796 | 0.425261 | Re:[討論]有可能不學coding就可以取得前後端工作? | https://www.ptt.cc/bbs/Soft_Job/M.1703165264.A... |
744 | 0.418454 | [新聞]百年郵局全面數位大轉型 | https://www.ptt.cc/bbs/Soft_Job/M.1701558922.A... |
159 | 0.401656 | [心得]用ChatGPT幫忙整理CodeChanges | https://www.ptt.cc/bbs/Soft_Job/M.1678703464.A... |
Inspect the articles most similar to the article at index 14 (the 15th article, since indexing starts at 0)
cos_df_14 = pd.DataFrame(cosine_matrix[14], columns=['cos_similarity'])
cos_df_14 = cos_df_14.merge(data_cos, how = 'left', left_index=True, right_index=True)
cos_df_14.loc[:,["cos_similarity", "artTitle", "artUrl"]].sort_values(by=['cos_similarity'], ascending=False).head(10)
| | cos_similarity | artTitle | artUrl |
---|---|---|---|
14 | 1.000000 | [請益]什麼語言是用ppt跟word寫還可以月入100K? | https://www.ptt.cc/bbs/Soft_Job/M.1673243236.A... |
1134 | 0.716686 | [請益]請教關於"統一資訊" | https://www.ptt.cc/bbs/Soft_Job/M.1721364214.A... |
163 | 0.543876 | [請益]請益關於"統一資訊",謝謝 | https://www.ptt.cc/bbs/Soft_Job/M.1678755633.A... |
629 | 0.532129 | [請益]請問Wireshark如何plotfilter數值? | https://www.ptt.cc/bbs/Soft_Job/M.1695009578.A... |
272 | 0.480419 | [請益]技能樹怎麼點 | https://www.ptt.cc/bbs/Soft_Job/M.1681442265.A... |
1544 | 0.467814 | Re:[情報]高雄台南免費程式入門教學 | https://www.ptt.cc/bbs/Soft_Job/M.1743221251.A... |
124 | 0.456777 | [請益]jsthedefinitiveguide3rd價值? | https://www.ptt.cc/bbs/Soft_Job/M.1677476198.A... |
898 | 0.454119 | Re:[請益]想學程式但數學基礎很差怎麼進步 | https://www.ptt.cc/bbs/Soft_Job/M.1709995806.A... |
1469 | 0.452112 | [請益]純軟研究所中興基資vs北科電子甲 | https://www.ptt.cc/bbs/Soft_Job/M.1739165453.A... |
237 | 0.449494 | [請益]請問資策會地區會有很大差別嗎? | https://www.ptt.cc/bbs/Soft_Job/M.1680458545.A... |
Using the PTT Soft_Job (software engineer) board dataset we scraped
data3 = MetaData.copy()
sen_tokens = data3.sentence.apply(getToken).tolist()
def ngram(documents, N=2):
    """Build an N-gram model: map each (N-1)-token context to a set of
    Word(next word, conditional probability) entries."""
    ngram_prediction = dict()
    total_grams = list()
    words = list()
    Word = namedtuple('Word', ['word', 'prob'])
    for doc in documents:
        # Add sentence start and end tags
        split_words = ['<s>'] + list(doc) + ['</s>']
        # Numerator: every N-gram
        total_grams.extend(tuple(split_words[i:i+N]) for i in range(len(split_words)-N+1))
        # Denominator: every (N-1)-gram context
        words.extend(tuple(split_words[i:i+N-1]) for i in range(len(split_words)-N+2))
    total_word_counter = Counter(total_grams)
    word_counter = Counter(words)
    for key in total_word_counter:
        word = ''.join(key[:N-1])
        if word not in ngram_prediction:
            ngram_prediction[word] = set()
        next_word_prob = total_word_counter[key] / word_counter[key[:N-1]]  # P(B|A)
        w = Word(key[-1], '{:.3g}'.format(next_word_prob))
        ngram_prediction[word].add(w)
    return ngram_prediction
# Example: a bigram prediction model
bi_prediction = ngram(sen_tokens, N=2)
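The `ngram` function estimates each conditional probability by maximum likelihood, P(B|A) = count(A, B) / count(A). A stripped-down bigram version of the same counting on a made-up three-document corpus makes the arithmetic easy to check by hand. This sketch counts a token as a context only where a next token exists, which gives the same counts as `ngram` for every word except `</s>`.

```python
from collections import Counter

# Made-up tokenised corpus; each inner list is one document.
docs = [["喜歡", "工作"], ["喜歡", "加班"], ["喜歡", "工作"]]

bigrams, contexts = Counter(), Counter()
for doc in docs:
    tokens = ["<s>"] + doc + ["</s>"]
    bigrams.update(zip(tokens, tokens[1:]))  # count(A, B)
    contexts.update(tokens[:-1])             # count(A) as a left context

def prob(a, b):
    """Maximum-likelihood estimate of P(b | a)."""
    return bigrams[(a, b)] / contexts[a]

print(prob("喜歡", "工作"), prob("喜歡", "加班"))  # 2 of 3 and 1 of 3 occurrences
```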
Predict the next word
text = '喜歡'
next_words = list(bi_prediction[text])
# prob is stored as a formatted string, so convert it to float before sorting
next_words.sort(key=lambda s: float(s.prob), reverse=True)
for next_word in next_words[:5]:
    print('next word: {}, probability: {}'.format(next_word.word, next_word.prob))
next word: 工作, probability: 0.0309
next word: </s>, probability: 0.0247
next word: 加班, probability: 0.0185
next word: 接觸, probability: 0.0185
next word: 鑽研, probability: 0.0123
text = '提供'
next_words = list(bi_prediction[text])
# prob is stored as a formatted string, so convert it to float before sorting
next_words.sort(key=lambda s: float(s.prob), reverse=True)
for next_word in next_words[:5]:
    print('next word: {}, probability: {}'.format(next_word.word, next_word.prob))
next word: 建議, probability: 0.0253
next word: 意見, probability: 0.0253
next word: 服務, probability: 0.0228
next word: 相關, probability: 0.0177
next word: 完整, probability: 0.0152
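Because `ngram` stores each probability as a `'{:.3g}'` string, sorting on that field compares text rather than numbers, and small values written in scientific notation can jump to the top. A small pure-Python illustration with made-up candidates:

```python
from collections import namedtuple

Word = namedtuple("Word", ["word", "prob"])

# Probabilities stored as '{:.3g}' strings, as ngram() does (values are made up).
candidates = [Word("工作", "0.0309"), Word("建議", "9.26e-05"), Word("加班", "0.5")]

by_text = sorted(candidates, key=lambda w: w.prob, reverse=True)          # character-by-character comparison
by_value = sorted(candidates, key=lambda w: float(w.prob), reverse=True)  # numeric comparison

print([w.word for w in by_text])   # '9.26e-05' sorts first because '9' > '0'
print([w.word for w in by_value])
```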
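Course: Social Media Analytics
Instructor: Prof. 黃三益
Group: Group_2
Members: M124020028 何允中, M134020016 王予芙, M134020034 黃沛萱, M134020037 陳宥齊, B104020032 翁武麟, M124111057 張伶宣
Data source: PTT
Boards: part-time jobs (part_time), software jobs (Soft_Job), tech jobs (Tech_Job)
Motivation: We want to find out whether discussions about different types of work focus on noticeably different topics, and in particular whether the software-job and tech-job boards are similar to some degree.
Goal: Combine the posts from all three boards, train a model to predict which board a post came from, and examine which words the model associates with each board.
Steps:
Difficulties encountered and how we addressed them
Over the same period (2024/01/01 to 2025/03/31), the Soft_Job board had only 700+ posts while the other two boards each had 5,000+, roughly 7 to 8 times more. Models trained on data this imbalanced performed clearly worse on Soft_Job, so we extended the Soft_Job collection window to 2020/01/01 to 2025/03/31. This brought the three boards to similar sizes (5,361 to 5,570 posts each) and made the trained models more accurate.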
import re
from pprint import pprint
import os
import glob
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import jieba
from sklearn.model_selection import train_test_split, cross_validate, cross_val_predict, KFold
from sklearn.metrics import (
    confusion_matrix,
    classification_report,
    roc_curve,
    auc,
    precision_recall_curve,
    RocCurveDisplay
)
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.preprocessing import LabelBinarizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.ensemble import RandomForestClassifier
from sklearn import svm
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
# Set a Chinese font for plots
# (the URL is quoted so the shell does not treat '&' as "run in background")
!wget -O TaipeiSansTCBeta-Regular.ttf "https://drive.google.com/uc?id=1eGAsTN1HBpJAkeVM57_C7ccp7hbgSz3_&export=download"
import matplotlib
matplotlib.font_manager.fontManager.addfont('TaipeiSansTCBeta-Regular.ttf')
matplotlib.rc('font', family='Taipei Sans TC Beta')
# Alternative font setting (if Chinese text does not render, try the 'Microsoft JhengHei' font)
# See also: https://pyecontech.com/2020/03/27/python_matplotlib_chinese/
#plt.rcParams['font.sans-serif'] = ['Noto Sans CJK TC']  # render Chinese characters in plots
plt.rcParams['axes.unicode_minus'] = False  # render minus signs correctly
--2025-04-15 07:20:03-- https://drive.google.com/uc?id=1eGAsTN1HBpJAkeVM57_C7ccp7hbgSz3_ Resolving drive.google.com (drive.google.com)... 142.250.152.139, 142.250.152.101, 142.250.152.100, ... Connecting to drive.google.com (drive.google.com)|142.250.152.139|:443... connected. HTTP request sent, awaiting response... 303 See Other Location: https://drive.usercontent.google.com/download?id=1eGAsTN1HBpJAkeVM57_C7ccp7hbgSz3_ [following] --2025-04-15 07:20:03-- https://drive.usercontent.google.com/download?id=1eGAsTN1HBpJAkeVM57_C7ccp7hbgSz3_ Resolving drive.usercontent.google.com (drive.usercontent.google.com)... 74.125.202.132, 2607:f8b0:4001:c06::84 Connecting to drive.usercontent.google.com (drive.usercontent.google.com)|74.125.202.132|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 20659344 (20M) [application/octet-stream] Saving to: ‘TaipeiSansTCBeta-Regular.ttf’ TaipeiSansTCBeta-Re 100%[===================>] 19.70M --.-KB/s in 0.1s 2025-04-15 07:20:06 (176 MB/s) - ‘TaipeiSansTCBeta-Regular.ttf’ saved [20659344/20659344]
folder_path = "/content/drive/MyDrive/SMA_2025S-main/"
csv_files = glob.glob(os.path.join(folder_path, '*.csv'))
# Collect one DataFrame per CSV file
df_list = []
# Read each CSV and append it to the list
for file in csv_files:
    df = pd.read_csv(file)
    df_list.append(df)
# Concatenate all DataFrames
df = pd.concat(df_list, ignore_index=True)
df.head()
| | system_id | artUrl | artTitle | artDate | artPoster | artCatagory | artContent | artComment | e_ip | insertedDate | dataSource |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | https://www.ptt.cc/bbs/Tech_Job/M.1704076606.A... | Re:[請益]在新竹上班到底有什麼優點 | 2024-01-01 10:36:44 | dilson | Tech_Job | :\n\n不見得喔\n\n我看過私校學店正妹在科技業的男友也是私校學店\n\n因為沒腦的跟有... | [{"cmtStatus": "推", "cmtPoster": "k11k", "cmtC... | 42.79.72.237 | 2024-01-02 02:21:08 | ptt |
1 | 2 | https://www.ptt.cc/bbs/Tech_Job/M.1704078788.A... | [新聞]台積電效應!日本半導體廠開2024第1槍 | 2024-01-01 11:13:06 | qazxc1156892 | Tech_Job | 新聞標題: 台積電效應!日本半導體廠開2024第1槍 宣告新進員工加薪40%\n\n\n記者... | [{"cmtStatus": "推", "cmtPoster": "sunnyhung", ... | 114.136.154.65 | 2024-01-02 02:21:08 | ptt |
2 | 3 | https://www.ptt.cc/bbs/Tech_Job/M.1704080503.A... | Re:[請益]在新竹上班到底有什麼優點 | 2024-01-01 11:41:41 | francej | Tech_Job | 如果有要生小孩的 新竹大概是目前全國最適合學齡小孩成長的環境吧\n\n人口平均素質高 別的縣... | [{"cmtStatus": "→", "cmtPoster": "SpongebobMac... | 36.230.152.131 | 2024-01-02 02:21:08 | ptt |
3 | 4 | https://www.ptt.cc/bbs/Tech_Job/M.1704100050.A... | Re:[請益]在新竹上班到底有什麼優點 | 2024-01-01 17:07:27 | Onnnnnnnnnnn | Tech_Job | 講新竹太籠統\n\n是新竹市 還是新竹縣 還是以前被割地的竹南?\n\n學區來說\n只要在新... | [{"cmtStatus": "推", "cmtPoster": "marsonele", ... | 61.230.11.219 | 2024-01-02 02:21:08 | ptt |
4 | 5 | https://www.ptt.cc/bbs/Tech_Job/M.1704106015.A... | [新聞]「股王製造機」王雪紅不看一時成敗、拚 | 2024-01-01 18:46:53 | GuanLaoBan | Tech_Job | https://www.nownews.com/news/6331938\n2023年12月... | [{"cmtStatus": "推", "cmtPoster": "forestsea722... | 138.199.22.107 | 2024-01-02 02:21:10 | ptt |
# Number of posts and date range, overall and per file
print(f"number of posts: {df.shape[0]}")
print(f"date range: {(df['artDate'].min(), df['artDate'].max())}")
print("="*60)
for file in csv_files:
    csv = pd.read_csv(file)
    post_count = csv.shape[0]
    date_range = (csv['artDate'].min(), csv['artDate'].max())
    filename = os.path.basename(file)
    print(f"({filename})")
    print(f"number of posts: {post_count}")
    print(f"date range: {date_range[0]} ~ {date_range[1]}\n")
number of posts: 16332
date range: ('2020-06-01 11:49:04', '2025-03-31 23:33:43')
============================================================
(ptt_techjob.csv)
number of posts: 5401
date range: 2024-01-01 10:36:44 ~ 2025-03-31 23:33:43

(ptt_parttime.csv)
number of posts: 5570
date range: 2024-01-01 00:25:21 ~ 2025-03-31 22:54:22

(ptt_softjob.csv)
number of posts: 5361
date range: 2020-06-01 11:49:04 ~ 2025-03-31 03:14:57
# Drop rows whose title or content is missing;
# .copy() gives an independent DataFrame and avoids pandas' SettingWithCopyWarning below
df = df.dropna(subset=['artTitle', 'artContent']).copy()
# Remove URLs ('.' does not match a newline, so this deletes from "http" to the end of that line)
df["artContent"] = df["artContent"].apply(
    lambda x: re.sub(r"(http|https)://.*", "", x)
)
df["artTitle"] = df["artTitle"].apply(
    lambda x: re.sub(r"(http|https)://.*", "", x)
)
# Keep only Chinese characters (CJK range U+4E00 to U+9FA5)
df["artContent"] = df["artContent"].apply(
    lambda x: re.sub("[^\u4e00-\u9fa5]+", "", x)
)
df["artTitle"] = df["artTitle"].apply(
    lambda x: re.sub("[^\u4e00-\u9fa5]+", "", x)
)
df.head(3)
| | system_id | artUrl | artTitle | artDate | artPoster | artCatagory | artContent | artComment | e_ip | insertedDate | dataSource |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | https://www.ptt.cc/bbs/Tech_Job/M.1704076606.A... | 請益在新竹上班到底有什麼優點 | 2024-01-01 10:36:44 | dilson | Tech_Job | 不見得喔我看過私校學店正妹在科技業的男友也是私校學店因為沒腦的跟有腦的怎麼可能有話聊台清交電... | [{"cmtStatus": "推", "cmtPoster": "k11k", "cmtC... | 42.79.72.237 | 2024-01-02 02:21:08 | ptt |
1 | 2 | https://www.ptt.cc/bbs/Tech_Job/M.1704078788.A... | 新聞台積電效應日本半導體廠開第槍 | 2024-01-01 11:13:06 | qazxc1156892 | Tech_Job | 新聞標題台積電效應日本半導體廠開第槍宣告新進員工加薪記者陳瑩欣台北報導台積電等大型半導體廠赴... | [{"cmtStatus": "推", "cmtPoster": "sunnyhung", ... | 114.136.154.65 | 2024-01-02 02:21:08 | ptt |
2 | 3 | https://www.ptt.cc/bbs/Tech_Job/M.1704080503.A... | 請益在新竹上班到底有什麼優點 | 2024-01-01 11:41:41 | francej | Tech_Job | 如果有要生小孩的新竹大概是目前全國最適合學齡小孩成長的環境吧人口平均素質高別的縣市包括雙北擔... | [{"cmtStatus": "→", "cmtPoster": "SpongebobMac... | 36.230.152.131 | 2024-01-02 02:21:08 | ptt |
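The two regular expressions above can be checked on a single made-up post body. Because `.` stops at a newline, the URL rule also removes any text after a URL on the same line (here 說明), and the second rule then drops everything outside the CJK range, including spaces, digits, Latin letters and punctuation:

```python
import re

# Made-up post body: Chinese text, a URL, digits, English and punctuation.
raw = "看這裡 https://www.ptt.cc/bbs/Soft_Job 說明\n薪水 60K, 工作 OK!"

# Same two patterns as the cleaning step above.
no_url = re.sub(r"(http|https)://.*", "", raw)              # URL to end of line removed
chinese_only = re.sub("[^\u4e00-\u9fa5]+", "", no_url)       # keep CJK characters only

print(repr(no_url))
print(chinese_only)
```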
# Concatenate title and body into a single content field
df["content"] = df["artTitle"] + df["artContent"]
df = df[["content", "artUrl", "artCatagory"]]  # keep content, article URL, and board label
df.head()
| | content | artUrl | artCatagory |
---|---|---|---|
0 | 請益在新竹上班到底有什麼優點不見得喔我看過私校學店正妹在科技業的男友也是私校學店因為沒腦的跟... | https://www.ptt.cc/bbs/Tech_Job/M.1704076606.A... | Tech_Job |
1 | 新聞台積電效應日本半導體廠開第槍新聞標題台積電效應日本半導體廠開第槍宣告新進員工加薪記者陳瑩... | https://www.ptt.cc/bbs/Tech_Job/M.1704078788.A... | Tech_Job |
2 | 請益在新竹上班到底有什麼優點如果有要生小孩的新竹大概是目前全國最適合學齡小孩成長的環境吧人口... | https://www.ptt.cc/bbs/Tech_Job/M.1704080503.A... | Tech_Job |
3 | 請益在新竹上班到底有什麼優點講新竹太籠統是新竹市還是新竹縣還是以前被割地的竹南學區來說只要在... | https://www.ptt.cc/bbs/Tech_Job/M.1704100050.A... | Tech_Job |
4 | 新聞股王製造機王雪紅不看一時成敗拚年月日記者許家禎特稿股王製造機王雪紅不看一時成敗拚氣長宏達... | https://www.ptt.cc/bbs/Tech_Job/M.1704106015.A... | Tech_Job |
print(f"total docs: {df.shape[0]}")
total docs: 16310
# Use the Traditional Chinese dictionary for jieba
jieba.set_dictionary("/content/drive/MyDrive/SMA_2025S-main/week07/dict/dict.txt.big")
# Load stop words
# jieba.analyse.set_stop_words('./dict/stop_words.txt')  # only takes effect for jieba.analyse.extract_tags
with open("/content/drive/MyDrive/SMA_2025S-main/week07/dict/stop_words.txt", encoding="utf-8") as f:
    stopWords = {line.strip() for line in f}  # a set makes the membership test below fast
# Tokenization function
def getToken(row):
    seg_list = jieba.cut(row, cut_all=False)
    # Drop stop words and single-character tokens (keep only words longer than 1 character)
    seg_list = [
        w for w in seg_list if w not in stopWords and len(w) > 1
    ]
    return seg_list
df["words"] = df["content"].apply(getToken).map(" ".join)
df.head()
Building prefix dict from /content/drive/MyDrive/SMA_2025S-main/week07/dict/dict.txt.big ... DEBUG:jieba:Building prefix dict from /content/drive/MyDrive/SMA_2025S-main/week07/dict/dict.txt.big ... Dumping model to file cache /tmp/jieba.u7fdbfa156d71dde3366dc327b32adc37.cache DEBUG:jieba:Dumping model to file cache /tmp/jieba.u7fdbfa156d71dde3366dc327b32adc37.cache Loading model cost 5.069 seconds. DEBUG:jieba:Loading model cost 5.069 seconds. Prefix dict has been built successfully. DEBUG:jieba:Prefix dict has been built successfully.
| | content | artUrl | artCatagory | words |
---|---|---|---|---|
0 | 請益在新竹上班到底有什麼優點不見得喔我看過私校學店正妹在科技業的男友也是私校學店因為沒腦的跟... | https://www.ptt.cc/bbs/Tech_Job/M.1704076606.A... | Tech_Job | 請益 新竹 上班 優點 不見得 看過 私校 學店 正妹 科技 業的 男友 私校 學店 沒腦 ... |
1 | 新聞台積電效應日本半導體廠開第槍新聞標題台積電效應日本半導體廠開第槍宣告新進員工加薪記者陳瑩... | https://www.ptt.cc/bbs/Tech_Job/M.1704078788.A... | Tech_Job | 新聞 台積電 效應 日本 半導體 廠開 第槍 新聞標題 台積電 效應 日本 半導體 廠開 第... |
2 | 請益在新竹上班到底有什麼優點如果有要生小孩的新竹大概是目前全國最適合學齡小孩成長的環境吧人口... | https://www.ptt.cc/bbs/Tech_Job/M.1704080503.A... | Tech_Job | 請益 新竹 上班 優點 有要 生小孩 新竹 目前 全國 適合 學齡 小孩 成長 環境 人口 ... |
3 | 請益在新竹上班到底有什麼優點講新竹太籠統是新竹市還是新竹縣還是以前被割地的竹南學區來說只要在... | https://www.ptt.cc/bbs/Tech_Job/M.1704100050.A... | Tech_Job | 請益 新竹 上班 優點 新竹 籠統 新竹市 新竹縣 以前 割地 竹南 學區 來說 新竹市 東... |
4 | 新聞股王製造機王雪紅不看一時成敗拚年月日記者許家禎特稿股王製造機王雪紅不看一時成敗拚氣長宏達... | https://www.ptt.cc/bbs/Tech_Job/M.1704106015.A... | Tech_Job | 新聞 股王 製造機 王雪紅 一時 成敗 年月日 記者 許家 特稿 股王 製造機 王雪紅 一時... |
# Inspect the data
print(f"total posts: {len(df['artUrl'].unique())}")
print(f"category: \n{df['artCatagory'].value_counts()}")
print("="*100)
data = df
X = data["words"]
y = data["artCatagory"]
# 70/30 train-test split of the full dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=777
)
print(X_train.head())
print(y_train.head())
total posts: 16310
category: 
artCatagory
part_time    5567
Tech_Job     5395
Soft_Job     5348
Name: count, dtype: int64
====================================================================================================
11414    請益 大學 休學 全心 工作 覺得 聰明 讀書 兼顧 帶來 回報 讀書 自然 選擇 輟學 天...
7721     新北 新店 裕隆 活動 派票 時薪 同意 願意 遵守 現行 法律 本站 使用者 條款 本站 ...
3256     新聞 中國 實現 國產 曝光 機可 生產 奈米 中國 實現 國產 曝光 機可 生產 奈米 以...
15141    徵才 遠端 美商 公司 名稱 統編 中華民國 以外 註冊 可免 公司地址 填寫 詳細 至號 ...
13980    問卷 藍牙 耳機 進修 網之 職涯 進修 調查 人力 銀行 今年 針對 上班族 進修 調查 ...
Name: words, dtype: object
11414     Soft_Job
7721     part_time
3256      Tech_Job
15141     Soft_Job
13980     Soft_Job
Name: artCatagory, dtype: object
# Check the class proportions in each split; they should be roughly the same
print(
f"raw data percentage :\n{data['artCatagory'].value_counts(normalize=True) * 100}"
)
print(f"\ntrain percentage :\n{y_train.value_counts(normalize=True) * 100}")
print(f"\ntest percentage :\n{y_test.value_counts(normalize=True) * 100}")
raw data percentage :
artCatagory
part_time    34.132434
Tech_Job     33.077866
Soft_Job     32.789700
Name: proportion, dtype: float64

train percentage :
artCatagory
part_time    34.229658
Tech_Job     33.432601
Soft_Job     32.337742
Name: proportion, dtype: float64

test percentage :
artCatagory
part_time    33.905579
Soft_Job     33.844267
Tech_Job     32.250153
Name: proportion, dtype: float64
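The split above is random, so the proportions only match approximately. If exact preservation of the board proportions is wanted, `train_test_split` accepts a `stratify` argument. A minimal sketch on made-up labels that reuse the three board names (the 30/30/40 class sizes are an assumption chosen so the arithmetic is exact):

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Made-up labels: 30 Soft_Job, 30 Tech_Job, 40 part_time.
y = ["Soft_Job"] * 30 + ["Tech_Job"] * 30 + ["part_time"] * 40
X = list(range(len(y)))  # stand-in features

# stratify=y keeps each class's share the same in the train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=777, stratify=y
)

print(Counter(y_tr))
print(Counter(y_te))
```

On the notebook's data this would mean passing `stratify=y` in the `train_test_split` call; the reported scores would then come from a slightly different split.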
vectorizer = CountVectorizer(max_features=1000)
X_train.head()
| | words |
---|---|
11414 | 請益 大學 休學 全心 工作 覺得 聰明 讀書 兼顧 帶來 回報 讀書 自然 選擇 輟學 天... |
7721 | 新北 新店 裕隆 活動 派票 時薪 同意 願意 遵守 現行 法律 本站 使用者 條款 本站 ... |
3256 | 新聞 中國 實現 國產 曝光 機可 生產 奈米 中國 實現 國產 曝光 機可 生產 奈米 以... |
15141 | 徵才 遠端 美商 公司 名稱 統編 中華民國 以外 註冊 可免 公司地址 填寫 詳細 至號 ... |
13980 | 問卷 藍牙 耳機 進修 網之 職涯 進修 調查 人力 銀行 今年 針對 上班族 進修 調查 ... |
vec_train = vectorizer.fit_transform(X_train)
vocabulary = vectorizer.get_feature_names_out()
Count_df = pd.DataFrame(columns = vocabulary, data = vec_train.toarray())
Count_df
| | 一下 | 一些 | 一任 | 一份 | 一位 | 一個月 | 一堆 | 一天 | 一定 | 一家 | ... | 願意 | 類似 | 類別 | 顧問 | 顯示 | 風險 | 餐廳 | 馬斯克 | 體驗 | 高雄 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
11412 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
11413 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
11414 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
11415 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
11416 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
11417 rows × 1000 columns
vec_test = vectorizer.transform(X_test)
print(vec_train.shape)
print(vec_test.shape)
(11417, 1000)
(4893, 1000)
# Build the classifier model
clf = LogisticRegression()
clf.fit(vec_train, y_train)
clf
/usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_logistic.py:465: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result(
LogisticRegression()
clf.classes_
array(['Soft_Job', 'Tech_Job', 'part_time'], dtype=object)
After training on the training set, check how the model classifies the test set
y_pred = clf.predict(vec_test)
y_pred_proba = clf.predict_proba(vec_test)
print(y_pred[:10])
print(y_pred_proba.shape)
y_pred_proba[0,:]
['part_time' 'part_time' 'Tech_Job' 'part_time' 'Soft_Job' 'Soft_Job' 'Soft_Job' 'part_time' 'Tech_Job' 'Tech_Job']
(4893, 3)
array([7.48678735e-23, 1.11943780e-22, 1.00000000e+00])
## Accuracy, Precision, Recall, F1-score
print(classification_report(y_test, y_pred))

| | precision | recall | f1-score | support |
---|---|---|---|---|
Soft_Job | 0.87 | 0.85 | 0.86 | 1656 |
Tech_Job | 0.85 | 0.87 | 0.86 | 1578 |
part_time | 1.00 | 1.00 | 1.00 | 1659 |
accuracy | | | 0.91 | 4893 |
macro avg | 0.91 | 0.91 | 0.91 | 4893 |
weighted avg | 0.91 | 0.91 | 0.91 | 4893 |

Confusion Matrix
classes = clf.classes_
cm = confusion_matrix(y_test, y_pred)
cm
## Plot confusion matrix
fig, ax = plt.subplots()
sns.heatmap(cm, annot=True, fmt="d", ax=ax, cmap=plt.cm.Blues, cbar=False)
ax.set(
xlabel="Pred",
ylabel="True",
xticklabels=classes,
yticklabels=classes,
title="Confusion matrix",
)
plt.yticks(rotation=0)
(array([0.5, 1.5, 2.5]), [Text(0, 0.5, 'Soft_Job'), Text(0, 1.5, 'Tech_Job'), Text(0, 2.5, 'part_time')])
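The per-class numbers in `classification_report` follow directly from the confusion matrix: precision = TP / (TP + FP) uses a column, recall = TP / (TP + FN) uses a row, and F1 = 2PR / (P + R). "macro avg" is the unweighted mean over classes and "weighted avg" weights each class by its support. A pure-Python sketch on a made-up 3-class matrix in the same layout `confusion_matrix` returns (rows = true class, columns = predicted class):

```python
# Made-up confusion matrix; rows = true class, columns = predicted class.
labels = ["Soft_Job", "Tech_Job", "part_time"]
cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]

def per_class_metrics(cm, labels):
    """Return {label: (precision, recall, f1)} computed from a confusion matrix."""
    report = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(row[i] for row in cm)  # column sum: predicted as this class
        actual = sum(cm[i])                    # row sum: truly this class
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = (precision, recall, f1)
    return report

for label, (p, r, f) in per_class_metrics(cm, labels).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```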
vectorizer = TfidfVectorizer(max_features=1000)
vec_train = vectorizer.fit_transform(X_train)
vec_test = vectorizer.transform(X_test)
vocabulary = vectorizer.get_feature_names_out()
tfidf_df = pd.DataFrame(columns = vocabulary, data = vec_train.toarray())
tfidf_df
| | 一下 | 一些 | 一任 | 一份 | 一位 | 一個月 | 一堆 | 一天 | 一定 | 一家 | ... | 願意 | 類似 | 類別 | 顧問 | 顯示 | 風險 | 餐廳 | 馬斯克 | 體驗 | 高雄 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 |
1 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.125851 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 |
2 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.064781 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 |
3 | 0.0 | 0.046436 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 |
4 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
11412 | 0.0 | 0.000000 | 0.025203 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.038270 | 0.0 | 0.024139 | 0.043534 | 0.023576 | 0.023945 | 0.024656 | 0.0 | 0.0 | 0.0 |
11413 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 |
11414 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 |
11415 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.271224 | ... | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 |
11416 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 |
11417 rows × 1000 columns
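scikit-learn documents TfidfVectorizer as equivalent to CountVectorizer followed by TfidfTransformer, which is the two-step route used in the earlier cosine-similarity section. A quick check on made-up documents with default settings; it also confirms that each row has unit L2 length (the default `norm='l2'`):

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer, TfidfTransformer, TfidfVectorizer
)

# Made-up, already-segmented documents.
docs = ["公司 工作 公司", "面試 公司 工作", "面試 面試 公司 加班"]

# Route 1: raw counts, then TF-IDF re-weighting.
counts = CountVectorizer(max_features=1000).fit_transform(docs)
two_step = TfidfTransformer().fit_transform(counts).toarray()

# Route 2: both steps in one object.
one_step = TfidfVectorizer(max_features=1000).fit_transform(docs).toarray()

print(np.allclose(two_step, one_step))
print(np.linalg.norm(one_step, axis=1))  # each row has length 1
```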
clf.fit(vec_train, y_train)
y_pred = clf.predict(vec_test)
y_pred_proba = clf.predict_proba(vec_test)
# results
## Accuracy, Precision, Recall, F1-score
print(classification_report(y_test, y_pred))
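
| | precision | recall | f1-score | support |
---|---|---|---|---|
Soft_Job | 0.89 | 0.89 | 0.89 | 1656 |
Tech_Job | 0.88 | 0.88 | 0.88 | 1578 |
part_time | 1.00 | 0.99 | 1.00 | 1659 |
accuracy | | | 0.92 | 4893 |
macro avg | 0.92 | 0.92 | 0.92 | 4893 |
weighted avg | 0.92 | 0.92 | 0.92 | 4893 |

Split the training data into k train/validation folds (k-fold cross-validation, here k = 5)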
clf = LogisticRegression()
vec_train = TfidfVectorizer(max_features=1000).fit_transform(X_train)
scores = cross_validate(clf, vec_train, y_train, cv=5, scoring=("f1_macro", "recall_macro", "precision_macro"), return_estimator=True)
pprint(scores)
{'estimator': [LogisticRegression(), LogisticRegression(), LogisticRegression(), LogisticRegression(), LogisticRegression()], 'fit_time': array([0.23089862, 0.27636576, 0.25236869, 0.34397149, 0.27416658]), 'score_time': array([0.02289605, 0.024647 , 0.02181077, 0.0233047 , 0.0221231 ]), 'test_f1_macro': array([0.91902789, 0.9123334 , 0.91506679, 0.91453077, 0.90227382]), 'test_precision_macro': array([0.91950184, 0.91240797, 0.91527236, 0.91474791, 0.90267443]), 'test_recall_macro': array([0.9191661 , 0.91231691, 0.91495399, 0.91452967, 0.90217693])}
y_pred = cross_val_predict(clf, vec_train, y_train, cv=5)
print(classification_report(y_train, y_pred))

| | precision | recall | f1-score | support |
---|---|---|---|---|
Soft_Job | 0.86 | 0.88 | 0.87 | 3692 |
Tech_Job | 0.88 | 0.86 | 0.87 | 3817 |
part_time | 1.00 | 0.99 | 0.99 | 3908 |
accuracy | | | 0.91 | 11417 |
macro avg | 0.91 | 0.91 | 0.91 | 11417 |
weighted avg | 0.91 | 0.91 | 0.91 | 11417 |

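In the cells above, the TfidfVectorizer is fitted on all of X_train before `cross_validate` runs (and `train_cv` below does the same via `vectorizer.fit_transform(X)`), so the IDF weights and the 1,000-term vocabulary are computed with each validation fold included. The effect is probably small here, but the scores may be slightly optimistic. A minimal sketch of the usual remedy, wrapping vectorizer and classifier in a scikit-learn pipeline so the vectorizer is re-fitted inside each fold; the texts and labels are made up, and `max_iter=1000` is our own choice to avoid the ConvergenceWarning seen earlier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline

# Made-up, already-segmented posts from two toy "boards".
texts = ["晶圓 製程 良率", "製程 晶圓 工程師", "良率 晶圓 加班",
         "前端 後端 面試", "後端 程式 面試", "前端 程式 框架"] * 5
labels = ["Tech_Job"] * 3 + ["Soft_Job"] * 3
labels = labels * 5

# The vectorizer is fitted on the training folds only, so IDF weights and the
# max_features vocabulary never see the held-out fold.
pipe = make_pipeline(TfidfVectorizer(max_features=1000),
                     LogisticRegression(max_iter=1000))
scores = cross_validate(pipe, texts, labels, cv=5, scoring="f1_macro")
print(scores["test_score"])
```

With real data, the pipeline would take the raw `X_train` text column directly in place of a pre-vectorized matrix.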
# Define the model-training routine
## pipeline: vectorizer (feature extraction) + classifier clf
## cross-validation splits the data into train/validation folds itself, so only X and y need to be passed in
def train_cv(vectorizer, clf, X, y):
    ## vectorize the input (dense array)
    vec_X = vectorizer.fit_transform(X).toarray()
    ## get cv results
    cv_results = cross_validate(clf, vec_X, y, cv=5, return_estimator=True)
    y_pred = cross_val_predict(clf, vec_X, y, cv=5)
    y_pred_proba = cross_val_predict(clf, vec_X, y, cv=5, method="predict_proba")  # not used further below
    ## Accuracy, Precision, Recall, F1-score
    cls_report = classification_report(y, y_pred, output_dict=True)
    print(classification_report(y, y_pred))
    classes = cv_results['estimator'][0].classes_
    ## Plot confusion matrix
    cm = confusion_matrix(y, y_pred)
    fig, ax = plt.subplots()
    sns.heatmap(cm, annot=True, fmt="d", ax=ax, cmap=plt.cm.Blues, cbar=False)
    ax.set(
        xlabel="Pred",
        ylabel="True",
        xticklabels=classes,
        yticklabels=classes,
        title=str(clf) + " confusion matrix",
    )
    plt.yticks(rotation=0)
    ## refit on all of X so the clf object passed in ends up trained
    clf.fit(vec_X, y)
    # only the classification report (a dict) is returned, not the model
    return cls_report
vectorizer = TfidfVectorizer(max_features=1000)
clf = LogisticRegression()
result = train_cv(vectorizer, clf, X_train, y_train)

| | precision | recall | f1-score | support |
---|---|---|---|---|
Soft_Job | 0.86 | 0.88 | 0.87 | 3692 |
Tech_Job | 0.88 | 0.86 | 0.87 | 3817 |
part_time | 1.00 | 0.99 | 0.99 | 3908 |
accuracy | | | 0.91 | 11417 |
macro avg | 0.91 | 0.91 | 0.91 | 11417 |
weighted avg | 0.91 | 0.91 | 0.91 | 11417 |

# Prepare the training data
X = data["words"]
y = data["artCatagory"]
# 70/30 train-test split of the full dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=777
)
# Define the models to compare
model_set = dict()
model_set['clf_logistic'] = LogisticRegression()
model_set['clf_dtree'] = DecisionTreeClassifier()
model_set['clf_svm'] = svm.SVC(probability=True)  # SVC needs probability=True at construction to support predict_proba
model_set['clf_rf'] = RandomForestClassifier()
# Define the vectorizer
# vectorizer = CountVectorizer(max_features=1000)
vectorizer = TfidfVectorizer(max_features=1000)
# Store the results
result_set = dict()
for k, model in model_set.items():
    print("="*100)
    print(f"now training: {k}")
    result_set[k] = train_cv(vectorizer, model, X_train, y_train)
    print("="*100)
now training: clf_logistic
| precision | recall | f1-score | support |
---|---|---|---|---|
Soft_Job | 0.86 | 0.88 | 0.87 | 3692 |
Tech_Job | 0.88 | 0.86 | 0.87 | 3817 |
part_time | 1.00 | 0.99 | 0.99 | 3908 |
accuracy | | | 0.91 | 11417 |
macro avg | 0.91 | 0.91 | 0.91 | 11417 |
weighted avg | 0.91 | 0.91 | 0.91 | 11417 |
now training: clf_dtree
| precision | recall | f1-score | support |
---|---|---|---|---|
Soft_Job | 0.79 | 0.80 | 0.80 | 3692 |
Tech_Job | 0.80 | 0.80 | 0.80 | 3817 |
part_time | 0.99 | 0.99 | 0.99 | 3908 |
accuracy | | | 0.86 | 11417 |
macro avg | 0.86 | 0.86 | 0.86 | 11417 |
weighted avg | 0.86 | 0.86 | 0.86 | 11417 |
now training: clf_svm
| precision | recall | f1-score | support |
---|---|---|---|---|
Soft_Job | 0.85 | 0.90 | 0.87 | 3692 |
Tech_Job | 0.89 | 0.86 | 0.87 | 3817 |
part_time | 1.00 | 0.99 | 0.99 | 3908 |
accuracy | | | 0.91 | 11417 |
macro avg | 0.91 | 0.91 | 0.91 | 11417 |
weighted avg | 0.92 | 0.91 | 0.92 | 11417 |
now training: clf_rf
| precision | recall | f1-score | support |
---|---|---|---|---|
Soft_Job | 0.84 | 0.91 | 0.87 | 3692 |
Tech_Job | 0.90 | 0.84 | 0.87 | 3817 |
part_time | 1.00 | 0.99 | 0.99 | 3908 |
accuracy | | | 0.91 | 11417 |
macro avg | 0.91 | 0.91 | 0.91 | 11417 |
weighted avg | 0.91 | 0.91 | 0.91 | 11417 |
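The support column adds up to 11417 (3692 + 3817 + 3908) rather than 16310 because only the training part of the 70/30 split goes into cross-validation. Assuming `train_test_split` rounds a float `test_size` up and gives the remainder to training, that is ceil(0.3 × 16310) = 4893 test rows and 16310 - 4893 = 11417 training rows. The sketch below reproduces the sizes; its shuffle uses Python's `random`, so it does not draw the same permutation as `random_state=777`. `split_sizes` and `holdout_split` are hypothetical helpers.

```python
import math
import random

# Sketch of a shuffled hold-out split. The ceil() rounding of the test size is
# an assumption about how scikit-learn's train_test_split handles a float test_size.
def split_sizes(n_samples, test_size=0.3):
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

def holdout_split(n_samples, test_size=0.3, seed=777):
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # not the permutation scikit-learn would draw
    n_train, _ = split_sizes(n_samples, test_size)
    return idx[:n_train], idx[n_train:]

print(split_sizes(16310))  # (11417, 4893)
```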
Look at how each classifier performs on each class's evaluation metrics, for example logistic regression:
result_set['clf_logistic']
{'Soft_Job': {'precision': 0.8581560283687943, 'recall': 0.8848862405200434, 'f1-score': 0.8713161754900653, 'support': 3692.0}, 'Tech_Job': {'precision': 0.8807486631016043, 'recall': 0.8629813990044538, 'f1-score': 0.8717745136959111, 'support': 3817.0}, 'part_time': {'precision': 0.9997416020671834, 'recall': 0.9900204708290685, 'f1-score': 0.9948572897917203, 'support': 3908.0}, 'accuracy': 0.9135499693439607, 'macro avg': {'precision': 0.9128820978458606, 'recall': 0.9126293701178553, 'f1-score': 0.9126493263258988, 'support': 11417.0}, 'weighted avg': {'precision': 0.9141735906696125, 'recall': 0.9135499693439607, 'f1-score': 0.9137571102034384, 'support': 11417.0}}
Find the model with the highest weighted-average F1-score and use it as our final classifier.
from pprint import pprint
best_score = 0  # avoid shadowing the built-in max()
best_model_name = ""
best_model_metric = "f1-score"
## choose the model with the highest weighted-avg f1-score from result_set
for k, v in result_set.items():
    if v['weighted avg'][best_model_metric] > best_score:
        best_score = v['weighted avg'][best_model_metric]
        best_model_name = k
print(f"best model: {best_model_name}")
pprint(result_set[best_model_name])
best model: clf_svm {'Soft_Job': {'f1-score': 0.8743718592964824, 'precision': 0.8542635658914729, 'recall': 0.8954496208017335, 'support': 3692.0}, 'Tech_Job': {'f1-score': 0.8735662843424913, 'precision': 0.8897038848139093, 'recall': 0.8580036678019387, 'support': 3817.0}, 'accuracy': 0.9149513882806342, 'macro avg': {'f1-score': 0.9140927505638188, 'precision': 0.9145695951394212, 'recall': 0.9141500726597706, 'support': 11417.0}, 'part_time': {'f1-score': 0.9943401080524826, 'precision': 0.9997413347128815, 'recall': 0.9889969293756398, 'support': 3908.0}, 'weighted avg': {'f1-score': 0.9151672553321366, 'precision': 0.9159087281828808, 'recall': 0.9149513882806342, 'support': 11417.0}}
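The same selection can be written with the built-in `max()` and a key function, which avoids keeping a separate running best score. The sketch below is fed the weighted-average F1 values printed above for logistic regression and SVM, plus the rounded two-decimal values from the reports for the decision tree and random forest; `best_by_metric` is a hypothetical helper.

```python
# Pick the entry whose nested score is largest; the dict layout mirrors
# classification_report(..., output_dict=True).
def best_by_metric(result_set, metric="f1-score", average="weighted avg"):
    return max(result_set, key=lambda name: result_set[name][average][metric])

sample = {
    "clf_logistic": {"weighted avg": {"f1-score": 0.9137571102034384}},
    "clf_dtree": {"weighted avg": {"f1-score": 0.86}},
    "clf_svm": {"weighted avg": {"f1-score": 0.9151672553321366}},
    "clf_rf": {"weighted avg": {"f1-score": 0.91}},
}
print(best_by_metric(sample))  # clf_svm
```

The SVM beats logistic regression by only about 0.0014 in weighted F1, so the ranking of those two may change with a different split or random seed.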
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def plot_coef(logistic_reg_model, feature_names, top_n=10):
    # For each class, plot the top_n words with the largest and the top_n with the smallest coefficients
    log_odds = logistic_reg_model.coef_.T  # shape: (n_features, n_classes)
    coef_df = pd.DataFrame(
        log_odds,
        columns=logistic_reg_model.classes_, index=feature_names
    )
    for label in coef_df.columns:
        select_words = (
            coef_df[[label]]
            .sort_values(by=label, ascending=False)
            .iloc[np.r_[0:top_n, -top_n:0]]
        )
        word = select_words.index
        count = select_words[label]
        category_colors = np.where(
            select_words[label] >= 0, "darkseagreen", "rosybrown"
        )  # green for positive, brown for negative coefficients
        fig, ax = plt.subplots(figsize=(8, top_n*0.8))  # canvas size
        plt.rcParams["axes.unicode_minus"] = False
        ax.barh(word, count, color=category_colors)
        ax.invert_yaxis()
        ax.set_title(
            "Coefficients that most increase/decrease the log-odds of the 「" + label + "」 label",
            loc="left",
            size=16,
        )
        ax.set_ylabel("word", size=14)
        ax.set_xlabel("coefficient (log-odds)", size=14)
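Logistic-regression coefficients are on the log-odds scale, so `exp(coef)` is the multiplicative change in the odds for a one-unit increase of that feature (here a TF-IDF weight). Strictly, that odds-ratio reading applies to a binary or one-vs-rest model; in a multinomial model, the difference between two classes' coefficients gives the log odds ratio between those two classes. A small sketch of the same top/bottom selection that `plot_coef` performs, with made-up words and coefficients; `top_and_bottom` is a hypothetical helper:

```python
import math

# Rank one class's features by coefficient, keep the top_n largest and the
# top_n smallest (same order as .iloc[np.r_[0:top_n, -top_n:0]]), and convert
# each coefficient to an odds ratio with exp().
def top_and_bottom(coefs, top_n=2):
    ranked = sorted(coefs.items(), key=lambda kv: kv[1], reverse=True)
    picked = ranked[:top_n] + ranked[-top_n:]
    return [(word, c, math.exp(c)) for word, c in picked]

coefs = {"晶片": 2.0, "製程": 1.2, "面試": 0.1, "時薪": -1.5, "排班": -0.7}  # toy values
for word, coef, odds_ratio in top_and_bottom(coefs):
    print(f"{word}: coef={coef:+.2f}, odds ratio={odds_ratio:.2f}")
```

Negative coefficients map to odds ratios below 1, which is why the brown bars in `plot_coef` mark words that lower the odds of the label.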
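Course: Social Media Analysis
Instructor: Prof. 黃三益
Group: Group_2
Members: M124020028 何允中, M134020016 王予芙, M134020034 黃沛萱, M134020037 陳宥齊, B104020032 翁武麟, M124111057 張伶宣
Data source: ptt
Boards: part-time (打工), Soft_Job (軟體工作), Tech_Job (科技工作)
Number of posts: 16310
Motivation: We want to find out whether discussions about different types of jobs focus on clearly different things, and in particular whether the software-job and tech-job boards are similar to some degree.
Goal: Combine the posts from the three boards and train a model that can uncover their latent topics.
Steps:
Difficulties encountered and how we solved them: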
import time
from functools import reduce
from collections import Counter
from pprint import pprint
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel
from gensim.models.ldamulticore import LdaMulticore
from gensim.matutils import corpus2csc, corpus2dense, Sparse2Corpus
import pyLDAvis
import pyLDAvis.gensim_models
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
parttime = pd.read_csv("raw_data/ptt_parttime.csv")  # load the three boards' data
softjob = pd.read_csv("raw_data/ptt_softjob.csv")
techjob = pd.read_csv("raw_data/ptt_techjob.csv")
data = pd.concat([parttime,softjob,techjob])
data.head(3)
| system_id | artUrl | artTitle | artDate | artPoster | artCatagory | artContent | artComment | e_ip | insertedDate | dataSource |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | https://www.ptt.cc/bbs/part-time/M.1704039923.... | [個人]桃園搬家助手1/1兩位 | 2024-01-01 00:25:21 | snk236 | part_time | 本人同意並願意遵守現行法律、本站使用者條款、本站各級規定、本板所有規範,\n本人願意為本文內... | [] | 114.136.39.28 | 2024-01-01 02:09:00 | ptt |
1 | 2 | https://www.ptt.cc/bbs/part-time/M.1704068173.... | [台北/市調]維他命座談會車馬費1500元 | 2024-01-01 08:16:11 | Portmento | part_time | 本人同意並願意遵守現行法律、本站使用者條款、本站各級規定、本板所有規範,\n本人願意為本文內... | [{"cmtStatus": "噓", "cmtPoster": "GUANGLEI", "... | 125.229.192.37 | 2024-01-02 02:11:21 | ptt |
2 | 3 | https://www.ptt.cc/bbs/part-time/M.1704077270.... | [多區/個人]PPT製作 | 2024-01-01 10:47:48 | bonzi42 | part_time | 本人同意並願意遵守現行法律、本站使用者條款、本站各級規定、本板所有規範,\n本人願意為本文內... | [] | 118.165.136.221 | 2024-01-02 02:11:21 | ptt |
# Remove the board's mandatory notices and posting-template text
warns = ["本人同意並願意遵守現行法律、本站使用者條款、本站各級規定、本板所有規範",
         "本人願意為本文內容負責,並保証本文內容皆詳盡屬實,若違反相關規範,願受處分",
         "誤刪者應至本板使用規範第37條或z-53-3複製範本",
         "提醒:上方二行文字不得刪除或變更",
         "違者文章逕行刪除",
         "本行提醒得Ctrl+Y刪除之",
         "為必填項目,缺項應保留空項目名稱,灰色文字得刪除之",
         "各項均不得為「面議」",
         "本文僅授權發表於PTT實業坊",
         "未經同意不得轉載至其它網站",
         "本人保留一切訴訟權",
         "否則得視情況提出告訴",
         "承攬制等不適用排班、休息之工作者僅填第一項「交件期」,其餘項留空白",
         "一次性工作且未滿四小時者,得將全部資訊填於第一項,其餘項留空白",
         "不定期工作,第一項「工作期」應填「長期」及可開始工作日",
         "第二項「排班方式」應填每週或每月何日出勤(休息),或現場排班等,一次性工作留空",
         "第三、四項「工作時間」「休息時間」得合併至第三項,第四項留空白",
         "第四項「休息時間」、第五項「休息計薪供餐」依實際情形填寫之(第五項擇一)",
         "任一項僅寫「面議」或同義文字者,一律水桶一年並退文"]
for warn in warns:
    # regex=False: match each notice literally ("Ctrl+Y" and "(休息)" contain regex metacharacters)
    data["artContent"] = data["artContent"].str.replace(warn, "", regex=False)
data.head(10)
| system_id | artUrl | artTitle | artDate | artPoster | artCatagory | artContent | artComment | e_ip | insertedDate | dataSource |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | https://www.ptt.cc/bbs/part-time/M.1704039923.... | [個人]桃園搬家助手1/1兩位 | 2024-01-01 00:25:21 | snk236 | part_time | ,\n。\n\n,\n。★,。\n\n,,。\n\n\n工作日期:113.1.1\n每日工作... | [] | 114.136.39.28 | 2024-01-01 02:09:00 | ptt |
1 | 2 | https://www.ptt.cc/bbs/part-time/M.1704068173.... | [台北/市調]維他命座談會車馬費1500元 | 2024-01-01 08:16:11 | Portmento | part_time | ,\n。\n★\n#14961 [台北市] 維他命座談會\n車馬費1500元\nhttps:... | [{"cmtStatus": "噓", "cmtPoster": "GUANGLEI", "... | 125.229.192.37 | 2024-01-02 02:11:21 | ptt |
2 | 3 | https://www.ptt.cc/bbs/part-time/M.1704077270.... | [多區/個人]PPT製作 | 2024-01-01 10:47:48 | bonzi42 | part_time | ,\n。\n,。。\n★\n,,。\n★\n。\n。\n。\n。\n。\n。\n。\n工作或... | [] | 118.165.136.221 | 2024-01-02 02:11:21 | ptt |
3 | 4 | https://www.ptt.cc/bbs/part-time/M.1704078649.... | [多區/個人]網頁Logo設計 | 2024-01-01 11:10:47 | ymd124783 | part_time | ,\n。\n,。。\n★\n,,。\n★\n。\n。\n。\n。\n。\n。\n。\n工作或... | [] | 111.241.75.141 | 2024-01-02 02:11:21 | ptt |
4 | 5 | https://www.ptt.cc/bbs/part-time/M.1704084465.... | [個人]西語電視台徵求攝影師跟拍(學生可) | 2024-01-01 12:47:43 | addisababa | part_time | ,\n。\n工作或交件期:台灣大選(1/13)前後與當日,約三日(實際時間可議)\n預定排班... | [] | 116.98.255.131 | 2024-01-02 02:11:21 | ptt |
5 | 6 | https://www.ptt.cc/bbs/part-time/M.1704085782.... | [台北/個人] | 2024-01-01 13:09:40 | richman888 | part_time | ,\n。\n,。。\n★\n,,。\n★\n。\n。\n。\n。\n。\n。\n。\n工作或... | [] | 223.138.206.210 | 2024-01-02 02:11:21 | ptt |
6 | 7 | https://www.ptt.cc/bbs/part-time/M.1704097191.... | [個人/桃園]幫忙撕除壁貼(僅一面牆) | 2024-01-01 16:19:49 | Madoona | part_time | ,\n。\n,。。\n★\n,,。\n★\n。\n。\n。\n。\n。\n。\n。\n工作或... | [] | 114.140.80.105 | 2024-01-02 02:11:21 | ptt |
7 | 8 | https://www.ptt.cc/bbs/part-time/M.1704101881.... | [台北/一般]北車蛋糕店徵寒假與長期工讀生 | 2024-01-01 17:37:59 | jj5481 | part_time | ,\n。\n工作或交件期:寒假期間\n預定排班方式:輪班制\n每日工作時間:10:00-22... | [] | 118.168.160.165 | 2024-01-02 02:11:23 | ptt |
8 | 9 | https://www.ptt.cc/bbs/part-time/M.1704115238.... | [高雄/個人]高樹1/3-1/4安裝工人2名2500 | 2024-01-01 21:20:36 | cocawowa | part_time | ,\n。\n,。。\n★\n,,。\n★\n。\n。\n。\n。\n。\n。\n。\n工作或... | [{"cmtStatus": "推", "cmtPoster": "a00000763", ... | 36.239.1.221 | 2024-01-02 02:11:23 | ptt |
9 | 10 | https://www.ptt.cc/bbs/part-time/M.1704116155.... | [台北/一般]中山區魔術酒吧PT | 2024-01-01 21:35:53 | seoegg2 | part_time | ,\n。\n,,。\n★\n工作或交件期:長期-周一至周六\n預定排班方式:月排班\n每日工... | [] | NaN | 2024-01-02 02:11:23 | ptt |
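`Series.str.replace` treats the pattern as a regular expression only when `regex=True` (the default changed from `True` to `False` in pandas 2.0), so passing `regex=False` explicitly keeps the notice removal literal on any pandas version. It matters for notices containing regex metacharacters, such as the `+` in "Ctrl+Y". The sketch uses the standard `re` module to show the difference; `re.escape` is the usual fix if regex mode is wanted anyway. The sample text is made up.

```python
import re

text = "本行提醒得Ctrl+Y刪除之\n工作日期"
warning = "本行提醒得Ctrl+Y刪除之"

# As a regex, "l+" means "one or more l", so the literal "+" is never matched.
regex_result = re.sub(warning, "", text)
# Literal replacement (what str.replace(..., regex=False) does) removes it.
literal_result = text.replace(warning, "")
# Escaping the pattern makes regex mode behave like the literal replacement.
escaped_result = re.sub(re.escape(warning), "", text)

print(repr(regex_result))    # '本行提醒得Ctrl+Y刪除之\n工作日期'
print(repr(literal_result))  # '\n工作日期'
```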
# Drop posts with missing content
print(f"Art content na number : {data['artContent'].isna().sum()}")
data.dropna(subset="artContent",inplace=True)
data.reset_index(inplace=True,drop=True)
# Remove URLs (everything from http:// or https:// to the end of that line)
data["artContent"] = data["artContent"].str.replace("(http|https)://.*", "", regex=True)
data["artTitle"] = data["artTitle"].str.replace("(http|https)://.*", "", regex=True)
# Keep only Chinese characters (remove everything else, including letters, digits and symbols)
data["artContent"] = data["artContent"].str.replace("[^\u4e00-\u9fa5]", "", regex=True)
data["artTitle"] = data["artTitle"].str.replace("[^\u4e00-\u9fa5]", "", regex=True)
# Convert dates and tidy up the columns
data['artDate'] = pd.to_datetime(data['artDate'])
data['content'] = data['artContent']
# Keep only the columns we need
data = data.loc[:, ["content", "artUrl", "artCatagory", 'artDate']]
data.head()
Art content na number : 22
| content | artUrl | artCatagory | artDate |
---|---|---|---|---|
0 | 工作日期每日工作時間每日休息時間無休息有無計薪供餐無平常日工資一次應為時薪或日薪不得為月薪計... | https://www.ptt.cc/bbs/part-time/M.1704039923.... | part_time | 2024-01-01 00:25:21 |
1 | 台北市維他命座談會車馬費元民國年次女性目前主要使用的維他命產品為善存不包含銀寶善存克補專科大... | https://www.ptt.cc/bbs/part-time/M.1704068173.... | part_time | 2024-01-01 08:16:11 |
2 | 工作或交件期年早上前將內容檔案傳給你下午前收件預定排班方式每日工作時間每日休息時間工作滿小時... | https://www.ptt.cc/bbs/part-time/M.1704077270.... | part_time | 2024-01-01 10:47:48 |
3 | 工作或交件期晚上以前預定排班方式自行安排每日工作時間自行安排每日休息時間無休息計薪供餐皆無以... | https://www.ptt.cc/bbs/part-time/M.1704078649.... | part_time | 2024-01-01 11:10:47 |
4 | 工作或交件期台灣大選前後與當日約三日實際時間可議預定排班方式無每日工作時間八小時以內視記者採... | https://www.ptt.cc/bbs/part-time/M.1704084465.... | part_time | 2024-01-01 12:47:43 |
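The two cleaning patterns can be checked on a small string with the standard `re` module. Two side effects are worth knowing: `.*` runs to the end of the line, so any Chinese text after a URL on the same line is dropped too, and the range U+4E00 to U+9FA5 covers only the common CJK Unified Ideographs, so characters outside it (for example ideographs from the extension blocks) are removed as well. The sample text and the `clean` helper are made up for illustration.

```python
import re

# Same two patterns as the cleaning cell, applied with the standard re module.
URL_PATTERN = r"(http|https)://.*"  # "." does not match "\n", so only the rest of that line is dropped
NON_CHINESE = r"[^\u4e00-\u9fa5]"   # keep U+4E00..U+9FA5, drop everything else

def clean(text):
    text = re.sub(URL_PATTERN, "", text)  # URLs first, before their letters are stripped
    return re.sub(NON_CHINESE, "", text)

sample = "車馬費1500元\nhttps://example.com/a?b=1 報名\n台北市"
print(clean(sample))  # 車馬費元台北市  ("報名" after the URL is lost)
```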
len(data)
16310
# Use the Traditional Chinese dictionary
jieba.set_dictionary("./dict/dict.txt.big")
# Load stopwords
# jieba.analyse.set_stop_words('./dict/stop_words.txt')  # only affects jieba.analyse.extract_tags
with open("./dict/stop_words.txt", encoding="utf-8") as f:
    stopWords = {line.strip() for line in f}  # a set makes membership checks fast
# Tokenizer function
def getToken(row):
    seg_list = jieba.cut(row, cut_all=False)
    seg_list = [
        w for w in seg_list if w not in stopWords and len(w) > 1
    ]  # drop stopwords and tokens of one character or less
    return seg_list
data["words"] = data["content"].apply(getToken)
data.head()
2025-04-19 13:27:58,807 : DEBUG : Building prefix dict from /Users/wengwulin/Desktop/社群媒體分析/讀書會專案/第二次讀書會專案/dict/dict.txt.big ...
2025-04-19 13:27:58,809 : DEBUG : Loading model from cache /var/folders/tz/hplj27qd26n9qxr1cd83m32c0000gn/T/jieba.ude891b01a373b1226a34d91c1ca0b7f6.cache
2025-04-19 13:27:59,398 : DEBUG : Loading model cost 0.590 seconds.
2025-04-19 13:27:59,399 : DEBUG : Prefix dict has been built successfully.
| content | artUrl | artCatagory | artDate | words |
---|---|---|---|---|---|
0 | 工作日期每日工作時間每日休息時間無休息有無計薪供餐無平常日工資一次應為時薪或日薪不得為月薪計... | https://www.ptt.cc/bbs/part-time/M.1704039923.... | part_time | 2024-01-01 00:25:21 | [工作, 日期, 每日, 工作, 時間, 每日, 休息時間, 休息, 計薪, 供餐, 平常,... |
1 | 台北市維他命座談會車馬費元民國年次女性目前主要使用的維他命產品為善存不包含銀寶善存克補專科大... | https://www.ptt.cc/bbs/part-time/M.1704068173.... | part_time | 2024-01-01 08:16:11 | [台北市, 維他命, 座談會, 車馬費, 民國, 年次, 女性, 目前, 主要, 使用, 維... |
2 | 工作或交件期年早上前將內容檔案傳給你下午前收件預定排班方式每日工作時間每日休息時間工作滿小時... | https://www.ptt.cc/bbs/part-time/M.1704077270.... | part_time | 2024-01-01 10:47:48 | [工作, 交件, 期年, 早上, 前將, 內容, 檔案, 傳給, 下午, 收件, 預定, 排... |
3 | 工作或交件期晚上以前預定排班方式自行安排每日工作時間自行安排每日休息時間無休息計薪供餐皆無以... | https://www.ptt.cc/bbs/part-time/M.1704078649.... | part_time | 2024-01-01 11:10:47 | [工作, 交件, 晚上, 以前, 預定, 排班, 方式, 自行安排, 每日, 工作, 時間,... |
4 | 工作或交件期台灣大選前後與當日約三日實際時間可議預定排班方式無每日工作時間八小時以內視記者採... | https://www.ptt.cc/bbs/part-time/M.1704084465.... | part_time | 2024-01-01 12:47:43 | [工作, 交件, 期台灣, 大選, 當日, 三日, 實際, 時間, 可議, 預定, 排班, ... |
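`getToken` checks every token against the stopword collection; storing the stopwords in a set makes each lookup constant time on average instead of a scan over the whole list. A sketch of just the filtering step, assuming jieba has already produced the tokens; `filter_tokens` and the two-word stopword set are hypothetical:

```python
# Toy stopword set; the notebook loads its list from ./dict/stop_words.txt.
STOP_WORDS = {"我們", "的"}

def filter_tokens(tokens, stop_set=STOP_WORDS):
    """Keep tokens that are not stopwords and are longer than one character."""
    return [w for w in tokens if w not in stop_set and len(w) > 1]

tokens = ["工作", "的", "日期", "每日", "我們", "工作", "a"]
print(filter_tokens(tokens))  # ['工作', '日期', '每日', '工作']
```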
data['artCatagory'].unique()
array(['part_time', 'Soft_Job', 'Tech_Job'], dtype=object)
## Define topics: the 100 most frequent words on the part_time board
part_time = data.loc[data['artCatagory'] == 'part_time',:]['words'].explode().value_counts().head(100)
part_time.index
Index(['工作', '方式', '推定', '內容', '聯絡', '砍除', '空白', '情形', '單位', '應徵', '資訊', '第一項', '國定假日', '工資', '文字', '未註明', '聯絡人', '發薪日', '小時', '依法', '連結', '水桶', '表示', '承攬', '分類', '時間', '簡介', '一律', '刪除', '特殊', '物品', '標題', '內信', '形式', '電子郵件', '回覆', '中文', '行動', '市內電話', '地點', '現領', '每日', '規定', '報酬', '條件', '以上', '休息', '同義', '第二項', '日時', '工時', '應徵者', '延長', '是否', '勞健', '注意', '法定', '此行', '基本工資', '徵才', '通知', '項目', '縣市', '相同', '全名', '變更', '標籤', '開頭', '代取', '表單', '一年', '地址', '帳號', '網站', '自然人', '僅有', '人力資源', '第二', '一次性', '三項', '職缺', '諸如', '七日', '星期六', '載明', '第五項', '退文', '再有', '公司', '人數', '保勞退', '排班', '學校', '電話', '面試', '計薪', '提供', '需求', '亦可', '供餐'], dtype='object', name='words')
## Define topics: the 100 most frequent words on the Soft_Job board
soft_job = data.loc[data['artCatagory'] == 'Soft_Job',:]['words'].explode().value_counts().head(100)
soft_job.index
Index(['公司', '工作', '面試', '問題', '時間', '開發', '工程師', '經驗', '目前', '比較', '技術', '軟體', '覺得', '團隊', '程式', '知道', '需要', '相關', '產品', '一些', '使用', '薪資', '能力', '系統', '職缺', '方式', '內容', '真的', '應該', '資料', '台灣', '專案', '現在', '主管', '一下', '學習', '主要', '是否', '直接', '東西', '資訊', '年薪', '員工', '希望', '機會', '分享', '履歷', '小時', '提供', '最後', '之後', '熟悉', '設計', '工具', '前端', '語言', '薪水', '以上', '需求', '已經', '建議', '介紹', '討論', '環境', '服務', '看到', '流程', '感覺', '功能', '經歷', '未來', '興趣', '英文', '了解', '要求', '部分', '處理', '一定', '面試官', '課程', '進行', '測試', '科技', '加班費', '工時', '網站', '畢業', '重要', '準備', '最近', '小弟', '來說', '選擇', '簡單', '一點', '前輩', '管理', '新創', '領域', '架構'], dtype='object', name='words')
## Define topics: the 100 most frequent words on the Tech_Job board
tech_job = data.loc[data['artCatagory'] == 'Tech_Job',:]['words'].explode().value_counts().head(100)
tech_job.index
Index(['公司', '台灣', '美國', '工作', '晶片', '表示', '員工', '半導體', '科技', '台積電', '中國', '技術', '產業', '報導', '台積', '工程師', '英特爾', '市場', '目前', '全球', '發展', '未來', '問題', '企業', '投資', '積電', '指出', '時間', '現在', '相關', '今年', '產品', '製程', '需要', '主管', '提供', '合作', '輝達', '客戶', '先進', '億美元', '製造', '已經', '日本', '記者', '認為', '生產', '設計', '進行', '領域', '需求', '知道', '人才', '薪資', '面試', '去年', '研發', '模型', '研究', '資料', '真的', '使用', '應該', '影響', '包括', '成長', '能力', '蘋果', '機會', '比較', '主要', '系統', '設備', '持續', '中心', '人工智慧', '開發', '億元', '重要', '希望', '政府', '仁勳', '三星', '預計', '超過', '計畫', '年薪', '透過', '宣布', '台北', '直接', '執行長', '國家', '董事長', '代工', '應用', '覺得', '大學', '內容', '業務'], dtype='object', name='words')
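`.explode().value_counts().head(100)` flattens every post's token list and counts each token across the whole board. A pure-Python equivalent with `collections.Counter`, on toy posts (`top_words` is a hypothetical helper); words with equal counts may come out in a different order than `value_counts` gives.

```python
from collections import Counter

# Count every token across all posts of one board, then keep the k most frequent.
def top_words(token_lists, k=100):
    counts = Counter(token for tokens in token_lists for token in tokens)
    return [word for word, _ in counts.most_common(k)]

posts = [["公司", "面試", "工作"], ["公司", "工作"], ["公司", "薪資"]]
print(top_words(posts, k=2))  # ['公司', '工作']
```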
custom_topic_word = {
"打工": [
"工作", "方式", "推定", "內容", "聯絡", "砍除", "空白", "情形", "單位", "應徵", "資訊", "第一項",
"國定假日", "工資", "文字", "未註明", "聯絡人", "發薪日", "小時", "依法", "連結", "水桶", "表示",
"承攬", "分類", "時間", "簡介", "一律", "刪除", "特殊", "物品", "標題", "內信", "形式",
"電子郵件", "回覆", "中文", "行動", "市內電話", "地點", "現領", "每日", "規定", "報酬", "條件",
"以上", "休息", "同義", "第二項", "日時", "工時", "應徵者", "延長", "是否", "勞健", "注意",
"法定", "此行", "基本工資", "徵才", "通知", "項目", "縣市", "相同", "全名", "變更", "標籤",
"開頭", "代取", "表單", "一年", "地址", "帳號", "網站", "自然人", "僅有", "人力資源", "第二",
"一次性", "三項", "職缺", "諸如", "七日", "星期六", "載明", "第五項", "退文", "再有", "公司",
"人數", "保勞退", "排班", "學校", "電話", "面試", "計薪", "提供", "需求", "亦可", "供餐"
],
"軟體工作": [
"問題", "開發", "工程師", "經驗", "目前", "比較", "技術", "軟體", "覺得", "團隊", "程式", "知道",
"需要", "相關", "產品", "一些", "使用", "薪資", "能力", "系統", "方式", "真的", "應該", "資料",
"台灣", "專案", "現在", "主管", "一下", "學習", "主要", "直接", "東西", "年薪", "員工", "希望",
"機會", "分享", "履歷", "最後", "之後", "熟悉", "設計", "工具", "前端", "語言", "薪水", "已經",
"建議", "介紹", "討論", "環境", "流程", "感覺", "功能", "經歷", "未來", "興趣", "英文", "了解",
"要求", "部分", "處理", "一定", "面試官", "課程", "進行", "測試", "科技", "加班費", "畢業",
"重要", "準備", "最近", "選擇", "簡單", "一點", "前輩", "管理", "新創", "領域", "架構"
],
"科技業": [
"晶片", "台積電", "中國", "英特爾", "市場", "全球", "發展", "企業", "投資", "積電", "指出", "今年",
"製程", "合作", "輝達", "客戶", "先進", "億美元", "製造", "日本", "記者", "認為", "生產", "人工智慧",
"去年", "研發", "模型", "研究", "影響", "包括", "成長", "蘋果", "設備", "持續", "中心", "億元", "政府",
"仁勳", "三星", "預計", "超過", "計畫", "透過", "宣布", "台北", "執行長", "國家", "董事長", "代工",
"應用", "大學"
],
}
vocabularies = np.unique(reduce(lambda x, y: x + y, custom_topic_word.values()))
vocabularies
array(['一下', '一些', '一定', '一年', '一律', '一次性', '一點', '七日', '三星', '三項', '中國', '中心', '中文', '主管', '主要', '之後', '了解', '亦可', '人力資源', '人工智慧', '人數', '仁勳', '今年', '介紹', '代取', '代工', '以上', '企業', '休息', '使用', '供餐', '依法', '保勞退', '僅有', '億元', '億美元', '先進', '內信', '內容', '全名', '全球', '公司', '再有', '分享', '分類', '刪除', '前端', '前輩', '功能', '加班費', '勞健', '包括', '去年', '台北', '台灣', '台積電', '合作', '同義', '員工', '問題', '單位', '回覆', '國定假日', '國家', '團隊', '地址', '地點', '執行長', '基本工資', '報酬', '大學', '學校', '學習', '客戶', '宣布', '專案', '小時', '履歷', '工作', '工具', '工時', '工程師', '工資', '已經', '市內電話', '市場', '希望', '帳號', '年薪', '延長', '建議', '形式', '影響', '徵才', '情形', '感覺', '應徵', '應徵者', '應用', '應該', '成長', '承攬', '技術', '投資', '持續', '指出', '排班', '推定', '提供', '政府', '文字', '新創', '方式', '日時', '日本', '星期六', '是否', '時間', '晶片', '最後', '最近', '未來', '未註明', '東西', '架構', '條件', '標籤', '標題', '模型', '機會', '此行', '每日', '比較', '水桶', '法定', '注意', '流程', '測試', '準備', '熟悉', '物品', '特殊', '現在', '現領', '環境', '生產', '產品', '畢業', '發展', '發薪日', '目前', '直接', '相同', '相關', '真的', '知道', '砍除', '研發', '研究', '科技', '程式', '積電', '空白', '第一項', '第二', '第二項', '第五項', '管理', '簡介', '簡單', '系統', '經歷', '經驗', '網站', '縣市', '聯絡', '聯絡人', '職缺', '能力', '自然人', '興趣', '英文', '英特爾', '董事長', '薪水', '薪資', '蘋果', '處理', '行動', '表單', '表示', '製程', '製造', '要求', '規定', '覺得', '計畫', '計薪', '討論', '記者', '設備', '設計', '認為', '語言', '課程', '諸如', '變更', '資料', '資訊', '超過', '軟體', '載明', '輝達', '退文', '透過', '通知', '連結', '進行', '選擇', '部分', '重要', '開發', '開頭', '電子郵件', '電話', '需求', '需要', '面試', '面試官', '項目', '預計', '領域'], dtype='<U4')
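`reduce(lambda x, y: x + y, ...)` concatenates the three word lists and `np.unique` de-duplicates and sorts them. The same vocabulary can be built with `itertools.chain` and `sorted(set(...))`; Python sorts strings by code point, which as far as I know matches NumPy's ordering for these strings. `build_vocabulary` and the toy topics are hypothetical.

```python
from itertools import chain

# Flatten all topic word lists once, de-duplicate, and sort by code point.
def build_vocabulary(topic_words):
    return sorted(set(chain.from_iterable(topic_words.values())))

toy_topics = {"打工": ["工作", "排班", "公司"], "軟體工作": ["開發", "工作"]}
print(build_vocabulary(toy_topics))  # ['公司', '工作', '排班', '開發']
```

Because the sort is by code point rather than by pronunciation or stroke count, 公司 (U+516C) comes before 工作 (U+5DE5), which is also their order in the array above.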
data_corpus = data['words'].map(" ".join)
vectorizer = CountVectorizer(vocabulary=vocabularies)
data_matrix = vectorizer.fit_transform(data_corpus)
data_matrix = data_matrix.toarray()
feature_names = vectorizer.get_feature_names_out()
DTM_df = pd.DataFrame(columns = feature_names, data = data_matrix)
DTM_df
| 一下 | 一些 | 一定 | 一年 | 一律 | 一次性 | 一點 | 七日 | 三星 | 三項 | ... | 開頭 | 電子郵件 | 電話 | 需求 | 需要 | 面試 | 面試官 | 項目 | 預計 | 領域 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 2 | 3 | 2 | 0 | 2 | 0 | 2 | ... | 2 | 3 | 0 | 1 | 1 | 1 | 0 | 2 | 0 | 0 |
3 | 1 | 1 | 0 | 2 | 3 | 2 | 0 | 2 | 0 | 2 | ... | 2 | 3 | 0 | 1 | 2 | 1 | 0 | 2 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
16305 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
16306 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
16307 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
16308 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
16309 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 4 | 0 |
16310 rows × 232 columns
df_count = pd.DataFrame({})
# k is the topic name, v is the list of words for that topic
for k, v in custom_topic_word.items():
    idx = np.isin(
        feature_names,
        v
    )  # boolean mask over the DTM columns that belong to topic k
    df_count[f'topic_{k}'] = data_matrix[:, idx].sum(axis=1)
df_count
| topic_打工 | topic_軟體工作 | topic_科技業 |
---|---|---|---|
0 | 51 | 1 | 0 |
1 | 25 | 16 | 3 |
2 | 280 | 13 | 0 |
3 | 266 | 18 | 0 |
4 | 69 | 12 | 5 |
... | ... | ... | ... |
16305 | 5 | 12 | 15 |
16306 | 2 | 4 | 2 |
16307 | 1 | 1 | 11 |
16308 | 2 | 19 | 23 |
16309 | 3 | 12 | 15 |
16310 rows × 3 columns
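For each topic, `np.isin(feature_names, v)` builds a boolean mask over the DTM columns, and summing those columns gives the topic's count for every post. Per document this is just a sum over the topic's words, so a word listed under two topics (方式 appears in both the 打工 and 軟體工作 lists) adds to both scores. A sketch for one toy document; `topic_scores`, `toy_topics` and `doc_counts` are hypothetical:

```python
# Sum the counts of the words that belong to each topic for a single document.
# `doc_counts` stands in for one row of DTM_df (word -> count).
def topic_scores(doc_counts, topic_words):
    return {
        f"topic_{topic}": sum(doc_counts.get(word, 0) for word in words)
        for topic, words in topic_words.items()
    }

toy_topics = {"打工": ["排班", "時薪"], "科技業": ["晶片", "製程"]}
doc_counts = {"排班": 2, "時薪": 1, "晶片": 4}
print(topic_scores(doc_counts, toy_topics))  # {'topic_打工': 3, 'topic_科技業': 4}
```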
Dividing each row by its total gives the relative weight of each topic in every post (the theta vector):
thetas = df_count.div(
df_count.sum(axis=1),
axis=0
)
thetas.head()
| topic_打工 | topic_軟體工作 | topic_科技業 |
---|---|---|---|
0 | 0.980769 | 0.019231 | 0.000000 |
1 | 0.568182 | 0.363636 | 0.068182 |
2 | 0.955631 | 0.044369 | 0.000000 |
3 | 0.936620 | 0.063380 | 0.000000 |
4 | 0.802326 | 0.139535 | 0.058140 |
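Row normalisation divides each count by the post's total, so every theta row sums to 1. If a post contains none of the topic words, `df_count.div(...)` computes 0/0 and leaves NaN in that row; the NumPy sketch below instead leaves such rows as zeros, which is an assumption of this sketch rather than what the notebook does. The first two input rows are the counts of posts 0 and 1 above; `to_thetas` is a hypothetical helper.

```python
import numpy as np

# Row-normalise topic counts into theta vectors, writing 0 instead of NaN
# for rows whose total is zero.
def to_thetas(counts):
    counts = np.asarray(counts, dtype=float)
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

counts = [[51, 1, 0], [25, 16, 3], [0, 0, 0]]
print(to_thetas(counts).round(6))  # rows 0 and 1 agree with thetas.head()
```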
Convert data['words'] into a list of token lists:
docs = data['words'].to_list()
docs[0]
['工作', '日期', '每日', '工作', '時間', '每日', '休息時間', '休息', '計薪', '供餐', '平常', '工資', '一次', '應為', '時薪', '日薪', '月薪', '計件', '制者', '註明', '常人', '平均', '每件', '需工', '國定假日', '工資', '無應', '約定', '之倍', '以上', '補休', '留空', '視為', '依法', '規定', '延長', '工時', '工資', '小時', '至少', '上開', '約定', '之倍', '小時', '留空', '視為', '依法', '規定', '勞健', '保勞退', '依法', '規定', '承攬', '制之', '工作', '應填', '依法', '規定', '工資', '發放', '工作', '現領', '工作', '內容', '工作', '地點', '填寫', '完整', '地址', '戶政', '最小', '單位', '相對', '位置', '外包', '承攬', '制得', '留空', '工作', '地點', '私人', '住宅', '得僅', '縣市', '鄉鎮', '市區', '勞務', '內容', '確實', '填寫', '含糊', '工作', '地點', '桃園', '新生路', '永安', '路口', '電梯', '大樓', '勞務', '內容', '兩位', '貨車', '標準', '雙人', '床上', '墊下', '電梯', '大樓', '六樓', '主臥室', '主臥', '標準', '雙人', '床上', '墊下', '墊移', '隔壁', '原有', '兩張', '標準', '雙人', '電梯', '搬到', '樓門口', '聯絡人', '李先生', '聯絡', '方式', '回覆', '應徵者', '僅回', '錄取']
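dictionary = Dictionary(docs)  # map every token to an integer id
dictionary.filter_extremes(no_below=5, no_above=0.99)  # keep tokens found in at least 5 posts and in at most 99% of posts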
print(dictionary)
2025-04-19 16:02:48,857 : INFO : adding document #0 to Dictionary<0 unique tokens: []> 2025-04-19 16:02:49,742 : INFO : adding document #10000 to Dictionary<58700 unique tokens: ['一次', '上開', '主臥', '主臥室', '之倍']...> 2025-04-19 16:02:50,107 : INFO : built Dictionary<101854 unique tokens: ['一次', '上開', '主臥', '主臥室', '之倍']...> from 16310 documents (total 3597554 corpus positions) 2025-04-19 16:02:50,107 : INFO : Dictionary lifecycle event {'msg': "built Dictionary<101854 unique tokens: ['一次', '上開', '主臥', '主臥室', '之倍']...> from 16310 documents (total 3597554 corpus positions)", 'datetime': '2025-04-19T16:02:50.107747', 'gensim': '4.3.3', 'python': '3.11.2 (main, Apr 21 2023, 22:51:21) [Clang 14.0.3 (clang-1403.0.22.14.1)]', 'platform': 'macOS-15.3.2-arm64-arm-64bit', 'event': 'created'} 2025-04-19 16:02:50,143 : INFO : discarding 78593 tokens: [('主臥', 2), ('主臥室', 3), ('墊移', 1), ('新生路', 4), ('存克補', 1), ('為善存', 1), ('銀寶善', 1), ('具短', 1), ('每頁', 1), ('進投', 1)]... 2025-04-19 16:02:50,144 : INFO : keeping 23261 tokens which were in no less than 5 and no more than 16146 (=99.0%) documents 2025-04-19 16:02:50,168 : INFO : resulting dictionary: Dictionary<23261 unique tokens: ['一次', '上開', '之倍', '以上', '休息']...>
Dictionary<23261 unique tokens: ['一次', '上開', '之倍', '以上', '休息']...>
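The log line "keeping 23261 tokens which were in no less than 5 and no more than 16146 (=99.0%) documents" shows the rule at work: a token survives if its document frequency is at least `no_below` and at most int(`no_above` × number of documents), and int(0.99 × 16310) = 16146. A pure-Python sketch of that rule; `filter_by_doc_freq` is hypothetical, and gensim additionally applies a `keep_n` cap (default 100000) and re-assigns token ids, which this sketch skips.

```python
from collections import Counter

# Keep tokens whose document frequency lies in [no_below, int(no_above * n_docs)].
def filter_by_doc_freq(docs, no_below=5, no_above=0.99):
    doc_freq = Counter(token for doc in docs for token in set(doc))  # each token once per document
    max_docs = int(no_above * len(docs))
    return {tok for tok, df in doc_freq.items() if no_below <= df <= max_docs}

docs = [["工作", "公司"], ["工作", "面試"], ["工作", "公司"], ["工作"]]
print(sorted(filter_by_doc_freq(docs, no_below=2, no_above=0.75)))  # ['公司']
```

Here 工作 is dropped for appearing in all 4 posts (more than int(0.75 × 4) = 3) and 面試 for appearing in only 1.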
for idx, (k, v) in enumerate(dictionary.token2id.items()):
    print(f"{k}: {v}")
    if idx > 10:
        break
一次: 0 上開: 1 之倍: 2 以上: 3 休息: 4 休息時間: 5 位置: 6 住宅: 7 供餐: 8 依法: 9 保勞退: 10 僅回: 11
pprint(" ".join(data['words'].iloc[600]))
('台北市 優格 口味 測試 調查 車馬費 元歲 女性 一週 至少 次優 原味 風味 優格 舉辦 時間 四五 調查 時間 小時 配合 填寫 一週 優格 實用 ' '紀錄 照片 簡單 文字說明 舉辦 地點 台北市 南京東路 光復 北路 交叉口 台北 巨蛋 參與 報酬 車馬費 報名 活動 網頁 市調 活動 資訊網 ' '單位名稱 永光 資訊 多媒體 有限公司 單位地址 台北市 山區 久康 街號 活動 地點 自行 前往 負責人 思允 台北市 優格 口味 測試 調查 車馬費 ' '聯絡人 姓氏 陳小姐 電話 聯絡 是否 回信 報名者 報名 自行 留意 網頁 公告 通知 方式 電話 通知 注意事項 公開 招募 列出 必要條件 詳細 ' '參加者 條件 配額 將以 報名 資料 進行 篩選 初步 符合 進行 電話 過濾 訪談 分鐘 合格者 邀請 參加 承上 報名 資料 符合條件 配額 額滿將 ' '另行通知 避免 受訪者 臨時 缺席 影響 活動 行將 邀請 超額 人數 實際 到場 人數 多於 人數 入場 將會 提供 部分 車馬費 報名 資料 電話 ' '過濾 訪談 內容 僅供 篩選 使用 正式 訪問 內容 用於 研究 分析 所有 活動 保證 錄取 能夠 接受者 報名 簡介 常見問題 聯絡 參加 心得 ' '知名 部落 介紹')
# Build a bag-of-words representation as the document features
# gensim's LdaModel expects the corpus as bag-of-words vectors
corpus = [dictionary.doc2bow(doc) for doc in docs]
ldamodel = LdaModel(
    corpus=corpus,
    id2word=dictionary,  # dictionary (id -> token)
    num_topics=10,  # number of topics to learn
    random_state=2024,  # random seed
)
2025-04-19 00:05:49,364 : INFO : using symmetric alpha at 0.1 2025-04-19 00:05:49,365 : INFO : using symmetric eta at 0.1 2025-04-19 00:05:49,370 : INFO : using serial LDA version on this node 2025-04-19 00:05:49,392 : INFO : running online (single-pass) LDA training, 10 topics, 1 passes over the supplied corpus of 16310 documents, updating model once every 2000 documents, evaluating perplexity every 16310 documents, iterating 50x with a convergence threshold of 0.001000 2025-04-19 00:05:49,392 : WARNING : too few updates, training might not converge; consider increasing the number of passes or iterations to improve accuracy 2025-04-19 00:05:49,393 : INFO : PROGRESS: pass 0, at document #2000/16310 2025-04-19 00:05:50,062 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:05:50,070 : INFO : topic #1 (0.100): 0.030*"工作" + 0.015*"方式" + 0.012*"推定" + 0.012*"砍除" + 0.012*"內容" + 0.011*"文字" + 0.011*"國定假日" + 0.011*"應徵" + 0.010*"資訊" + 0.010*"聯絡" 2025-04-19 00:05:50,070 : INFO : topic #4 (0.100): 0.025*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.012*"方式" + 0.012*"第一項" + 0.012*"內容" + 0.011*"空白" + 0.011*"資訊" + 0.011*"單位" + 0.011*"聯絡" 2025-04-19 00:05:50,071 : INFO : topic #8 (0.100): 0.034*"工作" + 0.016*"方式" + 0.013*"聯絡" + 0.011*"推定" + 0.010*"情形" + 0.009*"內容" + 0.009*"空白" + 0.009*"聯絡人" + 0.009*"小時" + 0.009*"單位" 2025-04-19 00:05:50,071 : INFO : topic #0 (0.100): 0.037*"工作" + 0.018*"推定" + 0.014*"內容" + 0.011*"砍除" + 0.011*"第一項" + 0.010*"方式" + 0.010*"空白" + 0.010*"水桶" + 0.010*"未註明" + 0.010*"單位" 2025-04-19 00:05:50,072 : INFO : topic #6 (0.100): 0.032*"工作" + 0.015*"聯絡" + 0.015*"內容" + 0.015*"方式" + 0.011*"應徵" + 0.011*"國定假日" + 0.010*"情形" + 0.010*"砍除" + 0.009*"推定" + 0.009*"工資" 2025-04-19 00:05:50,072 : INFO : topic diff=9.386298, rho=1.000000 2025-04-19 00:05:50,074 : INFO : PROGRESS: pass 0, at document #4000/16310 2025-04-19 00:05:50,709 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:05:50,713 : INFO : topic #6 
(0.100): 0.033*"工作" + 0.016*"方式" + 0.014*"內容" + 0.014*"聯絡" + 0.011*"應徵" + 0.010*"工資" + 0.010*"國定假日" + 0.009*"聯絡人" + 0.008*"情形" + 0.008*"資訊" 2025-04-19 00:05:50,714 : INFO : topic #0 (0.100): 0.039*"工作" + 0.018*"推定" + 0.014*"內容" + 0.012*"第一項" + 0.012*"砍除" + 0.012*"空白" + 0.011*"水桶" + 0.011*"方式" + 0.010*"應徵" + 0.010*"承攬" 2025-04-19 00:05:50,715 : INFO : topic #8 (0.100): 0.031*"工作" + 0.018*"方式" + 0.011*"聯絡" + 0.011*"時間" + 0.011*"小時" + 0.010*"內容" + 0.010*"推定" + 0.010*"報名" + 0.009*"電話" + 0.008*"單位" 2025-04-19 00:05:50,716 : INFO : topic #1 (0.100): 0.028*"工作" + 0.015*"方式" + 0.012*"內容" + 0.012*"文字" + 0.011*"砍除" + 0.010*"推定" + 0.010*"應徵" + 0.010*"聯絡" + 0.009*"資訊" + 0.009*"國定假日" 2025-04-19 00:05:50,716 : INFO : topic #7 (0.100): 0.037*"工作" + 0.013*"情形" + 0.012*"單位" + 0.012*"推定" + 0.012*"空白" + 0.012*"資訊" + 0.012*"方式" + 0.012*"砍除" + 0.011*"第一項" + 0.010*"應徵" 2025-04-19 00:05:50,717 : INFO : topic diff=0.658289, rho=0.707107 2025-04-19 00:05:50,718 : INFO : PROGRESS: pass 0, at document #6000/16310 2025-04-19 00:05:51,299 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:05:51,303 : INFO : topic #3 (0.100): 0.025*"工作" + 0.014*"推定" + 0.014*"國定假日" + 0.014*"方式" + 0.013*"聯絡" + 0.013*"砍除" + 0.012*"空白" + 0.011*"內容" + 0.011*"情形" + 0.011*"資訊" 2025-04-19 00:05:51,304 : INFO : topic #8 (0.100): 0.024*"工作" + 0.015*"方式" + 0.013*"報名" + 0.012*"時間" + 0.011*"電話" + 0.011*"活動" + 0.010*"聯絡" + 0.010*"內容" + 0.010*"小時" + 0.009*"台北市" 2025-04-19 00:05:51,304 : INFO : topic #2 (0.100): 0.028*"工作" + 0.012*"第一項" + 0.012*"方式" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"推定" + 0.011*"砍除" + 0.010*"資訊" + 0.010*"依法" + 0.009*"應徵" 2025-04-19 00:05:51,305 : INFO : topic #5 (0.100): 0.035*"工作" + 0.011*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"推定" + 0.011*"空白" + 0.011*"文字" + 0.010*"應徵" + 0.009*"國定假日" + 0.009*"內容" 2025-04-19 00:05:51,305 : INFO : topic #7 (0.100): 0.038*"工作" + 0.014*"情形" + 0.012*"空白" + 0.012*"單位" + 0.012*"推定" + 0.012*"資訊" + 0.012*"方式" + 0.012*"砍除" + 0.011*"第一項" + 
0.010*"應徵"
... (gensim INFO progress lines omitted: PROGRESS / merging / topic / topic diff entries for pass 0, documents #8000 to #16310)
2025-04-19 00:05:53,024 : INFO : -10.214 per-word bound, 1187.4 perplexity estimate based on a held-out corpus of 310 documents with 43091 words
2025-04-19 00:05:53,068 : INFO : topic #1 (0.100): 0.019*"公司" + 0.017*"承諾" + 0.015*"工作" + 0.011*"資深" + 0.010*"通勤" + 0.010*"厲害" + 0.009*"通用" + 0.009*"缺點" + 0.008*"責任" + 0.008*"資安"
2025-04-19 00:05:53,068 : INFO : topic #8 (0.100): 0.015*"半導體" + 0.008*"工作" + 0.007*"公司" + 0.006*"時間" + 0.006*"進行" + 0.006*"研究" + 0.006*"蘋果" + 0.006*"模型" + 0.005*"影響" + 0.005*"目前"
2025-04-19 00:05:53,069 : INFO : topic #7 (0.100): 0.036*"工作" + 0.013*"情形" + 0.013*"單位" + 0.012*"空白" + 0.011*"資訊" + 0.011*"推定" + 0.011*"方式" + 0.011*"砍除" + 0.011*"第一項" + 0.010*"應徵"
2025-04-19 00:05:53,069 : INFO : topic #4 (0.100): 0.114*"研發" + 0.114*"製程" + 0.032*"職場" + 0.017*"半導體" + 0.016*"表示" + 0.016*"工作" + 0.013*"資工" + 0.010*"半導體業" + 0.010*"聯電" + 0.010*"光電"
2025-04-19 00:05:53,070 : INFO : topic #2 (0.100): 0.022*"工作" + 0.013*"尾牙" + 0.009*"第一項" + 0.009*"方式" + 0.009*"內容" + 0.009*"工資" + 0.008*"聯絡" + 0.008*"依法" + 0.008*"推定" + 0.008*"砍除"
2025-04-19 00:05:53,071 : INFO : LdaModel lifecycle event {'msg': 'trained LdaModel<num_terms=23261, num_topics=10, decay=0.5, chunksize=2000> in 3.68s', 'datetime': '2025-04-19T00:05:53.071124', 'gensim': '4.3.3', 'python': '3.11.2 (main, Apr 21 2023, 22:51:21) [Clang 14.0.3 (clang-1403.0.22.14.1)]', 'platform': 'macOS-15.3.2-arm64-arm-64bit', 'event': 'created'}
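The training log pairs a per-word bound with a "perplexity estimate" (for the 10-topic run, -10.214 with 1187.4). Those pairs are consistent with gensim printing 2 ** (-bound), whereas the topic-count loop in the next cell stores np.exp(-bound). The sketch below recomputes the logged values from the bounds copied out of this notebook's logs and checks that both conversions rank models identically, since each is a decreasing function of the bound. The helper names are hypothetical and not part of gensim.

```python
import math


def gensim_logged_perplexity(per_word_bound: float) -> float:
    """Perplexity as it appears in gensim's INFO line: 2 ** (-bound)."""
    return 2.0 ** (-per_word_bound)


def natural_perplexity(per_word_bound: float) -> float:
    """The value the topic-count loop stores: exp(-bound)."""
    return math.exp(-per_word_bound)


# Per-word bounds copied from this notebook's gensim logs
# (10-topic run, 2-topic run after 5 passes, 2-topic run on the full corpus).
logged_bounds = [-10.214, -8.447, -7.158]

for bound in logged_bounds:
    print(bound,
          round(gensim_logged_perplexity(bound), 1),
          round(natural_perplexity(bound), 1))

# Both conversions are monotone in the bound, so they order models the same way.
same_order = (sorted(logged_bounds, key=gensim_logged_perplexity)
              == sorted(logged_bounds, key=natural_perplexity))
print("same ranking:", same_order)
```

Because the ordering is preserved, selecting a topic count from the stored `perplexity` column gives the same answer as reading the INFO log. Only the absolute numbers differ.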
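import time
import numpy as np
from gensim.models import LdaModel, CoherenceModel

# Train one LDA model per candidate topic count and record perplexity and coherence.
# `corpus` (bag-of-words), `dictionary` (gensim Dictionary) and `docs` (tokenized
# documents) come from earlier cells.
t0 = time.time()
topic_num_list = np.arange(2, 10)  # candidate topic counts: 2, 3, ..., 9
result = {"topic_num": [], "perplexity": [], "pmi": []}
model_set = dict()  # uncomment the assignment below to keep every trained model
for topic_num in topic_num_list:
    # perplexity
    model = LdaModel(
        corpus=corpus,
        num_topics=topic_num,
        id2word=dictionary,
        random_state=1500,  # fixed seed for reproducible runs
        passes=5,  # number of training passes over the corpus
    )
    # per-word likelihood bound (negative; closer to 0 is better),
    # evaluated on the training corpus itself rather than a held-out set
    loss = model.log_perplexity(corpus)
    # 'c_npmi' is normalized PMI coherence (stored under the key "pmi"); higher is better
    pmi = CoherenceModel(model=model, texts=docs, coherence='c_npmi').get_coherence()
    # exp(-bound); gensim's own INFO line prints 2 ** (-bound) instead,
    # so the stored values differ from the log but rank the models identically
    perplexity = np.exp(-1. * loss)
    # model_set[f'k_{topic_num}'] = model
    result['topic_num'].append(topic_num)
    result['perplexity'].append(perplexity)
    result['pmi'].append(pmi)
print(f"花費時間: {time.time() - t0} sec")  # total elapsed time in seconds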
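The loop scores each model with gensim's `c_npmi` coherence: for pairs of a topic's top words it compares how often the words co-occur with how often they would co-occur by chance, then normalizes the result to roughly [-1, 1]. The sketch below illustrates that idea under a stated simplification: probabilities come from whole-document co-occurrence, whereas gensim estimates them from sliding windows (the log above mentions "sliding windows"). So its numbers will not match `get_coherence()`. The functions `npmi` and `topic_npmi` and the toy documents are hypothetical.

```python
import math
from itertools import combinations


def npmi(docs, w1, w2, eps=1e-12):
    """Simplified NPMI of two words, using document-level co-occurrence.

    Assumes both words appear in at least one document.
    """
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)
    p1 = sum(w1 in s for s in doc_sets) / n
    p2 = sum(w2 in s for s in doc_sets) / n
    p12 = sum(w1 in s and w2 in s for s in doc_sets) / n
    if p1 == 0 or p2 == 0:
        raise ValueError("both words must appear in the corpus")
    pmi = math.log((p12 + eps) / (p1 * p2))
    # Normalize by -log P(w1, w2) so the score falls roughly in [-1, 1]
    return pmi / -math.log(p12 + eps)


def topic_npmi(docs, top_words):
    """Mean NPMI over all pairs of a topic's top words (illustrative only)."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi(docs, a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy documents, not data from this notebook
toy_docs = [["研發", "製程"], ["研發", "製程"], ["尾牙"], ["尾牙", "工作"]]
print(round(npmi(toy_docs, "研發", "製程"), 3))  # words that always co-occur
print(round(npmi(toy_docs, "研發", "尾牙"), 3))  # words that never co-occur
print(round(topic_npmi(toy_docs, ["研發", "製程", "尾牙"]), 3))
```

A topic whose top words tend to appear together scores close to 1, while words that never share a context pull the average toward -1. This is why a higher `pmi` value in `result` suggests more interpretable topics.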
2025-04-19 00:07:46,705 : INFO : using symmetric alpha at 0.5
2025-04-19 00:07:46,706 : INFO : using symmetric eta at 0.5
2025-04-19 00:07:46,709 : INFO : using serial LDA version on this node
2025-04-19 00:07:46,713 : INFO : running online (multi-pass) LDA training, 2 topics, 5 passes over the supplied corpus of 16310 documents, updating model once every 2000 documents, evaluating perplexity every 16310 documents, iterating 50x with a convergence threshold of 0.001000
... (gensim INFO progress lines omitted: per-chunk PROGRESS / merging / topic / topic diff entries for passes 0 to 4; end-of-pass perplexity lines kept)
2025-04-19 00:07:49,329 : INFO : -8.501 per-word bound, 362.2 perplexity estimate based on a held-out corpus of 310 documents with 43091 words
2025-04-19 00:07:51,158 : INFO : -8.453 per-word bound, 350.3 perplexity estimate based on a held-out corpus of 310 documents with 43091 words
2025-04-19 00:07:52,851 : INFO : -8.449 per-word bound, 349.4 perplexity estimate based on a held-out corpus of 310 documents with 43091 words
2025-04-19 00:07:54,518 : INFO : -8.448 per-word bound, 349.2 perplexity estimate based on a held-out corpus of 310 documents with 43091 words
2025-04-19 00:07:56,143 : INFO : -8.447 per-word bound, 349.0 perplexity estimate based on a held-out corpus of 310 documents with 43091 words
2025-04-19 00:07:56,170 : INFO : topic #0 (0.500): 0.012*"公司" + 0.007*"工作" + 0.006*"台灣" + 0.006*"美國" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"問題" + 0.004*"科技" + 0.003*"表示"
2025-04-19 00:07:56,170 : INFO : topic #1 (0.500): 0.034*"工作" + 0.014*"方式" + 0.012*"推定" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"單位" + 0.010*"應徵" + 0.010*"情形" + 0.010*"砍除" + 0.010*"空白"
2025-04-19 00:07:56,171 : INFO : LdaModel lifecycle event {'msg': 'trained LdaModel<num_terms=23261, num_topics=2, decay=0.5, chunksize=2000> in 9.46s', 'datetime': '2025-04-19T00:07:56.171364', 'gensim': '4.3.3', 'python': '3.11.2 (main, Apr 21 2023, 22:51:21) [Clang 14.0.3 (clang-1403.0.22.14.1)]', 'platform': 'macOS-15.3.2-arm64-arm-64bit', 'event': 'created'}
2025-04-19 00:08:00,142 : INFO : -7.158 per-word bound, 142.9 perplexity estimate based on a held-out corpus of 16310 documents with 3460358 words
2025-04-19 00:08:00,144 : INFO : using ParallelWordOccurrenceAccumulator<processes=7, batch_size=64> to estimate probabilities from sliding windows
2025-04-19 00:08:03,941 : INFO : 1 batches submitted to accumulate stats from 64 documents (22660 virtual)
... (batches 2 to 26 omitted)
2025-04-19 00:08:04,249 : INFO : 27 batches submitted to accumulate stats from 1728 documents (598912 virtual)
2025-04-19 00:08:04,305 : INFO : 
28 batches submitted to accumulate stats from 1792 documents (622487 virtual) 2025-04-19 00:08:04,313 : INFO : 29 batches submitted to accumulate stats from 1856 documents (648902 virtual) 2025-04-19 00:08:04,318 : INFO : 30 batches submitted to accumulate stats from 1920 documents (671126 virtual) 2025-04-19 00:08:04,335 : INFO : 31 batches submitted to accumulate stats from 1984 documents (693717 virtual) 2025-04-19 00:08:04,360 : INFO : 32 batches submitted to accumulate stats from 2048 documents (714139 virtual) 2025-04-19 00:08:04,364 : INFO : 33 batches submitted to accumulate stats from 2112 documents (736202 virtual) 2025-04-19 00:08:04,401 : INFO : 34 batches submitted to accumulate stats from 2176 documents (758687 virtual) 2025-04-19 00:08:04,442 : INFO : 35 batches submitted to accumulate stats from 2240 documents (779677 virtual) 2025-04-19 00:08:04,460 : INFO : 36 batches submitted to accumulate stats from 2304 documents (800483 virtual) 2025-04-19 00:08:04,464 : INFO : 37 batches submitted to accumulate stats from 2368 documents (821258 virtual) 2025-04-19 00:08:04,470 : INFO : 38 batches submitted to accumulate stats from 2432 documents (844326 virtual) 2025-04-19 00:08:04,500 : INFO : 39 batches submitted to accumulate stats from 2496 documents (868823 virtual) 2025-04-19 00:08:04,518 : INFO : 40 batches submitted to accumulate stats from 2560 documents (888215 virtual) 2025-04-19 00:08:04,530 : INFO : 41 batches submitted to accumulate stats from 2624 documents (910499 virtual) 2025-04-19 00:08:04,577 : INFO : 42 batches submitted to accumulate stats from 2688 documents (931945 virtual) 2025-04-19 00:08:04,588 : INFO : 43 batches submitted to accumulate stats from 2752 documents (954111 virtual) 2025-04-19 00:08:04,604 : INFO : 44 batches submitted to accumulate stats from 2816 documents (975617 virtual) 2025-04-19 00:08:04,615 : INFO : 45 batches submitted to accumulate stats from 2880 documents (995125 virtual) 2025-04-19 00:08:04,634 : INFO : 
46 batches submitted to accumulate stats from 2944 documents (1016531 virtual) 2025-04-19 00:08:04,658 : INFO : 47 batches submitted to accumulate stats from 3008 documents (1038247 virtual) 2025-04-19 00:08:04,665 : INFO : 48 batches submitted to accumulate stats from 3072 documents (1063862 virtual) 2025-04-19 00:08:04,703 : INFO : 49 batches submitted to accumulate stats from 3136 documents (1087898 virtual) 2025-04-19 00:08:04,714 : INFO : 50 batches submitted to accumulate stats from 3200 documents (1110531 virtual) 2025-04-19 00:08:04,720 : INFO : 51 batches submitted to accumulate stats from 3264 documents (1133127 virtual) 2025-04-19 00:08:04,734 : INFO : 52 batches submitted to accumulate stats from 3328 documents (1153766 virtual) 2025-04-19 00:08:04,783 : INFO : 53 batches submitted to accumulate stats from 3392 documents (1177684 virtual) 2025-04-19 00:08:04,794 : INFO : 54 batches submitted to accumulate stats from 3456 documents (1200190 virtual) 2025-04-19 00:08:04,816 : INFO : 55 batches submitted to accumulate stats from 3520 documents (1225029 virtual) 2025-04-19 00:08:04,828 : INFO : 56 batches submitted to accumulate stats from 3584 documents (1249662 virtual) 2025-04-19 00:08:04,831 : INFO : 57 batches submitted to accumulate stats from 3648 documents (1274547 virtual) 2025-04-19 00:08:04,847 : INFO : 58 batches submitted to accumulate stats from 3712 documents (1297434 virtual) 2025-04-19 00:08:04,852 : INFO : 59 batches submitted to accumulate stats from 3776 documents (1319261 virtual) 2025-04-19 00:08:04,902 : INFO : 60 batches submitted to accumulate stats from 3840 documents (1341972 virtual) 2025-04-19 00:08:04,975 : INFO : 61 batches submitted to accumulate stats from 3904 documents (1364269 virtual) 2025-04-19 00:08:04,980 : INFO : 62 batches submitted to accumulate stats from 3968 documents (1386796 virtual) 2025-04-19 00:08:04,985 : INFO : 63 batches submitted to accumulate stats from 4032 documents (1410249 virtual) 2025-04-19 
00:08:05,005 : INFO : 64 batches submitted to accumulate stats from 4096 documents (1433115 virtual) 2025-04-19 00:08:05,013 : INFO : 65 batches submitted to accumulate stats from 4160 documents (1453873 virtual) 2025-04-19 00:08:05,016 : INFO : 66 batches submitted to accumulate stats from 4224 documents (1475474 virtual) 2025-04-19 00:08:05,056 : INFO : 67 batches submitted to accumulate stats from 4288 documents (1497524 virtual) 2025-04-19 00:08:05,127 : INFO : 68 batches submitted to accumulate stats from 4352 documents (1516835 virtual) 2025-04-19 00:08:05,131 : INFO : 69 batches submitted to accumulate stats from 4416 documents (1536986 virtual) 2025-04-19 00:08:05,136 : INFO : 70 batches submitted to accumulate stats from 4480 documents (1558454 virtual) 2025-04-19 00:08:05,140 : INFO : 71 batches submitted to accumulate stats from 4544 documents (1580610 virtual) 2025-04-19 00:08:05,162 : INFO : 72 batches submitted to accumulate stats from 4608 documents (1603508 virtual) 2025-04-19 00:08:05,168 : INFO : 73 batches submitted to accumulate stats from 4672 documents (1624378 virtual) 2025-04-19 00:08:05,173 : INFO : 74 batches submitted to accumulate stats from 4736 documents (1646402 virtual) 2025-04-19 00:08:05,252 : INFO : 75 batches submitted to accumulate stats from 4800 documents (1668704 virtual) 2025-04-19 00:08:05,257 : INFO : 76 batches submitted to accumulate stats from 4864 documents (1690394 virtual) 2025-04-19 00:08:05,267 : INFO : 77 batches submitted to accumulate stats from 4928 documents (1713028 virtual) 2025-04-19 00:08:05,270 : INFO : 78 batches submitted to accumulate stats from 4992 documents (1735434 virtual) 2025-04-19 00:08:05,289 : INFO : 79 batches submitted to accumulate stats from 5056 documents (1755430 virtual) 2025-04-19 00:08:05,297 : INFO : 80 batches submitted to accumulate stats from 5120 documents (1779164 virtual) 2025-04-19 00:08:05,329 : INFO : 81 batches submitted to accumulate stats from 5184 documents (1799023 
virtual) 2025-04-19 00:08:05,365 : INFO : 82 batches submitted to accumulate stats from 5248 documents (1821516 virtual) 2025-04-19 00:08:05,392 : INFO : 83 batches submitted to accumulate stats from 5312 documents (1844224 virtual) 2025-04-19 00:08:05,396 : INFO : 84 batches submitted to accumulate stats from 5376 documents (1864739 virtual) 2025-04-19 00:08:05,400 : INFO : 85 batches submitted to accumulate stats from 5440 documents (1885053 virtual) 2025-04-19 00:08:05,404 : INFO : 86 batches submitted to accumulate stats from 5504 documents (1902170 virtual) 2025-04-19 00:08:05,410 : INFO : 87 batches submitted to accumulate stats from 5568 documents (1924910 virtual) 2025-04-19 00:08:05,471 : INFO : 88 batches submitted to accumulate stats from 5632 documents (1931530 virtual) 2025-04-19 00:08:05,503 : INFO : 89 batches submitted to accumulate stats from 5696 documents (1941414 virtual) 2025-04-19 00:08:05,524 : INFO : 90 batches submitted to accumulate stats from 5760 documents (1950642 virtual) 2025-04-19 00:08:05,528 : INFO : 91 batches submitted to accumulate stats from 5824 documents (1957200 virtual) 2025-04-19 00:08:05,530 : INFO : 92 batches submitted to accumulate stats from 5888 documents (1964937 virtual) 2025-04-19 00:08:05,533 : INFO : 93 batches submitted to accumulate stats from 5952 documents (1974259 virtual) 2025-04-19 00:08:05,535 : INFO : 94 batches submitted to accumulate stats from 6016 documents (1988296 virtual) 2025-04-19 00:08:05,576 : INFO : 95 batches submitted to accumulate stats from 6080 documents (1997659 virtual) 2025-04-19 00:08:05,599 : INFO : 96 batches submitted to accumulate stats from 6144 documents (2009678 virtual) 2025-04-19 00:08:05,636 : INFO : 97 batches submitted to accumulate stats from 6208 documents (2019297 virtual) 2025-04-19 00:08:05,639 : INFO : 98 batches submitted to accumulate stats from 6272 documents (2031857 virtual) 2025-04-19 00:08:05,644 : INFO : 99 batches submitted to accumulate stats from 6336 
documents (2044117 virtual) 2025-04-19 00:08:05,647 : INFO : 100 batches submitted to accumulate stats from 6400 documents (2053380 virtual) 2025-04-19 00:08:05,650 : INFO : 101 batches submitted to accumulate stats from 6464 documents (2066889 virtual) 2025-04-19 00:08:05,654 : INFO : 102 batches submitted to accumulate stats from 6528 documents (2075479 virtual) 2025-04-19 00:08:05,667 : INFO : 103 batches submitted to accumulate stats from 6592 documents (2085095 virtual) 2025-04-19 00:08:05,676 : INFO : 104 batches submitted to accumulate stats from 6656 documents (2093845 virtual) 2025-04-19 00:08:05,678 : INFO : 105 batches submitted to accumulate stats from 6720 documents (2102407 virtual) 2025-04-19 00:08:05,682 : INFO : 106 batches submitted to accumulate stats from 6784 documents (2111466 virtual) 2025-04-19 00:08:05,684 : INFO : 107 batches submitted to accumulate stats from 6848 documents (2121845 virtual) 2025-04-19 00:08:05,701 : INFO : 108 batches submitted to accumulate stats from 6912 documents (2129219 virtual) 2025-04-19 00:08:05,710 : INFO : 109 batches submitted to accumulate stats from 6976 documents (2137886 virtual) 2025-04-19 00:08:05,716 : INFO : 110 batches submitted to accumulate stats from 7040 documents (2145150 virtual) 2025-04-19 00:08:05,718 : INFO : 111 batches submitted to accumulate stats from 7104 documents (2155495 virtual) 2025-04-19 00:08:05,729 : INFO : 112 batches submitted to accumulate stats from 7168 documents (2164720 virtual) 2025-04-19 00:08:05,731 : INFO : 113 batches submitted to accumulate stats from 7232 documents (2172193 virtual) 2025-04-19 00:08:05,735 : INFO : 114 batches submitted to accumulate stats from 7296 documents (2183458 virtual) 2025-04-19 00:08:05,752 : INFO : 115 batches submitted to accumulate stats from 7360 documents (2191706 virtual) 2025-04-19 00:08:05,754 : INFO : 116 batches submitted to accumulate stats from 7424 documents (2202020 virtual) 2025-04-19 00:08:05,759 : INFO : 117 batches 
submitted to accumulate stats from 7488 documents (2211055 virtual) 2025-04-19 00:08:05,766 : INFO : 118 batches submitted to accumulate stats from 7552 documents (2223321 virtual) 2025-04-19 00:08:05,769 : INFO : 119 batches submitted to accumulate stats from 7616 documents (2230121 virtual) 2025-04-19 00:08:05,772 : INFO : 120 batches submitted to accumulate stats from 7680 documents (2243511 virtual) 2025-04-19 00:08:05,778 : INFO : 121 batches submitted to accumulate stats from 7744 documents (2258370 virtual) 2025-04-19 00:08:05,793 : INFO : 122 batches submitted to accumulate stats from 7808 documents (2269267 virtual) 2025-04-19 00:08:05,796 : INFO : 123 batches submitted to accumulate stats from 7872 documents (2280490 virtual) 2025-04-19 00:08:05,800 : INFO : 124 batches submitted to accumulate stats from 7936 documents (2289945 virtual) 2025-04-19 00:08:05,810 : INFO : 125 batches submitted to accumulate stats from 8000 documents (2298931 virtual) 2025-04-19 00:08:05,813 : INFO : 126 batches submitted to accumulate stats from 8064 documents (2309719 virtual) 2025-04-19 00:08:05,814 : INFO : 127 batches submitted to accumulate stats from 8128 documents (2320328 virtual) 2025-04-19 00:08:05,852 : INFO : 128 batches submitted to accumulate stats from 8192 documents (2331614 virtual) 2025-04-19 00:08:05,866 : INFO : 129 batches submitted to accumulate stats from 8256 documents (2342568 virtual) 2025-04-19 00:08:05,868 : INFO : 130 batches submitted to accumulate stats from 8320 documents (2351306 virtual) 2025-04-19 00:08:05,870 : INFO : 131 batches submitted to accumulate stats from 8384 documents (2359488 virtual) 2025-04-19 00:08:05,872 : INFO : 132 batches submitted to accumulate stats from 8448 documents (2368497 virtual) 2025-04-19 00:08:05,892 : INFO : 133 batches submitted to accumulate stats from 8512 documents (2378449 virtual) 2025-04-19 00:08:05,901 : INFO : 134 batches submitted to accumulate stats from 8576 documents (2388057 virtual) 2025-04-19 
00:08:05,903 : INFO : 135 batches submitted to accumulate stats from 8640 documents (2395926 virtual) 2025-04-19 00:08:05,905 : INFO : 136 batches submitted to accumulate stats from 8704 documents (2403405 virtual) 2025-04-19 00:08:05,919 : INFO : 137 batches submitted to accumulate stats from 8768 documents (2411628 virtual) 2025-04-19 00:08:05,926 : INFO : 138 batches submitted to accumulate stats from 8832 documents (2419219 virtual) 2025-04-19 00:08:05,928 : INFO : 139 batches submitted to accumulate stats from 8896 documents (2428220 virtual) 2025-04-19 00:08:05,929 : INFO : 140 batches submitted to accumulate stats from 8960 documents (2436470 virtual) 2025-04-19 00:08:05,948 : INFO : 141 batches submitted to accumulate stats from 9024 documents (2446006 virtual) 2025-04-19 00:08:05,950 : INFO : 142 batches submitted to accumulate stats from 9088 documents (2453039 virtual) 2025-04-19 00:08:05,961 : INFO : 143 batches submitted to accumulate stats from 9152 documents (2460905 virtual) 2025-04-19 00:08:05,962 : INFO : 144 batches submitted to accumulate stats from 9216 documents (2468645 virtual) 2025-04-19 00:08:05,966 : INFO : 145 batches submitted to accumulate stats from 9280 documents (2476321 virtual) 2025-04-19 00:08:05,967 : INFO : 146 batches submitted to accumulate stats from 9344 documents (2481981 virtual) 2025-04-19 00:08:05,976 : INFO : 147 batches submitted to accumulate stats from 9408 documents (2489833 virtual) 2025-04-19 00:08:05,982 : INFO : 148 batches submitted to accumulate stats from 9472 documents (2496627 virtual) 2025-04-19 00:08:05,987 : INFO : 149 batches submitted to accumulate stats from 9536 documents (2502106 virtual) 2025-04-19 00:08:05,989 : INFO : 150 batches submitted to accumulate stats from 9600 documents (2508434 virtual) 2025-04-19 00:08:06,002 : INFO : 151 batches submitted to accumulate stats from 9664 documents (2517654 virtual) 2025-04-19 00:08:06,004 : INFO : 152 batches submitted to accumulate stats from 9728 
documents (2525651 virtual) 2025-04-19 00:08:06,006 : INFO : 153 batches submitted to accumulate stats from 9792 documents (2534661 virtual) 2025-04-19 00:08:06,016 : INFO : 154 batches submitted to accumulate stats from 9856 documents (2542846 virtual) 2025-04-19 00:08:06,020 : INFO : 155 batches submitted to accumulate stats from 9920 documents (2549206 virtual) 2025-04-19 00:08:06,023 : INFO : 156 batches submitted to accumulate stats from 9984 documents (2556742 virtual) 2025-04-19 00:08:06,028 : INFO : 157 batches submitted to accumulate stats from 10048 documents (2565026 virtual) 2025-04-19 00:08:06,032 : INFO : 158 batches submitted to accumulate stats from 10112 documents (2571434 virtual) 2025-04-19 00:08:06,037 : INFO : 159 batches submitted to accumulate stats from 10176 documents (2581280 virtual) 2025-04-19 00:08:06,047 : INFO : 160 batches submitted to accumulate stats from 10240 documents (2589671 virtual) 2025-04-19 00:08:06,049 : INFO : 161 batches submitted to accumulate stats from 10304 documents (2596979 virtual) 2025-04-19 00:08:06,051 : INFO : 162 batches submitted to accumulate stats from 10368 documents (2604556 virtual) 2025-04-19 00:08:06,053 : INFO : 163 batches submitted to accumulate stats from 10432 documents (2613656 virtual) 2025-04-19 00:08:06,060 : INFO : 164 batches submitted to accumulate stats from 10496 documents (2623890 virtual) 2025-04-19 00:08:06,073 : INFO : 165 batches submitted to accumulate stats from 10560 documents (2629308 virtual) 2025-04-19 00:08:06,075 : INFO : 166 batches submitted to accumulate stats from 10624 documents (2636085 virtual) 2025-04-19 00:08:06,079 : INFO : 167 batches submitted to accumulate stats from 10688 documents (2642039 virtual) 2025-04-19 00:08:06,085 : INFO : 168 batches submitted to accumulate stats from 10752 documents (2648389 virtual) 2025-04-19 00:08:06,087 : INFO : 169 batches submitted to accumulate stats from 10816 documents (2661959 virtual) 2025-04-19 00:08:06,090 : INFO : 170 
batches submitted to accumulate stats from 10880 documents (2672949 virtual) 2025-04-19 00:08:06,096 : INFO : 171 batches submitted to accumulate stats from 10944 documents (2683365 virtual) 2025-04-19 00:08:06,103 : INFO : 172 batches submitted to accumulate stats from 11008 documents (2690484 virtual) 2025-04-19 00:08:06,105 : INFO : 173 batches submitted to accumulate stats from 11072 documents (2700627 virtual) 2025-04-19 00:08:06,118 : INFO : 174 batches submitted to accumulate stats from 11136 documents (2708742 virtual) 2025-04-19 00:08:06,120 : INFO : 175 batches submitted to accumulate stats from 11200 documents (2718156 virtual) 2025-04-19 00:08:06,128 : INFO : 176 batches submitted to accumulate stats from 11264 documents (2727801 virtual) 2025-04-19 00:08:06,130 : INFO : 177 batches submitted to accumulate stats from 11328 documents (2736288 virtual) 2025-04-19 00:08:06,135 : INFO : 178 batches submitted to accumulate stats from 11392 documents (2743845 virtual) 2025-04-19 00:08:06,137 : INFO : 179 batches submitted to accumulate stats from 11456 documents (2750885 virtual) 2025-04-19 00:08:06,143 : INFO : 180 batches submitted to accumulate stats from 11520 documents (2759213 virtual) 2025-04-19 00:08:06,146 : INFO : 181 batches submitted to accumulate stats from 11584 documents (2770309 virtual) 2025-04-19 00:08:06,149 : INFO : 182 batches submitted to accumulate stats from 11648 documents (2781566 virtual) 2025-04-19 00:08:06,163 : INFO : 183 batches submitted to accumulate stats from 11712 documents (2793513 virtual) 2025-04-19 00:08:06,176 : INFO : 184 batches submitted to accumulate stats from 11776 documents (2805133 virtual) 2025-04-19 00:08:06,179 : INFO : 185 batches submitted to accumulate stats from 11840 documents (2814621 virtual) 2025-04-19 00:08:06,187 : INFO : 186 batches submitted to accumulate stats from 11904 documents (2825917 virtual) 2025-04-19 00:08:06,191 : INFO : 187 batches submitted to accumulate stats from 11968 documents 
(2834764 virtual) 2025-04-19 00:08:06,196 : INFO : 188 batches submitted to accumulate stats from 12032 documents (2844523 virtual) 2025-04-19 00:08:06,198 : INFO : 189 batches submitted to accumulate stats from 12096 documents (2854512 virtual) 2025-04-19 00:08:06,200 : INFO : 190 batches submitted to accumulate stats from 12160 documents (2863511 virtual) 2025-04-19 00:08:06,211 : INFO : 191 batches submitted to accumulate stats from 12224 documents (2872492 virtual) 2025-04-19 00:08:06,213 : INFO : 192 batches submitted to accumulate stats from 12288 documents (2881543 virtual) 2025-04-19 00:08:06,250 : INFO : 193 batches submitted to accumulate stats from 12352 documents (2891233 virtual) 2025-04-19 00:08:06,258 : INFO : 194 batches submitted to accumulate stats from 12416 documents (2899835 virtual) 2025-04-19 00:08:06,262 : INFO : 195 batches submitted to accumulate stats from 12480 documents (2908542 virtual) 2025-04-19 00:08:06,266 : INFO : 196 batches submitted to accumulate stats from 12544 documents (2920162 virtual) 2025-04-19 00:08:06,282 : INFO : 197 batches submitted to accumulate stats from 12608 documents (2931072 virtual) 2025-04-19 00:08:06,287 : INFO : 198 batches submitted to accumulate stats from 12672 documents (2942168 virtual) 2025-04-19 00:08:06,291 : INFO : 199 batches submitted to accumulate stats from 12736 documents (2951378 virtual) 2025-04-19 00:08:06,294 : INFO : 200 batches submitted to accumulate stats from 12800 documents (2964980 virtual) 2025-04-19 00:08:06,302 : INFO : 201 batches submitted to accumulate stats from 12864 documents (2974742 virtual) 2025-04-19 00:08:06,304 : INFO : 202 batches submitted to accumulate stats from 12928 documents (2984778 virtual) 2025-04-19 00:08:06,312 : INFO : 203 batches submitted to accumulate stats from 12992 documents (2994073 virtual) 2025-04-19 00:08:06,319 : INFO : 204 batches submitted to accumulate stats from 13056 documents (3002522 virtual) 2025-04-19 00:08:06,329 : INFO : 205 
batches submitted to accumulate stats from 13120 documents (3012040 virtual) 2025-04-19 00:08:06,335 : INFO : 206 batches submitted to accumulate stats from 13184 documents (3019919 virtual) 2025-04-19 00:08:06,340 : INFO : 207 batches submitted to accumulate stats from 13248 documents (3029004 virtual) 2025-04-19 00:08:06,342 : INFO : 208 batches submitted to accumulate stats from 13312 documents (3037489 virtual) 2025-04-19 00:08:06,345 : INFO : 209 batches submitted to accumulate stats from 13376 documents (3044929 virtual) 2025-04-19 00:08:06,363 : INFO : 210 batches submitted to accumulate stats from 13440 documents (3054034 virtual) 2025-04-19 00:08:06,368 : INFO : 211 batches submitted to accumulate stats from 13504 documents (3064099 virtual) 2025-04-19 00:08:06,369 : INFO : 212 batches submitted to accumulate stats from 13568 documents (3074522 virtual) 2025-04-19 00:08:06,379 : INFO : 213 batches submitted to accumulate stats from 13632 documents (3083808 virtual) 2025-04-19 00:08:06,391 : INFO : 214 batches submitted to accumulate stats from 13696 documents (3093078 virtual) 2025-04-19 00:08:06,397 : INFO : 215 batches submitted to accumulate stats from 13760 documents (3102171 virtual) 2025-04-19 00:08:06,399 : INFO : 216 batches submitted to accumulate stats from 13824 documents (3111128 virtual) 2025-04-19 00:08:06,401 : INFO : 217 batches submitted to accumulate stats from 13888 documents (3120517 virtual) 2025-04-19 00:08:06,409 : INFO : 218 batches submitted to accumulate stats from 13952 documents (3130614 virtual) 2025-04-19 00:08:06,412 : INFO : 219 batches submitted to accumulate stats from 14016 documents (3139268 virtual) 2025-04-19 00:08:06,415 : INFO : 220 batches submitted to accumulate stats from 14080 documents (3148635 virtual) 2025-04-19 00:08:06,428 : INFO : 221 batches submitted to accumulate stats from 14144 documents (3157335 virtual) 2025-04-19 00:08:06,433 : INFO : 222 batches submitted to accumulate stats from 14208 documents 
(3165838 virtual) 2025-04-19 00:08:06,437 : INFO : 223 batches submitted to accumulate stats from 14272 documents (3175765 virtual) 2025-04-19 00:08:06,443 : INFO : 224 batches submitted to accumulate stats from 14336 documents (3183123 virtual) 2025-04-19 00:08:06,445 : INFO : 225 batches submitted to accumulate stats from 14400 documents (3189537 virtual) 2025-04-19 00:08:06,453 : INFO : 226 batches submitted to accumulate stats from 14464 documents (3197239 virtual) 2025-04-19 00:08:06,462 : INFO : 227 batches submitted to accumulate stats from 14528 documents (3205518 virtual) 2025-04-19 00:08:06,465 : INFO : 228 batches submitted to accumulate stats from 14592 documents (3215608 virtual) 2025-04-19 00:08:06,468 : INFO : 229 batches submitted to accumulate stats from 14656 documents (3223376 virtual) 2025-04-19 00:08:06,479 : INFO : 230 batches submitted to accumulate stats from 14720 documents (3232304 virtual) 2025-04-19 00:08:06,485 : INFO : 231 batches submitted to accumulate stats from 14784 documents (3240270 virtual) 2025-04-19 00:08:06,494 : INFO : 232 batches submitted to accumulate stats from 14848 documents (3249755 virtual) 2025-04-19 00:08:06,496 : INFO : 233 batches submitted to accumulate stats from 14912 documents (3259377 virtual) 2025-04-19 00:08:06,500 : INFO : 234 batches submitted to accumulate stats from 14976 documents (3269637 virtual) 2025-04-19 00:08:06,502 : INFO : 235 batches submitted to accumulate stats from 15040 documents (3278311 virtual) 2025-04-19 00:08:06,509 : INFO : 236 batches submitted to accumulate stats from 15104 documents (3286321 virtual) 2025-04-19 00:08:06,512 : INFO : 237 batches submitted to accumulate stats from 15168 documents (3293385 virtual) 2025-04-19 00:08:06,528 : INFO : 238 batches submitted to accumulate stats from 15232 documents (3300334 virtual) 2025-04-19 00:08:06,530 : INFO : 239 batches submitted to accumulate stats from 15296 documents (3308226 virtual) 2025-04-19 00:08:06,533 : INFO : 240 
batches submitted to accumulate stats from 15360 documents (3317325 virtual) 2025-04-19 00:08:06,542 : INFO : 241 batches submitted to accumulate stats from 15424 documents (3325778 virtual) 2025-04-19 00:08:06,548 : INFO : 242 batches submitted to accumulate stats from 15488 documents (3335373 virtual) 2025-04-19 00:08:06,550 : INFO : 243 batches submitted to accumulate stats from 15552 documents (3342716 virtual) 2025-04-19 00:08:06,551 : INFO : 244 batches submitted to accumulate stats from 15616 documents (3350508 virtual) 2025-04-19 00:08:06,570 : INFO : 245 batches submitted to accumulate stats from 15680 documents (3360131 virtual) 2025-04-19 00:08:06,572 : INFO : 246 batches submitted to accumulate stats from 15744 documents (3370635 virtual) 2025-04-19 00:08:06,579 : INFO : 247 batches submitted to accumulate stats from 15808 documents (3380994 virtual) 2025-04-19 00:08:06,583 : INFO : 248 batches submitted to accumulate stats from 15872 documents (3389920 virtual) 2025-04-19 00:08:06,585 : INFO : 249 batches submitted to accumulate stats from 15936 documents (3397487 virtual) 2025-04-19 00:08:06,589 : INFO : 250 batches submitted to accumulate stats from 16000 documents (3406129 virtual) 2025-04-19 00:08:06,595 : INFO : 251 batches submitted to accumulate stats from 16064 documents (3416805 virtual) 2025-04-19 00:08:06,609 : INFO : 252 batches submitted to accumulate stats from 16128 documents (3426189 virtual) 2025-04-19 00:08:06,613 : INFO : 253 batches submitted to accumulate stats from 16192 documents (3433824 virtual) 2025-04-19 00:08:06,616 : INFO : 254 batches submitted to accumulate stats from 16256 documents (3443379 virtual) 2025-04-19 00:08:06,620 : INFO : 255 batches submitted to accumulate stats from 16320 documents (3450914 virtual) 2025-04-19 00:08:06,758 : INFO : 7 accumulators retrieved from output queue 2025-04-19 00:08:06,766 : INFO : accumulated word occurrence stats for 3451622 virtual documents 2025-04-19 00:08:06,798 : INFO : using 
symmetric alpha at 0.3333333333333333
2025-04-19 00:08:06,799 : INFO : using symmetric eta at 0.3333333333333333
2025-04-19 00:08:06,800 : INFO : using serial LDA version on this node
2025-04-19 00:08:06,803 : INFO : running online (multi-pass) LDA training, 3 topics, 5 passes over the supplied corpus of 16310 documents, updating model once every 2000 documents, evaluating perplexity every 16310 documents, iterating 50x with a convergence threshold of 0.001000
2025-04-19 00:08:06,804 : INFO : PROGRESS: pass 0, at document #2000/16310
2025-04-19 00:08:07,575 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:08:07,576 : INFO : topic #0 (0.333): 0.027*"工作" + 0.014*"方式" + 0.012*"推定" + 0.012*"應徵" + 0.012*"空白" + 0.011*"單位" + 0.010*"砍除" + 0.010*"內容" + 0.009*"聯絡" + 0.009*"資訊"
2025-04-19 00:08:07,577 : INFO : topic #1 (0.333): 0.029*"工作" + 0.015*"方式" + 0.013*"推定" + 0.011*"聯絡" + 0.010*"單位" + 0.010*"國定假日" + 0.010*"第一項" + 0.010*"砍除" + 0.010*"空白" + 0.010*"情形"
2025-04-19 00:08:07,577 : INFO : topic #2 (0.333): 0.038*"工作" + 0.012*"推定" + 0.012*"內容" + 0.011*"工資" + 0.011*"應徵" + 0.011*"方式" + 0.010*"情形" + 0.010*"聯絡" + 0.010*"砍除" + 0.009*"小時"
2025-04-19 00:08:07,578 : INFO : topic diff=5.168176, rho=1.000000
2025-04-19 00:08:07,579 : INFO : PROGRESS: pass 0, at document #4000/16310
2025-04-19 00:08:08,311 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:08:08,312 : INFO : topic #0 (0.333): 0.022*"工作" + 0.013*"方式" + 0.010*"內容" + 0.009*"應徵" + 0.009*"聯絡" + 0.008*"推定" + 0.008*"空白" + 0.008*"報名" + 0.008*"地點" + 0.008*"資訊"
2025-04-19 00:08:08,313 : INFO : topic #1 (0.333): 0.029*"工作" + 0.014*"方式" + 0.013*"推定" + 0.012*"空白" + 0.012*"砍除" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"國定假日" + 0.010*"單位"
2025-04-19 00:08:08,313 : INFO : topic #2 (0.333): 0.041*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"工資" + 0.012*"內容" + 0.011*"應徵" + 0.011*"小時" + 0.010*"單位" + 0.010*"情形" + 0.010*"聯絡"
2025-04-19 00:08:08,314 : INFO : topic diff=0.541377, rho=0.707107
2025-04-19 00:08:08,315 : INFO : PROGRESS: pass 0, at document #6000/16310
2025-04-19 00:08:08,936 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:08:08,938 : INFO : topic #0 (0.333): 0.015*"工作" + 0.011*"報名" + 0.010*"方式" + 0.009*"活動" + 0.009*"電話" + 0.008*"時間" + 0.008*"內容" + 0.008*"台北市" + 0.008*"聯絡" + 0.007*"公司"
2025-04-19 00:08:08,938 : INFO : topic #1 (0.333): 0.030*"工作" + 0.013*"方式" + 0.013*"推定" + 0.012*"空白" + 0.012*"砍除" + 0.012*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"國定假日" + 0.011*"資訊"
2025-04-19 00:08:08,939 : INFO : topic #2 (0.333): 0.042*"工作" + 0.014*"方式" + 0.014*"推定" + 0.012*"工資" + 0.012*"內容" + 0.012*"小時" + 0.011*"應徵" + 0.010*"單位" + 0.010*"依法" + 0.009*"聯絡"
2025-04-19 00:08:08,939 : INFO : topic diff=0.780898, rho=0.577350
2025-04-19 00:08:08,940 : INFO : PROGRESS: pass 0, at document #8000/16310
2025-04-19 00:08:09,204 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:08:09,206 : INFO : topic #0 (0.333): 0.016*"工作" + 0.014*"公司" + 0.008*"面試" + 0.007*"時間" + 0.006*"問題" + 0.005*"工程師" + 0.005*"方式" + 0.005*"經驗" + 0.005*"開發" + 0.005*"內容"
2025-04-19 00:08:09,206 : INFO : topic #1 (0.333): 0.030*"工作" + 0.013*"方式" + 0.013*"推定" + 0.012*"空白" + 0.012*"砍除" + 0.012*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"國定假日" + 0.011*"資訊"
2025-04-19 00:08:09,207 : INFO : topic #2 (0.333): 0.042*"工作" + 0.014*"方式" + 0.013*"推定" + 0.012*"工資" + 0.012*"內容" + 0.011*"小時" + 0.010*"應徵" + 0.010*"單位" + 0.010*"依法" + 0.009*"聯絡"
2025-04-19 00:08:09,207 : INFO : topic diff=0.832769, rho=0.500000
2025-04-19 00:08:09,208 : INFO : PROGRESS: pass 0, at document #10000/16310
2025-04-19 00:08:09,381 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:08:09,383 : INFO : topic #0 (0.333): 0.015*"工作" + 0.015*"公司" + 0.008*"面試" + 0.007*"時間" + 0.006*"問題" + 0.005*"開發" + 0.005*"工程師" + 0.005*"經驗" + 0.005*"目前" + 0.004*"比較"
2025-04-19 00:08:09,384 : INFO : topic #1 (0.333): 0.030*"工作" + 0.013*"方式" + 0.013*"推定" + 0.012*"空白" + 0.012*"砍除" + 0.012*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"國定假日" + 0.011*"資訊"
2025-04-19 00:08:09,384 : INFO : topic #2 (0.333): 0.041*"工作" + 0.014*"方式" + 0.013*"推定" + 0.012*"工資" + 0.012*"內容" + 0.011*"小時" + 0.010*"應徵" + 0.010*"單位" + 0.010*"依法" + 0.009*"聯絡"
2025-04-19 00:08:09,385 : INFO : topic diff=0.475472, rho=0.447214
2025-04-19 00:08:09,385 : INFO : PROGRESS: pass 0, at document #12000/16310
2025-04-19 00:08:09,570 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:08:09,572 : INFO : topic #0 (0.333): 0.013*"公司" + 0.012*"工作" + 0.006*"面試" + 0.005*"問題" + 0.005*"時間" + 0.005*"工程師" + 0.004*"開發" + 0.004*"經驗" + 0.004*"目前" + 0.004*"技術"
2025-04-19 00:08:09,573 : INFO : topic #1 (0.333): 0.030*"工作" + 0.013*"方式" + 0.013*"推定" + 0.012*"空白" + 0.012*"砍除" + 0.012*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"國定假日" + 0.010*"資訊"
2025-04-19 00:08:09,573 : INFO : topic #2 (0.333): 0.041*"工作" + 0.014*"方式" + 0.013*"推定" + 0.012*"工資" + 0.012*"內容" + 0.011*"小時" + 0.010*"應徵" + 0.010*"單位" + 0.010*"依法" + 0.009*"聯絡"
2025-04-19 00:08:09,574 : INFO : topic diff=0.520220, rho=0.408248
2025-04-19 00:08:09,575 : INFO : PROGRESS: pass 0, at document #14000/16310
2025-04-19 00:08:09,749 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:08:09,751 : INFO : topic #0 (0.333): 0.012*"公司" + 0.009*"工作" + 0.005*"台灣" + 0.004*"面試" + 0.004*"問題" + 0.004*"時間" + 0.004*"工程師" + 0.004*"技術" + 0.003*"目前" + 0.003*"員工"
2025-04-19 00:08:09,752 : INFO : topic #1 (0.333): 0.030*"工作" + 0.013*"方式" + 0.013*"推定" + 0.012*"空白" + 0.012*"砍除" + 0.011*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.010*"國定假日" + 0.010*"資訊"
2025-04-19 00:08:09,753 : INFO : topic #2 (0.333): 0.040*"工作" + 0.014*"方式" + 0.013*"推定" + 0.011*"工資" + 0.011*"內容" + 0.011*"小時" + 0.010*"單位" + 0.010*"應徵" + 0.009*"依法" + 0.009*"聯絡"
2025-04-19 00:08:09,753 : INFO : topic diff=0.420072, rho=0.377964
2025-04-19 00:08:09,754 : INFO : PROGRESS: pass 0, at document #16000/16310
2025-04-19 00:08:09,930 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:08:09,932 : INFO : topic #0 (0.333): 0.011*"公司" + 0.008*"工作" + 0.006*"台灣" + 0.005*"美國" + 0.004*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"問題" + 0.004*"工程師" + 0.004*"表示"
2025-04-19 00:08:09,932 : INFO : topic #1 (0.333): 0.029*"工作" + 0.013*"方式" + 0.012*"推定" + 0.012*"空白" + 0.012*"砍除" + 0.011*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.010*"國定假日" + 0.010*"資訊"
2025-04-19 00:08:09,933 : INFO : topic #2 (0.333): 0.039*"工作" + 0.013*"方式" + 0.012*"推定" + 0.011*"工資" + 0.011*"小時" + 0.011*"內容" + 0.010*"應徵" + 0.010*"單位" + 0.009*"依法" + 0.009*"聯絡"
2025-04-19 00:08:09,933 : INFO : topic diff=0.319432, rho=0.353553
2025-04-19 00:08:09,998 : INFO : -8.517 per-word bound, 366.3 perplexity estimate based on a held-out corpus of 310 documents with 43091 words
2025-04-19 00:08:09,998 : INFO : PROGRESS: pass 0, at document #16310/16310
2025-04-19 00:08:10,054 : INFO : merging changes from 310 documents into a model of 16310 documents
2025-04-19 00:08:10,056 : INFO : topic #0 (0.333): 0.012*"公司" + 0.007*"工作" + 0.006*"美國" + 0.006*"台灣" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.004*"台積電"
2025-04-19 00:08:10,057 : INFO : topic #1 (0.333): 0.029*"工作" + 0.012*"方式" + 0.012*"推定" + 0.011*"空白" + 0.011*"砍除" + 0.011*"第一項" + 0.011*"情形" + 0.010*"聯絡" + 0.010*"資訊" + 0.010*"國定假日"
2025-04-19 00:08:10,057 : INFO : topic #2 (0.333): 0.039*"工作" + 0.013*"小時" + 0.013*"方式" + 0.011*"推定" + 0.011*"工資" + 0.011*"內容" + 0.010*"單位" + 0.009*"應徵" + 0.008*"依法" + 0.008*"聯絡"
2025-04-19 00:08:10,058 : INFO : topic diff=0.324231, rho=0.333333
2025-04-19 00:08:10,058 : INFO : PROGRESS: pass 1, at document #2000/16310
2025-04-19 00:08:10,711 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:08:10,713 : INFO : topic #0 (0.333): 0.011*"公司" + 0.007*"工作" + 0.006*"台灣" + 0.006*"美國" + 0.004*"技術" + 0.004*"晶片" + 0.004*"員工" +
0.004*"科技" + 0.004*"表示" + 0.003*"台積電" 2025-04-19 00:08:10,713 : INFO : topic #1 (0.333): 0.031*"工作" + 0.013*"推定" + 0.012*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.011*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:10,714 : INFO : topic #2 (0.333): 0.039*"工作" + 0.020*"方式" + 0.013*"小時" + 0.012*"時間" + 0.012*"工資" + 0.012*"內容" + 0.012*"推定" + 0.010*"依法" + 0.010*"單位" + 0.010*"聯絡" 2025-04-19 00:08:10,714 : INFO : topic diff=1.147336, rho=0.313805 2025-04-19 00:08:10,714 : INFO : PROGRESS: pass 1, at document #4000/16310 2025-04-19 00:08:11,297 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:11,299 : INFO : topic #0 (0.333): 0.011*"公司" + 0.007*"工作" + 0.006*"台灣" + 0.006*"美國" + 0.004*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.003*"科技" + 0.003*"表示" + 0.003*"問題" 2025-04-19 00:08:11,299 : INFO : topic #1 (0.333): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:11,299 : INFO : topic #2 (0.333): 0.037*"工作" + 0.020*"方式" + 0.013*"小時" + 0.013*"時間" + 0.012*"內容" + 0.012*"推定" + 0.012*"工資" + 0.010*"聯絡" + 0.010*"單位" + 0.010*"依法" 2025-04-19 00:08:11,300 : INFO : topic diff=0.404685, rho=0.313805 2025-04-19 00:08:11,300 : INFO : PROGRESS: pass 1, at document #6000/16310 2025-04-19 00:08:11,758 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:11,760 : INFO : topic #0 (0.333): 0.012*"公司" + 0.007*"工作" + 0.005*"台灣" + 0.005*"美國" + 0.004*"技術" + 0.004*"問題" + 0.003*"時間" + 0.003*"目前" + 0.003*"員工" + 0.003*"工程師" 2025-04-19 00:08:11,760 : INFO : topic #1 (0.333): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:11,761 : INFO : topic #2 (0.333): 0.034*"工作" + 0.020*"方式" + 0.014*"時間" + 0.013*"小時" + 0.012*"內容" + 0.010*"聯絡" + 0.010*"推定" + 0.010*"工資" + 0.010*"電話" + 0.010*"依法" 2025-04-19 00:08:11,761 : INFO : topic 
diff=0.287964, rho=0.313805 2025-04-19 00:08:11,762 : INFO : PROGRESS: pass 1, at document #8000/16310 2025-04-19 00:08:12,023 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:12,025 : INFO : topic #0 (0.333): 0.015*"公司" + 0.010*"工作" + 0.006*"面試" + 0.005*"問題" + 0.005*"工程師" + 0.004*"時間" + 0.004*"技術" + 0.004*"開發" + 0.004*"經驗" + 0.004*"台灣" 2025-04-19 00:08:12,025 : INFO : topic #1 (0.333): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:12,026 : INFO : topic #2 (0.333): 0.037*"工作" + 0.021*"方式" + 0.015*"時間" + 0.014*"小時" + 0.012*"內容" + 0.011*"聯絡" + 0.010*"每日" + 0.010*"工資" + 0.010*"推定" + 0.010*"電話" 2025-04-19 00:08:12,026 : INFO : topic diff=0.356930, rho=0.313805 2025-04-19 00:08:12,027 : INFO : PROGRESS: pass 1, at document #10000/16310 2025-04-19 00:08:12,248 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:12,250 : INFO : topic #0 (0.333): 0.015*"公司" + 0.011*"工作" + 0.007*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.005*"開發" + 0.005*"時間" + 0.005*"經驗" + 0.005*"目前" + 0.004*"技術" 2025-04-19 00:08:12,251 : INFO : topic #1 (0.333): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.011*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:12,251 : INFO : topic #2 (0.333): 0.037*"工作" + 0.021*"方式" + 0.015*"時間" + 0.015*"小時" + 0.012*"內容" + 0.011*"聯絡" + 0.010*"每日" + 0.009*"電話" + 0.009*"工資" + 0.009*"推定" 2025-04-19 00:08:12,252 : INFO : topic diff=0.283800, rho=0.313805 2025-04-19 00:08:12,252 : INFO : PROGRESS: pass 1, at document #12000/16310 2025-04-19 00:08:12,456 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:12,458 : INFO : topic #0 (0.333): 0.014*"公司" + 0.010*"工作" + 0.006*"面試" + 0.005*"問題" + 0.005*"工程師" + 0.004*"開發" + 0.004*"時間" + 0.004*"台灣" + 0.004*"技術" + 0.004*"目前" 2025-04-19 00:08:12,459 : INFO : topic #1 
(0.333): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.011*"第一項" + 0.011*"情形" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:12,459 : INFO : topic #2 (0.333): 0.037*"工作" + 0.021*"方式" + 0.015*"時間" + 0.015*"小時" + 0.012*"內容" + 0.011*"聯絡" + 0.010*"每日" + 0.009*"地點" + 0.009*"電話" + 0.009*"工資" 2025-04-19 00:08:12,460 : INFO : topic diff=0.299853, rho=0.313805 2025-04-19 00:08:12,460 : INFO : PROGRESS: pass 1, at document #14000/16310 2025-04-19 00:08:12,664 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:12,666 : INFO : topic #0 (0.333): 0.012*"公司" + 0.008*"工作" + 0.005*"台灣" + 0.004*"面試" + 0.004*"問題" + 0.004*"工程師" + 0.004*"技術" + 0.004*"目前" + 0.003*"時間" + 0.003*"員工" 2025-04-19 00:08:12,666 : INFO : topic #1 (0.333): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:12,667 : INFO : topic #2 (0.333): 0.037*"工作" + 0.021*"方式" + 0.015*"時間" + 0.014*"小時" + 0.011*"內容" + 0.011*"聯絡" + 0.010*"每日" + 0.009*"地點" + 0.009*"電話" + 0.009*"通知" 2025-04-19 00:08:12,667 : INFO : topic diff=0.297309, rho=0.313805 2025-04-19 00:08:12,668 : INFO : PROGRESS: pass 1, at document #16000/16310 2025-04-19 00:08:12,862 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:12,863 : INFO : topic #0 (0.333): 0.012*"公司" + 0.007*"工作" + 0.006*"台灣" + 0.005*"美國" + 0.004*"技術" + 0.004*"晶片" + 0.004*"問題" + 0.004*"員工" + 0.004*"工程師" + 0.004*"面試" 2025-04-19 00:08:12,864 : INFO : topic #1 (0.333): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.010*"內容" 2025-04-19 00:08:12,864 : INFO : topic #2 (0.333): 0.036*"工作" + 0.020*"方式" + 0.015*"時間" + 0.014*"小時" + 0.011*"內容" + 0.011*"聯絡" + 0.010*"地點" + 0.009*"每日" + 0.009*"報名" + 0.008*"工資" 2025-04-19 00:08:12,865 : INFO : topic diff=0.262534, rho=0.313805 2025-04-19 00:08:12,932 : INFO : -8.456 per-word 
bound, 351.3 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:08:12,932 : INFO : PROGRESS: pass 1, at document #16310/16310 2025-04-19 00:08:12,965 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:08:12,966 : INFO : topic #0 (0.333): 0.012*"公司" + 0.006*"美國" + 0.006*"工作" + 0.006*"台灣" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.004*"問題" 2025-04-19 00:08:12,967 : INFO : topic #1 (0.333): 0.031*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.010*"內容" 2025-04-19 00:08:12,967 : INFO : topic #2 (0.333): 0.038*"工作" + 0.018*"方式" + 0.016*"小時" + 0.015*"時間" + 0.011*"內容" + 0.011*"聯絡" + 0.010*"地點" + 0.009*"報名" + 0.009*"每日" + 0.009*"工時" 2025-04-19 00:08:12,968 : INFO : topic diff=0.298291, rho=0.313805 2025-04-19 00:08:12,968 : INFO : PROGRESS: pass 2, at document #2000/16310 2025-04-19 00:08:13,472 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:13,474 : INFO : topic #0 (0.333): 0.012*"公司" + 0.006*"台灣" + 0.006*"工作" + 0.006*"美國" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.004*"問題" 2025-04-19 00:08:13,474 : INFO : topic #1 (0.333): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"方式" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:13,474 : INFO : topic #2 (0.333): 0.033*"工作" + 0.019*"方式" + 0.015*"時間" + 0.013*"小時" + 0.011*"內容" + 0.010*"聯絡" + 0.010*"電話" + 0.009*"通知" + 0.009*"地點" + 0.009*"報名" 2025-04-19 00:08:13,475 : INFO : topic diff=0.838892, rho=0.299409 2025-04-19 00:08:13,475 : INFO : PROGRESS: pass 2, at document #4000/16310 2025-04-19 00:08:13,962 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:13,965 : INFO : topic #0 (0.333): 0.012*"公司" + 0.006*"台灣" + 0.006*"工作" + 0.006*"美國" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"問題" + 
0.004*"表示" 2025-04-19 00:08:13,965 : INFO : topic #1 (0.333): 0.032*"工作" + 0.014*"推定" + 0.012*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:13,966 : INFO : topic #2 (0.333): 0.031*"工作" + 0.019*"方式" + 0.015*"時間" + 0.013*"小時" + 0.011*"內容" + 0.011*"電話" + 0.010*"聯絡" + 0.010*"通知" + 0.010*"報名" + 0.009*"地點" 2025-04-19 00:08:13,966 : INFO : topic diff=0.350293, rho=0.299409 2025-04-19 00:08:13,966 : INFO : PROGRESS: pass 2, at document #6000/16310 2025-04-19 00:08:14,402 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:14,404 : INFO : topic #0 (0.333): 0.013*"公司" + 0.007*"工作" + 0.006*"台灣" + 0.005*"美國" + 0.004*"技術" + 0.004*"問題" + 0.004*"工程師" + 0.004*"員工" + 0.003*"晶片" + 0.003*"面試" 2025-04-19 00:08:14,404 : INFO : topic #1 (0.333): 0.033*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"方式" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" 2025-04-19 00:08:14,405 : INFO : topic #2 (0.333): 0.030*"工作" + 0.019*"方式" + 0.014*"時間" + 0.012*"小時" + 0.011*"電話" + 0.011*"內容" + 0.011*"報名" + 0.010*"聯絡" + 0.010*"活動" + 0.010*"通知" 2025-04-19 00:08:14,405 : INFO : topic diff=0.259849, rho=0.299409 2025-04-19 00:08:14,406 : INFO : PROGRESS: pass 2, at document #8000/16310 2025-04-19 00:08:14,681 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:14,683 : INFO : topic #0 (0.333): 0.015*"公司" + 0.010*"工作" + 0.006*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.005*"技術" + 0.004*"開發" + 0.004*"台灣" + 0.004*"經驗" + 0.004*"目前" 2025-04-19 00:08:14,683 : INFO : topic #1 (0.333): 0.033*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"方式" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:14,684 : INFO : topic #2 (0.333): 0.033*"工作" + 0.020*"方式" + 0.016*"時間" + 0.013*"小時" + 0.011*"內容" + 0.011*"電話" + 0.010*"聯絡" + 0.010*"報名" + 0.009*"活動" + 0.009*"台北市" 2025-04-19 00:08:14,684 : INFO : topic diff=0.333700, 
rho=0.299409 2025-04-19 00:08:14,685 : INFO : PROGRESS: pass 2, at document #10000/16310 2025-04-19 00:08:14,919 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:14,921 : INFO : topic #0 (0.333): 0.015*"公司" + 0.011*"工作" + 0.007*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.005*"開發" + 0.005*"技術" + 0.005*"經驗" + 0.004*"目前" + 0.004*"時間" 2025-04-19 00:08:14,922 : INFO : topic #1 (0.333): 0.033*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"方式" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:14,922 : INFO : topic #2 (0.333): 0.033*"工作" + 0.020*"方式" + 0.016*"時間" + 0.014*"小時" + 0.011*"內容" + 0.011*"聯絡" + 0.010*"電話" + 0.010*"報名" + 0.009*"活動" + 0.009*"台北市" 2025-04-19 00:08:14,922 : INFO : topic diff=0.267383, rho=0.299409 2025-04-19 00:08:14,923 : INFO : PROGRESS: pass 2, at document #12000/16310 2025-04-19 00:08:15,146 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:15,148 : INFO : topic #0 (0.333): 0.014*"公司" + 0.009*"工作" + 0.006*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.004*"開發" + 0.004*"技術" + 0.004*"台灣" + 0.004*"目前" + 0.004*"時間" 2025-04-19 00:08:15,148 : INFO : topic #1 (0.333): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"方式" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:15,149 : INFO : topic #2 (0.333): 0.033*"工作" + 0.020*"方式" + 0.017*"時間" + 0.014*"小時" + 0.011*"內容" + 0.011*"聯絡" + 0.010*"報名" + 0.010*"活動" + 0.010*"電話" + 0.009*"地點" 2025-04-19 00:08:15,149 : INFO : topic diff=0.280246, rho=0.299409 2025-04-19 00:08:15,149 : INFO : PROGRESS: pass 2, at document #14000/16310 2025-04-19 00:08:15,363 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:15,365 : INFO : topic #0 (0.333): 0.012*"公司" + 0.008*"工作" + 0.006*"台灣" + 0.005*"面試" + 0.004*"問題" + 0.004*"工程師" + 0.004*"技術" + 0.004*"目前" + 0.003*"員工" + 0.003*"開發" 2025-04-19 00:08:15,366 : INFO : topic #1 (0.333): 0.032*"工作" 
+ 0.013*"推定" + 0.012*"砍除" + 0.012*"方式" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:15,366 : INFO : topic #2 (0.333): 0.032*"工作" + 0.019*"方式" + 0.016*"時間" + 0.013*"小時" + 0.011*"內容" + 0.011*"報名" + 0.010*"聯絡" + 0.010*"活動" + 0.009*"電話" + 0.009*"地點" 2025-04-19 00:08:15,367 : INFO : topic diff=0.280783, rho=0.299409 2025-04-19 00:08:15,367 : INFO : PROGRESS: pass 2, at document #16000/16310 2025-04-19 00:08:15,571 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:15,573 : INFO : topic #0 (0.333): 0.012*"公司" + 0.007*"工作" + 0.006*"台灣" + 0.005*"美國" + 0.004*"技術" + 0.004*"晶片" + 0.004*"問題" + 0.004*"工程師" + 0.004*"員工" + 0.004*"面試" 2025-04-19 00:08:15,574 : INFO : topic #1 (0.333): 0.032*"工作" + 0.013*"推定" + 0.012*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:08:15,574 : INFO : topic #2 (0.333): 0.032*"工作" + 0.018*"方式" + 0.016*"時間" + 0.013*"小時" + 0.011*"報名" + 0.010*"內容" + 0.010*"活動" + 0.010*"地點" + 0.010*"聯絡" + 0.009*"電話" 2025-04-19 00:08:15,574 : INFO : topic diff=0.247817, rho=0.299409 2025-04-19 00:08:15,643 : INFO : -8.445 per-word bound, 348.4 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:08:15,644 : INFO : PROGRESS: pass 2, at document #16310/16310 2025-04-19 00:08:15,678 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:08:15,680 : INFO : topic #0 (0.333): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.004*"問題" 2025-04-19 00:08:15,681 : INFO : topic #1 (0.333): 0.032*"工作" + 0.013*"推定" + 0.012*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"單位" + 0.011*"內容" 2025-04-19 00:08:15,682 : INFO : topic #2 (0.333): 0.033*"工作" + 0.017*"方式" + 0.016*"時間" + 0.015*"小時" + 0.011*"報名" + 0.010*"內容" + 0.010*"活動" + 0.010*"地點" + 0.010*"聯絡" + 
0.008*"台北市" 2025-04-19 00:08:15,682 : INFO : topic diff=0.281434, rho=0.299409 2025-04-19 00:08:15,682 : INFO : PROGRESS: pass 3, at document #2000/16310 2025-04-19 00:08:16,139 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:16,141 : INFO : topic #0 (0.333): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.004*"問題" 2025-04-19 00:08:16,141 : INFO : topic #1 (0.333): 0.033*"工作" + 0.014*"推定" + 0.012*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:16,142 : INFO : topic #2 (0.333): 0.030*"工作" + 0.019*"方式" + 0.016*"時間" + 0.012*"小時" + 0.011*"電話" + 0.011*"內容" + 0.011*"報名" + 0.010*"活動" + 0.010*"聯絡" + 0.010*"地點" 2025-04-19 00:08:16,142 : INFO : topic diff=0.733749, rho=0.286829 2025-04-19 00:08:16,143 : INFO : PROGRESS: pass 3, at document #4000/16310 2025-04-19 00:08:16,589 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:16,591 : INFO : topic #0 (0.333): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"問題" + 0.004*"表示" 2025-04-19 00:08:16,591 : INFO : topic #1 (0.333): 0.033*"工作" + 0.014*"推定" + 0.012*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:16,592 : INFO : topic #2 (0.333): 0.029*"工作" + 0.018*"方式" + 0.015*"時間" + 0.012*"小時" + 0.012*"電話" + 0.011*"內容" + 0.011*"報名" + 0.010*"活動" + 0.010*"通知" + 0.010*"聯絡" 2025-04-19 00:08:16,592 : INFO : topic diff=0.327646, rho=0.286829 2025-04-19 00:08:16,593 : INFO : PROGRESS: pass 3, at document #6000/16310 2025-04-19 00:08:16,984 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:16,986 : INFO : topic #0 (0.333): 0.013*"公司" + 0.007*"工作" + 0.006*"台灣" + 0.005*"美國" + 0.004*"技術" + 0.004*"問題" + 0.004*"工程師" + 0.004*"員工" + 0.004*"晶片" + 0.003*"面試" 
2025-04-19 00:08:16,986 : INFO : topic #1 (0.333): 0.033*"工作" + 0.014*"推定" + 0.012*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:08:16,987 : INFO : topic #2 (0.333): 0.028*"工作" + 0.018*"方式" + 0.015*"時間" + 0.012*"電話" + 0.012*"報名" + 0.012*"小時" + 0.011*"活動" + 0.011*"內容" + 0.010*"通知" + 0.010*"台北市" 2025-04-19 00:08:16,987 : INFO : topic diff=0.246719, rho=0.286829 2025-04-19 00:08:16,987 : INFO : PROGRESS: pass 3, at document #8000/16310 2025-04-19 00:08:17,259 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:17,261 : INFO : topic #0 (0.333): 0.015*"公司" + 0.009*"工作" + 0.006*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.005*"技術" + 0.004*"開發" + 0.004*"台灣" + 0.004*"經驗" + 0.004*"目前" 2025-04-19 00:08:17,261 : INFO : topic #1 (0.333): 0.033*"工作" + 0.014*"推定" + 0.012*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:08:17,262 : INFO : topic #2 (0.333): 0.030*"工作" + 0.019*"方式" + 0.016*"時間" + 0.013*"小時" + 0.011*"電話" + 0.011*"內容" + 0.011*"報名" + 0.010*"活動" + 0.010*"聯絡" + 0.010*"台北市" 2025-04-19 00:08:17,262 : INFO : topic diff=0.313431, rho=0.286829 2025-04-19 00:08:17,263 : INFO : PROGRESS: pass 3, at document #10000/16310 2025-04-19 00:08:17,502 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:17,504 : INFO : topic #0 (0.333): 0.016*"公司" + 0.010*"工作" + 0.007*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.005*"開發" + 0.005*"技術" + 0.004*"經驗" + 0.004*"目前" + 0.004*"比較" 2025-04-19 00:08:17,504 : INFO : topic #1 (0.333): 0.033*"工作" + 0.014*"推定" + 0.012*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:08:17,505 : INFO : topic #2 (0.333): 0.031*"工作" + 0.019*"方式" + 0.017*"時間" + 0.013*"小時" + 0.011*"內容" + 0.011*"報名" + 0.010*"電話" + 0.010*"聯絡" + 0.010*"活動" + 0.009*"台北市" 2025-04-19 00:08:17,505 : INFO : topic diff=0.252243, rho=0.286829 
2025-04-19 00:08:17,506 : INFO : PROGRESS: pass 3, at document #12000/16310 2025-04-19 00:08:17,752 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:17,754 : INFO : topic #0 (0.333): 0.014*"公司" + 0.009*"工作" + 0.006*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.004*"開發" + 0.004*"技術" + 0.004*"台灣" + 0.004*"目前" + 0.004*"比較" 2025-04-19 00:08:17,754 : INFO : topic #1 (0.333): 0.033*"工作" + 0.014*"推定" + 0.012*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:08:17,755 : INFO : topic #2 (0.333): 0.031*"工作" + 0.019*"方式" + 0.017*"時間" + 0.013*"小時" + 0.011*"報名" + 0.011*"內容" + 0.011*"活動" + 0.010*"聯絡" + 0.010*"電話" + 0.009*"地點" 2025-04-19 00:08:17,755 : INFO : topic diff=0.264195, rho=0.286829 2025-04-19 00:08:17,756 : INFO : PROGRESS: pass 3, at document #14000/16310 2025-04-19 00:08:17,978 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:17,980 : INFO : topic #0 (0.333): 0.013*"公司" + 0.008*"工作" + 0.006*"台灣" + 0.005*"面試" + 0.005*"問題" + 0.004*"工程師" + 0.004*"技術" + 0.004*"目前" + 0.003*"開發" + 0.003*"員工" 2025-04-19 00:08:17,980 : INFO : topic #1 (0.333): 0.033*"工作" + 0.014*"推定" + 0.012*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:08:17,981 : INFO : topic #2 (0.333): 0.030*"工作" + 0.018*"方式" + 0.016*"時間" + 0.013*"小時" + 0.011*"報名" + 0.011*"活動" + 0.010*"內容" + 0.010*"聯絡" + 0.010*"電話" + 0.010*"地點" 2025-04-19 00:08:17,981 : INFO : topic diff=0.266639, rho=0.286829 2025-04-19 00:08:17,982 : INFO : PROGRESS: pass 3, at document #16000/16310 2025-04-19 00:08:18,187 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:18,189 : INFO : topic #0 (0.333): 0.012*"公司" + 0.007*"工作" + 0.006*"台灣" + 0.005*"美國" + 0.004*"技術" + 0.004*"問題" + 0.004*"晶片" + 0.004*"工程師" + 0.004*"員工" + 0.004*"面試" 2025-04-19 00:08:18,189 : INFO : topic #1 (0.333): 0.033*"工作" + 0.014*"推定" + 
0.012*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:08:18,190 : INFO : topic #2 (0.333): 0.029*"工作" + 0.017*"方式" + 0.016*"時間" + 0.013*"小時" + 0.011*"報名" + 0.011*"活動" + 0.010*"內容" + 0.010*"地點" + 0.009*"聯絡" + 0.009*"電話" 2025-04-19 00:08:18,190 : INFO : topic diff=0.235313, rho=0.286829 2025-04-19 00:08:18,261 : INFO : -8.439 per-word bound, 347.0 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:08:18,261 : INFO : PROGRESS: pass 3, at document #16310/16310 2025-04-19 00:08:18,295 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:08:18,298 : INFO : topic #0 (0.333): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"問題" + 0.004*"表示" 2025-04-19 00:08:18,298 : INFO : topic #1 (0.333): 0.032*"工作" + 0.013*"推定" + 0.012*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"內容" + 0.010*"聯絡" 2025-04-19 00:08:18,299 : INFO : topic #2 (0.333): 0.031*"工作" + 0.016*"方式" + 0.016*"時間" + 0.014*"小時" + 0.011*"活動" + 0.011*"報名" + 0.010*"地點" + 0.010*"內容" + 0.009*"聯絡" + 0.009*"電話" 2025-04-19 00:08:18,299 : INFO : topic diff=0.267190, rho=0.286829 2025-04-19 00:08:18,299 : INFO : PROGRESS: pass 4, at document #2000/16310 2025-04-19 00:08:18,722 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:18,724 : INFO : topic #0 (0.333): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"問題" + 0.004*"表示" 2025-04-19 00:08:18,725 : INFO : topic #1 (0.333): 0.033*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:08:18,726 : INFO : topic #2 (0.333): 0.028*"工作" + 0.018*"方式" + 0.016*"時間" + 0.012*"小時" + 0.011*"電話" + 0.011*"報名" + 0.011*"活動" + 0.010*"內容" + 0.010*"台北市" + 0.010*"地點" 2025-04-19 
00:08:18,726 : INFO : topic diff=0.669148, rho=0.275711 2025-04-19 00:08:18,726 : INFO : PROGRESS: pass 4, at document #4000/16310 2025-04-19 00:08:19,155 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:19,157 : INFO : topic #0 (0.333): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"問題" + 0.004*"科技" + 0.004*"表示" 2025-04-19 00:08:19,158 : INFO : topic #1 (0.333): 0.033*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:08:19,158 : INFO : topic #2 (0.333): 0.027*"工作" + 0.018*"方式" + 0.016*"時間" + 0.012*"電話" + 0.012*"小時" + 0.011*"報名" + 0.011*"活動" + 0.010*"內容" + 0.010*"通知" + 0.010*"台北市" 2025-04-19 00:08:19,159 : INFO : topic diff=0.312450, rho=0.275711 2025-04-19 00:08:19,159 : INFO : PROGRESS: pass 4, at document #6000/16310 2025-04-19 00:08:19,536 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:19,538 : INFO : topic #0 (0.333): 0.013*"公司" + 0.007*"工作" + 0.006*"台灣" + 0.005*"美國" + 0.005*"技術" + 0.004*"問題" + 0.004*"工程師" + 0.004*"員工" + 0.004*"晶片" + 0.003*"面試" 2025-04-19 00:08:19,538 : INFO : topic #1 (0.333): 0.033*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:08:19,539 : INFO : topic #2 (0.333): 0.026*"工作" + 0.018*"方式" + 0.015*"時間" + 0.013*"電話" + 0.012*"報名" + 0.011*"活動" + 0.011*"小時" + 0.011*"內容" + 0.010*"台北市" + 0.010*"通知" 2025-04-19 00:08:19,539 : INFO : topic diff=0.237084, rho=0.275711 2025-04-19 00:08:19,540 : INFO : PROGRESS: pass 4, at document #8000/16310 2025-04-19 00:08:19,810 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:19,812 : INFO : topic #0 (0.333): 0.015*"公司" + 0.009*"工作" + 0.006*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.005*"技術" + 0.004*"台灣" + 0.004*"開發" + 0.004*"目前" + 0.004*"經驗" 2025-04-19 
00:08:19,813 : INFO : topic #1 (0.333): 0.033*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:08:19,813 : INFO : topic #2 (0.333): 0.028*"工作" + 0.019*"方式" + 0.017*"時間" + 0.012*"小時" + 0.012*"電話" + 0.011*"報名" + 0.011*"內容" + 0.011*"活動" + 0.010*"台北市" + 0.010*"聯絡" 2025-04-19 00:08:19,814 : INFO : topic diff=0.296448, rho=0.275711 2025-04-19 00:08:19,814 : INFO : PROGRESS: pass 4, at document #10000/16310 2025-04-19 00:08:20,050 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:20,052 : INFO : topic #0 (0.333): 0.016*"公司" + 0.010*"工作" + 0.007*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.005*"開發" + 0.005*"技術" + 0.004*"目前" + 0.004*"經驗" + 0.004*"比較" 2025-04-19 00:08:20,053 : INFO : topic #1 (0.333): 0.033*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:08:20,054 : INFO : topic #2 (0.333): 0.029*"工作" + 0.019*"方式" + 0.017*"時間" + 0.013*"小時" + 0.011*"內容" + 0.011*"報名" + 0.011*"電話" + 0.010*"活動" + 0.010*"聯絡" + 0.010*"台北市" 2025-04-19 00:08:20,054 : INFO : topic diff=0.239550, rho=0.275711 2025-04-19 00:08:20,054 : INFO : PROGRESS: pass 4, at document #12000/16310 2025-04-19 00:08:20,280 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:20,282 : INFO : topic #0 (0.333): 0.014*"公司" + 0.009*"工作" + 0.006*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.005*"開發" + 0.004*"技術" + 0.004*"台灣" + 0.004*"目前" + 0.004*"比較" 2025-04-19 00:08:20,283 : INFO : topic #1 (0.333): 0.033*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:08:20,283 : INFO : topic #2 (0.333): 0.029*"工作" + 0.018*"方式" + 0.017*"時間" + 0.013*"小時" + 0.011*"活動" + 0.011*"報名" + 0.010*"內容" + 0.010*"電話" + 0.010*"聯絡" + 0.009*"地點" 2025-04-19 00:08:20,283 : INFO : topic diff=0.250540, rho=0.275711 2025-04-19 
00:08:20,284 : INFO : PROGRESS: pass 4, at document #14000/16310 2025-04-19 00:08:20,535 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:20,538 : INFO : topic #0 (0.333): 0.013*"公司" + 0.008*"工作" + 0.006*"台灣" + 0.005*"面試" + 0.005*"問題" + 0.004*"工程師" + 0.004*"技術" + 0.004*"目前" + 0.003*"開發" + 0.003*"員工" 2025-04-19 00:08:20,539 : INFO : topic #1 (0.333): 0.033*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:08:20,541 : INFO : topic #2 (0.333): 0.029*"工作" + 0.018*"方式" + 0.017*"時間" + 0.012*"小時" + 0.011*"活動" + 0.011*"報名" + 0.010*"內容" + 0.010*"電話" + 0.010*"地點" + 0.010*"聯絡" 2025-04-19 00:08:20,541 : INFO : topic diff=0.254796, rho=0.275711 2025-04-19 00:08:20,542 : INFO : PROGRESS: pass 4, at document #16000/16310 2025-04-19 00:08:20,827 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:20,829 : INFO : topic #0 (0.333): 0.012*"公司" + 0.007*"工作" + 0.006*"台灣" + 0.005*"美國" + 0.004*"技術" + 0.004*"問題" + 0.004*"工程師" + 0.004*"晶片" + 0.004*"員工" + 0.004*"面試" 2025-04-19 00:08:20,829 : INFO : topic #1 (0.333): 0.033*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"情形" + 0.012*"空白" + 0.011*"單位" + 0.011*"第一項" + 0.011*"內容" + 0.011*"應徵" 2025-04-19 00:08:20,830 : INFO : topic #2 (0.333): 0.028*"工作" + 0.017*"方式" + 0.016*"時間" + 0.012*"小時" + 0.011*"活動" + 0.011*"報名" + 0.010*"內容" + 0.010*"地點" + 0.009*"電話" + 0.009*"聯絡" 2025-04-19 00:08:20,830 : INFO : topic diff=0.225021, rho=0.275711 2025-04-19 00:08:20,897 : INFO : -8.435 per-word bound, 346.1 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:08:20,898 : INFO : PROGRESS: pass 4, at document #16310/16310 2025-04-19 00:08:20,931 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:08:20,933 : INFO : topic #0 (0.333): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.006*"工作" + 0.005*"技術" + 
0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"問題" + 0.004*"表示" 2025-04-19 00:08:20,933 : INFO : topic #1 (0.333): 0.033*"工作" + 0.014*"推定" + 0.012*"方式" + 0.012*"砍除" + 0.012*"情形" + 0.012*"空白" + 0.011*"單位" + 0.011*"第一項" + 0.011*"內容" + 0.010*"應徵" 2025-04-19 00:08:20,934 : INFO : topic #2 (0.333): 0.029*"工作" + 0.016*"時間" + 0.016*"方式" + 0.013*"小時" + 0.011*"活動" + 0.010*"報名" + 0.010*"地點" + 0.010*"內容" + 0.009*"聯絡" + 0.009*"電話" 2025-04-19 00:08:20,934 : INFO : topic diff=0.255352, rho=0.275711 2025-04-19 00:08:20,934 : INFO : LdaModel lifecycle event {'msg': 'trained LdaModel<num_terms=23261, num_topics=3, decay=0.5, chunksize=2000> in 14.13s', 'datetime': '2025-04-19T00:08:20.934816', 'gensim': '4.3.3', 'python': '3.11.2 (main, Apr 21 2023, 22:51:21) [Clang 14.0.3 (clang-1403.0.22.14.1)]', 'platform': 'macOS-15.3.2-arm64-arm-64bit', 'event': 'created'} 2025-04-19 00:08:25,754 : INFO : -7.070 per-word bound, 134.4 perplexity estimate based on a held-out corpus of 16310 documents with 3460358 words 2025-04-19 00:08:25,756 : INFO : using ParallelWordOccurrenceAccumulator<processes=7, batch_size=64> to estimate probabilities from sliding windows
2025-04-19 00:08:33,033 : INFO : 7 accumulators retrieved from output queue 2025-04-19 00:08:33,042 : INFO : accumulated word occurrence stats for 3451622 virtual documents 2025-04-19 00:08:33,091 : INFO : using symmetric alpha at 0.25 2025-04-19 00:08:33,091 : INFO : using symmetric eta at 0.25 2025-04-19 00:08:33,093 : INFO : using serial LDA version on this node 2025-04-19 00:08:33,097 : INFO : running online (multi-pass) LDA training, 4 topics, 5 passes over the supplied corpus of 16310 documents, updating model once every 2000 documents, evaluating perplexity every 16310 documents, iterating 50x with a convergence threshold of 0.001000 2025-04-19 00:08:33,098 : INFO : PROGRESS: pass 0, at document #2000/16310 2025-04-19 00:08:33,719 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:33,722 : INFO : topic #0 (0.250): 0.029*"工作" + 0.014*"方式" + 0.013*"推定" + 0.012*"應徵" + 0.012*"空白" + 0.011*"單位" + 0.011*"砍除" + 0.010*"內容" + 0.010*"聯絡" + 0.010*"資訊" 2025-04-19 00:08:33,722 : INFO : topic #1 (0.250): 0.030*"工作" + 0.015*"方式" + 0.013*"推定" + 0.011*"聯絡" + 0.011*"單位" + 0.011*"國定假日" + 0.010*"第一項" + 0.010*"空白" + 0.010*"情形" + 0.010*"內容" 2025-04-19 00:08:33,723 : INFO : topic #2 (0.250): 0.040*"工作" + 0.013*"內容" + 0.013*"推定" + 0.012*"工資" + 0.011*"應徵" + 0.011*"方式" + 0.010*"聯絡" + 0.010*"情形" + 0.010*"砍除" + 0.010*"小時" 2025-04-19 00:08:33,723 : INFO : topic #3 (0.250): 0.020*"工作" + 0.012*"方式" + 0.011*"砍除" + 0.010*"聯絡人" + 0.010*"推定" + 0.010*"應徵" + 0.010*"空白" + 0.009*"文字" + 0.008*"資訊" + 0.008*"情形" 2025-04-19 00:08:33,723 : INFO : topic diff=5.686695, rho=1.000000 2025-04-19 00:08:33,724 : INFO : PROGRESS: pass 0, at document #4000/16310 2025-04-19 00:08:34,301 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:34,304 : INFO : topic #0 (0.250): 0.029*"工作" + 0.013*"方式" + 0.013*"應徵" + 0.013*"空白" + 0.012*"推定" + 0.011*"砍除" + 0.011*"單位" + 0.010*"內容" + 0.010*"資訊" + 0.010*"第一項" 2025-04-19 00:08:34,304 : INFO : topic #1 (0.250): 0.030*"工作" + 0.014*"方式" + 
0.013*"推定" + 0.012*"第一項" + 0.012*"空白" + 0.012*"砍除" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"國定假日" + 0.011*"單位" 2025-04-19 00:08:34,305 : INFO : topic #2 (0.250): 0.042*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"內容" + 0.012*"工資" + 0.011*"應徵" + 0.011*"小時" + 0.010*"單位" + 0.010*"聯絡" + 0.010*"情形" 2025-04-19 00:08:34,305 : INFO : topic #3 (0.250): 0.014*"報名" + 0.012*"工作" + 0.012*"活動" + 0.011*"電話" + 0.011*"方式" + 0.009*"時間" + 0.009*"聯絡" + 0.009*"台北市" + 0.009*"內容" + 0.008*"聯絡人" 2025-04-19 00:08:34,306 : INFO : topic diff=0.556125, rho=0.707107 2025-04-19 00:08:34,306 : INFO : PROGRESS: pass 0, at document #6000/16310 2025-04-19 00:08:34,811 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:34,814 : INFO : topic #0 (0.250): 0.029*"工作" + 0.012*"應徵" + 0.012*"方式" + 0.012*"空白" + 0.011*"推定" + 0.011*"砍除" + 0.010*"內容" + 0.010*"單位" + 0.009*"第一項" + 0.009*"資訊" 2025-04-19 00:08:34,814 : INFO : topic #1 (0.250): 0.031*"工作" + 0.013*"推定" + 0.013*"方式" + 0.012*"空白" + 0.012*"砍除" + 0.012*"第一項" + 0.012*"情形" + 0.011*"聯絡" + 0.011*"國定假日" + 0.011*"資訊" 2025-04-19 00:08:34,815 : INFO : topic #2 (0.250): 0.042*"工作" + 0.015*"方式" + 0.013*"推定" + 0.012*"內容" + 0.012*"工資" + 0.011*"小時" + 0.010*"依法" + 0.010*"單位" + 0.010*"應徵" + 0.009*"聯絡" 2025-04-19 00:08:34,815 : INFO : topic #3 (0.250): 0.016*"報名" + 0.014*"活動" + 0.012*"電話" + 0.010*"台北市" + 0.010*"時間" + 0.009*"方式" + 0.008*"聯絡" + 0.008*"內容" + 0.008*"資料" + 0.008*"人數" 2025-04-19 00:08:34,816 : INFO : topic diff=0.777768, rho=0.577350 2025-04-19 00:08:34,816 : INFO : PROGRESS: pass 0, at document #8000/16310 2025-04-19 00:08:35,178 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:35,180 : INFO : topic #0 (0.250): 0.029*"工作" + 0.012*"應徵" + 0.012*"方式" + 0.011*"空白" + 0.011*"推定" + 0.010*"砍除" + 0.010*"內容" + 0.009*"單位" + 0.009*"資訊" + 0.009*"第一項" 2025-04-19 00:08:35,181 : INFO : topic #1 (0.250): 0.031*"工作" + 0.013*"推定" + 0.013*"方式" + 0.012*"空白" + 0.012*"砍除" + 0.012*"第一項" + 0.012*"情形" + 
0.011*"聯絡" + 0.011*"國定假日" + 0.011*"資訊" 2025-04-19 00:08:35,182 : INFO : topic #2 (0.250): 0.045*"工作" + 0.013*"方式" + 0.010*"小時" + 0.010*"內容" + 0.010*"推定" + 0.009*"工資" + 0.009*"面試" + 0.008*"應徵" + 0.008*"時間" + 0.008*"單位" 2025-04-19 00:08:35,185 : INFO : topic #3 (0.250): 0.016*"公司" + 0.009*"工作" + 0.007*"時間" + 0.007*"問題" + 0.006*"工程師" + 0.006*"面試" + 0.005*"開發" + 0.005*"經驗" + 0.005*"目前" + 0.005*"團隊" 2025-04-19 00:08:35,185 : INFO : topic diff=0.948413, rho=0.500000 2025-04-19 00:08:35,187 : INFO : PROGRESS: pass 0, at document #10000/16310 2025-04-19 00:08:35,449 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:35,454 : INFO : topic #0 (0.250): 0.029*"工作" + 0.012*"應徵" + 0.011*"方式" + 0.011*"空白" + 0.010*"推定" + 0.010*"砍除" + 0.009*"內容" + 0.009*"單位" + 0.009*"資訊" + 0.009*"第一項" 2025-04-19 00:08:35,457 : INFO : topic #1 (0.250): 0.031*"工作" + 0.013*"推定" + 0.013*"方式" + 0.012*"空白" + 0.012*"砍除" + 0.012*"第一項" + 0.012*"情形" + 0.011*"聯絡" + 0.011*"國定假日" + 0.011*"資訊" 2025-04-19 00:08:35,462 : INFO : topic #2 (0.250): 0.047*"工作" + 0.013*"方式" + 0.011*"小時" + 0.010*"內容" + 0.010*"推定" + 0.009*"工資" + 0.008*"面試" + 0.008*"時間" + 0.008*"應徵" + 0.008*"單位" 2025-04-19 00:08:35,470 : INFO : topic #3 (0.250): 0.016*"公司" + 0.011*"工作" + 0.008*"面試" + 0.007*"問題" + 0.007*"時間" + 0.006*"工程師" + 0.006*"開發" + 0.005*"經驗" + 0.005*"目前" + 0.004*"技術" 2025-04-19 00:08:35,473 : INFO : topic diff=0.493554, rho=0.447214 2025-04-19 00:08:35,474 : INFO : PROGRESS: pass 0, at document #12000/16310 2025-04-19 00:08:35,687 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:35,689 : INFO : topic #0 (0.250): 0.028*"工作" + 0.011*"應徵" + 0.011*"方式" + 0.011*"空白" + 0.010*"推定" + 0.010*"砍除" + 0.009*"內容" + 0.009*"單位" + 0.009*"資訊" + 0.009*"第一項" 2025-04-19 00:08:35,689 : INFO : topic #1 (0.250): 0.031*"工作" + 0.013*"推定" + 0.013*"方式" + 0.012*"空白" + 0.012*"砍除" + 0.012*"第一項" + 0.012*"情形" + 0.011*"聯絡" + 0.011*"國定假日" + 0.010*"資訊" 2025-04-19 00:08:35,690 : INFO 
: topic #2 (0.250): 0.047*"工作" + 0.013*"方式" + 0.011*"小時" + 0.010*"內容" + 0.009*"推定" + 0.008*"時間" + 0.008*"工資" + 0.008*"面試" + 0.008*"應徵" + 0.008*"單位" 2025-04-19 00:08:35,690 : INFO : topic #3 (0.250): 0.014*"公司" + 0.009*"工作" + 0.006*"面試" + 0.006*"問題" + 0.005*"時間" + 0.005*"工程師" + 0.005*"開發" + 0.004*"目前" + 0.004*"技術" + 0.004*"台灣" 2025-04-19 00:08:35,691 : INFO : topic diff=0.504608, rho=0.408248 2025-04-19 00:08:35,691 : INFO : PROGRESS: pass 0, at document #14000/16310 2025-04-19 00:08:35,877 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:35,879 : INFO : topic #0 (0.250): 0.027*"工作" + 0.011*"方式" + 0.011*"應徵" + 0.010*"空白" + 0.010*"推定" + 0.009*"砍除" + 0.009*"單位" + 0.009*"內容" + 0.009*"資訊" + 0.008*"第一項" 2025-04-19 00:08:35,880 : INFO : topic #1 (0.250): 0.031*"工作" + 0.013*"推定" + 0.013*"方式" + 0.012*"空白" + 0.012*"砍除" + 0.012*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"國定假日" + 0.010*"資訊" 2025-04-19 00:08:35,880 : INFO : topic #2 (0.250): 0.046*"工作" + 0.012*"方式" + 0.011*"小時" + 0.010*"內容" + 0.009*"推定" + 0.008*"時間" + 0.008*"工資" + 0.008*"單位" + 0.008*"面試" + 0.008*"應徵" 2025-04-19 00:08:35,881 : INFO : topic #3 (0.250): 0.012*"公司" + 0.008*"工作" + 0.006*"台灣" + 0.004*"問題" + 0.004*"面試" + 0.004*"工程師" + 0.004*"技術" + 0.004*"時間" + 0.004*"目前" + 0.003*"員工" 2025-04-19 00:08:35,881 : INFO : topic diff=0.415438, rho=0.377964 2025-04-19 00:08:35,882 : INFO : PROGRESS: pass 0, at document #16000/16310 2025-04-19 00:08:36,067 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:36,069 : INFO : topic #0 (0.250): 0.025*"工作" + 0.010*"方式" + 0.010*"應徵" + 0.010*"空白" + 0.009*"推定" + 0.009*"砍除" + 0.009*"單位" + 0.008*"資訊" + 0.008*"內容" + 0.008*"第一項" 2025-04-19 00:08:36,070 : INFO : topic #1 (0.250): 0.030*"工作" + 0.013*"推定" + 0.013*"方式" + 0.012*"空白" + 0.012*"砍除" + 0.012*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"國定假日" + 0.010*"資訊" 2025-04-19 00:08:36,070 : INFO : topic #2 (0.250): 0.046*"工作" + 0.012*"方式" + 0.011*"小時" + 0.010*"內容" 
+ 0.008*"時間" + 0.008*"工資" + 0.008*"推定" + 0.008*"面試" + 0.008*"單位" + 0.008*"應徵" 2025-04-19 00:08:36,071 : INFO : topic #3 (0.250): 0.012*"公司" + 0.007*"工作" + 0.006*"台灣" + 0.005*"美國" + 0.004*"晶片" + 0.004*"技術" + 0.004*"問題" + 0.004*"員工" + 0.004*"工程師" + 0.004*"表示" 2025-04-19 00:08:36,071 : INFO : topic diff=0.317846, rho=0.353553 2025-04-19 00:08:36,139 : INFO : -8.535 per-word bound, 371.0 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:08:36,140 : INFO : PROGRESS: pass 0, at document #16310/16310 2025-04-19 00:08:36,172 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:08:36,175 : INFO : topic #0 (0.250): 0.024*"關稅" + 0.022*"工作" + 0.009*"方式" + 0.009*"應徵" + 0.009*"空白" + 0.008*"單位" + 0.008*"推定" + 0.008*"砍除" + 0.008*"分類" + 0.007*"資訊" 2025-04-19 00:08:36,175 : INFO : topic #1 (0.250): 0.030*"工作" + 0.013*"推定" + 0.013*"方式" + 0.012*"空白" + 0.012*"砍除" + 0.011*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.010*"國定假日" + 0.010*"資訊" 2025-04-19 00:08:36,176 : INFO : topic #2 (0.250): 0.047*"工作" + 0.013*"小時" + 0.011*"方式" + 0.009*"內容" + 0.008*"工時" + 0.008*"時間" + 0.008*"單位" + 0.007*"面試" + 0.007*"工資" + 0.007*"應徵" 2025-04-19 00:08:36,176 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"美國" + 0.006*"台灣" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.004*"台積電" 2025-04-19 00:08:36,176 : INFO : topic diff=0.304742, rho=0.333333 2025-04-19 00:08:36,177 : INFO : PROGRESS: pass 1, at document #2000/16310 2025-04-19 00:08:36,669 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:36,671 : INFO : topic #0 (0.250): 0.019*"工作" + 0.011*"方式" + 0.010*"台北市" + 0.009*"關稅" + 0.009*"內容" + 0.009*"聯絡" + 0.008*"應徵" + 0.008*"分類" + 0.008*"資訊" + 0.007*"電話" 2025-04-19 00:08:36,671 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.011*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" 2025-04-19 00:08:36,672 : INFO : 
topic #2 (0.250): 0.045*"工作" + 0.019*"方式" + 0.014*"小時" + 0.013*"時間" + 0.011*"內容" + 0.011*"工資" + 0.010*"依法" + 0.010*"每日" + 0.010*"推定" + 0.009*"單位" 2025-04-19 00:08:36,672 : INFO : topic #3 (0.250): 0.011*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.003*"台積電" 2025-04-19 00:08:36,673 : INFO : topic diff=0.992621, rho=0.313805 2025-04-19 00:08:36,673 : INFO : PROGRESS: pass 1, at document #4000/16310 2025-04-19 00:08:37,135 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:37,138 : INFO : topic #0 (0.250): 0.016*"電話" + 0.014*"工作" + 0.014*"台北市" + 0.014*"報名" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"方式" + 0.010*"通知" + 0.010*"地點" + 0.008*"人數" 2025-04-19 00:08:37,138 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" 2025-04-19 00:08:37,139 : INFO : topic #2 (0.250): 0.045*"工作" + 0.021*"方式" + 0.015*"小時" + 0.014*"時間" + 0.012*"工資" + 0.012*"推定" + 0.011*"內容" + 0.011*"依法" + 0.011*"每日" + 0.010*"單位" 2025-04-19 00:08:37,141 : INFO : topic #3 (0.250): 0.011*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.005*"工作" + 0.004*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"時間" + 0.003*"科技" + 0.003*"表示" 2025-04-19 00:08:37,145 : INFO : topic diff=0.498947, rho=0.313805 2025-04-19 00:08:37,152 : INFO : PROGRESS: pass 1, at document #6000/16310 2025-04-19 00:08:37,536 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:37,538 : INFO : topic #0 (0.250): 0.023*"報名" + 0.020*"電話" + 0.017*"台北市" + 0.016*"活動" + 0.012*"聯絡" + 0.012*"內容" + 0.012*"通知" + 0.011*"人數" + 0.011*"地點" + 0.010*"車馬費" 2025-04-19 00:08:37,539 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.013*"砍除" + 0.012*"方式" + 0.012*"空白" + 0.012*"情形" + 0.012*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:37,539 : INFO : topic #2 (0.250): 0.046*"工作" + 0.023*"方式" + 0.015*"小時" + 0.015*"時間" + 
0.012*"工資" + 0.012*"依法" + 0.012*"每日" + 0.012*"內容" + 0.012*"推定" + 0.010*"單位" 2025-04-19 00:08:37,540 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"工作" + 0.005*"台灣" + 0.005*"美國" + 0.004*"技術" + 0.004*"資料" + 0.004*"時間" + 0.004*"問題" + 0.004*"目前" + 0.003*"產品" 2025-04-19 00:08:37,540 : INFO : topic diff=0.338915, rho=0.313805 2025-04-19 00:08:37,541 : INFO : PROGRESS: pass 1, at document #8000/16310 2025-04-19 00:08:37,777 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:37,780 : INFO : topic #0 (0.250): 0.023*"報名" + 0.020*"電話" + 0.017*"台北市" + 0.015*"活動" + 0.012*"聯絡" + 0.012*"內容" + 0.011*"通知" + 0.011*"人數" + 0.011*"地點" + 0.010*"方式" 2025-04-19 00:08:37,780 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.013*"砍除" + 0.012*"方式" + 0.012*"空白" + 0.012*"情形" + 0.012*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:37,781 : INFO : topic #2 (0.250): 0.049*"工作" + 0.023*"方式" + 0.017*"小時" + 0.016*"時間" + 0.012*"每日" + 0.012*"內容" + 0.011*"工資" + 0.010*"依法" + 0.010*"推定" + 0.010*"休息" 2025-04-19 00:08:37,781 : INFO : topic #3 (0.250): 0.015*"公司" + 0.009*"工作" + 0.006*"面試" + 0.005*"問題" + 0.005*"工程師" + 0.005*"技術" + 0.004*"時間" + 0.004*"開發" + 0.004*"目前" + 0.004*"台灣" 2025-04-19 00:08:37,782 : INFO : topic diff=0.340004, rho=0.313805 2025-04-19 00:08:37,782 : INFO : PROGRESS: pass 1, at document #10000/16310 2025-04-19 00:08:37,986 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:37,988 : INFO : topic #0 (0.250): 0.024*"報名" + 0.019*"電話" + 0.016*"台北市" + 0.016*"活動" + 0.012*"聯絡" + 0.011*"內容" + 0.011*"通知" + 0.011*"地點" + 0.010*"人數" + 0.010*"方式" 2025-04-19 00:08:37,989 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.013*"砍除" + 0.012*"方式" + 0.012*"空白" + 0.012*"情形" + 0.012*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:37,989 : INFO : topic #2 (0.250): 0.051*"工作" + 0.023*"方式" + 0.017*"小時" + 0.017*"時間" + 0.012*"每日" + 0.012*"內容" + 0.010*"聯絡" + 0.010*"工資" + 0.010*"休息" + 0.009*"工時" 
2025-04-19 00:08:37,990 : INFO : topic #3 (0.250): 0.015*"公司" + 0.010*"工作" + 0.007*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.005*"開發" + 0.005*"時間" + 0.005*"目前" + 0.004*"技術" + 0.004*"經驗" 2025-04-19 00:08:37,990 : INFO : topic diff=0.280427, rho=0.313805 2025-04-19 00:08:37,990 : INFO : PROGRESS: pass 1, at document #12000/16310 2025-04-19 00:08:38,177 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:38,179 : INFO : topic #0 (0.250): 0.025*"報名" + 0.018*"活動" + 0.018*"電話" + 0.015*"台北市" + 0.012*"聯絡" + 0.011*"通知" + 0.011*"內容" + 0.011*"地點" + 0.010*"人數" + 0.010*"方式" 2025-04-19 00:08:38,179 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"方式" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:38,180 : INFO : topic #2 (0.250): 0.051*"工作" + 0.023*"方式" + 0.017*"小時" + 0.017*"時間" + 0.012*"每日" + 0.011*"內容" + 0.010*"工時" + 0.010*"聯絡" + 0.009*"休息" + 0.009*"工資" 2025-04-19 00:08:38,180 : INFO : topic #3 (0.250): 0.014*"公司" + 0.009*"工作" + 0.006*"面試" + 0.005*"問題" + 0.005*"工程師" + 0.004*"開發" + 0.004*"技術" + 0.004*"台灣" + 0.004*"目前" + 0.004*"時間" 2025-04-19 00:08:38,181 : INFO : topic diff=0.293272, rho=0.313805 2025-04-19 00:08:38,181 : INFO : PROGRESS: pass 1, at document #14000/16310 2025-04-19 00:08:38,359 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:38,361 : INFO : topic #0 (0.250): 0.025*"報名" + 0.017*"活動" + 0.017*"電話" + 0.014*"台北市" + 0.012*"聯絡" + 0.011*"通知" + 0.010*"地點" + 0.010*"人數" + 0.010*"內容" + 0.010*"方式" 2025-04-19 00:08:38,362 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"方式" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:38,362 : INFO : topic #2 (0.250): 0.050*"工作" + 0.022*"方式" + 0.017*"小時" + 0.017*"時間" + 0.011*"每日" + 0.011*"內容" + 0.010*"工時" + 0.009*"聯絡" + 0.009*"工資" + 0.009*"休息" 2025-04-19 00:08:38,363 : INFO : topic #3 (0.250): 0.012*"公司" + 
0.007*"工作" + 0.006*"台灣" + 0.004*"問題" + 0.004*"面試" + 0.004*"工程師" + 0.004*"技術" + 0.004*"目前" + 0.003*"員工" + 0.003*"美國" 2025-04-19 00:08:38,363 : INFO : topic diff=0.288521, rho=0.313805 2025-04-19 00:08:38,364 : INFO : PROGRESS: pass 1, at document #16000/16310 2025-04-19 00:08:38,539 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:38,541 : INFO : topic #0 (0.250): 0.026*"報名" + 0.017*"活動" + 0.015*"電話" + 0.013*"台北市" + 0.011*"聯絡" + 0.011*"問卷" + 0.010*"通知" + 0.010*"地點" + 0.010*"人數" + 0.009*"方式" 2025-04-19 00:08:38,541 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"方式" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:38,542 : INFO : topic #2 (0.250): 0.050*"工作" + 0.021*"方式" + 0.017*"小時" + 0.016*"時間" + 0.011*"內容" + 0.011*"工時" + 0.010*"每日" + 0.009*"工資" + 0.009*"聯絡" + 0.009*"單位" 2025-04-19 00:08:38,542 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"工作" + 0.006*"台灣" + 0.005*"美國" + 0.004*"技術" + 0.004*"晶片" + 0.004*"問題" + 0.004*"工程師" + 0.004*"員工" + 0.003*"表示" 2025-04-19 00:08:38,543 : INFO : topic diff=0.255317, rho=0.313805 2025-04-19 00:08:38,606 : INFO : -8.459 per-word bound, 351.9 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:08:38,606 : INFO : PROGRESS: pass 1, at document #16310/16310 2025-04-19 00:08:38,658 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:08:38,660 : INFO : topic #0 (0.250): 0.025*"報名" + 0.021*"問卷" + 0.017*"活動" + 0.013*"電話" + 0.012*"研究" + 0.012*"台北市" + 0.010*"聯絡" + 0.010*"地點" + 0.010*"填寫" + 0.009*"工作" 2025-04-19 00:08:38,661 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.012*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:38,661 : INFO : topic #2 (0.250): 0.051*"工作" + 0.020*"小時" + 0.019*"方式" + 0.016*"時間" + 0.014*"工時" + 0.011*"內容" + 0.010*"每日" + 0.009*"聯絡" + 0.009*"單位" + 
0.008*"地點" 2025-04-19 00:08:38,662 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"美國" + 0.006*"台灣" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.004*"台積電" 2025-04-19 00:08:38,662 : INFO : topic diff=0.275234, rho=0.313805 2025-04-19 00:08:38,663 : INFO : PROGRESS: pass 2, at document #2000/16310 2025-04-19 00:08:39,062 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:39,064 : INFO : topic #0 (0.250): 0.029*"報名" + 0.023*"活動" + 0.020*"電話" + 0.016*"台北市" + 0.013*"舉辦" + 0.012*"車馬費" + 0.012*"人數" + 0.012*"通知" + 0.011*"聯絡" + 0.011*"地點" 2025-04-19 00:08:39,064 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.012*"砍除" + 0.012*"方式" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:39,065 : INFO : topic #2 (0.250): 0.047*"工作" + 0.023*"方式" + 0.016*"小時" + 0.016*"時間" + 0.012*"工資" + 0.011*"每日" + 0.011*"內容" + 0.011*"依法" + 0.010*"休息" + 0.010*"單位" 2025-04-19 00:08:39,065 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.004*"問題" 2025-04-19 00:08:39,066 : INFO : topic diff=0.906902, rho=0.299409 2025-04-19 00:08:39,066 : INFO : PROGRESS: pass 2, at document #4000/16310 2025-04-19 00:08:39,457 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:39,460 : INFO : topic #0 (0.250): 0.028*"報名" + 0.023*"活動" + 0.021*"電話" + 0.016*"台北市" + 0.012*"車馬費" + 0.012*"人數" + 0.012*"舉辦" + 0.012*"通知" + 0.011*"聯絡" + 0.011*"地點" 2025-04-19 00:08:39,460 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:39,461 : INFO : topic #2 (0.250): 0.047*"工作" + 0.023*"方式" + 0.016*"小時" + 0.016*"時間" + 0.013*"工資" + 0.012*"每日" + 0.012*"依法" + 0.011*"推定" + 0.011*"內容" + 0.011*"單位" 2025-04-19 00:08:39,461 : INFO : topic #3 (0.250): 0.012*"公司" + 
0.006*"台灣" + 0.006*"美國" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"問題" + 0.004*"表示" 2025-04-19 00:08:39,461 : INFO : topic diff=0.390145, rho=0.299409 2025-04-19 00:08:39,462 : INFO : PROGRESS: pass 2, at document #6000/16310 2025-04-19 00:08:39,807 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:39,809 : INFO : topic #0 (0.250): 0.030*"報名" + 0.025*"活動" + 0.022*"電話" + 0.017*"台北市" + 0.013*"車馬費" + 0.013*"人數" + 0.012*"舉辦" + 0.012*"訪問" + 0.012*"通知" + 0.011*"聯絡" 2025-04-19 00:08:39,810 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:39,810 : INFO : topic #2 (0.250): 0.047*"工作" + 0.024*"方式" + 0.016*"小時" + 0.016*"時間" + 0.012*"工資" + 0.012*"每日" + 0.012*"依法" + 0.011*"內容" + 0.011*"推定" + 0.011*"休息" 2025-04-19 00:08:39,811 : INFO : topic #3 (0.250): 0.013*"公司" + 0.006*"工作" + 0.006*"台灣" + 0.005*"美國" + 0.004*"技術" + 0.004*"問題" + 0.004*"工程師" + 0.004*"員工" + 0.004*"晶片" + 0.003*"科技" 2025-04-19 00:08:39,811 : INFO : topic diff=0.278405, rho=0.299409 2025-04-19 00:08:39,811 : INFO : PROGRESS: pass 2, at document #8000/16310 2025-04-19 00:08:40,056 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:40,058 : INFO : topic #0 (0.250): 0.030*"報名" + 0.025*"活動" + 0.021*"電話" + 0.016*"台北市" + 0.013*"車馬費" + 0.012*"舉辦" + 0.012*"人數" + 0.011*"通知" + 0.011*"訪問" + 0.011*"資料" 2025-04-19 00:08:40,059 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:40,059 : INFO : topic #2 (0.250): 0.050*"工作" + 0.024*"方式" + 0.017*"時間" + 0.017*"小時" + 0.012*"每日" + 0.012*"內容" + 0.011*"工資" + 0.010*"依法" + 0.010*"休息" + 0.010*"推定" 2025-04-19 00:08:40,060 : INFO : topic #3 (0.250): 0.015*"公司" + 0.009*"工作" + 0.006*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.005*"技術" + 
0.004*"開發" + 0.004*"台灣" + 0.004*"目前" + 0.004*"經驗" 2025-04-19 00:08:40,060 : INFO : topic diff=0.317855, rho=0.299409 2025-04-19 00:08:40,060 : INFO : PROGRESS: pass 2, at document #10000/16310 2025-04-19 00:08:40,267 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:40,269 : INFO : topic #0 (0.250): 0.031*"報名" + 0.025*"活動" + 0.020*"電話" + 0.015*"台北市" + 0.012*"車馬費" + 0.012*"舉辦" + 0.012*"人數" + 0.011*"通知" + 0.011*"資料" + 0.011*"聯絡" 2025-04-19 00:08:40,270 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:40,270 : INFO : topic #2 (0.250): 0.051*"工作" + 0.024*"方式" + 0.018*"時間" + 0.018*"小時" + 0.012*"每日" + 0.012*"內容" + 0.010*"工時" + 0.010*"聯絡" + 0.009*"休息" + 0.009*"工資" 2025-04-19 00:08:40,271 : INFO : topic #3 (0.250): 0.016*"公司" + 0.010*"工作" + 0.007*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.005*"開發" + 0.005*"技術" + 0.005*"目前" + 0.004*"經驗" + 0.004*"比較" 2025-04-19 00:08:40,271 : INFO : topic diff=0.265873, rho=0.299409 2025-04-19 00:08:40,271 : INFO : PROGRESS: pass 2, at document #12000/16310 2025-04-19 00:08:40,490 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:40,492 : INFO : topic #0 (0.250): 0.031*"報名" + 0.027*"活動" + 0.019*"電話" + 0.014*"台北市" + 0.012*"研究" + 0.012*"舉辦" + 0.012*"人數" + 0.011*"問卷" + 0.011*"車馬費" + 0.011*"參加" 2025-04-19 00:08:40,493 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.013*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:40,494 : INFO : topic #2 (0.250): 0.051*"工作" + 0.023*"方式" + 0.018*"時間" + 0.018*"小時" + 0.011*"每日" + 0.011*"內容" + 0.011*"工時" + 0.010*"聯絡" + 0.009*"休息" + 0.009*"工資" 2025-04-19 00:08:40,494 : INFO : topic #3 (0.250): 0.014*"公司" + 0.009*"工作" + 0.006*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.005*"開發" + 0.004*"技術" + 0.004*"台灣" + 0.004*"目前" + 0.004*"比較" 2025-04-19 
00:08:40,494 : INFO : topic diff=0.274682, rho=0.299409 2025-04-19 00:08:40,495 : INFO : PROGRESS: pass 2, at document #14000/16310 2025-04-19 00:08:40,682 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:40,684 : INFO : topic #0 (0.250): 0.030*"報名" + 0.026*"活動" + 0.018*"電話" + 0.014*"研究" + 0.014*"問卷" + 0.013*"台北市" + 0.012*"舉辦" + 0.011*"人數" + 0.011*"參加" + 0.010*"通知" 2025-04-19 00:08:40,684 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.013*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"情形" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:40,685 : INFO : topic #2 (0.250): 0.050*"工作" + 0.023*"方式" + 0.018*"時間" + 0.017*"小時" + 0.011*"內容" + 0.011*"每日" + 0.011*"工時" + 0.009*"聯絡" + 0.009*"休息" + 0.009*"單位" 2025-04-19 00:08:40,685 : INFO : topic #3 (0.250): 0.013*"公司" + 0.007*"工作" + 0.006*"台灣" + 0.005*"問題" + 0.005*"面試" + 0.004*"工程師" + 0.004*"技術" + 0.004*"目前" + 0.003*"開發" + 0.003*"員工" 2025-04-19 00:08:40,686 : INFO : topic diff=0.273865, rho=0.299409 2025-04-19 00:08:40,686 : INFO : PROGRESS: pass 2, at document #16000/16310 2025-04-19 00:08:40,865 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:40,867 : INFO : topic #0 (0.250): 0.030*"報名" + 0.027*"活動" + 0.016*"研究" + 0.016*"電話" + 0.014*"問卷" + 0.012*"台北市" + 0.011*"舉辦" + 0.011*"人數" + 0.011*"參加" + 0.010*"通知" 2025-04-19 00:08:40,867 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"情形" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:40,868 : INFO : topic #2 (0.250): 0.050*"工作" + 0.021*"方式" + 0.017*"小時" + 0.017*"時間" + 0.012*"工時" + 0.011*"內容" + 0.010*"每日" + 0.009*"地點" + 0.009*"單位" + 0.008*"聯絡" 2025-04-19 00:08:40,868 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"工作" + 0.006*"台灣" + 0.005*"美國" + 0.004*"技術" + 0.004*"晶片" + 0.004*"問題" + 0.004*"工程師" + 0.004*"員工" + 0.004*"面試" 2025-04-19 00:08:40,869 : INFO : topic diff=0.244578, rho=0.299409 2025-04-19 
00:08:40,933 : INFO : -8.443 per-word bound, 348.1 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:08:40,934 : INFO : PROGRESS: pass 2, at document #16310/16310 2025-04-19 00:08:40,964 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:08:40,966 : INFO : topic #0 (0.250): 0.028*"報名" + 0.025*"活動" + 0.020*"研究" + 0.020*"問卷" + 0.013*"電話" + 0.013*"台北市" + 0.011*"時間" + 0.011*"舉辦" + 0.010*"人數" + 0.010*"參與" 2025-04-19 00:08:40,966 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"情形" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:40,967 : INFO : topic #2 (0.250): 0.051*"工作" + 0.019*"小時" + 0.019*"方式" + 0.017*"時間" + 0.014*"工時" + 0.011*"內容" + 0.010*"每日" + 0.009*"聯絡" + 0.009*"地點" + 0.008*"單位" 2025-04-19 00:08:40,967 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.004*"問題" 2025-04-19 00:08:40,967 : INFO : topic diff=0.262645, rho=0.299409 2025-04-19 00:08:40,968 : INFO : PROGRESS: pass 3, at document #2000/16310 2025-04-19 00:08:41,339 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:41,341 : INFO : topic #0 (0.250): 0.029*"報名" + 0.025*"活動" + 0.019*"電話" + 0.016*"台北市" + 0.013*"舉辦" + 0.012*"參與" + 0.012*"車馬費" + 0.012*"人數" + 0.012*"研究" + 0.011*"時間" 2025-04-19 00:08:41,342 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:41,342 : INFO : topic #2 (0.250): 0.048*"工作" + 0.023*"方式" + 0.017*"小時" + 0.016*"時間" + 0.011*"每日" + 0.011*"工資" + 0.011*"內容" + 0.010*"依法" + 0.010*"工時" + 0.010*"休息" 2025-04-19 00:08:41,343 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.004*"問題" 2025-04-19 
00:08:41,343 : INFO : topic diff=0.779662, rho=0.286829 2025-04-19 00:08:41,343 : INFO : PROGRESS: pass 3, at document #4000/16310 2025-04-19 00:08:41,716 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:41,718 : INFO : topic #0 (0.250): 0.029*"報名" + 0.025*"活動" + 0.021*"電話" + 0.016*"台北市" + 0.013*"舉辦" + 0.013*"車馬費" + 0.012*"人數" + 0.011*"通知" + 0.011*"資料" + 0.011*"時間" 2025-04-19 00:08:41,718 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容" 2025-04-19 00:08:41,721 : INFO : topic #2 (0.250): 0.047*"工作" + 0.023*"方式" + 0.016*"小時" + 0.016*"時間" + 0.012*"工資" + 0.012*"每日" + 0.011*"依法" + 0.011*"內容" + 0.011*"推定" + 0.010*"單位" 2025-04-19 00:08:41,726 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"問題" + 0.004*"表示" 2025-04-19 00:08:41,730 : INFO : topic diff=0.353576, rho=0.286829 2025-04-19 00:08:41,736 : INFO : PROGRESS: pass 3, at document #6000/16310 2025-04-19 00:08:42,074 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:42,076 : INFO : topic #0 (0.250): 0.031*"報名" + 0.026*"活動" + 0.021*"電話" + 0.016*"台北市" + 0.013*"車馬費" + 0.013*"舉辦" + 0.013*"人數" + 0.012*"訪問" + 0.011*"資料" + 0.011*"通知" 2025-04-19 00:08:42,077 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:42,077 : INFO : topic #2 (0.250): 0.047*"工作" + 0.024*"方式" + 0.016*"小時" + 0.016*"時間" + 0.012*"每日" + 0.012*"工資" + 0.012*"依法" + 0.011*"內容" + 0.011*"推定" + 0.010*"休息" 2025-04-19 00:08:42,078 : INFO : topic #3 (0.250): 0.013*"公司" + 0.006*"工作" + 0.006*"台灣" + 0.005*"美國" + 0.005*"技術" + 0.004*"問題" + 0.004*"工程師" + 0.004*"員工" + 0.004*"晶片" + 0.003*"科技" 2025-04-19 00:08:42,078 : INFO : topic diff=0.258180, rho=0.286829 2025-04-19 
00:08:42,078 : INFO : PROGRESS: pass 3, at document #8000/16310 2025-04-19 00:08:42,317 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:42,319 : INFO : topic #0 (0.250): 0.030*"報名" + 0.026*"活動" + 0.021*"電話" + 0.016*"台北市" + 0.013*"舉辦" + 0.013*"車馬費" + 0.012*"人數" + 0.012*"資料" + 0.011*"訪問" + 0.011*"通知" 2025-04-19 00:08:42,319 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:42,320 : INFO : topic #2 (0.250): 0.049*"工作" + 0.024*"方式" + 0.017*"時間" + 0.017*"小時" + 0.012*"每日" + 0.011*"內容" + 0.010*"工資" + 0.010*"依法" + 0.010*"休息" + 0.009*"工時" 2025-04-19 00:08:42,321 : INFO : topic #3 (0.250): 0.015*"公司" + 0.008*"工作" + 0.006*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.005*"技術" + 0.005*"開發" + 0.004*"台灣" + 0.004*"目前" + 0.004*"經驗" 2025-04-19 00:08:42,321 : INFO : topic diff=0.298122, rho=0.286829 2025-04-19 00:08:42,321 : INFO : PROGRESS: pass 3, at document #10000/16310 2025-04-19 00:08:42,526 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:42,528 : INFO : topic #0 (0.250): 0.031*"報名" + 0.026*"活動" + 0.020*"電話" + 0.015*"台北市" + 0.012*"舉辦" + 0.012*"車馬費" + 0.012*"人數" + 0.012*"研究" + 0.012*"資料" + 0.011*"參加" 2025-04-19 00:08:42,529 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:42,529 : INFO : topic #2 (0.250): 0.050*"工作" + 0.023*"方式" + 0.018*"時間" + 0.017*"小時" + 0.011*"內容" + 0.011*"每日" + 0.010*"工時" + 0.009*"聯絡" + 0.009*"休息" + 0.009*"工資" 2025-04-19 00:08:42,529 : INFO : topic #3 (0.250): 0.016*"公司" + 0.009*"工作" + 0.007*"面試" + 0.006*"問題" + 0.006*"工程師" + 0.005*"開發" + 0.005*"技術" + 0.005*"目前" + 0.004*"比較" + 0.004*"經驗" 2025-04-19 00:08:42,530 : INFO : topic diff=0.250906, rho=0.286829 2025-04-19 00:08:42,530 : INFO : PROGRESS: pass 3, at document #12000/16310 
2025-04-19 00:08:42,725 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:42,727 : INFO : topic #0 (0.250): 0.031*"報名" + 0.028*"活動" + 0.018*"電話" + 0.014*"台北市" + 0.013*"研究" + 0.013*"舉辦" + 0.011*"問卷" + 0.011*"人數" + 0.011*"參加" + 0.011*"資料" 2025-04-19 00:08:42,728 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.013*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:42,728 : INFO : topic #2 (0.250): 0.050*"工作" + 0.023*"方式" + 0.018*"時間" + 0.017*"小時" + 0.011*"內容" + 0.011*"每日" + 0.011*"工時" + 0.009*"聯絡" + 0.009*"休息" + 0.008*"工資" 2025-04-19 00:08:42,729 : INFO : topic #3 (0.250): 0.014*"公司" + 0.008*"工作" + 0.006*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.005*"開發" + 0.004*"技術" + 0.004*"台灣" + 0.004*"目前" + 0.004*"比較" 2025-04-19 00:08:42,729 : INFO : topic diff=0.259103, rho=0.286829 2025-04-19 00:08:42,729 : INFO : PROGRESS: pass 3, at document #14000/16310 2025-04-19 00:08:42,918 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:42,920 : INFO : topic #0 (0.250): 0.030*"報名" + 0.028*"活動" + 0.017*"電話" + 0.015*"研究" + 0.013*"問卷" + 0.013*"台北市" + 0.012*"舉辦" + 0.011*"人數" + 0.011*"參加" + 0.011*"資料" 2025-04-19 00:08:42,920 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.013*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:42,921 : INFO : topic #2 (0.250): 0.050*"工作" + 0.022*"方式" + 0.018*"時間" + 0.017*"小時" + 0.011*"內容" + 0.011*"工時" + 0.010*"每日" + 0.009*"聯絡" + 0.008*"地點" + 0.008*"休息" 2025-04-19 00:08:42,921 : INFO : topic #3 (0.250): 0.013*"公司" + 0.007*"工作" + 0.006*"台灣" + 0.005*"問題" + 0.005*"面試" + 0.004*"工程師" + 0.004*"技術" + 0.004*"目前" + 0.003*"開發" + 0.003*"美國" 2025-04-19 00:08:42,922 : INFO : topic diff=0.260128, rho=0.286829 2025-04-19 00:08:42,922 : INFO : PROGRESS: pass 3, at document #16000/16310 2025-04-19 00:08:43,102 : INFO : merging changes from 2000 documents into 
a model of 16310 documents 2025-04-19 00:08:43,104 : INFO : topic #0 (0.250): 0.029*"報名" + 0.028*"活動" + 0.017*"研究" + 0.015*"電話" + 0.014*"問卷" + 0.012*"舉辦" + 0.012*"台北市" + 0.011*"參加" + 0.011*"人數" + 0.010*"參與" 2025-04-19 00:08:43,105 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"情形" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:43,105 : INFO : topic #2 (0.250): 0.049*"工作" + 0.021*"方式" + 0.017*"時間" + 0.017*"小時" + 0.012*"工時" + 0.011*"內容" + 0.010*"每日" + 0.009*"地點" + 0.008*"單位" + 0.008*"聯絡" 2025-04-19 00:08:43,106 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"工作" + 0.006*"台灣" + 0.005*"美國" + 0.004*"技術" + 0.004*"問題" + 0.004*"晶片" + 0.004*"工程師" + 0.004*"員工" + 0.004*"面試" 2025-04-19 00:08:43,106 : INFO : topic diff=0.233232, rho=0.286829 2025-04-19 00:08:43,193 : INFO : -8.438 per-word bound, 346.7 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:08:43,193 : INFO : PROGRESS: pass 3, at document #16310/16310 2025-04-19 00:08:43,223 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:08:43,225 : INFO : topic #0 (0.250): 0.027*"報名" + 0.027*"活動" + 0.020*"研究" + 0.018*"問卷" + 0.014*"電話" + 0.012*"台北市" + 0.011*"舉辦" + 0.011*"時間" + 0.010*"參與" + 0.010*"人數" 2025-04-19 00:08:43,225 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"情形" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:08:43,226 : INFO : topic #2 (0.250): 0.050*"工作" + 0.019*"方式" + 0.019*"小時" + 0.017*"時間" + 0.014*"工時" + 0.011*"內容" + 0.009*"每日" + 0.009*"地點" + 0.009*"聯絡" + 0.008*"單位" 2025-04-19 00:08:43,226 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"問題" + 0.004*"表示" 2025-04-19 00:08:43,227 : INFO : topic diff=0.249669, rho=0.286829 2025-04-19 00:08:43,227 : INFO : PROGRESS: pass 4, at document #2000/16310 
2025-04-19 00:08:43,587 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:08:43,589 : INFO : topic #0 (0.250): 0.029*"報名" + 0.026*"活動" + 0.019*"電話" + 0.015*"台北市" + 0.014*"舉辦" + 0.012*"研究" + 0.012*"參與" + 0.012*"人數" + 0.012*"車馬費" + 0.011*"時間"
2025-04-19 00:08:43,590 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容"
2025-04-19 00:08:43,590 : INFO : topic #2 (0.250): 0.048*"工作" + 0.022*"方式" + 0.017*"時間" + 0.017*"小時" + 0.011*"內容" + 0.011*"每日" + 0.011*"工資" + 0.010*"工時" + 0.010*"依法" + 0.010*"休息"
2025-04-19 00:08:43,591 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"問題" + 0.004*"表示"
2025-04-19 00:08:43,591 : INFO : topic diff=0.711553, rho=0.275711
2025-04-19 00:08:43,591 : INFO : PROGRESS: pass 4, at document #4000/16310
2025-04-19 00:08:43,952 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:08:43,954 : INFO : topic #0 (0.250): 0.029*"報名" + 0.026*"活動" + 0.020*"電話" + 0.015*"台北市" + 0.013*"舉辦" + 0.013*"車馬費" + 0.012*"人數" + 0.011*"資料" + 0.011*"參與" + 0.011*"通知"
2025-04-19 00:08:43,955 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容"
2025-04-19 00:08:43,955 : INFO : topic #2 (0.250): 0.047*"工作" + 0.023*"方式" + 0.016*"小時" + 0.016*"時間" + 0.012*"工資" + 0.012*"每日" + 0.011*"內容" + 0.011*"依法" + 0.010*"休息" + 0.010*"推定"
2025-04-19 00:08:43,956 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"問題" + 0.004*"表示"
2025-04-19 00:08:43,956 : INFO : topic diff=0.335299, rho=0.275711
2025-04-19 00:08:43,956 : INFO : PROGRESS: pass 4, at document #6000/16310
2025-04-19 00:08:44,279 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:08:44,281 : INFO : topic #0 (0.250): 0.031*"報名" + 0.027*"活動" + 0.021*"電話" + 0.016*"台北市" + 0.013*"車馬費" + 0.013*"舉辦" + 0.013*"人數" + 0.012*"訪問" + 0.012*"資料" + 0.011*"通知"
2025-04-19 00:08:44,281 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容"
2025-04-19 00:08:44,282 : INFO : topic #2 (0.250): 0.047*"工作" + 0.023*"方式" + 0.016*"小時" + 0.016*"時間" + 0.012*"每日" + 0.012*"工資" + 0.011*"依法" + 0.011*"內容" + 0.010*"推定" + 0.010*"休息"
2025-04-19 00:08:44,282 : INFO : topic #3 (0.250): 0.013*"公司" + 0.006*"工作" + 0.006*"台灣" + 0.005*"美國" + 0.005*"技術" + 0.004*"問題" + 0.004*"工程師" + 0.004*"員工" + 0.004*"晶片" + 0.003*"科技"
2025-04-19 00:08:44,282 : INFO : topic diff=0.246474, rho=0.275711
2025-04-19 00:08:44,283 : INFO : PROGRESS: pass 4, at document #8000/16310
2025-04-19 00:08:44,520 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:08:44,522 : INFO : topic #0 (0.250): 0.030*"報名" + 0.027*"活動" + 0.021*"電話" + 0.016*"台北市" + 0.013*"舉辦" + 0.013*"車馬費" + 0.012*"人數" + 0.012*"資料" + 0.011*"訪問" + 0.011*"參加"
2025-04-19 00:08:44,522 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容"
2025-04-19 00:08:44,523 : INFO : topic #2 (0.250): 0.049*"工作" + 0.023*"方式" + 0.017*"時間" + 0.017*"小時" + 0.011*"內容" + 0.011*"每日" + 0.010*"工資" + 0.010*"依法" + 0.010*"休息" + 0.009*"工時"
2025-04-19 00:08:44,529 : INFO : topic #3 (0.250): 0.015*"公司" + 0.008*"工作" + 0.006*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.005*"技術" + 0.005*"台灣" + 0.004*"開發" + 0.004*"目前" + 0.004*"比較"
2025-04-19 00:08:44,532 : INFO : topic diff=0.281795, rho=0.275711
2025-04-19 00:08:44,536 : INFO : PROGRESS: pass 4, at document #10000/16310
2025-04-19 00:08:44,753 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:08:44,755 : INFO : topic #0 (0.250): 0.031*"報名" + 0.027*"活動" + 0.020*"電話" + 0.015*"台北市" + 0.012*"舉辦" + 0.012*"研究" + 0.012*"車馬費" + 0.012*"資料" + 0.012*"人數" + 0.011*"參加"
2025-04-19 00:08:44,755 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容"
2025-04-19 00:08:44,756 : INFO : topic #2 (0.250): 0.050*"工作" + 0.023*"方式" + 0.018*"時間" + 0.017*"小時" + 0.011*"內容" + 0.011*"每日" + 0.010*"工時" + 0.009*"聯絡" + 0.009*"休息" + 0.009*"工資"
2025-04-19 00:08:44,756 : INFO : topic #3 (0.250): 0.016*"公司" + 0.009*"工作" + 0.007*"面試" + 0.006*"問題" + 0.006*"工程師" + 0.005*"開發" + 0.005*"技術" + 0.005*"目前" + 0.004*"比較" + 0.004*"台灣"
2025-04-19 00:08:44,757 : INFO : topic diff=0.238386, rho=0.275711
2025-04-19 00:08:44,757 : INFO : PROGRESS: pass 4, at document #12000/16310
2025-04-19 00:08:44,951 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:08:44,953 : INFO : topic #0 (0.250): 0.031*"報名" + 0.029*"活動" + 0.018*"電話" + 0.014*"研究" + 0.013*"台北市" + 0.013*"舉辦" + 0.012*"參加" + 0.011*"人數" + 0.011*"資料" + 0.011*"問卷"
2025-04-19 00:08:44,954 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.013*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容"
2025-04-19 00:08:44,954 : INFO : topic #2 (0.250): 0.049*"工作" + 0.022*"方式" + 0.018*"時間" + 0.017*"小時" + 0.011*"內容" + 0.011*"每日" + 0.011*"工時" + 0.009*"聯絡" + 0.008*"休息" + 0.008*"工資"
2025-04-19 00:08:44,955 : INFO : topic #3 (0.250): 0.014*"公司" + 0.008*"工作" + 0.006*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.005*"開發" + 0.004*"技術" + 0.004*"台灣" + 0.004*"目前" + 0.004*"比較"
2025-04-19 00:08:44,955 : INFO : topic diff=0.246169, rho=0.275711
2025-04-19 00:08:44,955 : INFO : PROGRESS: pass 4, at document #14000/16310
2025-04-19 00:08:45,148 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:08:45,149 : INFO : topic #0 (0.250): 0.029*"報名" + 0.028*"活動" + 0.017*"電話" + 0.015*"研究" + 0.013*"問卷" + 0.012*"台北市" + 0.012*"舉辦" + 0.011*"參加" + 0.011*"人數" + 0.011*"參與"
2025-04-19 00:08:45,150 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.013*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"情形" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容"
2025-04-19 00:08:45,151 : INFO : topic #2 (0.250): 0.049*"工作" + 0.021*"方式" + 0.018*"時間" + 0.017*"小時" + 0.011*"內容" + 0.011*"工時" + 0.010*"每日" + 0.009*"聯絡" + 0.008*"地點" + 0.008*"休息"
2025-04-19 00:08:45,151 : INFO : topic #3 (0.250): 0.013*"公司" + 0.007*"工作" + 0.006*"台灣" + 0.005*"問題" + 0.005*"面試" + 0.004*"工程師" + 0.004*"技術" + 0.004*"目前" + 0.004*"開發" + 0.003*"美國"
2025-04-19 00:08:45,151 : INFO : topic diff=0.248548, rho=0.275711
2025-04-19 00:08:45,152 : INFO : PROGRESS: pass 4, at document #16000/16310
2025-04-19 00:08:45,331 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:08:45,333 : INFO : topic #0 (0.250): 0.029*"報名" + 0.029*"活動" + 0.017*"研究" + 0.016*"電話" + 0.013*"問卷" + 0.012*"舉辦" + 0.012*"台北市" + 0.011*"參加" + 0.011*"人數" + 0.010*"參與"
2025-04-19 00:08:45,333 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"情形" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容"
2025-04-19 00:08:45,334 : INFO : topic #2 (0.250): 0.049*"工作" + 0.020*"方式" + 0.017*"時間" + 0.017*"小時" + 0.012*"工時" + 0.011*"內容" + 0.009*"每日" + 0.009*"地點" + 0.008*"單位" + 0.008*"聯絡"
2025-04-19 00:08:45,334 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"工作" + 0.006*"台灣" + 0.005*"美國" + 0.004*"技術" + 0.004*"問題" + 0.004*"工程師" + 0.004*"晶片" + 0.004*"員工" + 0.004*"面試"
2025-04-19 00:08:45,335 : INFO : topic diff=0.223364, rho=0.275711
2025-04-19 00:08:45,398 : INFO : -8.434 per-word bound, 345.8 perplexity estimate based on a held-out corpus of 310 documents with 43091 words
2025-04-19 00:08:45,398 : INFO : PROGRESS: pass 4, at document #16310/16310
2025-04-19 00:08:45,427 : INFO : merging changes from 310 documents into a model of 16310 documents
2025-04-19 00:08:45,429 : INFO : topic #0 (0.250): 0.027*"活動" + 0.026*"報名" + 0.019*"研究" + 0.017*"問卷" + 0.014*"電話" + 0.012*"台北市" + 0.011*"舉辦" + 0.011*"時間" + 0.011*"參與" + 0.010*"人數"
2025-04-19 00:08:45,429 : INFO : topic #1 (0.250): 0.032*"工作" + 0.013*"推定" + 0.012*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.012*"情形" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容"
2025-04-19 00:08:45,430 : INFO : topic #2 (0.250): 0.050*"工作" + 0.019*"方式" + 0.018*"小時" + 0.017*"時間" + 0.013*"工時" + 0.011*"內容" + 0.009*"每日" + 0.009*"地點" + 0.008*"聯絡" + 0.008*"單位"
2025-04-19 00:08:45,430 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.006*"工作" + 0.005*"技術" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"問題" + 0.004*"表示"
2025-04-19 00:08:45,430 : INFO : topic diff=0.238909, rho=0.275711
2025-04-19 00:08:45,431 : INFO : LdaModel lifecycle event {'msg': 'trained LdaModel<num_terms=23261, num_topics=4, decay=0.5, chunksize=2000> in 12.33s', 'datetime': '2025-04-19T00:08:45.431050', 'gensim': '4.3.3', 'python': '3.11.2 (main, Apr 21 2023, 22:51:21) [Clang 14.0.3 (clang-1403.0.22.14.1)]', 'platform': 'macOS-15.3.2-arm64-arm-64bit', 'event': 'created'}
2025-04-19 00:08:49,916 : INFO : -7.041 per-word bound, 131.7 perplexity estimate based on a held-out corpus of 16310 documents with 3460358 words
2025-04-19 00:08:49,918 : INFO : using ParallelWordOccurrenceAccumulator<processes=7, batch_size=64> to estimate probabilities from sliding windows
2025-04-19 00:08:53,359 : INFO : 1 batches submitted to accumulate stats from 64 documents (22660 virtual)
2025-04-19 00:08:53,362 : INFO : 2 batches submitted to accumulate stats from 128 documents (45646 virtual)
2025-04-19 00:08:53,365 : INFO : 3 batches submitted to accumulate stats from 192 documents (67171 virtual)
2025-04-19 00:08:53,367 : INFO : 4 batches submitted to accumulate stats from 256 documents (88330 virtual)
2025-04-19 00:08:53,371 : INFO : 5 batches submitted to accumulate stats from 320 documents (109687 virtual)
2025-04-19 00:08:53,375 : INFO : 6 batches submitted
to accumulate stats from 384 documents (131042 virtual)
2025-04-19 00:08:56,286 : INFO :
254 batches submitted to accumulate stats from 16256 documents (3443379 virtual) 2025-04-19 00:08:56,288 : INFO : 255 batches submitted to accumulate stats from 16320 documents (3450914 virtual) 2025-04-19 00:08:56,429 : INFO : 7 accumulators retrieved from output queue 2025-04-19 00:08:56,437 : INFO : accumulated word occurrence stats for 3451622 virtual documents 2025-04-19 00:08:56,498 : INFO : using symmetric alpha at 0.2 2025-04-19 00:08:56,498 : INFO : using symmetric eta at 0.2 2025-04-19 00:08:56,500 : INFO : using serial LDA version on this node 2025-04-19 00:08:56,504 : INFO : running online (multi-pass) LDA training, 5 topics, 5 passes over the supplied corpus of 16310 documents, updating model once every 2000 documents, evaluating perplexity every 16310 documents, iterating 50x with a convergence threshold of 0.001000 2025-04-19 00:08:56,505 : INFO : PROGRESS: pass 0, at document #2000/16310 2025-04-19 00:08:57,167 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:57,169 : INFO : topic #0 (0.200): 0.028*"工作" + 0.014*"方式" + 0.013*"應徵" + 0.012*"推定" + 0.012*"空白" + 0.011*"單位" + 0.011*"砍除" + 0.010*"內容" + 0.010*"資訊" + 0.009*"聯絡" 2025-04-19 00:08:57,170 : INFO : topic #1 (0.200): 0.029*"工作" + 0.016*"方式" + 0.012*"推定" + 0.011*"聯絡" + 0.011*"單位" + 0.010*"砍除" + 0.010*"國定假日" + 0.010*"內容" + 0.010*"第一項" + 0.010*"情形" 2025-04-19 00:08:57,170 : INFO : topic #2 (0.200): 0.039*"工作" + 0.013*"內容" + 0.012*"推定" + 0.012*"工資" + 0.011*"應徵" + 0.011*"方式" + 0.010*"情形" + 0.010*"聯絡" + 0.010*"砍除" + 0.010*"小時" 2025-04-19 00:08:57,171 : INFO : topic #3 (0.200): 0.019*"工作" + 0.013*"方式" + 0.011*"砍除" + 0.010*"應徵" + 0.010*"聯絡人" + 0.009*"推定" + 0.009*"文字" + 0.009*"空白" + 0.009*"資訊" + 0.008*"情形" 2025-04-19 00:08:57,172 : INFO : topic #4 (0.200): 0.037*"工作" + 0.017*"推定" + 0.014*"空白" + 0.012*"方式" + 0.011*"聯絡" + 0.010*"聯絡人" + 0.010*"第一項" + 0.010*"單位" + 0.009*"情形" + 0.009*"內容" 2025-04-19 00:08:57,172 : INFO : topic diff=6.258093, rho=1.000000 2025-04-19 
00:08:57,173 : INFO : PROGRESS: pass 0, at document #4000/16310 2025-04-19 00:08:57,783 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:57,786 : INFO : topic #0 (0.200): 0.028*"工作" + 0.014*"應徵" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.011*"推定" + 0.011*"單位" + 0.010*"內容" + 0.010*"資訊" + 0.010*"第一項" 2025-04-19 00:08:57,786 : INFO : topic #1 (0.200): 0.029*"工作" + 0.015*"方式" + 0.012*"砍除" + 0.012*"推定" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"國定假日" + 0.011*"單位" + 0.011*"資訊" 2025-04-19 00:08:57,787 : INFO : topic #2 (0.200): 0.041*"工作" + 0.015*"方式" + 0.013*"推定" + 0.013*"內容" + 0.013*"工資" + 0.012*"小時" + 0.011*"應徵" + 0.011*"單位" + 0.010*"未註明" + 0.010*"聯絡" 2025-04-19 00:08:57,787 : INFO : topic #3 (0.200): 0.016*"報名" + 0.013*"活動" + 0.013*"電話" + 0.011*"方式" + 0.010*"工作" + 0.010*"時間" + 0.010*"台北市" + 0.009*"聯絡" + 0.009*"內容" + 0.008*"人數" 2025-04-19 00:08:57,788 : INFO : topic #4 (0.200): 0.037*"工作" + 0.016*"推定" + 0.015*"空白" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"方式" + 0.011*"情形" + 0.010*"砍除" + 0.010*"聯絡人" + 0.010*"單位" 2025-04-19 00:08:57,788 : INFO : topic diff=0.603408, rho=0.707107 2025-04-19 00:08:57,789 : INFO : PROGRESS: pass 0, at document #6000/16310 2025-04-19 00:08:58,289 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:58,291 : INFO : topic #0 (0.200): 0.028*"工作" + 0.013*"應徵" + 0.012*"方式" + 0.012*"砍除" + 0.011*"空白" + 0.010*"資訊" + 0.010*"內容" + 0.010*"第一項" + 0.010*"推定" + 0.010*"單位" 2025-04-19 00:08:58,292 : INFO : topic #1 (0.200): 0.029*"工作" + 0.014*"方式" + 0.013*"砍除" + 0.012*"情形" + 0.012*"推定" + 0.012*"第一項" + 0.011*"國定假日" + 0.011*"聯絡" + 0.011*"文字" + 0.011*"資訊" 2025-04-19 00:08:58,292 : INFO : topic #2 (0.200): 0.040*"工作" + 0.016*"方式" + 0.012*"推定" + 0.012*"小時" + 0.012*"內容" + 0.011*"工資" + 0.010*"單位" + 0.010*"依法" + 0.010*"時間" + 0.009*"應徵" 2025-04-19 00:08:58,293 : INFO : topic #3 (0.200): 0.018*"報名" + 0.015*"活動" + 0.013*"電話" + 0.011*"台北市" + 0.010*"時間" + 0.008*"聯絡" + 0.008*"資料" + 
0.008*"內容" + 0.008*"方式" + 0.008*"人數" 2025-04-19 00:08:58,293 : INFO : topic #4 (0.200): 0.037*"工作" + 0.016*"推定" + 0.015*"空白" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"方式" + 0.011*"砍除" + 0.010*"國定假日" + 0.010*"單位" 2025-04-19 00:08:58,294 : INFO : topic diff=0.666836, rho=0.577350 2025-04-19 00:08:58,294 : INFO : PROGRESS: pass 0, at document #8000/16310 2025-04-19 00:08:58,571 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:58,574 : INFO : topic #0 (0.200): 0.028*"工作" + 0.012*"應徵" + 0.012*"方式" + 0.012*"砍除" + 0.011*"空白" + 0.010*"資訊" + 0.010*"內容" + 0.010*"第一項" + 0.010*"推定" + 0.010*"單位" 2025-04-19 00:08:58,574 : INFO : topic #1 (0.200): 0.029*"工作" + 0.014*"方式" + 0.013*"砍除" + 0.012*"情形" + 0.012*"推定" + 0.012*"第一項" + 0.011*"國定假日" + 0.011*"聯絡" + 0.011*"文字" + 0.011*"資訊" 2025-04-19 00:08:58,575 : INFO : topic #2 (0.200): 0.035*"工作" + 0.010*"面試" + 0.010*"方式" + 0.010*"公司" + 0.008*"時間" + 0.008*"小時" + 0.008*"內容" + 0.006*"推定" + 0.006*"經驗" + 0.006*"工資" 2025-04-19 00:08:58,575 : INFO : topic #3 (0.200): 0.016*"公司" + 0.008*"時間" + 0.007*"問題" + 0.007*"目前" + 0.007*"產品" + 0.006*"工程師" + 0.006*"資料" + 0.006*"使用" + 0.006*"報名" + 0.006*"活動" 2025-04-19 00:08:58,576 : INFO : topic #4 (0.200): 0.037*"工作" + 0.016*"推定" + 0.015*"空白" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"方式" + 0.011*"砍除" + 0.010*"國定假日" + 0.010*"單位" 2025-04-19 00:08:58,576 : INFO : topic diff=0.963371, rho=0.500000 2025-04-19 00:08:58,577 : INFO : PROGRESS: pass 0, at document #10000/16310 2025-04-19 00:08:58,830 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:58,833 : INFO : topic #0 (0.200): 0.028*"工作" + 0.012*"應徵" + 0.012*"方式" + 0.011*"砍除" + 0.011*"空白" + 0.010*"資訊" + 0.010*"內容" + 0.010*"第一項" + 0.010*"推定" + 0.009*"單位" 2025-04-19 00:08:58,833 : INFO : topic #1 (0.200): 0.029*"工作" + 0.014*"方式" + 0.013*"砍除" + 0.012*"情形" + 0.012*"推定" + 0.012*"第一項" + 0.011*"國定假日" + 0.011*"聯絡" + 0.011*"文字" + 0.011*"資訊" 2025-04-19 00:08:58,834 : 
INFO : topic #2 (0.200): 0.034*"工作" + 0.012*"面試" + 0.011*"公司" + 0.008*"方式" + 0.008*"時間" + 0.007*"內容" + 0.006*"小時" + 0.006*"經驗" + 0.006*"覺得" + 0.006*"比較" 2025-04-19 00:08:58,834 : INFO : topic #3 (0.200): 0.017*"公司" + 0.008*"問題" + 0.008*"目前" + 0.007*"時間" + 0.007*"工程師" + 0.006*"使用" + 0.006*"產品" + 0.006*"資料" + 0.006*"技術" + 0.005*"團隊" 2025-04-19 00:08:58,834 : INFO : topic #4 (0.200): 0.037*"工作" + 0.016*"推定" + 0.015*"空白" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"方式" + 0.011*"砍除" + 0.010*"國定假日" + 0.010*"單位" 2025-04-19 00:08:58,835 : INFO : topic diff=0.639901, rho=0.447214 2025-04-19 00:08:58,839 : INFO : PROGRESS: pass 0, at document #12000/16310 2025-04-19 00:08:59,123 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:59,126 : INFO : topic #0 (0.200): 0.027*"工作" + 0.012*"應徵" + 0.012*"方式" + 0.011*"砍除" + 0.011*"空白" + 0.010*"資訊" + 0.010*"內容" + 0.009*"第一項" + 0.009*"推定" + 0.009*"單位" 2025-04-19 00:08:59,127 : INFO : topic #1 (0.200): 0.029*"工作" + 0.014*"方式" + 0.013*"砍除" + 0.012*"情形" + 0.012*"推定" + 0.011*"第一項" + 0.011*"國定假日" + 0.011*"聯絡" + 0.011*"文字" + 0.011*"資訊" 2025-04-19 00:08:59,127 : INFO : topic #2 (0.200): 0.032*"工作" + 0.012*"面試" + 0.011*"公司" + 0.007*"時間" + 0.007*"方式" + 0.006*"覺得" + 0.006*"比較" + 0.006*"經驗" + 0.006*"內容" + 0.005*"小時" 2025-04-19 00:08:59,128 : INFO : topic #3 (0.200): 0.015*"公司" + 0.007*"問題" + 0.006*"目前" + 0.005*"工程師" + 0.005*"技術" + 0.005*"台灣" + 0.005*"時間" + 0.005*"使用" + 0.004*"產品" + 0.004*"資料" 2025-04-19 00:08:59,128 : INFO : topic #4 (0.200): 0.036*"工作" + 0.016*"推定" + 0.015*"空白" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"方式" + 0.011*"砍除" + 0.010*"國定假日" + 0.010*"單位" 2025-04-19 00:08:59,129 : INFO : topic diff=0.602195, rho=0.408248 2025-04-19 00:08:59,129 : INFO : PROGRESS: pass 0, at document #14000/16310 2025-04-19 00:08:59,387 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:59,390 : INFO : topic #0 (0.200): 0.026*"工作" + 0.012*"應徵" + 0.011*"方式" + 
0.011*"砍除" + 0.010*"空白" + 0.009*"資訊" + 0.009*"內容" + 0.009*"單位" + 0.009*"第一項" + 0.009*"推定" 2025-04-19 00:08:59,390 : INFO : topic #1 (0.200): 0.028*"工作" + 0.014*"方式" + 0.013*"砍除" + 0.012*"情形" + 0.011*"推定" + 0.011*"第一項" + 0.011*"國定假日" + 0.011*"聯絡" + 0.011*"文字" + 0.011*"資訊" 2025-04-19 00:08:59,391 : INFO : topic #2 (0.200): 0.031*"工作" + 0.011*"面試" + 0.011*"公司" + 0.006*"時間" + 0.006*"覺得" + 0.006*"比較" + 0.006*"方式" + 0.005*"經驗" + 0.005*"真的" + 0.005*"內容" 2025-04-19 00:08:59,391 : INFO : topic #3 (0.200): 0.012*"公司" + 0.007*"台灣" + 0.005*"問題" + 0.005*"技術" + 0.005*"目前" + 0.004*"工程師" + 0.004*"美國" + 0.004*"晶片" + 0.004*"產業" + 0.004*"產品" 2025-04-19 00:08:59,392 : INFO : topic #4 (0.200): 0.036*"工作" + 0.016*"推定" + 0.015*"空白" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"情形" + 0.010*"方式" + 0.010*"砍除" + 0.010*"國定假日" + 0.010*"單位" 2025-04-19 00:08:59,392 : INFO : topic diff=0.492179, rho=0.377964 2025-04-19 00:08:59,393 : INFO : PROGRESS: pass 0, at document #16000/16310 2025-04-19 00:08:59,641 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:08:59,643 : INFO : topic #0 (0.200): 0.025*"工作" + 0.011*"方式" + 0.011*"應徵" + 0.010*"砍除" + 0.010*"空白" + 0.009*"資訊" + 0.009*"單位" + 0.009*"內容" + 0.009*"第一項" + 0.008*"推定" 2025-04-19 00:08:59,644 : INFO : topic #1 (0.200): 0.028*"工作" + 0.014*"方式" + 0.012*"砍除" + 0.011*"情形" + 0.011*"推定" + 0.011*"第一項" + 0.011*"國定假日" + 0.011*"聯絡" + 0.010*"文字" + 0.010*"資訊" 2025-04-19 00:08:59,644 : INFO : topic #2 (0.200): 0.030*"工作" + 0.011*"公司" + 0.010*"面試" + 0.006*"時間" + 0.006*"比較" + 0.006*"覺得" + 0.006*"真的" + 0.005*"應該" + 0.005*"方式" + 0.005*"內容" 2025-04-19 00:08:59,645 : INFO : topic #3 (0.200): 0.011*"公司" + 0.007*"台灣" + 0.006*"美國" + 0.005*"晶片" + 0.005*"技術" + 0.004*"表示" + 0.004*"目前" + 0.004*"中國" + 0.004*"半導體" + 0.004*"問題" 2025-04-19 00:08:59,645 : INFO : topic #4 (0.200): 0.035*"工作" + 0.015*"推定" + 0.014*"空白" + 0.011*"第一項" + 0.011*"聯絡" + 0.010*"情形" + 0.010*"方式" + 0.010*"砍除" + 0.010*"單位" + 0.010*"國定假日" 2025-04-19 00:08:59,646 : INFO : topic 
diff=0.381256, rho=0.353553 2025-04-19 00:08:59,720 : INFO : -8.527 per-word bound, 368.9 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:08:59,720 : INFO : PROGRESS: pass 0, at document #16310/16310 2025-04-19 00:08:59,760 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:08:59,763 : INFO : topic #0 (0.200): 0.023*"工作" + 0.010*"方式" + 0.010*"應徵" + 0.009*"單位" + 0.009*"砍除" + 0.009*"空白" + 0.008*"資訊" + 0.008*"內容" + 0.008*"第一項" + 0.008*"推定" 2025-04-19 00:08:59,763 : INFO : topic #1 (0.200): 0.027*"工作" + 0.014*"方式" + 0.012*"砍除" + 0.011*"情形" + 0.011*"推定" + 0.011*"第一項" + 0.011*"國定假日" + 0.010*"聯絡" + 0.010*"文字" + 0.010*"資訊" 2025-04-19 00:08:59,764 : INFO : topic #2 (0.200): 0.027*"工作" + 0.012*"公司" + 0.010*"面試" + 0.007*"真的" + 0.006*"覺得" + 0.006*"時間" + 0.006*"比較" + 0.006*"應該" + 0.005*"事情" + 0.005*"一下" 2025-04-19 00:08:59,764 : INFO : topic #3 (0.200): 0.011*"公司" + 0.008*"美國" + 0.008*"台灣" + 0.006*"技術" + 0.005*"晶片" + 0.005*"台積電" + 0.004*"表示" + 0.004*"中國" + 0.004*"台積" + 0.004*"科技" 2025-04-19 00:08:59,765 : INFO : topic #4 (0.200): 0.034*"工作" + 0.015*"推定" + 0.014*"空白" + 0.011*"第一項" + 0.010*"聯絡" + 0.010*"情形" + 0.010*"方式" + 0.010*"砍除" + 0.009*"單位" + 0.009*"國定假日" 2025-04-19 00:08:59,765 : INFO : topic diff=0.375175, rho=0.333333 2025-04-19 00:08:59,765 : INFO : PROGRESS: pass 1, at document #2000/16310 2025-04-19 00:09:00,390 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:00,393 : INFO : topic #0 (0.200): 0.028*"工作" + 0.016*"方式" + 0.010*"應徵" + 0.010*"內容" + 0.010*"工資" + 0.009*"聯絡" + 0.008*"通知" + 0.008*"單位" + 0.008*"推定" + 0.008*"依法" 2025-04-19 00:09:00,393 : INFO : topic #1 (0.200): 0.030*"工作" + 0.015*"方式" + 0.012*"推定" + 0.012*"砍除" + 0.012*"聯絡" + 0.011*"內容" + 0.011*"情形" + 0.011*"單位" + 0.010*"資訊" + 0.010*"第一項" 2025-04-19 00:09:00,394 : INFO : topic #2 (0.200): 0.029*"工作" + 0.011*"公司" + 0.011*"面試" + 0.008*"時間" + 0.006*"真的" + 0.005*"覺得" + 0.005*"比較" + 
0.005*"方式" + 0.005*"小時" + 0.005*"應該" 2025-04-19 00:09:00,397 : INFO : topic #3 (0.200): 0.011*"公司" + 0.007*"美國" + 0.007*"台灣" + 0.005*"技術" + 0.005*"晶片" + 0.004*"台積電" + 0.004*"表示" + 0.004*"中國" + 0.004*"科技" + 0.004*"台積" 2025-04-19 00:09:00,405 : INFO : topic #4 (0.200): 0.038*"工作" + 0.017*"推定" + 0.015*"空白" + 0.011*"方式" + 0.011*"聯絡" + 0.011*"砍除" + 0.011*"情形" + 0.011*"第一項" + 0.010*"單位" + 0.010*"內容" 2025-04-19 00:09:00,410 : INFO : topic diff=1.157990, rho=0.313805 2025-04-19 00:09:00,414 : INFO : PROGRESS: pass 1, at document #4000/16310 2025-04-19 00:09:01,024 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:01,027 : INFO : topic #0 (0.200): 0.030*"工作" + 0.020*"方式" + 0.012*"內容" + 0.011*"通知" + 0.011*"聯絡" + 0.011*"電話" + 0.011*"工資" + 0.010*"時間" + 0.010*"台北市" + 0.010*"依法" 2025-04-19 00:09:01,027 : INFO : topic #1 (0.200): 0.030*"工作" + 0.015*"方式" + 0.012*"砍除" + 0.011*"推定" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"第一項" + 0.011*"單位" + 0.011*"文字" 2025-04-19 00:09:01,028 : INFO : topic #2 (0.200): 0.030*"工作" + 0.011*"面試" + 0.010*"公司" + 0.009*"時間" + 0.006*"小時" + 0.005*"方式" + 0.005*"經驗" + 0.005*"真的" + 0.005*"內容" + 0.005*"比較" 2025-04-19 00:09:01,028 : INFO : topic #3 (0.200): 0.010*"公司" + 0.007*"美國" + 0.007*"台灣" + 0.005*"技術" + 0.005*"晶片" + 0.004*"報名" + 0.004*"活動" + 0.004*"台積電" + 0.004*"資料" + 0.004*"表示" 2025-04-19 00:09:01,029 : INFO : topic #4 (0.200): 0.037*"工作" + 0.017*"推定" + 0.014*"空白" + 0.012*"方式" + 0.012*"砍除" + 0.011*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"單位" + 0.011*"內容" 2025-04-19 00:09:01,029 : INFO : topic diff=0.487175, rho=0.313805 2025-04-19 00:09:01,030 : INFO : PROGRESS: pass 1, at document #6000/16310 2025-04-19 00:09:01,520 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:01,522 : INFO : topic #0 (0.200): 0.032*"工作" + 0.023*"方式" + 0.013*"內容" + 0.013*"聯絡" + 0.013*"通知" + 0.012*"電話" + 0.012*"時間" + 0.011*"依法" + 0.011*"台北市" + 0.011*"工資" 2025-04-19 00:09:01,523 : INFO : 
topic #1 (0.200): 0.030*"工作" + 0.014*"方式" + 0.012*"砍除" + 0.012*"情形" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"推定" + 0.011*"第一項" + 0.011*"文字" + 0.011*"資訊" 2025-04-19 00:09:01,523 : INFO : topic #2 (0.200): 0.030*"工作" + 0.011*"公司" + 0.010*"面試" + 0.009*"時間" + 0.007*"經驗" + 0.006*"小時" + 0.005*"方式" + 0.005*"內容" + 0.005*"比較" + 0.004*"真的" 2025-04-19 00:09:01,524 : INFO : topic #3 (0.200): 0.010*"公司" + 0.007*"報名" + 0.006*"活動" + 0.006*"台灣" + 0.006*"美國" + 0.005*"資料" + 0.004*"技術" + 0.004*"進行" + 0.004*"使用" + 0.004*"產品" 2025-04-19 00:09:01,524 : INFO : topic #4 (0.200): 0.036*"工作" + 0.016*"推定" + 0.014*"空白" + 0.012*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:09:01,525 : INFO : topic diff=0.326379, rho=0.313805 2025-04-19 00:09:01,525 : INFO : PROGRESS: pass 1, at document #8000/16310 2025-04-19 00:09:01,798 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:01,801 : INFO : topic #0 (0.200): 0.032*"工作" + 0.023*"方式" + 0.013*"內容" + 0.013*"聯絡" + 0.012*"通知" + 0.012*"時間" + 0.012*"電話" + 0.011*"台北市" + 0.011*"依法" + 0.011*"小時" 2025-04-19 00:09:01,801 : INFO : topic #1 (0.200): 0.030*"工作" + 0.014*"方式" + 0.012*"砍除" + 0.012*"情形" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"推定" + 0.011*"第一項" + 0.011*"文字" + 0.011*"資訊" 2025-04-19 00:09:01,802 : INFO : topic #2 (0.200): 0.027*"工作" + 0.015*"公司" + 0.013*"面試" + 0.009*"時間" + 0.008*"經驗" + 0.006*"覺得" + 0.006*"比較" + 0.006*"問題" + 0.006*"工程師" + 0.006*"開發" 2025-04-19 00:09:01,802 : INFO : topic #3 (0.200): 0.012*"公司" + 0.006*"技術" + 0.006*"台灣" + 0.006*"產品" + 0.005*"資料" + 0.005*"目前" + 0.005*"報名" + 0.005*"使用" + 0.005*"活動" + 0.005*"美國" 2025-04-19 00:09:01,803 : INFO : topic #4 (0.200): 0.036*"工作" + 0.016*"推定" + 0.014*"空白" + 0.012*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:09:01,803 : INFO : topic diff=0.435202, rho=0.313805 2025-04-19 00:09:01,803 : INFO : PROGRESS: pass 1, at document #10000/16310 2025-04-19 00:09:02,057 : 
INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:02,059 : INFO : topic #0 (0.200): 0.032*"工作" + 0.024*"方式" + 0.013*"內容" + 0.013*"聯絡" + 0.012*"時間" + 0.012*"通知" + 0.012*"電話" + 0.011*"小時" + 0.011*"台北市" + 0.011*"依法" 2025-04-19 00:09:02,060 : INFO : topic #1 (0.200): 0.030*"工作" + 0.014*"方式" + 0.012*"砍除" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"推定" + 0.011*"第一項" + 0.011*"文字" + 0.011*"資訊" 2025-04-19 00:09:02,060 : INFO : topic #2 (0.200): 0.024*"工作" + 0.016*"公司" + 0.013*"面試" + 0.008*"時間" + 0.008*"經驗" + 0.007*"問題" + 0.007*"比較" + 0.006*"開發" + 0.006*"覺得" + 0.006*"工程師" 2025-04-19 00:09:02,061 : INFO : topic #3 (0.200): 0.012*"公司" + 0.006*"產品" + 0.006*"技術" + 0.006*"台灣" + 0.006*"資料" + 0.005*"使用" + 0.005*"目前" + 0.005*"問題" + 0.004*"美國" + 0.004*"報名" 2025-04-19 00:09:02,061 : INFO : topic #4 (0.200): 0.036*"工作" + 0.016*"推定" + 0.014*"空白" + 0.012*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:09:02,061 : INFO : topic diff=0.325595, rho=0.313805 2025-04-19 00:09:02,062 : INFO : PROGRESS: pass 1, at document #12000/16310 2025-04-19 00:09:02,310 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:02,312 : INFO : topic #0 (0.200): 0.032*"工作" + 0.024*"方式" + 0.014*"聯絡" + 0.013*"內容" + 0.013*"時間" + 0.012*"通知" + 0.012*"電話" + 0.011*"小時" + 0.011*"台北市" + 0.011*"地點" 2025-04-19 00:09:02,313 : INFO : topic #1 (0.200): 0.030*"工作" + 0.014*"方式" + 0.012*"砍除" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"推定" + 0.011*"文字" + 0.011*"第一項" + 0.011*"資訊" 2025-04-19 00:09:02,313 : INFO : topic #2 (0.200): 0.023*"工作" + 0.016*"公司" + 0.013*"面試" + 0.008*"時間" + 0.008*"經驗" + 0.007*"比較" + 0.007*"問題" + 0.006*"開發" + 0.006*"覺得" + 0.006*"工程師" 2025-04-19 00:09:02,314 : INFO : topic #3 (0.200): 0.010*"公司" + 0.006*"台灣" + 0.005*"技術" + 0.005*"產品" + 0.005*"使用" + 0.004*"目前" + 0.004*"資料" + 0.004*"美國" + 0.004*"問題" + 0.003*"產業" 2025-04-19 00:09:02,314 : INFO : topic #4 (0.200): 0.036*"工作" 
+ 0.016*"推定" + 0.014*"空白" + 0.012*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:09:02,315 : INFO : topic diff=0.323943, rho=0.313805 2025-04-19 00:09:02,315 : INFO : PROGRESS: pass 1, at document #14000/16310 2025-04-19 00:09:02,568 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:02,570 : INFO : topic #0 (0.200): 0.032*"工作" + 0.023*"方式" + 0.013*"聯絡" + 0.013*"內容" + 0.012*"時間" + 0.012*"通知" + 0.011*"電話" + 0.011*"小時" + 0.011*"地點" + 0.011*"台北市" 2025-04-19 00:09:02,571 : INFO : topic #1 (0.200): 0.029*"工作" + 0.014*"方式" + 0.012*"砍除" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"推定" + 0.011*"文字" + 0.011*"第一項" + 0.010*"資訊" 2025-04-19 00:09:02,571 : INFO : topic #2 (0.200): 0.024*"工作" + 0.016*"公司" + 0.012*"面試" + 0.007*"時間" + 0.007*"經驗" + 0.007*"比較" + 0.007*"問題" + 0.006*"工程師" + 0.006*"覺得" + 0.005*"開發" 2025-04-19 00:09:02,572 : INFO : topic #3 (0.200): 0.009*"公司" + 0.008*"台灣" + 0.005*"技術" + 0.005*"美國" + 0.004*"晶片" + 0.004*"表示" + 0.004*"產業" + 0.004*"科技" + 0.004*"產品" + 0.004*"員工" 2025-04-19 00:09:02,573 : INFO : topic #4 (0.200): 0.036*"工作" + 0.016*"推定" + 0.014*"空白" + 0.012*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:09:02,573 : INFO : topic diff=0.309660, rho=0.313805 2025-04-19 00:09:02,573 : INFO : PROGRESS: pass 1, at document #16000/16310 2025-04-19 00:09:02,785 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:02,787 : INFO : topic #0 (0.200): 0.030*"工作" + 0.023*"方式" + 0.013*"聯絡" + 0.012*"時間" + 0.012*"內容" + 0.012*"通知" + 0.012*"小時" + 0.011*"電話" + 0.011*"地點" + 0.011*"工資" 2025-04-19 00:09:02,788 : INFO : topic #1 (0.200): 0.029*"工作" + 0.014*"方式" + 0.012*"砍除" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"推定" + 0.011*"文字" + 0.011*"第一項" + 0.010*"資訊" 2025-04-19 00:09:02,788 : INFO : topic #2 (0.200): 0.023*"工作" + 0.017*"公司" + 0.011*"面試" + 0.007*"時間" + 0.007*"工程師" + 0.007*"比較" + 
0.006*"經驗" + 0.006*"問題" + 0.006*"覺得" + 0.006*"主管" 2025-04-19 00:09:02,789 : INFO : topic #3 (0.200): 0.009*"公司" + 0.008*"台灣" + 0.006*"美國" + 0.005*"晶片" + 0.005*"技術" + 0.005*"表示" + 0.004*"中國" + 0.004*"半導體" + 0.004*"台積電" + 0.004*"員工" 2025-04-19 00:09:02,789 : INFO : topic #4 (0.200): 0.036*"工作" + 0.016*"推定" + 0.014*"空白" + 0.012*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:09:02,790 : INFO : topic diff=0.285886, rho=0.313805 2025-04-19 00:09:02,859 : INFO : -8.354 per-word bound, 327.1 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:09:02,860 : INFO : PROGRESS: pass 1, at document #16310/16310 2025-04-19 00:09:02,895 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:09:02,897 : INFO : topic #0 (0.200): 0.033*"工作" + 0.021*"方式" + 0.014*"小時" + 0.013*"時間" + 0.012*"聯絡" + 0.011*"內容" + 0.011*"通知" + 0.011*"地點" + 0.010*"電話" + 0.010*"工資" 2025-04-19 00:09:02,898 : INFO : topic #1 (0.200): 0.029*"工作" + 0.014*"方式" + 0.011*"砍除" + 0.011*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"推定" + 0.011*"文字" + 0.010*"第一項" + 0.010*"資訊" 2025-04-19 00:09:02,898 : INFO : topic #2 (0.200): 0.021*"工作" + 0.017*"公司" + 0.010*"面試" + 0.007*"知道" + 0.007*"真的" + 0.006*"問題" + 0.006*"時間" + 0.006*"工程師" + 0.006*"比較" + 0.006*"覺得" 2025-04-19 00:09:02,899 : INFO : topic #3 (0.200): 0.009*"公司" + 0.008*"美國" + 0.008*"台灣" + 0.006*"晶片" + 0.006*"技術" + 0.005*"表示" + 0.005*"台積電" + 0.004*"中國" + 0.004*"科技" + 0.004*"台積" 2025-04-19 00:09:02,899 : INFO : topic #4 (0.200): 0.035*"工作" + 0.016*"推定" + 0.014*"空白" + 0.012*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"聯絡" + 0.010*"內容" 2025-04-19 00:09:02,899 : INFO : topic diff=0.320931, rho=0.313805 2025-04-19 00:09:02,900 : INFO : PROGRESS: pass 2, at document #2000/16310 2025-04-19 00:09:03,477 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:03,479 : INFO : topic #0 (0.200): 
0.035*"工作" + 0.024*"方式" + 0.016*"時間" + 0.014*"小時" + 0.013*"內容" + 0.012*"通知" + 0.012*"聯絡" + 0.012*"電話" + 0.012*"工資" + 0.012*"台北市" 2025-04-19 00:09:03,480 : INFO : topic #1 (0.200): 0.030*"工作" + 0.013*"方式" + 0.012*"情形" + 0.011*"砍除" + 0.011*"第一項" + 0.011*"文字" + 0.011*"聯絡" + 0.010*"推定" + 0.010*"內容" + 0.010*"資訊" 2025-04-19 00:09:03,480 : INFO : topic #2 (0.200): 0.021*"工作" + 0.016*"公司" + 0.011*"面試" + 0.006*"時間" + 0.006*"問題" + 0.006*"知道" + 0.006*"真的" + 0.006*"比較" + 0.006*"工程師" + 0.006*"經驗" 2025-04-19 00:09:03,481 : INFO : topic #3 (0.200): 0.009*"公司" + 0.008*"美國" + 0.008*"台灣" + 0.006*"晶片" + 0.005*"技術" + 0.005*"表示" + 0.005*"台積電" + 0.004*"中國" + 0.004*"科技" + 0.004*"台積" 2025-04-19 00:09:03,481 : INFO : topic #4 (0.200): 0.035*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:09:03,481 : INFO : topic diff=0.930425, rho=0.299409 2025-04-19 00:09:03,482 : INFO : PROGRESS: pass 2, at document #4000/16310 2025-04-19 00:09:04,071 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:04,073 : INFO : topic #0 (0.200): 0.034*"工作" + 0.023*"方式" + 0.016*"時間" + 0.014*"小時" + 0.013*"內容" + 0.012*"電話" + 0.012*"通知" + 0.012*"聯絡" + 0.011*"工資" + 0.011*"地點" 2025-04-19 00:09:04,074 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"方式" + 0.012*"情形" + 0.012*"砍除" + 0.011*"第一項" + 0.011*"文字" + 0.010*"聯絡" + 0.010*"資訊" + 0.010*"空白" + 0.010*"推定" 2025-04-19 00:09:04,074 : INFO : topic #2 (0.200): 0.021*"工作" + 0.016*"公司" + 0.011*"面試" + 0.007*"時間" + 0.006*"問題" + 0.006*"經驗" + 0.006*"知道" + 0.006*"比較" + 0.005*"真的" + 0.005*"工程師" 2025-04-19 00:09:04,075 : INFO : topic #3 (0.200): 0.009*"公司" + 0.008*"美國" + 0.008*"台灣" + 0.006*"晶片" + 0.005*"技術" + 0.005*"表示" + 0.004*"台積電" + 0.004*"中國" + 0.004*"科技" + 0.004*"台積" 2025-04-19 00:09:04,075 : INFO : topic #4 (0.200): 0.035*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"內容" + 0.011*"聯絡" 
2025-04-19 00:09:04,076 : INFO : topic diff=0.371374, rho=0.299409 2025-04-19 00:09:04,076 : INFO : PROGRESS: pass 2, at document #6000/16310 2025-04-19 00:09:04,544 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:04,546 : INFO : topic #0 (0.200): 0.032*"工作" + 0.023*"方式" + 0.016*"時間" + 0.013*"小時" + 0.013*"電話" + 0.013*"內容" + 0.012*"通知" + 0.012*"聯絡" + 0.012*"報名" + 0.011*"台北市" 2025-04-19 00:09:04,547 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.012*"砍除" + 0.011*"文字" + 0.010*"空白" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"內容" 2025-04-19 00:09:04,547 : INFO : topic #2 (0.200): 0.021*"工作" + 0.016*"公司" + 0.010*"面試" + 0.007*"經驗" + 0.007*"時間" + 0.006*"問題" + 0.006*"工程師" + 0.005*"比較" + 0.005*"知道" + 0.005*"需要" 2025-04-19 00:09:04,548 : INFO : topic #3 (0.200): 0.009*"公司" + 0.007*"台灣" + 0.007*"美國" + 0.005*"技術" + 0.005*"晶片" + 0.005*"產品" + 0.004*"表示" + 0.004*"科技" + 0.004*"台積電" + 0.004*"中國" 2025-04-19 00:09:04,548 : INFO : topic #4 (0.200): 0.034*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.012*"情形" + 0.011*"單位" + 0.011*"內容" + 0.011*"第一項" + 0.011*"聯絡" 2025-04-19 00:09:04,548 : INFO : topic diff=0.251031, rho=0.299409 2025-04-19 00:09:04,549 : INFO : PROGRESS: pass 2, at document #8000/16310 2025-04-19 00:09:04,800 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:04,802 : INFO : topic #0 (0.200): 0.033*"工作" + 0.023*"方式" + 0.016*"時間" + 0.014*"小時" + 0.013*"電話" + 0.013*"內容" + 0.012*"聯絡" + 0.012*"台北市" + 0.012*"通知" + 0.011*"報名" 2025-04-19 00:09:04,802 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.012*"砍除" + 0.011*"文字" + 0.010*"資訊" + 0.010*"空白" + 0.010*"聯絡" + 0.010*"內容" 2025-04-19 00:09:04,803 : INFO : topic #2 (0.200): 0.020*"工作" + 0.018*"公司" + 0.011*"面試" + 0.007*"經驗" + 0.007*"問題" + 0.007*"工程師" + 0.007*"時間" + 0.006*"開發" + 0.006*"比較" + 0.006*"覺得" 2025-04-19 00:09:04,804 : INFO : topic #3 (0.200): 0.009*"公司" + 
0.008*"台灣" + 0.006*"產品" + 0.006*"美國" + 0.006*"技術" + 0.004*"使用" + 0.004*"科技" + 0.004*"產業" + 0.004*"資料" + 0.004*"進行" 2025-04-19 00:09:04,804 : INFO : topic #4 (0.200): 0.034*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"情形" + 0.011*"單位" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"第一項" 2025-04-19 00:09:04,804 : INFO : topic diff=0.374744, rho=0.299409 2025-04-19 00:09:04,804 : INFO : PROGRESS: pass 2, at document #10000/16310 2025-04-19 00:09:05,030 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:05,032 : INFO : topic #0 (0.200): 0.033*"工作" + 0.024*"方式" + 0.017*"時間" + 0.014*"小時" + 0.013*"聯絡" + 0.013*"內容" + 0.012*"電話" + 0.012*"台北市" + 0.012*"報名" + 0.011*"通知" 2025-04-19 00:09:05,033 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"砍除" + 0.011*"文字" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"空白" + 0.010*"內容" 2025-04-19 00:09:05,033 : INFO : topic #2 (0.200): 0.019*"工作" + 0.018*"公司" + 0.012*"面試" + 0.008*"問題" + 0.007*"經驗" + 0.007*"時間" + 0.007*"工程師" + 0.006*"開發" + 0.006*"比較" + 0.006*"覺得" 2025-04-19 00:09:05,034 : INFO : topic #3 (0.200): 0.009*"公司" + 0.008*"台灣" + 0.007*"產品" + 0.006*"技術" + 0.006*"美國" + 0.004*"使用" + 0.004*"資料" + 0.004*"產業" + 0.004*"研究" + 0.004*"進行" 2025-04-19 00:09:05,034 : INFO : topic #4 (0.200): 0.034*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"情形" + 0.011*"單位" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"第一項" 2025-04-19 00:09:05,034 : INFO : topic diff=0.288282, rho=0.299409 2025-04-19 00:09:05,035 : INFO : PROGRESS: pass 2, at document #12000/16310 2025-04-19 00:09:05,255 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:05,258 : INFO : topic #0 (0.200): 0.033*"工作" + 0.024*"方式" + 0.017*"時間" + 0.015*"小時" + 0.013*"聯絡" + 0.012*"內容" + 0.012*"報名" + 0.012*"電話" + 0.012*"台北市" + 0.011*"通知" 2025-04-19 00:09:05,258 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"方式" + 0.012*"情形" + 0.011*"第一項" + 0.011*"砍除" + 0.011*"文字" + 
0.010*"資訊" + 0.010*"聯絡" + 0.010*"空白" + 0.010*"內容" 2025-04-19 00:09:05,258 : INFO : topic #2 (0.200): 0.019*"工作" + 0.018*"公司" + 0.011*"面試" + 0.008*"問題" + 0.007*"經驗" + 0.007*"工程師" + 0.007*"時間" + 0.006*"開發" + 0.006*"比較" + 0.006*"覺得" 2025-04-19 00:09:05,259 : INFO : topic #3 (0.200): 0.008*"公司" + 0.008*"台灣" + 0.005*"美國" + 0.005*"技術" + 0.005*"產品" + 0.004*"晶片" + 0.004*"產業" + 0.004*"科技" + 0.004*"表示" + 0.004*"員工" 2025-04-19 00:09:05,259 : INFO : topic #4 (0.200): 0.034*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"情形" + 0.011*"單位" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"第一項" 2025-04-19 00:09:05,260 : INFO : topic diff=0.299128, rho=0.299409 2025-04-19 00:09:05,260 : INFO : PROGRESS: pass 2, at document #14000/16310 2025-04-19 00:09:05,478 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:05,481 : INFO : topic #0 (0.200): 0.032*"工作" + 0.023*"方式" + 0.017*"時間" + 0.014*"小時" + 0.013*"聯絡" + 0.012*"報名" + 0.012*"內容" + 0.012*"電話" + 0.011*"台北市" + 0.011*"通知" 2025-04-19 00:09:05,481 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"方式" + 0.012*"情形" + 0.011*"第一項" + 0.011*"砍除" + 0.011*"文字" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"空白" + 0.010*"內容" 2025-04-19 00:09:05,482 : INFO : topic #2 (0.200): 0.019*"工作" + 0.018*"公司" + 0.010*"面試" + 0.008*"問題" + 0.007*"工程師" + 0.006*"經驗" + 0.006*"時間" + 0.006*"比較" + 0.005*"開發" + 0.005*"覺得" 2025-04-19 00:09:05,482 : INFO : topic #3 (0.200): 0.008*"台灣" + 0.008*"公司" + 0.006*"美國" + 0.005*"晶片" + 0.005*"技術" + 0.005*"表示" + 0.004*"科技" + 0.004*"產業" + 0.004*"員工" + 0.004*"半導體" 2025-04-19 00:09:05,483 : INFO : topic #4 (0.200): 0.034*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"情形" + 0.011*"單位" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"第一項" 2025-04-19 00:09:05,483 : INFO : topic diff=0.293956, rho=0.299409 2025-04-19 00:09:05,485 : INFO : PROGRESS: pass 2, at document #16000/16310 2025-04-19 00:09:05,716 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 
00:09:05,719 : INFO : topic #0 (0.200): 0.032*"工作" + 0.023*"方式" + 0.016*"時間" + 0.014*"小時" + 0.013*"報名" + 0.012*"聯絡" + 0.012*"內容" + 0.011*"台北市" + 0.011*"電話" + 0.011*"通知" 2025-04-19 00:09:05,719 : INFO : topic #1 (0.200): 0.029*"工作" + 0.012*"方式" + 0.012*"情形" + 0.011*"第一項" + 0.011*"砍除" + 0.011*"文字" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"空白" + 0.010*"內容" 2025-04-19 00:09:05,720 : INFO : topic #2 (0.200): 0.019*"工作" + 0.018*"公司" + 0.010*"面試" + 0.008*"工程師" + 0.007*"問題" + 0.006*"時間" + 0.006*"比較" + 0.006*"經驗" + 0.005*"知道" + 0.005*"覺得" 2025-04-19 00:09:05,720 : INFO : topic #3 (0.200): 0.008*"台灣" + 0.008*"公司" + 0.007*"美國" + 0.006*"晶片" + 0.005*"表示" + 0.005*"技術" + 0.005*"中國" + 0.005*"半導體" + 0.004*"台積電" + 0.004*"員工" 2025-04-19 00:09:05,721 : INFO : topic #4 (0.200): 0.034*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"情形" + 0.011*"單位" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"第一項" 2025-04-19 00:09:05,721 : INFO : topic diff=0.270375, rho=0.299409 2025-04-19 00:09:05,791 : INFO : -8.301 per-word bound, 315.5 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:09:05,791 : INFO : PROGRESS: pass 2, at document #16310/16310 2025-04-19 00:09:05,826 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:09:05,829 : INFO : topic #0 (0.200): 0.034*"工作" + 0.021*"方式" + 0.017*"時間" + 0.016*"小時" + 0.014*"報名" + 0.012*"聯絡" + 0.011*"內容" + 0.011*"地點" + 0.011*"台北市" + 0.011*"活動" 2025-04-19 00:09:05,829 : INFO : topic #1 (0.200): 0.029*"工作" + 0.012*"方式" + 0.012*"情形" + 0.011*"第一項" + 0.011*"砍除" + 0.011*"文字" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"內容" + 0.010*"空白" 2025-04-19 00:09:05,830 : INFO : topic #2 (0.200): 0.018*"公司" + 0.017*"工作" + 0.009*"面試" + 0.007*"問題" + 0.007*"工程師" + 0.007*"知道" + 0.006*"真的" + 0.006*"時間" + 0.006*"比較" + 0.006*"覺得" 2025-04-19 00:09:05,830 : INFO : topic #3 (0.200): 0.009*"美國" + 0.009*"公司" + 0.009*"台灣" + 0.006*"晶片" + 0.006*"技術" + 0.005*"表示" + 0.005*"台積電" + 0.005*"中國" + 0.005*"科技" + 0.005*"台積" 
2025-04-19 00:09:05,830 : INFO : topic #4 (0.200): 0.034*"工作" + 0.016*"推定" + 0.013*"空白" + 0.012*"方式" + 0.012*"砍除" + 0.011*"單位" + 0.011*"情形" + 0.011*"內容" + 0.011*"國定假日" + 0.011*"聯絡" 2025-04-19 00:09:05,831 : INFO : topic diff=0.297525, rho=0.299409 2025-04-19 00:09:05,831 : INFO : PROGRESS: pass 3, at document #2000/16310 2025-04-19 00:09:06,394 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:06,397 : INFO : topic #0 (0.200): 0.032*"工作" + 0.022*"方式" + 0.017*"時間" + 0.014*"小時" + 0.012*"報名" + 0.012*"電話" + 0.012*"內容" + 0.011*"通知" + 0.011*"聯絡" + 0.011*"台北市" 2025-04-19 00:09:06,397 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"情形" + 0.011*"第一項" + 0.011*"方式" + 0.011*"砍除" + 0.011*"文字" + 0.010*"空白" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"推定" 2025-04-19 00:09:06,398 : INFO : topic #2 (0.200): 0.017*"公司" + 0.017*"工作" + 0.009*"面試" + 0.007*"問題" + 0.007*"工程師" + 0.006*"知道" + 0.006*"時間" + 0.006*"真的" + 0.005*"比較" + 0.005*"覺得" 2025-04-19 00:09:06,398 : INFO : topic #3 (0.200): 0.009*"美國" + 0.009*"台灣" + 0.009*"公司" + 0.006*"晶片" + 0.006*"技術" + 0.005*"表示" + 0.005*"台積電" + 0.005*"中國" + 0.005*"科技" + 0.005*"台積" 2025-04-19 00:09:06,399 : INFO : topic #4 (0.200): 0.034*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"方式" + 0.013*"砍除" + 0.012*"單位" + 0.012*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"國定假日" 2025-04-19 00:09:06,399 : INFO : topic diff=0.816685, rho=0.286829 2025-04-19 00:09:06,399 : INFO : PROGRESS: pass 3, at document #4000/16310 2025-04-19 00:09:06,966 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:06,968 : INFO : topic #0 (0.200): 0.030*"工作" + 0.021*"方式" + 0.016*"時間" + 0.013*"小時" + 0.012*"電話" + 0.012*"報名" + 0.012*"內容" + 0.011*"通知" + 0.011*"聯絡" + 0.011*"地點" 2025-04-19 00:09:06,969 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"第一項" + 0.012*"情形" + 0.011*"砍除" + 0.011*"方式" + 0.011*"文字" + 0.011*"空白" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"應徵" 2025-04-19 00:09:06,969 : INFO : topic #2 (0.200): 0.017*"工作" + 
0.017*"公司" + 0.009*"面試" + 0.007*"問題" + 0.006*"工程師" + 0.006*"知道" + 0.006*"時間" + 0.005*"比較" + 0.005*"經驗" + 0.005*"真的" 2025-04-19 00:09:06,974 : INFO : topic #3 (0.200): 0.009*"美國" + 0.009*"台灣" + 0.009*"公司" + 0.006*"晶片" + 0.005*"技術" + 0.005*"表示" + 0.005*"台積電" + 0.005*"中國" + 0.005*"科技" + 0.005*"台積" 2025-04-19 00:09:06,978 : INFO : topic #4 (0.200): 0.034*"工作" + 0.016*"推定" + 0.013*"方式" + 0.013*"空白" + 0.013*"砍除" + 0.012*"單位" + 0.012*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:06,981 : INFO : topic diff=0.338138, rho=0.286829 2025-04-19 00:09:06,987 : INFO : PROGRESS: pass 3, at document #6000/16310 2025-04-19 00:09:07,466 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:07,469 : INFO : topic #0 (0.200): 0.029*"工作" + 0.021*"方式" + 0.016*"時間" + 0.013*"報名" + 0.013*"電話" + 0.013*"小時" + 0.012*"內容" + 0.012*"活動" + 0.011*"通知" + 0.011*"台北市" 2025-04-19 00:09:07,469 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"第一項" + 0.012*"情形" + 0.011*"砍除" + 0.011*"方式" + 0.011*"文字" + 0.011*"空白" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"內容" 2025-04-19 00:09:07,470 : INFO : topic #2 (0.200): 0.017*"公司" + 0.017*"工作" + 0.009*"面試" + 0.007*"問題" + 0.006*"工程師" + 0.006*"經驗" + 0.006*"時間" + 0.006*"知道" + 0.005*"比較" + 0.005*"需要" 2025-04-19 00:09:07,470 : INFO : topic #3 (0.200): 0.009*"公司" + 0.008*"台灣" + 0.008*"美國" + 0.006*"晶片" + 0.005*"技術" + 0.005*"表示" + 0.005*"科技" + 0.005*"台積電" + 0.004*"中國" + 0.004*"產品" 2025-04-19 00:09:07,470 : INFO : topic #4 (0.200): 0.034*"工作" + 0.016*"推定" + 0.013*"方式" + 0.013*"空白" + 0.013*"砍除" + 0.012*"單位" + 0.012*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:07,471 : INFO : topic diff=0.219076, rho=0.286829 2025-04-19 00:09:07,471 : INFO : PROGRESS: pass 3, at document #8000/16310 2025-04-19 00:09:07,708 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:07,710 : INFO : topic #0 (0.200): 0.029*"工作" + 0.022*"方式" + 0.016*"時間" + 0.013*"小時" + 0.013*"報名" + 0.012*"電話" + 0.012*"內容" 
+ 0.011*"聯絡" + 0.011*"活動" + 0.011*"台北市" 2025-04-19 00:09:07,711 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"第一項" + 0.012*"情形" + 0.011*"砍除" + 0.011*"方式" + 0.011*"文字" + 0.011*"空白" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"應徵" 2025-04-19 00:09:07,711 : INFO : topic #2 (0.200): 0.018*"公司" + 0.018*"工作" + 0.010*"面試" + 0.008*"問題" + 0.007*"工程師" + 0.007*"經驗" + 0.006*"時間" + 0.006*"開發" + 0.005*"比較" + 0.005*"覺得" 2025-04-19 00:09:07,712 : INFO : topic #3 (0.200): 0.009*"公司" + 0.009*"台灣" + 0.008*"美國" + 0.006*"技術" + 0.006*"產品" + 0.005*"科技" + 0.004*"晶片" + 0.004*"產業" + 0.004*"市場" + 0.004*"員工" 2025-04-19 00:09:07,713 : INFO : topic #4 (0.200): 0.034*"工作" + 0.016*"推定" + 0.013*"方式" + 0.013*"空白" + 0.013*"砍除" + 0.012*"單位" + 0.012*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:07,713 : INFO : topic diff=0.334620, rho=0.286829 2025-04-19 00:09:07,713 : INFO : PROGRESS: pass 3, at document #10000/16310 2025-04-19 00:09:07,929 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:07,931 : INFO : topic #0 (0.200): 0.030*"工作" + 0.022*"方式" + 0.017*"時間" + 0.014*"小時" + 0.013*"報名" + 0.012*"聯絡" + 0.012*"電話" + 0.012*"內容" + 0.012*"活動" + 0.011*"台北市" 2025-04-19 00:09:07,932 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"第一項" + 0.012*"情形" + 0.011*"砍除" + 0.011*"方式" + 0.011*"文字" + 0.011*"空白" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"應徵" 2025-04-19 00:09:07,932 : INFO : topic #2 (0.200): 0.018*"公司" + 0.017*"工作" + 0.011*"面試" + 0.008*"問題" + 0.007*"工程師" + 0.007*"經驗" + 0.006*"時間" + 0.006*"開發" + 0.006*"比較" + 0.005*"覺得" 2025-04-19 00:09:07,933 : INFO : topic #3 (0.200): 0.009*"台灣" + 0.009*"公司" + 0.007*"美國" + 0.006*"產品" + 0.006*"技術" + 0.005*"產業" + 0.004*"科技" + 0.004*"員工" + 0.004*"市場" + 0.004*"表示" 2025-04-19 00:09:07,933 : INFO : topic #4 (0.200): 0.034*"工作" + 0.016*"推定" + 0.013*"方式" + 0.013*"空白" + 0.013*"砍除" + 0.012*"單位" + 0.012*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:07,933 : INFO : topic diff=0.266511, rho=0.286829 2025-04-19 00:09:07,934 : INFO : 
PROGRESS: pass 3, at document #12000/16310 2025-04-19 00:09:08,169 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:08,171 : INFO : topic #0 (0.200): 0.030*"工作" + 0.022*"方式" + 0.017*"時間" + 0.014*"小時" + 0.014*"報名" + 0.012*"活動" + 0.012*"聯絡" + 0.011*"內容" + 0.011*"電話" + 0.011*"台北市" 2025-04-19 00:09:08,172 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"第一項" + 0.012*"情形" + 0.011*"文字" + 0.011*"砍除" + 0.011*"方式" + 0.011*"空白" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"應徵" 2025-04-19 00:09:08,172 : INFO : topic #2 (0.200): 0.018*"公司" + 0.017*"工作" + 0.010*"面試" + 0.008*"問題" + 0.007*"工程師" + 0.006*"經驗" + 0.006*"時間" + 0.006*"開發" + 0.006*"比較" + 0.005*"覺得" 2025-04-19 00:09:08,173 : INFO : topic #3 (0.200): 0.008*"台灣" + 0.008*"公司" + 0.006*"美國" + 0.005*"技術" + 0.005*"晶片" + 0.005*"產品" + 0.005*"產業" + 0.004*"科技" + 0.004*"表示" + 0.004*"員工" 2025-04-19 00:09:08,173 : INFO : topic #4 (0.200): 0.034*"工作" + 0.016*"推定" + 0.013*"方式" + 0.013*"空白" + 0.013*"砍除" + 0.012*"單位" + 0.012*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:08,173 : INFO : topic diff=0.284731, rho=0.286829 2025-04-19 00:09:08,174 : INFO : PROGRESS: pass 3, at document #14000/16310 2025-04-19 00:09:08,389 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:08,391 : INFO : topic #0 (0.200): 0.029*"工作" + 0.021*"方式" + 0.017*"時間" + 0.014*"報名" + 0.014*"小時" + 0.012*"活動" + 0.012*"聯絡" + 0.011*"電話" + 0.011*"內容" + 0.011*"台北市" 2025-04-19 00:09:08,392 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"情形" + 0.012*"第一項" + 0.011*"文字" + 0.011*"砍除" + 0.011*"方式" + 0.010*"空白" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"內容" 2025-04-19 00:09:08,392 : INFO : topic #2 (0.200): 0.017*"工作" + 0.017*"公司" + 0.010*"面試" + 0.008*"問題" + 0.007*"工程師" + 0.006*"經驗" + 0.006*"時間" + 0.006*"比較" + 0.005*"開發" + 0.005*"覺得" 2025-04-19 00:09:08,393 : INFO : topic #3 (0.200): 0.009*"台灣" + 0.008*"公司" + 0.006*"美國" + 0.005*"晶片" + 0.005*"表示" + 0.005*"科技" + 0.005*"技術" + 0.005*"產業" + 0.005*"員工" + 
0.004*"半導體" 2025-04-19 00:09:08,393 : INFO : topic #4 (0.200): 0.034*"工作" + 0.016*"推定" + 0.013*"方式" + 0.013*"空白" + 0.013*"砍除" + 0.012*"單位" + 0.012*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:08,394 : INFO : topic diff=0.281097, rho=0.286829 2025-04-19 00:09:08,394 : INFO : PROGRESS: pass 3, at document #16000/16310 2025-04-19 00:09:08,602 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:08,604 : INFO : topic #0 (0.200): 0.029*"工作" + 0.021*"方式" + 0.016*"時間" + 0.014*"報名" + 0.014*"小時" + 0.012*"活動" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"台北市" + 0.011*"電話" 2025-04-19 00:09:08,605 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"情形" + 0.011*"第一項" + 0.011*"文字" + 0.011*"方式" + 0.011*"砍除" + 0.010*"空白" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"內容" 2025-04-19 00:09:08,605 : INFO : topic #2 (0.200): 0.018*"公司" + 0.017*"工作" + 0.009*"面試" + 0.008*"工程師" + 0.007*"問題" + 0.006*"比較" + 0.006*"時間" + 0.006*"經驗" + 0.005*"知道" + 0.005*"覺得" 2025-04-19 00:09:08,606 : INFO : topic #3 (0.200): 0.009*"台灣" + 0.008*"公司" + 0.008*"美國" + 0.006*"晶片" + 0.006*"表示" + 0.005*"中國" + 0.005*"半導體" + 0.005*"技術" + 0.005*"台積電" + 0.005*"員工" 2025-04-19 00:09:08,606 : INFO : topic #4 (0.200): 0.034*"工作" + 0.016*"推定" + 0.013*"方式" + 0.013*"空白" + 0.013*"砍除" + 0.012*"單位" + 0.012*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:08,607 : INFO : topic diff=0.256670, rho=0.286829 2025-04-19 00:09:08,675 : INFO : -8.279 per-word bound, 310.7 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:09:08,675 : INFO : PROGRESS: pass 3, at document #16310/16310 2025-04-19 00:09:08,733 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:09:08,736 : INFO : topic #0 (0.200): 0.031*"工作" + 0.019*"方式" + 0.017*"時間" + 0.015*"小時" + 0.014*"報名" + 0.013*"活動" + 0.011*"台北市" + 0.011*"聯絡" + 0.010*"地點" + 0.010*"內容" 2025-04-19 00:09:08,736 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"情形" + 0.011*"第一項" + 
0.011*"文字" + 0.011*"方式" + 0.011*"砍除" + 0.010*"空白" + 0.010*"資訊" + 0.010*"分類" + 0.010*"聯絡" 2025-04-19 00:09:08,737 : INFO : topic #2 (0.200): 0.018*"公司" + 0.016*"工作" + 0.009*"面試" + 0.007*"問題" + 0.007*"工程師" + 0.007*"知道" + 0.006*"真的" + 0.005*"時間" + 0.005*"比較" + 0.005*"覺得" 2025-04-19 00:09:08,737 : INFO : topic #3 (0.200): 0.010*"美國" + 0.009*"台灣" + 0.008*"公司" + 0.007*"晶片" + 0.006*"表示" + 0.006*"技術" + 0.005*"台積電" + 0.005*"中國" + 0.005*"科技" + 0.005*"台積" 2025-04-19 00:09:08,737 : INFO : topic #4 (0.200): 0.034*"工作" + 0.016*"推定" + 0.013*"方式" + 0.013*"空白" + 0.012*"砍除" + 0.012*"單位" + 0.011*"內容" + 0.011*"情形" + 0.011*"國定假日" + 0.011*"聯絡" 2025-04-19 00:09:08,738 : INFO : topic diff=0.278675, rho=0.286829 2025-04-19 00:09:08,738 : INFO : PROGRESS: pass 4, at document #2000/16310 2025-04-19 00:09:09,291 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:09,294 : INFO : topic #0 (0.200): 0.029*"工作" + 0.020*"方式" + 0.017*"時間" + 0.013*"小時" + 0.013*"報名" + 0.012*"電話" + 0.012*"活動" + 0.011*"台北市" + 0.011*"通知" + 0.011*"內容" 2025-04-19 00:09:09,294 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"第一項" + 0.011*"情形" + 0.011*"文字" + 0.011*"砍除" + 0.011*"方式" + 0.010*"空白" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"分類" 2025-04-19 00:09:09,295 : INFO : topic #2 (0.200): 0.017*"公司" + 0.016*"工作" + 0.009*"面試" + 0.007*"問題" + 0.007*"工程師" + 0.006*"知道" + 0.005*"時間" + 0.005*"真的" + 0.005*"比較" + 0.005*"覺得" 2025-04-19 00:09:09,295 : INFO : topic #3 (0.200): 0.009*"美國" + 0.009*"台灣" + 0.008*"公司" + 0.007*"晶片" + 0.006*"表示" + 0.005*"技術" + 0.005*"台積電" + 0.005*"中國" + 0.005*"科技" + 0.005*"台積" 2025-04-19 00:09:09,296 : INFO : topic #4 (0.200): 0.034*"工作" + 0.017*"推定" + 0.013*"方式" + 0.013*"空白" + 0.013*"砍除" + 0.012*"單位" + 0.012*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"國定假日" 2025-04-19 00:09:09,296 : INFO : topic diff=0.742210, rho=0.275711 2025-04-19 00:09:09,296 : INFO : PROGRESS: pass 4, at document #4000/16310 2025-04-19 00:09:09,864 : INFO : merging changes from 2000 documents into a model 
of 16310 documents 2025-04-19 00:09:09,867 : INFO : topic #0 (0.200): 0.028*"工作" + 0.020*"方式" + 0.016*"時間" + 0.013*"小時" + 0.013*"電話" + 0.012*"報名" + 0.011*"活動" + 0.011*"通知" + 0.011*"內容" + 0.011*"台北市" 2025-04-19 00:09:09,868 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"第一項" + 0.012*"情形" + 0.011*"砍除" + 0.011*"文字" + 0.011*"方式" + 0.011*"空白" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"應徵" 2025-04-19 00:09:09,868 : INFO : topic #2 (0.200): 0.017*"公司" + 0.016*"工作" + 0.009*"面試" + 0.007*"問題" + 0.007*"工程師" + 0.006*"知道" + 0.005*"時間" + 0.005*"比較" + 0.005*"真的" + 0.005*"經驗" 2025-04-19 00:09:09,868 : INFO : topic #3 (0.200): 0.009*"美國" + 0.009*"台灣" + 0.008*"公司" + 0.007*"晶片" + 0.006*"表示" + 0.005*"技術" + 0.005*"台積電" + 0.005*"中國" + 0.005*"科技" + 0.005*"台積" 2025-04-19 00:09:09,869 : INFO : topic #4 (0.200): 0.035*"工作" + 0.017*"推定" + 0.013*"方式" + 0.013*"空白" + 0.013*"砍除" + 0.012*"單位" + 0.012*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"國定假日" 2025-04-19 00:09:09,869 : INFO : topic diff=0.322819, rho=0.275711 2025-04-19 00:09:09,870 : INFO : PROGRESS: pass 4, at document #6000/16310 2025-04-19 00:09:10,337 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:10,348 : INFO : topic #0 (0.200): 0.027*"工作" + 0.020*"方式" + 0.016*"時間" + 0.014*"報名" + 0.013*"電話" + 0.012*"活動" + 0.012*"小時" + 0.011*"通知" + 0.011*"內容" + 0.011*"台北市" 2025-04-19 00:09:10,351 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"第一項" + 0.012*"情形" + 0.011*"文字" + 0.011*"砍除" + 0.011*"方式" + 0.011*"空白" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"分類" 2025-04-19 00:09:10,355 : INFO : topic #2 (0.200): 0.018*"公司" + 0.016*"工作" + 0.008*"面試" + 0.007*"問題" + 0.006*"工程師" + 0.006*"經驗" + 0.006*"知道" + 0.006*"時間" + 0.005*"比較" + 0.004*"需要" 2025-04-19 00:09:10,360 : INFO : topic #3 (0.200): 0.009*"美國" + 0.009*"台灣" + 0.009*"公司" + 0.006*"晶片" + 0.005*"表示" + 0.005*"技術" + 0.005*"台積電" + 0.005*"科技" + 0.005*"中國" + 0.005*"員工" 2025-04-19 00:09:10,361 : INFO : topic #4 (0.200): 0.035*"工作" + 0.017*"推定" + 0.014*"方式" + 0.013*"空白" + 0.013*"砍除" + 
0.012*"單位" + 0.012*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:10,361 : INFO : topic diff=0.203179, rho=0.275711 2025-04-19 00:09:10,362 : INFO : PROGRESS: pass 4, at document #8000/16310 2025-04-19 00:09:10,597 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:10,599 : INFO : topic #0 (0.200): 0.027*"工作" + 0.021*"方式" + 0.017*"時間" + 0.013*"報名" + 0.013*"小時" + 0.013*"電話" + 0.012*"活動" + 0.012*"台北市" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:09:10,600 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"第一項" + 0.012*"情形" + 0.011*"文字" + 0.011*"砍除" + 0.011*"方式" + 0.011*"空白" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"應徵" 2025-04-19 00:09:10,600 : INFO : topic #2 (0.200): 0.018*"公司" + 0.017*"工作" + 0.010*"面試" + 0.008*"問題" + 0.007*"工程師" + 0.007*"經驗" + 0.006*"時間" + 0.006*"開發" + 0.005*"比較" + 0.005*"覺得" 2025-04-19 00:09:10,601 : INFO : topic #3 (0.200): 0.009*"台灣" + 0.009*"公司" + 0.009*"美國" + 0.006*"技術" + 0.005*"晶片" + 0.005*"科技" + 0.005*"產品" + 0.005*"表示" + 0.005*"員工" + 0.005*"產業" 2025-04-19 00:09:10,601 : INFO : topic #4 (0.200): 0.035*"工作" + 0.017*"推定" + 0.014*"方式" + 0.013*"空白" + 0.013*"砍除" + 0.012*"單位" + 0.012*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:10,601 : INFO : topic diff=0.307448, rho=0.275711 2025-04-19 00:09:10,602 : INFO : PROGRESS: pass 4, at document #10000/16310 2025-04-19 00:09:10,814 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:10,816 : INFO : topic #0 (0.200): 0.027*"工作" + 0.021*"方式" + 0.017*"時間" + 0.013*"報名" + 0.013*"小時" + 0.012*"活動" + 0.012*"電話" + 0.012*"聯絡" + 0.012*"台北市" + 0.011*"內容" 2025-04-19 00:09:10,817 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"第一項" + 0.012*"情形" + 0.011*"文字" + 0.011*"砍除" + 0.011*"方式" + 0.011*"空白" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"分類" 2025-04-19 00:09:10,817 : INFO : topic #2 (0.200): 0.018*"公司" + 0.017*"工作" + 0.010*"面試" + 0.008*"問題" + 0.007*"工程師" + 0.007*"經驗" + 0.006*"開發" + 0.006*"時間" + 0.006*"比較" + 0.005*"覺得" 2025-04-19 
00:09:10,818 : INFO : topic #3 (0.200): 0.010*"台灣" + 0.009*"公司" + 0.008*"美國" + 0.006*"技術" + 0.005*"產品" + 0.005*"科技" + 0.005*"產業" + 0.004*"員工" + 0.004*"市場" + 0.004*"表示" 2025-04-19 00:09:10,818 : INFO : topic #4 (0.200): 0.035*"工作" + 0.017*"推定" + 0.014*"方式" + 0.013*"空白" + 0.013*"砍除" + 0.012*"單位" + 0.012*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:10,818 : INFO : topic diff=0.249498, rho=0.275711 2025-04-19 00:09:10,819 : INFO : PROGRESS: pass 4, at document #12000/16310 2025-04-19 00:09:11,029 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:11,032 : INFO : topic #0 (0.200): 0.027*"工作" + 0.021*"方式" + 0.017*"時間" + 0.014*"報名" + 0.013*"小時" + 0.013*"活動" + 0.012*"聯絡" + 0.011*"電話" + 0.011*"台北市" + 0.011*"內容" 2025-04-19 00:09:11,032 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"第一項" + 0.012*"情形" + 0.011*"文字" + 0.011*"砍除" + 0.011*"方式" + 0.011*"空白" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"分類" 2025-04-19 00:09:11,033 : INFO : topic #2 (0.200): 0.017*"公司" + 0.016*"工作" + 0.010*"面試" + 0.008*"問題" + 0.007*"工程師" + 0.006*"經驗" + 0.006*"開發" + 0.006*"時間" + 0.006*"比較" + 0.005*"覺得" 2025-04-19 00:09:11,033 : INFO : topic #3 (0.200): 0.009*"台灣" + 0.008*"公司" + 0.007*"美國" + 0.005*"晶片" + 0.005*"技術" + 0.005*"科技" + 0.005*"表示" + 0.005*"產業" + 0.004*"員工" + 0.004*"半導體" 2025-04-19 00:09:11,034 : INFO : topic #4 (0.200): 0.035*"工作" + 0.017*"推定" + 0.014*"方式" + 0.013*"空白" + 0.012*"砍除" + 0.012*"單位" + 0.012*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:11,034 : INFO : topic diff=0.269872, rho=0.275711 2025-04-19 00:09:11,034 : INFO : PROGRESS: pass 4, at document #14000/16310 2025-04-19 00:09:11,278 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:11,280 : INFO : topic #0 (0.200): 0.027*"工作" + 0.021*"方式" + 0.017*"時間" + 0.014*"報名" + 0.013*"活動" + 0.013*"小時" + 0.011*"聯絡" + 0.011*"電話" + 0.011*"台北市" + 0.011*"內容" 2025-04-19 00:09:11,281 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"第一項" + 
0.012*"情形" + 0.011*"文字" + 0.011*"砍除" + 0.011*"方式" + 0.011*"空白" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"分類" 2025-04-19 00:09:11,281 : INFO : topic #2 (0.200): 0.017*"公司" + 0.017*"工作" + 0.009*"面試" + 0.008*"問題" + 0.007*"工程師" + 0.006*"經驗" + 0.006*"比較" + 0.006*"時間" + 0.005*"開發" + 0.005*"覺得" 2025-04-19 00:09:11,282 : INFO : topic #3 (0.200): 0.009*"台灣" + 0.008*"公司" + 0.006*"美國" + 0.006*"晶片" + 0.005*"表示" + 0.005*"科技" + 0.005*"員工" + 0.005*"技術" + 0.005*"產業" + 0.005*"半導體" 2025-04-19 00:09:11,282 : INFO : topic #4 (0.200): 0.035*"工作" + 0.017*"推定" + 0.013*"方式" + 0.013*"空白" + 0.012*"砍除" + 0.012*"單位" + 0.012*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:11,283 : INFO : topic diff=0.267742, rho=0.275711 2025-04-19 00:09:11,283 : INFO : PROGRESS: pass 4, at document #16000/16310 2025-04-19 00:09:11,491 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:11,493 : INFO : topic #0 (0.200): 0.026*"工作" + 0.020*"方式" + 0.017*"時間" + 0.015*"報名" + 0.013*"小時" + 0.013*"活動" + 0.011*"聯絡" + 0.011*"台北市" + 0.011*"電話" + 0.010*"內容" 2025-04-19 00:09:11,493 : INFO : topic #1 (0.200): 0.030*"工作" + 0.012*"情形" + 0.012*"第一項" + 0.011*"文字" + 0.011*"砍除" + 0.011*"方式" + 0.010*"空白" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"分類" 2025-04-19 00:09:11,494 : INFO : topic #2 (0.200): 0.017*"公司" + 0.017*"工作" + 0.009*"面試" + 0.008*"工程師" + 0.007*"問題" + 0.006*"比較" + 0.006*"時間" + 0.005*"經驗" + 0.005*"知道" + 0.005*"覺得" 2025-04-19 00:09:11,494 : INFO : topic #3 (0.200): 0.009*"台灣" + 0.008*"公司" + 0.008*"美國" + 0.007*"晶片" + 0.006*"表示" + 0.005*"中國" + 0.005*"半導體" + 0.005*"台積電" + 0.005*"員工" + 0.005*"技術" 2025-04-19 00:09:11,495 : INFO : topic #4 (0.200): 0.034*"工作" + 0.017*"推定" + 0.013*"方式" + 0.012*"空白" + 0.012*"砍除" + 0.012*"單位" + 0.012*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:11,495 : INFO : topic diff=0.243291, rho=0.275711 2025-04-19 00:09:11,564 : INFO : -8.268 per-word bound, 308.3 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 
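Each `per-word bound` line reports gensim's variational bound per word. The perplexity printed next to it is 2 raised to the negated bound, since gensim's `LdaModel.log_perplexity` logs `numpy.exp2(-bound)`.

Across the passes shown here, the bound on the 310-document chunk evaluated at the end of each pass improves:

- -8.301 in pass 2 (perplexity 315.5)
- -8.279 in pass 3 (perplexity 310.7)
- -8.268 in pass 4 (perplexity 308.3)

The small gains suggest the 5-topic model is close to converging under these settings.

The sketch below recomputes the logged perplexities from the bounds, assuming the log text is copied verbatim. `bound_to_perplexity` and `check_log` are hypothetical helpers, not gensim functions.

```python
import re

# Matches gensim's "<bound> per-word bound, <perplexity> perplexity estimate" log message.
PERPLEXITY_RE = re.compile(r"(-?\d+\.\d+) per-word bound, (\d+\.\d+) perplexity estimate")


def bound_to_perplexity(bound):
    """Perplexity as gensim logs it: 2 ** (-bound)."""
    return 2.0 ** (-bound)


# Log excerpts copied from the training output (end of passes 2, 3 and 4).
LOG_LINES = [
    "-8.301 per-word bound, 315.5 perplexity estimate based on a held-out corpus of 310 documents with 43091 words",
    "-8.279 per-word bound, 310.7 perplexity estimate based on a held-out corpus of 310 documents with 43091 words",
    "-8.268 per-word bound, 308.3 perplexity estimate based on a held-out corpus of 310 documents with 43091 words",
]


def check_log(lines, tol=0.2):
    """Return (bound, logged, recomputed) tuples; fail on a gap larger than rounding."""
    rows = []
    for line in lines:
        m = PERPLEXITY_RE.search(line)
        if m is None:
            continue
        bound, logged = float(m.group(1)), float(m.group(2))
        recomputed = bound_to_perplexity(bound)
        # The bound is printed with only 3 decimals, so allow a small rounding gap.
        assert abs(recomputed - logged) < tol, (bound, logged, recomputed)
        rows.append((bound, logged, recomputed))
    return rows


if __name__ == "__main__":
    for bound, logged, recomputed in check_log(LOG_LINES):
        print(f"bound {bound:.3f}: logged {logged:.1f}, 2**(-bound) = {recomputed:.2f}")
```

The later line reporting -6.986 (perplexity 126.7) covers all 16310 documents, the same count the model was trained on. It therefore appears to be evaluated on the training corpus itself, so it is likely optimistic even though gensim's message still says "held-out corpus".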
2025-04-19 00:09:11,564 : INFO : PROGRESS: pass 4, at document #16310/16310 2025-04-19 00:09:11,598 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:09:11,600 : INFO : topic #0 (0.200): 0.028*"工作" + 0.019*"方式" + 0.017*"時間" + 0.014*"報名" + 0.014*"小時" + 0.013*"活動" + 0.011*"台北市" + 0.010*"聯絡" + 0.010*"地點" + 0.010*"電話" 2025-04-19 00:09:11,601 : INFO : topic #1 (0.200): 0.030*"工作" + 0.011*"情形" + 0.011*"第一項" + 0.011*"文字" + 0.011*"砍除" + 0.011*"方式" + 0.010*"空白" + 0.010*"分類" + 0.010*"資訊" + 0.010*"聯絡" 2025-04-19 00:09:11,601 : INFO : topic #2 (0.200): 0.017*"公司" + 0.016*"工作" + 0.008*"面試" + 0.007*"問題" + 0.007*"工程師" + 0.006*"知道" + 0.005*"真的" + 0.005*"時間" + 0.005*"比較" + 0.005*"覺得" 2025-04-19 00:09:11,602 : INFO : topic #3 (0.200): 0.010*"美國" + 0.009*"台灣" + 0.008*"公司" + 0.007*"晶片" + 0.006*"表示" + 0.006*"台積電" + 0.006*"技術" + 0.005*"中國" + 0.005*"科技" + 0.005*"台積" 2025-04-19 00:09:11,602 : INFO : topic #4 (0.200): 0.034*"工作" + 0.016*"推定" + 0.013*"方式" + 0.012*"空白" + 0.012*"砍除" + 0.012*"單位" + 0.012*"內容" + 0.012*"情形" + 0.011*"國定假日" + 0.011*"聯絡" 2025-04-19 00:09:11,603 : INFO : topic diff=0.262556, rho=0.275711 2025-04-19 00:09:11,603 : INFO : LdaModel lifecycle event {'msg': 'trained LdaModel<num_terms=23261, num_topics=5, decay=0.5, chunksize=2000> in 15.10s', 'datetime': '2025-04-19T00:09:11.603411', 'gensim': '4.3.3', 'python': '3.11.2 (main, Apr 21 2023, 22:51:21) [Clang 14.0.3 (clang-1403.0.22.14.1)]', 'platform': 'macOS-15.3.2-arm64-arm-64bit', 'event': 'created'} 2025-04-19 00:09:16,814 : INFO : -6.986 per-word bound, 126.7 perplexity estimate based on a held-out corpus of 16310 documents with 3460358 words 2025-04-19 00:09:16,816 : INFO : using ParallelWordOccurrenceAccumulator<processes=7, batch_size=64> to estimate probabilities from sliding windows 2025-04-19 00:09:20,615 : INFO : 1 batches submitted to accumulate stats from 64 documents (22660 virtual) 2025-04-19 00:09:20,618 : INFO : 2 batches submitted to accumulate stats from 128 
documents (45646 virtual) 2025-04-19 00:09:20,621 : INFO : 3 batches submitted to accumulate stats from 192 documents (67171 virtual) 2025-04-19 00:09:20,624 : INFO : 4 batches submitted to accumulate stats from 256 documents (88330 virtual) 2025-04-19 00:09:20,627 : INFO : 5 batches submitted to accumulate stats from 320 documents (109687 virtual) 2025-04-19 00:09:20,630 : INFO : 6 batches submitted to accumulate stats from 384 documents (131042 virtual) 2025-04-19 00:09:20,635 : INFO : 7 batches submitted to accumulate stats from 448 documents (153774 virtual) 2025-04-19 00:09:20,648 : INFO : 8 batches submitted to accumulate stats from 512 documents (176164 virtual) 2025-04-19 00:09:20,657 : INFO : 9 batches submitted to accumulate stats from 576 documents (197020 virtual) 2025-04-19 00:09:20,662 : INFO : 10 batches submitted to accumulate stats from 640 documents (218505 virtual) 2025-04-19 00:09:20,666 : INFO : 11 batches submitted to accumulate stats from 704 documents (240803 virtual) 2025-04-19 00:09:20,671 : INFO : 12 batches submitted to accumulate stats from 768 documents (265360 virtual) 2025-04-19 00:09:20,673 : INFO : 13 batches submitted to accumulate stats from 832 documents (286615 virtual) 2025-04-19 00:09:20,677 : INFO : 14 batches submitted to accumulate stats from 896 documents (310833 virtual) 2025-04-19 00:09:20,756 : INFO : 15 batches submitted to accumulate stats from 960 documents (331313 virtual) 2025-04-19 00:09:20,772 : INFO : 16 batches submitted to accumulate stats from 1024 documents (350940 virtual) 2025-04-19 00:09:20,779 : INFO : 17 batches submitted to accumulate stats from 1088 documents (368371 virtual) 2025-04-19 00:09:20,782 : INFO : 18 batches submitted to accumulate stats from 1152 documents (390334 virtual) 2025-04-19 00:09:20,788 : INFO : 19 batches submitted to accumulate stats from 1216 documents (414153 virtual) 2025-04-19 00:09:20,818 : INFO : 20 batches submitted to accumulate stats from 1280 documents (435684 
virtual) 2025-04-19 00:09:20,914 : INFO : 21 batches submitted to accumulate stats from 1344 documents (459433 virtual) 2025-04-19 00:09:20,922 : INFO : 22 batches submitted to accumulate stats from 1408 documents (483210 virtual) 2025-04-19 00:09:20,938 : INFO : 23 batches submitted to accumulate stats from 1472 documents (507391 virtual) 2025-04-19 00:09:20,958 : INFO : 24 batches submitted to accumulate stats from 1536 documents (527404 virtual) 2025-04-19 00:09:20,986 : INFO : 25 batches submitted to accumulate stats from 1600 documents (550178 virtual) 2025-04-19 00:09:20,993 : INFO : 26 batches submitted to accumulate stats from 1664 documents (575041 virtual) 2025-04-19 00:09:21,027 : INFO : 27 batches submitted to accumulate stats from 1728 documents (598912 virtual) 2025-04-19 00:09:21,060 : INFO : 28 batches submitted to accumulate stats from 1792 documents (622487 virtual) 2025-04-19 00:09:21,079 : INFO : 29 batches submitted to accumulate stats from 1856 documents (648902 virtual) 2025-04-19 00:09:21,093 : INFO : 30 batches submitted to accumulate stats from 1920 documents (671126 virtual) 2025-04-19 00:09:21,098 : INFO : 31 batches submitted to accumulate stats from 1984 documents (693717 virtual) 2025-04-19 00:09:21,146 : INFO : 32 batches submitted to accumulate stats from 2048 documents (714139 virtual) 2025-04-19 00:09:21,167 : INFO : 33 batches submitted to accumulate stats from 2112 documents (736202 virtual) 2025-04-19 00:09:21,179 : INFO : 34 batches submitted to accumulate stats from 2176 documents (758687 virtual) 2025-04-19 00:09:21,243 : INFO : 35 batches submitted to accumulate stats from 2240 documents (779677 virtual) 2025-04-19 00:09:21,247 : INFO : 36 batches submitted to accumulate stats from 2304 documents (800483 virtual) 2025-04-19 00:09:21,253 : INFO : 37 batches submitted to accumulate stats from 2368 documents (821258 virtual) 2025-04-19 00:09:21,273 : INFO : 38 batches submitted to accumulate stats from 2432 documents (844326 
virtual) 2025-04-19 00:09:21,326 : INFO : 39 batches submitted to accumulate stats from 2496 documents (868823 virtual) 2025-04-19 00:09:21,340 : INFO : 40 batches submitted to accumulate stats from 2560 documents (888215 virtual) 2025-04-19 00:09:21,374 : INFO : 41 batches submitted to accumulate stats from 2624 documents (910499 virtual) 2025-04-19 00:09:21,396 : INFO : 42 batches submitted to accumulate stats from 2688 documents (931945 virtual) 2025-04-19 00:09:21,437 : INFO : 43 batches submitted to accumulate stats from 2752 documents (954111 virtual) 2025-04-19 00:09:21,476 : INFO : 44 batches submitted to accumulate stats from 2816 documents (975617 virtual) 2025-04-19 00:09:21,481 : INFO : 45 batches submitted to accumulate stats from 2880 documents (995125 virtual) 2025-04-19 00:09:21,526 : INFO : 46 batches submitted to accumulate stats from 2944 documents (1016531 virtual) 2025-04-19 00:09:21,580 : INFO : 47 batches submitted to accumulate stats from 3008 documents (1038247 virtual) 2025-04-19 00:09:21,586 : INFO : 48 batches submitted to accumulate stats from 3072 documents (1063862 virtual) 2025-04-19 00:09:21,594 : INFO : 49 batches submitted to accumulate stats from 3136 documents (1087898 virtual) 2025-04-19 00:09:21,610 : INFO : 50 batches submitted to accumulate stats from 3200 documents (1110531 virtual) 2025-04-19 00:09:21,617 : INFO : 51 batches submitted to accumulate stats from 3264 documents (1133127 virtual) 2025-04-19 00:09:21,650 : INFO : 52 batches submitted to accumulate stats from 3328 documents (1153766 virtual) 2025-04-19 00:09:21,691 : INFO : 53 batches submitted to accumulate stats from 3392 documents (1177684 virtual) 2025-04-19 00:09:21,742 : INFO : 54 batches submitted to accumulate stats from 3456 documents (1200190 virtual) 2025-04-19 00:09:21,750 : INFO : 55 batches submitted to accumulate stats from 3520 documents (1225029 virtual) 2025-04-19 00:09:21,754 : INFO : 56 batches submitted to accumulate stats from 3584 documents 
(1249662 virtual) 2025-04-19 00:09:21,771 : INFO : 57 batches submitted to accumulate stats from 3648 documents (1274547 virtual) 2025-04-19 00:09:21,776 : INFO : 58 batches submitted to accumulate stats from 3712 documents (1297434 virtual) 2025-04-19 00:09:21,781 : INFO : 59 batches submitted to accumulate stats from 3776 documents (1319261 virtual) 2025-04-19 00:09:21,864 : INFO : 60 batches submitted to accumulate stats from 3840 documents (1341972 virtual) 2025-04-19 00:09:21,891 : INFO : 61 batches submitted to accumulate stats from 3904 documents (1364269 virtual) 2025-04-19 00:09:21,897 : INFO : 62 batches submitted to accumulate stats from 3968 documents (1386796 virtual) 2025-04-19 00:09:21,942 : INFO : 63 batches submitted to accumulate stats from 4032 documents (1410249 virtual) 2025-04-19 00:09:21,958 : INFO : 64 batches submitted to accumulate stats from 4096 documents (1433115 virtual) 2025-04-19 00:09:21,962 : INFO : 65 batches submitted to accumulate stats from 4160 documents (1453873 virtual) 2025-04-19 00:09:21,978 : INFO : 66 batches submitted to accumulate stats from 4224 documents (1475474 virtual) 2025-04-19 00:09:22,055 : INFO : 67 batches submitted to accumulate stats from 4288 documents (1497524 virtual) 2025-04-19 00:09:22,096 : INFO : 68 batches submitted to accumulate stats from 4352 documents (1516835 virtual) 2025-04-19 00:09:22,112 : INFO : 69 batches submitted to accumulate stats from 4416 documents (1536986 virtual) 2025-04-19 00:09:22,128 : INFO : 70 batches submitted to accumulate stats from 4480 documents (1558454 virtual) 2025-04-19 00:09:22,134 : INFO : 71 batches submitted to accumulate stats from 4544 documents (1580610 virtual) 2025-04-19 00:09:22,141 : INFO : 72 batches submitted to accumulate stats from 4608 documents (1603508 virtual) 2025-04-19 00:09:22,146 : INFO : 73 batches submitted to accumulate stats from 4672 documents (1624378 virtual) 2025-04-19 00:09:22,209 : INFO : 74 batches submitted to accumulate stats 
from 4736 documents (1646402 virtual) 2025-04-19 00:09:22,252 : INFO : 75 batches submitted to accumulate stats from 4800 documents (1668704 virtual) 2025-04-19 00:09:22,263 : INFO : 76 batches submitted to accumulate stats from 4864 documents (1690394 virtual) 2025-04-19 00:09:22,296 : INFO : 77 batches submitted to accumulate stats from 4928 documents (1713028 virtual) 2025-04-19 00:09:22,308 : INFO : 78 batches submitted to accumulate stats from 4992 documents (1735434 virtual) 2025-04-19 00:09:22,315 : INFO : 79 batches submitted to accumulate stats from 5056 documents (1755430 virtual) 2025-04-19 00:09:22,322 : INFO : 80 batches submitted to accumulate stats from 5120 documents (1779164 virtual) 2025-04-19 00:09:22,350 : INFO : 81 batches submitted to accumulate stats from 5184 documents (1799023 virtual) 2025-04-19 00:09:22,396 : INFO : 82 batches submitted to accumulate stats from 5248 documents (1821516 virtual) 2025-04-19 00:09:22,416 : INFO : 83 batches submitted to accumulate stats from 5312 documents (1844224 virtual) 2025-04-19 00:09:22,442 : INFO : 84 batches submitted to accumulate stats from 5376 documents (1864739 virtual) 2025-04-19 00:09:22,456 : INFO : 85 batches submitted to accumulate stats from 5440 documents (1885053 virtual) 2025-04-19 00:09:22,513 : INFO : 86 batches submitted to accumulate stats from 5504 documents (1902170 virtual) 2025-04-19 00:09:22,522 : INFO : 87 batches submitted to accumulate stats from 5568 documents (1924910 virtual) 2025-04-19 00:09:22,556 : INFO : 88 batches submitted to accumulate stats from 5632 documents (1931530 virtual) 2025-04-19 00:09:22,584 : INFO : 89 batches submitted to accumulate stats from 5696 documents (1941414 virtual) 2025-04-19 00:09:22,591 : INFO : 90 batches submitted to accumulate stats from 5760 documents (1950642 virtual) 2025-04-19 00:09:22,632 : INFO : 91 batches submitted to accumulate stats from 5824 documents (1957200 virtual) 2025-04-19 00:09:22,658 : INFO : 92 batches submitted to 
accumulate stats from 5888 documents (1964937 virtual)
...
2025-04-19 00:09:24,288 : INFO : 255 batches submitted to accumulate stats from 16320 documents (3450914 virtual)
2025-04-19 00:09:24,448 : INFO : 7 accumulators retrieved from output queue
2025-04-19 00:09:24,458 : INFO : accumulated word occurrence stats for 3451622 virtual documents
2025-04-19 00:09:24,535 : INFO : using symmetric alpha at 0.16666666666666666
2025-04-19 00:09:24,536 : INFO : using symmetric eta at 0.16666666666666666
2025-04-19 00:09:24,539 : INFO : using serial LDA version on this node
2025-04-19 00:09:24,545 : INFO : running online (multi-pass) LDA training, 6 topics, 5 passes over the supplied corpus of 16310 documents, updating model once every 2000 documents, evaluating perplexity every 16310 documents, iterating 50x with a convergence threshold of 0.001000
...
2025-04-19 00:09:27,979 : INFO : -8.586 per-word bound, 384.3 perplexity estimate based on a held-out corpus of 310 documents with 43091 words
2025-04-19 00:09:27,979 : INFO : PROGRESS: pass 0, at document #16310/16310
2025-04-19 00:09:28,020 : INFO : merging changes from 310 documents into a model of 16310 documents
2025-04-19 00:09:28,023 : INFO : topic #3 (0.167): 0.031*"半導體" + 0.023*"表示" + 0.019*"中國" + 0.017*"製程" + 0.014*"川普" + 0.013*"研發" + 0.011*"投資" + 0.009*"輝達" + 0.007*"公司" + 0.007*"魏哲家"
2025-04-19 00:09:28,023 : INFO : topic #5 (0.167): 0.012*"公司" + 0.007*"美國" + 0.007*"台灣" + 0.006*"技術" + 0.005*"晶片" + 0.005*"員工" + 0.005*"工作" + 0.004*"科技" + 0.004*"問題" + 0.004*"工程師"
2025-04-19 00:09:28,024 : INFO : topic #4 (0.167): 0.035*"工作" + 0.015*"推定" + 0.014*"空白" + 0.011*"第一項" + 0.010*"聯絡" + 0.010*"情形" + 0.010*"砍除" + 0.010*"方式" + 0.010*"國定假日" + 0.009*"單位"
2025-04-19 00:09:28,024 : INFO : topic #1 (0.167): 0.027*"工作" + 0.014*"方式" + 0.012*"砍除" + 0.011*"情形" + 0.011*"推定" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"國定假日" + 0.010*"連結" + 0.010*"文字"
2025-04-19 00:09:28,025 : INFO : topic #2 (0.167): 0.037*"工作" + 0.011*"覺得" + 0.011*"公司" + 0.008*"真的" + 0.007*"面試" + 0.007*"小時" + 0.007*"時間" + 0.007*"應該" + 0.006*"方式" + 0.006*"預期"
2025-04-19 00:09:28,025 : INFO : topic diff=0.402306, rho=0.333333
...
2025-04-19 00:09:31,295 : INFO : -8.394 per-word bound, 336.4 perplexity estimate based on a held-out corpus of 310 documents with 43091 words
2025-04-19 00:09:31,296 : INFO : PROGRESS: pass 1, at document #16310/16310
2025-04-19 00:09:31,333 : INFO : merging changes from 310 documents into a model of 16310 documents
2025-04-19 00:09:31,336 : INFO : topic #2 (0.167): 0.049*"工作" + 0.015*"面試" + 0.014*"時間" + 0.010*"公司" + 0.010*"小時" + 0.008*"方式" + 0.008*"覺得" + 0.007*"工時" + 0.007*"內容" + 0.007*"經驗"
2025-04-19 00:09:31,337 : INFO : topic #4 (0.167): 0.035*"工作" + 0.015*"推定" + 0.014*"空白" + 0.012*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"內容" + 0.011*"聯絡"
2025-04-19 00:09:31,337 : INFO : topic #5 (0.167): 0.013*"公司" + 0.006*"技術" + 0.005*"台灣" + 0.005*"員工" + 0.004*"科技" + 0.004*"問題" + 0.004*"工程師" + 0.004*"美國" + 0.004*"報導" + 0.004*"目前"
2025-04-19 00:09:31,338 : INFO : topic #0 (0.167): 0.028*"工作" + 0.016*"方式" + 0.010*"應徵" + 0.010*"工資" + 0.010*"內容" + 0.010*"依法" + 0.009*"推定" + 0.009*"單位" + 0.009*"聯絡" + 0.008*"國定假日"
2025-04-19 00:09:31,338 : INFO : topic #1 (0.167): 0.028*"工作" + 0.016*"方式" + 0.012*"聯絡" + 0.012*"內容" + 0.011*"砍除" + 0.011*"情形" + 0.011*"推定" + 0.011*"文字" + 0.010*"單位" + 0.010*"資訊"
2025-04-19 00:09:31,338 : INFO : topic diff=0.311995, rho=0.313805
...
2025-04-19 00:09:31,972 : INFO : PROGRESS: pass 2, at document #4000/16310
2025-04-19 00:09:32,552 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:09:32,555 : INFO : topic #2 (0.167): 0.052*"工作" + 0.018*"時間" + 0.014*"面試" + 0.012*"小時" + 0.010*"方式" + 0.008*"內容" + 0.007*"工時" + 0.007*"經驗" + 0.007*"公司" + 0.006*"需要"
2025-04-19 00:09:32,556 : INFO : topic #4 (0.167): 0.034*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.012*"第一項" + 0.012*"情形" + 0.011*"應徵" + 0.011*"資訊" + 0.011*"國定假日"
2025-04-19 00:09:32,556 : INFO : topic #3 (0.167): 0.018*"美國" + 0.017*"晶片" + 0.016*"表示" + 0.015*"台積電" +
0.015*"半導體" + 0.015*"中國" + 0.012*"投資" + 0.011*"台積" + 0.011*"台灣" + 0.008*"製程" 2025-04-19 00:09:32,557 : INFO : topic #1 (0.167): 0.025*"工作" + 0.014*"方式" + 0.012*"聯絡" + 0.011*"內容" + 0.010*"文字" + 0.010*"情形" + 0.010*"資訊" + 0.009*"砍除" + 0.009*"第一項" + 0.009*"推定" 2025-04-19 00:09:32,558 : INFO : topic #5 (0.167): 0.013*"公司" + 0.006*"技術" + 0.005*"台灣" + 0.005*"員工" + 0.004*"問題" + 0.004*"科技" + 0.004*"資料" + 0.004*"工程師" + 0.004*"目前" + 0.003*"美國" 2025-04-19 00:09:32,558 : INFO : topic diff=0.335901, rho=0.299409 2025-04-19 00:09:32,558 : INFO : PROGRESS: pass 2, at document #6000/16310 2025-04-19 00:09:33,058 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:33,061 : INFO : topic #4 (0.167): 0.034*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.012*"第一項" + 0.012*"情形" + 0.011*"資訊" + 0.011*"應徵" + 0.011*"國定假日" 2025-04-19 00:09:33,061 : INFO : topic #1 (0.167): 0.023*"工作" + 0.013*"方式" + 0.013*"聯絡" + 0.012*"內容" + 0.010*"資訊" + 0.010*"文字" + 0.009*"電話" + 0.009*"情形" + 0.009*"聯絡人" + 0.009*"砍除" 2025-04-19 00:09:33,062 : INFO : topic #2 (0.167): 0.051*"工作" + 0.019*"時間" + 0.014*"面試" + 0.012*"小時" + 0.011*"方式" + 0.009*"經驗" + 0.009*"內容" + 0.007*"工時" + 0.007*"公司" + 0.006*"需要" 2025-04-19 00:09:33,062 : INFO : topic #0 (0.167): 0.036*"工作" + 0.025*"方式" + 0.015*"工資" + 0.015*"依法" + 0.015*"推定" + 0.012*"每日" + 0.012*"未註明" + 0.012*"單位" + 0.012*"小時" + 0.012*"休息" 2025-04-19 00:09:33,063 : INFO : topic #3 (0.167): 0.016*"美國" + 0.015*"晶片" + 0.015*"表示" + 0.013*"中國" + 0.013*"半導體" + 0.013*"台積電" + 0.011*"投資" + 0.010*"台積" + 0.010*"台灣" + 0.007*"製程" 2025-04-19 00:09:33,064 : INFO : topic diff=0.222103, rho=0.299409 2025-04-19 00:09:33,064 : INFO : PROGRESS: pass 2, at document #8000/16310 2025-04-19 00:09:33,353 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:33,356 : INFO : topic #4 (0.167): 0.034*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.012*"第一項" + 0.012*"情形" + 0.011*"資訊" + 0.011*"應徵" + 
0.011*"國定假日" 2025-04-19 00:09:33,357 : INFO : topic #2 (0.167): 0.053*"工作" + 0.019*"面試" + 0.019*"時間" + 0.012*"經驗" + 0.011*"小時" + 0.011*"方式" + 0.009*"公司" + 0.009*"內容" + 0.007*"工時" + 0.006*"職缺" 2025-04-19 00:09:33,357 : INFO : topic #3 (0.167): 0.015*"美國" + 0.014*"晶片" + 0.014*"表示" + 0.014*"半導體" + 0.014*"中國" + 0.013*"台積電" + 0.010*"台灣" + 0.010*"投資" + 0.010*"台積" + 0.007*"製程" 2025-04-19 00:09:33,358 : INFO : topic #1 (0.167): 0.023*"工作" + 0.013*"方式" + 0.013*"聯絡" + 0.012*"內容" + 0.010*"資訊" + 0.010*"文字" + 0.009*"電話" + 0.009*"情形" + 0.009*"聯絡人" + 0.009*"砍除" 2025-04-19 00:09:33,358 : INFO : topic #0 (0.167): 0.036*"工作" + 0.025*"方式" + 0.015*"工資" + 0.015*"依法" + 0.015*"推定" + 0.012*"每日" + 0.012*"單位" + 0.012*"未註明" + 0.012*"小時" + 0.012*"休息" 2025-04-19 00:09:33,358 : INFO : topic diff=0.335649, rho=0.299409 2025-04-19 00:09:33,359 : INFO : PROGRESS: pass 2, at document #10000/16310 2025-04-19 00:09:33,614 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:33,617 : INFO : topic #0 (0.167): 0.036*"工作" + 0.025*"方式" + 0.015*"工資" + 0.015*"依法" + 0.015*"推定" + 0.012*"每日" + 0.012*"單位" + 0.012*"未註明" + 0.012*"小時" + 0.011*"休息" 2025-04-19 00:09:33,618 : INFO : topic #3 (0.167): 0.014*"美國" + 0.014*"表示" + 0.014*"半導體" + 0.013*"中國" + 0.013*"晶片" + 0.011*"台積電" + 0.011*"台灣" + 0.010*"投資" + 0.009*"台積" + 0.008*"問卷" 2025-04-19 00:09:33,618 : INFO : topic #4 (0.167): 0.034*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.012*"第一項" + 0.012*"情形" + 0.011*"資訊" + 0.011*"應徵" + 0.011*"國定假日" 2025-04-19 00:09:33,619 : INFO : topic #5 (0.167): 0.016*"公司" + 0.007*"問題" + 0.006*"工程師" + 0.006*"開發" + 0.006*"技術" + 0.005*"目前" + 0.004*"產品" + 0.004*"使用" + 0.004*"比較" + 0.004*"知道" 2025-04-19 00:09:33,619 : INFO : topic #2 (0.167): 0.053*"工作" + 0.021*"面試" + 0.019*"時間" + 0.013*"經驗" + 0.011*"方式" + 0.011*"小時" + 0.010*"公司" + 0.009*"內容" + 0.008*"職缺" + 0.007*"工時" 2025-04-19 00:09:33,619 : INFO : topic diff=0.270297, rho=0.299409 2025-04-19 00:09:33,620 : INFO : PROGRESS: pass 2, at 
document #12000/16310 2025-04-19 00:09:33,880 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:33,883 : INFO : topic #0 (0.167): 0.035*"工作" + 0.025*"方式" + 0.015*"工資" + 0.015*"依法" + 0.014*"推定" + 0.012*"單位" + 0.012*"每日" + 0.011*"未註明" + 0.011*"小時" + 0.011*"內容" 2025-04-19 00:09:33,883 : INFO : topic #3 (0.167): 0.017*"晶片" + 0.017*"半導體" + 0.014*"表示" + 0.013*"台積電" + 0.012*"台灣" + 0.012*"美國" + 0.011*"中國" + 0.010*"台積" + 0.008*"製程" + 0.008*"投資" 2025-04-19 00:09:33,884 : INFO : topic #4 (0.167): 0.034*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.012*"第一項" + 0.012*"情形" + 0.011*"資訊" + 0.011*"應徵" + 0.011*"國定假日" 2025-04-19 00:09:33,884 : INFO : topic #2 (0.167): 0.053*"工作" + 0.021*"面試" + 0.019*"時間" + 0.013*"經驗" + 0.011*"公司" + 0.010*"方式" + 0.010*"小時" + 0.009*"內容" + 0.008*"職缺" + 0.007*"薪資" 2025-04-19 00:09:33,885 : INFO : topic #1 (0.167): 0.023*"工作" + 0.013*"方式" + 0.012*"聯絡" + 0.011*"內容" + 0.010*"資訊" + 0.010*"文字" + 0.009*"電話" + 0.009*"情形" + 0.009*"聯絡人" + 0.009*"報名" 2025-04-19 00:09:33,885 : INFO : topic diff=0.331999, rho=0.299409 2025-04-19 00:09:33,886 : INFO : PROGRESS: pass 2, at document #14000/16310 2025-04-19 00:09:34,105 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:34,108 : INFO : topic #1 (0.167): 0.022*"工作" + 0.013*"方式" + 0.012*"聯絡" + 0.011*"內容" + 0.010*"資訊" + 0.010*"文字" + 0.009*"電話" + 0.009*"情形" + 0.009*"報名" + 0.009*"聯絡人" 2025-04-19 00:09:34,109 : INFO : topic #5 (0.167): 0.014*"公司" + 0.006*"問題" + 0.005*"技術" + 0.005*"工程師" + 0.004*"目前" + 0.004*"台灣" + 0.004*"開發" + 0.004*"員工" + 0.004*"工作" + 0.004*"產品" 2025-04-19 00:09:34,109 : INFO : topic #2 (0.167): 0.053*"工作" + 0.020*"面試" + 0.018*"時間" + 0.012*"經驗" + 0.011*"公司" + 0.010*"小時" + 0.009*"方式" + 0.009*"內容" + 0.008*"薪資" + 0.007*"職缺" 2025-04-19 00:09:34,110 : INFO : topic #0 (0.167): 0.035*"工作" + 0.025*"方式" + 0.015*"工資" + 0.015*"依法" + 0.014*"推定" + 0.012*"單位" + 0.012*"每日" + 0.011*"小時" + 0.011*"未註明" + 0.011*"內容" 2025-04-19 
00:09:34,110 : INFO : topic #3 (0.167): 0.017*"晶片" + 0.015*"半導體" + 0.015*"台灣" + 0.014*"表示" + 0.013*"台積電" + 0.013*"中國" + 0.013*"美國" + 0.010*"台積" + 0.008*"英特爾" + 0.007*"全球" 2025-04-19 00:09:34,111 : INFO : topic diff=0.320091, rho=0.299409 2025-04-19 00:09:34,111 : INFO : PROGRESS: pass 2, at document #16000/16310 2025-04-19 00:09:34,318 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:34,321 : INFO : topic #1 (0.167): 0.022*"工作" + 0.013*"方式" + 0.012*"聯絡" + 0.011*"內容" + 0.010*"資訊" + 0.010*"文字" + 0.009*"報名" + 0.009*"電話" + 0.009*"情形" + 0.009*"聯絡人" 2025-04-19 00:09:34,321 : INFO : topic #3 (0.167): 0.018*"晶片" + 0.016*"美國" + 0.014*"表示" + 0.014*"半導體" + 0.014*"台灣" + 0.013*"台積電" + 0.013*"中國" + 0.010*"台積" + 0.009*"英特爾" + 0.007*"積電" 2025-04-19 00:09:34,322 : INFO : topic #5 (0.167): 0.014*"公司" + 0.006*"技術" + 0.005*"問題" + 0.005*"工程師" + 0.004*"員工" + 0.004*"台灣" + 0.004*"目前" + 0.004*"產品" + 0.003*"工作" + 0.003*"開發" 2025-04-19 00:09:34,322 : INFO : topic #4 (0.167): 0.034*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"情形" + 0.012*"方式" + 0.012*"第一項" + 0.011*"資訊" + 0.011*"應徵" + 0.011*"國定假日" 2025-04-19 00:09:34,323 : INFO : topic #0 (0.167): 0.033*"工作" + 0.024*"方式" + 0.015*"工資" + 0.014*"依法" + 0.013*"推定" + 0.012*"單位" + 0.011*"每日" + 0.011*"小時" + 0.011*"未註明" + 0.011*"內容" 2025-04-19 00:09:34,323 : INFO : topic diff=0.269258, rho=0.299409 2025-04-19 00:09:34,425 : INFO : -8.344 per-word bound, 325.0 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:09:34,426 : INFO : PROGRESS: pass 2, at document #16310/16310 2025-04-19 00:09:34,461 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:09:34,464 : INFO : topic #0 (0.167): 0.033*"工作" + 0.023*"方式" + 0.015*"工資" + 0.014*"依法" + 0.013*"推定" + 0.012*"單位" + 0.011*"每日" + 0.011*"小時" + 0.010*"未註明" + 0.010*"內容" 2025-04-19 00:09:34,465 : INFO : topic #5 (0.167): 0.015*"公司" + 0.007*"技術" + 0.005*"問題" + 0.005*"員工" + 0.005*"工程師" 
+ 0.004*"科技" + 0.004*"台灣" + 0.004*"目前" + 0.004*"現在" + 0.004*"知道" 2025-04-19 00:09:34,465 : INFO : topic #3 (0.167): 0.021*"美國" + 0.018*"晶片" + 0.015*"台灣" + 0.014*"台積電" + 0.013*"表示" + 0.012*"中國" + 0.012*"台積" + 0.012*"半導體" + 0.011*"投資" + 0.009*"英特爾" 2025-04-19 00:09:34,466 : INFO : topic #4 (0.167): 0.034*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"情形" + 0.012*"方式" + 0.012*"第一項" + 0.011*"資訊" + 0.011*"國定假日" + 0.011*"應徵" 2025-04-19 00:09:34,466 : INFO : topic #2 (0.167): 0.049*"工作" + 0.018*"面試" + 0.016*"時間" + 0.011*"公司" + 0.010*"經驗" + 0.010*"小時" + 0.008*"內容" + 0.008*"工時" + 0.008*"薪資" + 0.007*"方式" 2025-04-19 00:09:34,467 : INFO : topic diff=0.279560, rho=0.299409 2025-04-19 00:09:34,467 : INFO : PROGRESS: pass 3, at document #2000/16310 2025-04-19 00:09:34,992 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:34,995 : INFO : topic #5 (0.167): 0.014*"公司" + 0.006*"技術" + 0.005*"問題" + 0.005*"員工" + 0.005*"工程師" + 0.004*"科技" + 0.004*"台灣" + 0.004*"目前" + 0.004*"現在" + 0.003*"知道" 2025-04-19 00:09:34,996 : INFO : topic #2 (0.167): 0.050*"工作" + 0.017*"時間" + 0.017*"面試" + 0.010*"小時" + 0.010*"經驗" + 0.010*"公司" + 0.009*"內容" + 0.008*"方式" + 0.008*"工時" + 0.006*"薪資" 2025-04-19 00:09:34,996 : INFO : topic #0 (0.167): 0.037*"工作" + 0.026*"方式" + 0.017*"工資" + 0.016*"依法" + 0.015*"推定" + 0.013*"每日" + 0.013*"單位" + 0.013*"小時" + 0.013*"未註明" + 0.012*"休息" 2025-04-19 00:09:34,997 : INFO : topic #3 (0.167): 0.021*"美國" + 0.017*"晶片" + 0.015*"台灣" + 0.014*"台積電" + 0.013*"表示" + 0.012*"中國" + 0.012*"台積" + 0.012*"半導體" + 0.011*"投資" + 0.009*"英特爾" 2025-04-19 00:09:34,998 : INFO : topic #4 (0.167): 0.034*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"應徵" + 0.011*"資訊" + 0.011*"國定假日" 2025-04-19 00:09:34,999 : INFO : topic diff=0.724106, rho=0.286829 2025-04-19 00:09:34,999 : INFO : PROGRESS: pass 3, at document #4000/16310 2025-04-19 00:09:35,538 : INFO : merging changes from 2000 documents into a model of 16310 documents 
2025-04-19 00:09:35,541 : INFO : topic #0 (0.167): 0.038*"工作" + 0.026*"方式" + 0.017*"工資" + 0.016*"推定" + 0.015*"依法" + 0.014*"小時" + 0.014*"單位" + 0.013*"未註明" + 0.013*"每日" + 0.012*"休息" 2025-04-19 00:09:35,541 : INFO : topic #3 (0.167): 0.020*"美國" + 0.017*"晶片" + 0.015*"台灣" + 0.014*"台積電" + 0.013*"表示" + 0.013*"中國" + 0.012*"台積" + 0.012*"半導體" + 0.011*"投資" + 0.009*"英特爾" 2025-04-19 00:09:35,542 : INFO : topic #2 (0.167): 0.049*"工作" + 0.018*"時間" + 0.016*"面試" + 0.011*"小時" + 0.010*"經驗" + 0.009*"內容" + 0.008*"公司" + 0.008*"方式" + 0.007*"工時" + 0.007*"需要" 2025-04-19 00:09:35,542 : INFO : topic #1 (0.167): 0.014*"報名" + 0.014*"電話" + 0.013*"工作" + 0.013*"活動" + 0.012*"聯絡" + 0.011*"方式" + 0.011*"內容" + 0.011*"單位名稱" + 0.010*"單位地址" + 0.010*"台北市" 2025-04-19 00:09:35,543 : INFO : topic #4 (0.167): 0.034*"工作" + 0.014*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"第一項" + 0.012*"情形" + 0.012*"方式" + 0.011*"資訊" + 0.011*"應徵" + 0.011*"國定假日" 2025-04-19 00:09:35,543 : INFO : topic diff=0.293081, rho=0.286829 2025-04-19 00:09:35,544 : INFO : PROGRESS: pass 3, at document #6000/16310 2025-04-19 00:09:35,963 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:35,966 : INFO : topic #5 (0.167): 0.015*"公司" + 0.006*"技術" + 0.005*"問題" + 0.005*"工程師" + 0.005*"產品" + 0.004*"員工" + 0.004*"目前" + 0.004*"台灣" + 0.004*"知道" + 0.004*"科技" 2025-04-19 00:09:35,967 : INFO : topic #1 (0.167): 0.019*"報名" + 0.016*"活動" + 0.016*"電話" + 0.012*"聯絡" + 0.011*"台北市" + 0.011*"內容" + 0.010*"方式" + 0.010*"單位名稱" + 0.010*"工作" + 0.010*"單位地址" 2025-04-19 00:09:35,967 : INFO : topic #2 (0.167): 0.048*"工作" + 0.018*"時間" + 0.015*"面試" + 0.011*"經驗" + 0.010*"小時" + 0.009*"公司" + 0.009*"內容" + 0.008*"方式" + 0.007*"工時" + 0.007*"需要" 2025-04-19 00:09:35,968 : INFO : topic #4 (0.167): 0.033*"工作" + 0.014*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"第一項" + 0.012*"情形" + 0.012*"方式" + 0.011*"資訊" + 0.011*"應徵" + 0.011*"聯絡" 2025-04-19 00:09:35,968 : INFO : topic #3 (0.167): 0.019*"美國" + 0.016*"晶片" + 0.014*"台灣" + 0.013*"台積電" + 0.013*"表示" + 
0.012*"中國" + 0.011*"台積" + 0.011*"半導體" + 0.010*"投資" + 0.009*"英特爾" 2025-04-19 00:09:35,969 : INFO : topic diff=0.205952, rho=0.286829 2025-04-19 00:09:35,969 : INFO : PROGRESS: pass 3, at document #8000/16310 2025-04-19 00:09:36,254 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:36,256 : INFO : topic #2 (0.167): 0.050*"工作" + 0.020*"面試" + 0.019*"時間" + 0.014*"經驗" + 0.011*"公司" + 0.010*"小時" + 0.009*"方式" + 0.009*"內容" + 0.008*"職缺" + 0.007*"薪資" 2025-04-19 00:09:36,257 : INFO : topic #0 (0.167): 0.039*"工作" + 0.027*"方式" + 0.017*"工資" + 0.016*"推定" + 0.016*"依法" + 0.014*"小時" + 0.014*"每日" + 0.013*"單位" + 0.013*"未註明" + 0.012*"休息" 2025-04-19 00:09:36,258 : INFO : topic #1 (0.167): 0.019*"報名" + 0.016*"活動" + 0.016*"電話" + 0.012*"聯絡" + 0.011*"台北市" + 0.011*"內容" + 0.010*"方式" + 0.010*"單位名稱" + 0.010*"工作" + 0.009*"單位地址" 2025-04-19 00:09:36,258 : INFO : topic #4 (0.167): 0.033*"工作" + 0.014*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"第一項" + 0.012*"情形" + 0.012*"方式" + 0.011*"資訊" + 0.011*"應徵" + 0.011*"聯絡" 2025-04-19 00:09:36,259 : INFO : topic #3 (0.167): 0.019*"美國" + 0.015*"晶片" + 0.015*"台灣" + 0.013*"台積電" + 0.012*"中國" + 0.012*"表示" + 0.011*"半導體" + 0.011*"台積" + 0.010*"投資" + 0.008*"英特爾" 2025-04-19 00:09:36,259 : INFO : topic diff=0.317737, rho=0.286829 2025-04-19 00:09:36,259 : INFO : PROGRESS: pass 3, at document #10000/16310 2025-04-19 00:09:36,503 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:36,506 : INFO : topic #1 (0.167): 0.020*"報名" + 0.017*"活動" + 0.016*"電話" + 0.012*"聯絡" + 0.011*"台北市" + 0.011*"內容" + 0.010*"方式" + 0.009*"工作" + 0.009*"單位名稱" + 0.009*"單位地址" 2025-04-19 00:09:36,506 : INFO : topic #0 (0.167): 0.039*"工作" + 0.027*"方式" + 0.016*"工資" + 0.016*"推定" + 0.015*"依法" + 0.014*"小時" + 0.013*"每日" + 0.013*"單位" + 0.013*"未註明" + 0.012*"休息" 2025-04-19 00:09:36,507 : INFO : topic #3 (0.167): 0.018*"美國" + 0.015*"台灣" + 0.014*"晶片" + 0.012*"中國" + 0.012*"表示" + 0.012*"台積電" + 0.012*"半導體" + 0.010*"台積" + 0.010*"投資" + 
0.007*"英特爾" 2025-04-19 00:09:36,507 : INFO : topic #5 (0.167): 0.016*"公司" + 0.007*"問題" + 0.006*"工程師" + 0.006*"技術" + 0.006*"開發" + 0.005*"目前" + 0.005*"產品" + 0.005*"比較" + 0.004*"覺得" + 0.004*"知道" 2025-04-19 00:09:36,507 : INFO : topic #2 (0.167): 0.049*"工作" + 0.021*"面試" + 0.019*"時間" + 0.015*"經驗" + 0.012*"公司" + 0.010*"小時" + 0.010*"內容" + 0.009*"職缺" + 0.009*"方式" + 0.009*"薪資" 2025-04-19 00:09:36,508 : INFO : topic diff=0.252901, rho=0.286829 2025-04-19 00:09:36,509 : INFO : PROGRESS: pass 3, at document #12000/16310 2025-04-19 00:09:36,776 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:36,778 : INFO : topic #3 (0.167): 0.016*"晶片" + 0.015*"台灣" + 0.014*"半導體" + 0.014*"美國" + 0.012*"表示" + 0.012*"台積電" + 0.010*"中國" + 0.010*"台積" + 0.008*"投資" + 0.007*"全球" 2025-04-19 00:09:36,779 : INFO : topic #0 (0.167): 0.038*"工作" + 0.027*"方式" + 0.016*"工資" + 0.015*"推定" + 0.015*"依法" + 0.014*"小時" + 0.014*"單位" + 0.013*"每日" + 0.013*"未註明" + 0.012*"休息" 2025-04-19 00:09:36,779 : INFO : topic #2 (0.167): 0.049*"工作" + 0.021*"面試" + 0.018*"時間" + 0.014*"經驗" + 0.012*"公司" + 0.010*"小時" + 0.009*"內容" + 0.009*"薪資" + 0.009*"職缺" + 0.009*"方式" 2025-04-19 00:09:36,780 : INFO : topic #4 (0.167): 0.033*"工作" + 0.014*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"第一項" + 0.012*"情形" + 0.012*"方式" + 0.011*"資訊" + 0.011*"應徵" + 0.011*"聯絡" 2025-04-19 00:09:36,780 : INFO : topic #1 (0.167): 0.021*"報名" + 0.019*"活動" + 0.015*"電話" + 0.012*"聯絡" + 0.011*"台北市" + 0.011*"內容" + 0.011*"方式" + 0.009*"工作" + 0.009*"人數" + 0.009*"時間" 2025-04-19 00:09:36,781 : INFO : topic diff=0.301646, rho=0.286829 2025-04-19 00:09:36,781 : INFO : PROGRESS: pass 3, at document #14000/16310 2025-04-19 00:09:36,995 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:36,998 : INFO : topic #2 (0.167): 0.049*"工作" + 0.020*"面試" + 0.017*"時間" + 0.014*"經驗" + 0.012*"公司" + 0.010*"薪資" + 0.009*"內容" + 0.009*"小時" + 0.009*"職缺" + 0.008*"方式" 2025-04-19 00:09:36,999 : INFO : topic #3 (0.167): 0.016*"台灣" 
+ 0.015*"晶片" + 0.013*"美國" + 0.013*"表示" + 0.013*"半導體" + 0.012*"台積電" + 0.011*"中國" + 0.010*"台積" + 0.008*"產業" + 0.007*"全球" 2025-04-19 00:09:36,999 : INFO : topic #0 (0.167): 0.038*"工作" + 0.026*"方式" + 0.016*"工資" + 0.015*"依法" + 0.015*"推定" + 0.014*"單位" + 0.014*"小時" + 0.013*"每日" + 0.012*"未註明" + 0.012*"休息" 2025-04-19 00:09:37,000 : INFO : topic #4 (0.167): 0.033*"工作" + 0.014*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.012*"情形" + 0.012*"第一項" + 0.012*"方式" + 0.011*"資訊" + 0.011*"應徵" + 0.011*"聯絡" 2025-04-19 00:09:37,000 : INFO : topic #5 (0.167): 0.015*"公司" + 0.006*"問題" + 0.006*"技術" + 0.005*"工程師" + 0.004*"工作" + 0.004*"目前" + 0.004*"開發" + 0.004*"產品" + 0.004*"現在" + 0.004*"比較" 2025-04-19 00:09:37,001 : INFO : topic diff=0.294485, rho=0.286829 2025-04-19 00:09:37,001 : INFO : PROGRESS: pass 3, at document #16000/16310 2025-04-19 00:09:37,207 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:37,210 : INFO : topic #2 (0.167): 0.048*"工作" + 0.020*"面試" + 0.016*"時間" + 0.013*"公司" + 0.012*"經驗" + 0.010*"薪資" + 0.009*"內容" + 0.009*"小時" + 0.008*"職缺" + 0.007*"工時" 2025-04-19 00:09:37,210 : INFO : topic #3 (0.167): 0.016*"晶片" + 0.015*"美國" + 0.015*"台灣" + 0.013*"表示" + 0.012*"半導體" + 0.012*"台積電" + 0.012*"中國" + 0.009*"台積" + 0.009*"英特爾" + 0.008*"報導" 2025-04-19 00:09:37,211 : INFO : topic #0 (0.167): 0.037*"工作" + 0.026*"方式" + 0.017*"工資" + 0.015*"依法" + 0.015*"推定" + 0.014*"單位" + 0.014*"小時" + 0.013*"每日" + 0.012*"未註明" + 0.012*"休息" 2025-04-19 00:09:37,211 : INFO : topic #5 (0.167): 0.015*"公司" + 0.006*"技術" + 0.006*"問題" + 0.005*"工程師" + 0.004*"目前" + 0.004*"工作" + 0.004*"產品" + 0.004*"現在" + 0.004*"員工" + 0.004*"開發" 2025-04-19 00:09:37,212 : INFO : topic #1 (0.167): 0.023*"報名" + 0.020*"活動" + 0.015*"電話" + 0.012*"聯絡" + 0.010*"台北市" + 0.010*"方式" + 0.010*"內容" + 0.009*"人數" + 0.009*"時間" + 0.009*"工作" 2025-04-19 00:09:37,212 : INFO : topic diff=0.247734, rho=0.286829 2025-04-19 00:09:37,288 : INFO : -8.324 per-word bound, 320.4 perplexity estimate based on a held-out corpus of 310 documents 
with 43091 words 2025-04-19 00:09:37,289 : INFO : PROGRESS: pass 3, at document #16310/16310 2025-04-19 00:09:37,323 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:09:37,326 : INFO : topic #0 (0.167): 0.037*"工作" + 0.025*"方式" + 0.017*"工資" + 0.015*"依法" + 0.014*"推定" + 0.013*"小時" + 0.013*"單位" + 0.013*"每日" + 0.012*"未註明" + 0.011*"休息" 2025-04-19 00:09:37,327 : INFO : topic #4 (0.167): 0.033*"工作" + 0.014*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.012*"情形" + 0.012*"第一項" + 0.012*"方式" + 0.011*"資訊" + 0.011*"應徵" + 0.011*"國定假日" 2025-04-19 00:09:37,327 : INFO : topic #2 (0.167): 0.046*"工作" + 0.018*"面試" + 0.016*"時間" + 0.013*"公司" + 0.011*"經驗" + 0.010*"薪資" + 0.010*"小時" + 0.008*"內容" + 0.008*"工時" + 0.007*"職缺" 2025-04-19 00:09:37,328 : INFO : topic #1 (0.167): 0.024*"報名" + 0.020*"活動" + 0.014*"電話" + 0.011*"聯絡" + 0.010*"台北市" + 0.010*"問卷" + 0.010*"方式" + 0.010*"內容" + 0.010*"時間" + 0.009*"工作" 2025-04-19 00:09:37,328 : INFO : topic #3 (0.167): 0.020*"美國" + 0.016*"晶片" + 0.015*"台灣" + 0.013*"台積電" + 0.012*"表示" + 0.011*"台積" + 0.011*"中國" + 0.011*"半導體" + 0.010*"投資" + 0.009*"英特爾" 2025-04-19 00:09:37,328 : INFO : topic diff=0.256199, rho=0.286829 2025-04-19 00:09:37,329 : INFO : PROGRESS: pass 4, at document #2000/16310 2025-04-19 00:09:37,805 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:37,808 : INFO : topic #4 (0.167): 0.033*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"第一項" + 0.012*"情形" + 0.012*"方式" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:37,808 : INFO : topic #1 (0.167): 0.024*"報名" + 0.021*"活動" + 0.018*"電話" + 0.013*"台北市" + 0.012*"聯絡" + 0.010*"舉辦" + 0.010*"內容" + 0.010*"時間" + 0.010*"人數" + 0.010*"通知" 2025-04-19 00:09:37,809 : INFO : topic #3 (0.167): 0.020*"美國" + 0.016*"晶片" + 0.015*"台灣" + 0.013*"台積電" + 0.012*"表示" + 0.011*"中國" + 0.011*"台積" + 0.011*"半導體" + 0.010*"投資" + 0.009*"英特爾" 2025-04-19 00:09:37,809 : INFO : topic #5 (0.167): 0.015*"公司" + 0.007*"技術" + 0.005*"問題" + 0.005*"工程師" + 0.004*"工作" + 
0.004*"現在" + 0.004*"員工" + 0.004*"知道" + 0.004*"目前" + 0.004*"產品" 2025-04-19 00:09:37,810 : INFO : topic #0 (0.167): 0.039*"工作" + 0.027*"方式" + 0.017*"工資" + 0.016*"推定" + 0.016*"依法" + 0.014*"小時" + 0.014*"單位" + 0.013*"每日" + 0.013*"未註明" + 0.012*"休息" 2025-04-19 00:09:37,810 : INFO : topic diff=0.647242, rho=0.275711 2025-04-19 00:09:37,811 : INFO : PROGRESS: pass 4, at document #4000/16310 2025-04-19 00:09:38,239 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:38,242 : INFO : topic #4 (0.167): 0.033*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"第一項" + 0.012*"情形" + 0.012*"方式" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:38,243 : INFO : topic #3 (0.167): 0.019*"美國" + 0.016*"晶片" + 0.015*"台灣" + 0.013*"台積電" + 0.012*"表示" + 0.011*"中國" + 0.011*"台積" + 0.010*"半導體" + 0.010*"投資" + 0.009*"英特爾" 2025-04-19 00:09:38,243 : INFO : topic #0 (0.167): 0.040*"工作" + 0.027*"方式" + 0.017*"工資" + 0.016*"推定" + 0.015*"依法" + 0.015*"小時" + 0.014*"單位" + 0.013*"未註明" + 0.013*"每日" + 0.012*"休息" 2025-04-19 00:09:38,244 : INFO : topic #1 (0.167): 0.027*"報名" + 0.024*"活動" + 0.020*"電話" + 0.014*"台北市" + 0.012*"聯絡" + 0.011*"人數" + 0.011*"車馬費" + 0.011*"舉辦" + 0.011*"時間" + 0.010*"內容" 2025-04-19 00:09:38,244 : INFO : topic #2 (0.167): 0.046*"工作" + 0.017*"時間" + 0.016*"面試" + 0.011*"經驗" + 0.010*"小時" + 0.010*"公司" + 0.009*"內容" + 0.007*"工時" + 0.007*"相關" + 0.007*"薪資" 2025-04-19 00:09:38,245 : INFO : topic diff=0.283261, rho=0.275711 2025-04-19 00:09:38,245 : INFO : PROGRESS: pass 4, at document #6000/16310 2025-04-19 00:09:38,616 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:38,619 : INFO : topic #2 (0.167): 0.044*"工作" + 0.017*"時間" + 0.015*"面試" + 0.012*"經驗" + 0.010*"公司" + 0.010*"小時" + 0.009*"內容" + 0.007*"方式" + 0.007*"相關" + 0.007*"工時" 2025-04-19 00:09:38,620 : INFO : topic #3 (0.167): 0.018*"美國" + 0.015*"晶片" + 0.015*"台灣" + 0.012*"台積電" + 0.012*"表示" + 0.011*"中國" + 0.011*"台積" + 0.010*"半導體" + 0.010*"投資" + 0.008*"英特爾" 
2025-04-19 00:09:38,621 : INFO : topic #1 (0.167): 0.030*"報名" + 0.026*"活動" + 0.021*"電話" + 0.015*"台北市" + 0.012*"車馬費" + 0.012*"人數" + 0.012*"聯絡" + 0.012*"舉辦" + 0.011*"訪問" + 0.011*"通知" 2025-04-19 00:09:38,621 : INFO : topic #0 (0.167): 0.041*"工作" + 0.027*"方式" + 0.017*"工資" + 0.016*"推定" + 0.016*"依法" + 0.015*"小時" + 0.014*"每日" + 0.014*"單位" + 0.013*"未註明" + 0.013*"休息" 2025-04-19 00:09:38,622 : INFO : topic #5 (0.167): 0.015*"公司" + 0.006*"技術" + 0.005*"問題" + 0.005*"工程師" + 0.005*"產品" + 0.004*"知道" + 0.004*"工作" + 0.004*"目前" + 0.004*"現在" + 0.004*"開發" 2025-04-19 00:09:38,622 : INFO : topic diff=0.210308, rho=0.275711 2025-04-19 00:09:38,622 : INFO : PROGRESS: pass 4, at document #8000/16310 2025-04-19 00:09:38,913 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:38,916 : INFO : topic #2 (0.167): 0.046*"工作" + 0.019*"面試" + 0.017*"時間" + 0.015*"經驗" + 0.012*"公司" + 0.010*"小時" + 0.009*"內容" + 0.009*"職缺" + 0.008*"方式" + 0.008*"薪資" 2025-04-19 00:09:38,917 : INFO : topic #1 (0.167): 0.030*"報名" + 0.026*"活動" + 0.021*"電話" + 0.015*"台北市" + 0.012*"舉辦" + 0.012*"車馬費" + 0.012*"人數" + 0.012*"聯絡" + 0.011*"通知" + 0.011*"訪問" 2025-04-19 00:09:38,918 : INFO : topic #3 (0.167): 0.019*"美國" + 0.015*"台灣" + 0.014*"晶片" + 0.011*"台積電" + 0.011*"中國" + 0.011*"表示" + 0.010*"半導體" + 0.010*"台積" + 0.009*"投資" + 0.008*"英特爾" 2025-04-19 00:09:38,918 : INFO : topic #4 (0.167): 0.033*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"第一項" + 0.012*"情形" + 0.012*"方式" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:38,918 : INFO : topic #5 (0.167): 0.016*"公司" + 0.007*"問題" + 0.006*"工程師" + 0.006*"技術" + 0.005*"工作" + 0.005*"開發" + 0.005*"產品" + 0.005*"目前" + 0.005*"覺得" + 0.004*"知道" 2025-04-19 00:09:38,919 : INFO : topic diff=0.300969, rho=0.275711 2025-04-19 00:09:38,919 : INFO : PROGRESS: pass 4, at document #10000/16310 2025-04-19 00:09:39,163 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:39,166 : INFO : topic #1 (0.167): 0.031*"報名" + 
0.027*"活動" + 0.020*"電話" + 0.014*"台北市" + 0.012*"舉辦" + 0.012*"人數" + 0.012*"聯絡" + 0.012*"車馬費" + 0.011*"通知" + 0.011*"時間" 2025-04-19 00:09:39,167 : INFO : topic #3 (0.167): 0.018*"美國" + 0.016*"台灣" + 0.013*"晶片" + 0.011*"中國" + 0.011*"表示" + 0.011*"台積電" + 0.010*"半導體" + 0.010*"台積" + 0.009*"投資" + 0.007*"報導" 2025-04-19 00:09:39,167 : INFO : topic #2 (0.167): 0.046*"工作" + 0.020*"面試" + 0.017*"時間" + 0.015*"經驗" + 0.013*"公司" + 0.010*"職缺" + 0.010*"薪資" + 0.010*"小時" + 0.009*"內容" + 0.008*"方式" 2025-04-19 00:09:39,168 : INFO : topic #5 (0.167): 0.016*"公司" + 0.008*"問題" + 0.006*"工程師" + 0.006*"技術" + 0.005*"開發" + 0.005*"工作" + 0.005*"目前" + 0.005*"覺得" + 0.005*"比較" + 0.005*"產品" 2025-04-19 00:09:39,168 : INFO : topic #4 (0.167): 0.033*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"第一項" + 0.012*"情形" + 0.012*"方式" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:39,169 : INFO : topic diff=0.238076, rho=0.275711 2025-04-19 00:09:39,169 : INFO : PROGRESS: pass 4, at document #12000/16310 2025-04-19 00:09:39,389 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:39,392 : INFO : topic #5 (0.167): 0.015*"公司" + 0.007*"問題" + 0.006*"工程師" + 0.006*"技術" + 0.005*"工作" + 0.005*"開發" + 0.005*"目前" + 0.005*"比較" + 0.004*"覺得" + 0.004*"知道" 2025-04-19 00:09:39,392 : INFO : topic #3 (0.167): 0.015*"台灣" + 0.015*"晶片" + 0.014*"美國" + 0.012*"半導體" + 0.011*"表示" + 0.011*"台積電" + 0.010*"中國" + 0.009*"台積" + 0.008*"報導" + 0.007*"產業" 2025-04-19 00:09:39,393 : INFO : topic #4 (0.167): 0.033*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"第一項" + 0.012*"情形" + 0.012*"方式" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:39,393 : INFO : topic #2 (0.167): 0.045*"工作" + 0.020*"面試" + 0.017*"時間" + 0.015*"經驗" + 0.014*"公司" + 0.010*"薪資" + 0.010*"職缺" + 0.009*"小時" + 0.009*"內容" + 0.008*"方式" 2025-04-19 00:09:39,394 : INFO : topic #0 (0.167): 0.040*"工作" + 0.027*"方式" + 0.016*"工資" + 0.016*"推定" + 0.015*"依法" + 0.014*"小時" + 0.014*"單位" + 0.013*"每日" + 0.013*"未註明" + 0.012*"休息" 2025-04-19 
00:09:39,394 : INFO : topic diff=0.276355, rho=0.275711 2025-04-19 00:09:39,395 : INFO : PROGRESS: pass 4, at document #14000/16310 2025-04-19 00:09:39,630 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:39,633 : INFO : topic #5 (0.167): 0.015*"公司" + 0.006*"問題" + 0.006*"技術" + 0.005*"工程師" + 0.005*"工作" + 0.004*"目前" + 0.004*"開發" + 0.004*"產品" + 0.004*"比較" + 0.004*"現在" 2025-04-19 00:09:39,634 : INFO : topic #3 (0.167): 0.016*"台灣" + 0.014*"晶片" + 0.013*"美國" + 0.012*"表示" + 0.012*"半導體" + 0.011*"台積電" + 0.010*"中國" + 0.009*"台積" + 0.009*"報導" + 0.008*"產業" 2025-04-19 00:09:39,634 : INFO : topic #4 (0.167): 0.032*"工作" + 0.014*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"第一項" + 0.012*"情形" + 0.012*"方式" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:39,635 : INFO : topic #1 (0.167): 0.032*"報名" + 0.029*"活動" + 0.019*"電話" + 0.013*"台北市" + 0.013*"研究" + 0.013*"問卷" + 0.012*"舉辦" + 0.011*"人數" + 0.011*"聯絡" + 0.010*"參加" 2025-04-19 00:09:39,635 : INFO : topic #2 (0.167): 0.045*"工作" + 0.019*"面試" + 0.016*"時間" + 0.014*"經驗" + 0.014*"公司" + 0.011*"薪資" + 0.009*"職缺" + 0.009*"內容" + 0.009*"小時" + 0.007*"方式" 2025-04-19 00:09:39,636 : INFO : topic diff=0.273552, rho=0.275711 2025-04-19 00:09:39,636 : INFO : PROGRESS: pass 4, at document #16000/16310 2025-04-19 00:09:39,833 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:39,836 : INFO : topic #5 (0.167): 0.015*"公司" + 0.006*"問題" + 0.006*"技術" + 0.005*"工程師" + 0.005*"工作" + 0.004*"目前" + 0.004*"產品" + 0.004*"現在" + 0.004*"開發" + 0.003*"知道" 2025-04-19 00:09:39,837 : INFO : topic #4 (0.167): 0.032*"工作" + 0.013*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"情形" + 0.012*"第一項" + 0.012*"方式" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 00:09:39,838 : INFO : topic #2 (0.167): 0.044*"工作" + 0.018*"面試" + 0.015*"時間" + 0.014*"公司" + 0.013*"經驗" + 0.011*"薪資" + 0.009*"內容" + 0.009*"小時" + 0.009*"職缺" + 0.007*"工時" 2025-04-19 00:09:39,838 : INFO : topic #0 (0.167): 0.039*"工作" + 0.026*"方式" + 
0.017*"工資" + 0.015*"依法" + 0.015*"推定" + 0.014*"小時" + 0.014*"單位" + 0.013*"每日" + 0.012*"未註明" + 0.012*"休息" 2025-04-19 00:09:39,838 : INFO : topic #3 (0.167): 0.015*"美國" + 0.015*"晶片" + 0.014*"台灣" + 0.012*"表示" + 0.012*"半導體" + 0.011*"台積電" + 0.011*"中國" + 0.009*"報導" + 0.009*"台積" + 0.008*"英特爾" 2025-04-19 00:09:39,839 : INFO : topic diff=0.230569, rho=0.275711 2025-04-19 00:09:39,914 : INFO : -8.309 per-word bound, 317.2 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:09:39,914 : INFO : PROGRESS: pass 4, at document #16310/16310 2025-04-19 00:09:39,974 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:09:39,977 : INFO : topic #4 (0.167): 0.032*"工作" + 0.013*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.012*"情形" + 0.012*"第一項" + 0.012*"方式" + 0.011*"資訊" + 0.011*"內容" + 0.011*"應徵" 2025-04-19 00:09:39,977 : INFO : topic #1 (0.167): 0.031*"報名" + 0.029*"活動" + 0.019*"問卷" + 0.019*"研究" + 0.016*"電話" + 0.012*"台北市" + 0.011*"舉辦" + 0.011*"時間" + 0.011*"人數" + 0.011*"參與" 2025-04-19 00:09:39,978 : INFO : topic #0 (0.167): 0.038*"工作" + 0.026*"方式" + 0.017*"工資" + 0.015*"依法" + 0.014*"推定" + 0.014*"小時" + 0.014*"單位" + 0.013*"每日" + 0.012*"未註明" + 0.012*"休息" 2025-04-19 00:09:39,978 : INFO : topic #2 (0.167): 0.043*"工作" + 0.017*"面試" + 0.015*"時間" + 0.014*"公司" + 0.012*"經驗" + 0.011*"薪資" + 0.009*"小時" + 0.008*"內容" + 0.007*"工時" + 0.007*"職缺" 2025-04-19 00:09:39,979 : INFO : topic #3 (0.167): 0.019*"美國" + 0.015*"晶片" + 0.015*"台灣" + 0.012*"台積電" + 0.012*"表示" + 0.011*"台積" + 0.011*"中國" + 0.010*"半導體" + 0.009*"投資" + 0.009*"報導" 2025-04-19 00:09:39,979 : INFO : topic diff=0.237303, rho=0.275711 2025-04-19 00:09:39,980 : INFO : LdaModel lifecycle event {'msg': 'trained LdaModel<num_terms=23261, num_topics=6, decay=0.5, chunksize=2000> in 15.43s', 'datetime': '2025-04-19T00:09:39.980242', 'gensim': '4.3.3', 'python': '3.11.2 (main, Apr 21 2023, 22:51:21) [Clang 14.0.3 (clang-1403.0.22.14.1)]', 'platform': 'macOS-15.3.2-arm64-arm-64bit', 'event': 
'created'}
2025-04-19 00:09:44,710 : INFO : -6.980 per-word bound, 126.2 perplexity estimate based on a held-out corpus of 16310 documents with 3460358 words
2025-04-19 00:09:44,712 : INFO : using ParallelWordOccurrenceAccumulator<processes=7, batch_size=64> to estimate probabilities from sliding windows
2025-04-19 00:09:52,119 : INFO : 255 batches submitted to accumulate stats from 16320 documents (3450914 virtual)
2025-04-19 00:09:52,286 : INFO : 7 accumulators retrieved from output queue
2025-04-19 00:09:52,296 : INFO : accumulated word occurrence stats for 3451622 virtual documents
2025-04-19 00:09:52,383 : INFO : using symmetric alpha at 0.14285714285714285
2025-04-19 00:09:52,384 : INFO : using symmetric eta at 0.14285714285714285
2025-04-19 00:09:52,385 : INFO : using serial LDA version on this node
2025-04-19 00:09:52,392 : INFO : running online (multi-pass) LDA training, 7 topics, 5 passes over the supplied corpus of 16310 documents, updating model once every 2000 documents, evaluating perplexity every 16310 documents, iterating 50x with a convergence threshold of 0.001000
2025-04-19 00:09:52,393 : INFO : PROGRESS: pass 0, at document #2000/16310
2025-04-19 00:09:53,102 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:09:53,105 : INFO : 
topic #0 (0.143): 0.030*"工作" + 0.014*"方式" + 0.013*"應徵" + 0.013*"推定" + 0.012*"空白" + 0.012*"砍除" + 0.012*"單位" + 0.010*"內容" + 0.010*"資訊" + 0.010*"第一項"
2025-04-19 00:09:53,106 : INFO : topic #5 (0.143): 0.018*"工作" + 0.012*"方式" + 0.012*"空白" + 0.010*"聯絡" + 0.010*"應徵" + 0.009*"分類" + 0.009*"標題" + 0.008*"內容" + 0.008*"聯絡人" + 0.008*"小時"
2025-04-19 00:09:53,106 : INFO : topic #2 (0.143): 0.041*"工作" + 0.013*"內容" + 0.013*"推定" + 0.012*"工資" + 0.012*"方式" + 0.012*"應徵" + 0.011*"情形" + 0.010*"砍除" + 0.010*"聯絡" + 0.010*"小時"
2025-04-19 00:09:53,107 : INFO : topic #3 (0.143): 0.020*"工作" + 0.013*"方式" + 0.012*"砍除" + 0.010*"應徵" + 0.010*"推定" + 0.010*"聯絡人" + 0.010*"文字" + 0.009*"空白" + 0.009*"情形" + 0.009*"資訊"
2025-04-19 00:09:53,107 : INFO : topic #4 (0.143): 0.039*"工作" + 0.017*"推定" + 0.014*"空白" + 0.012*"方式" + 0.011*"聯絡" + 0.010*"第一項" + 0.010*"單位" + 0.010*"聯絡人" + 0.009*"情形" + 0.009*"國定假日"
2025-04-19 00:09:53,108 : INFO : topic diff=7.406156, rho=1.000000
2025-04-19 00:09:53,109 : INFO : PROGRESS: pass 0, at document #4000/16310
2025-04-19 00:09:54,152 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:09:54,157 : INFO : topic #3 (0.143): 0.019*"工作" + 0.013*"方式" + 0.011*"砍除" + 0.009*"聯絡人" + 0.009*"應徵" + 0.009*"文字" + 0.008*"資訊" + 0.008*"時間" + 0.008*"推定" + 0.008*"內容"
2025-04-19 00:09:54,158 : INFO : topic #2 (0.143): 0.042*"工作" + 0.015*"方式" + 0.014*"推定" + 0.013*"工資" + 0.013*"內容" + 0.012*"小時" + 0.011*"單位" + 0.011*"應徵" + 0.011*"未註明" + 0.010*"情形"
2025-04-19 00:09:54,160 : INFO : topic #4 (0.143): 0.038*"工作" + 0.017*"推定" + 0.015*"空白" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"方式" + 0.011*"情形" + 0.010*"聯絡人" + 0.010*"國定假日" + 0.010*"砍除"
2025-04-19 00:09:54,162 : INFO : topic #6 (0.143): 0.025*"報名" + 0.022*"活動" + 0.020*"電話" + 0.014*"台北市" + 0.013*"人數" + 0.013*"車馬費" + 0.011*"資料" + 0.011*"聯絡" + 0.011*"時間" + 0.011*"訪問"
2025-04-19 00:09:54,164 : INFO : topic #0 (0.143): 0.029*"工作" + 0.014*"應徵" + 0.013*"砍除" + 0.013*"空白" + 0.013*"方式" + 0.011*"推定" + 0.011*"單位" + 0.011*"第一項" + 0.011*"資訊" + 0.010*"內容"
2025-04-19 00:09:54,165 : INFO : topic diff=0.693025, rho=0.707107
2025-04-19 00:09:54,167 : INFO : PROGRESS: pass 0, at document #6000/16310
2025-04-19 00:09:54,855 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:09:54,858 : INFO : topic #0 (0.143): 0.030*"工作" + 0.013*"砍除" + 0.013*"應徵" + 0.012*"空白" + 0.012*"方式" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"推定" + 0.010*"單位" + 0.010*"內容"
2025-04-19 00:09:54,859 : INFO : topic #6 (0.143): 0.026*"報名" + 0.023*"活動" + 0.019*"電話" + 0.015*"台北市" + 0.013*"車馬費" + 0.012*"資料" + 0.012*"人數" + 0.011*"訪問" + 0.011*"舉辦" + 0.011*"時間"
2025-04-19 00:09:54,859 : INFO : topic #3 (0.143): 0.015*"工作" + 0.012*"方式" + 0.009*"聯絡人" + 0.009*"時間" + 0.008*"資訊" + 0.008*"砍除" + 0.008*"公司" + 0.008*"文字" + 0.008*"內容" + 0.008*"分類"
2025-04-19 00:09:54,860 : INFO : topic #5 (0.143): 0.021*"工作" + 0.016*"方式" + 0.009*"時間" + 0.008*"依法" + 0.007*"通知" + 0.007*"面試" + 0.007*"應徵" + 0.007*"聯絡" + 0.007*"內容" + 0.006*"每日"
2025-04-19 00:09:54,861 : INFO : topic #4 (0.143): 0.037*"工作" + 0.017*"推定" + 0.015*"空白" + 0.012*"第一項" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"方式" + 0.011*"砍除" + 0.010*"國定假日" + 0.010*"單位"
2025-04-19 00:09:54,861 : INFO : topic diff=0.650259, rho=0.577350
2025-04-19 00:09:54,862 : INFO : PROGRESS: pass 0, at document #8000/16310
2025-04-19 00:09:55,211 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:09:55,215 : INFO : topic #5 (0.143): 0.018*"工作" + 0.018*"公司" + 0.011*"面試" + 0.008*"工程師" + 0.008*"問題" + 0.008*"時間" + 0.008*"經驗" + 0.007*"開發" + 0.006*"團隊" + 0.006*"技術"
2025-04-19 00:09:55,215 : INFO : topic #1 (0.143): 0.029*"工作" + 0.015*"方式" + 0.013*"砍除" + 0.012*"情形" + 0.012*"推定" + 0.012*"第一項" + 0.011*"國定假日" + 0.011*"聯絡" + 0.011*"文字" + 0.011*"連結"
2025-04-19 00:09:55,216 : INFO : topic #4 (0.143): 0.037*"工作" + 0.017*"推定" + 0.015*"空白" + 0.012*"第一項" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"方式" + 0.011*"砍除" + 0.010*"國定假日" + 0.010*"單位"
2025-04-19 00:09:55,216 : INFO : topic #3 (0.143): 0.014*"工作" + 0.011*"方式" + 0.009*"公司" + 0.008*"時間" + 0.008*"聯絡人" + 0.008*"資訊" + 0.007*"砍除" + 0.007*"內容" + 0.007*"文字" + 0.007*"分類"
2025-04-19 00:09:55,217 : INFO : topic #0 (0.143): 0.030*"工作" + 0.013*"砍除" + 0.013*"應徵" + 0.012*"空白" + 0.012*"方式" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"推定" + 0.010*"單位" + 0.010*"內容"
2025-04-19 00:09:55,217 : INFO : topic diff=0.846903, rho=0.500000
2025-04-19 00:09:55,218 : INFO : PROGRESS: pass 0, at document #10000/16310
2025-04-19 00:09:55,528 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:09:55,531 : INFO : topic #5 (0.143): 0.017*"公司" + 0.016*"工作" + 0.010*"面試" + 0.008*"問題" + 0.007*"工程師" + 0.007*"時間" + 0.007*"經驗" + 0.007*"開發" + 0.005*"技術" + 0.005*"目前"
2025-04-19 00:09:55,532 : INFO : topic #0 (0.143): 0.030*"工作" + 0.013*"砍除" + 0.013*"應徵" + 0.012*"空白" + 0.012*"方式" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"推定" + 0.010*"單位" + 0.010*"內容"
2025-04-19 00:09:55,533 : INFO : topic #2 (0.143): 0.038*"工作" + 0.012*"方式" + 0.010*"內容" + 0.010*"小時" + 0.009*"推定" + 0.008*"時間" + 0.008*"單位" + 0.008*"覺得" + 0.008*"工資" + 0.007*"公司"
2025-04-19 00:09:55,533 : INFO : topic #3 (0.143): 0.013*"工作" + 0.010*"方式" + 0.009*"職場" + 0.008*"公司" + 0.008*"時間" + 0.008*"聯絡人" + 0.008*"資訊" + 0.007*"內容" + 0.007*"砍除" + 0.007*"文字"
2025-04-19 00:09:55,534 : INFO : topic #6 (0.143): 0.018*"報名" + 0.017*"活動" + 0.013*"產品" + 0.012*"資料" + 0.011*"電話" + 0.011*"使用" + 0.010*"進行" + 0.010*"台北市" + 0.009*"時間" + 0.009*"研究"
2025-04-19 00:09:55,534 : INFO : topic diff=0.569099, rho=0.447214
2025-04-19 00:09:55,535 : INFO : PROGRESS: pass 0, at document #12000/16310
2025-04-19 00:09:55,790 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:09:55,794 : INFO : topic #1 (0.143): 0.029*"工作" + 0.015*"方式" + 0.013*"砍除" + 0.012*"情形" + 0.012*"推定" + 0.012*"第一項" + 0.011*"國定假日" + 0.011*"聯絡" + 0.011*"文字" + 0.011*"連結"
2025-04-19 00:09:55,795 : INFO : topic #2 (0.143): 0.035*"工作" + 0.010*"方式" + 0.009*"內容" + 0.009*"小時" + 0.008*"時間" + 0.008*"單位" + 0.008*"覺得" + 0.007*"推定" + 0.007*"程式" + 0.007*"公司"
2025-04-19 00:09:55,795 : INFO : topic #5 (0.143): 0.016*"公司" + 0.013*"工作" + 0.008*"面試" + 0.007*"問題" + 0.006*"工程師" + 0.005*"時間" + 0.005*"開發" + 0.005*"技術" + 0.005*"經驗" + 0.005*"目前"
2025-04-19 00:09:55,796 : INFO : topic #4 (0.143): 0.037*"工作" + 0.016*"推定" + 0.015*"空白" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"方式" + 0.011*"砍除" + 0.010*"國定假日" + 0.010*"單位"
2025-04-19 00:09:55,796 : INFO : topic #0 (0.143): 0.029*"工作" + 0.013*"砍除" + 0.013*"應徵" + 0.012*"空白" + 0.012*"方式" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"推定" + 0.010*"單位" + 0.010*"內容"
2025-04-19 00:09:55,797 : INFO : topic diff=0.546661, rho=0.408248
2025-04-19 00:09:55,798 : INFO : PROGRESS: pass 0, at document #14000/16310
2025-04-19 00:09:56,092 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:09:56,095 : INFO : topic #2 (0.143): 0.032*"工作" + 0.009*"方式" + 0.008*"內容" + 0.008*"小時" + 0.007*"單位" + 0.007*"時間" + 0.007*"覺得" + 0.007*"公司" + 0.006*"程式" + 0.006*"推定"
2025-04-19 00:09:56,096 : INFO : topic #1 (0.143): 0.029*"工作" + 0.014*"方式" + 0.012*"砍除" + 0.012*"情形" + 0.012*"推定" + 0.011*"第一項" + 0.011*"國定假日" + 0.011*"聯絡" + 0.011*"文字" + 0.011*"連結"
2025-04-19 00:09:56,096 : INFO : topic #5 (0.143): 0.014*"公司" + 0.010*"工作" + 0.007*"台灣" + 0.005*"面試" + 0.005*"問題" + 0.005*"工程師" + 0.005*"技術" + 0.004*"時間" + 0.004*"員工" + 0.004*"目前"
2025-04-19 00:09:56,097 : INFO : topic #4 (0.143): 0.037*"工作" + 0.016*"推定" + 0.015*"空白" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"方式" + 0.011*"砍除" + 0.010*"國定假日" + 0.010*"單位"
2025-04-19 00:09:56,098 : INFO : topic #0 (0.143): 0.029*"工作" + 0.013*"砍除" + 0.013*"應徵" + 0.012*"空白" + 0.012*"方式" + 0.011*"第一項" + 0.010*"資訊" + 0.010*"推定" + 0.010*"單位" + 0.010*"內容"
2025-04-19 00:09:56,098 : INFO : topic diff=0.538009, rho=0.377964
2025-04-19 00:09:56,099 : INFO : PROGRESS: pass 0, at document #16000/16310
2025-04-19 00:09:56,339 : INFO : merging changes from 2000 documents into a model of 16310 
documents 2025-04-19 00:09:56,343 : INFO : topic #4 (0.143): 0.036*"工作" + 0.016*"推定" + 0.014*"空白" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"方式" + 0.010*"砍除" + 0.010*"國定假日" + 0.010*"單位" 2025-04-19 00:09:56,343 : INFO : topic #1 (0.143): 0.028*"工作" + 0.014*"方式" + 0.012*"砍除" + 0.012*"情形" + 0.012*"推定" + 0.011*"第一項" + 0.011*"國定假日" + 0.011*"聯絡" + 0.010*"文字" + 0.010*"連結" 2025-04-19 00:09:56,344 : INFO : topic #6 (0.143): 0.012*"三星" + 0.012*"今年" + 0.011*"研究" + 0.011*"模型" + 0.011*"蘋果" + 0.011*"活動" + 0.010*"進行" + 0.010*"產品" + 0.010*"萬元" + 0.009*"報名" 2025-04-19 00:09:56,345 : INFO : topic #0 (0.143): 0.028*"工作" + 0.012*"砍除" + 0.012*"應徵" + 0.012*"空白" + 0.011*"方式" + 0.010*"第一項" + 0.010*"資訊" + 0.010*"推定" + 0.010*"單位" + 0.010*"內容" 2025-04-19 00:09:56,345 : INFO : topic #5 (0.143): 0.013*"公司" + 0.008*"工作" + 0.007*"台灣" + 0.006*"美國" + 0.005*"技術" + 0.005*"晶片" + 0.004*"員工" + 0.004*"問題" + 0.004*"工程師" + 0.004*"面試" 2025-04-19 00:09:56,345 : INFO : topic diff=0.484237, rho=0.353553 2025-04-19 00:09:56,420 : INFO : -8.712 per-word bound, 419.3 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:09:56,420 : INFO : PROGRESS: pass 0, at document #16310/16310 2025-04-19 00:09:56,499 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:09:56,503 : INFO : topic #0 (0.143): 0.027*"工作" + 0.012*"應徵" + 0.012*"砍除" + 0.011*"空白" + 0.011*"方式" + 0.010*"第一項" + 0.010*"單位" + 0.010*"資訊" + 0.010*"推定" + 0.010*"內容" 2025-04-19 00:09:56,503 : INFO : topic #5 (0.143): 0.013*"公司" + 0.008*"美國" + 0.007*"台灣" + 0.007*"工作" + 0.006*"技術" + 0.005*"晶片" + 0.005*"員工" + 0.004*"科技" + 0.004*"台積電" + 0.004*"問題" 2025-04-19 00:09:56,504 : INFO : topic #1 (0.143): 0.028*"工作" + 0.014*"方式" + 0.012*"砍除" + 0.011*"情形" + 0.011*"推定" + 0.011*"第一項" + 0.011*"單位" + 0.011*"國定假日" + 0.011*"聯絡" + 0.010*"連結" 2025-04-19 00:09:56,504 : INFO : topic #2 (0.143): 0.027*"工作" + 0.009*"預期" + 0.008*"小時" + 0.007*"營收" + 0.007*"方式" + 0.007*"時間" + 0.007*"內容" + 0.007*"覺得" + 
0.007*"公司" + 0.006*"單位" 2025-04-19 00:09:56,505 : INFO : topic #6 (0.143): 0.012*"三星" + 0.012*"今年" + 0.011*"研究" + 0.011*"蘋果" + 0.010*"進行" + 0.010*"萬元" + 0.010*"產品" + 0.009*"模型" + 0.009*"活動" + 0.009*"大陸" 2025-04-19 00:09:56,505 : INFO : topic diff=0.405972, rho=0.333333 2025-04-19 00:09:56,506 : INFO : PROGRESS: pass 1, at document #2000/16310 2025-04-19 00:09:57,209 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:57,212 : INFO : topic #1 (0.143): 0.030*"工作" + 0.016*"方式" + 0.012*"推定" + 0.012*"砍除" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"單位" + 0.011*"內容" + 0.011*"文字" + 0.010*"未註明" 2025-04-19 00:09:57,213 : INFO : topic #6 (0.143): 0.019*"報名" + 0.018*"活動" + 0.011*"電話" + 0.011*"進行" + 0.010*"研究" + 0.010*"資料" + 0.010*"台北市" + 0.010*"舉辦" + 0.009*"參與" + 0.008*"車馬費" 2025-04-19 00:09:57,213 : INFO : topic #4 (0.143): 0.038*"工作" + 0.017*"推定" + 0.014*"空白" + 0.012*"方式" + 0.011*"砍除" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"第一項" + 0.011*"內容" + 0.010*"應徵" 2025-04-19 00:09:57,214 : INFO : topic #0 (0.143): 0.031*"工作" + 0.013*"應徵" + 0.012*"方式" + 0.011*"砍除" + 0.010*"空白" + 0.010*"推定" + 0.010*"內容" + 0.010*"資訊" + 0.010*"第一項" + 0.010*"單位" 2025-04-19 00:09:57,214 : INFO : topic #5 (0.143): 0.013*"公司" + 0.008*"美國" + 0.007*"台灣" + 0.007*"工作" + 0.006*"技術" + 0.005*"晶片" + 0.005*"員工" + 0.004*"科技" + 0.004*"台積電" + 0.004*"問題" 2025-04-19 00:09:57,214 : INFO : topic diff=1.061438, rho=0.313805 2025-04-19 00:09:57,215 : INFO : PROGRESS: pass 1, at document #4000/16310 2025-04-19 00:09:57,897 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:57,901 : INFO : topic #1 (0.143): 0.030*"工作" + 0.016*"方式" + 0.012*"推定" + 0.012*"砍除" + 0.012*"情形" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"文字" + 0.011*"未註明" 2025-04-19 00:09:57,901 : INFO : topic #6 (0.143): 0.024*"報名" + 0.022*"活動" + 0.016*"電話" + 0.012*"台北市" + 0.011*"資料" + 0.011*"舉辦" + 0.011*"進行" + 0.011*"車馬費" + 0.010*"人數" + 0.010*"參與" 2025-04-19 00:09:57,902 : INFO : topic #0 
(0.143): 0.032*"工作" + 0.012*"方式" + 0.012*"應徵" + 0.010*"砍除" + 0.010*"推定" + 0.010*"空白" + 0.010*"內容" + 0.010*"情形" + 0.010*"第一項" + 0.010*"資訊" 2025-04-19 00:09:57,902 : INFO : topic #5 (0.143): 0.013*"公司" + 0.007*"美國" + 0.007*"台灣" + 0.007*"工作" + 0.006*"技術" + 0.005*"晶片" + 0.005*"員工" + 0.004*"科技" + 0.004*"台積電" + 0.004*"問題" 2025-04-19 00:09:57,902 : INFO : topic #3 (0.143): 0.053*"半導體" + 0.029*"製程" + 0.023*"川普" + 0.021*"表示" + 0.012*"投資" + 0.012*"中國" + 0.011*"魏哲家" + 0.010*"研發" + 0.009*"奈米" + 0.006*"晶圓廠" 2025-04-19 00:09:57,903 : INFO : topic diff=0.418343, rho=0.313805 2025-04-19 00:09:57,903 : INFO : PROGRESS: pass 1, at document #6000/16310 2025-04-19 00:09:58,435 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:58,438 : INFO : topic #0 (0.143): 0.032*"工作" + 0.012*"方式" + 0.012*"應徵" + 0.010*"情形" + 0.010*"砍除" + 0.010*"空白" + 0.010*"內容" + 0.010*"推定" + 0.010*"第一項" + 0.010*"文字" 2025-04-19 00:09:58,439 : INFO : topic #5 (0.143): 0.014*"公司" + 0.008*"工作" + 0.006*"台灣" + 0.006*"美國" + 0.005*"技術" + 0.004*"問題" + 0.004*"工程師" + 0.004*"員工" + 0.004*"面試" + 0.004*"晶片" 2025-04-19 00:09:58,439 : INFO : topic #6 (0.143): 0.028*"報名" + 0.025*"活動" + 0.018*"電話" + 0.015*"台北市" + 0.012*"車馬費" + 0.012*"舉辦" + 0.012*"資料" + 0.011*"人數" + 0.011*"訪問" + 0.011*"進行" 2025-04-19 00:09:58,440 : INFO : topic #2 (0.143): 0.044*"工作" + 0.020*"方式" + 0.016*"時間" + 0.014*"小時" + 0.011*"每日" + 0.010*"內容" + 0.010*"休息" + 0.010*"工資" + 0.010*"依法" + 0.008*"面試" 2025-04-19 00:09:58,440 : INFO : topic #1 (0.143): 0.030*"工作" + 0.016*"方式" + 0.012*"推定" + 0.012*"情形" + 0.012*"砍除" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"文字" + 0.011*"內容" + 0.011*"未註明" 2025-04-19 00:09:58,441 : INFO : topic diff=0.296699, rho=0.313805 2025-04-19 00:09:58,441 : INFO : PROGRESS: pass 1, at document #8000/16310 2025-04-19 00:09:58,745 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:58,750 : INFO : topic #4 (0.143): 0.036*"工作" + 0.016*"推定" + 0.014*"空白" + 0.012*"砍除" + 0.012*"方式" 
+ 0.011*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"單位" 2025-04-19 00:09:58,752 : INFO : topic #2 (0.143): 0.048*"工作" + 0.021*"方式" + 0.018*"時間" + 0.016*"小時" + 0.011*"每日" + 0.011*"內容" + 0.009*"休息" + 0.009*"面試" + 0.008*"工資" + 0.008*"依法" 2025-04-19 00:09:58,752 : INFO : topic #6 (0.143): 0.026*"報名" + 0.024*"活動" + 0.017*"電話" + 0.014*"台北市" + 0.012*"資料" + 0.012*"舉辦" + 0.011*"車馬費" + 0.011*"進行" + 0.011*"人數" + 0.010*"參加" 2025-04-19 00:09:58,753 : INFO : topic #0 (0.143): 0.032*"工作" + 0.012*"應徵" + 0.012*"方式" + 0.010*"內容" + 0.010*"情形" + 0.010*"砍除" + 0.010*"空白" + 0.010*"推定" + 0.010*"第一項" + 0.010*"資訊" 2025-04-19 00:09:58,755 : INFO : topic #1 (0.143): 0.030*"工作" + 0.016*"方式" + 0.012*"推定" + 0.012*"情形" + 0.012*"砍除" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"文字" + 0.011*"內容" + 0.011*"未註明" 2025-04-19 00:09:58,756 : INFO : topic diff=0.333959, rho=0.313805 2025-04-19 00:09:58,756 : INFO : PROGRESS: pass 1, at document #10000/16310 2025-04-19 00:09:58,993 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:58,996 : INFO : topic #1 (0.143): 0.030*"工作" + 0.016*"方式" + 0.012*"推定" + 0.012*"情形" + 0.012*"砍除" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"文字" + 0.011*"內容" + 0.011*"未註明" 2025-04-19 00:09:58,997 : INFO : topic #6 (0.143): 0.026*"報名" + 0.024*"活動" + 0.016*"電話" + 0.013*"台北市" + 0.012*"資料" + 0.012*"研究" + 0.011*"舉辦" + 0.011*"進行" + 0.010*"參加" + 0.010*"參與" 2025-04-19 00:09:58,997 : INFO : topic #2 (0.143): 0.050*"工作" + 0.021*"方式" + 0.019*"時間" + 0.017*"小時" + 0.011*"內容" + 0.011*"每日" + 0.009*"面試" + 0.008*"工時" + 0.008*"休息" + 0.008*"聯絡" 2025-04-19 00:09:58,998 : INFO : topic #5 (0.143): 0.016*"公司" + 0.010*"工作" + 0.008*"面試" + 0.007*"問題" + 0.006*"工程師" + 0.006*"開發" + 0.005*"技術" + 0.005*"經驗" + 0.005*"目前" + 0.004*"比較" 2025-04-19 00:09:58,998 : INFO : topic #0 (0.143): 0.032*"工作" + 0.012*"應徵" + 0.012*"方式" + 0.010*"內容" + 0.010*"情形" + 0.010*"砍除" + 0.010*"空白" + 0.010*"推定" + 0.010*"第一項" + 0.010*"資訊" 2025-04-19 00:09:58,999 : INFO : topic diff=0.289564, rho=0.313805 
2025-04-19 00:09:58,999 : INFO : PROGRESS: pass 1, at document #12000/16310 2025-04-19 00:09:59,248 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:59,252 : INFO : topic #1 (0.143): 0.030*"工作" + 0.016*"方式" + 0.012*"推定" + 0.012*"情形" + 0.012*"砍除" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"文字" + 0.011*"內容" + 0.010*"未註明" 2025-04-19 00:09:59,252 : INFO : topic #0 (0.143): 0.032*"工作" + 0.012*"應徵" + 0.012*"方式" + 0.010*"內容" + 0.010*"情形" + 0.010*"砍除" + 0.010*"空白" + 0.010*"推定" + 0.010*"第一項" + 0.010*"資訊" 2025-04-19 00:09:59,253 : INFO : topic #4 (0.143): 0.036*"工作" + 0.016*"推定" + 0.014*"空白" + 0.012*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"單位" 2025-04-19 00:09:59,253 : INFO : topic #2 (0.143): 0.049*"工作" + 0.020*"方式" + 0.019*"時間" + 0.016*"小時" + 0.011*"內容" + 0.010*"每日" + 0.008*"工時" + 0.008*"面試" + 0.007*"休息" + 0.007*"聯絡" 2025-04-19 00:09:59,254 : INFO : topic #5 (0.143): 0.015*"公司" + 0.009*"工作" + 0.007*"面試" + 0.006*"問題" + 0.005*"工程師" + 0.005*"開發" + 0.005*"技術" + 0.005*"台灣" + 0.004*"目前" + 0.004*"比較" 2025-04-19 00:09:59,254 : INFO : topic diff=0.392427, rho=0.313805 2025-04-19 00:09:59,255 : INFO : PROGRESS: pass 1, at document #14000/16310 2025-04-19 00:09:59,495 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:59,498 : INFO : topic #1 (0.143): 0.030*"工作" + 0.016*"方式" + 0.012*"推定" + 0.012*"情形" + 0.011*"砍除" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"文字" + 0.011*"內容" + 0.010*"未註明" 2025-04-19 00:09:59,499 : INFO : topic #0 (0.143): 0.031*"工作" + 0.012*"應徵" + 0.011*"方式" + 0.010*"內容" + 0.010*"情形" + 0.010*"砍除" + 0.010*"空白" + 0.010*"推定" + 0.010*"第一項" + 0.010*"資訊" 2025-04-19 00:09:59,499 : INFO : topic #4 (0.143): 0.035*"工作" + 0.016*"推定" + 0.014*"空白" + 0.012*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"單位" 2025-04-19 00:09:59,500 : INFO : topic #3 (0.143): 0.042*"半導體" + 0.028*"表示" + 0.024*"中國" + 0.023*"晶片" + 0.019*"製程" + 0.013*"輝達" + 
0.012*"投資" + 0.010*"先進" + 0.010*"研發" + 0.008*"奈米" 2025-04-19 00:09:59,500 : INFO : topic #6 (0.143): 0.021*"活動" + 0.020*"報名" + 0.014*"研究" + 0.011*"進行" + 0.011*"蘋果" + 0.011*"電話" + 0.010*"資料" + 0.010*"問卷" + 0.010*"三星" + 0.009*"參與" 2025-04-19 00:09:59,500 : INFO : topic diff=0.370313, rho=0.313805 2025-04-19 00:09:59,501 : INFO : PROGRESS: pass 1, at document #16000/16310 2025-04-19 00:09:59,745 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:09:59,749 : INFO : topic #0 (0.143): 0.031*"工作" + 0.011*"應徵" + 0.011*"方式" + 0.010*"內容" + 0.010*"情形" + 0.010*"空白" + 0.010*"砍除" + 0.009*"第一項" + 0.009*"推定" + 0.009*"資訊" 2025-04-19 00:09:59,750 : INFO : topic #1 (0.143): 0.030*"工作" + 0.015*"方式" + 0.012*"推定" + 0.012*"情形" + 0.011*"砍除" + 0.011*"單位" + 0.011*"文字" + 0.011*"聯絡" + 0.011*"內容" + 0.010*"未註明" 2025-04-19 00:09:59,750 : INFO : topic #5 (0.143): 0.014*"公司" + 0.007*"工作" + 0.006*"台灣" + 0.005*"美國" + 0.005*"技術" + 0.004*"問題" + 0.004*"工程師" + 0.004*"員工" + 0.004*"面試" + 0.004*"科技" 2025-04-19 00:09:59,751 : INFO : topic #6 (0.143): 0.018*"活動" + 0.017*"報名" + 0.015*"研究" + 0.013*"三星" + 0.013*"蘋果" + 0.011*"進行" + 0.009*"資料" + 0.009*"問卷" + 0.009*"參與" + 0.008*"使用" 2025-04-19 00:09:59,751 : INFO : topic #4 (0.143): 0.035*"工作" + 0.016*"推定" + 0.014*"空白" + 0.012*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"單位" 2025-04-19 00:09:59,751 : INFO : topic diff=0.313321, rho=0.313805 2025-04-19 00:09:59,839 : INFO : -8.438 per-word bound, 346.8 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:09:59,840 : INFO : PROGRESS: pass 1, at document #16310/16310 2025-04-19 00:09:59,875 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:09:59,878 : INFO : topic #4 (0.143): 0.035*"工作" + 0.016*"推定" + 0.014*"空白" + 0.012*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"單位" 2025-04-19 00:09:59,878 : INFO : topic #0 (0.143): 0.030*"工作" 
+ 0.011*"應徵" + 0.011*"方式" + 0.010*"內容" + 0.009*"情形" + 0.009*"空白" + 0.009*"砍除" + 0.009*"資訊" + 0.009*"第一項" + 0.009*"推定" 2025-04-19 00:09:59,879 : INFO : topic #6 (0.143): 0.016*"活動" + 0.015*"研究" + 0.014*"報名" + 0.013*"蘋果" + 0.013*"三星" + 0.011*"問卷" + 0.010*"進行" + 0.010*"機器人" + 0.009*"華為" + 0.009*"參與" 2025-04-19 00:09:59,880 : INFO : topic #1 (0.143): 0.029*"工作" + 0.015*"方式" + 0.012*"推定" + 0.012*"單位" + 0.011*"情形" + 0.011*"砍除" + 0.011*"文字" + 0.011*"聯絡" + 0.011*"內容" + 0.010*"未註明" 2025-04-19 00:09:59,880 : INFO : topic #3 (0.143): 0.031*"晶片" + 0.030*"半導體" + 0.024*"表示" + 0.023*"中國" + 0.022*"投資" + 0.016*"製程" + 0.014*"川普" + 0.013*"輝達" + 0.013*"先進" + 0.013*"美國" 2025-04-19 00:09:59,881 : INFO : topic diff=0.295690, rho=0.313805 2025-04-19 00:09:59,881 : INFO : PROGRESS: pass 2, at document #2000/16310 2025-04-19 00:10:00,566 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:00,569 : INFO : topic #3 (0.143): 0.030*"晶片" + 0.029*"半導體" + 0.023*"表示" + 0.023*"中國" + 0.021*"投資" + 0.016*"製程" + 0.014*"川普" + 0.013*"輝達" + 0.013*"先進" + 0.013*"美國" 2025-04-19 00:10:00,570 : INFO : topic #1 (0.143): 0.030*"工作" + 0.015*"方式" + 0.013*"推定" + 0.012*"單位" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"文字" + 0.011*"砍除" + 0.011*"內容" + 0.010*"未註明" 2025-04-19 00:10:00,571 : INFO : topic #0 (0.143): 0.031*"工作" + 0.011*"應徵" + 0.011*"方式" + 0.010*"推定" + 0.010*"內容" + 0.010*"情形" + 0.010*"資訊" + 0.009*"空白" + 0.009*"第一項" + 0.009*"徵才" 2025-04-19 00:10:00,571 : INFO : topic #4 (0.143): 0.034*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"國定假日" 2025-04-19 00:10:00,572 : INFO : topic #5 (0.143): 0.014*"公司" + 0.006*"台灣" + 0.006*"工作" + 0.006*"美國" + 0.006*"技術" + 0.005*"員工" + 0.004*"科技" + 0.004*"問題" + 0.004*"工程師" + 0.004*"台積" 2025-04-19 00:10:00,572 : INFO : topic diff=0.833026, rho=0.299409 2025-04-19 00:10:00,572 : INFO : PROGRESS: pass 2, at document #4000/16310 2025-04-19 00:10:01,208 : INFO : merging changes from 
2000 documents into a model of 16310 documents 2025-04-19 00:10:01,211 : INFO : topic #4 (0.143): 0.034*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" 2025-04-19 00:10:01,212 : INFO : topic #0 (0.143): 0.032*"工作" + 0.011*"應徵" + 0.011*"方式" + 0.010*"情形" + 0.010*"空白" + 0.010*"第一項" + 0.010*"內容" + 0.010*"資訊" + 0.010*"推定" + 0.009*"徵才" 2025-04-19 00:10:01,213 : INFO : topic #1 (0.143): 0.030*"工作" + 0.015*"方式" + 0.013*"推定" + 0.012*"情形" + 0.012*"單位" + 0.011*"文字" + 0.011*"聯絡" + 0.011*"未註明" + 0.011*"砍除" + 0.010*"內容" 2025-04-19 00:10:01,213 : INFO : topic #2 (0.143): 0.047*"工作" + 0.022*"方式" + 0.019*"時間" + 0.016*"小時" + 0.011*"每日" + 0.011*"內容" + 0.010*"休息" + 0.009*"工資" + 0.009*"面試" + 0.009*"地點" 2025-04-19 00:10:01,214 : INFO : topic #6 (0.143): 0.027*"報名" + 0.025*"活動" + 0.018*"電話" + 0.014*"台北市" + 0.013*"舉辦" + 0.012*"車馬費" + 0.011*"人數" + 0.011*"資料" + 0.011*"參與" + 0.011*"進行" 2025-04-19 00:10:01,214 : INFO : topic diff=0.359124, rho=0.299409 2025-04-19 00:10:01,214 : INFO : PROGRESS: pass 2, at document #6000/16310 2025-04-19 00:10:01,770 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:01,773 : INFO : topic #3 (0.143): 0.027*"晶片" + 0.026*"半導體" + 0.022*"中國" + 0.021*"表示" + 0.019*"投資" + 0.014*"製程" + 0.012*"川普" + 0.011*"輝達" + 0.011*"先進" + 0.011*"美國" 2025-04-19 00:10:01,774 : INFO : topic #5 (0.143): 0.015*"公司" + 0.007*"工作" + 0.006*"台灣" + 0.005*"技術" + 0.005*"美國" + 0.004*"問題" + 0.004*"工程師" + 0.004*"員工" + 0.004*"科技" + 0.004*"面試" 2025-04-19 00:10:01,774 : INFO : topic #4 (0.143): 0.034*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.012*"第一項" + 0.012*"情形" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" 2025-04-19 00:10:01,775 : INFO : topic #6 (0.143): 0.030*"報名" + 0.027*"活動" + 0.020*"電話" + 0.015*"台北市" + 0.013*"車馬費" + 0.013*"舉辦" + 0.012*"人數" + 0.012*"訪問" + 0.012*"資料" + 0.011*"參與" 2025-04-19 00:10:01,775 : INFO : topic #2 (0.143): 0.048*"工作" + 0.024*"方式" + 0.019*"時間" 
+ 0.016*"小時" + 0.013*"每日" + 0.011*"內容" + 0.011*"休息" + 0.010*"依法" + 0.010*"工資" + 0.009*"地點" 2025-04-19 00:10:01,776 : INFO : topic diff=0.230380, rho=0.299409 2025-04-19 00:10:01,776 : INFO : PROGRESS: pass 2, at document #8000/16310 2025-04-19 00:10:02,016 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:02,019 : INFO : topic #5 (0.143): 0.017*"公司" + 0.009*"工作" + 0.006*"面試" + 0.006*"問題" + 0.006*"工程師" + 0.005*"技術" + 0.005*"開發" + 0.004*"台灣" + 0.004*"目前" + 0.004*"經驗" 2025-04-19 00:10:02,020 : INFO : topic #2 (0.143): 0.052*"工作" + 0.024*"方式" + 0.021*"時間" + 0.017*"小時" + 0.012*"內容" + 0.012*"每日" + 0.010*"面試" + 0.009*"休息" + 0.009*"工時" + 0.008*"聯絡" 2025-04-19 00:10:02,021 : INFO : topic #6 (0.143): 0.029*"報名" + 0.027*"活動" + 0.019*"電話" + 0.015*"台北市" + 0.013*"舉辦" + 0.012*"車馬費" + 0.012*"資料" + 0.012*"人數" + 0.011*"訪問" + 0.011*"參與" 2025-04-19 00:10:02,021 : INFO : topic #3 (0.143): 0.027*"半導體" + 0.026*"晶片" + 0.022*"中國" + 0.021*"表示" + 0.018*"投資" + 0.014*"製程" + 0.012*"川普" + 0.011*"輝達" + 0.011*"先進" + 0.011*"美國" 2025-04-19 00:10:02,022 : INFO : topic #1 (0.143): 0.030*"工作" + 0.015*"方式" + 0.013*"推定" + 0.012*"情形" + 0.012*"單位" + 0.011*"文字" + 0.011*"聯絡" + 0.011*"未註明" + 0.011*"砍除" + 0.010*"內容" 2025-04-19 00:10:02,022 : INFO : topic diff=0.305917, rho=0.299409 2025-04-19 00:10:02,022 : INFO : PROGRESS: pass 2, at document #10000/16310 2025-04-19 00:10:02,284 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:02,287 : INFO : topic #3 (0.143): 0.029*"半導體" + 0.025*"晶片" + 0.022*"中國" + 0.021*"表示" + 0.018*"投資" + 0.013*"製程" + 0.011*"川普" + 0.011*"先進" + 0.010*"輝達" + 0.010*"美國" 2025-04-19 00:10:02,288 : INFO : topic #1 (0.143): 0.030*"工作" + 0.015*"方式" + 0.013*"推定" + 0.012*"情形" + 0.012*"單位" + 0.011*"文字" + 0.011*"聯絡" + 0.011*"未註明" + 0.011*"砍除" + 0.010*"內容" 2025-04-19 00:10:02,289 : INFO : topic #4 (0.143): 0.034*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.012*"情形" + 0.012*"第一項" + 0.011*"聯絡" + 
0.011*"內容" + 0.011*"資訊" 2025-04-19 00:10:02,289 : INFO : topic #5 (0.143): 0.017*"公司" + 0.009*"工作" + 0.008*"面試" + 0.007*"問題" + 0.006*"工程師" + 0.006*"開發" + 0.005*"技術" + 0.005*"目前" + 0.005*"比較" + 0.004*"經驗" 2025-04-19 00:10:02,290 : INFO : topic #6 (0.143): 0.029*"報名" + 0.027*"活動" + 0.017*"電話" + 0.014*"台北市" + 0.012*"研究" + 0.012*"舉辦" + 0.012*"資料" + 0.011*"車馬費" + 0.011*"參與" + 0.011*"參加" 2025-04-19 00:10:02,290 : INFO : topic diff=0.273938, rho=0.299409 2025-04-19 00:10:02,290 : INFO : PROGRESS: pass 2, at document #12000/16310 2025-04-19 00:10:02,492 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:02,495 : INFO : topic #3 (0.143): 0.029*"晶片" + 0.029*"半導體" + 0.020*"表示" + 0.018*"中國" + 0.015*"製程" + 0.014*"台積電" + 0.013*"投資" + 0.012*"美國" + 0.012*"台灣" + 0.010*"英特爾" 2025-04-19 00:10:02,496 : INFO : topic #2 (0.143): 0.052*"工作" + 0.023*"方式" + 0.022*"時間" + 0.018*"小時" + 0.011*"內容" + 0.010*"工時" + 0.010*"每日" + 0.009*"面試" + 0.008*"地點" + 0.008*"聯絡" 2025-04-19 00:10:02,496 : INFO : topic #6 (0.143): 0.028*"報名" + 0.027*"活動" + 0.015*"電話" + 0.014*"研究" + 0.012*"舉辦" + 0.012*"台北市" + 0.011*"資料" + 0.011*"參加" + 0.011*"進行" + 0.011*"參與" 2025-04-19 00:10:02,497 : INFO : topic #0 (0.143): 0.031*"工作" + 0.011*"應徵" + 0.010*"方式" + 0.010*"徵才" + 0.010*"情形" + 0.010*"資訊" + 0.010*"內容" + 0.010*"第一項" + 0.010*"空白" + 0.009*"文字" 2025-04-19 00:10:02,497 : INFO : topic #1 (0.143): 0.030*"工作" + 0.014*"方式" + 0.013*"推定" + 0.012*"情形" + 0.012*"單位" + 0.011*"文字" + 0.011*"聯絡" + 0.011*"未註明" + 0.010*"砍除" + 0.010*"內容" 2025-04-19 00:10:02,498 : INFO : topic diff=0.350334, rho=0.299409 2025-04-19 00:10:02,498 : INFO : PROGRESS: pass 2, at document #14000/16310 2025-04-19 00:10:02,739 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:02,742 : INFO : topic #4 (0.143): 0.034*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" 2025-04-19 00:10:02,743 : INFO : topic #3 
(0.143): 0.028*"晶片" + 0.025*"半導體" + 0.021*"表示" + 0.020*"中國" + 0.017*"台積電" + 0.017*"台灣" + 0.015*"美國" + 0.013*"英特爾" + 0.011*"投資" + 0.011*"製程" 2025-04-19 00:10:02,743 : INFO : topic #2 (0.143): 0.050*"工作" + 0.021*"方式" + 0.020*"時間" + 0.017*"小時" + 0.011*"內容" + 0.010*"工時" + 0.009*"每日" + 0.009*"地點" + 0.008*"面試" + 0.007*"聯絡" 2025-04-19 00:10:02,744 : INFO : topic #0 (0.143): 0.031*"工作" + 0.011*"應徵" + 0.010*"徵才" + 0.010*"情形" + 0.010*"方式" + 0.010*"資訊" + 0.010*"內容" + 0.009*"第一項" + 0.009*"空白" + 0.009*"文字" 2025-04-19 00:10:02,744 : INFO : topic #1 (0.143): 0.030*"工作" + 0.014*"方式" + 0.013*"推定" + 0.012*"情形" + 0.012*"單位" + 0.011*"文字" + 0.011*"聯絡" + 0.010*"未註明" + 0.010*"砍除" + 0.010*"內容" 2025-04-19 00:10:02,745 : INFO : topic diff=0.322385, rho=0.299409 2025-04-19 00:10:02,745 : INFO : PROGRESS: pass 2, at document #16000/16310 2025-04-19 00:10:02,941 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:02,944 : INFO : topic #2 (0.143): 0.049*"工作" + 0.019*"時間" + 0.018*"方式" + 0.016*"小時" + 0.011*"內容" + 0.011*"工時" + 0.009*"地點" + 0.008*"面試" + 0.008*"每日" + 0.007*"以上" 2025-04-19 00:10:02,944 : INFO : topic #4 (0.143): 0.034*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" 2025-04-19 00:10:02,945 : INFO : topic #1 (0.143): 0.030*"工作" + 0.014*"方式" + 0.013*"推定" + 0.012*"情形" + 0.012*"單位" + 0.011*"文字" + 0.011*"聯絡" + 0.010*"未註明" + 0.010*"砍除" + 0.010*"內容" 2025-04-19 00:10:02,945 : INFO : topic #5 (0.143): 0.014*"公司" + 0.007*"工作" + 0.005*"技術" + 0.005*"問題" + 0.005*"工程師" + 0.005*"台灣" + 0.004*"員工" + 0.004*"面試" + 0.004*"目前" + 0.004*"科技" 2025-04-19 00:10:02,946 : INFO : topic #0 (0.143): 0.030*"工作" + 0.010*"應徵" + 0.010*"情形" + 0.010*"徵才" + 0.010*"方式" + 0.010*"資訊" + 0.009*"內容" + 0.009*"空白" + 0.009*"第一項" + 0.009*"文字" 2025-04-19 00:10:02,946 : INFO : topic diff=0.270860, rho=0.299409 2025-04-19 00:10:03,038 : INFO : -8.378 per-word bound, 332.7 perplexity estimate based on a held-out corpus 
of 310 documents with 43091 words 2025-04-19 00:10:03,038 : INFO : PROGRESS: pass 2, at document #16310/16310 2025-04-19 00:10:03,070 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:10:03,073 : INFO : topic #0 (0.143): 0.029*"工作" + 0.011*"徵才" + 0.010*"應徵" + 0.010*"情形" + 0.010*"方式" + 0.010*"資訊" + 0.009*"內容" + 0.009*"空白" + 0.009*"第一項" + 0.009*"文字" 2025-04-19 00:10:03,073 : INFO : topic #1 (0.143): 0.030*"工作" + 0.014*"方式" + 0.013*"推定" + 0.012*"情形" + 0.011*"單位" + 0.011*"文字" + 0.010*"聯絡" + 0.010*"未註明" + 0.010*"砍除" + 0.010*"內容" 2025-04-19 00:10:03,074 : INFO : topic #5 (0.143): 0.015*"公司" + 0.006*"工作" + 0.006*"技術" + 0.005*"員工" + 0.004*"問題" + 0.004*"台灣" + 0.004*"科技" + 0.004*"工程師" + 0.004*"台積" + 0.004*"面試" 2025-04-19 00:10:03,074 : INFO : topic #4 (0.143): 0.034*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:10:03,075 : INFO : topic #6 (0.143): 0.019*"活動" + 0.017*"報名" + 0.016*"研究" + 0.012*"問卷" + 0.010*"華為" + 0.010*"進行" + 0.010*"機器人" + 0.009*"參與" + 0.009*"蘋果" + 0.008*"女性" 2025-04-19 00:10:03,075 : INFO : topic diff=0.259379, rho=0.299409 2025-04-19 00:10:03,076 : INFO : PROGRESS: pass 3, at document #2000/16310 2025-04-19 00:10:03,712 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:03,716 : INFO : topic #1 (0.143): 0.030*"工作" + 0.014*"方式" + 0.014*"推定" + 0.012*"情形" + 0.012*"單位" + 0.011*"文字" + 0.011*"未註明" + 0.011*"聯絡" + 0.010*"內容" + 0.010*"砍除" 2025-04-19 00:10:03,717 : INFO : topic #4 (0.143): 0.034*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.012*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" 2025-04-19 00:10:03,717 : INFO : topic #2 (0.143): 0.049*"工作" + 0.021*"方式" + 0.020*"時間" + 0.017*"小時" + 0.011*"內容" + 0.010*"每日" + 0.010*"工時" + 0.009*"地點" + 0.009*"休息" + 0.008*"面試" 2025-04-19 00:10:03,718 : INFO : topic #0 (0.143): 0.030*"工作" + 0.010*"應徵" + 0.010*"徵才" + 0.010*"情形" + 
0.010*"資訊" + 0.010*"方式" + 0.009*"內容" + 0.009*"第一項" + 0.009*"空白" + 0.009*"文字" 2025-04-19 00:10:03,718 : INFO : topic #3 (0.143): 0.029*"美國" + 0.029*"晶片" + 0.021*"台積電" + 0.020*"中國" + 0.020*"半導體" + 0.019*"台灣" + 0.019*"表示" + 0.017*"投資" + 0.015*"英特爾" + 0.011*"輝達" 2025-04-19 00:10:03,718 : INFO : topic diff=0.765701, rho=0.286829 2025-04-19 00:10:03,719 : INFO : PROGRESS: pass 3, at document #4000/16310 2025-04-19 00:10:04,362 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:04,365 : INFO : topic #5 (0.143): 0.015*"公司" + 0.006*"工作" + 0.006*"技術" + 0.005*"員工" + 0.004*"問題" + 0.004*"台灣" + 0.004*"科技" + 0.004*"工程師" + 0.004*"台積" + 0.004*"面試" 2025-04-19 00:10:04,366 : INFO : topic #1 (0.143): 0.031*"工作" + 0.014*"推定" + 0.014*"方式" + 0.012*"情形" + 0.012*"單位" + 0.011*"文字" + 0.011*"未註明" + 0.010*"聯絡" + 0.010*"砍除" + 0.010*"內容" 2025-04-19 00:10:04,367 : INFO : topic #2 (0.143): 0.049*"工作" + 0.023*"方式" + 0.019*"時間" + 0.017*"小時" + 0.012*"每日" + 0.011*"內容" + 0.010*"休息" + 0.009*"工時" + 0.009*"地點" + 0.009*"依法" 2025-04-19 00:10:04,367 : INFO : topic #4 (0.143): 0.034*"工作" + 0.015*"推定" + 0.014*"空白" + 0.014*"砍除" + 0.013*"方式" + 0.012*"第一項" + 0.012*"情形" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" 2025-04-19 00:10:04,368 : INFO : topic #0 (0.143): 0.030*"工作" + 0.010*"情形" + 0.010*"資訊" + 0.010*"應徵" + 0.010*"徵才" + 0.010*"方式" + 0.009*"內容" + 0.009*"第一項" + 0.009*"空白" + 0.009*"文字" 2025-04-19 00:10:04,368 : INFO : topic diff=0.338462, rho=0.286829 2025-04-19 00:10:04,369 : INFO : PROGRESS: pass 3, at document #6000/16310 2025-04-19 00:10:04,865 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:04,869 : INFO : topic #0 (0.143): 0.030*"工作" + 0.010*"情形" + 0.010*"資訊" + 0.010*"應徵" + 0.010*"內容" + 0.010*"徵才" + 0.009*"方式" + 0.009*"第一項" + 0.009*"文字" + 0.009*"空白" 2025-04-19 00:10:04,869 : INFO : topic #3 (0.143): 0.027*"美國" + 0.027*"晶片" + 0.019*"中國" + 0.019*"台積電" + 0.018*"半導體" + 0.018*"表示" + 0.018*"台灣" + 0.016*"投資" + 0.014*"英特爾" + 
0.010*"輝達" 2025-04-19 00:10:04,870 : INFO : topic #5 (0.143): 0.015*"公司" + 0.007*"工作" + 0.005*"技術" + 0.005*"問題" + 0.005*"工程師" + 0.004*"員工" + 0.004*"台灣" + 0.004*"面試" + 0.004*"科技" + 0.004*"目前" 2025-04-19 00:10:04,870 : INFO : topic #2 (0.143): 0.048*"工作" + 0.024*"方式" + 0.019*"時間" + 0.017*"小時" + 0.013*"每日" + 0.011*"內容" + 0.011*"休息" + 0.010*"依法" + 0.010*"工資" + 0.009*"地點" 2025-04-19 00:10:04,871 : INFO : topic #1 (0.143): 0.031*"工作" + 0.014*"推定" + 0.014*"方式" + 0.012*"情形" + 0.012*"單位" + 0.011*"文字" + 0.011*"未註明" + 0.010*"聯絡" + 0.010*"內容" + 0.010*"砍除" 2025-04-19 00:10:04,871 : INFO : topic diff=0.209269, rho=0.286829 2025-04-19 00:10:04,871 : INFO : PROGRESS: pass 3, at document #8000/16310 2025-04-19 00:10:05,131 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:05,134 : INFO : topic #4 (0.143): 0.034*"工作" + 0.015*"推定" + 0.014*"空白" + 0.014*"砍除" + 0.013*"方式" + 0.012*"第一項" + 0.012*"情形" + 0.011*"資訊" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:10:05,134 : INFO : topic #0 (0.143): 0.030*"工作" + 0.010*"徵才" + 0.010*"資訊" + 0.010*"情形" + 0.010*"應徵" + 0.010*"內容" + 0.009*"方式" + 0.009*"文字" + 0.009*"第一項" + 0.009*"空白" 2025-04-19 00:10:05,135 : INFO : topic #5 (0.143): 0.017*"公司" + 0.008*"工作" + 0.007*"面試" + 0.006*"問題" + 0.006*"工程師" + 0.005*"技術" + 0.005*"開發" + 0.004*"目前" + 0.004*"比較" + 0.004*"覺得" 2025-04-19 00:10:05,136 : INFO : topic #1 (0.143): 0.031*"工作" + 0.014*"推定" + 0.014*"方式" + 0.012*"情形" + 0.012*"單位" + 0.011*"文字" + 0.011*"未註明" + 0.010*"聯絡" + 0.010*"內容" + 0.010*"砍除" 2025-04-19 00:10:05,136 : INFO : topic #6 (0.143): 0.030*"報名" + 0.027*"活動" + 0.019*"電話" + 0.015*"台北市" + 0.013*"舉辦" + 0.013*"車馬費" + 0.012*"人數" + 0.012*"資料" + 0.012*"訪問" + 0.011*"參加" 2025-04-19 00:10:05,137 : INFO : topic diff=0.287448, rho=0.286829 2025-04-19 00:10:05,137 : INFO : PROGRESS: pass 3, at document #10000/16310 2025-04-19 00:10:05,341 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:05,344 : INFO : topic #4 (0.143): 0.034*"工作" 
+ 0.015*"推定" + 0.014*"空白" + 0.014*"砍除" + 0.013*"方式" + 0.012*"第一項" + 0.012*"情形" + 0.011*"資訊" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:10:05,344 : INFO : topic #6 (0.143): 0.030*"報名" + 0.027*"活動" + 0.018*"電話" + 0.014*"台北市" + 0.013*"舉辦" + 0.013*"研究" + 0.012*"資料" + 0.012*"車馬費" + 0.011*"人數" + 0.011*"參加" 2025-04-19 00:10:05,345 : INFO : topic #2 (0.143): 0.052*"工作" + 0.023*"方式" + 0.022*"時間" + 0.018*"小時" + 0.012*"內容" + 0.011*"每日" + 0.011*"工時" + 0.009*"面試" + 0.008*"聯絡" + 0.008*"休息" 2025-04-19 00:10:05,345 : INFO : topic #5 (0.143): 0.017*"公司" + 0.009*"工作" + 0.008*"面試" + 0.007*"問題" + 0.006*"工程師" + 0.006*"開發" + 0.005*"技術" + 0.005*"目前" + 0.005*"比較" + 0.004*"覺得" 2025-04-19 00:10:05,346 : INFO : topic #3 (0.143): 0.026*"美國" + 0.025*"晶片" + 0.020*"中國" + 0.020*"半導體" + 0.019*"台灣" + 0.018*"表示" + 0.018*"台積電" + 0.016*"投資" + 0.013*"英特爾" + 0.010*"製程" 2025-04-19 00:10:05,346 : INFO : topic diff=0.259690, rho=0.286829 2025-04-19 00:10:05,346 : INFO : PROGRESS: pass 3, at document #12000/16310 2025-04-19 00:10:05,551 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:05,554 : INFO : topic #2 (0.143): 0.052*"工作" + 0.023*"方式" + 0.021*"時間" + 0.018*"小時" + 0.011*"內容" + 0.011*"工時" + 0.010*"每日" + 0.009*"面試" + 0.008*"地點" + 0.008*"聯絡" 2025-04-19 00:10:05,555 : INFO : topic #3 (0.143): 0.026*"晶片" + 0.022*"半導體" + 0.020*"台灣" + 0.019*"美國" + 0.018*"台積電" + 0.017*"表示" + 0.016*"中國" + 0.012*"投資" + 0.011*"製程" + 0.011*"產業" 2025-04-19 00:10:05,555 : INFO : topic #0 (0.143): 0.029*"工作" + 0.010*"徵才" + 0.010*"資訊" + 0.010*"情形" + 0.010*"應徵" + 0.009*"內容" + 0.009*"方式" + 0.009*"文字" + 0.009*"第一項" + 0.009*"空白" 2025-04-19 00:10:05,556 : INFO : topic #4 (0.143): 0.033*"工作" + 0.015*"推定" + 0.014*"空白" + 0.014*"砍除" + 0.013*"方式" + 0.012*"第一項" + 0.012*"情形" + 0.011*"資訊" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:10:05,556 : INFO : topic #1 (0.143): 0.031*"工作" + 0.014*"推定" + 0.014*"方式" + 0.012*"情形" + 0.012*"單位" + 0.011*"文字" + 0.011*"未註明" + 0.010*"聯絡" + 0.010*"內容" + 0.010*"第一項" 2025-04-19 
00:10:05,557 : INFO : topic diff=0.312646, rho=0.286829 2025-04-19 00:10:05,557 : INFO : PROGRESS: pass 3, at document #14000/16310 2025-04-19 00:10:05,750 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:05,753 : INFO : topic #4 (0.143): 0.033*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.012*"第一項" + 0.011*"情形" + 0.011*"資訊" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:10:05,754 : INFO : topic #0 (0.143): 0.028*"工作" + 0.011*"徵才" + 0.010*"資訊" + 0.010*"情形" + 0.010*"應徵" + 0.009*"內容" + 0.009*"方式" + 0.009*"文字" + 0.009*"第一項" + 0.009*"空白" 2025-04-19 00:10:05,754 : INFO : topic #6 (0.143): 0.026*"活動" + 0.025*"報名" + 0.016*"研究" + 0.014*"電話" + 0.012*"問卷" + 0.011*"舉辦" + 0.011*"參與" + 0.011*"進行" + 0.010*"參加" + 0.010*"資料" 2025-04-19 00:10:05,755 : INFO : topic #5 (0.143): 0.015*"公司" + 0.008*"工作" + 0.005*"問題" + 0.005*"面試" + 0.005*"工程師" + 0.005*"技術" + 0.004*"目前" + 0.004*"開發" + 0.004*"員工" + 0.004*"比較" 2025-04-19 00:10:05,755 : INFO : topic #1 (0.143): 0.031*"工作" + 0.014*"推定" + 0.014*"方式" + 0.012*"情形" + 0.012*"單位" + 0.011*"文字" + 0.011*"未註明" + 0.010*"聯絡" + 0.010*"內容" + 0.010*"第一項" 2025-04-19 00:10:05,756 : INFO : topic diff=0.295359, rho=0.286829 2025-04-19 00:10:05,756 : INFO : PROGRESS: pass 3, at document #16000/16310 2025-04-19 00:10:05,964 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:05,968 : INFO : topic #2 (0.143): 0.049*"工作" + 0.020*"時間" + 0.019*"方式" + 0.017*"小時" + 0.011*"工時" + 0.011*"內容" + 0.009*"地點" + 0.008*"每日" + 0.008*"面試" + 0.007*"經驗" 2025-04-19 00:10:05,968 : INFO : topic #0 (0.143): 0.028*"工作" + 0.010*"徵才" + 0.010*"情形" + 0.010*"資訊" + 0.009*"應徵" + 0.009*"內容" + 0.009*"方式" + 0.009*"文字" + 0.008*"第一項" + 0.008*"空白" 2025-04-19 00:10:05,968 : INFO : topic #6 (0.143): 0.024*"活動" + 0.023*"報名" + 0.017*"研究" + 0.011*"電話" + 0.011*"問卷" + 0.010*"進行" + 0.010*"舉辦" + 0.010*"參與" + 0.010*"參加" + 0.009*"資料" 2025-04-19 00:10:05,969 : INFO : topic #4 (0.143): 0.033*"工作" + 0.015*"推定" + 
0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.012*"第一項" + 0.011*"情形" + 0.011*"資訊" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:10:05,970 : INFO : topic #3 (0.143): 0.025*"晶片" + 0.023*"美國" + 0.021*"台灣" + 0.020*"半導體" + 0.019*"表示" + 0.018*"台積電" + 0.018*"中國" + 0.014*"英特爾" + 0.011*"產業" + 0.011*"全球" 2025-04-19 00:10:05,970 : INFO : topic diff=0.249440, rho=0.286829 2025-04-19 00:10:06,036 : INFO : -8.362 per-word bound, 329.1 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:10:06,036 : INFO : PROGRESS: pass 3, at document #16310/16310 2025-04-19 00:10:06,067 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:10:06,070 : INFO : topic #1 (0.143): 0.030*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"單位" + 0.012*"情形" + 0.011*"文字" + 0.011*"未註明" + 0.010*"聯絡" + 0.010*"內容" + 0.010*"第一項" 2025-04-19 00:10:06,070 : INFO : topic #2 (0.143): 0.049*"工作" + 0.019*"時間" + 0.018*"小時" + 0.017*"方式" + 0.012*"工時" + 0.010*"內容" + 0.008*"地點" + 0.007*"每日" + 0.007*"面試" + 0.007*"經驗" 2025-04-19 00:10:06,071 : INFO : topic #0 (0.143): 0.027*"工作" + 0.011*"徵才" + 0.010*"資訊" + 0.010*"情形" + 0.009*"應徵" + 0.009*"內容" + 0.009*"水桶" + 0.009*"方式" + 0.008*"文字" + 0.008*"聯絡" 2025-04-19 00:10:06,072 : INFO : topic #5 (0.143): 0.015*"公司" + 0.006*"工作" + 0.006*"技術" + 0.005*"員工" + 0.005*"問題" + 0.004*"工程師" + 0.004*"台積" + 0.004*"科技" + 0.004*"面試" + 0.004*"目前" 2025-04-19 00:10:06,072 : INFO : topic #3 (0.143): 0.031*"美國" + 0.026*"晶片" + 0.022*"台灣" + 0.021*"台積電" + 0.018*"中國" + 0.018*"表示" + 0.017*"半導體" + 0.016*"投資" + 0.014*"英特爾" + 0.011*"產業" 2025-04-19 00:10:06,072 : INFO : topic diff=0.238406, rho=0.286829 2025-04-19 00:10:06,073 : INFO : PROGRESS: pass 4, at document #2000/16310 2025-04-19 00:10:06,692 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:06,695 : INFO : topic #2 (0.143): 0.049*"工作" + 0.021*"方式" + 0.020*"時間" + 0.017*"小時" + 0.011*"內容" + 0.011*"工時" + 0.010*"每日" + 0.009*"地點" + 0.009*"休息" + 0.008*"面試" 
2025-04-19 00:10:06,695 : INFO : topic #0 (0.143): 0.027*"工作" + 0.010*"徵才" + 0.010*"資訊" + 0.010*"情形" + 0.009*"內容" + 0.009*"應徵" + 0.009*"文字" + 0.009*"方式" + 0.008*"水桶" + 0.008*"聯絡" 2025-04-19 00:10:06,696 : INFO : topic #6 (0.143): 0.026*"報名" + 0.025*"活動" + 0.016*"電話" + 0.013*"台北市" + 0.013*"舉辦" + 0.013*"研究" + 0.012*"參與" + 0.011*"車馬費" + 0.010*"人數" + 0.010*"進行" 2025-04-19 00:10:06,696 : INFO : topic #1 (0.143): 0.031*"工作" + 0.014*"推定" + 0.014*"方式" + 0.012*"單位" + 0.012*"情形" + 0.011*"文字" + 0.011*"未註明" + 0.010*"聯絡" + 0.010*"內容" + 0.010*"應徵" 2025-04-19 00:10:06,697 : INFO : topic #4 (0.143): 0.033*"工作" + 0.015*"推定" + 0.014*"空白" + 0.014*"砍除" + 0.013*"方式" + 0.012*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" 2025-04-19 00:10:06,697 : INFO : topic diff=0.708066, rho=0.275711 2025-04-19 00:10:06,697 : INFO : PROGRESS: pass 4, at document #4000/16310 2025-04-19 00:10:07,294 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:07,297 : INFO : topic #5 (0.143): 0.015*"公司" + 0.006*"工作" + 0.006*"技術" + 0.005*"員工" + 0.005*"問題" + 0.004*"工程師" + 0.004*"科技" + 0.004*"台積" + 0.004*"面試" + 0.004*"目前" 2025-04-19 00:10:07,298 : INFO : topic #2 (0.143): 0.049*"工作" + 0.023*"方式" + 0.019*"時間" + 0.017*"小時" + 0.012*"每日" + 0.011*"內容" + 0.010*"休息" + 0.010*"工時" + 0.009*"地點" + 0.009*"依法" 2025-04-19 00:10:07,298 : INFO : topic #1 (0.143): 0.031*"工作" + 0.015*"推定" + 0.014*"方式" + 0.012*"單位" + 0.012*"情形" + 0.011*"未註明" + 0.011*"文字" + 0.010*"聯絡" + 0.010*"第一項" + 0.010*"內容" 2025-04-19 00:10:07,299 : INFO : topic #3 (0.143): 0.030*"美國" + 0.025*"晶片" + 0.022*"台灣" + 0.020*"台積電" + 0.018*"中國" + 0.017*"表示" + 0.017*"半導體" + 0.015*"投資" + 0.014*"英特爾" + 0.011*"產業" 2025-04-19 00:10:07,299 : INFO : topic #4 (0.143): 0.033*"工作" + 0.015*"推定" + 0.014*"空白" + 0.014*"砍除" + 0.013*"方式" + 0.012*"第一項" + 0.011*"情形" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:10:07,299 : INFO : topic diff=0.327185, rho=0.275711 2025-04-19 00:10:07,300 : INFO : PROGRESS: pass 4, at document 
#6000/16310 2025-04-19 00:10:07,782 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:07,786 : INFO : topic #5 (0.143): 0.016*"公司" + 0.007*"工作" + 0.006*"技術" + 0.005*"問題" + 0.005*"工程師" + 0.004*"員工" + 0.004*"面試" + 0.004*"目前" + 0.004*"產品" + 0.004*"知道" 2025-04-19 00:10:07,786 : INFO : topic #2 (0.143): 0.048*"工作" + 0.024*"方式" + 0.019*"時間" + 0.017*"小時" + 0.012*"每日" + 0.011*"內容" + 0.011*"休息" + 0.010*"依法" + 0.009*"工資" + 0.009*"工時" 2025-04-19 00:10:07,787 : INFO : topic #4 (0.143): 0.033*"工作" + 0.015*"推定" + 0.014*"空白" + 0.014*"砍除" + 0.013*"方式" + 0.012*"第一項" + 0.012*"情形" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:10:07,787 : INFO : topic #6 (0.143): 0.031*"報名" + 0.028*"活動" + 0.020*"電話" + 0.016*"台北市" + 0.014*"車馬費" + 0.013*"舉辦" + 0.013*"人數" + 0.012*"訪問" + 0.012*"資料" + 0.011*"參與" 2025-04-19 00:10:07,787 : INFO : topic #0 (0.143): 0.027*"工作" + 0.010*"資訊" + 0.010*"徵才" + 0.010*"情形" + 0.009*"內容" + 0.009*"文字" + 0.009*"分類" + 0.009*"應徵" + 0.008*"方式" + 0.008*"水桶" 2025-04-19 00:10:07,788 : INFO : topic diff=0.199973, rho=0.275711 2025-04-19 00:10:07,788 : INFO : PROGRESS: pass 4, at document #8000/16310 2025-04-19 00:10:08,047 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:08,050 : INFO : topic #4 (0.143): 0.033*"工作" + 0.015*"推定" + 0.014*"空白" + 0.014*"砍除" + 0.013*"方式" + 0.012*"第一項" + 0.012*"情形" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:10:08,051 : INFO : topic #1 (0.143): 0.031*"工作" + 0.015*"推定" + 0.013*"方式" + 0.012*"單位" + 0.012*"情形" + 0.011*"未註明" + 0.011*"文字" + 0.010*"聯絡" + 0.010*"第一項" + 0.010*"內容" 2025-04-19 00:10:08,051 : INFO : topic #6 (0.143): 0.030*"報名" + 0.028*"活動" + 0.019*"電話" + 0.015*"台北市" + 0.013*"舉辦" + 0.013*"車馬費" + 0.012*"人數" + 0.012*"資料" + 0.012*"訪問" + 0.011*"參加" 2025-04-19 00:10:08,052 : INFO : topic #5 (0.143): 0.017*"公司" + 0.008*"工作" + 0.007*"面試" + 0.006*"問題" + 0.006*"工程師" + 0.005*"技術" + 0.005*"開發" + 0.004*"目前" + 0.004*"比較" + 0.004*"產品" 2025-04-19 
00:10:08,052 : INFO : topic #2 (0.143): 0.051*"工作" + 0.024*"方式" + 0.021*"時間" + 0.017*"小時" + 0.012*"內容" + 0.011*"每日" + 0.010*"工時" + 0.009*"休息" + 0.009*"面試" + 0.008*"聯絡" 2025-04-19 00:10:08,053 : INFO : topic diff=0.271208, rho=0.275711 2025-04-19 00:10:08,053 : INFO : PROGRESS: pass 4, at document #10000/16310 2025-04-19 00:10:08,258 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:08,261 : INFO : topic #2 (0.143): 0.052*"工作" + 0.023*"方式" + 0.021*"時間" + 0.018*"小時" + 0.012*"內容" + 0.011*"工時" + 0.011*"每日" + 0.009*"面試" + 0.008*"地點" + 0.008*"聯絡" 2025-04-19 00:10:08,261 : INFO : topic #1 (0.143): 0.031*"工作" + 0.015*"推定" + 0.013*"方式" + 0.012*"單位" + 0.012*"情形" + 0.011*"未註明" + 0.011*"文字" + 0.010*"聯絡" + 0.010*"第一項" + 0.010*"內容" 2025-04-19 00:10:08,262 : INFO : topic #5 (0.143): 0.017*"公司" + 0.009*"工作" + 0.008*"面試" + 0.007*"問題" + 0.006*"工程師" + 0.006*"開發" + 0.005*"技術" + 0.005*"目前" + 0.005*"比較" + 0.004*"覺得" 2025-04-19 00:10:08,262 : INFO : topic #4 (0.143): 0.033*"工作" + 0.015*"推定" + 0.014*"空白" + 0.014*"砍除" + 0.013*"方式" + 0.012*"第一項" + 0.011*"情形" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:10:08,263 : INFO : topic #3 (0.143): 0.029*"美國" + 0.023*"台灣" + 0.022*"晶片" + 0.018*"中國" + 0.017*"台積電" + 0.017*"半導體" + 0.016*"表示" + 0.015*"投資" + 0.012*"英特爾" + 0.011*"產業" 2025-04-19 00:10:08,263 : INFO : topic diff=0.246010, rho=0.275711 2025-04-19 00:10:08,263 : INFO : PROGRESS: pass 4, at document #12000/16310 2025-04-19 00:10:08,455 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:08,458 : INFO : topic #6 (0.143): 0.029*"報名" + 0.029*"活動" + 0.016*"電話" + 0.014*"研究" + 0.013*"舉辦" + 0.012*"台北市" + 0.011*"參加" + 0.011*"資料" + 0.011*"問卷" + 0.011*"參與" 2025-04-19 00:10:08,459 : INFO : topic #2 (0.143): 0.051*"工作" + 0.022*"方式" + 0.021*"時間" + 0.018*"小時" + 0.011*"內容" + 0.011*"工時" + 0.010*"每日" + 0.008*"面試" + 0.008*"地點" + 0.008*"經驗" 2025-04-19 00:10:08,459 : INFO : topic #5 (0.143): 0.016*"公司" + 0.008*"工作" + 
0.007*"面試" + 0.006*"問題" + 0.006*"工程師" + 0.005*"開發" + 0.005*"技術" + 0.005*"目前" + 0.004*"比較" + 0.004*"軟體" 2025-04-19 00:10:08,460 : INFO : topic #1 (0.143): 0.031*"工作" + 0.015*"推定" + 0.013*"方式" + 0.012*"單位" + 0.012*"情形" + 0.011*"文字" + 0.011*"未註明" + 0.010*"聯絡" + 0.010*"第一項" + 0.010*"內容" 2025-04-19 00:10:08,460 : INFO : topic #3 (0.143): 0.023*"晶片" + 0.022*"台灣" + 0.021*"美國" + 0.020*"半導體" + 0.017*"台積電" + 0.016*"表示" + 0.015*"中國" + 0.012*"產業" + 0.011*"投資" + 0.011*"全球" 2025-04-19 00:10:08,460 : INFO : topic diff=0.288108, rho=0.275711 2025-04-19 00:10:08,461 : INFO : PROGRESS: pass 4, at document #14000/16310 2025-04-19 00:10:08,679 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:08,682 : INFO : topic #0 (0.143): 0.025*"工作" + 0.011*"徵才" + 0.010*"資訊" + 0.010*"水桶" + 0.009*"情形" + 0.009*"文字" + 0.009*"內容" + 0.008*"應徵" + 0.008*"分類" + 0.008*"方式" 2025-04-19 00:10:08,682 : INFO : topic #6 (0.143): 0.027*"活動" + 0.026*"報名" + 0.015*"研究" + 0.014*"電話" + 0.012*"問卷" + 0.011*"舉辦" + 0.011*"參與" + 0.011*"參加" + 0.010*"台北市" + 0.010*"進行" 2025-04-19 00:10:08,683 : INFO : topic #4 (0.143): 0.033*"工作" + 0.015*"推定" + 0.014*"空白" + 0.014*"砍除" + 0.013*"方式" + 0.012*"第一項" + 0.011*"情形" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:10:08,683 : INFO : topic #5 (0.143): 0.015*"公司" + 0.008*"工作" + 0.005*"問題" + 0.005*"面試" + 0.005*"工程師" + 0.005*"技術" + 0.004*"目前" + 0.004*"開發" + 0.004*"員工" + 0.004*"比較" 2025-04-19 00:10:08,684 : INFO : topic #1 (0.143): 0.031*"工作" + 0.015*"推定" + 0.013*"方式" + 0.012*"單位" + 0.012*"情形" + 0.011*"文字" + 0.011*"未註明" + 0.010*"聯絡" + 0.010*"內容" + 0.010*"第一項" 2025-04-19 00:10:08,684 : INFO : topic diff=0.277037, rho=0.275711 2025-04-19 00:10:08,684 : INFO : PROGRESS: pass 4, at document #16000/16310 2025-04-19 00:10:08,863 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:08,866 : INFO : topic #0 (0.143): 0.024*"工作" + 0.011*"徵才" + 0.010*"資訊" + 0.010*"水桶" + 0.009*"情形" + 0.008*"內容" + 0.008*"文字" + 
0.008*"應徵" + 0.008*"分類" + 0.008*"方式" 2025-04-19 00:10:08,866 : INFO : topic #3 (0.143): 0.024*"晶片" + 0.023*"美國" + 0.022*"台灣" + 0.018*"半導體" + 0.018*"表示" + 0.017*"台積電" + 0.017*"中國" + 0.013*"英特爾" + 0.012*"產業" + 0.011*"全球" 2025-04-19 00:10:08,867 : INFO : topic #4 (0.143): 0.033*"工作" + 0.015*"推定" + 0.014*"空白" + 0.014*"砍除" + 0.012*"方式" + 0.012*"第一項" + 0.011*"情形" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:10:08,867 : INFO : topic #6 (0.143): 0.025*"活動" + 0.024*"報名" + 0.017*"研究" + 0.012*"電話" + 0.011*"問卷" + 0.010*"舉辦" + 0.010*"參與" + 0.010*"參加" + 0.010*"進行" + 0.010*"資料" 2025-04-19 00:10:08,868 : INFO : topic #1 (0.143): 0.031*"工作" + 0.015*"推定" + 0.013*"方式" + 0.012*"單位" + 0.012*"情形" + 0.011*"文字" + 0.011*"未註明" + 0.010*"聯絡" + 0.010*"內容" + 0.010*"第一項" 2025-04-19 00:10:08,868 : INFO : topic diff=0.234404, rho=0.275711 2025-04-19 00:10:08,934 : INFO : -8.356 per-word bound, 327.6 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:10:08,934 : INFO : PROGRESS: pass 4, at document #16310/16310 2025-04-19 00:10:08,991 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:10:08,994 : INFO : topic #1 (0.143): 0.030*"工作" + 0.015*"推定" + 0.013*"方式" + 0.012*"單位" + 0.012*"情形" + 0.011*"文字" + 0.011*"未註明" + 0.010*"聯絡" + 0.010*"內容" + 0.010*"第一項" 2025-04-19 00:10:08,995 : INFO : topic #2 (0.143): 0.049*"工作" + 0.019*"時間" + 0.018*"小時" + 0.017*"方式" + 0.013*"工時" + 0.010*"內容" + 0.008*"地點" + 0.007*"每日" + 0.007*"經驗" + 0.007*"面試" 2025-04-19 00:10:08,996 : INFO : topic #0 (0.143): 0.023*"工作" + 0.011*"徵才" + 0.010*"水桶" + 0.010*"資訊" + 0.009*"詐騙" + 0.009*"情形" + 0.008*"內容" + 0.008*"應徵" + 0.008*"文字" + 0.007*"分類" 2025-04-19 00:10:08,996 : INFO : topic #3 (0.143): 0.030*"美國" + 0.024*"晶片" + 0.023*"台灣" + 0.020*"台積電" + 0.017*"表示" + 0.017*"中國" + 0.016*"半導體" + 0.015*"投資" + 0.013*"英特爾" + 0.011*"產業" 2025-04-19 00:10:08,997 : INFO : topic #5 (0.143): 0.015*"公司" + 0.006*"工作" + 0.006*"技術" + 0.005*"員工" + 0.005*"問題" + 0.004*"工程師" + 
0.004*"台積" + 0.004*"面試" + 0.004*"目前" + 0.004*"報導" 2025-04-19 00:10:08,997 : INFO : topic diff=0.222449, rho=0.275711 2025-04-19 00:10:08,997 : INFO : LdaModel lifecycle event {'msg': 'trained LdaModel<num_terms=23261, num_topics=7, decay=0.5, chunksize=2000> in 16.61s', 'datetime': '2025-04-19T00:10:08.997783', 'gensim': '4.3.3', 'python': '3.11.2 (main, Apr 21 2023, 22:51:21) [Clang 14.0.3 (clang-1403.0.22.14.1)]', 'platform': 'macOS-15.3.2-arm64-arm-64bit', 'event': 'created'} 2025-04-19 00:10:14,125 : INFO : -7.022 per-word bound, 130.0 perplexity estimate based on a held-out corpus of 16310 documents with 3460358 words 2025-04-19 00:10:14,128 : INFO : using ParallelWordOccurrenceAccumulator<processes=7, batch_size=64> to estimate probabilities from sliding windows 2025-04-19 00:10:20,774 : INFO : 
207 batches submitted to accumulate stats from 13248 documents (3029004 virtual) 2025-04-19 00:10:20,777 : INFO : 208 batches submitted to accumulate stats from 13312 documents (3037489 virtual) 2025-04-19 00:10:20,783 : INFO : 209 batches submitted to accumulate stats from 13376 documents (3044929 virtual) 2025-04-19 00:10:20,808 : INFO : 210 batches submitted to accumulate stats from 13440 documents (3054034 virtual) 2025-04-19 00:10:20,811 : INFO : 211 batches submitted to accumulate stats from 13504 documents (3064099 virtual) 2025-04-19 00:10:20,813 : INFO : 212 batches submitted to accumulate stats from 13568 documents (3074522 virtual) 2025-04-19 00:10:20,819 : INFO : 213 batches submitted to accumulate stats from 13632 documents (3083808 virtual) 2025-04-19 00:10:20,865 : INFO : 214 batches submitted to accumulate stats from 13696 documents (3093078 virtual) 2025-04-19 00:10:20,867 : INFO : 215 batches submitted to accumulate stats from 13760 documents (3102171 virtual) 2025-04-19 00:10:20,883 : INFO : 216 batches submitted to accumulate stats from 13824 documents (3111128 virtual) 2025-04-19 00:10:20,886 : INFO : 217 batches submitted to accumulate stats from 13888 documents (3120517 virtual) 2025-04-19 00:10:20,890 : INFO : 218 batches submitted to accumulate stats from 13952 documents (3130614 virtual) 2025-04-19 00:10:20,894 : INFO : 219 batches submitted to accumulate stats from 14016 documents (3139268 virtual) 2025-04-19 00:10:20,899 : INFO : 220 batches submitted to accumulate stats from 14080 documents (3148635 virtual) 2025-04-19 00:10:20,914 : INFO : 221 batches submitted to accumulate stats from 14144 documents (3157335 virtual) 2025-04-19 00:10:20,924 : INFO : 222 batches submitted to accumulate stats from 14208 documents (3165838 virtual) 2025-04-19 00:10:20,931 : INFO : 223 batches submitted to accumulate stats from 14272 documents (3175765 virtual) 2025-04-19 00:10:20,936 : INFO : 224 batches submitted to accumulate stats from 14336 
documents (3183123 virtual) 2025-04-19 00:10:20,947 : INFO : 225 batches submitted to accumulate stats from 14400 documents (3189537 virtual) 2025-04-19 00:10:20,951 : INFO : 226 batches submitted to accumulate stats from 14464 documents (3197239 virtual) 2025-04-19 00:10:20,953 : INFO : 227 batches submitted to accumulate stats from 14528 documents (3205518 virtual) 2025-04-19 00:10:20,961 : INFO : 228 batches submitted to accumulate stats from 14592 documents (3215608 virtual) 2025-04-19 00:10:20,965 : INFO : 229 batches submitted to accumulate stats from 14656 documents (3223376 virtual) 2025-04-19 00:10:21,007 : INFO : 230 batches submitted to accumulate stats from 14720 documents (3232304 virtual) 2025-04-19 00:10:21,020 : INFO : 231 batches submitted to accumulate stats from 14784 documents (3240270 virtual) 2025-04-19 00:10:21,026 : INFO : 232 batches submitted to accumulate stats from 14848 documents (3249755 virtual) 2025-04-19 00:10:21,032 : INFO : 233 batches submitted to accumulate stats from 14912 documents (3259377 virtual) 2025-04-19 00:10:21,037 : INFO : 234 batches submitted to accumulate stats from 14976 documents (3269637 virtual) 2025-04-19 00:10:21,043 : INFO : 235 batches submitted to accumulate stats from 15040 documents (3278311 virtual) 2025-04-19 00:10:21,051 : INFO : 236 batches submitted to accumulate stats from 15104 documents (3286321 virtual) 2025-04-19 00:10:21,053 : INFO : 237 batches submitted to accumulate stats from 15168 documents (3293385 virtual) 2025-04-19 00:10:21,058 : INFO : 238 batches submitted to accumulate stats from 15232 documents (3300334 virtual) 2025-04-19 00:10:21,063 : INFO : 239 batches submitted to accumulate stats from 15296 documents (3308226 virtual) 2025-04-19 00:10:21,074 : INFO : 240 batches submitted to accumulate stats from 15360 documents (3317325 virtual) 2025-04-19 00:10:21,093 : INFO : 241 batches submitted to accumulate stats from 15424 documents (3325778 virtual) 2025-04-19 00:10:21,100 : INFO : 
242 batches submitted to accumulate stats from 15488 documents (3335373 virtual) 2025-04-19 00:10:21,102 : INFO : 243 batches submitted to accumulate stats from 15552 documents (3342716 virtual) 2025-04-19 00:10:21,105 : INFO : 244 batches submitted to accumulate stats from 15616 documents (3350508 virtual) 2025-04-19 00:10:21,112 : INFO : 245 batches submitted to accumulate stats from 15680 documents (3360131 virtual) 2025-04-19 00:10:21,116 : INFO : 246 batches submitted to accumulate stats from 15744 documents (3370635 virtual) 2025-04-19 00:10:21,138 : INFO : 247 batches submitted to accumulate stats from 15808 documents (3380994 virtual) 2025-04-19 00:10:21,140 : INFO : 248 batches submitted to accumulate stats from 15872 documents (3389920 virtual) 2025-04-19 00:10:21,144 : INFO : 249 batches submitted to accumulate stats from 15936 documents (3397487 virtual) 2025-04-19 00:10:21,146 : INFO : 250 batches submitted to accumulate stats from 16000 documents (3406129 virtual) 2025-04-19 00:10:21,148 : INFO : 251 batches submitted to accumulate stats from 16064 documents (3416805 virtual) 2025-04-19 00:10:21,155 : INFO : 252 batches submitted to accumulate stats from 16128 documents (3426189 virtual) 2025-04-19 00:10:21,167 : INFO : 253 batches submitted to accumulate stats from 16192 documents (3433824 virtual) 2025-04-19 00:10:21,179 : INFO : 254 batches submitted to accumulate stats from 16256 documents (3443379 virtual) 2025-04-19 00:10:21,187 : INFO : 255 batches submitted to accumulate stats from 16320 documents (3450914 virtual) 2025-04-19 00:10:21,350 : INFO : 7 accumulators retrieved from output queue 2025-04-19 00:10:21,360 : INFO : accumulated word occurrence stats for 3451622 virtual documents 2025-04-19 00:10:21,448 : INFO : using symmetric alpha at 0.125 2025-04-19 00:10:21,448 : INFO : using symmetric eta at 0.125 2025-04-19 00:10:21,449 : INFO : using serial LDA version on this node 2025-04-19 00:10:21,457 : INFO : running online (multi-pass) LDA 
training, 8 topics, 5 passes over the supplied corpus of 16310 documents, updating model once every 2000 documents, evaluating perplexity every 16310 documents, iterating 50x with a convergence threshold of 0.001000
2025-04-19 00:10:21,457 : INFO : PROGRESS: pass 0, at document #2000/16310
2025-04-19 00:10:22,102 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:22,106 : INFO : topic #3 (0.125): 0.021*"工作" + 0.013*"方式" + 0.012*"砍除" + 0.011*"應徵" + 0.010*"聯絡人" + 0.010*"推定" + 0.010*"文字" + 0.009*"空白" + 0.009*"情形" + 0.009*"資訊"
2025-04-19 00:10:22,106 : INFO : topic #4 (0.125): 0.039*"工作" + 0.017*"推定" + 0.013*"空白" + 0.012*"方式" + 0.011*"聯絡" + 0.010*"單位" + 0.010*"聯絡人" + 0.010*"第一項" + 0.009*"國定假日" + 0.009*"內容"
2025-04-19 00:10:22,107 : INFO : topic #2 (0.125): 0.041*"工作" + 0.013*"內容" + 0.013*"推定" + 0.012*"工資" + 0.012*"方式" + 0.012*"應徵" + 0.010*"情形" + 0.010*"小時" + 0.010*"砍除" + 0.010*"聯絡"
2025-04-19 00:10:22,107 : INFO : topic #0 (0.125): 0.030*"工作" + 0.015*"方式" + 0.014*"應徵" + 0.012*"推定" + 0.012*"單位" + 0.012*"砍除" + 0.012*"空白" + 0.010*"內容" + 0.010*"資訊" + 0.009*"聯絡"
2025-04-19 00:10:22,108 : INFO : topic #7 (0.125): 0.026*"工作" + 0.014*"空白" + 0.013*"推定" + 0.011*"聯絡" + 0.011*"砍除" + 0.011*"第一項" + 0.011*"情形" + 0.011*"內容" + 0.011*"資訊" + 0.010*"方式"
2025-04-19 00:10:22,108 : INFO : topic diff=8.045865, rho=1.000000
2025-04-19 00:10:22,109 : INFO : PROGRESS: pass 0, at document #4000/16310
2025-04-19 00:10:22,686 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:22,689 : INFO : topic #5 (0.125): 0.019*"工作" + 0.016*"方式" + 0.010*"應徵" + 0.010*"依法" + 0.009*"聯絡" + 0.009*"標題" + 0.008*"通知" + 0.008*"時間" + 0.008*"內容" + 0.008*"小時"
2025-04-19 00:10:22,690 : INFO : topic #2 (0.125): 0.043*"工作" + 0.015*"方式" + 0.014*"推定" + 0.013*"工資" + 0.013*"內容" + 0.012*"小時" + 0.011*"單位" + 0.011*"應徵" + 0.011*"未註明" + 0.010*"依法"
2025-04-19 00:10:22,691 : INFO : topic #7 (0.125): 0.026*"工作" + 0.014*"空白" + 0.012*"推定" + 0.011*"聯絡" + 0.011*"砍除" + 0.011*"第一項" + 0.011*"情形" + 0.011*"內容" + 0.010*"資訊" + 0.009*"方式"
2025-04-19 00:10:22,691 : INFO : topic #4 (0.125): 0.038*"工作" + 0.017*"推定" + 0.015*"空白" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"方式" + 0.011*"情形" + 0.010*"聯絡人" + 0.010*"國定假日" + 0.010*"單位"
2025-04-19 00:10:22,692 : INFO : topic #3 (0.125): 0.019*"工作" + 0.013*"方式" + 0.011*"砍除" + 0.010*"應徵" + 0.010*"聯絡人" + 0.009*"文字" + 0.008*"推定" + 0.008*"資訊" + 0.008*"分類" + 0.008*"空白"
2025-04-19 00:10:22,692 : INFO : topic diff=0.742718, rho=0.707107
2025-04-19 00:10:22,693 : INFO : PROGRESS: pass 0, at document #6000/16310
2025-04-19 00:10:23,224 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:23,228 : INFO : topic #7 (0.125): 0.025*"工作" + 0.013*"空白" + 0.011*"推定" + 0.011*"內容" + 0.010*"聯絡" + 0.010*"第一項" + 0.010*"資訊" + 0.010*"情形" + 0.010*"砍除" + 0.009*"文字"
2025-04-19 00:10:23,228 : INFO : topic #6 (0.125): 0.026*"報名" + 0.023*"活動" + 0.019*"電話" + 0.015*"台北市" + 0.012*"車馬費" + 0.012*"資料" + 0.012*"人數" + 0.011*"訪問" + 0.011*"時間" + 0.011*"舉辦"
2025-04-19 00:10:23,229 : INFO : topic #0 (0.125): 0.030*"工作" + 0.013*"應徵" + 0.013*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.011*"第一項" + 0.011*"推定" + 0.011*"資訊" + 0.010*"單位" + 0.010*"內容"
2025-04-19 00:10:23,229 : INFO : topic #5 (0.125): 0.022*"工作" + 0.017*"方式" + 0.009*"時間" + 0.009*"依法" + 0.008*"通知" + 0.007*"應徵" + 0.007*"聯絡" + 0.007*"每日" + 0.007*"面試" + 0.007*"內容"
2025-04-19 00:10:23,230 : INFO : topic #1 (0.125): 0.030*"工作" + 0.015*"方式" + 0.013*"砍除" + 0.012*"推定" + 0.012*"情形" + 0.012*"第一項" + 0.011*"國定假日" + 0.011*"聯絡" + 0.011*"文字" + 0.011*"單位"
2025-04-19 00:10:23,230 : INFO : topic diff=0.607741, rho=0.577350
2025-04-19 00:10:23,231 : INFO : PROGRESS: pass 0, at document #8000/16310
2025-04-19 00:10:23,543 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:23,547 : INFO : topic #5 (0.125): 0.018*"工作" + 0.016*"公司" + 0.012*"面試" + 0.010*"工程師" + 0.009*"經驗" + 0.008*"問題" + 0.008*"團隊" + 0.008*"時間" + 0.007*"方式" + 0.007*"技術"
2025-04-19 00:10:23,547 : INFO : topic #2 (0.125): 0.037*"工作" + 0.012*"公司" + 0.011*"方式" + 0.009*"內容" + 0.009*"小時" + 0.009*"時間" + 0.008*"面試" + 0.008*"推定" + 0.007*"工資" + 0.007*"覺得"
2025-04-19 00:10:23,548 : INFO : topic #3 (0.125): 0.014*"工作" + 0.011*"方式" + 0.009*"公司" + 0.008*"聯絡人" + 0.008*"資訊" + 0.008*"時間" + 0.008*"砍除" + 0.008*"研發" + 0.007*"文字" + 0.007*"分類"
2025-04-19 00:10:23,548 : INFO : topic #0 (0.125): 0.030*"工作" + 0.013*"應徵" + 0.013*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.011*"第一項" + 0.011*"推定" + 0.011*"資訊" + 0.010*"單位" + 0.010*"內容"
2025-04-19 00:10:23,549 : INFO : topic #7 (0.125): 0.025*"工作" + 0.012*"空白" + 0.011*"推定" + 0.010*"內容" + 0.010*"聯絡" + 0.010*"第一項" + 0.010*"資訊" + 0.010*"情形" + 0.010*"砍除" + 0.009*"文字"
2025-04-19 00:10:23,549 : INFO : topic diff=0.746824, rho=0.500000
2025-04-19 00:10:23,550 : INFO : PROGRESS: pass 0, at document #10000/16310
2025-04-19 00:10:23,814 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:23,833 : INFO : topic #6 (0.125): 0.016*"產品" + 0.013*"資料" + 0.013*"報名" + 0.012*"活動" + 0.012*"公司" + 0.012*"使用" + 0.010*"目前" + 0.009*"進行" + 0.009*"時間" + 0.009*"電話"
2025-04-19 00:10:23,839 : INFO : topic #3 (0.125): 0.014*"工作" + 0.010*"方式" + 0.010*"資工" + 0.010*"職場" + 0.010*"研發" + 0.009*"數學" + 0.008*"公司" + 0.008*"資訊" + 0.008*"聯絡人" + 0.008*"時間"
2025-04-19 00:10:23,842 : INFO : topic #2 (0.125): 0.033*"工作" + 0.012*"公司" + 0.010*"覺得" + 0.008*"方式" + 0.008*"程式" + 0.008*"內容" + 0.008*"時間" + 0.008*"面試" + 0.007*"小時" + 0.007*"比較"
2025-04-19 00:10:23,843 : INFO : topic #7 (0.125): 0.024*"工作" + 0.012*"空白" + 0.010*"推定" + 0.010*"內容" + 0.010*"聯絡" + 0.010*"砍除" + 0.009*"第一項" + 0.009*"資訊" + 0.009*"情形" + 0.008*"文字"
2025-04-19 00:10:23,844 : INFO : topic #5 (0.125): 0.017*"公司" + 0.016*"工作" + 0.012*"面試" + 0.009*"問題" + 0.009*"工程師" + 0.008*"經驗" + 0.007*"時間" + 0.007*"開發" + 0.007*"技術" + 0.007*"團隊"
2025-04-19 00:10:23,844 : INFO : topic diff=0.586785, rho=0.447214
2025-04-19 00:10:23,845 : INFO : PROGRESS: pass 0, at document #12000/16310
2025-04-19 00:10:24,093 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:24,097 : INFO : topic #0 (0.125): 0.029*"工作" + 0.013*"應徵" + 0.013*"砍除" + 0.012*"空白" + 0.012*"方式" + 0.011*"第一項" + 0.011*"推定" + 0.011*"資訊" + 0.010*"單位" + 0.010*"內容"
2025-04-19 00:10:24,097 : INFO : topic #3 (0.125): 0.052*"半導體" + 0.035*"製程" + 0.017*"研發" + 0.014*"職場" + 0.013*"表示" + 0.011*"工作" + 0.010*"資工" + 0.008*"數學" + 0.008*"方式" + 0.007*"公司"
2025-04-19 00:10:24,098 : INFO : topic #1 (0.125): 0.029*"工作" + 0.015*"方式" + 0.013*"砍除" + 0.012*"推定" + 0.012*"情形" + 0.012*"第一項" + 0.011*"國定假日" + 0.011*"聯絡" + 0.011*"文字" + 0.011*"單位"
2025-04-19 00:10:24,098 : INFO : topic #6 (0.125): 0.014*"產品" + 0.012*"資料" + 0.011*"使用" + 0.011*"活動" + 0.011*"報名" + 0.010*"公司" + 0.010*"進行" + 0.009*"目前" + 0.009*"研究" + 0.008*"日本"
2025-04-19 00:10:24,099 : INFO : topic #5 (0.125): 0.016*"公司" + 0.012*"工作" + 0.009*"面試" + 0.008*"問題" + 0.007*"工程師" + 0.006*"技術" + 0.006*"經驗" + 0.006*"開發" + 0.006*"時間" + 0.005*"台灣"
2025-04-19 00:10:24,099 : INFO : topic diff=0.535218, rho=0.408248
2025-04-19 00:10:24,100 : INFO : PROGRESS: pass 0, at document #14000/16310
2025-04-19 00:10:24,433 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:24,441 : INFO : topic #5 (0.125): 0.014*"公司" + 0.010*"工作" + 0.008*"台灣" + 0.006*"工程師" + 0.006*"技術" + 0.006*"問題" + 0.006*"面試" + 0.005*"員工" + 0.005*"美國" + 0.004*"科技"
2025-04-19 00:10:24,442 : INFO : topic #4 (0.125): 0.037*"工作" + 0.016*"推定" + 0.015*"空白" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"方式" + 0.011*"砍除" + 0.010*"國定假日" + 0.010*"單位"
2025-04-19 00:10:24,443 : INFO : topic #1 (0.125): 0.029*"工作" + 0.014*"方式" + 0.013*"砍除" + 0.012*"推定" + 0.012*"情形" + 0.011*"第一項" + 0.011*"國定假日" + 0.011*"聯絡" + 0.011*"文字" + 0.011*"單位"
2025-04-19 00:10:24,444 : INFO : topic #6 (0.125): 0.013*"產品" + 0.010*"進行" + 0.010*"資料" + 0.010*"使用" + 0.009*"模型" + 0.009*"研究" + 0.009*"今年" + 0.009*"日本" + 0.009*"公司" + 0.009*"活動"
2025-04-19 00:10:24,445 : INFO : topic #2 (0.125): 0.025*"工作" + 0.011*"公司" + 0.009*"覺得" + 0.007*"程式" + 0.006*"應該" + 0.006*"時間" + 0.006*"內容" + 0.006*"比較" + 0.006*"面試" + 0.006*"方式"
2025-04-19 00:10:24,446 : INFO : topic diff=0.505230, rho=0.377964
2025-04-19 00:10:24,447 : INFO : PROGRESS: pass 0, at document #16000/16310
2025-04-19 00:10:24,692 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:24,696 : INFO : topic #2 (0.125): 0.023*"工作" + 0.011*"公司" + 0.009*"覺得" + 0.006*"應該" + 0.006*"記者" + 0.006*"時間" + 0.005*"程式" + 0.005*"比較" + 0.005*"內容" + 0.005*"真的"
2025-04-19 00:10:24,696 : INFO : topic #4 (0.125): 0.036*"工作" + 0.016*"推定" + 0.015*"空白" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"情形" + 0.010*"方式" + 0.010*"砍除" + 0.010*"國定假日" + 0.010*"單位"
2025-04-19 00:10:24,697 : INFO : topic #6 (0.125): 0.012*"產品" + 0.011*"模型" + 0.011*"今年" + 0.010*"生產" + 0.010*"進行" + 0.009*"研究" + 0.009*"日本" + 0.009*"資料" + 0.009*"蘋果" + 0.009*"影響"
2025-04-19 00:10:24,698 : INFO : topic #7 (0.125): 0.018*"工作" + 0.009*"空白" + 0.008*"推定" + 0.008*"內容" + 0.007*"第一項" + 0.007*"聯絡" + 0.007*"砍除" + 0.007*"資訊" + 0.007*"惠普" + 0.007*"情形"
2025-04-19 00:10:24,698 : INFO : topic #5 (0.125): 0.014*"公司" + 0.008*"台灣" + 0.008*"工作" + 0.007*"美國" + 0.006*"技術" + 0.006*"員工" + 0.005*"工程師" + 0.005*"晶片" + 0.005*"問題" + 0.005*"科技"
2025-04-19 00:10:24,700 : INFO : topic diff=0.470208, rho=0.353553
2025-04-19 00:10:24,801 : INFO : -8.974 per-word bound, 502.9 perplexity estimate based on a held-out corpus of 310 documents with 43091 words
2025-04-19 00:10:24,801 : INFO : PROGRESS: pass 0, at document #16310/16310
2025-04-19 00:10:24,839 : INFO : merging changes from 310 documents into a model of 16310 documents
2025-04-19 00:10:24,843 : INFO : topic #5 (0.125): 0.014*"公司" + 0.009*"美國" + 0.008*"台灣" + 0.007*"技術" + 0.007*"工作" + 0.006*"員工" + 0.006*"晶片" + 0.005*"台積電" + 0.005*"科技" + 0.005*"台積"
2025-04-19 00:10:24,843 : INFO : topic #2 (0.125): 0.021*"工作" + 0.011*"公司" + 0.009*"覺得" + 0.007*"應該" + 0.007*"真的" + 0.006*"記者" + 0.006*"時間" + 0.005*"東西" + 0.005*"比較" + 0.005*"一下"
2025-04-19 00:10:24,844 : INFO : topic #7 (0.125): 0.016*"工作" + 0.008*"空白" + 0.007*"推定" + 0.007*"內容" + 0.006*"第一項" + 0.006*"聯絡" + 0.006*"砍除" + 0.006*"資訊" + 0.006*"惠普" + 0.006*"情形"
2025-04-19 00:10:24,844 : INFO : topic #1 (0.125): 0.028*"工作" + 0.014*"方式" + 0.012*"砍除" + 0.012*"情形" + 0.011*"推定" + 0.011*"第一項" + 0.011*"單位" + 0.011*"國定假日" + 0.011*"聯絡" + 0.010*"文字"
2025-04-19 00:10:24,845 : INFO : topic #0 (0.125): 0.027*"工作" + 0.012*"應徵" + 0.012*"砍除" + 0.011*"空白" + 0.011*"方式" + 0.010*"單位" + 0.010*"第一項" + 0.010*"推定" + 0.010*"資訊" + 0.010*"內容"
2025-04-19 00:10:24,845 : INFO : topic diff=0.427759, rho=0.333333
2025-04-19 00:10:24,845 : INFO : PROGRESS: pass 1, at document #2000/16310
2025-04-19 00:10:25,440 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:25,444 : INFO : topic #7 (0.125): 0.019*"工作" + 0.009*"內容" + 0.008*"聯絡" + 0.007*"方式" + 0.007*"工資" + 0.007*"台北市" + 0.007*"時間" + 0.006*"資訊" + 0.006*"推定" + 0.006*"空白"
2025-04-19 00:10:25,444 : INFO : topic #1 (0.125): 0.032*"工作" + 0.016*"方式" + 0.012*"推定" + 0.012*"砍除" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"內容" + 0.011*"單位" + 0.010*"國定假日" + 0.010*"第一項"
2025-04-19 00:10:25,445 : INFO : topic #0 (0.125): 0.033*"工作" + 0.013*"方式" + 0.012*"應徵" + 0.011*"推定" + 0.010*"內容" + 0.010*"砍除" + 0.010*"空白" + 0.010*"單位" + 0.009*"資訊" + 0.009*"工資"
2025-04-19 00:10:25,446 : INFO : topic #5 (0.125): 0.014*"公司" + 0.009*"美國" + 0.008*"台灣" + 0.007*"技術" + 0.007*"工作" + 0.006*"員工" + 0.006*"晶片" + 0.005*"台積電" + 0.005*"科技" + 0.005*"台積"
2025-04-19 00:10:25,446 : INFO : topic #3 (0.125): 0.065*"半導體" + 0.039*"製程" + 0.025*"研發" + 0.024*"表示" + 0.021*"川普" + 0.018*"投資" + 0.015*"中國" + 0.015*"魏哲家" + 0.011*"晶圓廠" + 0.010*"奈米"
2025-04-19 00:10:25,447 : INFO : topic diff=0.938244, rho=0.313805
2025-04-19 00:10:25,447 : INFO : PROGRESS: pass 1, at document #4000/16310
2025-04-19 00:10:26,034 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:26,037 : INFO : topic #0 (0.125): 0.034*"工作" + 0.014*"方式" + 0.012*"應徵" + 0.011*"推定" + 0.010*"內容" + 0.010*"單位" + 0.010*"工資" + 0.009*"聯絡" + 0.009*"情形" + 0.009*"砍除"
2025-04-19 00:10:26,038 : INFO : topic #3 (0.125): 0.055*"半導體" + 0.033*"製程" + 0.021*"研發" + 0.021*"表示" + 0.018*"川普" + 0.015*"投資" + 0.015*"中國" + 0.012*"魏哲家" + 0.010*"晶圓廠" + 0.009*"奈米"
2025-04-19 00:10:26,038 : INFO : topic #4 (0.125): 0.037*"工作" + 0.017*"推定" + 0.015*"空白" + 0.012*"砍除" + 0.012*"方式" + 0.011*"聯絡" + 0.011*"第一項" + 0.011*"情形" + 0.011*"內容" + 0.011*"單位"
2025-04-19 00:10:26,039 : INFO : topic #6 (0.125): 0.020*"報名" + 0.019*"活動" + 0.014*"電話" + 0.011*"資料" + 0.011*"進行" + 0.011*"台北市" + 0.010*"舉辦" + 0.009*"車馬費" + 0.009*"研究" + 0.009*"參與"
2025-04-19 00:10:26,039 : INFO : topic #5 (0.125): 0.013*"公司" + 0.009*"美國" + 0.008*"台灣" + 0.007*"工作" + 0.007*"技術" + 0.006*"員工" + 0.005*"晶片" + 0.005*"台積電" + 0.005*"科技" + 0.004*"台積"
2025-04-19 00:10:26,039 : INFO : topic diff=0.398428, rho=0.313805
2025-04-19 00:10:26,040 : INFO : PROGRESS: pass 1, at document #6000/16310
2025-04-19 00:10:26,525 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:26,542 : INFO : topic #1 (0.125): 0.032*"工作" + 0.016*"方式" + 0.012*"推定" + 0.011*"單位" + 0.011*"情形" + 0.011*"砍除" + 0.011*"聯絡" + 0.011*"內容" + 0.010*"文字" + 0.010*"未註明"
2025-04-19 00:10:26,552 : INFO : topic #7 (0.125): 0.022*"工作" + 0.012*"時間" + 0.012*"訪員" + 0.010*"內容" + 0.009*"規定" + 0.009*"南港" + 0.008*"南港區" + 0.008*"台北市" + 0.008*"工資" + 0.007*"方式"
2025-04-19 00:10:26,553 : INFO : topic #6 (0.125): 0.025*"報名" + 0.022*"活動" + 0.016*"電話" + 0.013*"台北市" + 0.012*"資料" + 0.011*"車馬費" + 0.011*"舉辦" + 0.011*"進行" + 0.010*"人數" + 0.010*"訪問"
2025-04-19 00:10:26,553 : INFO : topic #5 (0.125): 0.014*"公司" + 0.007*"美國" + 0.007*"台灣" + 0.007*"工作" + 0.006*"技術" + 0.005*"員工" + 0.005*"工程師" + 0.005*"問題" + 0.004*"科技" + 0.004*"晶片"
2025-04-19 00:10:26,554 : INFO : topic #3 (0.125): 0.047*"半導體" + 0.028*"製程" + 0.019*"研發" + 0.018*"表示" + 0.015*"川普" + 0.013*"中國" + 0.013*"投資" + 0.011*"魏哲家" + 0.008*"時間" + 0.008*"職場"
2025-04-19 00:10:26,554 : INFO : topic diff=0.251908, rho=0.313805
2025-04-19 00:10:26,555 : INFO : PROGRESS: pass 1, at document #8000/16310
2025-04-19 00:10:26,818 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:26,822 : INFO : topic #1 (0.125): 0.032*"工作" + 0.016*"方式" + 0.012*"推定" + 0.011*"單位" + 0.011*"情形" + 0.011*"砍除" + 0.011*"聯絡" + 0.011*"內容" + 0.010*"文字" + 0.010*"未註明"
2025-04-19 00:10:26,822 : INFO : topic #7 (0.125): 0.021*"工作" + 0.013*"時間" + 0.010*"訪員" + 0.009*"內容" + 0.009*"南港" + 0.008*"規定" + 0.008*"南港區" + 0.007*"台北市" + 0.007*"勞基法" + 0.007*"工資"
2025-04-19 00:10:26,823 : INFO : topic #2 (0.125): 0.030*"工作" + 0.011*"覺得" + 0.011*"公司" + 0.011*"面試" + 0.011*"時間" + 0.008*"程式" + 0.007*"比較" + 0.006*"內容" + 0.006*"應該" + 0.006*"小時"
2025-04-19 00:10:26,823 : INFO : topic #5 (0.125): 0.017*"公司" + 0.009*"工作" + 0.007*"工程師" + 0.007*"問題" + 0.006*"技術" + 0.006*"面試" + 0.005*"開發" + 0.005*"台灣" + 0.005*"經驗" + 0.005*"團隊"
2025-04-19 00:10:26,824 : INFO : topic #6 (0.125): 0.023*"報名" + 0.021*"活動" + 0.015*"電話" + 0.013*"台北市" + 0.012*"資料" + 0.011*"進行" + 0.010*"舉辦" + 0.010*"使用" + 0.010*"時間" + 0.010*"車馬費"
2025-04-19 00:10:26,824 : INFO : topic diff=0.350872, rho=0.313805
2025-04-19 00:10:26,824 : INFO : PROGRESS: pass 1, at document #10000/16310
2025-04-19 00:10:27,055 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:27,059 : INFO : topic #3 (0.125): 0.050*"半導體" + 0.025*"製程" + 0.020*"研發" + 0.016*"數學" + 0.015*"表示" + 0.012*"川普" + 0.012*"中國" + 0.011*"投資" + 0.009*"職場" + 0.009*"魏哲家"
2025-04-19 00:10:27,060 : INFO : topic #0 (0.125): 0.038*"工作" + 0.016*"方式" + 0.012*"應徵" + 0.011*"內容" + 0.010*"推定" + 0.010*"聯絡" + 0.009*"工資" + 0.009*"依法" + 0.009*"單位" + 0.009*"小時"
2025-04-19 00:10:27,060 : INFO : topic #7 (0.125): 0.020*"工作" + 0.012*"時間" + 0.011*"東京" + 0.009*"南港" + 0.009*"訪員" + 0.008*"內容" + 0.008*"規定" + 0.008*"接案" + 0.008*"勞基法" + 0.007*"南港區"
2025-04-19 00:10:27,061 : INFO : topic #5 (0.125): 0.017*"公司" + 0.009*"工作" + 0.007*"問題" + 0.007*"工程師" + 0.007*"面試" + 0.006*"開發" + 0.006*"技術" + 0.006*"經驗" + 0.006*"目前" + 0.005*"團隊"
2025-04-19 00:10:27,061 : INFO : topic #2 (0.125): 0.030*"工作" + 0.013*"面試" + 0.013*"覺得" + 0.011*"公司" + 0.011*"時間" + 0.009*"程式" + 0.008*"比較" + 0.007*"應該" + 0.007*"東西" + 0.007*"一下"
2025-04-19 00:10:27,061 : INFO : topic diff=0.291535, rho=0.313805
2025-04-19 00:10:27,062 : INFO : PROGRESS: pass 1, at document #12000/16310
2025-04-19 00:10:27,364 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:27,368 : INFO : topic #0 (0.125): 0.038*"工作" + 0.017*"方式" + 0.012*"應徵" + 0.011*"內容" + 0.010*"聯絡" + 0.010*"推定" + 0.009*"工資" + 0.009*"依法" + 0.009*"單位" + 0.009*"小時"
2025-04-19 00:10:27,368 : INFO : topic #3 (0.125): 0.057*"半導體" + 0.030*"製程" + 0.024*"表示" + 0.018*"中國" + 0.015*"研發" + 0.014*"投資" + 0.012*"熊本" + 0.012*"奈米" + 0.011*"晶圓廠" + 0.009*"先進"
2025-04-19 00:10:27,369 : INFO : topic #5 (0.125): 0.016*"公司" + 0.008*"工作" + 0.006*"問題" + 0.006*"工程師" + 0.006*"技術" + 0.006*"開發" + 0.006*"台灣" + 0.005*"面試" + 0.005*"目前" + 0.005*"經驗"
2025-04-19 00:10:27,369 : INFO : topic #1 (0.125): 0.032*"工作" + 0.016*"方式" + 0.012*"推定" + 0.011*"單位" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"砍除" + 0.011*"內容" + 0.010*"文字" + 0.010*"未註明"
2025-04-19 00:10:27,370 : INFO : topic #6 (0.125): 0.020*"活動" + 0.020*"報名" + 0.012*"進行" + 0.012*"資料" + 0.012*"研究" + 0.011*"電話" + 0.010*"使用" + 0.010*"台北市" + 0.009*"參加" + 0.009*"舉辦"
2025-04-19 00:10:27,370 : INFO : topic diff=0.367805, rho=0.313805
2025-04-19 00:10:27,371 : INFO : PROGRESS: pass 1, at document #14000/16310
2025-04-19 00:10:27,704 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:27,711 : INFO : topic #2 (0.125): 0.025*"工作" + 0.011*"覺得" + 0.010*"公司" + 0.010*"面試" + 0.009*"時間" + 0.008*"比較" + 0.007*"程式" + 0.007*"應該" + 0.006*"真的" + 0.006*"一下"
2025-04-19 00:10:27,712 : INFO : topic #4 (0.125): 0.035*"工作" + 0.016*"推定" + 0.014*"空白" + 0.012*"砍除" + 0.012*"方式" + 0.011*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"單位"
2025-04-19 00:10:27,713 : INFO : topic #3 (0.125): 0.047*"半導體" + 0.029*"表示" + 0.027*"中國" + 0.022*"製程" + 0.015*"研發" + 0.014*"投資" + 0.013*"晶片" + 0.013*"英特爾" + 0.012*"先進" + 0.009*"奈米"
2025-04-19 00:10:27,714 : INFO : topic #0 (0.125): 0.038*"工作" + 0.017*"方式" + 0.012*"應徵" + 0.011*"內容" + 0.010*"聯絡" + 0.010*"推定" + 0.009*"工資" + 0.009*"單位" + 0.009*"依法" + 0.009*"小時"
2025-04-19 00:10:27,716 : INFO : topic #5 (0.125): 0.014*"公司" + 0.007*"台灣" + 0.006*"工作" + 0.006*"技術" + 0.005*"工程師" + 0.005*"問題" + 0.005*"員工" + 0.005*"美國" + 0.004*"目前" + 0.004*"開發"
2025-04-19 00:10:27,716 : INFO : topic diff=0.351682, rho=0.313805
2025-04-19 00:10:27,732 : INFO : PROGRESS: pass 1, at document #16000/16310
2025-04-19 00:10:27,947 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:27,951 : INFO : topic #6 (0.125): 0.014*"活動" + 0.013*"蘋果" + 0.013*"報名" + 0.013*"研究" + 0.011*"進行" + 0.011*"三星" + 0.010*"生產" + 0.009*"資料" + 0.009*"今年" + 0.009*"使用"
2025-04-19 00:10:27,951 : INFO : topic #1 (0.125): 0.032*"工作" + 0.016*"方式" + 0.012*"推定" + 0.011*"單位" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"砍除" + 0.011*"內容" + 0.010*"文字" + 0.010*"未註明"
2025-04-19 00:10:27,952 : INFO : topic #2 (0.125): 0.024*"工作" + 0.011*"公司" + 0.010*"覺得" + 0.009*"面試" + 0.008*"時間" + 0.007*"比較" + 0.007*"應該" + 0.006*"真的" + 0.006*"一下" + 0.006*"程式"
2025-04-19 00:10:27,953 : INFO : topic #4 (0.125): 0.035*"工作" + 0.016*"推定" + 0.014*"空白" + 0.012*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.010*"單位"
2025-04-19 00:10:27,953 : INFO : topic #7 (0.125): 0.027*"東京" + 0.017*"南港" + 0.012*"惠普" + 0.011*"工作" + 0.009*"三家" + 0.009*"投保" + 0.009*"給付" + 0.009*"展覽館" + 0.007*"規定" + 0.007*"店家"
2025-04-19 00:10:27,954 : INFO : topic diff=0.300320, rho=0.313805
2025-04-19 00:10:28,062 : INFO : -8.528 per-word bound, 369.1 perplexity estimate based on a held-out corpus of 310 documents with 43091 words
2025-04-19 00:10:28,062 : INFO : PROGRESS: pass 1, at document #16310/16310
2025-04-19 00:10:28,114 : INFO : merging changes from 310 documents into a model of 16310 documents
2025-04-19 00:10:28,118 : INFO : topic #7 (0.125): 0.036*"東京" + 0.017*"南港" + 0.013*"投保" + 0.010*"三家" + 0.010*"展覽館" + 0.010*"惠普" + 0.009*"工作" + 0.007*"給付" + 0.006*"規定" + 0.006*"店家"
2025-04-19 00:10:28,119 : INFO : topic #1 (0.125): 0.031*"工作" + 0.016*"方式" + 0.012*"推定" + 0.012*"單位" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"砍除" + 0.010*"文字" + 0.010*"未註明"
2025-04-19 00:10:28,119 : INFO : topic #0 (0.125): 0.038*"工作" + 0.016*"方式" + 0.012*"應徵" + 0.011*"小時" + 0.010*"內容" + 0.010*"聯絡" + 0.009*"工資" + 0.009*"推定" + 0.009*"單位" + 0.009*"依法"
2025-04-19 00:10:28,120 : INFO : topic #6 (0.125): 0.014*"蘋果" + 0.013*"研究" + 0.013*"活動" + 0.012*"生產" + 0.011*"進行" + 0.011*"三星" + 0.011*"機器人" + 0.011*"報名" + 0.009*"今年" + 0.009*"資料"
2025-04-19 00:10:28,120 : INFO : topic #3 (0.125): 0.032*"半導體" + 0.025*"中國" + 0.025*"表示" + 0.024*"晶片" + 0.024*"投資" + 0.019*"英特爾" + 0.018*"製程" + 0.015*"川普" + 0.015*"先進" + 0.012*"研發"
2025-04-19 00:10:28,121 : INFO : topic diff=0.299043, rho=0.313805
2025-04-19 00:10:28,121 : INFO : PROGRESS: pass 2, at document #2000/16310
2025-04-19 00:10:28,716 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:28,720 : INFO : topic #3 (0.125): 0.031*"半導體" + 0.025*"中國" + 0.024*"表示" + 0.024*"晶片" + 0.023*"投資" + 0.019*"英特爾" + 0.017*"製程" + 0.014*"川普" + 0.014*"先進" + 0.012*"研發"
2025-04-19 00:10:28,721 : INFO : topic #6 (0.125): 0.020*"報名" + 0.020*"活動" + 0.012*"電話" + 0.011*"進行" + 0.011*"研究" + 0.011*"台北市" + 0.011*"參與" + 0.010*"舉辦" + 0.010*"資料" + 0.008*"時間"
2025-04-19 00:10:28,721 : INFO : topic #5 (0.125): 0.014*"公司" + 0.008*"台灣" + 0.008*"美國" + 0.007*"技術" + 0.006*"員工" + 0.005*"科技" + 0.005*"台積電" + 0.005*"工作" + 0.005*"工程師" + 0.005*"台積"
2025-04-19 00:10:28,722 : INFO : topic #0 (0.125): 0.041*"工作" + 0.020*"方式" + 0.012*"工資" + 0.011*"依法" + 0.011*"小時" + 0.011*"應徵" + 0.011*"內容" + 0.011*"推定" + 0.010*"聯絡" + 0.010*"每日"
2025-04-19 00:10:28,722 : INFO : topic #1 (0.125): 0.032*"工作" + 0.015*"方式" + 0.012*"推定" + 0.012*"情形" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"砍除" + 0.011*"內容" + 0.011*"文字" + 0.010*"第一項"
2025-04-19 00:10:28,722 : INFO : topic diff=0.734008, rho=0.299409
2025-04-19 00:10:28,723 : INFO : PROGRESS: pass 2, at document #4000/16310
2025-04-19 00:10:29,334 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:29,338 : INFO : topic #2 (0.125): 0.023*"工作" + 0.010*"公司" + 0.009*"時間" + 0.009*"面試" + 0.007*"覺得" + 0.006*"真的" + 0.006*"比較" + 0.006*"應該" + 0.005*"需要" + 0.005*"東西"
2025-04-19 00:10:29,338 : INFO : topic #1 (0.125): 0.032*"工作" + 0.014*"方式" + 0.012*"情形" + 0.012*"推定" + 0.011*"砍除" + 0.011*"單位" + 0.011*"第一項" + 0.011*"文字" + 0.011*"聯絡" + 0.011*"內容"
2025-04-19 00:10:29,339 : INFO : topic #6 (0.125): 0.025*"報名" + 0.023*"活動" + 0.016*"電話" + 0.013*"台北市" + 0.012*"舉辦" + 0.012*"資料" + 0.011*"進行" + 0.011*"車馬費" + 0.011*"參與" + 0.011*"人數"
2025-04-19 00:10:29,339 : INFO : topic #4 (0.125): 0.034*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.012*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容"
2025-04-19 00:10:29,340 : INFO : topic #5 (0.125): 0.014*"公司" + 0.008*"台灣" + 0.007*"美國" + 0.007*"技術" + 0.006*"員工" + 0.005*"科技" + 0.005*"台積電" + 0.005*"工作" + 0.004*"工程師" + 0.004*"台積"
2025-04-19 00:10:29,340 : INFO : topic diff=0.344892, rho=0.299409
2025-04-19 00:10:29,341 : INFO : PROGRESS: pass 2, at document #6000/16310
2025-04-19 00:10:29,797 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:29,801 : INFO : topic #4 (0.125): 0.033*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.012*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容"
2025-04-19 00:10:29,802 : INFO : topic #7 (0.125): 0.021*"勞基法" + 0.017*"南港" + 0.014*"訪員" + 0.013*"時間" + 0.012*"規定" + 0.011*"勞務" + 0.011*"展覽館" + 0.010*"投保" + 0.010*"南港區" + 0.010*"工作"
2025-04-19 00:10:29,802 : INFO : topic #5 (0.125): 0.015*"公司" + 0.007*"台灣" + 0.006*"技術" + 0.006*"美國" + 0.005*"員工" + 0.005*"工作" + 0.005*"工程師" + 0.005*"科技" + 0.005*"問題" + 0.004*"台積電"
2025-04-19 00:10:29,802 : INFO : topic #2 (0.125): 0.024*"工作" + 0.010*"公司" + 0.010*"時間" + 0.009*"面試" + 0.007*"覺得" + 0.006*"比較" + 0.006*"需要" + 0.005*"真的" + 0.005*"應該" + 0.005*"東西"
2025-04-19 00:10:29,803 : INFO : topic #6 (0.125): 0.028*"報名" + 0.025*"活動" + 0.018*"電話" + 0.015*"台北市" + 0.012*"車馬費" + 0.012*"舉辦" + 0.012*"資料" + 0.012*"人數" + 0.011*"訪問" + 0.011*"進行"
2025-04-19 00:10:29,803 : INFO : topic diff=0.197971, rho=0.299409
2025-04-19 00:10:29,804 : INFO : PROGRESS: pass 2, at document #8000/16310
2025-04-19 00:10:30,045 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:30,049 : INFO : topic #7 (0.125): 0.034*"勞基法" + 0.017*"南港" + 0.014*"時間" + 0.012*"給付" + 0.011*"訪員" + 0.011*"工作" + 0.011*"規定" + 0.010*"時數" + 0.010*"東京" + 0.009*"接案"
2025-04-19 00:10:30,050 : INFO : topic #3 (0.125): 0.030*"半導體" + 0.024*"中國" + 0.021*"表示" + 0.020*"晶片" + 0.020*"投資" + 0.016*"英特爾" + 0.015*"製程" + 0.012*"先進" + 0.012*"川普" + 0.012*"研發"
2025-04-19 00:10:30,050 : INFO : topic #1 (0.125): 0.031*"工作" + 0.014*"方式" + 0.012*"情形" + 0.011*"砍除" + 0.011*"推定" + 0.011*"第一項" + 0.011*"文字" + 0.011*"聯絡" + 0.011*"單位" + 0.011*"內容"
2025-04-19 00:10:30,051 : INFO : topic #2 (0.125): 0.027*"工作" + 0.016*"面試" + 0.012*"公司" + 0.011*"覺得" + 0.011*"時間" + 0.009*"比較" + 0.007*"程式" + 0.007*"一下" + 0.006*"應該" + 0.006*"真的"
2025-04-19 00:10:30,051 : INFO : topic #4 (0.125): 0.033*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.012*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"內容"
2025-04-19 00:10:30,052 : INFO : topic diff=0.318363, rho=0.299409
2025-04-19 00:10:30,052 : INFO : PROGRESS: pass 2, at document #10000/16310
2025-04-19 00:10:30,298 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:30,302 : INFO : topic #3 (0.125): 0.031*"半導體" + 0.024*"中國" + 0.021*"表示" + 0.020*"投資" + 0.019*"晶片" + 0.015*"英特爾" + 0.015*"製程" + 0.012*"先進" + 0.012*"研發" + 0.011*"川普"
2025-04-19 00:10:30,303 : INFO : topic #0 (0.125): 0.049*"工作" + 0.025*"方式" + 0.016*"小時" + 0.013*"每日" + 0.013*"時間" + 0.012*"依法" + 0.012*"工資" + 0.012*"內容" + 0.012*"聯絡" + 0.011*"應徵"
2025-04-19 00:10:30,303 : INFO : topic #2 (0.125): 0.027*"工作" + 0.018*"面試" + 0.012*"公司" + 0.012*"覺得" + 0.011*"時間" + 0.010*"比較" + 0.008*"程式" + 0.007*"一下" + 0.007*"應該" + 0.007*"真的"
2025-04-19 00:10:30,304 : INFO : topic #7 (0.125): 0.040*"勞基法" + 0.017*"南港" + 0.016*"給付" + 0.016*"時數" + 0.016*"東京" + 0.015*"時間" + 0.011*"填寫" + 0.011*"工作日" + 0.011*"規定" + 0.011*"工作"
2025-04-19 00:10:30,304 : INFO : topic #1 (0.125): 0.031*"工作" + 0.014*"方式" + 0.012*"情形" + 0.011*"砍除" + 0.011*"推定" + 0.011*"第一項" + 0.011*"文字" + 0.011*"聯絡" + 0.011*"單位" + 0.010*"內容"
2025-04-19 00:10:30,305 : INFO : topic diff=0.261058, rho=0.299409
2025-04-19 00:10:30,305 : INFO : PROGRESS: pass 2, at document #12000/16310
2025-04-19 00:10:30,517 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 00:10:30,521 : INFO : topic #7 (0.125): 0.039*"勞基法" + 0.024*"東京" + 0.018*"南港" + 0.016*"給付" + 0.016*"時數" + 0.015*"時間" + 0.013*"填寫" + 0.010*"工作日" + 0.010*"店家" + 0.010*"工作"
2025-04-19 00:10:30,522 : INFO : topic #6 (0.125): 0.025*"活動" + 0.024*"報名" + 0.014*"電話" + 0.013*"研究" + 0.012*"進行" + 0.011*"台北市" + 0.011*"資料" + 0.011*"舉辦" + 0.011*"參加" + 0.010*"參與"
2025-04-19 00:10:30,522 : INFO : topic #2 (0.125): 0.025*"工作" + 0.017*"面試" + 0.012*"公司" + 0.011*"覺得" + 0.010*"時間" + 0.010*"比較" + 0.008*"程式" + 0.007*"真的" + 0.007*"應該" + 0.007*"一下"
2025-04-19 00:10:30,523 : INFO : topic #3 (0.125): 0.032*"半導體" + 0.028*"晶片" + 0.021*"表示" + 0.020*"中國" + 0.016*"製程" + 0.015*"投資" + 0.013*"英特爾" + 0.012*"美國" + 0.011*"先進" + 0.011*"台灣"
2025-04-19 00:10:30,523 : INFO : topic #1 (0.125): 0.031*"工作" + 0.014*"方式" + 0.012*"情形" + 0.011*"砍除" + 0.011*"推定" + 0.011*"文字" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"單位" + 0.010*"內容"
2025-04-19 00:10:30,524 : INFO :
topic diff=0.316441, rho=0.299409 2025-04-19 00:10:30,524 : INFO : PROGRESS: pass 2, at document #14000/16310 2025-04-19 00:10:30,813 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:30,823 : INFO : topic #0 (0.125): 0.050*"工作" + 0.025*"方式" + 0.016*"小時" + 0.014*"時間" + 0.013*"每日" + 0.012*"工資" + 0.012*"依法" + 0.012*"聯絡" + 0.012*"內容" + 0.011*"應徵" 2025-04-19 00:10:30,825 : INFO : topic #5 (0.125): 0.015*"公司" + 0.006*"台灣" + 0.006*"技術" + 0.006*"工程師" + 0.005*"員工" + 0.005*"工作" + 0.005*"開發" + 0.005*"問題" + 0.004*"目前" + 0.004*"科技" 2025-04-19 00:10:30,827 : INFO : topic #2 (0.125): 0.024*"工作" + 0.015*"面試" + 0.012*"公司" + 0.010*"覺得" + 0.009*"時間" + 0.009*"比較" + 0.007*"真的" + 0.007*"應該" + 0.007*"程式" + 0.006*"一下" 2025-04-19 00:10:30,828 : INFO : topic #1 (0.125): 0.031*"工作" + 0.013*"方式" + 0.012*"情形" + 0.011*"砍除" + 0.011*"推定" + 0.011*"文字" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"單位" + 0.010*"內容" 2025-04-19 00:10:30,829 : INFO : topic #3 (0.125): 0.029*"晶片" + 0.028*"半導體" + 0.023*"表示" + 0.023*"中國" + 0.016*"英特爾" + 0.015*"台灣" + 0.014*"美國" + 0.013*"製程" + 0.013*"投資" + 0.011*"全球" 2025-04-19 00:10:30,830 : INFO : topic diff=0.296754, rho=0.299409 2025-04-19 00:10:30,831 : INFO : PROGRESS: pass 2, at document #16000/16310 2025-04-19 00:10:31,083 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:31,087 : INFO : topic #3 (0.125): 0.031*"晶片" + 0.026*"半導體" + 0.023*"中國" + 0.023*"表示" + 0.021*"美國" + 0.018*"英特爾" + 0.015*"台灣" + 0.013*"製程" + 0.012*"投資" + 0.011*"全球" 2025-04-19 00:10:31,088 : INFO : topic #5 (0.125): 0.014*"公司" + 0.006*"台灣" + 0.006*"技術" + 0.006*"員工" + 0.005*"工程師" + 0.005*"科技" + 0.004*"台積電" + 0.004*"工作" + 0.004*"目前" + 0.004*"問題" 2025-04-19 00:10:31,088 : INFO : topic #4 (0.125): 0.033*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.011*"第一項" + 0.011*"情形" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:10:31,089 : INFO : topic #2 (0.125): 0.023*"工作" + 0.014*"面試" + 0.012*"公司" + 0.009*"覺得" + 
0.009*"時間" + 0.009*"比較" + 0.007*"真的" + 0.007*"應該" + 0.007*"主管" + 0.006*"一下" 2025-04-19 00:10:31,090 : INFO : topic #7 (0.125): 0.029*"東京" + 0.029*"勞基法" + 0.021*"南港" + 0.020*"給付" + 0.014*"發放" + 0.013*"時數" + 0.013*"投保" + 0.010*"時間" + 0.010*"惠普" + 0.010*"規定" 2025-04-19 00:10:31,090 : INFO : topic diff=0.249445, rho=0.299409 2025-04-19 00:10:31,159 : INFO : -8.407 per-word bound, 339.4 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:10:31,160 : INFO : PROGRESS: pass 2, at document #16310/16310 2025-04-19 00:10:31,193 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:10:31,196 : INFO : topic #6 (0.125): 0.018*"活動" + 0.017*"蘋果" + 0.015*"研究" + 0.015*"報名" + 0.013*"機器人" + 0.011*"進行" + 0.011*"問卷" + 0.009*"參與" + 0.009*"華為" + 0.009*"三星" 2025-04-19 00:10:31,197 : INFO : topic #7 (0.125): 0.036*"東京" + 0.023*"勞基法" + 0.020*"南港" + 0.020*"時數" + 0.018*"給付" + 0.015*"投保" + 0.013*"發放" + 0.010*"展覽館" + 0.009*"三家" + 0.008*"時間" 2025-04-19 00:10:31,197 : INFO : topic #4 (0.125): 0.033*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.011*"第一項" + 0.011*"情形" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:10:31,198 : INFO : topic #1 (0.125): 0.031*"工作" + 0.013*"方式" + 0.012*"情形" + 0.011*"砍除" + 0.011*"推定" + 0.011*"文字" + 0.011*"第一項" + 0.011*"單位" + 0.010*"聯絡" + 0.010*"內容" 2025-04-19 00:10:31,198 : INFO : topic #3 (0.125): 0.031*"晶片" + 0.030*"美國" + 0.023*"半導體" + 0.022*"中國" + 0.021*"表示" + 0.019*"投資" + 0.018*"英特爾" + 0.018*"台灣" + 0.012*"製程" + 0.012*"先進" 2025-04-19 00:10:31,199 : INFO : topic diff=0.251512, rho=0.299409 2025-04-19 00:10:31,199 : INFO : PROGRESS: pass 3, at document #2000/16310 2025-04-19 00:10:31,825 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:31,829 : INFO : topic #0 (0.125): 0.050*"工作" + 0.026*"方式" + 0.017*"小時" + 0.015*"工資" + 0.015*"時間" + 0.014*"依法" + 0.013*"每日" + 0.013*"推定" + 0.012*"內容" + 0.012*"休息" 2025-04-19 00:10:31,830 : INFO : 
topic #4 (0.125): 0.033*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.011*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" 2025-04-19 00:10:31,830 : INFO : topic #5 (0.125): 0.014*"公司" + 0.007*"技術" + 0.006*"台灣" + 0.006*"員工" + 0.005*"科技" + 0.005*"台積電" + 0.005*"台積" + 0.005*"工程師" + 0.004*"報導" + 0.004*"工作" 2025-04-19 00:10:31,831 : INFO : topic #1 (0.125): 0.030*"工作" + 0.012*"方式" + 0.012*"情形" + 0.011*"第一項" + 0.011*"砍除" + 0.011*"文字" + 0.011*"推定" + 0.010*"聯絡" + 0.010*"空白" + 0.010*"單位" 2025-04-19 00:10:31,831 : INFO : topic #6 (0.125): 0.024*"報名" + 0.023*"活動" + 0.015*"電話" + 0.012*"台北市" + 0.012*"舉辦" + 0.012*"研究" + 0.012*"參與" + 0.011*"進行" + 0.010*"資料" + 0.010*"人數" 2025-04-19 00:10:31,832 : INFO : topic diff=0.686165, rho=0.286829 2025-04-19 00:10:31,832 : INFO : PROGRESS: pass 3, at document #4000/16310 2025-04-19 00:10:32,373 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:32,377 : INFO : topic #6 (0.125): 0.027*"報名" + 0.026*"活動" + 0.018*"電話" + 0.014*"台北市" + 0.013*"舉辦" + 0.012*"車馬費" + 0.012*"人數" + 0.012*"資料" + 0.011*"參與" + 0.011*"進行" 2025-04-19 00:10:32,378 : INFO : topic #2 (0.125): 0.021*"工作" + 0.012*"面試" + 0.011*"公司" + 0.009*"時間" + 0.007*"覺得" + 0.007*"比較" + 0.007*"真的" + 0.006*"知道" + 0.006*"需要" + 0.006*"應該" 2025-04-19 00:10:32,379 : INFO : topic #0 (0.125): 0.049*"工作" + 0.026*"方式" + 0.017*"小時" + 0.015*"工資" + 0.015*"時間" + 0.014*"推定" + 0.014*"依法" + 0.013*"每日" + 0.012*"單位" + 0.012*"內容" 2025-04-19 00:10:32,379 : INFO : topic #5 (0.125): 0.014*"公司" + 0.007*"技術" + 0.006*"台灣" + 0.006*"員工" + 0.005*"科技" + 0.005*"台積電" + 0.005*"台積" + 0.005*"工程師" + 0.004*"報導" + 0.004*"工作" 2025-04-19 00:10:32,380 : INFO : topic #1 (0.125): 0.031*"工作" + 0.012*"情形" + 0.012*"第一項" + 0.012*"方式" + 0.012*"砍除" + 0.011*"文字" + 0.011*"空白" + 0.011*"推定" + 0.010*"聯絡" + 0.010*"資訊" 2025-04-19 00:10:32,380 : INFO : topic diff=0.326209, rho=0.286829 2025-04-19 00:10:32,381 : INFO : PROGRESS: pass 3, at document #6000/16310 2025-04-19 00:10:32,862 : 
INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:32,865 : INFO : topic #6 (0.125): 0.030*"報名" + 0.027*"活動" + 0.020*"電話" + 0.015*"台北市" + 0.013*"車馬費" + 0.013*"舉辦" + 0.012*"人數" + 0.012*"資料" + 0.012*"訪問" + 0.011*"參與" 2025-04-19 00:10:32,866 : INFO : topic #2 (0.125): 0.021*"工作" + 0.012*"面試" + 0.012*"公司" + 0.009*"時間" + 0.007*"比較" + 0.007*"覺得" + 0.007*"需要" + 0.006*"知道" + 0.006*"真的" + 0.006*"應該" 2025-04-19 00:10:32,866 : INFO : topic #4 (0.125): 0.033*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" 2025-04-19 00:10:32,867 : INFO : topic #0 (0.125): 0.050*"工作" + 0.027*"方式" + 0.017*"小時" + 0.015*"時間" + 0.015*"工資" + 0.014*"依法" + 0.014*"推定" + 0.014*"每日" + 0.012*"內容" + 0.012*"休息" 2025-04-19 00:10:32,868 : INFO : topic #7 (0.125): 0.028*"勞基法" + 0.018*"南港" + 0.016*"勞務" + 0.016*"發放" + 0.014*"時間" + 0.012*"報名者" + 0.012*"規定" + 0.012*"訪員" + 0.011*"投保" + 0.010*"展覽館" 2025-04-19 00:10:32,868 : INFO : topic diff=0.182729, rho=0.286829 2025-04-19 00:10:32,868 : INFO : PROGRESS: pass 3, at document #8000/16310 2025-04-19 00:10:33,097 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:33,101 : INFO : topic #1 (0.125): 0.030*"工作" + 0.012*"情形" + 0.012*"第一項" + 0.012*"砍除" + 0.012*"方式" + 0.012*"文字" + 0.011*"空白" + 0.010*"推定" + 0.010*"資訊" + 0.010*"聯絡" 2025-04-19 00:10:33,101 : INFO : topic #4 (0.125): 0.033*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" 2025-04-19 00:10:33,102 : INFO : topic #0 (0.125): 0.053*"工作" + 0.028*"方式" + 0.019*"小時" + 0.016*"時間" + 0.015*"每日" + 0.014*"工資" + 0.013*"依法" + 0.013*"推定" + 0.013*"內容" + 0.012*"休息" 2025-04-19 00:10:33,102 : INFO : topic #3 (0.125): 0.028*"晶片" + 0.027*"美國" + 0.022*"中國" + 0.022*"半導體" + 0.019*"表示" + 0.018*"投資" + 0.017*"台灣" + 0.016*"英特爾" + 0.011*"製程" + 0.011*"先進" 2025-04-19 00:10:33,103 : INFO : topic #7 (0.125): 
0.047*"勞基法" + 0.017*"時間" + 0.017*"加班費" + 0.016*"南港" + 0.016*"填寫" + 0.016*"時數" + 0.015*"給付" + 0.013*"發放" + 0.012*"超過" + 0.012*"勞務" 2025-04-19 00:10:33,103 : INFO : topic diff=0.291625, rho=0.286829 2025-04-19 00:10:33,104 : INFO : PROGRESS: pass 3, at document #10000/16310 2025-04-19 00:10:33,346 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:33,350 : INFO : topic #1 (0.125): 0.030*"工作" + 0.012*"情形" + 0.012*"第一項" + 0.012*"砍除" + 0.012*"方式" + 0.012*"文字" + 0.011*"空白" + 0.010*"推定" + 0.010*"資訊" + 0.010*"聯絡" 2025-04-19 00:10:33,351 : INFO : topic #7 (0.125): 0.052*"勞基法" + 0.030*"加班費" + 0.022*"填寫" + 0.021*"時數" + 0.020*"超過" + 0.019*"時間" + 0.017*"給付" + 0.015*"南港" + 0.013*"符合" + 0.013*"東京" 2025-04-19 00:10:33,351 : INFO : topic #0 (0.125): 0.055*"工作" + 0.029*"方式" + 0.020*"小時" + 0.017*"時間" + 0.015*"每日" + 0.013*"工資" + 0.013*"依法" + 0.013*"內容" + 0.012*"推定" + 0.012*"休息" 2025-04-19 00:10:33,352 : INFO : topic #3 (0.125): 0.027*"美國" + 0.026*"晶片" + 0.023*"中國" + 0.022*"半導體" + 0.019*"表示" + 0.018*"台灣" + 0.017*"投資" + 0.015*"英特爾" + 0.011*"製程" + 0.011*"先進" 2025-04-19 00:10:33,352 : INFO : topic #5 (0.125): 0.017*"公司" + 0.008*"工程師" + 0.007*"開發" + 0.007*"技術" + 0.006*"團隊" + 0.005*"產品" + 0.005*"目前" + 0.005*"台灣" + 0.005*"問題" + 0.005*"工作" 2025-04-19 00:10:33,353 : INFO : topic diff=0.238144, rho=0.286829 2025-04-19 00:10:33,353 : INFO : PROGRESS: pass 3, at document #12000/16310 2025-04-19 00:10:33,547 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:33,551 : INFO : topic #1 (0.125): 0.030*"工作" + 0.012*"情形" + 0.012*"第一項" + 0.012*"砍除" + 0.012*"文字" + 0.012*"方式" + 0.011*"空白" + 0.010*"資訊" + 0.010*"推定" + 0.010*"聯絡" 2025-04-19 00:10:33,551 : INFO : topic #2 (0.125): 0.023*"工作" + 0.019*"面試" + 0.014*"公司" + 0.011*"比較" + 0.010*"覺得" + 0.009*"時間" + 0.009*"問題" + 0.008*"真的" + 0.008*"知道" + 0.007*"程式" 2025-04-19 00:10:33,552 : INFO : topic #4 (0.125): 0.033*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.013*"方式" + 
0.011*"情形" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 00:10:33,552 : INFO : topic #6 (0.125): 0.027*"活動" + 0.027*"報名" + 0.015*"電話" + 0.013*"研究" + 0.012*"台北市" + 0.012*"舉辦" + 0.011*"資料" + 0.011*"進行" + 0.011*"參加" + 0.011*"參與" 2025-04-19 00:10:33,553 : INFO : topic #0 (0.125): 0.056*"工作" + 0.029*"方式" + 0.020*"小時" + 0.018*"時間" + 0.015*"每日" + 0.013*"工資" + 0.012*"內容" + 0.012*"依法" + 0.012*"聯絡" + 0.012*"休息" 2025-04-19 00:10:33,553 : INFO : topic diff=0.277253, rho=0.286829 2025-04-19 00:10:33,554 : INFO : PROGRESS: pass 3, at document #14000/16310 2025-04-19 00:10:33,833 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:33,837 : INFO : topic #0 (0.125): 0.056*"工作" + 0.029*"方式" + 0.020*"小時" + 0.017*"時間" + 0.015*"每日" + 0.013*"工資" + 0.012*"內容" + 0.012*"依法" + 0.012*"聯絡" + 0.012*"休息" 2025-04-19 00:10:33,838 : INFO : topic #7 (0.125): 0.042*"勞基法" + 0.034*"加班費" + 0.021*"超過" + 0.020*"填寫" + 0.020*"時數" + 0.019*"東京" + 0.017*"時間" + 0.017*"發放" + 0.016*"給付" + 0.016*"南港" 2025-04-19 00:10:33,841 : INFO : topic #3 (0.125): 0.028*"晶片" + 0.023*"半導體" + 0.022*"台灣" + 0.021*"美國" + 0.021*"表示" + 0.020*"中國" + 0.014*"英特爾" + 0.012*"產業" + 0.012*"全球" + 0.011*"投資" 2025-04-19 00:10:33,842 : INFO : topic #5 (0.125): 0.014*"公司" + 0.006*"技術" + 0.006*"工程師" + 0.005*"員工" + 0.005*"台灣" + 0.005*"開發" + 0.005*"科技" + 0.004*"產品" + 0.004*"目前" + 0.004*"工作" 2025-04-19 00:10:33,843 : INFO : topic #6 (0.125): 0.025*"活動" + 0.023*"報名" + 0.014*"研究" + 0.013*"電話" + 0.011*"進行" + 0.011*"問卷" + 0.010*"舉辦" + 0.010*"參與" + 0.010*"台北市" + 0.010*"資料" 2025-04-19 00:10:33,844 : INFO : topic diff=0.271297, rho=0.286829 2025-04-19 00:10:33,845 : INFO : PROGRESS: pass 3, at document #16000/16310 2025-04-19 00:10:34,063 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:34,067 : INFO : topic #7 (0.125): 0.039*"勞基法" + 0.030*"加班費" + 0.023*"東京" + 0.022*"發放" + 0.020*"超過" + 0.019*"給付" + 0.018*"南港" + 0.017*"時數" + 0.015*"填寫" + 0.015*"時間" 
2025-04-19 00:10:34,068 : INFO : topic #4 (0.125): 0.033*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 00:10:34,069 : INFO : topic #3 (0.125): 0.029*"晶片" + 0.025*"美國" + 0.022*"半導體" + 0.021*"中國" + 0.021*"表示" + 0.020*"台灣" + 0.016*"英特爾" + 0.012*"產業" + 0.011*"全球" + 0.011*"投資" 2025-04-19 00:10:34,069 : INFO : topic #5 (0.125): 0.014*"公司" + 0.006*"技術" + 0.006*"員工" + 0.005*"工程師" + 0.005*"台灣" + 0.005*"科技" + 0.005*"台積電" + 0.004*"開發" + 0.004*"報導" + 0.004*"目前" 2025-04-19 00:10:34,070 : INFO : topic #2 (0.125): 0.022*"工作" + 0.015*"面試" + 0.014*"公司" + 0.009*"比較" + 0.009*"覺得" + 0.008*"時間" + 0.008*"問題" + 0.008*"主管" + 0.008*"真的" + 0.007*"知道" 2025-04-19 00:10:34,070 : INFO : topic diff=0.227152, rho=0.286829 2025-04-19 00:10:34,170 : INFO : -8.360 per-word bound, 328.5 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:10:34,171 : INFO : PROGRESS: pass 3, at document #16310/16310 2025-04-19 00:10:34,215 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:10:34,219 : INFO : topic #5 (0.125): 0.014*"公司" + 0.008*"技術" + 0.006*"員工" + 0.006*"台積電" + 0.006*"科技" + 0.005*"台積" + 0.005*"工程師" + 0.005*"台灣" + 0.005*"報導" + 0.004*"產品" 2025-04-19 00:10:34,220 : INFO : topic #4 (0.125): 0.033*"工作" + 0.015*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.012*"方式" + 0.011*"情形" + 0.011*"第一項" + 0.011*"內容" + 0.011*"資訊" + 0.011*"聯絡" 2025-04-19 00:10:34,220 : INFO : topic #0 (0.125): 0.058*"工作" + 0.027*"方式" + 0.022*"小時" + 0.017*"時間" + 0.015*"工時" + 0.015*"每日" + 0.012*"工資" + 0.012*"聯絡" + 0.012*"內容" + 0.012*"休息" 2025-04-19 00:10:34,221 : INFO : topic #1 (0.125): 0.030*"工作" + 0.012*"情形" + 0.012*"第一項" + 0.012*"文字" + 0.012*"砍除" + 0.011*"方式" + 0.011*"空白" + 0.010*"資訊" + 0.010*"推定" + 0.010*"聯絡" 2025-04-19 00:10:34,221 : INFO : topic #2 (0.125): 0.020*"工作" + 0.014*"公司" + 0.013*"面試" + 0.009*"知道" + 0.009*"真的" + 0.008*"覺得" + 0.008*"比較" + 0.008*"問題" + 0.008*"時間" + 
0.007*"應該" 2025-04-19 00:10:34,222 : INFO : topic diff=0.227475, rho=0.286829 2025-04-19 00:10:34,222 : INFO : PROGRESS: pass 4, at document #2000/16310 2025-04-19 00:10:34,759 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:34,763 : INFO : topic #1 (0.125): 0.030*"工作" + 0.012*"情形" + 0.012*"第一項" + 0.012*"砍除" + 0.012*"文字" + 0.011*"方式" + 0.011*"空白" + 0.010*"推定" + 0.010*"資訊" + 0.010*"聯絡" 2025-04-19 00:10:34,763 : INFO : topic #0 (0.125): 0.054*"工作" + 0.028*"方式" + 0.019*"小時" + 0.016*"時間" + 0.015*"工資" + 0.014*"每日" + 0.014*"依法" + 0.013*"推定" + 0.013*"休息" + 0.012*"內容" 2025-04-19 00:10:34,764 : INFO : topic #2 (0.125): 0.020*"工作" + 0.014*"公司" + 0.013*"面試" + 0.008*"知道" + 0.008*"真的" + 0.008*"比較" + 0.008*"時間" + 0.008*"覺得" + 0.007*"問題" + 0.007*"主管" 2025-04-19 00:10:34,764 : INFO : topic #5 (0.125): 0.014*"公司" + 0.008*"技術" + 0.006*"員工" + 0.006*"台積電" + 0.006*"科技" + 0.005*"台積" + 0.005*"台灣" + 0.005*"工程師" + 0.004*"報導" + 0.004*"產品" 2025-04-19 00:10:34,765 : INFO : topic #6 (0.125): 0.025*"報名" + 0.025*"活動" + 0.016*"電話" + 0.013*"台北市" + 0.013*"舉辦" + 0.012*"研究" + 0.012*"參與" + 0.011*"進行" + 0.010*"人數" + 0.010*"資料" 2025-04-19 00:10:34,765 : INFO : topic diff=0.640666, rho=0.275711 2025-04-19 00:10:34,765 : INFO : PROGRESS: pass 4, at document #4000/16310 2025-04-19 00:10:35,333 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:35,337 : INFO : topic #1 (0.125): 0.030*"工作" + 0.012*"第一項" + 0.012*"情形" + 0.012*"砍除" + 0.012*"文字" + 0.011*"空白" + 0.011*"方式" + 0.010*"資訊" + 0.010*"推定" + 0.010*"聯絡" 2025-04-19 00:10:35,337 : INFO : topic #6 (0.125): 0.028*"報名" + 0.027*"活動" + 0.019*"電話" + 0.014*"台北市" + 0.013*"舉辦" + 0.012*"車馬費" + 0.012*"人數" + 0.012*"參與" + 0.011*"資料" + 0.011*"進行" 2025-04-19 00:10:35,338 : INFO : topic #3 (0.125): 0.032*"美國" + 0.028*"晶片" + 0.021*"台灣" + 0.020*"中國" + 0.019*"半導體" + 0.019*"表示" + 0.017*"投資" + 0.015*"英特爾" + 0.011*"產業" + 0.011*"全球" 2025-04-19 00:10:35,338 : INFO : topic #2 (0.125): 0.020*"工作" 
+ 0.013*"公司" + 0.013*"面試" + 0.008*"時間" + 0.008*"知道" + 0.008*"比較" + 0.007*"問題" + 0.007*"真的" + 0.007*"覺得" + 0.006*"主管" 2025-04-19 00:10:35,339 : INFO : topic #4 (0.125): 0.033*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"第一項" + 0.011*"資訊" 2025-04-19 00:10:35,339 : INFO : topic diff=0.310708, rho=0.275711 2025-04-19 00:10:35,339 : INFO : PROGRESS: pass 4, at document #6000/16310 2025-04-19 00:10:35,799 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:35,803 : INFO : topic #0 (0.125): 0.052*"工作" + 0.028*"方式" + 0.018*"小時" + 0.016*"時間" + 0.015*"工資" + 0.015*"每日" + 0.015*"推定" + 0.014*"依法" + 0.013*"內容" + 0.013*"休息" 2025-04-19 00:10:35,804 : INFO : topic #4 (0.125): 0.033*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"第一項" + 0.011*"資訊" 2025-04-19 00:10:35,804 : INFO : topic #3 (0.125): 0.031*"美國" + 0.028*"晶片" + 0.021*"台灣" + 0.020*"中國" + 0.019*"半導體" + 0.018*"表示" + 0.017*"投資" + 0.015*"英特爾" + 0.011*"產業" + 0.010*"全球" 2025-04-19 00:10:35,805 : INFO : topic #6 (0.125): 0.030*"報名" + 0.028*"活動" + 0.020*"電話" + 0.016*"台北市" + 0.013*"車馬費" + 0.013*"舉辦" + 0.013*"人數" + 0.012*"資料" + 0.012*"訪問" + 0.011*"參與" 2025-04-19 00:10:35,805 : INFO : topic #5 (0.125): 0.015*"公司" + 0.007*"技術" + 0.006*"員工" + 0.005*"科技" + 0.005*"工程師" + 0.005*"產品" + 0.005*"台灣" + 0.005*"台積電" + 0.004*"開發" + 0.004*"台積" 2025-04-19 00:10:35,806 : INFO : topic diff=0.175712, rho=0.275711 2025-04-19 00:10:35,806 : INFO : PROGRESS: pass 4, at document #8000/16310 2025-04-19 00:10:36,057 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:36,060 : INFO : topic #3 (0.125): 0.032*"美國" + 0.027*"晶片" + 0.022*"台灣" + 0.021*"中國" + 0.019*"半導體" + 0.018*"表示" + 0.016*"投資" + 0.014*"英特爾" + 0.011*"產業" + 0.011*"全球" 2025-04-19 00:10:36,061 : INFO : topic #4 (0.125): 0.033*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"情形" + 
0.011*"內容" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"第一項" 2025-04-19 00:10:36,061 : INFO : topic #6 (0.125): 0.029*"報名" + 0.027*"活動" + 0.019*"電話" + 0.015*"台北市" + 0.013*"舉辦" + 0.012*"車馬費" + 0.012*"人數" + 0.012*"資料" + 0.011*"參與" + 0.011*"參加" 2025-04-19 00:10:36,062 : INFO : topic #7 (0.125): 0.051*"勞基法" + 0.047*"加班費" + 0.021*"時間" + 0.021*"填寫" + 0.019*"超過" + 0.016*"計算" + 0.016*"發放" + 0.015*"時數" + 0.014*"南港" + 0.013*"給付" 2025-04-19 00:10:36,063 : INFO : topic #1 (0.125): 0.030*"工作" + 0.012*"第一項" + 0.012*"情形" + 0.012*"砍除" + 0.012*"文字" + 0.012*"空白" + 0.011*"方式" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"推定" 2025-04-19 00:10:36,063 : INFO : topic diff=0.275699, rho=0.275711 2025-04-19 00:10:36,063 : INFO : PROGRESS: pass 4, at document #10000/16310 2025-04-19 00:10:36,262 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:36,266 : INFO : topic #5 (0.125): 0.016*"公司" + 0.008*"開發" + 0.008*"技術" + 0.007*"工程師" + 0.006*"團隊" + 0.006*"產品" + 0.005*"台灣" + 0.005*"員工" + 0.005*"軟體" + 0.005*"目前" 2025-04-19 00:10:36,266 : INFO : topic #0 (0.125): 0.057*"工作" + 0.030*"方式" + 0.020*"小時" + 0.018*"時間" + 0.015*"每日" + 0.013*"工資" + 0.013*"內容" + 0.013*"依法" + 0.012*"休息" + 0.012*"推定" 2025-04-19 00:10:36,267 : INFO : topic #4 (0.125): 0.033*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"第一項" 2025-04-19 00:10:36,267 : INFO : topic #6 (0.125): 0.029*"報名" + 0.027*"活動" + 0.018*"電話" + 0.014*"台北市" + 0.012*"舉辦" + 0.012*"資料" + 0.012*"研究" + 0.012*"人數" + 0.011*"車馬費" + 0.011*"時間" 2025-04-19 00:10:36,268 : INFO : topic #1 (0.125): 0.030*"工作" + 0.012*"第一項" + 0.012*"情形" + 0.012*"砍除" + 0.012*"文字" + 0.011*"空白" + 0.011*"方式" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"推定" 2025-04-19 00:10:36,268 : INFO : topic diff=0.222152, rho=0.275711 2025-04-19 00:10:36,268 : INFO : PROGRESS: pass 4, at document #12000/16310 2025-04-19 00:10:36,496 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 
00:10:36,500 : INFO : topic #1 (0.125): 0.030*"工作" + 0.012*"第一項" + 0.012*"情形" + 0.012*"砍除" + 0.012*"文字" + 0.011*"空白" + 0.011*"方式" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"推定" 2025-04-19 00:10:36,500 : INFO : topic #3 (0.125): 0.027*"晶片" + 0.023*"美國" + 0.023*"半導體" + 0.022*"台灣" + 0.018*"表示" + 0.017*"中國" + 0.012*"投資" + 0.012*"產業" + 0.012*"製程" + 0.011*"英特爾" 2025-04-19 00:10:36,501 : INFO : topic #7 (0.125): 0.064*"加班費" + 0.053*"勞基法" + 0.028*"填寫" + 0.027*"超過" + 0.024*"時間" + 0.022*"計算" + 0.018*"薪資" + 0.018*"時數" + 0.017*"發放" + 0.016*"符合" 2025-04-19 00:10:36,502 : INFO : topic #5 (0.125): 0.015*"公司" + 0.007*"技術" + 0.007*"開發" + 0.007*"工程師" + 0.005*"產品" + 0.005*"團隊" + 0.005*"員工" + 0.005*"台灣" + 0.004*"軟體" + 0.004*"目前" 2025-04-19 00:10:36,502 : INFO : topic #2 (0.125): 0.022*"工作" + 0.018*"面試" + 0.015*"公司" + 0.011*"問題" + 0.010*"比較" + 0.010*"覺得" + 0.008*"時間" + 0.008*"知道" + 0.007*"真的" + 0.007*"一些" 2025-04-19 00:10:36,502 : INFO : topic diff=0.254074, rho=0.275711 2025-04-19 00:10:36,503 : INFO : PROGRESS: pass 4, at document #14000/16310 2025-04-19 00:10:36,781 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:36,787 : INFO : topic #1 (0.125): 0.030*"工作" + 0.012*"第一項" + 0.012*"情形" + 0.012*"砍除" + 0.012*"文字" + 0.011*"空白" + 0.011*"方式" + 0.010*"資訊" + 0.010*"聯絡" + 0.010*"推定" 2025-04-19 00:10:36,788 : INFO : topic #3 (0.125): 0.026*"晶片" + 0.024*"台灣" + 0.022*"美國" + 0.021*"半導體" + 0.019*"表示" + 0.019*"中國" + 0.013*"英特爾" + 0.012*"產業" + 0.012*"全球" + 0.011*"投資" 2025-04-19 00:10:36,792 : INFO : topic #4 (0.125): 0.033*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"資訊" + 0.011*"單位" 2025-04-19 00:10:36,793 : INFO : topic #5 (0.125): 0.014*"公司" + 0.007*"技術" + 0.006*"工程師" + 0.006*"員工" + 0.005*"開發" + 0.005*"台灣" + 0.005*"科技" + 0.005*"產品" + 0.004*"團隊" + 0.004*"目前" 2025-04-19 00:10:36,794 : INFO : topic #0 (0.125): 0.059*"工作" + 0.029*"方式" + 0.020*"小時" + 0.018*"時間" + 0.015*"每日" + 0.013*"工資" + 0.013*"內容" + 
0.012*"依法" + 0.012*"休息" + 0.012*"聯絡" 2025-04-19 00:10:36,795 : INFO : topic diff=0.256628, rho=0.275711 2025-04-19 00:10:36,795 : INFO : PROGRESS: pass 4, at document #16000/16310 2025-04-19 00:10:37,032 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:37,035 : INFO : topic #6 (0.125): 0.025*"活動" + 0.023*"報名" + 0.015*"研究" + 0.012*"電話" + 0.011*"問卷" + 0.011*"進行" + 0.010*"舉辦" + 0.010*"參加" + 0.010*"參與" + 0.010*"台北市" 2025-04-19 00:10:37,036 : INFO : topic #5 (0.125): 0.014*"公司" + 0.007*"技術" + 0.006*"員工" + 0.005*"工程師" + 0.005*"科技" + 0.005*"台積電" + 0.005*"台灣" + 0.004*"開發" + 0.004*"報導" + 0.004*"產品" 2025-04-19 00:10:37,036 : INFO : topic #4 (0.125): 0.033*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"單位" + 0.011*"資訊" 2025-04-19 00:10:37,037 : INFO : topic #7 (0.125): 0.057*"加班費" + 0.042*"勞基法" + 0.025*"發放" + 0.024*"超過" + 0.020*"填寫" + 0.019*"時間" + 0.019*"東京" + 0.019*"計算" + 0.017*"薪資" + 0.016*"時數" 2025-04-19 00:10:37,037 : INFO : topic #0 (0.125): 0.058*"工作" + 0.029*"方式" + 0.021*"小時" + 0.018*"時間" + 0.015*"每日" + 0.013*"工資" + 0.012*"內容" + 0.012*"工時" + 0.012*"依法" + 0.012*"休息" 2025-04-19 00:10:37,038 : INFO : topic diff=0.215227, rho=0.275711 2025-04-19 00:10:37,115 : INFO : -8.339 per-word bound, 323.8 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:10:37,116 : INFO : PROGRESS: pass 4, at document #16310/16310 2025-04-19 00:10:37,146 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:10:37,150 : INFO : topic #6 (0.125): 0.023*"活動" + 0.020*"報名" + 0.017*"研究" + 0.013*"問卷" + 0.010*"進行" + 0.010*"華為" + 0.010*"時間" + 0.010*"電話" + 0.010*"參與" + 0.010*"台北市" 2025-04-19 00:10:37,150 : INFO : topic #5 (0.125): 0.014*"公司" + 0.008*"技術" + 0.007*"員工" + 0.006*"台積電" + 0.006*"科技" + 0.005*"台積" + 0.005*"工程師" + 0.005*"報導" + 0.004*"台灣" + 0.004*"產品" 2025-04-19 00:10:37,151 : INFO : topic #1 (0.125): 0.029*"工作" + 
0.012*"情形" + 0.012*"第一項" + 0.012*"文字" + 0.012*"砍除" + 0.011*"空白" + 0.011*"方式" + 0.010*"資訊" + 0.010*"分類" + 0.010*"聯絡" 2025-04-19 00:10:37,151 : INFO : topic #3 (0.125): 0.034*"美國" + 0.028*"晶片" + 0.023*"台灣" + 0.020*"中國" + 0.019*"半導體" + 0.019*"表示" + 0.016*"投資" + 0.015*"英特爾" + 0.011*"產業" + 0.011*"全球" 2025-04-19 00:10:37,152 : INFO : topic #2 (0.125): 0.019*"工作" + 0.015*"公司" + 0.013*"面試" + 0.009*"知道" + 0.009*"問題" + 0.008*"真的" + 0.008*"比較" + 0.008*"覺得" + 0.007*"時間" + 0.007*"主管" 2025-04-19 00:10:37,152 : INFO : topic diff=0.214082, rho=0.275711 2025-04-19 00:10:37,153 : INFO : LdaModel lifecycle event {'msg': 'trained LdaModel<num_terms=23261, num_topics=8, decay=0.5, chunksize=2000> in 15.70s', 'datetime': '2025-04-19T00:10:37.153168', 'gensim': '4.3.3', 'python': '3.11.2 (main, Apr 21 2023, 22:51:21) [Clang 14.0.3 (clang-1403.0.22.14.1)]', 'platform': 'macOS-15.3.2-arm64-arm-64bit', 'event': 'created'} 2025-04-19 00:10:42,243 : INFO : -7.035 per-word bound, 131.1 perplexity estimate based on a held-out corpus of 16310 documents with 3460358 words 2025-04-19 00:10:42,246 : INFO : using ParallelWordOccurrenceAccumulator<processes=7, batch_size=64> to estimate probabilities from sliding windows
2025-04-19 00:10:47,831 : INFO : 80 batches submitted to accumulate stats from 5120 documents (1779164 virtual) 2025-04-19 00:10:47,892 : INFO : 81 batches submitted to accumulate stats from 5184 documents (1799023 virtual) 2025-04-19 00:10:47,911 : INFO : 82 batches submitted to accumulate stats from 5248 documents (1821516 virtual) 2025-04-19 00:10:47,914 : INFO : 83 batches submitted to accumulate stats from 5312 documents (1844224 virtual) 2025-04-19 00:10:47,962 : INFO : 84 batches submitted to accumulate stats from 5376 documents (1864739 virtual) 2025-04-19 00:10:47,966 : INFO : 85 batches submitted to accumulate stats from 5440 documents (1885053 virtual) 2025-04-19 00:10:47,979 : INFO : 86 batches submitted to accumulate stats from 5504 documents (1902170 virtual) 2025-04-19 00:10:47,983 : INFO : 87 batches submitted to accumulate stats from 5568 documents (1924910 virtual) 2025-04-19 00:10:48,029 : INFO : 88 batches submitted to accumulate stats from 5632 documents (1931530 virtual) 2025-04-19 00:10:48,049 : INFO : 89 batches submitted to accumulate stats from 5696 documents (1941414 virtual) 2025-04-19 00:10:48,053 : INFO : 90 batches submitted to accumulate stats from 5760 documents (1950642 virtual) 2025-04-19 00:10:48,121 : INFO : 91 batches submitted to accumulate stats from 5824 documents (1957200 virtual) 2025-04-19 00:10:48,126 : INFO : 92 batches submitted to accumulate stats from 5888 documents (1964937 virtual) 2025-04-19 00:10:48,130 : INFO : 93 batches submitted to accumulate stats from 5952 documents (1974259 virtual) 2025-04-19 00:10:48,185 : INFO : 94 batches submitted to accumulate stats from 6016 documents (1988296 virtual) 2025-04-19 00:10:48,191 : INFO : 95 batches submitted to accumulate stats from 6080 documents (1997659 virtual) 2025-04-19 00:10:48,219 : INFO : 96 batches submitted to accumulate stats from 6144 documents (2009678 virtual) 2025-04-19 00:10:48,225 : INFO : 97 batches submitted to accumulate stats from 6208 documents 
(2019297 virtual) 2025-04-19 00:10:48,270 : INFO : 98 batches submitted to accumulate stats from 6272 documents (2031857 virtual) 2025-04-19 00:10:48,276 : INFO : 99 batches submitted to accumulate stats from 6336 documents (2044117 virtual) 2025-04-19 00:10:48,281 : INFO : 100 batches submitted to accumulate stats from 6400 documents (2053380 virtual) 2025-04-19 00:10:48,285 : INFO : 101 batches submitted to accumulate stats from 6464 documents (2066889 virtual) 2025-04-19 00:10:48,310 : INFO : 102 batches submitted to accumulate stats from 6528 documents (2075479 virtual) 2025-04-19 00:10:48,316 : INFO : 103 batches submitted to accumulate stats from 6592 documents (2085095 virtual) 2025-04-19 00:10:48,320 : INFO : 104 batches submitted to accumulate stats from 6656 documents (2093845 virtual) 2025-04-19 00:10:48,322 : INFO : 105 batches submitted to accumulate stats from 6720 documents (2102407 virtual) 2025-04-19 00:10:48,328 : INFO : 106 batches submitted to accumulate stats from 6784 documents (2111466 virtual) 2025-04-19 00:10:48,351 : INFO : 107 batches submitted to accumulate stats from 6848 documents (2121845 virtual) 2025-04-19 00:10:48,375 : INFO : 108 batches submitted to accumulate stats from 6912 documents (2129219 virtual) 2025-04-19 00:10:48,382 : INFO : 109 batches submitted to accumulate stats from 6976 documents (2137886 virtual) 2025-04-19 00:10:48,420 : INFO : 110 batches submitted to accumulate stats from 7040 documents (2145150 virtual) 2025-04-19 00:10:48,431 : INFO : 111 batches submitted to accumulate stats from 7104 documents (2155495 virtual) 2025-04-19 00:10:48,444 : INFO : 112 batches submitted to accumulate stats from 7168 documents (2164720 virtual) 2025-04-19 00:10:48,449 : INFO : 113 batches submitted to accumulate stats from 7232 documents (2172193 virtual) 2025-04-19 00:10:48,463 : INFO : 114 batches submitted to accumulate stats from 7296 documents (2183458 virtual) 2025-04-19 00:10:48,475 : INFO : 115 batches submitted to 
accumulate stats from 7360 documents (2191706 virtual) 2025-04-19 00:10:48,493 : INFO : 116 batches submitted to accumulate stats from 7424 documents (2202020 virtual) 2025-04-19 00:10:48,495 : INFO : 117 batches submitted to accumulate stats from 7488 documents (2211055 virtual) 2025-04-19 00:10:48,496 : INFO : 118 batches submitted to accumulate stats from 7552 documents (2223321 virtual) 2025-04-19 00:10:48,499 : INFO : 119 batches submitted to accumulate stats from 7616 documents (2230121 virtual) 2025-04-19 00:10:48,506 : INFO : 120 batches submitted to accumulate stats from 7680 documents (2243511 virtual) 2025-04-19 00:10:48,525 : INFO : 121 batches submitted to accumulate stats from 7744 documents (2258370 virtual) 2025-04-19 00:10:48,527 : INFO : 122 batches submitted to accumulate stats from 7808 documents (2269267 virtual) 2025-04-19 00:10:48,536 : INFO : 123 batches submitted to accumulate stats from 7872 documents (2280490 virtual) 2025-04-19 00:10:48,550 : INFO : 124 batches submitted to accumulate stats from 7936 documents (2289945 virtual) 2025-04-19 00:10:48,553 : INFO : 125 batches submitted to accumulate stats from 8000 documents (2298931 virtual) 2025-04-19 00:10:48,569 : INFO : 126 batches submitted to accumulate stats from 8064 documents (2309719 virtual) 2025-04-19 00:10:48,574 : INFO : 127 batches submitted to accumulate stats from 8128 documents (2320328 virtual) 2025-04-19 00:10:48,577 : INFO : 128 batches submitted to accumulate stats from 8192 documents (2331614 virtual) 2025-04-19 00:10:48,599 : INFO : 129 batches submitted to accumulate stats from 8256 documents (2342568 virtual) 2025-04-19 00:10:48,601 : INFO : 130 batches submitted to accumulate stats from 8320 documents (2351306 virtual) 2025-04-19 00:10:48,621 : INFO : 131 batches submitted to accumulate stats from 8384 documents (2359488 virtual) 2025-04-19 00:10:48,623 : INFO : 132 batches submitted to accumulate stats from 8448 documents (2368497 virtual) 2025-04-19 00:10:48,685 
: INFO : 133 batches submitted to accumulate stats from 8512 documents (2378449 virtual) 2025-04-19 00:10:48,697 : INFO : 134 batches submitted to accumulate stats from 8576 documents (2388057 virtual) 2025-04-19 00:10:48,702 : INFO : 135 batches submitted to accumulate stats from 8640 documents (2395926 virtual) 2025-04-19 00:10:48,711 : INFO : 136 batches submitted to accumulate stats from 8704 documents (2403405 virtual) 2025-04-19 00:10:48,715 : INFO : 137 batches submitted to accumulate stats from 8768 documents (2411628 virtual) 2025-04-19 00:10:48,720 : INFO : 138 batches submitted to accumulate stats from 8832 documents (2419219 virtual) 2025-04-19 00:10:48,753 : INFO : 139 batches submitted to accumulate stats from 8896 documents (2428220 virtual) 2025-04-19 00:10:48,775 : INFO : 140 batches submitted to accumulate stats from 8960 documents (2436470 virtual) 2025-04-19 00:10:48,786 : INFO : 141 batches submitted to accumulate stats from 9024 documents (2446006 virtual) 2025-04-19 00:10:48,793 : INFO : 142 batches submitted to accumulate stats from 9088 documents (2453039 virtual) 2025-04-19 00:10:48,796 : INFO : 143 batches submitted to accumulate stats from 9152 documents (2460905 virtual) 2025-04-19 00:10:48,803 : INFO : 144 batches submitted to accumulate stats from 9216 documents (2468645 virtual) 2025-04-19 00:10:48,810 : INFO : 145 batches submitted to accumulate stats from 9280 documents (2476321 virtual) 2025-04-19 00:10:48,836 : INFO : 146 batches submitted to accumulate stats from 9344 documents (2481981 virtual) 2025-04-19 00:10:48,860 : INFO : 147 batches submitted to accumulate stats from 9408 documents (2489833 virtual) 2025-04-19 00:10:48,872 : INFO : 148 batches submitted to accumulate stats from 9472 documents (2496627 virtual) 2025-04-19 00:10:48,874 : INFO : 149 batches submitted to accumulate stats from 9536 documents (2502106 virtual) 2025-04-19 00:10:48,876 : INFO : 150 batches submitted to accumulate stats from 9600 documents 
(2508434 virtual) 2025-04-19 00:10:48,883 : INFO : 151 batches submitted to accumulate stats from 9664 documents (2517654 virtual) 2025-04-19 00:10:48,885 : INFO : 152 batches submitted to accumulate stats from 9728 documents (2525651 virtual) 2025-04-19 00:10:48,910 : INFO : 153 batches submitted to accumulate stats from 9792 documents (2534661 virtual) 2025-04-19 00:10:48,925 : INFO : 154 batches submitted to accumulate stats from 9856 documents (2542846 virtual) 2025-04-19 00:10:48,961 : INFO : 155 batches submitted to accumulate stats from 9920 documents (2549206 virtual) 2025-04-19 00:10:48,976 : INFO : 156 batches submitted to accumulate stats from 9984 documents (2556742 virtual) 2025-04-19 00:10:48,978 : INFO : 157 batches submitted to accumulate stats from 10048 documents (2565026 virtual) 2025-04-19 00:10:48,979 : INFO : 158 batches submitted to accumulate stats from 10112 documents (2571434 virtual) 2025-04-19 00:10:49,009 : INFO : 159 batches submitted to accumulate stats from 10176 documents (2581280 virtual) 2025-04-19 00:10:49,011 : INFO : 160 batches submitted to accumulate stats from 10240 documents (2589671 virtual) 2025-04-19 00:10:49,015 : INFO : 161 batches submitted to accumulate stats from 10304 documents (2596979 virtual) 2025-04-19 00:10:49,024 : INFO : 162 batches submitted to accumulate stats from 10368 documents (2604556 virtual) 2025-04-19 00:10:49,029 : INFO : 163 batches submitted to accumulate stats from 10432 documents (2613656 virtual) 2025-04-19 00:10:49,032 : INFO : 164 batches submitted to accumulate stats from 10496 documents (2623890 virtual) 2025-04-19 00:10:49,060 : INFO : 165 batches submitted to accumulate stats from 10560 documents (2629308 virtual) 2025-04-19 00:10:49,063 : INFO : 166 batches submitted to accumulate stats from 10624 documents (2636085 virtual) 2025-04-19 00:10:49,065 : INFO : 167 batches submitted to accumulate stats from 10688 documents (2642039 virtual) 2025-04-19 00:10:49,069 : INFO : 168 batches 
submitted to accumulate stats from 10752 documents (2648389 virtual) 2025-04-19 00:10:49,088 : INFO : 169 batches submitted to accumulate stats from 10816 documents (2661959 virtual) 2025-04-19 00:10:49,133 : INFO : 170 batches submitted to accumulate stats from 10880 documents (2672949 virtual) 2025-04-19 00:10:49,137 : INFO : 171 batches submitted to accumulate stats from 10944 documents (2683365 virtual) 2025-04-19 00:10:49,150 : INFO : 172 batches submitted to accumulate stats from 11008 documents (2690484 virtual) 2025-04-19 00:10:49,161 : INFO : 173 batches submitted to accumulate stats from 11072 documents (2700627 virtual) 2025-04-19 00:10:49,166 : INFO : 174 batches submitted to accumulate stats from 11136 documents (2708742 virtual) 2025-04-19 00:10:49,180 : INFO : 175 batches submitted to accumulate stats from 11200 documents (2718156 virtual) 2025-04-19 00:10:49,195 : INFO : 176 batches submitted to accumulate stats from 11264 documents (2727801 virtual) 2025-04-19 00:10:49,196 : INFO : 177 batches submitted to accumulate stats from 11328 documents (2736288 virtual) 2025-04-19 00:10:49,202 : INFO : 178 batches submitted to accumulate stats from 11392 documents (2743845 virtual) 2025-04-19 00:10:49,211 : INFO : 179 batches submitted to accumulate stats from 11456 documents (2750885 virtual) 2025-04-19 00:10:49,219 : INFO : 180 batches submitted to accumulate stats from 11520 documents (2759213 virtual) 2025-04-19 00:10:49,221 : INFO : 181 batches submitted to accumulate stats from 11584 documents (2770309 virtual) 2025-04-19 00:10:49,226 : INFO : 182 batches submitted to accumulate stats from 11648 documents (2781566 virtual) 2025-04-19 00:10:49,267 : INFO : 183 batches submitted to accumulate stats from 11712 documents (2793513 virtual) 2025-04-19 00:10:49,270 : INFO : 184 batches submitted to accumulate stats from 11776 documents (2805133 virtual) 2025-04-19 00:10:49,313 : INFO : 185 batches submitted to accumulate stats from 11840 documents (2814621 
virtual) 2025-04-19 00:10:49,315 : INFO : 186 batches submitted to accumulate stats from 11904 documents (2825917 virtual) 2025-04-19 00:10:49,319 : INFO : 187 batches submitted to accumulate stats from 11968 documents (2834764 virtual) 2025-04-19 00:10:49,344 : INFO : 188 batches submitted to accumulate stats from 12032 documents (2844523 virtual) 2025-04-19 00:10:49,354 : INFO : 189 batches submitted to accumulate stats from 12096 documents (2854512 virtual) 2025-04-19 00:10:49,363 : INFO : 190 batches submitted to accumulate stats from 12160 documents (2863511 virtual) 2025-04-19 00:10:49,375 : INFO : 191 batches submitted to accumulate stats from 12224 documents (2872492 virtual) 2025-04-19 00:10:49,377 : INFO : 192 batches submitted to accumulate stats from 12288 documents (2881543 virtual) 2025-04-19 00:10:49,383 : INFO : 193 batches submitted to accumulate stats from 12352 documents (2891233 virtual) 2025-04-19 00:10:49,387 : INFO : 194 batches submitted to accumulate stats from 12416 documents (2899835 virtual) 2025-04-19 00:10:49,423 : INFO : 195 batches submitted to accumulate stats from 12480 documents (2908542 virtual) 2025-04-19 00:10:49,438 : INFO : 196 batches submitted to accumulate stats from 12544 documents (2920162 virtual) 2025-04-19 00:10:49,444 : INFO : 197 batches submitted to accumulate stats from 12608 documents (2931072 virtual) 2025-04-19 00:10:49,453 : INFO : 198 batches submitted to accumulate stats from 12672 documents (2942168 virtual) 2025-04-19 00:10:49,457 : INFO : 199 batches submitted to accumulate stats from 12736 documents (2951378 virtual) 2025-04-19 00:10:49,464 : INFO : 200 batches submitted to accumulate stats from 12800 documents (2964980 virtual) 2025-04-19 00:10:49,544 : INFO : 201 batches submitted to accumulate stats from 12864 documents (2974742 virtual) 2025-04-19 00:10:49,546 : INFO : 202 batches submitted to accumulate stats from 12928 documents (2984778 virtual) 2025-04-19 00:10:49,553 : INFO : 203 batches 
submitted to accumulate stats from 12992 documents (2994073 virtual) 2025-04-19 00:10:49,563 : INFO : 204 batches submitted to accumulate stats from 13056 documents (3002522 virtual) 2025-04-19 00:10:49,568 : INFO : 205 batches submitted to accumulate stats from 13120 documents (3012040 virtual) 2025-04-19 00:10:49,587 : INFO : 206 batches submitted to accumulate stats from 13184 documents (3019919 virtual) 2025-04-19 00:10:49,589 : INFO : 207 batches submitted to accumulate stats from 13248 documents (3029004 virtual) 2025-04-19 00:10:49,618 : INFO : 208 batches submitted to accumulate stats from 13312 documents (3037489 virtual) 2025-04-19 00:10:49,626 : INFO : 209 batches submitted to accumulate stats from 13376 documents (3044929 virtual) 2025-04-19 00:10:49,648 : INFO : 210 batches submitted to accumulate stats from 13440 documents (3054034 virtual) 2025-04-19 00:10:49,675 : INFO : 211 batches submitted to accumulate stats from 13504 documents (3064099 virtual) 2025-04-19 00:10:49,678 : INFO : 212 batches submitted to accumulate stats from 13568 documents (3074522 virtual) 2025-04-19 00:10:49,682 : INFO : 213 batches submitted to accumulate stats from 13632 documents (3083808 virtual) 2025-04-19 00:10:49,687 : INFO : 214 batches submitted to accumulate stats from 13696 documents (3093078 virtual) 2025-04-19 00:10:49,695 : INFO : 215 batches submitted to accumulate stats from 13760 documents (3102171 virtual) 2025-04-19 00:10:49,701 : INFO : 216 batches submitted to accumulate stats from 13824 documents (3111128 virtual) 2025-04-19 00:10:49,720 : INFO : 217 batches submitted to accumulate stats from 13888 documents (3120517 virtual) 2025-04-19 00:10:49,732 : INFO : 218 batches submitted to accumulate stats from 13952 documents (3130614 virtual) 2025-04-19 00:10:49,737 : INFO : 219 batches submitted to accumulate stats from 14016 documents (3139268 virtual) 2025-04-19 00:10:49,759 : INFO : 220 batches submitted to accumulate stats from 14080 documents (3148635 
virtual) 2025-04-19 00:10:49,762 : INFO : 221 batches submitted to accumulate stats from 14144 documents (3157335 virtual) 2025-04-19 00:10:49,764 : INFO : 222 batches submitted to accumulate stats from 14208 documents (3165838 virtual) 2025-04-19 00:10:49,793 : INFO : 223 batches submitted to accumulate stats from 14272 documents (3175765 virtual) 2025-04-19 00:10:49,794 : INFO : 224 batches submitted to accumulate stats from 14336 documents (3183123 virtual) 2025-04-19 00:10:49,835 : INFO : 225 batches submitted to accumulate stats from 14400 documents (3189537 virtual) 2025-04-19 00:10:49,840 : INFO : 226 batches submitted to accumulate stats from 14464 documents (3197239 virtual) 2025-04-19 00:10:49,854 : INFO : 227 batches submitted to accumulate stats from 14528 documents (3205518 virtual) 2025-04-19 00:10:49,860 : INFO : 228 batches submitted to accumulate stats from 14592 documents (3215608 virtual) 2025-04-19 00:10:49,863 : INFO : 229 batches submitted to accumulate stats from 14656 documents (3223376 virtual) 2025-04-19 00:10:49,866 : INFO : 230 batches submitted to accumulate stats from 14720 documents (3232304 virtual) 2025-04-19 00:10:49,875 : INFO : 231 batches submitted to accumulate stats from 14784 documents (3240270 virtual) 2025-04-19 00:10:49,894 : INFO : 232 batches submitted to accumulate stats from 14848 documents (3249755 virtual) 2025-04-19 00:10:49,919 : INFO : 233 batches submitted to accumulate stats from 14912 documents (3259377 virtual) 2025-04-19 00:10:49,925 : INFO : 234 batches submitted to accumulate stats from 14976 documents (3269637 virtual) 2025-04-19 00:10:49,931 : INFO : 235 batches submitted to accumulate stats from 15040 documents (3278311 virtual) 2025-04-19 00:10:49,941 : INFO : 236 batches submitted to accumulate stats from 15104 documents (3286321 virtual) 2025-04-19 00:10:49,943 : INFO : 237 batches submitted to accumulate stats from 15168 documents (3293385 virtual) 2025-04-19 00:10:49,945 : INFO : 238 batches 
submitted to accumulate stats from 15232 documents (3300334 virtual) 2025-04-19 00:10:49,963 : INFO : 239 batches submitted to accumulate stats from 15296 documents (3308226 virtual) 2025-04-19 00:10:50,013 : INFO : 240 batches submitted to accumulate stats from 15360 documents (3317325 virtual) 2025-04-19 00:10:50,024 : INFO : 241 batches submitted to accumulate stats from 15424 documents (3325778 virtual) 2025-04-19 00:10:50,032 : INFO : 242 batches submitted to accumulate stats from 15488 documents (3335373 virtual) 2025-04-19 00:10:50,041 : INFO : 243 batches submitted to accumulate stats from 15552 documents (3342716 virtual) 2025-04-19 00:10:50,043 : INFO : 244 batches submitted to accumulate stats from 15616 documents (3350508 virtual) 2025-04-19 00:10:50,050 : INFO : 245 batches submitted to accumulate stats from 15680 documents (3360131 virtual) 2025-04-19 00:10:50,052 : INFO : 246 batches submitted to accumulate stats from 15744 documents (3370635 virtual) 2025-04-19 00:10:50,087 : INFO : 247 batches submitted to accumulate stats from 15808 documents (3380994 virtual) 2025-04-19 00:10:50,096 : INFO : 248 batches submitted to accumulate stats from 15872 documents (3389920 virtual) 2025-04-19 00:10:50,098 : INFO : 249 batches submitted to accumulate stats from 15936 documents (3397487 virtual) 2025-04-19 00:10:50,104 : INFO : 250 batches submitted to accumulate stats from 16000 documents (3406129 virtual) 2025-04-19 00:10:50,107 : INFO : 251 batches submitted to accumulate stats from 16064 documents (3416805 virtual) 2025-04-19 00:10:50,109 : INFO : 252 batches submitted to accumulate stats from 16128 documents (3426189 virtual) 2025-04-19 00:10:50,117 : INFO : 253 batches submitted to accumulate stats from 16192 documents (3433824 virtual) 2025-04-19 00:10:50,160 : INFO : 254 batches submitted to accumulate stats from 16256 documents (3443379 virtual) 2025-04-19 00:10:50,161 : INFO : 255 batches submitted to accumulate stats from 16320 documents (3450914 
virtual) 2025-04-19 00:10:50,385 : INFO : 7 accumulators retrieved from output queue 2025-04-19 00:10:50,395 : INFO : accumulated word occurrence stats for 3451622 virtual documents 2025-04-19 00:10:50,518 : INFO : using symmetric alpha at 0.1111111111111111 2025-04-19 00:10:50,519 : INFO : using symmetric eta at 0.1111111111111111 2025-04-19 00:10:50,520 : INFO : using serial LDA version on this node 2025-04-19 00:10:50,528 : INFO : running online (multi-pass) LDA training, 9 topics, 5 passes over the supplied corpus of 16310 documents, updating model once every 2000 documents, evaluating perplexity every 16310 documents, iterating 50x with a convergence threshold of 0.001000 2025-04-19 00:10:50,529 : INFO : PROGRESS: pass 0, at document #2000/16310 2025-04-19 00:10:51,191 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:51,195 : INFO : topic #5 (0.111): 0.018*"工作" + 0.013*"方式" + 0.011*"空白" + 0.010*"聯絡" + 0.010*"應徵" + 0.009*"標題" + 0.009*"小時" + 0.008*"內容" + 0.008*"分類" + 0.008*"資訊" 2025-04-19 00:10:51,196 : INFO : topic #4 (0.111): 0.039*"工作" + 0.017*"推定" + 0.013*"空白" + 0.012*"方式" + 0.011*"聯絡" + 0.010*"單位" + 0.010*"第一項" + 0.010*"聯絡人" + 0.009*"內容" + 0.009*"承攬" 2025-04-19 00:10:51,196 : INFO : topic #2 (0.111): 0.041*"工作" + 0.013*"內容" + 0.012*"工資" + 0.012*"推定" + 0.012*"方式" + 0.012*"應徵" + 0.011*"小時" + 0.010*"砍除" + 0.010*"聯絡" + 0.010*"情形" 2025-04-19 00:10:51,197 : INFO : topic #7 (0.111): 0.026*"工作" + 0.014*"空白" + 0.012*"推定" + 0.011*"聯絡" + 0.011*"砍除" + 0.011*"內容" + 0.011*"第一項" + 0.011*"資訊" + 0.010*"情形" + 0.010*"方式" 2025-04-19 00:10:51,197 : INFO : topic #0 (0.111): 0.030*"工作" + 0.015*"方式" + 0.014*"應徵" + 0.012*"砍除" + 0.012*"推定" + 0.012*"單位" + 0.012*"空白" + 0.010*"內容" + 0.010*"資訊" + 0.009*"第一項" 2025-04-19 00:10:51,198 : INFO : topic diff=8.691695, rho=1.000000 2025-04-19 00:10:51,199 : INFO : PROGRESS: pass 0, at document #4000/16310 2025-04-19 00:10:51,853 : INFO : merging changes from 2000 documents into a model of 16310 
documents 2025-04-19 00:10:51,857 : INFO : topic #5 (0.111): 0.019*"工作" + 0.017*"方式" + 0.009*"應徵" + 0.009*"依法" + 0.009*"通知" + 0.009*"聯絡" + 0.009*"時間" + 0.008*"標題" + 0.008*"內容" + 0.007*"小時" 2025-04-19 00:10:51,858 : INFO : topic #7 (0.111): 0.026*"工作" + 0.013*"空白" + 0.011*"推定" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"砍除" + 0.010*"資訊" + 0.010*"第一項" + 0.010*"情形" + 0.009*"方式" 2025-04-19 00:10:51,858 : INFO : topic #1 (0.111): 0.030*"工作" + 0.016*"方式" + 0.012*"砍除" + 0.012*"推定" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"資訊" + 0.011*"單位" + 0.011*"內容" 2025-04-19 00:10:51,859 : INFO : topic #8 (0.111): 0.032*"工作" + 0.015*"推定" + 0.013*"國定假日" + 0.013*"情形" + 0.013*"空白" + 0.011*"應徵" + 0.011*"方式" + 0.011*"第一項" + 0.010*"單位" + 0.010*"砍除" 2025-04-19 00:10:51,860 : INFO : topic #6 (0.111): 0.027*"報名" + 0.024*"活動" + 0.022*"電話" + 0.015*"台北市" + 0.014*"車馬費" + 0.014*"人數" + 0.012*"資料" + 0.011*"舉辦" + 0.011*"訪問" + 0.011*"時間" 2025-04-19 00:10:51,860 : INFO : topic diff=0.773415, rho=0.707107 2025-04-19 00:10:51,861 : INFO : PROGRESS: pass 0, at document #6000/16310 2025-04-19 00:10:52,403 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:52,407 : INFO : topic #0 (0.111): 0.030*"工作" + 0.014*"砍除" + 0.013*"應徵" + 0.012*"方式" + 0.012*"空白" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"推定" + 0.011*"內容" + 0.010*"單位" 2025-04-19 00:10:52,408 : INFO : topic #3 (0.111): 0.013*"工作" + 0.012*"方式" + 0.009*"時間" + 0.009*"公司" + 0.009*"聯絡人" + 0.009*"資訊" + 0.008*"連結" + 0.008*"內容" + 0.008*"分類" + 0.008*"文字" 2025-04-19 00:10:52,408 : INFO : topic #6 (0.111): 0.028*"報名" + 0.025*"活動" + 0.020*"電話" + 0.016*"台北市" + 0.013*"車馬費" + 0.013*"資料" + 0.012*"人數" + 0.012*"訪問" + 0.011*"舉辦" + 0.011*"時間" 2025-04-19 00:10:52,409 : INFO : topic #7 (0.111): 0.024*"工作" + 0.012*"空白" + 0.010*"內容" + 0.010*"資訊" + 0.010*"推定" + 0.010*"聯絡" + 0.010*"砍除" + 0.009*"第一項" + 0.009*"情形" + 0.009*"方式" 2025-04-19 00:10:52,409 : INFO : topic #5 (0.111): 0.020*"工作" + 0.017*"方式" + 0.010*"時間" + 0.009*"依法" + 0.009*"通知" + 
0.007*"應徵" + 0.007*"聯絡" + 0.007*"面試" + 0.007*"每日" + 0.007*"電話" 2025-04-19 00:10:52,410 : INFO : topic diff=0.557848, rho=0.577350 2025-04-19 00:10:52,410 : INFO : PROGRESS: pass 0, at document #8000/16310 2025-04-19 00:10:52,789 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:52,793 : INFO : topic #2 (0.111): 0.034*"工作" + 0.014*"公司" + 0.010*"方式" + 0.009*"面試" + 0.009*"時間" + 0.008*"內容" + 0.008*"小時" + 0.006*"覺得" + 0.006*"開發" + 0.006*"推定" 2025-04-19 00:10:52,794 : INFO : topic #4 (0.111): 0.037*"工作" + 0.016*"推定" + 0.015*"空白" + 0.012*"聯絡" + 0.012*"第一項" + 0.011*"方式" + 0.011*"砍除" + 0.010*"資訊" + 0.010*"情形" + 0.010*"承攬" 2025-04-19 00:10:52,794 : INFO : topic #0 (0.111): 0.030*"工作" + 0.014*"砍除" + 0.013*"應徵" + 0.012*"方式" + 0.012*"空白" + 0.011*"第一項" + 0.011*"資訊" + 0.011*"推定" + 0.010*"內容" + 0.010*"單位" 2025-04-19 00:10:52,795 : INFO : topic #6 (0.111): 0.018*"報名" + 0.017*"活動" + 0.016*"產品" + 0.014*"資料" + 0.013*"電話" + 0.013*"使用" + 0.011*"台北市" + 0.010*"目前" + 0.010*"進行" + 0.010*"介紹" 2025-04-19 00:10:52,795 : INFO : topic #7 (0.111): 0.023*"工作" + 0.011*"空白" + 0.010*"內容" + 0.009*"資訊" + 0.009*"推定" + 0.009*"聯絡" + 0.009*"砍除" + 0.008*"第一項" + 0.008*"情形" + 0.008*"方式" 2025-04-19 00:10:52,796 : INFO : topic diff=0.651236, rho=0.500000 2025-04-19 00:10:52,796 : INFO : PROGRESS: pass 0, at document #10000/16310 2025-04-19 00:10:53,069 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:53,073 : INFO : topic #0 (0.111): 0.030*"工作" + 0.014*"砍除" + 0.013*"應徵" + 0.012*"方式" + 0.012*"空白" + 0.011*"第一項" + 0.011*"資訊" + 0.010*"推定" + 0.010*"內容" + 0.010*"單位" 2025-04-19 00:10:53,073 : INFO : topic #3 (0.111): 0.016*"研發" + 0.013*"公司" + 0.012*"資工" + 0.011*"工作" + 0.010*"數學" + 0.010*"職場" + 0.009*"方式" + 0.008*"時間" + 0.008*"資訊" + 0.007*"連結" 2025-04-19 00:10:53,074 : INFO : topic #4 (0.111): 0.037*"工作" + 0.016*"推定" + 0.015*"空白" + 0.012*"聯絡" + 0.011*"第一項" + 0.011*"方式" + 0.011*"砍除" + 0.010*"資訊" + 0.010*"情形" + 0.010*"承攬" 2025-04-19 
00:10:53,074 : INFO : topic #6 (0.111): 0.020*"產品" + 0.016*"資料" + 0.016*"報名" + 0.015*"活動" + 0.014*"使用" + 0.011*"進行" + 0.010*"目前" + 0.010*"介紹" + 0.010*"電話" + 0.010*"公司" 2025-04-19 00:10:53,075 : INFO : topic #5 (0.111): 0.017*"公司" + 0.014*"工作" + 0.012*"面試" + 0.011*"問題" + 0.010*"工程師" + 0.009*"經驗" + 0.008*"技術" + 0.008*"團隊" + 0.008*"時間" + 0.008*"目前" 2025-04-19 00:10:53,075 : INFO : topic diff=0.540137, rho=0.447214 2025-04-19 00:10:53,076 : INFO : PROGRESS: pass 0, at document #12000/16310 2025-04-19 00:10:53,344 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:53,347 : INFO : topic #1 (0.111): 0.029*"工作" + 0.015*"方式" + 0.013*"砍除" + 0.011*"第一項" + 0.011*"推定" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"資訊" + 0.011*"內容" + 0.011*"單位" 2025-04-19 00:10:53,348 : INFO : topic #2 (0.111): 0.025*"工作" + 0.015*"公司" + 0.008*"覺得" + 0.008*"面試" + 0.007*"程式" + 0.007*"時間" + 0.007*"比較" + 0.007*"開發" + 0.006*"真的" + 0.006*"應該" 2025-04-19 00:10:53,348 : INFO : topic #0 (0.111): 0.030*"工作" + 0.014*"砍除" + 0.012*"應徵" + 0.012*"方式" + 0.012*"空白" + 0.011*"第一項" + 0.011*"資訊" + 0.010*"推定" + 0.010*"內容" + 0.010*"單位" 2025-04-19 00:10:53,349 : INFO : topic #4 (0.111): 0.037*"工作" + 0.016*"推定" + 0.015*"空白" + 0.011*"聯絡" + 0.011*"第一項" + 0.011*"方式" + 0.011*"砍除" + 0.010*"資訊" + 0.010*"情形" + 0.010*"承攬" 2025-04-19 00:10:53,350 : INFO : topic #5 (0.111): 0.016*"公司" + 0.011*"工作" + 0.009*"問題" + 0.009*"面試" + 0.008*"工程師" + 0.008*"技術" + 0.007*"經驗" + 0.007*"目前" + 0.006*"台灣" + 0.006*"時間" 2025-04-19 00:10:53,350 : INFO : topic diff=0.489343, rho=0.408248 2025-04-19 00:10:53,351 : INFO : PROGRESS: pass 0, at document #14000/16310 2025-04-19 00:10:53,666 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:53,677 : INFO : topic #3 (0.111): 0.096*"半導體" + 0.043*"研發" + 0.042*"製程" + 0.031*"中國" + 0.026*"表示" + 0.014*"英特爾" + 0.012*"職場" + 0.009*"公司" + 0.008*"資工" + 0.007*"數學" 2025-04-19 00:10:53,679 : INFO : topic #5 (0.111): 0.014*"公司" + 0.009*"台灣" + 
0.008*"工作" + 0.007*"技術" + 0.007*"工程師" + 0.007*"問題" + 0.006*"員工" + 0.006*"美國" + 0.006*"面試" + 0.005*"目前" 2025-04-19 00:10:53,680 : INFO : topic #4 (0.111): 0.037*"工作" + 0.016*"推定" + 0.014*"空白" + 0.011*"聯絡" + 0.011*"第一項" + 0.011*"方式" + 0.011*"砍除" + 0.010*"資訊" + 0.010*"情形" + 0.010*"承攬" 2025-04-19 00:10:53,682 : INFO : topic #6 (0.111): 0.017*"產品" + 0.012*"進行" + 0.012*"使用" + 0.012*"資料" + 0.012*"活動" + 0.011*"研究" + 0.010*"影響" + 0.010*"報名" + 0.009*"模型" + 0.008*"目前" 2025-04-19 00:10:53,683 : INFO : topic #2 (0.111): 0.022*"工作" + 0.014*"公司" + 0.007*"覺得" + 0.006*"面試" + 0.006*"比較" + 0.006*"時間" + 0.006*"程式" + 0.006*"應該" + 0.005*"真的" + 0.005*"開發" 2025-04-19 00:10:53,684 : INFO : topic diff=0.463579, rho=0.377964 2025-04-19 00:10:53,686 : INFO : PROGRESS: pass 0, at document #16000/16310 2025-04-19 00:10:53,956 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:53,960 : INFO : topic #2 (0.111): 0.019*"工作" + 0.014*"公司" + 0.007*"覺得" + 0.006*"應該" + 0.005*"真的" + 0.005*"面試" + 0.005*"比較" + 0.005*"時間" + 0.004*"程式" + 0.004*"開發" 2025-04-19 00:10:53,961 : INFO : topic #7 (0.111): 0.046*"退休" + 0.016*"工作" + 0.008*"終止" + 0.007*"表示" + 0.007*"日本" + 0.006*"空白" + 0.006*"資訊" + 0.006*"內容" + 0.006*"商用" + 0.006*"東京" 2025-04-19 00:10:53,961 : INFO : topic #4 (0.111): 0.036*"工作" + 0.016*"推定" + 0.014*"空白" + 0.011*"聯絡" + 0.011*"第一項" + 0.011*"方式" + 0.011*"砍除" + 0.010*"資訊" + 0.010*"情形" + 0.010*"承攬" 2025-04-19 00:10:53,962 : INFO : topic #5 (0.111): 0.013*"公司" + 0.010*"台灣" + 0.008*"美國" + 0.007*"技術" + 0.007*"員工" + 0.007*"晶片" + 0.007*"工作" + 0.006*"工程師" + 0.006*"台積電" + 0.005*"問題" 2025-04-19 00:10:53,963 : INFO : topic #1 (0.111): 0.029*"工作" + 0.015*"方式" + 0.012*"砍除" + 0.011*"第一項" + 0.011*"推定" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"資訊" + 0.010*"內容" + 0.010*"單位" 2025-04-19 00:10:53,963 : INFO : topic diff=0.441207, rho=0.353553 2025-04-19 00:10:54,038 : INFO : -9.361 per-word bound, 657.5 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 
2025-04-19 00:10:54,038 : INFO : PROGRESS: pass 0, at document #16310/16310 2025-04-19 00:10:54,075 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:10:54,079 : INFO : topic #8 (0.111): 0.031*"工作" + 0.013*"推定" + 0.013*"國定假日" + 0.012*"情形" + 0.012*"空白" + 0.011*"應徵" + 0.010*"第一項" + 0.010*"方式" + 0.010*"砍除" + 0.010*"單位" 2025-04-19 00:10:54,080 : INFO : topic #1 (0.111): 0.028*"工作" + 0.015*"方式" + 0.012*"砍除" + 0.011*"第一項" + 0.011*"推定" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"資訊" + 0.010*"內容" + 0.010*"單位" 2025-04-19 00:10:54,080 : INFO : topic #7 (0.111): 0.048*"退休" + 0.014*"工作" + 0.009*"東京" + 0.007*"終止" + 0.007*"商用" + 0.007*"表示" + 0.006*"三家" + 0.006*"日本" + 0.006*"空白" + 0.006*"資訊" 2025-04-19 00:10:54,081 : INFO : topic #6 (0.111): 0.016*"產品" + 0.012*"研究" + 0.012*"進行" + 0.011*"影響" + 0.011*"模型" + 0.011*"蘋果" + 0.010*"資料" + 0.010*"機器人" + 0.009*"活動" + 0.009*"使用" 2025-04-19 00:10:54,081 : INFO : topic #4 (0.111): 0.035*"工作" + 0.015*"推定" + 0.014*"空白" + 0.011*"聯絡" + 0.011*"第一項" + 0.011*"方式" + 0.011*"砍除" + 0.010*"資訊" + 0.010*"情形" + 0.010*"承攬" 2025-04-19 00:10:54,082 : INFO : topic diff=0.406976, rho=0.333333 2025-04-19 00:10:54,082 : INFO : PROGRESS: pass 1, at document #2000/16310 2025-04-19 00:10:54,747 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:54,751 : INFO : topic #8 (0.111): 0.034*"工作" + 0.014*"推定" + 0.013*"情形" + 0.013*"國定假日" + 0.012*"空白" + 0.012*"應徵" + 0.011*"方式" + 0.011*"砍除" + 0.011*"第一項" + 0.010*"單位" 2025-04-19 00:10:54,751 : INFO : topic #0 (0.111): 0.032*"工作" + 0.012*"方式" + 0.012*"應徵" + 0.011*"砍除" + 0.011*"內容" + 0.010*"空白" + 0.010*"資訊" + 0.010*"推定" + 0.010*"單位" + 0.010*"第一項" 2025-04-19 00:10:54,752 : INFO : topic #4 (0.111): 0.037*"工作" + 0.017*"推定" + 0.014*"空白" + 0.012*"方式" + 0.012*"砍除" + 0.012*"聯絡" + 0.011*"內容" + 0.011*"單位" + 0.011*"第一項" + 0.011*"資訊" 2025-04-19 00:10:54,752 : INFO : topic #2 (0.111): 0.019*"工作" + 0.014*"公司" + 0.006*"真的" + 0.006*"覺得" + 0.006*"時間" + 0.006*"應該" + 
0.005*"面試" + 0.005*"比較" + 0.004*"台積" + 0.004*"東西" 2025-04-19 00:10:54,753 : INFO : topic #6 (0.111): 0.019*"報名" + 0.018*"活動" + 0.012*"進行" + 0.011*"電話" + 0.011*"資料" + 0.011*"研究" + 0.010*"台北市" + 0.010*"產品" + 0.010*"舉辦" + 0.009*"參與" 2025-04-19 00:10:54,753 : INFO : topic diff=1.092497, rho=0.313805 2025-04-19 00:10:54,754 : INFO : PROGRESS: pass 1, at document #4000/16310 2025-04-19 00:10:55,372 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:55,376 : INFO : topic #5 (0.111): 0.013*"公司" + 0.010*"美國" + 0.010*"台灣" + 0.008*"技術" + 0.007*"員工" + 0.006*"晶片" + 0.006*"台積電" + 0.006*"工作" + 0.006*"科技" + 0.005*"問題" 2025-04-19 00:10:55,376 : INFO : topic #7 (0.111): 0.019*"工作" + 0.016*"退休" + 0.013*"訪員" + 0.010*"時間" + 0.009*"內容" + 0.008*"台北市" + 0.007*"南港" + 0.007*"南港區" + 0.007*"規定" + 0.007*"店家" 2025-04-19 00:10:55,377 : INFO : topic #2 (0.111): 0.019*"工作" + 0.013*"公司" + 0.006*"時間" + 0.006*"面試" + 0.006*"真的" + 0.005*"覺得" + 0.005*"應該" + 0.005*"比較" + 0.004*"內容" + 0.004*"需要" 2025-04-19 00:10:55,378 : INFO : topic #4 (0.111): 0.036*"工作" + 0.017*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.012*"聯絡" + 0.011*"內容" + 0.011*"資訊" + 0.011*"單位" + 0.011*"第一項" 2025-04-19 00:10:55,378 : INFO : topic #3 (0.111): 0.044*"半導體" + 0.028*"中國" + 0.024*"英特爾" + 0.024*"研發" + 0.024*"製程" + 0.023*"表示" + 0.018*"川普" + 0.018*"投資" + 0.012*"先進" + 0.009*"魏哲家" 2025-04-19 00:10:55,379 : INFO : topic diff=0.414560, rho=0.313805 2025-04-19 00:10:55,379 : INFO : PROGRESS: pass 1, at document #6000/16310 2025-04-19 00:10:55,911 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:55,916 : INFO : topic #1 (0.111): 0.037*"工作" + 0.021*"方式" + 0.013*"推定" + 0.012*"內容" + 0.012*"聯絡" + 0.012*"單位" + 0.012*"小時" + 0.011*"未註明" + 0.010*"工資" + 0.010*"時間" 2025-04-19 00:10:55,917 : INFO : topic #3 (0.111): 0.039*"半導體" + 0.026*"中國" + 0.022*"研發" + 0.021*"英特爾" + 0.021*"表示" + 0.021*"製程" + 0.016*"川普" + 0.016*"投資" + 0.011*"先進" + 0.009*"職場" 2025-04-19 
00:10:55,917 : INFO : topic #6 (0.111): 0.028*"報名" + 0.025*"活動" + 0.018*"電話" + 0.015*"台北市" + 0.013*"資料" + 0.012*"車馬費" + 0.012*"舉辦" + 0.012*"人數" + 0.011*"進行" + 0.011*"訪問" 2025-04-19 00:10:55,918 : INFO : topic #0 (0.111): 0.032*"工作" + 0.012*"方式" + 0.011*"砍除" + 0.011*"應徵" + 0.010*"內容" + 0.010*"資訊" + 0.010*"第一項" + 0.010*"空白" + 0.010*"情形" + 0.010*"文字" 2025-04-19 00:10:55,918 : INFO : topic #4 (0.111): 0.035*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.012*"聯絡" + 0.011*"內容" + 0.011*"資訊" + 0.011*"第一項" + 0.011*"單位" 2025-04-19 00:10:55,919 : INFO : topic diff=0.226413, rho=0.313805 2025-04-19 00:10:55,919 : INFO : PROGRESS: pass 1, at document #8000/16310 2025-04-19 00:10:56,202 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:56,206 : INFO : topic #0 (0.111): 0.033*"工作" + 0.012*"方式" + 0.011*"應徵" + 0.011*"砍除" + 0.011*"資訊" + 0.010*"內容" + 0.010*"第一項" + 0.010*"空白" + 0.010*"情形" + 0.010*"聯絡" 2025-04-19 00:10:56,207 : INFO : topic #8 (0.111): 0.033*"工作" + 0.013*"情形" + 0.013*"推定" + 0.013*"空白" + 0.012*"國定假日" + 0.012*"第一項" + 0.012*"砍除" + 0.011*"應徵" + 0.011*"方式" + 0.010*"文字" 2025-04-19 00:10:56,207 : INFO : topic #3 (0.111): 0.039*"半導體" + 0.032*"研發" + 0.025*"中國" + 0.020*"表示" + 0.019*"製程" + 0.019*"英特爾" + 0.015*"職場" + 0.015*"投資" + 0.015*"川普" + 0.010*"資工" 2025-04-19 00:10:56,208 : INFO : topic #6 (0.111): 0.026*"報名" + 0.024*"活動" + 0.017*"電話" + 0.015*"台北市" + 0.013*"資料" + 0.011*"舉辦" + 0.011*"進行" + 0.011*"車馬費" + 0.011*"人數" + 0.010*"參加" 2025-04-19 00:10:56,208 : INFO : topic #5 (0.111): 0.016*"公司" + 0.008*"工程師" + 0.008*"技術" + 0.007*"工作" + 0.007*"問題" + 0.006*"台灣" + 0.006*"團隊" + 0.006*"目前" + 0.006*"面試" + 0.005*"經驗" 2025-04-19 00:10:56,209 : INFO : topic diff=0.331599, rho=0.313805 2025-04-19 00:10:56,209 : INFO : PROGRESS: pass 1, at document #10000/16310 2025-04-19 00:10:56,466 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:56,470 : INFO : topic #2 (0.111): 0.024*"工作" + 0.016*"公司" + 
0.012*"面試" + 0.010*"覺得" + 0.009*"比較" + 0.008*"程式" + 0.008*"時間" + 0.007*"應該" + 0.006*"真的" + 0.006*"一些" 2025-04-19 00:10:56,470 : INFO : topic #5 (0.111): 0.016*"公司" + 0.008*"工程師" + 0.008*"工作" + 0.007*"技術" + 0.007*"問題" + 0.007*"目前" + 0.006*"台灣" + 0.006*"團隊" + 0.006*"經驗" + 0.006*"面試" 2025-04-19 00:10:56,471 : INFO : topic #0 (0.111): 0.033*"工作" + 0.012*"方式" + 0.011*"應徵" + 0.011*"資訊" + 0.011*"砍除" + 0.010*"內容" + 0.010*"第一項" + 0.010*"空白" + 0.010*"聯絡" + 0.010*"情形" 2025-04-19 00:10:56,472 : INFO : topic #3 (0.111): 0.038*"半導體" + 0.036*"研發" + 0.024*"中國" + 0.019*"表示" + 0.018*"製程" + 0.018*"職場" + 0.017*"資工" + 0.017*"英特爾" + 0.014*"投資" + 0.013*"川普" 2025-04-19 00:10:56,472 : INFO : topic #1 (0.111): 0.038*"工作" + 0.022*"方式" + 0.013*"推定" + 0.013*"小時" + 0.012*"聯絡" + 0.012*"內容" + 0.012*"單位" + 0.011*"時間" + 0.011*"未註明" + 0.010*"工資" 2025-04-19 00:10:56,473 : INFO : topic diff=0.281504, rho=0.313805 2025-04-19 00:10:56,473 : INFO : PROGRESS: pass 1, at document #12000/16310 2025-04-19 00:10:56,684 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:56,688 : INFO : topic #1 (0.111): 0.038*"工作" + 0.022*"方式" + 0.013*"小時" + 0.012*"推定" + 0.012*"聯絡" + 0.012*"內容" + 0.012*"單位" + 0.011*"時間" + 0.011*"未註明" + 0.010*"工資" 2025-04-19 00:10:56,689 : INFO : topic #5 (0.111): 0.015*"公司" + 0.007*"技術" + 0.007*"工程師" + 0.007*"工作" + 0.006*"台灣" + 0.006*"問題" + 0.006*"目前" + 0.005*"經驗" + 0.005*"相關" + 0.005*"員工" 2025-04-19 00:10:56,689 : INFO : topic #7 (0.111): 0.045*"日本" + 0.023*"退休" + 0.019*"工作" + 0.015*"東京" + 0.012*"時間" + 0.009*"勞基法" + 0.008*"南港" + 0.008*"時數" + 0.008*"篇幅" + 0.007*"接案" 2025-04-19 00:10:56,690 : INFO : topic #6 (0.111): 0.024*"活動" + 0.024*"報名" + 0.014*"電話" + 0.013*"研究" + 0.012*"資料" + 0.012*"進行" + 0.012*"台北市" + 0.011*"使用" + 0.011*"參加" + 0.010*"舉辦" 2025-04-19 00:10:56,690 : INFO : topic #3 (0.111): 0.039*"半導體" + 0.023*"表示" + 0.021*"中國" + 0.020*"製程" + 0.019*"研發" + 0.016*"晶片" + 0.016*"英特爾" + 0.013*"投資" + 0.011*"全球" + 0.011*"先進" 2025-04-19 00:10:56,691 : INFO 
: topic diff=0.348719, rho=0.313805 2025-04-19 00:10:56,691 : INFO : PROGRESS: pass 1, at document #14000/16310 2025-04-19 00:10:56,968 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:56,978 : INFO : topic #1 (0.111): 0.038*"工作" + 0.022*"方式" + 0.013*"小時" + 0.012*"推定" + 0.012*"聯絡" + 0.012*"內容" + 0.012*"單位" + 0.011*"時間" + 0.010*"未註明" + 0.010*"工資" 2025-04-19 00:10:56,979 : INFO : topic #2 (0.111): 0.020*"工作" + 0.015*"公司" + 0.010*"面試" + 0.008*"覺得" + 0.008*"比較" + 0.007*"時間" + 0.006*"應該" + 0.006*"真的" + 0.006*"程式" + 0.005*"一些" 2025-04-19 00:10:56,980 : INFO : topic #4 (0.111): 0.034*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" + 0.011*"單位" + 0.011*"第一項" 2025-04-19 00:10:56,983 : INFO : topic #3 (0.111): 0.030*"半導體" + 0.024*"表示" + 0.024*"中國" + 0.022*"晶片" + 0.017*"英特爾" + 0.014*"研發" + 0.014*"製程" + 0.013*"全球" + 0.012*"投資" + 0.011*"先進" 2025-04-19 00:10:56,984 : INFO : topic #8 (0.111): 0.033*"工作" + 0.013*"情形" + 0.013*"推定" + 0.013*"空白" + 0.012*"國定假日" + 0.012*"第一項" + 0.012*"砍除" + 0.011*"應徵" + 0.011*"方式" + 0.010*"文字" 2025-04-19 00:10:56,984 : INFO : topic diff=0.333906, rho=0.313805 2025-04-19 00:10:56,985 : INFO : PROGRESS: pass 1, at document #16000/16310 2025-04-19 00:10:57,233 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:57,237 : INFO : topic #1 (0.111): 0.038*"工作" + 0.022*"方式" + 0.013*"小時" + 0.012*"推定" + 0.012*"聯絡" + 0.012*"內容" + 0.012*"單位" + 0.011*"時間" + 0.010*"工資" + 0.010*"未註明" 2025-04-19 00:10:57,238 : INFO : topic #5 (0.111): 0.013*"公司" + 0.008*"台灣" + 0.007*"技術" + 0.006*"員工" + 0.006*"美國" + 0.005*"工程師" + 0.005*"科技" + 0.005*"目前" + 0.005*"工作" + 0.005*"報導" 2025-04-19 00:10:57,238 : INFO : topic #2 (0.111): 0.019*"工作" + 0.015*"公司" + 0.009*"面試" + 0.007*"覺得" + 0.007*"比較" + 0.006*"真的" + 0.006*"應該" + 0.006*"時間" + 0.005*"主管" + 0.005*"程式" 2025-04-19 00:10:57,239 : INFO : topic #0 (0.111): 0.032*"工作" + 0.012*"方式" + 0.011*"應徵" + 0.010*"資訊" 
+ 0.010*"內容" + 0.010*"砍除" + 0.010*"第一項" + 0.009*"空白" + 0.009*"聯絡" + 0.009*"情形" 2025-04-19 00:10:57,240 : INFO : topic #8 (0.111): 0.033*"工作" + 0.013*"情形" + 0.013*"推定" + 0.013*"空白" + 0.012*"國定假日" + 0.012*"第一項" + 0.012*"砍除" + 0.011*"應徵" + 0.011*"方式" + 0.010*"文字" 2025-04-19 00:10:57,240 : INFO : topic diff=0.297242, rho=0.313805 2025-04-19 00:10:57,320 : INFO : -8.773 per-word bound, 437.6 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:10:57,321 : INFO : PROGRESS: pass 1, at document #16310/16310 2025-04-19 00:10:57,356 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:10:57,360 : INFO : topic #8 (0.111): 0.032*"工作" + 0.013*"情形" + 0.013*"國定假日" + 0.013*"推定" + 0.012*"空白" + 0.012*"第一項" + 0.011*"砍除" + 0.011*"應徵" + 0.011*"方式" + 0.010*"文字" 2025-04-19 00:10:57,360 : INFO : topic #5 (0.111): 0.014*"公司" + 0.008*"技術" + 0.008*"台灣" + 0.007*"員工" + 0.007*"美國" + 0.006*"科技" + 0.005*"報導" + 0.005*"工程師" + 0.005*"台積" + 0.005*"台積電" 2025-04-19 00:10:57,361 : INFO : topic #6 (0.111): 0.019*"活動" + 0.017*"研究" + 0.017*"報名" + 0.015*"蘋果" + 0.013*"機器人" + 0.013*"問卷" + 0.011*"進行" + 0.010*"資料" + 0.010*"女性" + 0.009*"參與" 2025-04-19 00:10:57,361 : INFO : topic #0 (0.111): 0.031*"工作" + 0.011*"方式" + 0.011*"應徵" + 0.010*"資訊" + 0.010*"內容" + 0.010*"砍除" + 0.009*"單位" + 0.009*"第一項" + 0.009*"空白" + 0.009*"聯絡" 2025-04-19 00:10:57,362 : INFO : topic #7 (0.111): 0.097*"日本" + 0.044*"退休" + 0.029*"勞工" + 0.018*"東京" + 0.016*"勞動" + 0.015*"日圓" + 0.014*"持有" + 0.011*"工作" + 0.010*"契約" + 0.009*"南港" 2025-04-19 00:10:57,363 : INFO : topic diff=0.296723, rho=0.313805 2025-04-19 00:10:57,363 : INFO : PROGRESS: pass 2, at document #2000/16310 2025-04-19 00:10:57,940 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:57,943 : INFO : topic #2 (0.111): 0.018*"工作" + 0.015*"公司" + 0.008*"面試" + 0.007*"真的" + 0.007*"覺得" + 0.006*"應該" + 0.006*"時間" + 0.006*"比較" + 0.005*"知道" + 0.005*"主管" 2025-04-19 00:10:57,944 : 
INFO : topic #8 (0.111): 0.032*"工作" + 0.013*"情形" + 0.013*"空白" + 0.013*"第一項" + 0.012*"砍除" + 0.012*"推定" + 0.012*"國定假日" + 0.011*"應徵" + 0.011*"文字" + 0.011*"方式" 2025-04-19 00:10:57,945 : INFO : topic #0 (0.111): 0.031*"工作" + 0.011*"方式" + 0.011*"資訊" + 0.010*"應徵" + 0.010*"內容" + 0.010*"砍除" + 0.010*"第一項" + 0.009*"情形" + 0.009*"空白" + 0.009*"文字" 2025-04-19 00:10:57,945 : INFO : topic #5 (0.111): 0.014*"公司" + 0.008*"台灣" + 0.008*"技術" + 0.007*"員工" + 0.007*"美國" + 0.006*"科技" + 0.005*"報導" + 0.005*"工程師" + 0.005*"台積" + 0.005*"台積電" 2025-04-19 00:10:57,946 : INFO : topic #4 (0.111): 0.033*"工作" + 0.016*"推定" + 0.014*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" + 0.011*"第一項" + 0.011*"單位" 2025-04-19 00:10:57,946 : INFO : topic diff=0.808031, rho=0.299409 2025-04-19 00:10:57,947 : INFO : PROGRESS: pass 2, at document #4000/16310 2025-04-19 00:10:58,553 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:58,557 : INFO : topic #3 (0.111): 0.023*"晶片" + 0.020*"中國" + 0.020*"半導體" + 0.019*"表示" + 0.017*"美國" + 0.017*"投資" + 0.016*"英特爾" + 0.011*"研發" + 0.011*"製程" + 0.011*"台灣" 2025-04-19 00:10:58,557 : INFO : topic #1 (0.111): 0.043*"工作" + 0.024*"方式" + 0.015*"小時" + 0.015*"推定" + 0.014*"時間" + 0.013*"工資" + 0.013*"單位" + 0.012*"依法" + 0.012*"內容" + 0.012*"未註明" 2025-04-19 00:10:58,558 : INFO : topic #8 (0.111): 0.032*"工作" + 0.013*"情形" + 0.013*"第一項" + 0.013*"空白" + 0.013*"砍除" + 0.012*"推定" + 0.011*"國定假日" + 0.011*"文字" + 0.011*"方式" + 0.011*"應徵" 2025-04-19 00:10:58,559 : INFO : topic #5 (0.111): 0.014*"公司" + 0.008*"台灣" + 0.008*"技術" + 0.007*"員工" + 0.007*"美國" + 0.006*"科技" + 0.005*"報導" + 0.005*"工程師" + 0.005*"台積" + 0.004*"台積電" 2025-04-19 00:10:58,559 : INFO : topic #0 (0.111): 0.031*"工作" + 0.011*"方式" + 0.011*"資訊" + 0.010*"內容" + 0.010*"應徵" + 0.010*"砍除" + 0.010*"情形" + 0.010*"第一項" + 0.009*"文字" + 0.009*"空白" 2025-04-19 00:10:58,560 : INFO : topic diff=0.341741, rho=0.299409 2025-04-19 00:10:58,560 : INFO : PROGRESS: pass 2, at document #6000/16310 2025-04-19 
00:10:59,030 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:59,035 : INFO : topic #3 (0.111): 0.023*"晶片" + 0.020*"中國" + 0.019*"半導體" + 0.019*"表示" + 0.017*"美國" + 0.016*"投資" + 0.015*"英特爾" + 0.011*"研發" + 0.010*"台灣" + 0.010*"製程" 2025-04-19 00:10:59,035 : INFO : topic #1 (0.111): 0.044*"工作" + 0.025*"方式" + 0.016*"小時" + 0.014*"推定" + 0.014*"時間" + 0.013*"工資" + 0.013*"依法" + 0.013*"單位" + 0.012*"內容" + 0.012*"每日" 2025-04-19 00:10:59,036 : INFO : topic #0 (0.111): 0.031*"工作" + 0.011*"方式" + 0.011*"資訊" + 0.010*"內容" + 0.010*"情形" + 0.010*"應徵" + 0.010*"文字" + 0.010*"砍除" + 0.009*"第一項" + 0.009*"聯絡" 2025-04-19 00:10:59,036 : INFO : topic #5 (0.111): 0.014*"公司" + 0.008*"技術" + 0.007*"台灣" + 0.006*"員工" + 0.006*"美國" + 0.006*"科技" + 0.005*"工程師" + 0.005*"產品" + 0.005*"目前" + 0.004*"工作" 2025-04-19 00:10:59,037 : INFO : topic #8 (0.111): 0.032*"工作" + 0.013*"第一項" + 0.013*"情形" + 0.013*"空白" + 0.013*"砍除" + 0.011*"推定" + 0.011*"文字" + 0.011*"國定假日" + 0.011*"方式" + 0.010*"應徵" 2025-04-19 00:10:59,037 : INFO : topic diff=0.177899, rho=0.299409 2025-04-19 00:10:59,038 : INFO : PROGRESS: pass 2, at document #8000/16310 2025-04-19 00:10:59,322 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:59,327 : INFO : topic #5 (0.111): 0.016*"公司" + 0.008*"技術" + 0.008*"工程師" + 0.006*"台灣" + 0.006*"團隊" + 0.006*"產品" + 0.006*"目前" + 0.005*"工作" + 0.005*"員工" + 0.005*"問題" 2025-04-19 00:10:59,327 : INFO : topic #3 (0.111): 0.021*"晶片" + 0.020*"中國" + 0.020*"半導體" + 0.018*"表示" + 0.016*"美國" + 0.015*"投資" + 0.014*"英特爾" + 0.014*"研發" + 0.011*"台灣" + 0.010*"全球" 2025-04-19 00:10:59,328 : INFO : topic #1 (0.111): 0.046*"工作" + 0.026*"方式" + 0.017*"小時" + 0.015*"時間" + 0.014*"推定" + 0.013*"工資" + 0.013*"每日" + 0.012*"依法" + 0.012*"內容" + 0.012*"單位" 2025-04-19 00:10:59,329 : INFO : topic #4 (0.111): 0.033*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" + 0.011*"第一項" + 0.011*"單位" 2025-04-19 00:10:59,329 : INFO : topic #8 
(0.111): 0.032*"工作" + 0.013*"第一項" + 0.013*"情形" + 0.013*"空白" + 0.013*"砍除" + 0.011*"推定" + 0.011*"文字" + 0.011*"國定假日" + 0.011*"方式" + 0.010*"應徵" 2025-04-19 00:10:59,330 : INFO : topic diff=0.308188, rho=0.299409 2025-04-19 00:10:59,330 : INFO : PROGRESS: pass 2, at document #10000/16310 2025-04-19 00:10:59,550 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:59,554 : INFO : topic #7 (0.111): 0.050*"日本" + 0.027*"勞基法" + 0.021*"時間" + 0.021*"加班費" + 0.021*"填寫" + 0.018*"工作" + 0.018*"超過" + 0.017*"時數" + 0.017*"雇主" + 0.016*"勞工" 2025-04-19 00:10:59,555 : INFO : topic #1 (0.111): 0.048*"工作" + 0.027*"方式" + 0.017*"小時" + 0.015*"時間" + 0.013*"每日" + 0.013*"推定" + 0.012*"工資" + 0.012*"內容" + 0.012*"聯絡" + 0.012*"單位" 2025-04-19 00:10:59,556 : INFO : topic #0 (0.111): 0.031*"工作" + 0.011*"資訊" + 0.011*"方式" + 0.010*"內容" + 0.010*"應徵" + 0.009*"情形" + 0.009*"文字" + 0.009*"聯絡" + 0.009*"砍除" + 0.009*"第一項" 2025-04-19 00:10:59,556 : INFO : topic #3 (0.111): 0.021*"中國" + 0.020*"半導體" + 0.019*"晶片" + 0.018*"表示" + 0.015*"美國" + 0.015*"投資" + 0.015*"研發" + 0.013*"英特爾" + 0.011*"台灣" + 0.010*"全球" 2025-04-19 00:10:59,557 : INFO : topic #6 (0.111): 0.029*"報名" + 0.027*"活動" + 0.019*"電話" + 0.015*"台北市" + 0.013*"資料" + 0.012*"舉辦" + 0.012*"研究" + 0.011*"人數" + 0.011*"時間" + 0.011*"參加" 2025-04-19 00:10:59,557 : INFO : topic diff=0.259077, rho=0.299409 2025-04-19 00:10:59,558 : INFO : PROGRESS: pass 2, at document #12000/16310 2025-04-19 00:10:59,769 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:10:59,773 : INFO : topic #0 (0.111): 0.030*"工作" + 0.011*"資訊" + 0.011*"方式" + 0.010*"內容" + 0.010*"應徵" + 0.009*"文字" + 0.009*"情形" + 0.009*"聯絡" + 0.009*"徵才" + 0.009*"砍除" 2025-04-19 00:10:59,774 : INFO : topic #8 (0.111): 0.032*"工作" + 0.013*"第一項" + 0.013*"情形" + 0.013*"空白" + 0.013*"砍除" + 0.011*"推定" + 0.011*"文字" + 0.011*"國定假日" + 0.011*"方式" + 0.010*"應徵" 2025-04-19 00:10:59,774 : INFO : topic #5 (0.111): 0.015*"公司" + 0.008*"技術" + 0.007*"工程師" + 0.006*"台灣" + 
0.006*"目前" + 0.006*"開發" + 0.005*"員工" + 0.005*"團隊" + 0.005*"產品" + 0.005*"工作" 2025-04-19 00:10:59,775 : INFO : topic #1 (0.111): 0.048*"工作" + 0.027*"方式" + 0.018*"小時" + 0.016*"時間" + 0.013*"每日" + 0.013*"推定" + 0.012*"聯絡" + 0.012*"內容" + 0.012*"工資" + 0.012*"單位" 2025-04-19 00:10:59,776 : INFO : topic #3 (0.111): 0.023*"晶片" + 0.022*"半導體" + 0.017*"表示" + 0.016*"中國" + 0.014*"美國" + 0.013*"台灣" + 0.012*"台積電" + 0.011*"投資" + 0.011*"製程" + 0.011*"全球" 2025-04-19 00:10:59,776 : INFO : topic diff=0.304038, rho=0.299409 2025-04-19 00:10:59,777 : INFO : PROGRESS: pass 2, at document #14000/16310 2025-04-19 00:11:00,107 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:11:00,127 : INFO : topic #8 (0.111): 0.032*"工作" + 0.013*"第一項" + 0.013*"情形" + 0.013*"空白" + 0.012*"砍除" + 0.011*"推定" + 0.011*"文字" + 0.011*"國定假日" + 0.011*"方式" + 0.010*"應徵" 2025-04-19 00:11:00,128 : INFO : topic #1 (0.111): 0.048*"工作" + 0.027*"方式" + 0.017*"小時" + 0.016*"時間" + 0.013*"每日" + 0.013*"推定" + 0.012*"聯絡" + 0.012*"工資" + 0.012*"內容" + 0.012*"單位" 2025-04-19 00:11:00,129 : INFO : topic #3 (0.111): 0.022*"晶片" + 0.019*"半導體" + 0.018*"表示" + 0.017*"中國" + 0.016*"台灣" + 0.015*"美國" + 0.014*"台積電" + 0.011*"英特爾" + 0.011*"全球" + 0.010*"投資" 2025-04-19 00:11:00,130 : INFO : topic #0 (0.111): 0.030*"工作" + 0.011*"資訊" + 0.010*"方式" + 0.010*"內容" + 0.010*"應徵" + 0.009*"文字" + 0.009*"情形" + 0.009*"徵才" + 0.009*"聯絡" + 0.009*"砍除" 2025-04-19 00:11:00,133 : INFO : topic #2 (0.111): 0.020*"工作" + 0.016*"公司" + 0.012*"面試" + 0.008*"比較" + 0.008*"覺得" + 0.007*"問題" + 0.007*"時間" + 0.006*"真的" + 0.006*"應該" + 0.006*"程式" 2025-04-19 00:11:00,134 : INFO : topic diff=0.285355, rho=0.299409 2025-04-19 00:11:00,135 : INFO : PROGRESS: pass 2, at document #16000/16310 2025-04-19 00:11:00,343 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:11:00,347 : INFO : topic #7 (0.111): 0.121*"日本" + 0.029*"勞工" + 0.019*"退休" + 0.019*"日圓" + 0.018*"超過" + 0.016*"時間" + 0.015*"勞基法" + 0.015*"勞動" + 0.015*"雇主" + 
0.015*"加班費" 2025-04-19 00:11:00,348 : INFO : topic #0 (0.111): 0.029*"工作" + 0.010*"資訊" + 0.010*"方式" + 0.010*"內容" + 0.009*"應徵" + 0.009*"情形" + 0.009*"文字" + 0.009*"徵才" + 0.009*"聯絡" + 0.009*"砍除" 2025-04-19 00:11:00,349 : INFO : topic #3 (0.111): 0.023*"晶片" + 0.019*"美國" + 0.019*"半導體" + 0.018*"表示" + 0.017*"中國" + 0.015*"台灣" + 0.015*"台積電" + 0.013*"英特爾" + 0.010*"全球" + 0.010*"積電" 2025-04-19 00:11:00,349 : INFO : topic #8 (0.111): 0.031*"工作" + 0.013*"情形" + 0.013*"第一項" + 0.013*"空白" + 0.012*"砍除" + 0.011*"推定" + 0.011*"文字" + 0.011*"方式" + 0.011*"國定假日" + 0.010*"應徵" 2025-04-19 00:11:00,350 : INFO : topic #1 (0.111): 0.048*"工作" + 0.027*"方式" + 0.018*"小時" + 0.015*"時間" + 0.013*"每日" + 0.012*"工資" + 0.012*"推定" + 0.012*"內容" + 0.012*"單位" + 0.012*"聯絡" 2025-04-19 00:11:00,351 : INFO : topic diff=0.245612, rho=0.299409 2025-04-19 00:11:00,452 : INFO : -8.597 per-word bound, 387.2 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:11:00,452 : INFO : PROGRESS: pass 2, at document #16310/16310 2025-04-19 00:11:00,483 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:11:00,487 : INFO : topic #5 (0.111): 0.014*"公司" + 0.009*"技術" + 0.008*"員工" + 0.007*"科技" + 0.006*"台灣" + 0.006*"台積" + 0.006*"報導" + 0.005*"工程師" + 0.004*"目前" + 0.004*"產品" 2025-04-19 00:11:00,488 : INFO : topic #2 (0.111): 0.017*"工作" + 0.017*"公司" + 0.010*"面試" + 0.007*"真的" + 0.007*"知道" + 0.007*"覺得" + 0.007*"問題" + 0.007*"比較" + 0.007*"應該" + 0.006*"時間" 2025-04-19 00:11:00,488 : INFO : topic #3 (0.111): 0.025*"美國" + 0.023*"晶片" + 0.017*"台積電" + 0.017*"表示" + 0.017*"台灣" + 0.016*"中國" + 0.016*"半導體" + 0.014*"投資" + 0.013*"英特爾" + 0.010*"積電" 2025-04-19 00:11:00,489 : INFO : topic #6 (0.111): 0.024*"活動" + 0.021*"報名" + 0.018*"研究" + 0.015*"問卷" + 0.011*"進行" + 0.011*"電話" + 0.011*"參與" + 0.010*"女性" + 0.010*"時間" + 0.010*"台北市" 2025-04-19 00:11:00,489 : INFO : topic #8 (0.111): 0.031*"工作" + 0.013*"情形" + 0.013*"第一項" + 0.013*"空白" + 0.012*"砍除" + 0.011*"推定" + 0.011*"文字" + 0.011*"方式" + 
0.011*"國定假日" + 0.010*"資訊" 2025-04-19 00:11:00,490 : INFO : topic diff=0.253693, rho=0.299409 2025-04-19 00:11:00,490 : INFO : PROGRESS: pass 3, at document #2000/16310 2025-04-19 00:11:01,067 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:11:01,071 : INFO : topic #0 (0.111): 0.028*"工作" + 0.011*"資訊" + 0.010*"內容" + 0.009*"方式" + 0.009*"應徵" + 0.009*"徵才" + 0.009*"文字" + 0.009*"情形" + 0.008*"國定假日" + 0.008*"聯絡" 2025-04-19 00:11:01,072 : INFO : topic #3 (0.111): 0.025*"美國" + 0.023*"晶片" + 0.017*"台積電" + 0.017*"表示" + 0.017*"台灣" + 0.017*"中國" + 0.016*"半導體" + 0.014*"投資" + 0.013*"英特爾" + 0.010*"積電" 2025-04-19 00:11:01,072 : INFO : topic #5 (0.111): 0.014*"公司" + 0.009*"技術" + 0.008*"員工" + 0.007*"科技" + 0.006*"台灣" + 0.006*"台積" + 0.006*"報導" + 0.005*"工程師" + 0.004*"目前" + 0.004*"產品" 2025-04-19 00:11:01,073 : INFO : topic #7 (0.111): 0.086*"日本" + 0.037*"勞工" + 0.015*"勞動" + 0.014*"勞基法" + 0.014*"超過" + 0.014*"時間" + 0.013*"退休" + 0.013*"雇主" + 0.013*"東京" + 0.012*"時數" 2025-04-19 00:11:01,073 : INFO : topic #2 (0.111): 0.017*"工作" + 0.016*"公司" + 0.010*"面試" + 0.007*"真的" + 0.007*"問題" + 0.007*"知道" + 0.007*"比較" + 0.006*"覺得" + 0.006*"應該" + 0.006*"時間" 2025-04-19 00:11:01,074 : INFO : topic diff=0.707778, rho=0.286829 2025-04-19 00:11:01,074 : INFO : PROGRESS: pass 3, at document #4000/16310 2025-04-19 00:11:01,679 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:11:01,683 : INFO : topic #6 (0.111): 0.030*"報名" + 0.027*"活動" + 0.020*"電話" + 0.015*"台北市" + 0.014*"舉辦" + 0.013*"車馬費" + 0.012*"人數" + 0.012*"參與" + 0.012*"資料" + 0.011*"訪問" 2025-04-19 00:11:01,684 : INFO : topic #8 (0.111): 0.031*"工作" + 0.013*"第一項" + 0.013*"空白" + 0.013*"情形" + 0.013*"砍除" + 0.011*"推定" + 0.011*"文字" + 0.011*"資訊" + 0.011*"方式" + 0.010*"聯絡" 2025-04-19 00:11:01,685 : INFO : topic #0 (0.111): 0.028*"工作" + 0.011*"資訊" + 0.010*"內容" + 0.009*"方式" + 0.009*"應徵" + 0.009*"情形" + 0.009*"文字" + 0.009*"徵才" + 0.008*"聯絡" + 0.008*"分類" 2025-04-19 00:11:01,685 : INFO : topic #3 
(0.111): 0.025*"美國" + 0.022*"晶片" + 0.017*"中國" + 0.016*"台積電" + 0.016*"表示" + 0.016*"台灣" + 0.015*"半導體" + 0.014*"投資" + 0.013*"英特爾" + 0.010*"積電" 2025-04-19 00:11:01,686 : INFO : topic #2 (0.111): 0.017*"工作" + 0.016*"公司" + 0.010*"面試" + 0.007*"問題" + 0.007*"真的" + 0.006*"比較" + 0.006*"知道" + 0.006*"時間" + 0.006*"覺得" + 0.006*"應該" 2025-04-19 00:11:01,686 : INFO : topic diff=0.317537, rho=0.286829 2025-04-19 00:11:01,687 : INFO : PROGRESS: pass 3, at document #6000/16310 2025-04-19 00:11:02,149 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:11:02,153 : INFO : topic #1 (0.111): 0.048*"工作" + 0.027*"方式" + 0.017*"小時" + 0.016*"時間" + 0.014*"推定" + 0.014*"工資" + 0.014*"依法" + 0.013*"每日" + 0.013*"單位" + 0.012*"內容" 2025-04-19 00:11:02,153 : INFO : topic #8 (0.111): 0.031*"工作" + 0.013*"第一項" + 0.013*"空白" + 0.013*"情形" + 0.013*"砍除" + 0.011*"文字" + 0.011*"推定" + 0.011*"資訊" + 0.011*"方式" + 0.010*"聯絡" 2025-04-19 00:11:02,154 : INFO : topic #3 (0.111): 0.024*"美國" + 0.022*"晶片" + 0.017*"中國" + 0.016*"表示" + 0.016*"台灣" + 0.016*"台積電" + 0.015*"半導體" + 0.014*"投資" + 0.012*"英特爾" + 0.010*"積電" 2025-04-19 00:11:02,155 : INFO : topic #2 (0.111): 0.018*"工作" + 0.016*"公司" + 0.010*"面試" + 0.007*"問題" + 0.007*"比較" + 0.006*"知道" + 0.006*"時間" + 0.006*"覺得" + 0.006*"真的" + 0.006*"應該" 2025-04-19 00:11:02,155 : INFO : topic #0 (0.111): 0.028*"工作" + 0.011*"資訊" + 0.010*"內容" + 0.009*"方式" + 0.009*"文字" + 0.009*"情形" + 0.009*"應徵" + 0.009*"徵才" + 0.008*"聯絡" + 0.008*"分類" 2025-04-19 00:11:02,156 : INFO : topic diff=0.166437, rho=0.286829 2025-04-19 00:11:02,156 : INFO : PROGRESS: pass 3, at document #8000/16310 2025-04-19 00:11:02,434 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:11:02,438 : INFO : topic #4 (0.111): 0.033*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" + 0.011*"情形" + 0.011*"第一項" 2025-04-19 00:11:02,439 : INFO : topic #3 (0.111): 0.024*"美國" + 0.021*"晶片" + 0.017*"中國" + 0.017*"台灣" + 0.016*"表示" + 
0.015*"半導體" + 0.015*"台積電" + 0.013*"投資" + 0.011*"英特爾" + 0.009*"全球" 2025-04-19 00:11:02,439 : INFO : topic #5 (0.111): 0.016*"公司" + 0.009*"技術" + 0.007*"工程師" + 0.007*"團隊" + 0.007*"產品" + 0.006*"開發" + 0.006*"員工" + 0.006*"台灣" + 0.005*"目前" + 0.005*"相關" 2025-04-19 00:11:02,440 : INFO : topic #7 (0.111): 0.048*"日本" + 0.035*"加班費" + 0.027*"勞基法" + 0.025*"時間" + 0.022*"超過" + 0.019*"填寫" + 0.017*"勞工" + 0.016*"工作" + 0.015*"小時" + 0.014*"勞動" 2025-04-19 00:11:02,440 : INFO : topic #0 (0.111): 0.028*"工作" + 0.011*"資訊" + 0.010*"內容" + 0.009*"方式" + 0.009*"徵才" + 0.009*"應徵" + 0.009*"文字" + 0.009*"情形" + 0.008*"聯絡" + 0.008*"分類" 2025-04-19 00:11:02,441 : INFO : topic diff=0.302470, rho=0.286829 2025-04-19 00:11:02,441 : INFO : PROGRESS: pass 3, at document #10000/16310 2025-04-19 00:11:02,652 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:11:02,657 : INFO : topic #7 (0.111): 0.048*"日本" + 0.045*"加班費" + 0.030*"勞基法" + 0.030*"時間" + 0.026*"超過" + 0.024*"填寫" + 0.022*"小時" + 0.021*"工作" + 0.017*"勞工" + 0.017*"薪資" 2025-04-19 00:11:02,657 : INFO : topic #8 (0.111): 0.031*"工作" + 0.013*"第一項" + 0.013*"空白" + 0.013*"情形" + 0.013*"砍除" + 0.011*"文字" + 0.011*"推定" + 0.011*"資訊" + 0.011*"方式" + 0.010*"聯絡" 2025-04-19 00:11:02,658 : INFO : topic #2 (0.111): 0.020*"工作" + 0.018*"公司" + 0.016*"面試" + 0.010*"問題" + 0.009*"比較" + 0.009*"覺得" + 0.007*"時間" + 0.007*"程式" + 0.007*"知道" + 0.007*"一些" 2025-04-19 00:11:02,658 : INFO : topic #6 (0.111): 0.030*"報名" + 0.028*"活動" + 0.019*"電話" + 0.015*"台北市" + 0.013*"舉辦" + 0.012*"資料" + 0.012*"研究" + 0.012*"人數" + 0.012*"車馬費" + 0.012*"參加" 2025-04-19 00:11:02,659 : INFO : topic #5 (0.111): 0.016*"公司" + 0.009*"技術" + 0.007*"開發" + 0.007*"工程師" + 0.007*"團隊" + 0.007*"產品" + 0.006*"員工" + 0.006*"台灣" + 0.006*"目前" + 0.005*"相關" 2025-04-19 00:11:02,659 : INFO : topic diff=0.247877, rho=0.286829 2025-04-19 00:11:02,659 : INFO : PROGRESS: pass 3, at document #12000/16310 2025-04-19 00:11:02,862 : INFO : merging changes from 2000 documents into a model of 16310 documents 
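The `rho` in each `topic diff=..., rho=...` line is the online-LDA step size. It sets how strongly each merged chunk updates the topic-word statistics.

gensim's `LdaModel.update` computes it as `(offset + pass + num_updates / chunksize) ** (-decay)`. Here `num_updates` counts merged documents and, as I read the source, only grows during the first pass.

The sketch below assumes gensim's defaults `offset=1.0`, `decay=0.5` and `chunksize=2000`, since the actual `LdaModel(...)` call is not shown here. The chunk size is consistent with the "merging changes from 2000 documents" lines. The sketch also assumes 16310 documents and five passes (0 to 4, the passes visible in this excerpt). The helpers `lda_rho` and `rho_schedule` are hypothetical names.

```python
from typing import Iterator, Tuple


def lda_rho(pass_idx: int, num_updates: int, chunksize: int = 2000,
            offset: float = 1.0, decay: float = 0.5) -> float:
    """Step size for one merge: (offset + pass + updates / chunksize) ** -decay."""
    return (offset + pass_idx + num_updates / chunksize) ** (-decay)


def rho_schedule(num_docs: int, passes: int,
                 chunksize: int = 2000) -> Iterator[Tuple[int, int, float]]:
    """Yield (pass, document position shown by PROGRESS, rho) for every chunk merge."""
    num_updates = 0  # documents merged so far; only counted during pass 0
    for pass_idx in range(passes):
        position = 0
        while position < num_docs:
            chunk = min(chunksize, num_docs - position)
            position += chunk
            # rho is evaluated before this chunk is added to num_updates
            yield pass_idx, position, lda_rho(pass_idx, num_updates, chunksize)
            if pass_idx == 0:
                num_updates += chunk


schedule = {(p, pos): rho for p, pos, rho in rho_schedule(16310, passes=5)}
for key in [(0, 12000), (0, 16310), (1, 4000), (2, 8000), (3, 14000), (4, 6000)]:
    print(key, f"{schedule[key]:.6f}")
```

Under these assumptions, the printed values match the `rho=` fields logged after the corresponding PROGRESS lines to six decimals.

After pass 0, `num_updates` stays at 16310, so `rho` depends only on the pass counter. That is why the log repeats `rho=0.313805` throughout pass 1, `0.299409` in pass 2, and so on.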
2025-04-19 00:11:02,866 : INFO : topic #3 (0.111): 0.021*"晶片" + 0.018*"半導體" + 0.018*"美國" + 0.017*"台灣" + 0.015*"表示" + 0.015*"台積電" + 0.014*"中國" + 0.010*"投資" + 0.010*"全球" + 0.009*"積電" 2025-04-19 00:11:02,866 : INFO : topic #0 (0.111): 0.027*"工作" + 0.011*"資訊" + 0.010*"內容" + 0.009*"徵才" + 0.009*"文字" + 0.009*"方式" + 0.009*"應徵" + 0.009*"情形" + 0.008*"聯絡" + 0.008*"分類" 2025-04-19 00:11:02,867 : INFO : topic #4 (0.111): 0.033*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" + 0.011*"情形" + 0.011*"單位" 2025-04-19 00:11:02,867 : INFO : topic #8 (0.111): 0.031*"工作" + 0.013*"第一項" + 0.013*"情形" + 0.013*"空白" + 0.013*"砍除" + 0.011*"文字" + 0.011*"推定" + 0.011*"資訊" + 0.011*"方式" + 0.010*"聯絡" 2025-04-19 00:11:02,868 : INFO : topic #7 (0.111): 0.078*"日本" + 0.042*"加班費" + 0.029*"時間" + 0.027*"超過" + 0.026*"勞基法" + 0.023*"填寫" + 0.022*"小時" + 0.022*"工作" + 0.018*"薪資" + 0.016*"勞工" 2025-04-19 00:11:02,868 : INFO : topic diff=0.264252, rho=0.286829 2025-04-19 00:11:02,869 : INFO : PROGRESS: pass 3, at document #14000/16310 2025-04-19 00:11:03,166 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:11:03,177 : INFO : topic #6 (0.111): 0.028*"活動" + 0.027*"報名" + 0.016*"電話" + 0.015*"研究" + 0.012*"問卷" + 0.012*"台北市" + 0.011*"舉辦" + 0.011*"參加" + 0.011*"進行" + 0.011*"參與" 2025-04-19 00:11:03,178 : INFO : topic #5 (0.111): 0.014*"公司" + 0.008*"技術" + 0.007*"員工" + 0.006*"工程師" + 0.006*"台灣" + 0.005*"科技" + 0.005*"產品" + 0.005*"開發" + 0.005*"目前" + 0.005*"團隊" 2025-04-19 00:11:03,179 : INFO : topic #0 (0.111): 0.027*"工作" + 0.011*"資訊" + 0.010*"內容" + 0.009*"徵才" + 0.009*"文字" + 0.009*"方式" + 0.008*"應徵" + 0.008*"情形" + 0.008*"聯絡" + 0.008*"分類" 2025-04-19 00:11:03,180 : INFO : topic #8 (0.111): 0.031*"工作" + 0.013*"第一項" + 0.013*"情形" + 0.013*"空白" + 0.013*"砍除" + 0.011*"文字" + 0.011*"推定" + 0.011*"資訊" + 0.011*"方式" + 0.010*"聯絡" 2025-04-19 00:11:03,182 : INFO : topic #2 (0.111): 0.019*"工作" + 0.017*"公司" + 0.013*"面試" + 0.009*"問題" + 0.008*"比較" + 0.007*"覺得" + 
0.007*"知道" + 0.007*"時間" + 0.006*"真的" + 0.006*"應該" 2025-04-19 00:11:03,183 : INFO : topic diff=0.261292, rho=0.286829 2025-04-19 00:11:03,183 : INFO : PROGRESS: pass 3, at document #16000/16310 2025-04-19 00:11:03,410 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:11:03,414 : INFO : topic #0 (0.111): 0.026*"工作" + 0.010*"資訊" + 0.009*"內容" + 0.009*"徵才" + 0.009*"方式" + 0.008*"文字" + 0.008*"情形" + 0.008*"應徵" + 0.008*"聯絡" + 0.007*"分類" 2025-04-19 00:11:03,415 : INFO : topic #3 (0.111): 0.021*"晶片" + 0.021*"美國" + 0.018*"台灣" + 0.017*"表示" + 0.017*"半導體" + 0.016*"中國" + 0.016*"台積電" + 0.012*"英特爾" + 0.009*"全球" + 0.009*"積電" 2025-04-19 00:11:03,415 : INFO : topic #2 (0.111): 0.018*"工作" + 0.017*"公司" + 0.012*"面試" + 0.008*"問題" + 0.008*"比較" + 0.007*"覺得" + 0.007*"主管" + 0.006*"知道" + 0.006*"真的" + 0.006*"時間" 2025-04-19 00:11:03,416 : INFO : topic #7 (0.111): 0.101*"日本" + 0.032*"加班費" + 0.028*"勞工" + 0.024*"超過" + 0.024*"時間" + 0.019*"勞基法" + 0.018*"工作" + 0.018*"小時" + 0.018*"薪資" + 0.015*"日圓" 2025-04-19 00:11:03,417 : INFO : topic #4 (0.111): 0.032*"工作" + 0.015*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"資訊" + 0.011*"情形" + 0.011*"單位" 2025-04-19 00:11:03,417 : INFO : topic diff=0.224013, rho=0.286829 2025-04-19 00:11:03,515 : INFO : -8.521 per-word bound, 367.3 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:11:03,515 : INFO : PROGRESS: pass 3, at document #16310/16310 2025-04-19 00:11:03,548 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 00:11:03,552 : INFO : topic #0 (0.111): 0.024*"工作" + 0.010*"資訊" + 0.010*"國定假日" + 0.010*"徵才" + 0.010*"內容" + 0.009*"應徵" + 0.008*"水桶" + 0.008*"方式" + 0.008*"文字" + 0.008*"單位" 2025-04-19 00:11:03,553 : INFO : topic #3 (0.111): 0.026*"美國" + 0.022*"晶片" + 0.018*"台灣" + 0.017*"台積電" + 0.016*"表示" + 0.015*"中國" + 0.015*"半導體" + 0.013*"投資" + 0.012*"英特爾" + 0.010*"積電" 2025-04-19 00:11:03,553 : INFO : topic #2 (0.111): 
0.017*"公司" + 0.017*"工作" + 0.010*"面試" + 0.008*"問題" + 0.008*"知道" + 0.007*"真的" + 0.007*"比較" + 0.007*"覺得" + 0.006*"應該" + 0.006*"現在" 2025-04-19 00:11:03,554 : INFO : topic #1 (0.111): 0.054*"工作" + 0.027*"方式" + 0.019*"小時" + 0.016*"時間" + 0.014*"每日" + 0.012*"聯絡" + 0.012*"工資" + 0.012*"內容" + 0.012*"單位" + 0.011*"依法" 2025-04-19 00:11:03,555 : INFO : topic #6 (0.111): 0.025*"活動" + 0.023*"報名" + 0.018*"研究" + 0.016*"問卷" + 0.012*"電話" + 0.011*"台北市" + 0.011*"時間" + 0.011*"參與" + 0.011*"進行" + 0.010*"舉辦" 2025-04-19 00:11:03,555 : INFO : topic diff=0.227837, rho=0.286829 2025-04-19 00:11:03,555 : INFO : PROGRESS: pass 4, at document #2000/16310 2025-04-19 00:11:04,164 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:11:04,169 : INFO : topic #1 (0.111): 0.050*"工作" + 0.027*"方式" + 0.017*"小時" + 0.016*"時間" + 0.014*"工資" + 0.013*"推定" + 0.013*"每日" + 0.013*"依法" + 0.012*"單位" + 0.012*"內容" 2025-04-19 00:11:04,169 : INFO : topic #7 (0.111): 0.075*"日本" + 0.036*"勞工" + 0.028*"加班費" + 0.021*"時間" + 0.019*"小時" + 0.019*"超過" + 0.018*"工時" + 0.015*"勞基法" + 0.015*"工作" + 0.015*"薪資" 2025-04-19 00:11:04,170 : INFO : topic #5 (0.111): 0.014*"公司" + 0.009*"技術" + 0.008*"員工" + 0.007*"科技" + 0.006*"台積" + 0.006*"報導" + 0.005*"台灣" + 0.005*"產品" + 0.005*"工程師" + 0.004*"目前" 2025-04-19 00:11:04,170 : INFO : topic #4 (0.111): 0.032*"工作" + 0.016*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.013*"方式" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"情形" + 0.011*"資訊" + 0.011*"單位" 2025-04-19 00:11:04,171 : INFO : topic #6 (0.111): 0.028*"報名" + 0.027*"活動" + 0.018*"電話" + 0.015*"台北市" + 0.014*"舉辦" + 0.013*"研究" + 0.013*"參與" + 0.011*"車馬費" + 0.011*"人數" + 0.011*"時間" 2025-04-19 00:11:04,171 : INFO : topic diff=0.657799, rho=0.275711 2025-04-19 00:11:04,172 : INFO : PROGRESS: pass 4, at document #4000/16310 2025-04-19 00:11:04,799 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:11:04,803 : INFO : topic #6 (0.111): 0.030*"報名" + 0.028*"活動" + 0.020*"電話" + 0.015*"台北市" + 0.014*"舉辦" + 
0.013*"車馬費" + 0.013*"人數" + 0.012*"參與" + 0.012*"資料" + 0.011*"訪問" 2025-04-19 00:11:04,803 : INFO : topic #7 (0.111): 0.054*"日本" + 0.025*"勞工" + 0.022*"時間" + 0.021*"加班費" + 0.016*"超過" + 0.016*"小時" + 0.015*"薪資" + 0.014*"工作" + 0.014*"勞基法" + 0.014*"工時" 2025-04-19 00:11:04,804 : INFO : topic #4 (0.111): 0.033*"工作" + 0.016*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.013*"方式" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"情形" + 0.011*"資訊" + 0.011*"單位" 2025-04-19 00:11:04,805 : INFO : topic #5 (0.111): 0.014*"公司" + 0.009*"技術" + 0.008*"員工" + 0.007*"科技" + 0.006*"台積" + 0.006*"報導" + 0.005*"台灣" + 0.005*"產品" + 0.004*"工程師" + 0.004*"相關" 2025-04-19 00:11:04,805 : INFO : topic #3 (0.111): 0.026*"美國" + 0.021*"晶片" + 0.018*"台灣" + 0.017*"台積電" + 0.016*"表示" + 0.016*"中國" + 0.014*"半導體" + 0.013*"投資" + 0.012*"英特爾" + 0.010*"積電" 2025-04-19 00:11:04,806 : INFO : topic diff=0.302198, rho=0.275711 2025-04-19 00:11:04,806 : INFO : PROGRESS: pass 4, at document #6000/16310 2025-04-19 00:11:05,320 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:11:05,324 : INFO : topic #1 (0.111): 0.049*"工作" + 0.027*"方式" + 0.017*"小時" + 0.016*"時間" + 0.014*"工資" + 0.014*"推定" + 0.014*"每日" + 0.014*"依法" + 0.013*"單位" + 0.012*"內容" 2025-04-19 00:11:05,324 : INFO : topic #0 (0.111): 0.025*"工作" + 0.011*"資訊" + 0.010*"內容" + 0.010*"徵才" + 0.008*"文字" + 0.008*"國定假日" + 0.008*"分類" + 0.008*"應徵" + 0.008*"情形" + 0.008*"方式" 2025-04-19 00:11:05,325 : INFO : topic #2 (0.111): 0.017*"公司" + 0.017*"工作" + 0.010*"面試" + 0.008*"問題" + 0.007*"知道" + 0.007*"比較" + 0.006*"時間" + 0.006*"真的" + 0.006*"覺得" + 0.006*"應該" 2025-04-19 00:11:05,325 : INFO : topic #8 (0.111): 0.031*"工作" + 0.013*"第一項" + 0.013*"砍除" + 0.013*"空白" + 0.013*"情形" + 0.011*"文字" + 0.011*"資訊" + 0.011*"推定" + 0.011*"方式" + 0.010*"聯絡" 2025-04-19 00:11:05,326 : INFO : topic #6 (0.111): 0.032*"報名" + 0.028*"活動" + 0.021*"電話" + 0.016*"台北市" + 0.014*"車馬費" + 0.014*"舉辦" + 0.013*"人數" + 0.013*"訪問" + 0.012*"資料" + 0.011*"參加" 2025-04-19 00:11:05,326 : INFO : topic diff=0.161728, rho=0.275711 
2025-04-19 00:11:05,327 : INFO : PROGRESS: pass 4, at document #8000/16310 2025-04-19 00:11:05,570 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:11:05,574 : INFO : topic #8 (0.111): 0.031*"工作" + 0.013*"第一項" + 0.013*"砍除" + 0.013*"情形" + 0.013*"空白" + 0.011*"文字" + 0.011*"資訊" + 0.011*"推定" + 0.011*"方式" + 0.010*"聯絡" 2025-04-19 00:11:05,575 : INFO : topic #5 (0.111): 0.016*"公司" + 0.009*"技術" + 0.007*"產品" + 0.007*"團隊" + 0.007*"開發" + 0.007*"員工" + 0.006*"工程師" + 0.005*"台灣" + 0.005*"科技" + 0.005*"能力" 2025-04-19 00:11:05,575 : INFO : topic #0 (0.111): 0.025*"工作" + 0.011*"資訊" + 0.010*"內容" + 0.010*"徵才" + 0.008*"文字" + 0.008*"應徵" + 0.008*"分類" + 0.008*"水桶" + 0.008*"國定假日" + 0.008*"方式" 2025-04-19 00:11:05,576 : INFO : topic #6 (0.111): 0.031*"報名" + 0.028*"活動" + 0.021*"電話" + 0.016*"台北市" + 0.014*"舉辦" + 0.013*"車馬費" + 0.013*"人數" + 0.012*"資料" + 0.012*"訪問" + 0.012*"參加" 2025-04-19 00:11:05,577 : INFO : topic #4 (0.111): 0.033*"工作" + 0.016*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.013*"方式" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"情形" + 0.011*"資訊" + 0.011*"單位" 2025-04-19 00:11:05,577 : INFO : topic diff=0.301738, rho=0.275711 2025-04-19 00:11:05,578 : INFO : PROGRESS: pass 4, at document #10000/16310 2025-04-19 00:11:05,807 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:11:05,812 : INFO : topic #4 (0.111): 0.033*"工作" + 0.016*"推定" + 0.013*"砍除" + 0.013*"空白" + 0.013*"方式" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"情形" + 0.011*"資訊" + 0.011*"單位" 2025-04-19 00:11:05,813 : INFO : topic #1 (0.111): 0.053*"工作" + 0.029*"方式" + 0.018*"小時" + 0.017*"時間" + 0.015*"每日" + 0.013*"工資" + 0.013*"內容" + 0.013*"推定" + 0.012*"依法" + 0.012*"休息" 2025-04-19 00:11:05,816 : INFO : topic #5 (0.111): 0.016*"公司" + 0.009*"技術" + 0.008*"開發" + 0.007*"團隊" + 0.007*"產品" + 0.007*"工程師" + 0.006*"員工" + 0.005*"台灣" + 0.005*"使用" + 0.005*"相關" 2025-04-19 00:11:05,817 : INFO : topic #6 (0.111): 0.031*"報名" + 0.028*"活動" + 0.020*"電話" + 0.015*"台北市" + 0.013*"舉辦" + 0.012*"資料" + 0.012*"人數" 
+ 0.012*"研究" + 0.012*"車馬費" + 0.012*"參加" 2025-04-19 00:11:05,817 : INFO : topic #3 (0.111): 0.026*"美國" + 0.020*"台灣" + 0.018*"晶片" + 0.016*"中國" + 0.015*"表示" + 0.014*"台積電" + 0.014*"半導體" + 0.012*"投資" + 0.010*"英特爾" + 0.009*"全球" 2025-04-19 00:11:05,818 : INFO : topic diff=0.237890, rho=0.275711 2025-04-19 00:11:05,818 : INFO : PROGRESS: pass 4, at document #12000/16310 2025-04-19 00:11:06,031 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:11:06,035 : INFO : topic #8 (0.111): 0.031*"工作" + 0.013*"第一項" + 0.013*"砍除" + 0.013*"情形" + 0.013*"空白" + 0.011*"文字" + 0.011*"資訊" + 0.011*"推定" + 0.011*"方式" + 0.010*"聯絡" 2025-04-19 00:11:06,036 : INFO : topic #0 (0.111): 0.023*"工作" + 0.011*"資訊" + 0.010*"徵才" + 0.010*"內容" + 0.009*"文字" + 0.008*"水桶" + 0.008*"分類" + 0.008*"應徵" + 0.007*"方式" + 0.007*"情形" 2025-04-19 00:11:06,036 : INFO : topic #3 (0.111): 0.020*"晶片" + 0.019*"美國" + 0.018*"台灣" + 0.017*"半導體" + 0.015*"台積電" + 0.015*"表示" + 0.013*"中國" + 0.010*"投資" + 0.009*"全球" + 0.009*"積電" 2025-04-19 00:11:06,037 : INFO : topic #2 (0.111): 0.019*"工作" + 0.017*"公司" + 0.015*"面試" + 0.010*"問題" + 0.009*"比較" + 0.008*"覺得" + 0.007*"知道" + 0.007*"時間" + 0.007*"程式" + 0.006*"一些" 2025-04-19 00:11:06,037 : INFO : topic #6 (0.111): 0.030*"報名" + 0.030*"活動" + 0.018*"電話" + 0.014*"台北市" + 0.013*"研究" + 0.013*"舉辦" + 0.012*"參加" + 0.012*"資料" + 0.012*"人數" + 0.011*"時間" 2025-04-19 00:11:06,037 : INFO : topic diff=0.242141, rho=0.275711 2025-04-19 00:11:06,038 : INFO : PROGRESS: pass 4, at document #14000/16310 2025-04-19 00:11:06,329 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:11:06,333 : INFO : topic #5 (0.111): 0.014*"公司" + 0.008*"技術" + 0.007*"員工" + 0.006*"開發" + 0.006*"產品" + 0.005*"科技" + 0.005*"工程師" + 0.005*"台灣" + 0.005*"團隊" + 0.004*"報導" 2025-04-19 00:11:06,335 : INFO : topic #0 (0.111): 0.023*"工作" + 0.011*"資訊" + 0.010*"徵才" + 0.010*"內容" + 0.008*"文字" + 0.008*"水桶" + 0.007*"應徵" + 0.007*"分類" + 0.007*"方式" + 0.007*"情形" 2025-04-19 00:11:06,336 : INFO 
: topic #7 (0.111): 0.069*"日本" + 0.043*"加班費" + 0.032*"時間" + 0.030*"工作" + 0.029*"薪資" + 0.026*"小時" + 0.024*"工時" + 0.023*"勞基法" + 0.022*"超過" + 0.018*"填寫" 2025-04-19 00:11:06,337 : INFO : topic #1 (0.111): 0.054*"工作" + 0.029*"方式" + 0.017*"小時" + 0.017*"時間" + 0.014*"每日" + 0.013*"工資" + 0.012*"內容" + 0.012*"依法" + 0.012*"推定" + 0.012*"單位" 2025-04-19 00:11:06,341 : INFO : topic #3 (0.111): 0.020*"晶片" + 0.020*"台灣" + 0.018*"美國" + 0.016*"半導體" + 0.016*"表示" + 0.015*"台積電" + 0.015*"中國" + 0.010*"英特爾" + 0.009*"全球" + 0.009*"產業" 2025-04-19 00:11:06,342 : INFO : topic diff=0.247436, rho=0.275711 2025-04-19 00:11:06,343 : INFO : PROGRESS: pass 4, at document #16000/16310 2025-04-19 00:11:06,558 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 00:11:06,561 : INFO : topic #7 (0.111): 0.074*"日本" + 0.037*"加班費" + 0.029*"薪資" + 0.029*"時間" + 0.028*"工作" + 0.026*"工時" + 0.025*"小時" + 0.024*"勞工" + 0.021*"超過" + 0.021*"勞基法" 2025-04-19 00:11:06,562 : INFO : topic #2 (0.111): 0.018*"工作" + 0.017*"公司" + 0.012*"面試" + 0.009*"問題" + 0.008*"比較" + 0.007*"知道" + 0.007*"主管" + 0.007*"覺得" + 0.006*"真的" + 0.006*"時間" 2025-04-19 00:11:06,562 : INFO : topic #5 (0.111): 0.014*"公司" + 0.009*"技術" + 0.008*"員工" + 0.006*"科技" + 0.006*"報導" + 0.005*"產品" + 0.005*"台積" + 0.005*"台灣" + 0.005*"開發" + 0.005*"工程師" 2025-04-19 00:11:06,563 : INFO : topic #6 (0.111): 0.028*"活動" + 0.026*"報名" + 0.016*"研究" + 0.014*"電話" + 0.012*"問卷" + 0.011*"舉辦" + 0.011*"參加" + 0.011*"台北市" + 0.011*"進行" + 0.010*"參與" 2025-04-19 00:11:06,564 : INFO : topic #3 (0.111): 0.021*"美國" + 0.021*"晶片" + 0.018*"台灣" + 0.016*"表示" + 0.016*"半導體" + 0.015*"台積電" + 0.015*"中國" + 0.011*"英特爾" + 0.009*"全球" + 0.009*"產業" 2025-04-19 00:11:06,564 : INFO : topic diff=0.211250, rho=0.275711 2025-04-19 00:11:06,631 : INFO : -8.481 per-word bound, 357.3 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 00:11:06,631 : INFO : PROGRESS: pass 4, at document #16310/16310 2025-04-19 00:11:06,690 : INFO : merging changes from 
310 documents into a model of 16310 documents 2025-04-19 00:11:06,694 : INFO : topic #3 (0.111): 0.026*"美國" + 0.021*"晶片" + 0.019*"台灣" + 0.017*"台積電" + 0.016*"表示" + 0.015*"中國" + 0.014*"半導體" + 0.013*"投資" + 0.011*"英特爾" + 0.010*"積電" 2025-04-19 00:11:06,694 : INFO : topic #4 (0.111): 0.032*"工作" + 0.015*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"聯絡" + 0.011*"單位" + 0.011*"內容" + 0.011*"情形" + 0.011*"資訊" 2025-04-19 00:11:06,695 : INFO : topic #8 (0.111): 0.030*"工作" + 0.013*"第一項" + 0.013*"砍除" + 0.013*"情形" + 0.013*"空白" + 0.011*"文字" + 0.011*"資訊" + 0.011*"推定" + 0.011*"方式" + 0.010*"聯絡" 2025-04-19 00:11:06,695 : INFO : topic #2 (0.111): 0.017*"公司" + 0.017*"工作" + 0.010*"面試" + 0.009*"問題" + 0.008*"知道" + 0.007*"真的" + 0.007*"比較" + 0.007*"現在" + 0.006*"覺得" + 0.006*"應該" 2025-04-19 00:11:06,696 : INFO : topic #7 (0.111): 0.068*"日本" + 0.040*"加班費" + 0.036*"勞工" + 0.035*"工時" + 0.031*"小時" + 0.028*"薪資" + 0.026*"工作" + 0.026*"時間" + 0.019*"超過" + 0.017*"勞基法" 2025-04-19 00:11:06,696 : INFO : topic diff=0.210967, rho=0.275711 2025-04-19 00:11:06,697 : INFO : LdaModel lifecycle event {'msg': 'trained LdaModel<num_terms=23261, num_topics=9, decay=0.5, chunksize=2000> in 16.17s', 'datetime': '2025-04-19T00:11:06.697129', 'gensim': '4.3.3', 'python': '3.11.2 (main, Apr 21 2023, 22:51:21) [Clang 14.0.3 (clang-1403.0.22.14.1)]', 'platform': 'macOS-15.3.2-arm64-arm-64bit', 'event': 'created'} 2025-04-19 00:11:11,840 : INFO : -7.098 per-word bound, 137.0 perplexity estimate based on a held-out corpus of 16310 documents with 3460358 words 2025-04-19 00:11:11,843 : INFO : using ParallelWordOccurrenceAccumulator<processes=7, batch_size=64> to estimate probabilities from sliding windows 2025-04-19 00:11:15,516 : INFO : 1 batches submitted to accumulate stats from 64 documents (22660 virtual) 2025-04-19 00:11:15,519 : INFO : 2 batches submitted to accumulate stats from 128 documents (45646 virtual) 2025-04-19 00:11:15,521 : INFO : 3 batches submitted to accumulate stats from 192 documents (67171 virtual) 
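The training log reports three numbers that are easy to misread: the per-word bound, the perplexity, and `rho`. Below is a minimal sketch that recomputes the logged values from the figures printed above. It rests on two assumptions about gensim's conventions, not on anything in this notebook:

- perplexity = 2^(-bound);
- `rho = (offset + pass + num_updates / chunksize) ** (-decay)`, using gensim's default `offset=1.0` and the `decay=0.5`, `chunksize=2000` shown in the lifecycle event.

The value `num_updates = 16310` is also an assumption, inferred from the log rather than printed by it. The perplexities are measured on different document sets: the final 310-document chunk during training, and all 16310 documents afterwards. They should therefore not be compared directly.

```python
# Recompute two numbers that gensim prints while training LdaModel.
# Assumptions (not from the notebook): perplexity = 2 ** (-per_word_bound), and the
# online-LDA step size is rho = (offset + pass + num_updates / chunksize) ** (-decay),
# with gensim's default offset=1.0 and the decay=0.5, chunksize=2000 seen in the log.

# (per-word bound, logged perplexity, documents evaluated), copied from the log above
LOGGED_PERPLEXITY = [
    (-8.521, 367.3, 310),    # pass 3, final chunk
    (-8.481, 357.3, 310),    # pass 4, final chunk
    (-7.098, 137.0, 16310),  # whole corpus, after training
]


def perplexity_from_bound(per_word_bound: float) -> float:
    """Convert gensim's per-word bound into the (base-2) perplexity it logs."""
    return 2.0 ** (-per_word_bound)


def rho(pass_: int, num_updates: int, chunksize: int = 2000,
        offset: float = 1.0, decay: float = 0.5) -> float:
    """Step size used when merging a chunk's statistics into the model."""
    return (offset + pass_ + num_updates / chunksize) ** (-decay)


if __name__ == "__main__":
    for bound, logged, n_docs in LOGGED_PERPLEXITY:
        print(f"bound={bound:.3f} docs={n_docs:>5} -> "
              f"perplexity {perplexity_from_bound(bound):.1f} (log: {logged})")
    # Assumption: the update counter stays at the corpus size (16310) after pass 0.
    for pass_, logged in [(3, 0.286829), (4, 0.275711)]:
        print(f"pass {pass_}: rho={rho(pass_, 16310):.6f} (log: {logged})")
```

Under these assumptions, the drop from 367.3 to 357.3 between passes 3 and 4 is a small improvement on the same 310-document chunk. The shrinking `rho` (0.287 to 0.276) means each later pass moves the topic-word distributions less, which matches the falling `topic diff` values.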
2025-04-19 00:11:19,966 : INFO : 251 batches submitted to accumulate stats from 16064 documents 
(3416805 virtual) 2025-04-19 00:11:19,968 : INFO : 252 batches submitted to accumulate stats from 16128 documents (3426189 virtual) 2025-04-19 00:11:19,996 : INFO : 253 batches submitted to accumulate stats from 16192 documents (3433824 virtual) 2025-04-19 00:11:20,005 : INFO : 254 batches submitted to accumulate stats from 16256 documents (3443379 virtual) 2025-04-19 00:11:20,008 : INFO : 255 batches submitted to accumulate stats from 16320 documents (3450914 virtual) 2025-04-19 00:11:20,311 : INFO : 7 accumulators retrieved from output queue 2025-04-19 00:11:20,320 : INFO : accumulated word occurrence stats for 3451622 virtual documents
Elapsed time: 213.75661492347717 sec
result = pd.DataFrame(result)
result
| | topic_num | perplexity | pmi |
|---|---|---|---|
| 0 | 2 | 1284.941121 | -0.054689 |
| 1 | 3 | 1176.735561 | -0.048405 |
| 2 | 4 | 1142.876082 | -0.024995 |
| 3 | 5 | 1081.005299 | -0.040911 |
| 4 | 6 | 1074.750669 | 0.000634 |
| 5 | 7 | 1120.856056 | -0.038652 |
| 6 | 8 | 1135.193597 | -0.021426 |
| 7 | 9 | 1209.470612 | -0.021101 |
result.plot.line(x='topic_num', y='perplexity')
[Figure: line plot of perplexity against topic_num]
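gensim logs its perplexity estimate as 2 raised to the negative per-word bound returned by `LdaModel.log_perplexity`. The cell that produced the perplexity column above is not shown in this section, so the sketch below only illustrates gensim's own conversion convention; `perplexity_from_bound` is a hypothetical helper name. For example, a bound of -8.638 corresponds to a perplexity estimate of about 398.4.

```python
def perplexity_from_bound(per_word_bound: float) -> float:
    """Convert a per-word log-likelihood bound (base 2) into perplexity.

    Mirrors the conversion gensim applies when it logs
    "X per-word bound, Y perplexity estimate": Y = 2 ** (-X).
    """
    return 2.0 ** (-per_word_bound)


# Example inputs: a less negative bound means a lower (better) perplexity
for bound in (-8.638, -8.469):
    print(f"bound={bound}: perplexity ~ {perplexity_from_bound(bound):.1f}")
```

Because the bound is a log-likelihood per word, a small change in the bound moves perplexity multiplicatively, which is why the curve above can shift by hundreds between neighbouring topic counts.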
result.plot.line(x='topic_num', y='pmi')
[Figure: line plot of pmi against topic_num]
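The pmi column measures how strongly a topic's top words co-occur in the corpus. Which gensim `CoherenceModel` setting produced it is not shown in this section, so as an assumption, here is a simplified UCI-style PMI coherence: it averages PMI(w_i, w_j) = log((P(w_i, w_j) + eps) / (P(w_i) P(w_j))) over all pairs of top words, but counts co-occurrence per document (gensim's `c_uci` uses a sliding window instead). `pmi_coherence` is a hypothetical name.

```python
import math
from itertools import combinations


def pmi_coherence(top_words, documents, eps=1e-12):
    """Average pairwise PMI of a topic's top words.

    Simplification (assumption): a pair "co-occurs" if both words appear
    anywhere in the same document, rather than inside a sliding window.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2 = prob(w1), prob(w2)
        if p1 == 0 or p2 == 0:
            continue  # skip unseen words to avoid division by zero
        scores.append(math.log((prob(w1, w2) + eps) / (p1 * p2)))
    return sum(scores) / len(scores) if scores else 0.0


docs = [["工作", "面試"], ["工作", "面試"], ["報名"], ["工作", "報名"]]
print(pmi_coherence(["工作", "面試"], docs))  # ≈ 0.2877, i.e. log(4/3)
```

Pairs that never co-occur get a strongly negative score because of the tiny `eps`, which is one reason averaged PMI values like those in the table often sit slightly below zero.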
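Judging from the training results, 6 topics is the best choice on both metrics: perplexity reaches its minimum (about 1074.75) and PMI coherence reaches its maximum (about 0.0006, the only positive value) at topic_num = 6.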
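The same choice can be read off the table programmatically. A minimal sketch, re-entering the scores from the result table by hand so it runs on its own (in the notebook, the existing `result` DataFrame could be used directly); it assumes lower perplexity and higher PMI are better.

```python
import pandas as pd

# Scores re-entered from the result table (topic_num, perplexity, pmi)
result = pd.DataFrame({
    "topic_num": [2, 3, 4, 5, 6, 7, 8, 9],
    "perplexity": [1284.941121, 1176.735561, 1142.876082, 1081.005299,
                   1074.750669, 1120.856056, 1135.193597, 1209.470612],
    "pmi": [-0.054689, -0.048405, -0.024995, -0.040911,
            0.000634, -0.038652, -0.021426, -0.021101],
})

# idxmin / idxmax return the row label of the best score
best_by_perplexity = int(result.loc[result["perplexity"].idxmin(), "topic_num"])
best_by_pmi = int(result.loc[result["pmi"].idxmax(), "topic_num"])
print(best_by_perplexity, best_by_pmi)  # 6 6
```

When the two metrics disagree, coherence is usually the better guide to human-interpretable topics, but here they agree, so no trade-off is needed.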
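from gensim.models import LdaModel

best_model = LdaModel(
    corpus=corpus,
    num_topics=6,         # chosen from the perplexity / PMI comparison
    id2word=dictionary,
    random_state=1500,    # fixed seed for reproducible topics
    passes=5,             # number of training passes over the corpus
)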
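To check whether closely related words really fall into the same topic, it helps to list each topic's highest-weight words. The sketch below is a hypothetical helper `top_words` that works on a topic-word matrix of shape (num_topics, vocab_size), such as the one gensim's `LdaModel.get_topics()` returns; it is demonstrated on a toy matrix so it runs without a trained model.

```python
import numpy as np


def top_words(topic_word, id2word, topn=10):
    """Return the `topn` highest-weight (word, weight) pairs per topic.

    topic_word: array-like of shape (num_topics, vocab_size).
    id2word: maps a column index to its token (a list, dict, or gensim
             Dictionary all support id2word[int_id]).
    """
    topic_word = np.asarray(topic_word)
    topics = []
    for row in topic_word:
        # argsort is ascending; reverse it for descending weight, keep topn
        idx = np.argsort(row)[::-1][:topn]
        topics.append([(id2word[int(i)], float(row[i])) for i in idx])
    return topics


toy = [[0.1, 0.6, 0.3], [0.5, 0.2, 0.3]]
vocab = ["工作", "公司", "報名"]
print(top_words(toy, vocab, topn=2))
# [[('公司', 0.6), ('報名', 0.3)], [('工作', 0.5), ('報名', 0.3)]]
```

With the trained model this would be called as `top_words(best_model.get_topics(), dictionary, topn=10)`; gensim's built-in `best_model.print_topics(num_words=10)` gives similar information as formatted strings.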
2025-04-19 16:03:22,884 : INFO : topic #5 (0.167): 0.012*"公司" + 0.006*"台灣" + 0.005*"工作" + 0.005*"美國" + 0.004*"技術" + 0.004*"晶片" + 0.004*"問題" + 0.004*"工程師" + 0.004*"員工" + 0.004*"表示" 2025-04-19 16:03:22,885 : INFO : topic #4 (0.167): 0.034*"工作" + 0.017*"推定" + 0.013*"空白" + 0.012*"砍除" + 0.012*"方式" + 0.011*"單位" + 0.011*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"第一項" 2025-04-19 16:03:22,885 : INFO : topic #0 (0.167): 0.025*"工作" + 0.013*"方式" + 0.012*"推定" + 0.010*"工資" + 0.010*"依法" + 0.010*"單位" + 0.009*"應徵" + 0.009*"未註明" + 0.009*"發薪日" + 0.009*"內容" 2025-04-19 16:03:22,885 : INFO : topic diff=0.262100, rho=0.299409 2025-04-19 16:03:22,950 : INFO : -8.448 per-word bound, 349.2 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 16:03:22,950 : INFO : PROGRESS: pass 2, at document #16310/16310 2025-04-19 16:03:22,980 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 16:03:22,983 : INFO : topic #0 (0.167): 0.023*"工作" + 0.013*"方式" + 0.012*"推定" + 0.010*"單位" + 0.010*"工資" + 0.009*"依法" + 0.009*"應徵" + 0.009*"未註明" + 0.009*"發薪日" + 0.008*"內容" 2025-04-19 16:03:22,984 : INFO : topic #5 (0.167): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.005*"技術" + 0.005*"工作" + 0.005*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.004*"問題" 2025-04-19 16:03:22,984 : INFO : topic #3 (0.167): 0.027*"活動" + 0.027*"報名" + 0.022*"研究" + 0.019*"問卷" + 0.012*"電話" + 0.011*"時間" + 0.011*"舉辦" + 0.011*"參與" + 0.010*"人數" + 0.010*"台北市" 2025-04-19 16:03:22,985 : INFO : topic #4 (0.167): 0.033*"工作" + 0.016*"推定" + 0.013*"空白" + 0.012*"砍除" + 0.012*"方式" + 0.011*"單位" + 0.011*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 16:03:22,985 : INFO : topic #2 (0.167): 0.052*"工作" + 0.018*"時間" + 0.017*"小時" + 0.016*"方式" + 0.012*"工時" + 0.010*"內容" + 0.008*"地點" + 0.008*"面試" + 0.007*"經驗" + 0.007*"以上" 2025-04-19 16:03:22,986 : INFO : topic diff=0.270630, rho=0.299409 2025-04-19 16:03:22,986 : INFO : PROGRESS: pass 3, at document #2000/16310 2025-04-19 16:03:23,574 : INFO : 
merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:23,577 : INFO : topic #5 (0.167): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.005*"技術" + 0.005*"工作" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.004*"問題" 2025-04-19 16:03:23,578 : INFO : topic #2 (0.167): 0.053*"工作" + 0.020*"方式" + 0.020*"時間" + 0.018*"小時" + 0.011*"內容" + 0.011*"工時" + 0.010*"每日" + 0.009*"地點" + 0.009*"面試" + 0.008*"休息" 2025-04-19 16:03:23,579 : INFO : topic #0 (0.167): 0.025*"工作" + 0.017*"推定" + 0.017*"方式" + 0.015*"依法" + 0.015*"工資" + 0.014*"未註明" + 0.012*"發薪日" + 0.012*"單位" + 0.010*"應徵" + 0.009*"排班" 2025-04-19 16:03:23,579 : INFO : topic #3 (0.167): 0.029*"報名" + 0.027*"活動" + 0.018*"電話" + 0.014*"台北市" + 0.014*"舉辦" + 0.013*"研究" + 0.012*"參與" + 0.012*"人數" + 0.012*"車馬費" + 0.011*"時間" 2025-04-19 16:03:23,580 : INFO : topic #4 (0.167): 0.033*"工作" + 0.017*"推定" + 0.013*"空白" + 0.013*"方式" + 0.013*"砍除" + 0.012*"單位" + 0.011*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"應徵" 2025-04-19 16:03:23,580 : INFO : topic diff=0.904409, rho=0.286829 2025-04-19 16:03:23,580 : INFO : PROGRESS: pass 3, at document #4000/16310 2025-04-19 16:03:24,160 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:24,163 : INFO : topic #0 (0.167): 0.027*"工作" + 0.021*"推定" + 0.018*"方式" + 0.016*"未註明" + 0.016*"依法" + 0.016*"工資" + 0.014*"單位" + 0.013*"發薪日" + 0.010*"應徵" + 0.010*"排班" 2025-04-19 16:03:24,164 : INFO : topic #3 (0.167): 0.029*"報名" + 0.027*"活動" + 0.020*"電話" + 0.015*"台北市" + 0.013*"舉辦" + 0.013*"車馬費" + 0.013*"人數" + 0.012*"時間" + 0.011*"參與" + 0.011*"資料" 2025-04-19 16:03:24,164 : INFO : topic #2 (0.167): 0.054*"工作" + 0.022*"方式" + 0.021*"時間" + 0.019*"小時" + 0.011*"內容" + 0.011*"每日" + 0.010*"工時" + 0.009*"面試" + 0.009*"地點" + 0.009*"休息" 2025-04-19 16:03:24,165 : INFO : topic #1 (0.167): 0.031*"工作" + 0.012*"第一項" + 0.012*"砍除" + 0.012*"情形" + 0.012*"方式" + 0.012*"空白" + 0.011*"文字" + 0.011*"推定" + 0.010*"資訊" + 0.010*"聯絡" 2025-04-19 16:03:24,166 : INFO : topic #4 (0.167): 0.033*"工作" + 
0.016*"推定" + 0.013*"空白" + 0.013*"方式" + 0.013*"砍除" + 0.011*"單位" + 0.011*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"國定假日" 2025-04-19 16:03:24,166 : INFO : topic diff=0.383204, rho=0.286829 2025-04-19 16:03:24,166 : INFO : PROGRESS: pass 3, at document #6000/16310 2025-04-19 16:03:24,678 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:24,681 : INFO : topic #5 (0.167): 0.013*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.005*"工作" + 0.005*"技術" + 0.004*"問題" + 0.004*"工程師" + 0.004*"員工" + 0.004*"晶片" + 0.004*"科技" 2025-04-19 16:03:24,681 : INFO : topic #1 (0.167): 0.032*"工作" + 0.012*"第一項" + 0.012*"砍除" + 0.012*"情形" + 0.012*"方式" + 0.012*"空白" + 0.011*"文字" + 0.011*"推定" + 0.011*"資訊" + 0.010*"聯絡" 2025-04-19 16:03:24,682 : INFO : topic #2 (0.167): 0.054*"工作" + 0.023*"方式" + 0.021*"時間" + 0.018*"小時" + 0.012*"每日" + 0.011*"內容" + 0.010*"工時" + 0.009*"面試" + 0.009*"地點" + 0.009*"休息" 2025-04-19 16:03:24,682 : INFO : topic #4 (0.167): 0.033*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"單位" + 0.011*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"資訊" 2025-04-19 16:03:24,683 : INFO : topic #3 (0.167): 0.031*"報名" + 0.028*"活動" + 0.021*"電話" + 0.016*"台北市" + 0.014*"車馬費" + 0.013*"舉辦" + 0.013*"人數" + 0.012*"訪問" + 0.012*"資料" + 0.011*"時間" 2025-04-19 16:03:24,683 : INFO : topic diff=0.225247, rho=0.286829 2025-04-19 16:03:24,683 : INFO : PROGRESS: pass 3, at document #8000/16310 2025-04-19 16:03:24,923 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:24,926 : INFO : topic #2 (0.167): 0.056*"工作" + 0.022*"時間" + 0.021*"方式" + 0.018*"小時" + 0.012*"內容" + 0.011*"面試" + 0.010*"每日" + 0.010*"工時" + 0.010*"經驗" + 0.008*"地點" 2025-04-19 16:03:24,926 : INFO : topic #0 (0.167): 0.028*"工作" + 0.022*"推定" + 0.019*"方式" + 0.017*"未註明" + 0.016*"依法" + 0.016*"工資" + 0.015*"單位" + 0.014*"發薪日" + 0.010*"應徵" + 0.010*"內容" 2025-04-19 16:03:24,927 : INFO : topic #1 (0.167): 0.032*"工作" + 0.012*"第一項" + 0.012*"砍除" + 0.012*"情形" + 0.012*"方式" + 0.012*"空白" + 
0.011*"文字" + 0.011*"推定" + 0.011*"資訊" + 0.010*"聯絡" 2025-04-19 16:03:24,927 : INFO : topic #4 (0.167): 0.033*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"單位" + 0.011*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"資訊" 2025-04-19 16:03:24,928 : INFO : topic #3 (0.167): 0.031*"報名" + 0.028*"活動" + 0.020*"電話" + 0.016*"台北市" + 0.013*"舉辦" + 0.013*"車馬費" + 0.012*"人數" + 0.012*"資料" + 0.012*"訪問" + 0.011*"時間" 2025-04-19 16:03:24,928 : INFO : topic diff=0.290583, rho=0.286829 2025-04-19 16:03:24,929 : INFO : PROGRESS: pass 3, at document #10000/16310 2025-04-19 16:03:25,144 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:25,147 : INFO : topic #1 (0.167): 0.031*"工作" + 0.012*"第一項" + 0.012*"砍除" + 0.012*"情形" + 0.012*"方式" + 0.012*"空白" + 0.011*"文字" + 0.011*"資訊" + 0.011*"推定" + 0.010*"聯絡" 2025-04-19 16:03:25,147 : INFO : topic #0 (0.167): 0.028*"工作" + 0.022*"推定" + 0.019*"方式" + 0.017*"未註明" + 0.016*"依法" + 0.016*"工資" + 0.015*"單位" + 0.014*"發薪日" + 0.010*"應徵" + 0.010*"內容" 2025-04-19 16:03:25,148 : INFO : topic #3 (0.167): 0.032*"報名" + 0.028*"活動" + 0.019*"電話" + 0.015*"台北市" + 0.013*"舉辦" + 0.013*"研究" + 0.012*"車馬費" + 0.012*"資料" + 0.012*"人數" + 0.011*"時間" 2025-04-19 16:03:25,148 : INFO : topic #5 (0.167): 0.016*"公司" + 0.007*"工作" + 0.007*"面試" + 0.007*"問題" + 0.006*"工程師" + 0.005*"開發" + 0.005*"技術" + 0.005*"目前" + 0.004*"比較" + 0.004*"台灣" 2025-04-19 16:03:25,149 : INFO : topic #2 (0.167): 0.056*"工作" + 0.022*"時間" + 0.020*"方式" + 0.017*"小時" + 0.012*"內容" + 0.011*"經驗" + 0.011*"工時" + 0.010*"面試" + 0.009*"每日" + 0.008*"地點" 2025-04-19 16:03:25,149 : INFO : topic diff=0.260057, rho=0.286829 2025-04-19 16:03:25,150 : INFO : PROGRESS: pass 3, at document #12000/16310 2025-04-19 16:03:25,349 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:25,352 : INFO : topic #3 (0.167): 0.032*"報名" + 0.030*"活動" + 0.018*"電話" + 0.015*"研究" + 0.013*"台北市" + 0.013*"舉辦" + 0.012*"問卷" + 0.012*"人數" + 0.012*"資料" + 0.012*"參加" 2025-04-19 
16:03:25,352 : INFO : topic #0 (0.167): 0.028*"工作" + 0.021*"推定" + 0.019*"方式" + 0.017*"未註明" + 0.016*"依法" + 0.015*"工資" + 0.015*"單位" + 0.013*"發薪日" + 0.010*"應徵" + 0.010*"內容" 2025-04-19 16:03:25,353 : INFO : topic #2 (0.167): 0.055*"工作" + 0.022*"時間" + 0.019*"方式" + 0.017*"小時" + 0.012*"內容" + 0.011*"工時" + 0.011*"經驗" + 0.010*"面試" + 0.008*"每日" + 0.008*"地點" 2025-04-19 16:03:25,353 : INFO : topic #4 (0.167): 0.033*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"單位" + 0.011*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"資訊" 2025-04-19 16:03:25,354 : INFO : topic #1 (0.167): 0.031*"工作" + 0.012*"第一項" + 0.012*"砍除" + 0.012*"情形" + 0.012*"方式" + 0.012*"空白" + 0.011*"文字" + 0.011*"資訊" + 0.010*"推定" + 0.010*"聯絡" 2025-04-19 16:03:25,354 : INFO : topic diff=0.268807, rho=0.286829 2025-04-19 16:03:25,354 : INFO : PROGRESS: pass 3, at document #14000/16310 2025-04-19 16:03:25,545 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:25,548 : INFO : topic #2 (0.167): 0.054*"工作" + 0.021*"時間" + 0.018*"方式" + 0.016*"小時" + 0.011*"內容" + 0.011*"工時" + 0.010*"經驗" + 0.009*"面試" + 0.008*"地點" + 0.008*"每日" 2025-04-19 16:03:25,549 : INFO : topic #3 (0.167): 0.030*"報名" + 0.029*"活動" + 0.017*"研究" + 0.017*"電話" + 0.014*"問卷" + 0.013*"舉辦" + 0.012*"台北市" + 0.011*"人數" + 0.011*"參與" + 0.011*"參加" 2025-04-19 16:03:25,549 : INFO : topic #0 (0.167): 0.027*"工作" + 0.021*"推定" + 0.018*"方式" + 0.016*"未註明" + 0.016*"工資" + 0.016*"單位" + 0.015*"依法" + 0.013*"發薪日" + 0.009*"應徵" + 0.009*"內容" 2025-04-19 16:03:25,550 : INFO : topic #4 (0.167): 0.033*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"單位" + 0.011*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"資訊" 2025-04-19 16:03:25,550 : INFO : topic #5 (0.167): 0.013*"公司" + 0.006*"工作" + 0.006*"台灣" + 0.005*"問題" + 0.004*"工程師" + 0.004*"技術" + 0.004*"面試" + 0.004*"目前" + 0.004*"美國" + 0.004*"開發" 2025-04-19 16:03:25,550 : INFO : topic diff=0.269076, rho=0.286829 2025-04-19 16:03:25,551 : INFO : PROGRESS: pass 3, at document #16000/16310 
2025-04-19 16:03:25,731 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:25,734 : INFO : topic #2 (0.167): 0.053*"工作" + 0.020*"時間" + 0.016*"方式" + 0.016*"小時" + 0.011*"內容" + 0.011*"工時" + 0.010*"經驗" + 0.009*"面試" + 0.009*"地點" + 0.007*"以上" 2025-04-19 16:03:25,734 : INFO : topic #3 (0.167): 0.030*"報名" + 0.029*"活動" + 0.019*"研究" + 0.015*"電話" + 0.014*"問卷" + 0.013*"舉辦" + 0.011*"參加" + 0.011*"台北市" + 0.011*"人數" + 0.011*"參與" 2025-04-19 16:03:25,735 : INFO : topic #0 (0.167): 0.026*"工作" + 0.020*"推定" + 0.018*"方式" + 0.016*"單位" + 0.016*"未註明" + 0.015*"工資" + 0.015*"依法" + 0.012*"發薪日" + 0.009*"內容" + 0.009*"應徵" 2025-04-19 16:03:25,735 : INFO : topic #5 (0.167): 0.012*"公司" + 0.006*"台灣" + 0.005*"工作" + 0.005*"美國" + 0.004*"技術" + 0.004*"問題" + 0.004*"晶片" + 0.004*"工程師" + 0.004*"員工" + 0.004*"表示" 2025-04-19 16:03:25,736 : INFO : topic #1 (0.167): 0.031*"工作" + 0.012*"第一項" + 0.012*"情形" + 0.012*"砍除" + 0.012*"方式" + 0.012*"空白" + 0.011*"文字" + 0.010*"資訊" + 0.010*"推定" + 0.010*"聯絡" 2025-04-19 16:03:25,736 : INFO : topic diff=0.248752, rho=0.286829 2025-04-19 16:03:25,800 : INFO : -8.440 per-word bound, 347.4 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 16:03:25,800 : INFO : PROGRESS: pass 3, at document #16310/16310 2025-04-19 16:03:25,830 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 16:03:25,833 : INFO : topic #0 (0.167): 0.025*"工作" + 0.019*"推定" + 0.017*"方式" + 0.015*"單位" + 0.015*"未註明" + 0.015*"工資" + 0.014*"依法" + 0.012*"發薪日" + 0.009*"內容" + 0.009*"通知" 2025-04-19 16:03:25,834 : INFO : topic #4 (0.167): 0.033*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"方式" + 0.013*"砍除" + 0.011*"內容" + 0.011*"單位" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"資訊" 2025-04-19 16:03:25,834 : INFO : topic #2 (0.167): 0.052*"工作" + 0.019*"時間" + 0.017*"小時" + 0.015*"方式" + 0.012*"工時" + 0.011*"內容" + 0.009*"經驗" + 0.008*"面試" + 0.008*"地點" + 0.007*"以上" 2025-04-19 16:03:25,835 : INFO : topic #1 (0.167): 0.031*"工作" + 
0.012*"第一項" + 0.012*"情形" + 0.012*"砍除" + 0.012*"方式" + 0.011*"空白" + 0.011*"文字" + 0.010*"資訊" + 0.010*"推定" + 0.010*"聯絡" 2025-04-19 16:03:25,835 : INFO : topic #3 (0.167): 0.028*"活動" + 0.027*"報名" + 0.022*"研究" + 0.019*"問卷" + 0.013*"電話" + 0.012*"舉辦" + 0.011*"參與" + 0.011*"時間" + 0.010*"人數" + 0.010*"台北市" 2025-04-19 16:03:25,835 : INFO : topic diff=0.259452, rho=0.286829 2025-04-19 16:03:25,836 : INFO : PROGRESS: pass 4, at document #2000/16310 2025-04-19 16:03:26,407 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:26,409 : INFO : topic #4 (0.167): 0.033*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"方式" + 0.013*"砍除" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"單位" + 0.011*"應徵" 2025-04-19 16:03:26,410 : INFO : topic #1 (0.167): 0.031*"工作" + 0.012*"第一項" + 0.012*"砍除" + 0.012*"情形" + 0.012*"空白" + 0.012*"方式" + 0.011*"文字" + 0.011*"推定" + 0.010*"資訊" + 0.010*"聯絡" 2025-04-19 16:03:26,410 : INFO : topic #3 (0.167): 0.029*"報名" + 0.027*"活動" + 0.018*"電話" + 0.014*"台北市" + 0.014*"舉辦" + 0.013*"研究" + 0.013*"參與" + 0.012*"人數" + 0.012*"車馬費" + 0.011*"時間" 2025-04-19 16:03:26,411 : INFO : topic #5 (0.167): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.005*"技術" + 0.005*"工作" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" + 0.004*"問題" + 0.004*"表示" 2025-04-19 16:03:26,411 : INFO : topic #0 (0.167): 0.030*"工作" + 0.023*"方式" + 0.023*"推定" + 0.019*"工資" + 0.018*"依法" + 0.018*"未註明" + 0.017*"單位" + 0.014*"發薪日" + 0.012*"休息" + 0.011*"每日" 2025-04-19 16:03:26,412 : INFO : topic diff=0.857019, rho=0.275711 2025-04-19 16:03:26,412 : INFO : PROGRESS: pass 4, at document #4000/16310 2025-04-19 16:03:26,977 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:26,981 : INFO : topic #4 (0.167): 0.033*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"情形" + 0.011*"單位" + 0.011*"資訊" 2025-04-19 16:03:26,982 : INFO : topic #3 (0.167): 0.030*"報名" + 0.028*"活動" + 0.020*"電話" + 0.015*"台北市" + 0.013*"舉辦" + 0.013*"車馬費" + 
0.013*"人數" + 0.012*"參與" + 0.011*"時間" + 0.011*"資料" 2025-04-19 16:03:26,982 : INFO : topic #0 (0.167): 0.033*"工作" + 0.024*"方式" + 0.023*"推定" + 0.019*"工資" + 0.018*"未註明" + 0.018*"依法" + 0.018*"單位" + 0.015*"發薪日" + 0.012*"休息" + 0.012*"每日" 2025-04-19 16:03:26,983 : INFO : topic #1 (0.167): 0.031*"工作" + 0.013*"第一項" + 0.012*"砍除" + 0.012*"情形" + 0.012*"空白" + 0.011*"方式" + 0.011*"文字" + 0.010*"資訊" + 0.010*"推定" + 0.010*"聯絡" 2025-04-19 16:03:26,983 : INFO : topic #2 (0.167): 0.054*"工作" + 0.022*"時間" + 0.018*"小時" + 0.018*"方式" + 0.011*"內容" + 0.011*"工時" + 0.009*"面試" + 0.009*"地點" + 0.008*"每日" + 0.008*"經驗" 2025-04-19 16:03:26,983 : INFO : topic diff=0.351575, rho=0.275711 2025-04-19 16:03:26,984 : INFO : PROGRESS: pass 4, at document #6000/16310 2025-04-19 16:03:27,483 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:27,486 : INFO : topic #2 (0.167): 0.053*"工作" + 0.022*"時間" + 0.018*"方式" + 0.017*"小時" + 0.011*"內容" + 0.010*"工時" + 0.009*"面試" + 0.009*"地點" + 0.009*"經驗" + 0.008*"每日" 2025-04-19 16:03:27,487 : INFO : topic #3 (0.167): 0.031*"報名" + 0.028*"活動" + 0.021*"電話" + 0.016*"台北市" + 0.014*"車馬費" + 0.013*"舉辦" + 0.013*"人數" + 0.012*"訪問" + 0.012*"資料" + 0.011*"通知" 2025-04-19 16:03:27,487 : INFO : topic #1 (0.167): 0.031*"工作" + 0.013*"第一項" + 0.012*"砍除" + 0.012*"情形" + 0.012*"空白" + 0.011*"方式" + 0.011*"文字" + 0.011*"資訊" + 0.010*"聯絡" + 0.010*"推定" 2025-04-19 16:03:27,488 : INFO : topic #0 (0.167): 0.034*"工作" + 0.025*"方式" + 0.023*"推定" + 0.019*"工資" + 0.019*"依法" + 0.018*"未註明" + 0.017*"單位" + 0.015*"發薪日" + 0.013*"休息" + 0.013*"每日" 2025-04-19 16:03:27,489 : INFO : topic #5 (0.167): 0.013*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.005*"技術" + 0.005*"工作" + 0.004*"問題" + 0.004*"工程師" + 0.004*"晶片" + 0.004*"員工" + 0.004*"科技" 2025-04-19 16:03:27,489 : INFO : topic diff=0.209608, rho=0.275711 2025-04-19 16:03:27,489 : INFO : PROGRESS: pass 4, at document #8000/16310 2025-04-19 16:03:27,731 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 
16:03:27,733 : INFO : topic #2 (0.167): 0.054*"工作" + 0.022*"時間" + 0.017*"方式" + 0.016*"小時" + 0.012*"經驗" + 0.012*"內容" + 0.011*"面試" + 0.010*"工時" + 0.007*"地點" + 0.007*"每日" 2025-04-19 16:03:27,734 : INFO : topic #1 (0.167): 0.031*"工作" + 0.013*"第一項" + 0.012*"砍除" + 0.012*"情形" + 0.012*"空白" + 0.011*"方式" + 0.011*"文字" + 0.011*"資訊" + 0.010*"聯絡" + 0.010*"推定" 2025-04-19 16:03:27,734 : INFO : topic #3 (0.167): 0.031*"報名" + 0.028*"活動" + 0.020*"電話" + 0.016*"台北市" + 0.013*"舉辦" + 0.013*"車馬費" + 0.013*"人數" + 0.012*"資料" + 0.012*"訪問" + 0.011*"參與" 2025-04-19 16:03:27,735 : INFO : topic #4 (0.167): 0.033*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"單位" + 0.011*"資訊" 2025-04-19 16:03:27,735 : INFO : topic #5 (0.167): 0.016*"公司" + 0.006*"工作" + 0.006*"問題" + 0.006*"工程師" + 0.005*"面試" + 0.005*"技術" + 0.005*"台灣" + 0.005*"開發" + 0.004*"目前" + 0.004*"產品" 2025-04-19 16:03:27,736 : INFO : topic diff=0.287010, rho=0.275711 2025-04-19 16:03:27,736 : INFO : PROGRESS: pass 4, at document #10000/16310 2025-04-19 16:03:27,949 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:27,952 : INFO : topic #1 (0.167): 0.031*"工作" + 0.013*"第一項" + 0.012*"砍除" + 0.012*"情形" + 0.012*"空白" + 0.011*"方式" + 0.011*"文字" + 0.011*"資訊" + 0.010*"聯絡" + 0.010*"推定" 2025-04-19 16:03:27,952 : INFO : topic #3 (0.167): 0.032*"報名" + 0.028*"活動" + 0.019*"電話" + 0.015*"台北市" + 0.013*"舉辦" + 0.013*"研究" + 0.012*"車馬費" + 0.012*"人數" + 0.012*"資料" + 0.011*"參加" 2025-04-19 16:03:27,953 : INFO : topic #2 (0.167): 0.054*"工作" + 0.022*"時間" + 0.016*"方式" + 0.015*"小時" + 0.013*"經驗" + 0.012*"內容" + 0.011*"面試" + 0.010*"工時" + 0.008*"職缺" + 0.007*"薪資" 2025-04-19 16:03:27,953 : INFO : topic #5 (0.167): 0.016*"公司" + 0.007*"工作" + 0.007*"問題" + 0.006*"面試" + 0.006*"工程師" + 0.005*"開發" + 0.005*"技術" + 0.005*"目前" + 0.004*"比較" + 0.004*"台灣" 2025-04-19 16:03:27,954 : INFO : topic #4 (0.167): 0.033*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"內容" + 0.011*"聯絡" + 
0.011*"情形" + 0.011*"單位" + 0.011*"資訊" 2025-04-19 16:03:27,954 : INFO : topic diff=0.244169, rho=0.275711 2025-04-19 16:03:27,955 : INFO : PROGRESS: pass 4, at document #12000/16310 2025-04-19 16:03:28,159 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:28,161 : INFO : topic #5 (0.167): 0.014*"公司" + 0.006*"工作" + 0.006*"問題" + 0.006*"面試" + 0.005*"工程師" + 0.005*"技術" + 0.005*"台灣" + 0.005*"開發" + 0.004*"目前" + 0.004*"比較" 2025-04-19 16:03:28,162 : INFO : topic #3 (0.167): 0.032*"報名" + 0.030*"活動" + 0.018*"電話" + 0.015*"研究" + 0.014*"台北市" + 0.013*"舉辦" + 0.012*"人數" + 0.012*"問卷" + 0.012*"參加" + 0.012*"資料" 2025-04-19 16:03:28,162 : INFO : topic #4 (0.167): 0.033*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"單位" + 0.011*"資訊" 2025-04-19 16:03:28,163 : INFO : topic #2 (0.167): 0.053*"工作" + 0.021*"時間" + 0.015*"方式" + 0.015*"小時" + 0.013*"經驗" + 0.011*"內容" + 0.011*"面試" + 0.010*"工時" + 0.008*"職缺" + 0.008*"薪資" 2025-04-19 16:03:28,163 : INFO : topic #0 (0.167): 0.033*"工作" + 0.025*"方式" + 0.022*"推定" + 0.018*"工資" + 0.018*"依法" + 0.018*"單位" + 0.017*"未註明" + 0.014*"發薪日" + 0.013*"休息" + 0.012*"每日" 2025-04-19 16:03:28,164 : INFO : topic diff=0.254106, rho=0.275711 2025-04-19 16:03:28,164 : INFO : PROGRESS: pass 4, at document #14000/16310 2025-04-19 16:03:28,358 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:28,361 : INFO : topic #5 (0.167): 0.013*"公司" + 0.006*"台灣" + 0.005*"工作" + 0.005*"問題" + 0.004*"工程師" + 0.004*"技術" + 0.004*"面試" + 0.004*"目前" + 0.004*"美國" + 0.004*"開發" 2025-04-19 16:03:28,362 : INFO : topic #3 (0.167): 0.030*"報名" + 0.030*"活動" + 0.017*"電話" + 0.017*"研究" + 0.014*"問卷" + 0.013*"舉辦" + 0.012*"台北市" + 0.012*"人數" + 0.011*"參與" + 0.011*"參加" 2025-04-19 16:03:28,362 : INFO : topic #4 (0.167): 0.033*"工作" + 0.016*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"情形" + 0.011*"單位" + 0.011*"資訊" 2025-04-19 16:03:28,363 : INFO : 
topic #1 (0.167): 0.031*"工作" + 0.012*"第一項" + 0.012*"砍除" + 0.012*"情形" + 0.012*"空白" + 0.011*"文字" + 0.011*"方式" + 0.011*"資訊" + 0.010*"聯絡" + 0.010*"推定" 2025-04-19 16:03:28,363 : INFO : topic #2 (0.167): 0.052*"工作" + 0.020*"時間" + 0.014*"小時" + 0.014*"方式" + 0.012*"經驗" + 0.011*"內容" + 0.010*"面試" + 0.010*"工時" + 0.008*"薪資" + 0.008*"職缺" 2025-04-19 16:03:28,363 : INFO : topic diff=0.255894, rho=0.275711 2025-04-19 16:03:28,364 : INFO : PROGRESS: pass 4, at document #16000/16310 2025-04-19 16:03:28,544 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:28,546 : INFO : topic #5 (0.167): 0.012*"公司" + 0.006*"台灣" + 0.005*"美國" + 0.005*"工作" + 0.004*"技術" + 0.004*"問題" + 0.004*"晶片" + 0.004*"工程師" + 0.004*"員工" + 0.004*"表示" 2025-04-19 16:03:28,547 : INFO : topic #4 (0.167): 0.032*"工作" + 0.015*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"情形" + 0.011*"內容" + 0.011*"聯絡" + 0.011*"單位" + 0.011*"資訊" 2025-04-19 16:03:28,548 : INFO : topic #2 (0.167): 0.051*"工作" + 0.019*"時間" + 0.014*"小時" + 0.013*"方式" + 0.011*"經驗" + 0.011*"內容" + 0.010*"面試" + 0.010*"工時" + 0.008*"薪資" + 0.008*"地點" 2025-04-19 16:03:28,548 : INFO : topic #0 (0.167): 0.032*"工作" + 0.024*"方式" + 0.021*"推定" + 0.019*"工資" + 0.018*"單位" + 0.018*"依法" + 0.017*"未註明" + 0.014*"發薪日" + 0.012*"休息" + 0.012*"每日" 2025-04-19 16:03:28,549 : INFO : topic #3 (0.167): 0.030*"報名" + 0.030*"活動" + 0.019*"研究" + 0.015*"電話" + 0.014*"問卷" + 0.013*"舉辦" + 0.011*"參加" + 0.011*"台北市" + 0.011*"人數" + 0.011*"參與" 2025-04-19 16:03:28,549 : INFO : topic diff=0.237228, rho=0.275711 2025-04-19 16:03:28,612 : INFO : -8.434 per-word bound, 345.8 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 16:03:28,613 : INFO : PROGRESS: pass 4, at document #16310/16310 2025-04-19 16:03:28,643 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 16:03:28,645 : INFO : topic #4 (0.167): 0.032*"工作" + 0.015*"推定" + 0.013*"空白" + 0.013*"砍除" + 0.013*"方式" + 0.011*"內容" + 0.011*"情形" + 
0.011*"單位" + 0.011*"聯絡" + 0.011*"資訊" 2025-04-19 16:03:28,646 : INFO : topic #1 (0.167): 0.031*"工作" + 0.012*"第一項" + 0.012*"砍除" + 0.012*"情形" + 0.012*"空白" + 0.011*"文字" + 0.011*"方式" + 0.011*"資訊" + 0.010*"聯絡" + 0.010*"推定" 2025-04-19 16:03:28,646 : INFO : topic #0 (0.167): 0.031*"工作" + 0.023*"方式" + 0.020*"推定" + 0.018*"工資" + 0.018*"單位" + 0.017*"依法" + 0.016*"未註明" + 0.013*"發薪日" + 0.012*"每日" + 0.012*"休息" 2025-04-19 16:03:28,647 : INFO : topic #2 (0.167): 0.051*"工作" + 0.018*"時間" + 0.015*"小時" + 0.012*"方式" + 0.011*"工時" + 0.011*"經驗" + 0.011*"內容" + 0.009*"面試" + 0.008*"薪資" + 0.007*"地點" 2025-04-19 16:03:28,647 : INFO : topic #3 (0.167): 0.029*"活動" + 0.028*"報名" + 0.022*"研究" + 0.018*"問卷" + 0.013*"電話" + 0.012*"舉辦" + 0.011*"參與" + 0.011*"台北市" + 0.011*"人數" + 0.011*"時間" 2025-04-19 16:03:28,648 : INFO : topic diff=0.248883, rho=0.275711 2025-04-19 16:03:28,648 : INFO : LdaModel lifecycle event {'msg': 'trained LdaModel<num_terms=23261, num_topics=6, decay=0.5, chunksize=2000> in 14.71s', 'datetime': '2025-04-19T16:03:28.648362', 'gensim': '4.3.3', 'python': '3.11.2 (main, Apr 21 2023, 22:51:21) [Clang 14.0.3 (clang-1403.0.22.14.1)]', 'platform': 'macOS-15.3.2-arm64-arm-64bit', 'event': 'created'}
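The log reports both a per-word bound and a perplexity estimate; gensim derives the latter as 2 raised to the negative bound, so the values can be checked by hand (2^8.434 ≈ 345.8). Lower perplexity is better, so the drop from 349.2 to 345.8 over passes 2 to 4 shows the six-topic model still improving slightly. A minimal sketch of the conversion follows; `perplexity_from_bound` is a hypothetical helper, and with a trained model the bound itself can be obtained from gensim's `LdaModel.log_perplexity(corpus)`.

```python
def perplexity_from_bound(per_word_bound: float) -> float:
    """Convert gensim's per-word log-likelihood bound (base 2) into perplexity."""
    return 2 ** (-per_word_bound)


# Per-word bounds logged after passes 2, 3 and 4 of the six-topic run above.
# The logged bounds are rounded to 3 decimals, so the last digit may differ slightly.
for bound in (-8.448, -8.440, -8.434):
    print(f"bound={bound:.3f} -> perplexity={perplexity_from_bound(bound):.1f}")
```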
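import pyLDAvis.gensim_models  # also binds the top-level name pyLDAvis
# Render the interactive intertopic-distance view of best_model inline
pyLDAvis.enable_notebook()
p = pyLDAvis.gensim_models.prepare(best_model, corpus, dictionary)
p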
In the pyLDAvis intertopic distance map, topics 2, 3 and 5 sit very close together, so let's try a model with four topics.
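To put a number on how close topics are, one option is to compare the topic-word distributions directly. The sketch below computes pairwise Jensen-Shannon divergence (base 2, so values fall in [0, 1]). pyLDAvis's default intertopic map is also built from Jensen-Shannon distances, but it then projects them to 2-D, so this ranking may not match the picture exactly. `js_divergence_matrix` and `closest_pairs` are hypothetical helpers; in this notebook the input matrix would come from gensim's `best_model.get_topics()`, whose rows use gensim's 0-based topic ids rather than pyLDAvis's 1-based, prevalence-sorted labels.

```python
import numpy as np


def _kl_base2(p: np.ndarray, q: np.ndarray) -> float:
    """KL divergence in bits; terms where p == 0 contribute nothing."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def js_divergence_matrix(topics: np.ndarray) -> np.ndarray:
    """Pairwise Jensen-Shannon divergence between the rows of a topic-word matrix."""
    probs = topics / topics.sum(axis=1, keepdims=True)  # make each row sum to 1
    k = probs.shape[0]
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            mid = 0.5 * (probs[i] + probs[j])  # mixture distribution M
            js = 0.5 * _kl_base2(probs[i], mid) + 0.5 * _kl_base2(probs[j], mid)
            dist[i, j] = dist[j, i] = js
    return dist


def closest_pairs(dist: np.ndarray, top_n: int = 3) -> list[tuple[int, int, float]]:
    """Return the top_n most similar topic pairs as (i, j, divergence)."""
    k = dist.shape[0]
    pairs = sorted((dist[i, j], i, j) for i in range(k) for j in range(i + 1, k))
    return [(i, j, float(d)) for d, i, j in pairs[:top_n]]


# Toy example: topics 0 and 1 use the same words, topic 2 uses different ones
toy = np.array([[0.5, 0.5, 0.0, 0.0],
                [0.4, 0.6, 0.0, 0.0],
                [0.0, 0.0, 0.5, 0.5]])
print(closest_pairs(js_divergence_matrix(toy), top_n=1))
```

Applied to the notebook, `closest_pairs(js_divergence_matrix(best_model.get_topics()))` would list the gensim topic pairs that overlap most, which can then be cross-checked against the words shown in pyLDAvis.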
# Retrain LDA with four topics
model_5 = LdaModel(
    corpus=corpus,
    num_topics=4,
    id2word=dictionary,
    random_state=1500,
    passes=4,  # number of training passes over the corpus
)
# Visualize the four-topic model for comparison
pyLDAvis.enable_notebook()
p = pyLDAvis.gensim_models.prepare(model_5, corpus, dictionary)
p
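Besides inspecting pyLDAvis visually, topic coherence gives a number for comparing the six-topic and four-topic runs. Below is a minimal pure-Python sketch of UMass coherence, which scores a topic's top words by how often they co-occur in the same documents; higher (less negative) is better. `umass_coherence`, `mean_coherence` and `tokenized_docs` are hypothetical names. gensim's `CoherenceModel` (for example with `coherence='u_mass'` or `'c_v'`) is the library route, and its aggregation may differ from this plain sum.

```python
import math


def umass_coherence(top_words: list[str], documents: list[list[str]]) -> float:
    """UMass coherence of one topic.

    top_words must be ordered from most to least probable. For every pair where
    w_j ranks above w_i, add log((D(w_i, w_j) + 1) / D(w_j)), where D counts the
    documents that contain all of the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words: str) -> int:
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                continue  # word absent from the reference documents; skip the pair
            score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / d_j)
    return score


def mean_coherence(topics_top_words: list[list[str]], documents: list[list[str]]) -> float:
    """Average UMass coherence over all topics of one model."""
    return sum(umass_coherence(t, documents) for t in topics_top_words) / len(topics_top_words)


# Toy corpus: "a" and "b" always co-occur, "c" appears only once, alongside "a"
docs = [["a", "b"], ["a", "b"], ["a", "c"]]
print(umass_coherence(["a", "b"], docs), umass_coherence(["a", "c"], docs))
```

In the notebook, the top words per topic could be taken from `[[w for w, _ in model_5.show_topic(k, topn=10)] for k in range(model_5.num_topics)]`, and `tokenized_docs` would be the token lists used to build `dictionary`; computing the same average for the six-topic model gives a like-for-like comparison. The brute-force document counting is fine for 10 words and about 16k documents but becomes slow for much larger settings.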
2025-04-19 16:03:44,013 : INFO : using symmetric alpha at 0.25
2025-04-19 16:03:44,014 : INFO : using symmetric eta at 0.25
2025-04-19 16:03:44,038 : INFO : using serial LDA version on this node
2025-04-19 16:03:44,066 : INFO : running online (multi-pass) LDA training, 4 topics, 4 passes over the supplied corpus of 16310 documents, updating model once every 2000 documents, evaluating perplexity every 16310 documents, iterating 50x with a convergence threshold of 0.001000
...
2025-04-19 16:03:46,840 : INFO : PROGRESS: pass 0, at document #16000/16310
2025-04-19 16:03:47,067 : INFO : merging changes from 2000 documents into a model of 16310 documents
2025-04-19 16:03:47,069 : INFO : topic #0 (0.250): 0.025*"工作" + 0.010*"方式" + 0.010*"應徵" + 0.010*"空白" + 0.009*"推定" + 0.009*"砍除" + 0.009*"單位" + 0.009*"資訊" + 0.008*"內容" + 0.008*"第一項"
2025-04-19 16:03:47,070 : INFO : topic #1 (0.250): 0.030*"工作" + 0.013*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"國定假日" + 0.010*"資訊"
2025-04-19 16:03:47,070 : INFO : topic #2 (0.250): 0.045*"工作" + 0.011*"面試" + 0.009*"方式" + 0.008*"內容" + 0.008*"小時" + 0.008*"公司" + 0.008*"時間" + 0.006*"工時" + 0.006*"單位" + 0.005*"覺得"
2025-04-19 16:03:47,071 : INFO : 
topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.005*"美國" + 0.005*"工作" + 0.004*"晶片" + 0.004*"技術" + 0.004*"問題" + 0.004*"員工" + 0.004*"表示" + 0.004*"工程師" 2025-04-19 16:03:47,071 : INFO : topic diff=0.321705, rho=0.353553 2025-04-19 16:03:47,143 : INFO : -8.523 per-word bound, 367.9 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 16:03:47,143 : INFO : PROGRESS: pass 0, at document #16310/16310 2025-04-19 16:03:47,180 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 16:03:47,183 : INFO : topic #0 (0.250): 0.023*"工作" + 0.010*"應徵" + 0.010*"方式" + 0.009*"空白" + 0.009*"單位" + 0.008*"推定" + 0.008*"砍除" + 0.008*"資訊" + 0.008*"內容" + 0.008*"分類" 2025-04-19 16:03:47,183 : INFO : topic #1 (0.250): 0.030*"工作" + 0.013*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.011*"第一項" + 0.011*"情形" + 0.011*"聯絡" + 0.010*"國定假日" + 0.010*"資訊" 2025-04-19 16:03:47,184 : INFO : topic #2 (0.250): 0.044*"工作" + 0.010*"面試" + 0.009*"小時" + 0.008*"公司" + 0.008*"內容" + 0.008*"方式" + 0.008*"時間" + 0.007*"工時" + 0.006*"覺得" + 0.006*"單位" 2025-04-19 16:03:47,184 : INFO : topic #3 (0.250): 0.012*"公司" + 0.007*"美國" + 0.007*"台灣" + 0.005*"技術" + 0.005*"晶片" + 0.004*"工作" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.004*"台積電" 2025-04-19 16:03:47,185 : INFO : topic diff=0.324760, rho=0.333333 2025-04-19 16:03:47,185 : INFO : PROGRESS: pass 1, at document #2000/16310 2025-04-19 16:03:47,671 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:47,673 : INFO : topic #0 (0.250): 0.022*"工作" + 0.014*"方式" + 0.012*"台北市" + 0.011*"內容" + 0.011*"聯絡" + 0.009*"應徵" + 0.009*"通知" + 0.009*"電話" + 0.009*"地點" + 0.008*"工資" 2025-04-19 16:03:47,674 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.011*"情形" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"第一項" + 0.011*"單位" 2025-04-19 16:03:47,674 : INFO : topic #2 (0.250): 0.047*"工作" + 0.013*"方式" + 0.012*"時間" + 0.011*"小時" + 0.010*"面試" + 0.009*"內容" + 0.007*"工時" + 
0.007*"每日" + 0.006*"工資" + 0.006*"休息" 2025-04-19 16:03:47,675 : INFO : topic #3 (0.250): 0.011*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.005*"技術" + 0.004*"晶片" + 0.004*"工作" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.004*"台積電" 2025-04-19 16:03:47,675 : INFO : topic diff=0.963431, rho=0.313805 2025-04-19 16:03:47,675 : INFO : PROGRESS: pass 1, at document #4000/16310 2025-04-19 16:03:48,160 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:48,163 : INFO : topic #0 (0.250): 0.019*"工作" + 0.015*"方式" + 0.015*"電話" + 0.014*"台北市" + 0.012*"聯絡" + 0.012*"內容" + 0.012*"通知" + 0.010*"地點" + 0.009*"單位名稱" + 0.008*"單位地址" 2025-04-19 16:03:48,163 : INFO : topic #1 (0.250): 0.033*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"單位" + 0.011*"內容" 2025-04-19 16:03:48,164 : INFO : topic #2 (0.250): 0.048*"工作" + 0.016*"方式" + 0.014*"時間" + 0.013*"小時" + 0.010*"面試" + 0.009*"每日" + 0.009*"內容" + 0.008*"休息" + 0.008*"工資" + 0.008*"依法" 2025-04-19 16:03:48,165 : INFO : topic #3 (0.250): 0.011*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.004*"技術" + 0.004*"晶片" + 0.004*"工作" + 0.004*"員工" + 0.004*"表示" + 0.003*"科技" + 0.003*"時間" 2025-04-19 16:03:48,165 : INFO : topic diff=0.473894, rho=0.313805 2025-04-19 16:03:48,165 : INFO : PROGRESS: pass 1, at document #6000/16310 2025-04-19 16:03:48,557 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:48,560 : INFO : topic #0 (0.250): 0.019*"電話" + 0.017*"台北市" + 0.015*"報名" + 0.015*"工作" + 0.015*"方式" + 0.013*"聯絡" + 0.013*"通知" + 0.013*"內容" + 0.012*"活動" + 0.011*"地點" 2025-04-19 16:03:48,560 : INFO : topic #1 (0.250): 0.033*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"單位" 2025-04-19 16:03:48,561 : INFO : topic #2 (0.250): 0.049*"工作" + 0.019*"方式" + 0.015*"時間" + 0.014*"小時" + 0.011*"每日" + 0.010*"內容" + 0.010*"面試" + 0.009*"依法" + 0.009*"休息" + 0.009*"工資" 2025-04-19 16:03:48,561 : INFO : 
topic #3 (0.250): 0.012*"公司" + 0.005*"台灣" + 0.005*"美國" + 0.004*"技術" + 0.004*"資料" + 0.004*"工作" + 0.004*"目前" + 0.004*"問題" + 0.004*"產品" + 0.004*"時間" 2025-04-19 16:03:48,562 : INFO : topic diff=0.326303, rho=0.313805 2025-04-19 16:03:48,562 : INFO : PROGRESS: pass 1, at document #8000/16310 2025-04-19 16:03:48,846 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:48,848 : INFO : topic #0 (0.250): 0.019*"電話" + 0.017*"台北市" + 0.015*"報名" + 0.015*"工作" + 0.015*"方式" + 0.013*"聯絡" + 0.013*"通知" + 0.012*"內容" + 0.012*"活動" + 0.011*"地點" 2025-04-19 16:03:48,849 : INFO : topic #1 (0.250): 0.033*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"單位" 2025-04-19 16:03:48,849 : INFO : topic #2 (0.250): 0.054*"工作" + 0.018*"方式" + 0.016*"時間" + 0.014*"小時" + 0.013*"面試" + 0.010*"內容" + 0.009*"每日" + 0.008*"經驗" + 0.008*"工時" + 0.007*"休息" 2025-04-19 16:03:48,850 : INFO : topic #3 (0.250): 0.015*"公司" + 0.006*"問題" + 0.005*"工作" + 0.005*"工程師" + 0.005*"技術" + 0.005*"面試" + 0.004*"目前" + 0.004*"台灣" + 0.004*"開發" + 0.004*"產品" 2025-04-19 16:03:48,850 : INFO : topic diff=0.345988, rho=0.313805 2025-04-19 16:03:48,851 : INFO : PROGRESS: pass 1, at document #10000/16310 2025-04-19 16:03:49,099 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:49,101 : INFO : topic #0 (0.250): 0.018*"電話" + 0.016*"台北市" + 0.016*"報名" + 0.015*"方式" + 0.014*"工作" + 0.013*"聯絡" + 0.013*"通知" + 0.012*"內容" + 0.012*"活動" + 0.011*"地點" 2025-04-19 16:03:49,101 : INFO : topic #1 (0.250): 0.033*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" + 0.011*"單位" 2025-04-19 16:03:49,102 : INFO : topic #2 (0.250): 0.055*"工作" + 0.017*"方式" + 0.017*"時間" + 0.014*"小時" + 0.013*"面試" + 0.011*"內容" + 0.009*"經驗" + 0.009*"工時" + 0.009*"每日" + 0.007*"職缺" 2025-04-19 16:03:49,102 : INFO : topic #3 (0.250): 0.015*"公司" + 0.006*"問題" + 0.006*"工作" + 0.006*"面試" + 
0.005*"工程師" + 0.005*"開發" + 0.005*"目前" + 0.005*"技術" + 0.004*"比較" + 0.004*"台灣" 2025-04-19 16:03:49,103 : INFO : topic diff=0.277496, rho=0.313805 2025-04-19 16:03:49,103 : INFO : PROGRESS: pass 1, at document #12000/16310 2025-04-19 16:03:49,322 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:49,324 : INFO : topic #0 (0.250): 0.017*"電話" + 0.017*"報名" + 0.015*"台北市" + 0.015*"方式" + 0.014*"工作" + 0.014*"活動" + 0.013*"聯絡" + 0.012*"通知" + 0.012*"內容" + 0.011*"地點" 2025-04-19 16:03:49,324 : INFO : topic #1 (0.250): 0.033*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"單位" + 0.011*"內容" 2025-04-19 16:03:49,325 : INFO : topic #2 (0.250): 0.055*"工作" + 0.017*"時間" + 0.016*"方式" + 0.014*"小時" + 0.013*"面試" + 0.010*"內容" + 0.009*"經驗" + 0.009*"工時" + 0.008*"每日" + 0.007*"職缺" 2025-04-19 16:03:49,325 : INFO : topic #3 (0.250): 0.014*"公司" + 0.006*"工作" + 0.006*"問題" + 0.005*"面試" + 0.005*"工程師" + 0.004*"技術" + 0.004*"台灣" + 0.004*"目前" + 0.004*"開發" + 0.004*"比較" 2025-04-19 16:03:49,326 : INFO : topic diff=0.287867, rho=0.313805 2025-04-19 16:03:49,326 : INFO : PROGRESS: pass 1, at document #14000/16310 2025-04-19 16:03:49,527 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:49,529 : INFO : topic #0 (0.250): 0.017*"報名" + 0.017*"電話" + 0.015*"台北市" + 0.014*"方式" + 0.013*"活動" + 0.013*"工作" + 0.013*"聯絡" + 0.012*"通知" + 0.011*"內容" + 0.011*"地點" 2025-04-19 16:03:49,529 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"單位" + 0.011*"內容" 2025-04-19 16:03:49,530 : INFO : topic #2 (0.250): 0.055*"工作" + 0.016*"時間" + 0.015*"方式" + 0.013*"小時" + 0.013*"面試" + 0.010*"內容" + 0.010*"工時" + 0.009*"經驗" + 0.007*"每日" + 0.007*"職缺" 2025-04-19 16:03:49,530 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.005*"工作" + 0.005*"問題" + 0.004*"技術" + 0.004*"工程師" + 0.004*"面試" + 0.004*"目前" + 0.003*"美國" + 
0.003*"員工" 2025-04-19 16:03:49,531 : INFO : topic diff=0.283267, rho=0.313805 2025-04-19 16:03:49,531 : INFO : PROGRESS: pass 1, at document #16000/16310 2025-04-19 16:03:49,720 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:49,722 : INFO : topic #0 (0.250): 0.019*"報名" + 0.015*"電話" + 0.014*"台北市" + 0.014*"方式" + 0.013*"活動" + 0.012*"工作" + 0.012*"聯絡" + 0.012*"通知" + 0.010*"內容" + 0.010*"地點" 2025-04-19 16:03:49,723 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"單位" + 0.011*"內容" 2025-04-19 16:03:49,723 : INFO : topic #2 (0.250): 0.055*"工作" + 0.015*"時間" + 0.013*"方式" + 0.013*"面試" + 0.013*"小時" + 0.011*"內容" + 0.011*"工時" + 0.009*"經驗" + 0.007*"職缺" + 0.007*"地點" 2025-04-19 16:03:49,724 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.005*"美國" + 0.005*"工作" + 0.004*"技術" + 0.004*"晶片" + 0.004*"問題" + 0.004*"工程師" + 0.004*"員工" + 0.004*"表示" 2025-04-19 16:03:49,724 : INFO : topic diff=0.252586, rho=0.313805 2025-04-19 16:03:49,790 : INFO : -8.446 per-word bound, 348.8 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 16:03:49,791 : INFO : PROGRESS: pass 1, at document #16310/16310 2025-04-19 16:03:49,823 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 16:03:49,825 : INFO : topic #0 (0.250): 0.019*"報名" + 0.015*"問卷" + 0.014*"台北市" + 0.013*"電話" + 0.013*"工作" + 0.013*"活動" + 0.012*"方式" + 0.011*"通知" + 0.011*"聯絡" + 0.010*"地點" 2025-04-19 16:03:49,825 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 16:03:49,826 : INFO : topic #2 (0.250): 0.053*"工作" + 0.015*"時間" + 0.014*"小時" + 0.012*"方式" + 0.012*"面試" + 0.011*"工時" + 0.010*"內容" + 0.009*"經驗" + 0.007*"公司" + 0.006*"薪資" 2025-04-19 16:03:49,826 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"美國" + 0.006*"台灣" + 0.005*"技術" 
+ 0.005*"晶片" + 0.004*"工作" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.004*"台積電" 2025-04-19 16:03:49,826 : INFO : topic diff=0.279421, rho=0.313805 2025-04-19 16:03:49,827 : INFO : PROGRESS: pass 2, at document #2000/16310 2025-04-19 16:03:50,204 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:50,206 : INFO : topic #0 (0.250): 0.021*"報名" + 0.019*"電話" + 0.018*"活動" + 0.017*"台北市" + 0.014*"通知" + 0.013*"方式" + 0.012*"人數" + 0.011*"聯絡" + 0.011*"時間" + 0.011*"地點" 2025-04-19 16:03:50,207 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 16:03:50,207 : INFO : topic #2 (0.250): 0.054*"工作" + 0.017*"時間" + 0.016*"方式" + 0.015*"小時" + 0.012*"面試" + 0.011*"工時" + 0.011*"內容" + 0.009*"每日" + 0.008*"休息" + 0.007*"經驗" 2025-04-19 16:03:50,208 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.005*"技術" + 0.004*"晶片" + 0.004*"工作" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.004*"問題" 2025-04-19 16:03:50,208 : INFO : topic diff=0.815427, rho=0.299409 2025-04-19 16:03:50,208 : INFO : PROGRESS: pass 2, at document #4000/16310 2025-04-19 16:03:50,570 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:50,572 : INFO : topic #0 (0.250): 0.021*"報名" + 0.020*"電話" + 0.019*"活動" + 0.017*"台北市" + 0.014*"通知" + 0.012*"人數" + 0.012*"方式" + 0.011*"時間" + 0.011*"聯絡" + 0.011*"地點" 2025-04-19 16:03:50,573 : INFO : topic #1 (0.250): 0.033*"工作" + 0.015*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 16:03:50,573 : INFO : topic #2 (0.250): 0.053*"工作" + 0.019*"方式" + 0.017*"時間" + 0.016*"小時" + 0.012*"面試" + 0.011*"每日" + 0.011*"內容" + 0.010*"工時" + 0.009*"休息" + 0.008*"依法" 2025-04-19 16:03:50,574 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.005*"技術" + 0.004*"晶片" + 0.004*"工作" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 
0.004*"問題" 2025-04-19 16:03:50,574 : INFO : topic diff=0.366523, rho=0.299409 2025-04-19 16:03:50,575 : INFO : PROGRESS: pass 2, at document #6000/16310 2025-04-19 16:03:50,914 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:50,916 : INFO : topic #0 (0.250): 0.024*"報名" + 0.021*"活動" + 0.021*"電話" + 0.017*"台北市" + 0.013*"通知" + 0.013*"人數" + 0.011*"聯絡" + 0.011*"時間" + 0.011*"車馬費" + 0.010*"方式" 2025-04-19 16:03:50,917 : INFO : topic #1 (0.250): 0.033*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 16:03:50,917 : INFO : topic #2 (0.250): 0.052*"工作" + 0.021*"方式" + 0.017*"時間" + 0.015*"小時" + 0.012*"每日" + 0.011*"內容" + 0.011*"面試" + 0.010*"休息" + 0.009*"依法" + 0.009*"工時" 2025-04-19 16:03:50,918 : INFO : topic #3 (0.250): 0.013*"公司" + 0.006*"台灣" + 0.005*"美國" + 0.005*"技術" + 0.004*"工作" + 0.004*"問題" + 0.004*"晶片" + 0.004*"工程師" + 0.004*"員工" + 0.003*"目前" 2025-04-19 16:03:50,918 : INFO : topic diff=0.279284, rho=0.299409 2025-04-19 16:03:50,918 : INFO : PROGRESS: pass 2, at document #8000/16310 2025-04-19 16:03:51,174 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:51,176 : INFO : topic #0 (0.250): 0.024*"報名" + 0.021*"活動" + 0.021*"電話" + 0.017*"台北市" + 0.013*"通知" + 0.012*"人數" + 0.011*"時間" + 0.011*"聯絡" + 0.010*"方式" + 0.010*"車馬費" 2025-04-19 16:03:51,177 : INFO : topic #1 (0.250): 0.033*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 16:03:51,177 : INFO : topic #2 (0.250): 0.054*"工作" + 0.019*"方式" + 0.018*"時間" + 0.015*"小時" + 0.013*"面試" + 0.011*"內容" + 0.010*"每日" + 0.009*"經驗" + 0.009*"工時" + 0.008*"休息" 2025-04-19 16:03:51,178 : INFO : topic #3 (0.250): 0.016*"公司" + 0.006*"問題" + 0.006*"工作" + 0.006*"工程師" + 0.005*"技術" + 0.005*"面試" + 0.005*"台灣" + 0.004*"開發" + 0.004*"目前" + 0.004*"產品" 2025-04-19 16:03:51,178 : INFO : topic diff=0.315344, 
rho=0.299409 2025-04-19 16:03:51,178 : INFO : PROGRESS: pass 2, at document #10000/16310 2025-04-19 16:03:51,404 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:51,406 : INFO : topic #0 (0.250): 0.025*"報名" + 0.022*"活動" + 0.020*"電話" + 0.016*"台北市" + 0.013*"通知" + 0.012*"人數" + 0.011*"時間" + 0.011*"聯絡" + 0.010*"方式" + 0.010*"內容" 2025-04-19 16:03:51,407 : INFO : topic #1 (0.250): 0.033*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 16:03:51,407 : INFO : topic #2 (0.250): 0.053*"工作" + 0.018*"時間" + 0.018*"方式" + 0.015*"小時" + 0.012*"面試" + 0.011*"內容" + 0.010*"經驗" + 0.009*"工時" + 0.009*"每日" + 0.007*"職缺" 2025-04-19 16:03:51,408 : INFO : topic #3 (0.250): 0.016*"公司" + 0.007*"工作" + 0.007*"問題" + 0.006*"面試" + 0.006*"工程師" + 0.005*"技術" + 0.005*"開發" + 0.005*"目前" + 0.004*"比較" + 0.004*"覺得" 2025-04-19 16:03:51,408 : INFO : topic diff=0.258780, rho=0.299409 2025-04-19 16:03:51,408 : INFO : PROGRESS: pass 2, at document #12000/16310 2025-04-19 16:03:51,620 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:51,622 : INFO : topic #0 (0.250): 0.026*"報名" + 0.024*"活動" + 0.019*"電話" + 0.015*"台北市" + 0.012*"通知" + 0.012*"人數" + 0.011*"時間" + 0.011*"聯絡" + 0.010*"方式" + 0.010*"舉辦" 2025-04-19 16:03:51,622 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 16:03:51,623 : INFO : topic #2 (0.250): 0.052*"工作" + 0.018*"時間" + 0.017*"方式" + 0.014*"小時" + 0.012*"面試" + 0.011*"內容" + 0.010*"經驗" + 0.010*"工時" + 0.008*"每日" + 0.008*"職缺" 2025-04-19 16:03:51,623 : INFO : topic #3 (0.250): 0.014*"公司" + 0.006*"工作" + 0.006*"問題" + 0.005*"面試" + 0.005*"工程師" + 0.005*"技術" + 0.005*"台灣" + 0.004*"開發" + 0.004*"目前" + 0.004*"比較" 2025-04-19 16:03:51,624 : INFO : topic diff=0.269441, rho=0.299409 2025-04-19 16:03:51,624 : INFO : PROGRESS: pass 2, at 
document #14000/16310 2025-04-19 16:03:51,857 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:51,859 : INFO : topic #0 (0.250): 0.025*"報名" + 0.023*"活動" + 0.018*"電話" + 0.014*"台北市" + 0.012*"通知" + 0.012*"人數" + 0.011*"問卷" + 0.011*"研究" + 0.011*"時間" + 0.010*"聯絡" 2025-04-19 16:03:51,860 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 16:03:51,861 : INFO : topic #2 (0.250): 0.053*"工作" + 0.017*"時間" + 0.016*"方式" + 0.014*"小時" + 0.011*"面試" + 0.011*"內容" + 0.010*"經驗" + 0.010*"工時" + 0.008*"職缺" + 0.007*"每日" 2025-04-19 16:03:51,861 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.006*"工作" + 0.005*"問題" + 0.004*"技術" + 0.004*"工程師" + 0.004*"面試" + 0.004*"目前" + 0.004*"美國" + 0.003*"開發" 2025-04-19 16:03:51,862 : INFO : topic diff=0.269494, rho=0.299409 2025-04-19 16:03:51,862 : INFO : PROGRESS: pass 2, at document #16000/16310 2025-04-19 16:03:52,050 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:52,052 : INFO : topic #0 (0.250): 0.026*"報名" + 0.023*"活動" + 0.016*"電話" + 0.013*"研究" + 0.013*"台北市" + 0.012*"問卷" + 0.012*"通知" + 0.011*"人數" + 0.010*"時間" + 0.010*"聯絡" 2025-04-19 16:03:52,053 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 16:03:52,053 : INFO : topic #2 (0.250): 0.052*"工作" + 0.017*"時間" + 0.014*"方式" + 0.014*"小時" + 0.012*"面試" + 0.011*"內容" + 0.011*"工時" + 0.010*"經驗" + 0.008*"薪資" + 0.007*"職缺" 2025-04-19 16:03:52,054 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.005*"美國" + 0.005*"工作" + 0.004*"技術" + 0.004*"晶片" + 0.004*"問題" + 0.004*"工程師" + 0.004*"表示" + 0.004*"員工" 2025-04-19 16:03:52,054 : INFO : topic diff=0.240447, rho=0.299409 2025-04-19 16:03:52,119 : INFO : -8.429 per-word bound, 344.8 perplexity estimate based on a held-out corpus of 310 documents 
with 43091 words 2025-04-19 16:03:52,120 : INFO : PROGRESS: pass 2, at document #16310/16310 2025-04-19 16:03:52,151 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 16:03:52,153 : INFO : topic #0 (0.250): 0.024*"報名" + 0.023*"活動" + 0.017*"問卷" + 0.017*"研究" + 0.014*"電話" + 0.013*"台北市" + 0.011*"時間" + 0.011*"人數" + 0.010*"通知" + 0.009*"舉辦" 2025-04-19 16:03:52,154 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 16:03:52,155 : INFO : topic #2 (0.250): 0.052*"工作" + 0.016*"時間" + 0.015*"小時" + 0.013*"方式" + 0.012*"工時" + 0.011*"面試" + 0.011*"內容" + 0.010*"經驗" + 0.008*"薪資" + 0.007*"地點" 2025-04-19 16:03:52,155 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"美國" + 0.006*"台灣" + 0.005*"技術" + 0.005*"晶片" + 0.004*"工作" + 0.004*"員工" + 0.004*"科技" + 0.004*"表示" + 0.004*"問題" 2025-04-19 16:03:52,155 : INFO : topic diff=0.261425, rho=0.299409 2025-04-19 16:03:52,156 : INFO : PROGRESS: pass 3, at document #2000/16310 2025-04-19 16:03:52,500 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:52,502 : INFO : topic #0 (0.250): 0.025*"報名" + 0.023*"活動" + 0.019*"電話" + 0.016*"台北市" + 0.012*"通知" + 0.012*"人數" + 0.012*"時間" + 0.011*"舉辦" + 0.010*"車馬費" + 0.010*"聯絡" 2025-04-19 16:03:52,502 : INFO : topic #1 (0.250): 0.032*"工作" + 0.015*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 16:03:52,503 : INFO : topic #2 (0.250): 0.052*"工作" + 0.018*"方式" + 0.018*"時間" + 0.015*"小時" + 0.011*"面試" + 0.011*"內容" + 0.011*"工時" + 0.009*"每日" + 0.008*"休息" + 0.008*"地點" 2025-04-19 16:03:52,503 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.005*"技術" + 0.005*"晶片" + 0.004*"工作" + 0.004*"科技" + 0.004*"員工" + 0.004*"表示" + 0.004*"問題" 2025-04-19 16:03:52,504 : INFO : topic diff=0.697368, rho=0.286829 2025-04-19 16:03:52,504 : INFO : PROGRESS: pass 3, at 
document #4000/16310 2025-04-19 16:03:52,839 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:52,841 : INFO : topic #0 (0.250): 0.025*"報名" + 0.023*"活動" + 0.020*"電話" + 0.016*"台北市" + 0.013*"人數" + 0.012*"通知" + 0.011*"時間" + 0.011*"舉辦" + 0.011*"車馬費" + 0.010*"資料" 2025-04-19 16:03:52,841 : INFO : topic #1 (0.250): 0.033*"工作" + 0.015*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 16:03:52,842 : INFO : topic #2 (0.250): 0.051*"工作" + 0.020*"方式" + 0.018*"時間" + 0.016*"小時" + 0.011*"面試" + 0.011*"每日" + 0.011*"內容" + 0.010*"工時" + 0.009*"休息" + 0.008*"地點" 2025-04-19 16:03:52,842 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.005*"技術" + 0.004*"晶片" + 0.004*"工作" + 0.004*"科技" + 0.004*"員工" + 0.004*"表示" + 0.004*"問題" 2025-04-19 16:03:52,843 : INFO : topic diff=0.335287, rho=0.286829 2025-04-19 16:03:52,844 : INFO : PROGRESS: pass 3, at document #6000/16310 2025-04-19 16:03:53,158 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:53,160 : INFO : topic #0 (0.250): 0.027*"報名" + 0.025*"活動" + 0.021*"電話" + 0.017*"台北市" + 0.013*"人數" + 0.012*"通知" + 0.012*"車馬費" + 0.011*"舉辦" + 0.011*"時間" + 0.011*"資料" 2025-04-19 16:03:53,160 : INFO : topic #1 (0.250): 0.033*"工作" + 0.015*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"第一項" + 0.011*"單位" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 16:03:53,161 : INFO : topic #2 (0.250): 0.050*"工作" + 0.021*"方式" + 0.017*"時間" + 0.015*"小時" + 0.012*"每日" + 0.011*"內容" + 0.010*"面試" + 0.010*"休息" + 0.009*"依法" + 0.009*"工時" 2025-04-19 16:03:53,161 : INFO : topic #3 (0.250): 0.013*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.005*"技術" + 0.004*"工作" + 0.004*"問題" + 0.004*"晶片" + 0.004*"工程師" + 0.004*"科技" + 0.004*"員工" 2025-04-19 16:03:53,162 : INFO : topic diff=0.259937, rho=0.286829 2025-04-19 16:03:53,162 : INFO : PROGRESS: pass 3, at document #8000/16310 2025-04-19 16:03:53,408 : INFO : merging changes from 
2000 documents into a model of 16310 documents 2025-04-19 16:03:53,410 : INFO : topic #0 (0.250): 0.027*"報名" + 0.025*"活動" + 0.020*"電話" + 0.016*"台北市" + 0.012*"人數" + 0.012*"通知" + 0.011*"舉辦" + 0.011*"車馬費" + 0.011*"時間" + 0.011*"資料" 2025-04-19 16:03:53,411 : INFO : topic #1 (0.250): 0.033*"工作" + 0.015*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 16:03:53,411 : INFO : topic #2 (0.250): 0.051*"工作" + 0.019*"方式" + 0.018*"時間" + 0.015*"小時" + 0.012*"面試" + 0.011*"內容" + 0.010*"每日" + 0.009*"經驗" + 0.009*"工時" + 0.008*"休息" 2025-04-19 16:03:53,412 : INFO : topic #3 (0.250): 0.015*"公司" + 0.006*"工作" + 0.006*"問題" + 0.006*"工程師" + 0.005*"面試" + 0.005*"技術" + 0.005*"台灣" + 0.004*"開發" + 0.004*"目前" + 0.004*"覺得" 2025-04-19 16:03:53,412 : INFO : topic diff=0.295338, rho=0.286829 2025-04-19 16:03:53,412 : INFO : PROGRESS: pass 3, at document #10000/16310 2025-04-19 16:03:53,631 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:53,633 : INFO : topic #0 (0.250): 0.028*"報名" + 0.025*"活動" + 0.019*"電話" + 0.015*"台北市" + 0.012*"人數" + 0.012*"通知" + 0.011*"時間" + 0.011*"舉辦" + 0.011*"資料" + 0.011*"車馬費" 2025-04-19 16:03:53,633 : INFO : topic #1 (0.250): 0.033*"工作" + 0.015*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 16:03:53,634 : INFO : topic #2 (0.250): 0.050*"工作" + 0.018*"時間" + 0.018*"方式" + 0.014*"小時" + 0.011*"面試" + 0.011*"內容" + 0.010*"經驗" + 0.009*"工時" + 0.009*"每日" + 0.007*"職缺" 2025-04-19 16:03:53,634 : INFO : topic #3 (0.250): 0.016*"公司" + 0.007*"工作" + 0.007*"問題" + 0.006*"面試" + 0.006*"工程師" + 0.005*"技術" + 0.005*"開發" + 0.005*"目前" + 0.004*"比較" + 0.004*"覺得" 2025-04-19 16:03:53,635 : INFO : topic diff=0.243741, rho=0.286829 2025-04-19 16:03:53,635 : INFO : PROGRESS: pass 3, at document #12000/16310 2025-04-19 16:03:53,851 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:53,853 
: INFO : topic #0 (0.250): 0.028*"報名" + 0.027*"活動" + 0.018*"電話" + 0.014*"台北市" + 0.012*"研究" + 0.012*"人數" + 0.011*"舉辦" + 0.011*"通知" + 0.011*"時間" + 0.011*"資料" 2025-04-19 16:03:53,854 : INFO : topic #1 (0.250): 0.032*"工作" + 0.015*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 16:03:53,854 : INFO : topic #2 (0.250): 0.050*"工作" + 0.018*"時間" + 0.017*"方式" + 0.014*"小時" + 0.011*"內容" + 0.011*"面試" + 0.010*"經驗" + 0.009*"工時" + 0.008*"每日" + 0.008*"職缺" 2025-04-19 16:03:53,855 : INFO : topic #3 (0.250): 0.014*"公司" + 0.006*"工作" + 0.006*"問題" + 0.006*"面試" + 0.005*"工程師" + 0.005*"技術" + 0.005*"台灣" + 0.004*"開發" + 0.004*"目前" + 0.004*"比較" 2025-04-19 16:03:53,855 : INFO : topic diff=0.254071, rho=0.286829 2025-04-19 16:03:53,855 : INFO : PROGRESS: pass 3, at document #14000/16310 2025-04-19 16:03:54,054 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:54,056 : INFO : topic #0 (0.250): 0.027*"報名" + 0.026*"活動" + 0.017*"電話" + 0.014*"研究" + 0.013*"台北市" + 0.012*"問卷" + 0.011*"人數" + 0.011*"舉辦" + 0.011*"通知" + 0.010*"時間" 2025-04-19 16:03:54,057 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 16:03:54,057 : INFO : topic #2 (0.250): 0.050*"工作" + 0.017*"時間" + 0.016*"方式" + 0.014*"小時" + 0.011*"內容" + 0.011*"面試" + 0.010*"經驗" + 0.010*"工時" + 0.008*"薪資" + 0.008*"職缺" 2025-04-19 16:03:54,058 : INFO : topic #3 (0.250): 0.013*"公司" + 0.006*"台灣" + 0.006*"工作" + 0.005*"問題" + 0.004*"技術" + 0.004*"工程師" + 0.004*"面試" + 0.004*"目前" + 0.004*"美國" + 0.003*"開發" 2025-04-19 16:03:54,058 : INFO : topic diff=0.255783, rho=0.286829 2025-04-19 16:03:54,058 : INFO : PROGRESS: pass 3, at document #16000/16310 2025-04-19 16:03:54,247 : INFO : merging changes from 2000 documents into a model of 16310 documents 2025-04-19 16:03:54,249 : INFO : topic #0 (0.250): 0.027*"報名" + 0.026*"活動" + 0.016*"研究" + 
0.015*"電話" + 0.013*"問卷" + 0.012*"台北市" + 0.011*"人數" + 0.011*"舉辦" + 0.010*"通知" + 0.010*"參加" 2025-04-19 16:03:54,249 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"聯絡" + 0.011*"內容" 2025-04-19 16:03:54,250 : INFO : topic #2 (0.250): 0.049*"工作" + 0.017*"時間" + 0.015*"方式" + 0.013*"小時" + 0.011*"內容" + 0.011*"面試" + 0.010*"工時" + 0.010*"經驗" + 0.008*"薪資" + 0.007*"職缺" 2025-04-19 16:03:54,250 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.005*"美國" + 0.005*"工作" + 0.004*"技術" + 0.004*"問題" + 0.004*"晶片" + 0.004*"工程師" + 0.004*"表示" + 0.004*"員工" 2025-04-19 16:03:54,251 : INFO : topic diff=0.228391, rho=0.286829 2025-04-19 16:03:54,317 : INFO : -8.423 per-word bound, 343.3 perplexity estimate based on a held-out corpus of 310 documents with 43091 words 2025-04-19 16:03:54,317 : INFO : PROGRESS: pass 3, at document #16310/16310 2025-04-19 16:03:54,349 : INFO : merging changes from 310 documents into a model of 16310 documents 2025-04-19 16:03:54,351 : INFO : topic #0 (0.250): 0.025*"活動" + 0.025*"報名" + 0.018*"研究" + 0.017*"問卷" + 0.013*"電話" + 0.012*"台北市" + 0.011*"人數" + 0.011*"時間" + 0.010*"舉辦" + 0.010*"參與" 2025-04-19 16:03:54,352 : INFO : topic #1 (0.250): 0.032*"工作" + 0.014*"推定" + 0.013*"方式" + 0.012*"砍除" + 0.012*"空白" + 0.012*"情形" + 0.011*"單位" + 0.011*"第一項" + 0.011*"內容" + 0.011*"聯絡" 2025-04-19 16:03:54,352 : INFO : topic #2 (0.250): 0.049*"工作" + 0.016*"時間" + 0.015*"小時" + 0.013*"方式" + 0.011*"工時" + 0.011*"內容" + 0.010*"面試" + 0.010*"經驗" + 0.009*"薪資" + 0.007*"地點" 2025-04-19 16:03:54,353 : INFO : topic #3 (0.250): 0.012*"公司" + 0.006*"台灣" + 0.006*"美國" + 0.005*"技術" + 0.005*"晶片" + 0.004*"工作" + 0.004*"科技" + 0.004*"表示" + 0.004*"員工" + 0.004*"問題" 2025-04-19 16:03:54,353 : INFO : topic diff=0.247362, rho=0.286829 2025-04-19 16:03:54,353 : INFO : LdaModel lifecycle event {'msg': 'trained LdaModel<num_terms=23261, num_topics=4, decay=0.5, chunksize=2000> in 10.29s', 'datetime': '2025-04-19T16:03:54.353935', 
'gensim': '4.3.3', 'python': '3.11.2 (main, Apr 21 2023, 22:51:21) [Clang 14.0.3 (clang-1403.0.22.14.1)]', 'platform': 'macOS-15.3.2-arm64-arm-64bit', 'event': 'created'}
After trying several different numbers of topics, we found that 4 topics fit best.
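The training log reports, after each pass, a per-word likelihood bound on a held-out chunk of 310 documents together with a perplexity estimate. As far as we can tell from gensim's `LdaModel.log_perplexity`, that perplexity is 2 raised to the negative bound. The sketch below recomputes it from the four logged bounds (values copied from the log) to show how much each pass improved; because the bounds are printed to three decimals, the recomputed values can differ from the logged ones by a few tenths.

```python
# Per-word bounds and logged perplexities copied from the pass 0-3 log lines
logged = {
    0: (-8.523, 367.9),
    1: (-8.446, 348.8),
    2: (-8.429, 344.8),
    3: (-8.423, 343.3),
}


def perplexity_from_bound(per_word_bound: float) -> float:
    """Perplexity = 2 ** (-per-word bound), the conversion used in gensim's log line."""
    return 2.0 ** (-per_word_bound)


previous = None
for pass_id, (bound, logged_ppl) in logged.items():
    ppl = perplexity_from_bound(bound)
    change = "" if previous is None else f" ({ppl - previous:+.1f} vs. previous pass)"
    print(f"pass {pass_id}: bound {bound:.3f} -> perplexity {ppl:.1f} (logged {logged_ppl}){change}")
    previous = ppl
```

The drop of about 19 between pass 0 and pass 1 shrinks to about 4 and then about 1.4, which suggests training had largely settled by pass 3.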
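# Get the topic distribution of each article
# (topics below the model's minimum_probability, 0.01 by default, are omitted from each list)
topics_doc = model_5.get_document_topics(corpus)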
topics_doc[100]
[(1, 0.97264475), (2, 0.02634671)]
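# Convert gensim's sparse (topic_id, probability) lists into a dense array
from gensim.matutils import corpus2csc
# corpus2csc returns a (num_topics, num_docs) sparse matrix; transpose it to (num_docs, num_topics).
# Passing num_terms guarantees one column per topic even if some topic never appears.
m_theta = corpus2csc(topics_doc, num_terms=model_5.num_topics).T.toarray()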
m_theta
array([[0.        , 0.33038548, 0.66557795, 0.        ],
       [0.9956888 , 0.        , 0.        , 0.        ],
       [0.        , 0.99831349, 0.        , 0.        ],
       ...,
       [0.        , 0.        , 0.        , 0.99198657],
       [0.        , 0.        , 0.        , 0.9972536 ],
       [0.        , 0.        , 0.17628665, 0.82183146]])
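`get_document_topics` returns, for each document, only the `(topic_id, probability)` pairs above its probability threshold, which is why `topics_doc[100]` lists topics 1 and 2 only and why `m_theta` contains exact zeros. `corpus2csc(...).T.toarray()` scatters those pairs into a `(num_docs, num_topics)` grid. The NumPy sketch below does the same by hand, assuming a fixed topic count so that an unused topic still gets a column; the helper name `to_dense_theta` is hypothetical.

```python
import numpy as np


def to_dense_theta(doc_topics, num_topics):
    """Scatter sparse (topic_id, prob) lists into a (num_docs, num_topics) array.

    Topics missing from a document's list (below the probability threshold) stay 0.
    """
    theta = np.zeros((len(doc_topics), num_topics))
    for row, pairs in enumerate(doc_topics):
        for topic_id, prob in pairs:
            theta[row, topic_id] = prob
    return theta


# The output shown for topics_doc[100], plus a document dominated by topic 3
sample = [[(1, 0.97264475), (2, 0.02634671)], [(3, 0.99198657)]]
print(to_dense_theta(sample, num_topics=4))
```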
# Convert each document's topic probabilities into a topic label
# (1-based: gensim's topic #0 becomes label 1)
data['topic_label'] = m_theta.argmax(axis=1) + 1
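`argmax(axis=1)` picks the most probable topic in each row, and `+ 1` shifts the 0-based topic ids to labels 1 to 4. The sketch applies the same rule to three rows copied from the `m_theta` output above. One side effect worth knowing: NumPy returns the first index on ties, so a row that is all zeros would silently get label 1.

```python
import numpy as np

# Three rows copied from the m_theta output above
rows = np.array([
    [0.0, 0.33038548, 0.66557795, 0.0],
    [0.9956888, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.17628665, 0.82183146],
])

labels = rows.argmax(axis=1) + 1   # 0-based topic id -> 1-based label
print(labels)                      # [3 1 4]

# Tie behavior: an all-zero row falls back to the first topic
print(np.zeros((1, 4)).argmax(axis=1) + 1)  # [1]
```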
# Save the classification results
data.to_csv('./raw_data/topicData.csv',index=False)
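import matplotlib

# Collect every unique topic_label, sorted so the color assignment is consistent across charts
all_topics = sorted(data['topic_label'].unique())
# Use a deeper-colored colormap such as 'tab20', resampled to one color per topic
# (matplotlib.colormaps replaces plt.cm.get_cmap, which is deprecated since Matplotlib 3.7)
cmap = matplotlib.colormaps['tab20'].resampled(len(all_topics))
colors = {topic: cmap(i) for i, topic in enumerate(all_topics)}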
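The `colors` dictionary is keyed by topic label rather than by position. That matters because `value_counts()` orders labels by frequency, and that order can differ between the pie chart and the trend charts. A minimal pure-Python sketch of the same idea, using a hypothetical hex palette (Matplotlib's default C0 to C3 colors) in place of the resampled colormap:

```python
# Hypothetical fixed palette standing in for the colormap
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def build_color_map(labels):
    """Assign palette colors to the sorted unique labels so every chart agrees."""
    unique_sorted = sorted(set(labels))
    return {label: PALETTE[i % len(PALETTE)] for i, label in enumerate(unique_sorted)}


colors = build_color_map([3, 1, 4, 2, 3, 1])

# value_counts() would return labels in frequency order, e.g. [3, 1, 4, 2]
by_frequency = [3, 1, 4, 2]
pie_colors = [colors[label] for label in by_frequency]
print(pie_colors)  # label 3 keeps its color whatever its position
```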
data = pd.read_csv("./raw_data/topicData.csv")
# Convert artDate to datetime
data['artDate'] = pd.to_datetime(data['artDate'])
value_counts = data[data['artDate'].dt.year >= 2024]['topic_label'].value_counts()
# Draw a pie chart of topic shares for articles from 2024 onward
plt.figure(figsize=(4, 4))
plt.pie(value_counts, labels=value_counts.index, autopct='%1.1f%%', startangle=140, colors=[colors[col] for col in value_counts.index])
plt.title('Distribution of Topic Labels')
plt.axis('equal')  # keep the pie circular
plt.show()
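The pie chart keeps only articles dated 2024 or later and lets `autopct='%1.1f%%'` turn each label's count into a percentage of that filtered total. The same arithmetic in plain Python, on a few made-up `(date, topic_label)` records used purely for illustration:

```python
from collections import Counter
from datetime import date

# Made-up records for illustration: (artDate, topic_label)
records = [
    (date(2023, 12, 30), 1),   # dropped: before 2024
    (date(2024, 1, 5), 2),
    (date(2024, 3, 9), 2),
    (date(2024, 7, 1), 4),
    (date(2025, 2, 14), 3),
]

recent = [label for day, label in records if day.year >= 2024]
counts = Counter(recent)
total = sum(counts.values())

# Same format string the pie chart passes to autopct
shares = {label: "%1.1f%%" % (100 * n / total) for label, n in counts.most_common()}
print(shares)  # {2: '50.0%', 4: '25.0%', 3: '25.0%'}
```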
date_topic = data[data['artDate'].dt.year >= 2024].groupby(data['artDate'].dt.date)['topic_label'].value_counts().unstack()
fig, ax = plt.subplots(figsize=(15, 6))
date_topic.plot.line(ax=ax, stacked=True, color=[colors[col] for col in date_topic.columns])
ax.legend(loc='lower right')
ax.set_title("2024 - 2025 topic trend")
month_topic = data[data['artDate'].dt.year >= 2024].groupby(data['artDate'].dt.month)['topic_label'].value_counts(normalize=True).unstack()
fig, ax = plt.subplots(figsize=(15, 6))
month_topic.plot.bar(ax=ax, stacked=True, color=[colors[col] for col in month_topic.columns])
ax.legend(loc='lower right')
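The monthly chart uses value_counts(normalize=True), which computes each topic's share within each group, so every bar sums to 1. A small sketch with hypothetical toy data; it groups by dt.to_period('M') instead of dt.month. That is an alternative we are suggesting, not what the cell above does: dt.month alone pools the same month from 2024 and 2025 into one bar.

```python
import pandas as pd

# Hypothetical toy data with the same columns used above (artDate, topic_label)
toy = pd.DataFrame({
    "artDate": pd.to_datetime([
        "2024-01-05", "2024-01-20", "2024-01-28",
        "2025-01-03", "2025-02-14", "2025-02-15",
    ]),
    "topic_label": [1, 1, 2, 3, 3, 4],
})

# Share of each topic within a calendar month; to_period('M') keeps
# January 2024 and January 2025 as separate rows (dt.month would merge them)
month_share = (
    toy.groupby(toy["artDate"].dt.to_period("M"))["topic_label"]
    .value_counts(normalize=True)
    .unstack(fill_value=0)
)
print(month_share)
print(month_share.sum(axis=1))  # every month sums to 1.0
```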
# Draw three plots, one per article category (fixed colors)
fig, ax = plt.subplots(3, 1, figsize=(15, 22))
for i, cat in enumerate(data['artCatagory'].unique()):
    date_topic = (
        data[(data['artCatagory'] == cat) & (data['artDate'].dt.year >= 2024)]
        .groupby(data['artDate'].dt.date)['topic_label']
        .value_counts()
        .unstack()
        .reindex(columns=all_topics, fill_value=0)  # make sure every topic appears
    )
    # Plot (fixed color order)
    date_topic.plot.line(
        ax=ax[i],
        stacked=True,
        color=[colors[col] for col in date_topic.columns]
    )
    ax[i].set_title(f'{cat}', fontsize=16)
    ax[i].legend(loc='lower right')
plt.tight_layout()
plt.show()
import guidedlda
from gensim.matutils import corpus2dense
word2id = dictionary.token2id
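# Try to separate software from hardware with seed words (one list per seed topic)
seed_topic_list = [
    ["受訪者", "訪談", "訪問", "實驗", "抽獎"],      # respondents, interviews, visits, experiments, lucky draws
    ["聯絡人", "連結", "電子郵件", "聯絡"],          # contact person, link, email, contact
    ["晶片","技術","英特爾","台積電"],               # hardware: chip, technology, Intel, TSMC
    ["軟體","資訊","網站","系統", "開發","專案"],     # software: software, IT, website, system, development, project
    ["面試", "加班費", "履歷", "職缺"]               # job hunting: interview, overtime pay, résumé, job opening
]
# Map each seed word's dictionary id to the index of its seed topic
seed_topics = {}
for t_id, st in enumerate(seed_topic_list):
    for word in st:
        seed_topics[word2id[word]] = t_id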
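The loop above raises a KeyError if any seed word was filtered out of the dictionary. A sketch, with a hypothetical helper build_seed_topics and a toy vocabulary, that builds the same {word_id: topic_id} mapping but collects missing seed words instead. As in the original loop, a word listed under two seed topics keeps only the later one.

```python
# Hypothetical toy vocabulary standing in for dictionary.token2id
toy_word2id = {"晶片": 0, "台積電": 1, "軟體": 2, "網站": 3, "面試": 4}
toy_seed_topic_list = [
    ["晶片", "台積電"],        # seed topic 0: hardware
    ["軟體", "網站", "雲端"],  # seed topic 1: software ("雲端" is not in the toy vocabulary)
]

def build_seed_topics(seed_topic_list, word2id):
    """Map word id -> seed topic index, collecting seed words the vocabulary lacks."""
    seed_topics, missing = {}, []
    for topic_id, words in enumerate(seed_topic_list):
        for word in words:
            if word in word2id:
                seed_topics[word2id[word]] = topic_id
            else:
                missing.append(word)
    return seed_topics, missing

seed_topics_demo, missing = build_seed_topics(toy_seed_topic_list, toy_word2id)
print(seed_topics_demo)  # {0: 0, 1: 0, 2: 1, 3: 1}
print(missing)           # ['雲端']
```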
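# guidedlda expects a document-term matrix (DTM) as input, so convert the corpus with corpus2dense();
# it returns (num_terms, num_docs), hence the transpose, and the counts are cast to integers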
X = corpus2dense(corpus, len(dictionary), len(corpus)).T.astype(np.int64)
model = guidedlda.GuidedLDA(n_topics=5, n_iter=100, random_state=7, refresh=20)
model.fit(X, seed_topics=seed_topics, seed_confidence=1)
2025-04-19 15:48:34,425 : INFO : n_documents: 16310
2025-04-19 15:48:34,427 : INFO : vocab_size: 23261
2025-04-19 15:48:34,427 : INFO : n_words: 3460208
2025-04-19 15:48:34,427 : INFO : n_topics: 5
2025-04-19 15:48:34,427 : INFO : n_iter: 100
2025-04-19 15:48:34,569 : WARNING : all zero row in document-term matrix found
2025-04-19 15:48:48,928 : INFO : <0> log likelihood: -32236301
2025-04-19 15:48:51,118 : INFO : <20> log likelihood: -24644227
2025-04-19 15:48:53,125 : INFO : <40> log likelihood: -24503986
2025-04-19 15:48:55,176 : INFO : <60> log likelihood: -24410422
2025-04-19 15:48:57,211 : INFO : <80> log likelihood: -24380054
2025-04-19 15:48:59,208 : INFO : <99> log likelihood: -24372692
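corpus2dense produces a (num_terms, num_docs) float array, which is why the code transposes it and casts to int64 counts. The sketch below (hypothetical bow_to_dtm helper, toy corpus) builds the same kind of document-term matrix directly and shows how to locate the all-zero rows the WARNING in the log refers to, i.e. documents left with no tokens after preprocessing. For scale: a dense int64 matrix of 16310 × 23261 takes about 16310 × 23261 × 8 ≈ 3.0 GB of memory.

```python
import numpy as np

def bow_to_dtm(corpus, num_terms):
    """Dense (num_docs, num_terms) int64 count matrix from gensim-style
    bag-of-words documents [(term_id, count), ...]."""
    dtm = np.zeros((len(corpus), num_terms), dtype=np.int64)
    for row, doc in enumerate(corpus):
        for term_id, count in doc:
            dtm[row, term_id] = count
    return dtm

# Hypothetical toy corpus; the last document is empty after preprocessing
toy_corpus = [[(0, 2), (2, 1)], [(1, 3)], []]
dtm = bow_to_dtm(toy_corpus, num_terms=3)
print(dtm)

# Rows that sum to 0 are the documents behind the "all zero row" warning
zero_rows = np.where(dtm.sum(axis=1) == 0)[0]
print(zero_rows)  # [2]
```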
# Organize and display the topic model results
n_top_words = 10
topic_word = model.topic_word_
# Get the corpus vocabulary, ordered by term id so position j matches column j of topic_word
vocab = tuple(dictionary[i] for i in range(len(dictionary)))
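for i, topic_dist in enumerate(topic_word):
    # argsort orders words by probability from low to high; stepping backwards
    # from the end takes each topic's top n_top_words keywords
    topic_words = np.array(vocab)[np.argsort(topic_dist)][: -(n_top_words + 1) : -1]
    print("Topic {}: {}".format(i, " ".join(topic_words)))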
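How the slice [: -(n_top_words + 1) : -1] picks the top words, shown on a hypothetical five-word vocabulary: np.argsort sorts ascending, and walking backwards from the last index for n_top_words steps yields the most probable words in descending order.

```python
import numpy as np

# Hypothetical 5-word vocabulary and one topic's word distribution
toy_vocab = np.array(["晶片", "軟體", "面試", "薪資", "台積電"])
toy_topic_dist = np.array([0.30, 0.05, 0.10, 0.15, 0.40])
n_top = 3

# argsort is ascending; stepping backwards from the end gives the n_top
# largest entries, most probable first
top_idx = np.argsort(toy_topic_dist)[: -(n_top + 1) : -1]
print(toy_vocab[top_idx])  # ['台積電' '晶片' '薪資']

# Equivalent and arguably easier to read: reverse, then take the first n_top
top_idx_alt = np.argsort(toy_topic_dist)[::-1][:n_top]
print(top_idx_alt.tolist() == top_idx.tolist())  # True
```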
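doc_topic = model.doc_topic_  # document-topic distribution
term_freq = [dictionary.cfs.get(i, 0) for i in range(len(dictionary))]  # total count of each term in the corpus, in term-id order
doc_len = [sum(cnt for _, cnt in doc) for doc in corpus]  # length (token count) of each document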
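pyLDAvis.prepare needs vocab, term_frequency and the columns of topic_word to refer to the same term at every position. A sketch on a toy matrix: a document-term matrix built column-by-term-id gives per-term counts from its column sums and document lengths from its row sums, both already in term-id order. Names ending in _demo are hypothetical. On the real data, X.sum(axis=0) and X.sum(axis=1) should agree with term_freq and doc_len, assuming the dictionary was built from the same texts as corpus.

```python
import numpy as np

# Hypothetical toy DTM (rows = documents, columns = term ids 0..2) and id -> token map
toy_dtm = np.array([[2, 0, 1],
                    [0, 3, 0],
                    [1, 1, 0]], dtype=np.int64)
toy_id2token = {2: "網站", 0: "晶片", 1: "軟體"}  # deliberately not stored in id order

# Build every per-term array in id order so position j always means term id j
vocab_demo = [toy_id2token[i] for i in range(toy_dtm.shape[1])]
term_freq_demo = toy_dtm.sum(axis=0)  # corpus-wide count of each term
doc_len_demo = toy_dtm.sum(axis=1)    # number of tokens in each document

print(vocab_demo)               # ['晶片', '軟體', '網站']
print(term_freq_demo.tolist())  # [3, 4, 1]
print(doc_len_demo.tolist())    # [3, 3, 2]
```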
## LDAvis
import pyLDAvis
pyLDAvis.enable_notebook()
p = pyLDAvis.prepare(topic_word, doc_topic, doc_len, vocab=vocab, term_frequency=term_freq)
p
Topic 0: 報名 活動 電話 台北市 人數 時間 通知 舉辦 車馬費 聯絡
Topic 1: 工作 推定 方式 砍除 空白 情形 單位 內容 聯絡 第一項
Topic 2: 台灣 美國 公司 晶片 表示 中國 半導體 台積電 員工 報導
Topic 3: 工作 時間 方式 公司 小時 經驗 每日 薪資 工時 職缺
Topic 4: 公司 工作 面試 問題 工程師 比較 覺得 知道 目前 時間
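With the seed list, software and hardware were successfully separated: chip and semiconductor terms (晶片, 半導體, 台積電) cluster in Topic 2, while engineer and job-interview discussion (工程師, 面試) falls in Topic 4.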
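As we understand guidedlda, the seeds act only when topic assignments are initialized: each occurrence of a seed word starts in its seed topic with probability seed_confidence, otherwise it starts like any other token, and Gibbs sampling then proceeds as in ordinary LDA. So seeds nudge the topics rather than pin them. The sketch below illustrates only that initialization idea under this assumption; seeded_init is a hypothetical helper, not guidedlda's internal code, and its random start for unseeded tokens may differ from guidedlda's own scheme. With seed_confidence=1, as used above, every seed-word token starts in its seed topic.

```python
import numpy as np

def seeded_init(word_ids, seed_topics, n_topics, seed_confidence, rng):
    """Sketch only (assumed seeding behaviour, not guidedlda's implementation).

    word_ids: term id of every token in the corpus, in order
    seed_topics: {term_id: topic_id}, like the seed_topics dict built above
    """
    z = rng.integers(0, n_topics, size=len(word_ids))  # unseeded start: random topic
    for i, w in enumerate(word_ids):
        if w in seed_topics and rng.random() < seed_confidence:
            z[i] = seed_topics[w]  # seeded token starts in its seed topic
    return z

rng = np.random.default_rng(7)
tokens = [0, 5, 0, 9, 5, 3]  # hypothetical term ids
seeds = {0: 2, 5: 4}         # term 0 -> topic 2, term 5 -> topic 4
z = seeded_init(tokens, seeds, n_topics=5, seed_confidence=1.0, rng=rng)
print(z)  # positions 0 and 2 are 2, positions 1 and 4 are 4; the rest are random
```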
plot_coef(logistic_reg_model=model_set['clf_logistic'], feature_names=vectorizer.get_feature_names_out(), top_n=10)