import pandas as pd
Read the data and assign column names.
# Read it, and remove the last row
popcon = pd.read_csv('../data/popularity-contest', sep=' ')[:-1]
popcon.columns = ['atime', 'ctime', 'package-name', 'mru-program', 'tag']
The first two columns, atime and ctime, are the access time and the creation time, both stored as Unix timestamps.
popcon[:5]
| | atime | ctime | package-name | mru-program | tag |
---|---|---|---|---|---|
0 | 1387295797 | 1367633260 | perl-base | /usr/bin/perl | NaN |
1 | 1387295796 | 1354370480 | login | /bin/su | NaN |
2 | 1387295743 | 1354341275 | libtalloc2 | /usr/lib/x86_64-linux-gnu/libtalloc.so.2.0.7 | NaN |
3 | 1387295743 | 1387224204 | libwbclient0 | /usr/lib/x86_64-linux-gnu/libwbclient.so.0 | <RECENT-CTIME> |
4 | 1387295742 | 1354341253 | libselinux1 | /lib/x86_64-linux-gnu/libselinux.so.1 | NaN |
First, convert both timestamp columns to integers.
popcon['atime'] = popcon['atime'].astype(int)
popcon['ctime'] = popcon['ctime'].astype(int)
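Then the pd.to_datetime function can interpret those integers as Unix timestamps (unit='s' means seconds since the epoch) and convert them to actual dates and times.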
popcon['atime'] = pd.to_datetime(popcon['atime'], unit='s')
popcon['ctime'] = pd.to_datetime(popcon['ctime'], unit='s')
popcon['atime'].dtype
dtype('<M8[ns]')
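To see why unit='s' matters in the conversion above, here is a minimal sketch on a made-up two-element Series (the name `stamps` is only for illustration; the second value is the first atime from the table). Without a unit, pandas treats integers as nanoseconds, so the result stays close to the epoch.

```python
import pandas as pd

# Hypothetical sample: two Unix timestamps in seconds since 1970-01-01.
stamps = pd.Series([0, 1387295797])

# unit='s' tells pandas the integers count seconds.
as_seconds = pd.to_datetime(stamps, unit='s')

# With no unit, integers are read as nanoseconds since the epoch,
# so the same number lands about 1.4 seconds after 1970-01-01.
as_default = pd.to_datetime(stamps)

print(as_seconds)
print(as_default)
```

If the converted dates all fall in 1970, a missing or wrong unit is a likely cause.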
Look at the atime and ctime columns again: they now hold readable dates and times.
popcon[:5]
| | atime | ctime | package-name | mru-program | tag |
---|---|---|---|---|---|
0 | 2013-12-17 15:56:37 | 2013-05-04 02:07:40 | perl-base | /usr/bin/perl | NaN |
1 | 2013-12-17 15:56:36 | 2012-12-01 14:01:20 | login | /bin/su | NaN |
2 | 2013-12-17 15:55:43 | 2012-12-01 05:54:35 | libtalloc2 | /usr/lib/x86_64-linux-gnu/libtalloc.so.2.0.7 | NaN |
3 | 2013-12-17 15:55:43 | 2013-12-16 20:03:24 | libwbclient0 | /usr/lib/x86_64-linux-gnu/libwbclient.so.0 | <RECENT-CTIME> |
4 | 2013-12-17 15:55:42 | 2012-12-01 05:54:13 | libselinux1 | /lib/x86_64-linux-gnu/libselinux.so.1 | NaN |
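Another common operation is filtering rows by time. This is simple too: compare the datetime column with a date string, and pandas keeps the rows later (or earlier) than that date. Here we keep only rows whose atime is after 1970-01-01, which drops entries with a timestamp of 0.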
popcon = popcon[popcon['atime'] > '1970-01-01']
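The same comparison works for any cutoff or date range. Below is a small sketch on a made-up frame; `df`, `recent`, and `in_2012` are illustrative names, not part of the original notebook. Combining two comparisons with `&` selects a window.

```python
import pandas as pd

# Hypothetical sample frame with a datetime column built from Unix seconds.
df = pd.DataFrame({
    'name': ['a', 'b', 'c'],
    'atime': pd.to_datetime([0, 1354341253, 1387295797], unit='s'),
})

# A date string on the right-hand side is parsed into a timestamp.
recent = df[df['atime'] > '2013-01-01']

# An explicit pd.Timestamp works the same way; parentheses are required
# around each comparison because & binds tighter than >= and <.
in_2012 = df[(df['atime'] >= pd.Timestamp('2012-01-01'))
             & (df['atime'] < pd.Timestamp('2013-01-01'))]

print(recent)
print(in_2012)
```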
This next step is a refresher on string operations: keep only the packages whose name does not contain 'lib'.
nonlibraries = popcon[~popcon['package-name'].str.contains('lib')]
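One caveat with this filter: str.contains matches the substring anywhere in the name, so a package such as calibre would be excluded as well. The sketch below, on a made-up Series called `names`, compares it with str.startswith, which is an alternative and not what the notebook uses. Passing na=False keeps the mask boolean when a name is missing.

```python
import pandas as pd

# Hypothetical package names, including one missing value.
names = pd.Series(['perl-base', 'libtalloc2', 'calibre', None])

# Substring match anywhere in the name; regex=False treats 'lib' literally.
has_lib = names.str.contains('lib', regex=False, na=False)

# Prefix match only: 'calibre' is no longer flagged.
starts_lib = names.str.startswith('lib', na=False)

# ~ inverts the boolean mask, keeping the rows that did not match.
print(names[~has_lib])
print(names[~starts_lib])
```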
Sort by ctime in descending order and show the first 10 rows.
nonlibraries.sort_values('ctime', ascending=False)[:10]
| | atime | ctime | package-name | mru-program | tag |
---|---|---|---|---|---|
57 | 2013-12-17 04:55:39 | 2013-12-17 04:55:42 | ddd | /usr/bin/ddd | <RECENT-CTIME> |
450 | 2013-12-16 20:03:20 | 2013-12-16 20:05:13 | nodejs | /usr/bin/npm | <RECENT-CTIME> |
454 | 2013-12-16 20:03:20 | 2013-12-16 20:05:04 | switchboard-plug-keyboard | /usr/lib/plugs/pantheon/keyboard/options.txt | <RECENT-CTIME> |
445 | 2013-12-16 20:03:20 | 2013-12-16 20:05:04 | thunderbird-locale-en | /usr/lib/thunderbird-addons/extensions/langpac... | <RECENT-CTIME> |
396 | 2013-12-16 20:08:27 | 2013-12-16 20:05:03 | software-center | /usr/sbin/update-software-center | <RECENT-CTIME> |
449 | 2013-12-16 20:03:20 | 2013-12-16 20:05:00 | samba-common-bin | /usr/bin/net.samba3 | <RECENT-CTIME> |
397 | 2013-12-16 20:08:25 | 2013-12-16 20:04:59 | postgresql-client-9.1 | /usr/lib/postgresql/9.1/bin/psql | <RECENT-CTIME> |
398 | 2013-12-16 20:08:23 | 2013-12-16 20:04:58 | postgresql-9.1 | /usr/lib/postgresql/9.1/bin/postmaster | <RECENT-CTIME> |
452 | 2013-12-16 20:03:20 | 2013-12-16 20:04:55 | php5-dev | /usr/include/php5/main/snprintf.h | <RECENT-CTIME> |
440 | 2013-12-16 20:03:20 | 2013-12-16 20:04:54 | php-pear | /usr/share/php/XML/Util.php | <RECENT-CTIME> |
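Note that the index labels in the output above (57, 450, 454, ...) are the original row labels, which sorting leaves untouched. A minimal sketch on a made-up three-row frame, reusing three ctime values from the table, shows this along with head() as an equivalent to slicing with [:n]:

```python
import pandas as pd

# Hypothetical frame; rows are deliberately out of order.
df = pd.DataFrame({
    'package-name': ['nodejs', 'php-pear', 'ddd'],
    'ctime': pd.to_datetime(['2013-12-16 20:05:13',
                             '2013-12-16 20:04:54',
                             '2013-12-17 04:55:42']),
})

# sort_values returns a new frame; the original row labels are kept.
newest_first = df.sort_values('ctime', ascending=False)

# head(2) is equivalent to newest_first[:2].
top2 = newest_first.head(2)
print(top2)
```

Call reset_index(drop=True) afterwards if consecutive labels are preferred.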