天池之处理数据

01 Dec 2017

Reading time ~5 minutes

本文目录🧾

一、数据集
二、测试集
三、导入数据
四、获得干净的所需数据
五、得到数据集中各个商品所对应推荐商品
六、拆分为两块以便预测
七、导入测试集进行预测
八、完整代码

正文部分📝

一、数据集

搭配套餐id 商品列表

二、测试集

用户购买商品

三、导入数据

myFile=open('D:\四年记\天池\Taobao Clothes Matching Data\dim_fashion_matchsets（new).txt',
'r',encoding='utf-8')

四、获得干净的所需数据

因为数据集第一列对预测没有任何影响，所以需要去掉。第二列用分号来分割类别，介于这里无需考虑类别，所以就只需要将这些分号逗号去掉。

value=[]
for line in myFile:
    value.append(re.split(r'[;,\s]\s*', str(line))[1:-1])

处理结果:

五、得到数据集中各个商品所对应推荐商品

为了让测试集获得推荐结果，我们就需要将数据集里的各个商品一一对应，这样已经一个商品我们就可以获得对应在搭配套餐中的其他商品。

totalValue=[]
for line in value:
    totalValue.append(list(permutations(line, 2)))

处理结果:

六、拆分为两块以便预测

得到拆分的数据集的第一列第二列，为预测做准备。

train=[]
col1=[]
col2=[]
for line in totalValue:
    for i in line:
        train.append(i)
        col1.append(i[0])
        col2.append(i[1])

处理结果：

七、导入测试集进行预测

当测试集的值等于col1的值，即推荐col2的值，并用flagMat存每一个item的预测结果。

testFile = csv.reader(open(r'C:\Users\烫烫\Desktop\test_items（new).csv'))
outcome=[]
for i in testFile:
    flag=0
    flagMat=[]
    flagMat.append(i[0])
    index=i
    for j in col1:
        if(i[0]==j):
            flagMat.append(col2[flag])
        flag=flag+1
    outcome.append(flagMat)
f1 = open("outcome.csv", "wb")
pickle.dump(outcome, f1)
f1.close()

八、完整代码

from itertools import permutations
import re
import csv
import pickle
myFile=open('D:\四年记\天池\Taobao Clothes Matching Data\dim_fashion_matchsets（new).txt',
'r',encoding='utf-8')
value=[]
totalValue=[]
train=[]
col1=[]
col2=[]
for line in myFile:
    value.append(re.split(r'[;,\s]\s*', str(line))[1:-1])
for line in value:
    totalValue.append(list(permutations(line, 2)))
for line in totalValue:
    for i in line:
        train.append(i)
        col1.append(i[0])
        col2.append(i[1])
testFile = csv.reader(open(r'C:\Users\烫烫\Desktop\test_items（new).csv'))
outcome=[]
for i in testFile:
    flag=0
    flagMat=[]
    flagMat.append(i[0])
    index=i
    for j in col1:
        if(i[0]==j):
            flagMat.append(col2[flag])
        flag=flag+1
    outcome.append(flagMat)
f1 = open("outcome.csv", "wb")
pickle.dump(outcome, f1)
f1.close()

data science