python - Writer.add_document() error in Whoosh when indexing rows in a MySQL loop
I'm trying to index a large number of articles from a database encoded in latin1. I've solved the encoding issue with the charset setting, but I'm not able to add each row to the index.
I've tried:
1)
    writer.add_document(id=unicode(row["id"]), body=unicode(row["body"]), name=unicode(row["name"]), brand=unicode(row["brand"]), familia=unicode(row["familia"]))
This indexes the documents, but it does not respect the index field labels.
2)
    writer.add_document(doc)
This fails with the error "add_document() takes 1 argument (2 given)".
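A note on the second attempt: Whoosh's add_document() takes the field values as keyword arguments only, so a tuple passed positionally counts as an unexpected second argument next to self. The sketch below reproduces that with a stand-in class. SketchWriter, try_positional and the sample row are hypothetical names made up for illustration, not part of Whoosh. It runs on Python 2.7 and 3, although the wording of the TypeError differs between the two.

```python
class SketchWriter(object):
    """Hypothetical stand-in that only mimics a keyword-only add_document()."""

    def __init__(self):
        self.docs = []

    def add_document(self, **fields):
        # Field values arrive as keyword arguments, keyed by field name.
        self.docs.append(fields)


def try_positional(writer, doc):
    """Return the TypeError message raised by a positional call, or None."""
    try:
        writer.add_document(doc)
    except TypeError as exc:
        return str(exc)
    return None


writer = SketchWriter()
row = {"id": 1, "name": "example", "brand": "acme"}  # made-up sample row

# A tuple passed positionally is rejected with a TypeError.
error = try_positional(writer, (row["id"], row["name"], row["brand"]))

# Unpacking a dict maps each key to the field of the same name.
writer.add_document(**{key: str(value) for key, value in row.items()})
```

If the values are already collected in a dict whose keys match the schema field names, the ** form avoids repeating every field name in the call.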
Here is the full code:
    import MySQLdb as mdb

    # open a writer on the index (ix is the Whoosh index opened earlier)
    with ix.writer() as writer:
        con = mdb.connect(host="myhost", user="myuser", passwd="pass", db="db",
                          charset="utf8", use_unicode=True)
        with con:
            cur = con.cursor(mdb.cursors.DictCursor)
            cur.execute("select id, body, name, brand, familia from articles")
            rows = cur.fetchall()
            for row in rows:
                print row
                doc6 = row["brand"]
                doc2 = row["name"]
                print doc2
                print 'body'
                # replace accented characters in the body with plain letters
                doc3 = row["body"].replace("á", "a")
                doc3 = doc3.replace("é", "e")
                doc3 = doc3.replace("í", "i")
                doc3 = doc3.replace("ó", "o")
                doc3 = doc3.replace("ú", "u")
                doc3 = doc3.replace("ñ", "n")
                doc3 = doc3.replace('"', "")
                print doc3
                print 'familia'
                doc4 = row["familia"]
                print doc4
                print 'id'
                doc5 = row["id"]
                print doc5
                writer.add_document(id=unicode(row["id"]),
                                    body=unicode(row["body"]),
                                    name=unicode(row["name"]),
                                    brand=unicode(row["brand"]),
                                    familia=unicode(row["familia"]))
                #
                # doc = unicode(doc5), unicode(doc3), unicode(doc2), unicode(doc6), unicode(doc4)
                # writer.add_document(doc)  # reports add_document() takes 1 argument (2 given) error
                # writer.add_document(id=unicode(doc5), body=unicode(doc3), name=unicode(doc2), brand=unicode(doc6), familia=unicode(doc4))

    # count the documents once the writer has been committed
    numdocs = ix.doc_count_all()
    print "docs indexed =", numdocs
Thanks in advance!
I solved it this way:
    with con:
        cur = con.cursor(mdb.cursors.DictCursor)
        # relevancia is selected too, because add_document() below reads row["relevancia"]
        cur.execute("select id, body, name, brand, familia, relevancia from articles")
        rows = cur.fetchall()
        for row in rows:
            #print row
            # clean the body in place so the cleaned text is what gets indexed
            row["body"] = row["body"].replace("á", "a")
            row["body"] = row["body"].replace("é", "e")
            row["body"] = row["body"].replace("í", "i")
            row["body"] = row["body"].replace("ó", "o")
            row["body"] = row["body"].replace("ú", "u")
            row["body"] = row["body"].replace("ñ", "n")
            row["body"] = row["body"].replace('"', "")
            writer.add_document(id=unicode(row["id"]),
                                body=unicode(row["body"]),
                                name=unicode(row["name"]),
                                brand=unicode(row["brand"]),
                                familia=unicode(row["familia"]),
                                relevancia=row["relevancia"])
    numdocs = ix.doc_count_all()
    print "docs indexed =", numdocs
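As a possible alternative to the chain of replace() calls, the accents can be folded with the standard unicodedata module. This is only a sketch, not what the solution above does: strip_accents and clean_body are hypothetical helper names, the code is written for Python 3 (in Python 2 it would need to be given unicode strings), and it is broader than the original chain because it also folds uppercase and other accented letters.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_body(text):
    """Fold accents, then drop double quotes as the last replace() appears to do."""
    return strip_accents(text).replace('"', "")
```

With helpers like these, the loop body could shrink to a single assignment such as row["body"] = clean_body(row["body"]) before calling writer.add_document().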
Special thanks to the Whoosh team for patiently and kindly resolving my doubts.