python - Regex text between two strings


I am trying to extract data fields from PDF text using regex.

The text is:

"sample experian customer\n2288150 - experian sample reports\ndata dictionary report\nfiltered by:\ncustom selection\nmarketing element:\npage 1 of 284\n2014-11-11 21:52:01 pm\nexperian , marks used herein service marks or registered trademarks of experian.\n© experian 2014 rights reserved. confidential , proprietary.\n**data dictionary**\ndate of birth acquired public , proprietary files. these sources provide, @ minimum, year of birth; month provided available. exact date of birth @ various levels of detail available \n\n\n\n\n\nnote: records coded dob exclusive of estimated age (101e)\n**element number**\n0100\ndescription\ndate of birth / exact age\n**data dictionary**\n\n\n\n\n\n\n\n\n\n\nfiller, 3 bytes\n**element number**\n0000\n**description**\nenhancement mandatory append\n**data dictionary**\n\n\nwhen there insufficient data match customer's record our enrichment master estimated age, median estimated age based on ages of other adult individuals in same zip+4 area provided. \n\n\n\n\n\n\n00 = unknown\n**element number**\n0101e\n**description**\nestimated age\n"

The field names are in bold. The text between two field names is the field value.

First, I tried to extract the 'description' field using the following regex:

pattern = re.compile('\ndescription\n(.*?)\ndata dictionary\n')
re.findall(pattern, text)

The results are correct:

['date of birth / exact age', 'enhancement mandatory append'] 

But using the same idea to extract the 'data dictionary' field gives an empty result:

pattern = re.compile('\ndata dictionary\n(.*?)\nelement number\n')
re.findall(pattern, text)

Results:

[] 

Any idea why?

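By default, '.' doesn't match newlines. The 'description' values each sit on a single line, so the first pattern works, but the 'data dictionary' values span several lines. Try:

pattern = re.compile('\ndata dictionary\n(.*?)\nelement number\n', flags=re.DOTALL)
re.findall(pattern, text)

Notice how I passed re.DOTALL as the flags argument to re.compile.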
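To see the difference in isolation, here is a minimal sketch. The text in it is a shortened, made-up stand-in for the PDF text in the question (with the bold markers left out), so the values are illustrative only:

```python
import re

# Shortened, hypothetical stand-in for the PDF text in the question.
text = (
    "\ndata dictionary\n"
    "date of birth, line one\n\nnote: line two\n"
    "element number\n0100\n"
    "description\ndate of birth / exact age\n"
    "data dictionary\n\nfiller, 3 bytes\n"
    "element number\n0000\n"
)

dictionary_regex = r"\ndata dictionary\n(.*?)\nelement number\n"
description_regex = r"\ndescription\n(.*?)\ndata dictionary\n"

# Without DOTALL, '.' stops at the first newline, so multi-line values never match.
without_flag = re.findall(dictionary_regex, text)

# With DOTALL, '.' also matches '\n', so the lazy group can span several lines.
with_flag = re.findall(dictionary_regex, text, flags=re.DOTALL)

# The inline flag (?s) at the start of the pattern is equivalent to re.DOTALL.
with_inline_flag = re.findall("(?s)" + dictionary_regex, text)

# Single-line values match even without DOTALL.
descriptions = re.findall(description_regex, text)

# Optional clean-up: drop the blank lines around each captured value.
cleaned = [value.strip() for value in with_flag]

print(without_flag)
print(cleaned)
print(descriptions)
```

The captured values keep any leading or trailing newlines that sit between the field names, which is why the sketch strips them afterwards.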

