java - Selecting the innermost child of an element with Jsoup
I am attempting to scrape the following HTML:
<table> <tr> <td class="cellright" style="cursor:pointer;"> <table cellpadding="0" cellspacing="0" width="100%"> <tr> <td class="cellright" style="border:0;color:#0066cc;" title="view summary" width="70%">92%</td> <td class="cellright" style="border:0;" width="30%"> </td> </tr> </table> </td> </tr> <tr class="listroweven"> <td class="cellleft" nowrap><span class="categorytab" onclick= "showassignmentsbympandcourse('08/03/2015','58100:6');" title= "display assignments art 5 ms. martinho"><span style= "text-decoration: underline">58100/6 - art 5 ms. martinho</span></span></td> <td class="cellleft" nowrap> martinho, suzette<br> <b>email:</b> <a href="mailto:smartinho@mtsd.us" style= "text-decoration:none"><img alt="" border="0" src= "/genesis/images/labelicon.png" title= "send e-mail teacher"></a> </td> <td class="cellright" onclick= "window.location.href = '/genesis/parents?tab1=studentdata&tab2=gradebook&tab3=coursesummary&studentid=100916&action=form&coursecode=58100&coursesection=6&mp=mp4';" style="cursor:pointer;"> <table cellpadding="0" cellspacing="0" width="100%"> <tr> <td class="cellcenter"><span style= "font-style:italic;color:brown;font-size: 8pt;">no grades</span></td> </tr> </table> </td> </tr> <tr class="listrowodd"> <td class="cellleft" nowrap><span class="categorytab" onclick= "showassignmentsbympandcourse('08/03/2015','58200:10');" title= "display assignments family , consumer sciences 5 sheerin"> <span style="text-decoration: underline">58200/10 - family , consumer sciences 5 sheerin</span></span></td> <td class="cellleft" nowrap> sheerin, susan<br> <b>email:</b> <a href="mailto:ssheerin@mtsd.us" style= "text-decoration:none"><img alt="" border="0" src= "/genesis/images/labelicon.png" title= "send e-mail teacher"></a> </td> <td class="cellright" style="cursor:pointer;"> <table cellpadding="0" cellspacing="0" width="100%"> <tr> <td class="cellcenter"><span style= "font-style:italic;color:brown;font-size: 8pt;">no grades</span></td> </tr> </table> 
</td> </tr> </table>
I am trying to extract the values of the student's grades; when a course has no grades, the HTML contains the text "no grades" instead. However, when I run a select request such as the following:
doc.select("[class=cellright]")
I get an output with the grade values listed twice (because they are nested within two elements carrying the [class=cellright] distinguisher) and the expected number of "no grades" entries. So my question is: how can I select only the innermost child in the document that carries the distinguisher [class=cellright]? (I have already dealt with the issue of the blank value.) Any help is appreciated!
There are many ways to do this.
One is this: test the parents of each "cellright" element to see whether any of them carries the class as well, and discard the element if you find one:
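import java.util.ArrayList;
import java.util.List;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

List<Element> keepList = new ArrayList<>();
Elements els = doc.select(".cellright");
for (Element el : els) {
    boolean keep = true;
    for (Element parentEl : el.parents()) {
        if (parentEl.hasClass("cellright")) {
            // an ancestor has the class as well -> discard!
            keep = false;
            break;
        }
    }
    if (keep) {
        keepList.add(el);
    }
}
// keepList now holds each cellright element that is not nested inside another one, so nested duplicates are dropped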
Note that this was written without a compiler and from memory, so there might be spelling or syntax errors.
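The loop above drops every element that has an ancestor carrying the class. Strictly speaking, "innermost" is the opposite test: keep an element only if none of its descendants carries the class. Here is a minimal sketch of that descendant test on a tiny hand-rolled tree, so it runs without Jsoup. The Node class, its methods and sampleTree() are hypothetical stand-ins made up for illustration, not Jsoup's API; text() only imitates the idea that an element's text includes the text of its children.

```java
import java.util.ArrayList;
import java.util.List;

class InnermostDemo {
    // Hypothetical stand-in for a DOM element: one class name, own text, children.
    static class Node {
        final String cssClass;
        final String ownText;
        final List<Node> children = new ArrayList<>();

        Node(String cssClass, String ownText) {
            this.cssClass = cssClass;
            this.ownText = ownText;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        // Own text followed by the text of all descendants.
        String text() {
            StringBuilder sb = new StringBuilder(ownText);
            for (Node child : children) {
                sb.append(child.text());
            }
            return sb.toString();
        }
    }

    // True if any node strictly below 'node' carries the class.
    static boolean hasDescendantWithClass(Node node, String cssClass) {
        for (Node child : node.children) {
            if (cssClass.equals(child.cssClass) || hasDescendantWithClass(child, cssClass)) {
                return true;
            }
        }
        return false;
    }

    // Pre-order walk: collect text of nodes that carry the class and have no such descendant.
    static void collectInnermost(Node node, String cssClass, List<String> out) {
        if (cssClass.equals(node.cssClass) && !hasDescendantWithClass(node, cssClass)) {
            out.add(node.text());
        }
        for (Node child : node.children) {
            collectInnermost(child, cssClass, out);
        }
    }

    static List<String> innermostTexts(Node root, String cssClass) {
        List<String> out = new ArrayList<>();
        collectInnermost(root, cssClass, out);
        return out;
    }

    // Mirrors the nesting in the question: one graded cell, one cell without grades.
    static Node sampleTree() {
        Node graded = new Node("cellright", "").add(new Node("cellright", "92%"));
        Node ungraded = new Node("cellright", "").add(new Node("cellcenter", "no grades"));
        return new Node("table", "").add(graded).add(ungraded);
    }

    public static void main(String[] args) {
        System.out.println(innermostTexts(sampleTree(), "cellright")); // prints [92%, no grades]
    }
}
```

For the graded row only the inner cell is kept, and for the row without grades the outer cell is kept because nothing below it carries the class, so each value shows up once.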
Another note: the use of "[class=cellright]" works only if the element has a single class. With multiple classes in random order (which is totally to be expected), it is better to use the dot syntax ".cellright".
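To see why the two selectors differ, here is a small sketch that imitates both matching rules in plain Java, without Jsoup. It is a simplification for illustration, not Jsoup's actual implementation; the helper names and the extra class name "highlight" are made up.

```java
import java.util.Arrays;

class ClassMatchDemo {
    // Imitates an attribute selector such as [class=cellright]:
    // the whole attribute value has to equal the name.
    static boolean attributeEquals(String classAttr, String name) {
        return classAttr.equals(name);
    }

    // Imitates a class selector such as .cellright:
    // the name has to appear as one whitespace-separated token.
    static boolean hasClassToken(String classAttr, String name) {
        return Arrays.asList(classAttr.trim().split("\\s+")).contains(name);
    }

    public static void main(String[] args) {
        String single = "cellright";
        String multiple = "highlight cellright"; // hypothetical second class

        System.out.println(attributeEquals(single, "cellright"));   // true
        System.out.println(attributeEquals(multiple, "cellright")); // false
        System.out.println(hasClassToken(single, "cellright"));     // true
        System.out.println(hasClassToken(multiple, "cellright"));   // true
    }
}
```

As soon as a second class is added to the attribute, the exact comparison stops matching, while the token test still finds the class.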