FAQ Database Discussion Community

SgmlLinkExtractor not displaying results or following link

I am having trouble fully understanding how SgmlLinkExtractor works. When making a crawler with Scrapy, I can successfully extract data from links using specific URLs. The problem is using Rules to follow a "next page" link on a particular URL. I think the problem lies in the allow()...
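Scrapy's documentation describes `allow` as a regular expression (or list of them) that the absolute URLs must match, with `deny` taking precedence over `allow`. A common reason a Rule never fires is a pattern copied straight from a URL, where `.` and `?` keep their regex meanings. The following is a simplified, stdlib-only model of that filtering, not Scrapy's own code. `link_allowed`, `filter_links` and the example.com URLs are hypothetical, and the use of `re.search` (match anywhere in the URL) is an assumption about Scrapy's semantics that is worth checking against its docs. It shows an unescaped `?` silently dropping a next-page link:

```python
import re
from typing import Iterable, Sequence


def link_allowed(url: str, allow: Sequence[str] = (), deny: Sequence[str] = ()) -> bool:
    """Simplified model of allow/deny link filtering (hypothetical helper).

    A URL passes if `allow` is empty or any allow pattern matches, and no
    deny pattern matches. re.search is used, so a pattern may match
    anywhere in the URL instead of having to match all of it.
    """
    allowed = not allow or any(re.search(p, url) for p in allow)
    denied = any(re.search(p, url) for p in deny)
    return allowed and not denied


def filter_links(urls: Iterable[str], allow: Sequence[str] = (), deny: Sequence[str] = ()) -> list:
    """Keep only the URLs that pass link_allowed, preserving their order."""
    return [u for u in urls if link_allowed(u, allow, deny)]


# Hypothetical absolute URLs standing in for links found on a listing page.
LINKS = [
    "http://example.com/list/index.php?page=2",
    "http://example.com/item/42.html",
]

# Copied straight from the URL: '?' makes the preceding 'p' optional
# instead of matching a literal '?', so the next-page link is dropped.
print(filter_links(LINKS, allow=[r"index.php?page=\d+"]))  # []

# Escaping '.' and '?' lets the pattern match the next-page link.
print(filter_links(LINKS, allow=[r"index\.php\?page=\d+"]))
```

Separately, the Scrapy docs state that a Rule's `follow` defaults to `False` when the Rule has a `callback`, so a next-page rule that also parses items may need `follow=True` set explicitly.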

Are there valid cases in HTML/XML where tags would not be fully contained?

I think that in XML and HTML, cross-scoped (overlapping) tags are not allowed. Maybe SGML allows it. In XML/HTML, though, are there any valid cases where this can occur? Something like: <p>This is <i>some <b>example</i> text</b> right here!</p> which would likely generate output like: "This is some example...
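In XML, overlapping elements break the well-formedness requirement that every element be properly nested inside its parent, so a conforming XML parser has to reject the example outright. A quick check with Python's standard-library ElementTree (used here only as a convenient conforming XML parser; `is_well_formed` is a hypothetical helper) shows the markup from the question failing and a properly nested rewrite passing. HTML behaves differently in practice: browsers apply the HTML parser's error-recovery rules and still render such markup, but it is a parse error and not valid HTML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# The overlapping markup from the question: </i> closes while <b> is still open.
OVERLAPPING = "<p>This is <i>some <b>example</i> text</b> right here!</p>"

# One well-formed way to express the same formatting: split the bold run
# into two <b> elements, each fully contained in its parent.
NESTED = "<p>This is <i>some <b>example</b></i><b> text</b> right here!</p>"

print(is_well_formed(OVERLAPPING))  # False: the parser reports a mismatched tag
print(is_well_formed(NESTED))       # True
```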

Are parameter entity references in SGML/XML parsable using .NET?

When I try to parse the data below with XDocument, I get the following error: "XMLException: A parameter entity reference is not allowed in internal markup". Here is example data that I am trying to parse: <!DOCTYPE sgml [ <!ELEMENT sgml ANY> <!ENTITY % std "standard SGML"> <!ENTITY...
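The .NET message appears to correspond to an XML 1.0 well-formedness constraint ("PEs in Internal Subset"): in the internal DTD subset, a parameter-entity reference may stand where a whole markup declaration could occur, but not inside a markup declaration such as an entity value. The spec exempts the external DTD subset and external parameter entities from this restriction. This sketch uses Python's expat-based ElementTree as a stand-in for another conforming parser, not .NET itself, and the DTDs are hypothetical because the question's data is truncated. `parse_text` is likewise a hypothetical helper. It shows a PE reference inside an `<!ENTITY>` value being rejected and an equivalent general-entity declaration parsing cleanly:

```python
import xml.etree.ElementTree as ET

# Hypothetical DTDs for illustration only; not the question's (truncated) data.
# Here %std; appears INSIDE another markup declaration in the internal subset.
PE_IN_DECLARATION = """<!DOCTYPE doc [
  <!ELEMENT doc ANY>
  <!ENTITY % std "standard SGML">
  <!ENTITY title "%std;">
]>
<doc>&title;</doc>"""

# The same text declared directly as a general entity, with no PE reference.
GENERAL_ENTITY_ONLY = """<!DOCTYPE doc [
  <!ELEMENT doc ANY>
  <!ENTITY title "standard SGML">
]>
<doc>&title;</doc>"""


def parse_text(xml_text: str) -> str:
    """Return the root element's text, or a short error description."""
    try:
        return ET.fromstring(xml_text).text or ""
    except ET.ParseError as err:
        return f"ParseError: {err}"


print(parse_text(PE_IN_DECLARATION))    # a ParseError message from expat
print(parse_text(GENERAL_ENTITY_ONLY))  # standard SGML
```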