Thanks to @email@example.com (and I think I recall someone else being involved), a new Odysseus release is coming out soon with support for a “reader mode”.
I find it ridiculous that I feel the need to support this feature; it’s saying “webdevs are doing such a poor job that I need to offer to clear away their mess!”
In celebration of this I will describe how this code (the same as used in Firefox and Pocket) works.
First it decides whether a reader mode is worth offering at all. That is, it looks through any visible <p>s (not in an <li>), <pre>s, or <br>-containing <div>s with more than 140 characters, discards some depending on class names, and sums the square roots of the remaining character counts.
If that sum is high enough, it sends a message to the UI telling it to show the button offering a reader mode.
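To make that check concrete, here is a rough Python sketch of the summing step. It works on a plain list of character counts rather than a real DOM, and the names are mine. The cutoff of 20 is an assumption (the description above doesn't give one), as is taking the square root of only the characters beyond the 140 minimum.

```python
import math

MIN_CHARS = 140    # from the description: shorter blocks are ignored
MIN_SCORE = 20.0   # assumed cutoff, not stated in the description


def looks_readerable(block_lengths, min_chars=MIN_CHARS, min_score=MIN_SCORE):
    """Return True once enough long text blocks have been seen.

    block_lengths: character counts of the blocks that survived
    the visibility and class-name filtering.
    """
    score = 0.0
    for length in block_lengths:
        if length <= min_chars:
            continue  # too short to count
        # Assumption: only the characters past the minimum are scored.
        score += math.sqrt(length - min_chars)
        if score > min_score:
            return True  # enough evidence, stop early
    return False
```

The square root means one enormous block can't carry the page alone as easily as several decent-sized paragraphs can.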
The first layer uses document.write() to drop the existing markup from the page. Then, in addition to the page’s text, it adds back in its extracted title and byline. It also computes an estimated reading time at 200 words per minute, before removing any attribute besides “src” and “href” and annotating the page with a theme class.
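The reading-time estimate is simple arithmetic. A minimal sketch, assuming words are split on whitespace and the result is rounded up to whole minutes (both are my assumptions, only the 200 words per minute comes from the code described above):

```python
import math


def estimated_reading_minutes(text, words_per_minute=200):
    """Estimate reading time in whole minutes, rounded up."""
    word_count = len(text.split())  # assumption: whitespace-separated words
    return math.ceil(word_count / words_per_minute)
```

So an article of 450 words would be shown as a 3 minute read under these assumptions.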
The next layer might (if configured to do so) consider giving up if there are too many elements on the page.
Then it removes all
From there it examines the metatags for useful information (filling in any missing excerpts), and after layer 3 it (unnecessarily here) makes links absolute and removes any classes.
Layer 3 considers elements which:
- are not marked hidden
- don’t look like a byline/the author name (those are rendered separately)
- (optionally) pass a check based on the absence/presence of certain classes, unless they’re in a <table>
- are one of:
  - a <section>, <h2+>, <p>, <td>, or <pre>
  - an inline-containing <div> (rewritten to a <p>)
  - a <div> containing a single <p> (as on mobile.slate.com)
- and have more than 25 characters
From there it scores each of those elements by number of:
- characters by the hundred (up to 3)
These scores also count towards the parent elements, but scaled down by depth (especially beyond depth 2).
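Here is a small Python sketch of how the length points and the depth scaling could fit together. Ancestors are just names in a list, nearest first. The divisors (1 for the parent, 2 for the grandparent, 3 × level beyond that) are my assumption to illustrate "especially beyond depth 2"; the description above doesn't spell them out.

```python
from collections import defaultdict


def length_points(text):
    """One point per full hundred characters, capped at 3."""
    return min(len(text) // 100, 3)


def depth_divisor(level):
    """Assumed divisors: parent 1, grandparent 2, further up 3 * level."""
    if level == 0:
        return 1
    if level == 1:
        return 2
    return level * 3


def score_ancestors(paragraphs):
    """paragraphs: list of (text, ancestors), ancestors nearest first.

    Returns a dict mapping each ancestor name to its accumulated score.
    """
    scores = defaultdict(float)
    for text, ancestors in paragraphs:
        points = length_points(text)
        for level, ancestor in enumerate(ancestors):
            scores[ancestor] += points / depth_divisor(level)
    return dict(scores)
```

With these divisors a 300+ character paragraph gives its parent 3 points, its grandparent 1.5, and its great-grandparent only 0.5.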
From there it scales these scores by how much of the text is in links (a likely indicator of navigation) and tracks only the top 5 candidates.
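The link scaling and the top-5 tracking can be sketched like so. I'm assuming the score is multiplied by (1 - link density), where link density is the share of characters that sit inside links; the description only says the score is scaled by how much text is in links, and the tuple layout is made up for the example.

```python
import heapq


def adjust_for_links(score, text_chars, link_chars):
    """Scale a score down by the fraction of its text inside links."""
    if text_chars == 0:
        return 0.0
    link_density = link_chars / text_chars
    return score * (1 - link_density)  # assumed formula


def top_candidates(candidates, limit=5):
    """candidates: list of (name, score, text_chars, link_chars).

    Returns the `limit` best (name, adjusted_score) pairs, best first.
    """
    adjusted = [
        (name, adjust_for_links(score, text_chars, link_chars))
        for name, score, text_chars, link_chars in candidates
    ]
    return heapq.nlargest(limit, adjusted, key=lambda pair: pair[1])
```

A navigation bar that is nearly all links ends up with a score near zero no matter how much text it holds, which is the point.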
Then it looks to see if it captures more useful text by looking at ancestors and/or siblings.