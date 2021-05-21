
fathom-web

by mozilla
Version 3.7.3

A framework for extracting meaning from web pages

License: MPL-2.0
Dependencies: 1
Type definitions: DefinitelyTyped

Fathom

Fathom is a supervised-learning system for recognizing parts of web pages (pop-ups, address forms, slideshows) or for classifying a page as a whole. A DOM flows in one side, and DOM nodes flow out the other, each tagged with a type and the probability that the type is correct. A Prolog-like language makes it straightforward to specify the “smells” that suggest each type, and a neural-net-based trainer determines how much each smell should contribute. Finally, the FathomFox web extension lets you collect and label a corpus of web pages for training.
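To make the idea of weighted smells concrete, here is a minimal, self-contained TypeScript sketch of the underlying math. It makes several assumptions. It does not use the fathom-web API, which expresses smells as rules over real DOM nodes and has its own trainer. Plain objects stand in for DOM nodes. The `Candidate` shape, the login-form smells, and the four-example corpus are all hypothetical. Each smell maps a candidate to a number in [0, 1]. A weighted sum of the smells plus a bias, passed through a sigmoid, gives the probability that the candidate has the type. The weights are learned by plain batch gradient descent on log loss (logistic regression, the simplest single-layer form of a neural net), which stands in here for Fathom's own trainer.

```typescript
// Hypothetical candidate element: plain data standing in for a DOM node.
interface Candidate {
  tagName: string;
  text: string;
  hasPasswordField: boolean;
  widthFraction: number; // element width / viewport width, in [0, 1]
}

// A "smell" maps a candidate to a value in [0, 1].
type Smell = (c: Candidate) => number;

// Hypothetical smells for a "login form" type (illustrative only).
const smells: Smell[] = [
  (c) => (c.tagName === "FORM" ? 1 : 0),
  (c) => (c.hasPasswordField ? 1 : 0),
  (c) => (/log ?in|sign ?in/i.test(c.text) ? 1 : 0),
  (c) => c.widthFraction,
];

const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));

// P(type | c) = sigmoid(bias + sum_i weights[i] * smells[i](c))
function probability(c: Candidate, weights: number[], bias: number): number {
  let z = bias;
  smells.forEach((smell, i) => {
    z += weights[i] * smell(c);
  });
  return sigmoid(z);
}

// Learn one weight per smell, plus a bias, by batch gradient descent on log loss.
function train(
  examples: Array<[Candidate, boolean]>,
  epochs = 2000,
  learningRate = 0.5,
): { weights: number[]; bias: number } {
  const weights = smells.map(() => 0);
  let bias = 0;
  for (let epoch = 0; epoch < epochs; epoch++) {
    const gradW = smells.map(() => 0);
    let gradB = 0;
    for (const [c, label] of examples) {
      // Gradient of log loss w.r.t. z is (prediction - target).
      const error = probability(c, weights, bias) - (label ? 1 : 0);
      smells.forEach((smell, i) => {
        gradW[i] += error * smell(c);
      });
      gradB += error;
    }
    for (let i = 0; i < weights.length; i++) {
      weights[i] -= (learningRate * gradW[i]) / examples.length;
    }
    bias -= (learningRate * gradB) / examples.length;
  }
  return { weights, bias };
}

// Tiny hypothetical labeled corpus: true means "this is a login form".
const corpus: Array<[Candidate, boolean]> = [
  [{ tagName: "FORM", text: "Log in", hasPasswordField: true, widthFraction: 0.3 }, true],
  [{ tagName: "FORM", text: "Sign in to continue", hasPasswordField: true, widthFraction: 0.4 }, true],
  [{ tagName: "FORM", text: "Search", hasPasswordField: false, widthFraction: 0.5 }, false],
  [{ tagName: "DIV", text: "Latest news", hasPasswordField: false, widthFraction: 0.9 }, false],
];

const model = train(corpus);
for (const [c] of corpus) {
  console.log(c.text, probability(c, model.weights, model.bias).toFixed(3));
}
```

In this corpus, the password-field and login-text smells fire on exactly the same examples, so the trainer gives them equal weights. A larger, more varied labeled corpus, such as one collected with FathomFox, is what lets a trainer separate the contributions of correlated smells.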

Continue reading at https://mozilla.github.io/fathom/intro.html#why.

