English language parser for retext producing nlcst nodes.

Install

This package is ESM only: Node 12+ is needed to use it and it must be imported instead of required.

npm:

npm install parse-english

Use

import inspect from 'unist-util-inspect'
import {ParseEnglish} from 'parse-english'

var tree = new ParseEnglish().parse(
  'Mr. Henry Brown: A hapless but friendly City of London worker.'
)

console.log(inspect(tree))

Yields:

RootNode[1] (1:1-1:63, 0-62)
└─ ParagraphNode[1] (1:1-1:63, 0-62)
   └─ SentenceNode[23] (1:1-1:63, 0-62)
      ├─ WordNode[2] (1:1-1:4, 0-3)
      │  ├─ TextNode: "Mr" (1:1-1:3, 0-2)
      │  └─ PunctuationNode: "." (1:3-1:4, 2-3)
      ├─ WhiteSpaceNode: " " (1:4-1:5, 3-4)
      ├─ WordNode[1] (1:5-1:10, 4-9)
      │  └─ TextNode: "Henry" (1:5-1:10, 4-9)
      ├─ WhiteSpaceNode: " " (1:10-1:11, 9-10)
      ├─ WordNode[1] (1:11-1:16, 10-15)
      │  └─ TextNode: "Brown" (1:11-1:16, 10-15)
      ├─ PunctuationNode: ":" (1:16-1:17, 15-16)
      ├─ WhiteSpaceNode: " " (1:17-1:18, 16-17)
      ├─ WordNode[1] (1:18-1:19, 17-18)
      │  └─ TextNode: "A" (1:18-1:19, 17-18)
      ├─ WhiteSpaceNode: " " (1:19-1:20, 18-19)
      ├─ WordNode[1] (1:20-1:27, 19-26)
      │  └─ TextNode: "hapless" (1:20-1:27, 19-26)
      ├─ WhiteSpaceNode: " " (1:27-1:28, 26-27)
      ├─ WordNode[1] (1:28-1:31, 27-30)
      │  └─ TextNode: "but" (1:28-1:31, 27-30)
      ├─ WhiteSpaceNode: " " (1:31-1:32, 30-31)
      ├─ WordNode[1] (1:32-1:40, 31-39)
      │  └─ TextNode: "friendly" (1:32-1:40, 31-39)
      ├─ WhiteSpaceNode: " " (1:40-1:41, 39-40)
      ├─ WordNode[1] (1:41-1:45, 40-44)
      │  └─ TextNode: "City" (1:41-1:45, 40-44)
      ├─ WhiteSpaceNode: " " (1:45-1:46, 44-45)
      ├─ WordNode[1] (1:46-1:48, 45-47)
      │  └─ TextNode: "of" (1:46-1:48, 45-47)
      ├─ WhiteSpaceNode: " " (1:48-1:49, 47-48)
      ├─ WordNode[1] (1:49-1:55, 48-54)
      │  └─ TextNode: "London" (1:49-1:55, 48-54)
      ├─ WhiteSpaceNode: " " (1:55-1:56, 54-55)
      ├─ WordNode[1] (1:56-1:62, 55-61)
      │  └─ TextNode: "worker" (1:56-1:62, 55-61)
      └─ PunctuationNode: "." (1:62-1:63, 61-62)
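To make the shape of the tree above more concrete, here is a minimal sketch of walking it with plain JavaScript. The `tree` literal is a hand-written fragment that mirrors the node types shown in the output (positions are left out), and `textOf` and `collect` are hypothetical helpers written for this illustration; they are not part of parse-english.

```javascript
// Hand-written fragment shaped like the tree above (positional info omitted).
const tree = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {
              type: 'WordNode',
              children: [
                {type: 'TextNode', value: 'Mr'},
                {type: 'PunctuationNode', value: '.'}
              ]
            },
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'Henry'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'Brown'}]},
            {type: 'PunctuationNode', value: '.'}
          ]
        }
      ]
    }
  ]
}

// Hypothetical helper: concatenate the values of all leaves under `node`.
function textOf(node) {
  return 'children' in node ? node.children.map(textOf).join('') : node.value
}

// Hypothetical helper: collect the text of every node of the given type.
// Matching nodes are not searched further.
function collect(node, type, found = []) {
  if (node.type === type) {
    found.push(textOf(node))
  } else if ('children' in node) {
    node.children.forEach((child) => collect(child, type, found))
  }
  return found
}

const words = collect(tree, 'WordNode')
console.log(words)
```

Note that the abbreviation "Mr." is a single WordNode holding both a TextNode and a PunctuationNode, so its full stop does not end the sentence.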

API

This package exports the following identifiers: ParseEnglish. There is no default export.

parse-english has the same API as parse-latin.

Algorithm

All of parse-latin is included, along with the following support for the English natural language:

Unit abbreviations (tsp., tbsp., oz., ft., and more)

Time references (sec., min., tues., thu., feb., and more)

Business abbreviations (Inc. and Ltd.)

Social titles (Mr., Mmes., Sr., and more)

Rank and academic titles (Dr., Rep., Gen., Prof., Pres., and more)

Geographical abbreviations (Ave., Blvd., Ft., Hwy., and more)

American state abbreviations (Ala., Minn., La., Tex., and more)

Canadian province abbreviations (Alta., Qué., Yuk., and more)

English county abbreviations (Beds., Leics., Shrops., and more)

Common elision (omission of letters) (’n’, ’o, ’em, ’twas, ’80s, and more)
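To illustrate why abbreviation support matters, the sketch below compares a naive split at every full stop with a split that knows a few abbreviations from the list above. It is a simplified, string-based illustration only and is not how parse-english or parse-latin work internally; the `abbreviations` set and the `splitSentences` helper are hypothetical names introduced for this example.

```javascript
// Tiny sample of abbreviations from the list above (lowercased, no dot).
// Assumption: the real package knows many more than these.
const abbreviations = new Set(['mr', 'dr', 'inc', 'ltd', 'ave', 'tsp'])

// Hypothetical helper: split on whitespace that follows a full stop, then
// merge a piece back into the previous one when that previous piece ends in
// a known abbreviation. Merging uses a single space, so the original
// whitespace is not preserved.
function splitSentences(text) {
  const pieces = text.split(/(?<=\.)\s+/)
  const sentences = []

  for (const piece of pieces) {
    const previous = sentences[sentences.length - 1]
    const lastWord = previous ? previous.match(/(\S+)\.$/) : null

    if (lastWord && abbreviations.has(lastWord[1].toLowerCase())) {
      sentences[sentences.length - 1] = previous + ' ' + piece
    } else {
      sentences.push(piece)
    }
  }

  return sentences
}

const text =
  'Mr. Henry Brown lives on Elm Ave. in London. He works at Acme Inc. every day.'

// Naive approach: every full stop followed by whitespace ends a sentence.
const naive = text.split(/(?<=\.)\s+/)

// Abbreviation-aware approach.
const aware = splitSentences(text)

console.log(naive.length, aware.length)
console.log(aware)
```

The naive split breaks the text after "Mr.", "Ave.", and "Inc.", while the abbreviation-aware version keeps those full stops inside their sentences.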

License

MIT © Titus Wormer