|
1 | | -# This module was meant to be a proof of concept. You will probably run into a bunch of cors issues using this since it's making the requests from the browser. Using a reverse proxy libraries like https://github.com/Rob--W/cors-anywhere might help solve the problem, but you are probably better off making the requests in a backend service then pass in the html into openGraphScraper. |
2 | | - |
3 | 1 | # openGraphScraperLite |
4 | 2 |
|
5 | 3 | [](https://github.com/jshemas/openGraphScraperLite/actions?query=branch%3Amaster) |
6 | 4 | [](https://snyk.io/test/github/jshemas/openGraphScraperLite) |
7 | 5 |
|
8 | | -A simple javascript module for scraping Open Graph and Twitter Card info off a site. For Node.js usage, we recommend `open-graph-scraper` by the same people. |
| 6 | +A simple JavaScript module for scraping Open Graph and Twitter Card info from given HTML. For Node.js usage, we recommend `open-graph-scraper`, which is by the same people and can make HTTP requests.
9 | 7 |
|
10 | 8 | ## Installation |
11 | 9 |
|
12 | 10 | ```bash |
13 | | -npm install open-graph-scraper-lite |
| 11 | +npm install open-graph-scraper-lite --save |
14 | 12 | ``` |
15 | 13 |
|
16 | 14 | ## Usage |
17 | 15 |
|
18 | | -Callback Example: |
19 | | -```javascript |
20 | | -const ogs = require('open-graph-scraper-lite'); |
21 | | -const options = { url: 'http://ogp.me/' }; |
22 | | -ogs(options, (error, results, response) => { |
23 | | - console.log('error:', error); // This is returns true or false. True if there was a error. The error it self is inside the results object. |
24 | | - console.log('results:', results); // This contains all of the Open Graph results |
25 | | - console.log('response:', response); // This contains the HTML of page |
26 | | -}); |
27 | | -``` |
28 | | - |
29 | | -Promise Example: |
30 | 16 | ```javascript |
31 | | -const ogs = require('open-graph-scraper-lite'); |
32 | | -const options = { url: 'http://ogp.me/' }; |
| 17 | +const ogs = require('open-graph-scraper-lite');
| 18 | +const options = { |
| 19 | + html: `<html><head> |
| 20 | + <link rel="icon" type="image/png" href="https://bar.com/foo.png" /> |
| 21 | + <meta charset="utf-8" /> |
| 22 | + <meta property="og:description" name="og:description" content="html description example" /> |
| 23 | + <meta property="og:image" name="og:image" content="https://www.foo.com/bar.jpg" /> |
| 24 | + <meta property="og:title" name="og:title" content="foobar" /> |
| 25 | + <meta property="og:type" name="og:type" content="website" /> |
| 26 | + </head></html>` |
| 27 | +}; |
33 | 28 | ogs(options) |
34 | 29 | .then((data) => { |
35 | | - const { error, result, response } = data; |
36 | | - console.log('error:', error); // This is returns true or false. True if there was a error. The error it self is inside the results object. |
37 | | - console.log('result:', result); // This contains all of the Open Graph results |
38 | | - console.log('response:', response); // This contains the HTML of page |
| 30 | + const { result } = data; |
| 31 | + console.log('result:', result); |
39 | 32 | }) |
40 | 33 | ``` |
41 | 34 |
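Because this module only parses the HTML it is handed, the markup has to come from somewhere else, for example a backend service that makes the HTTP request. A minimal sketch of that hand-off follows. `scrapeFromBackend` and `fetchHtml` are hypothetical names used for illustration, and `ogs` is passed in as an argument so the snippet assumes nothing about the library beyond the `ogs({ html })` call and the `{ result }` shape shown in the usage example.

```javascript
// Hypothetical helper, not part of open-graph-scraper-lite.
// fetchHtml: (url) => Promise<string>, supplied by the caller (assumption).
// ogs: the scraper function, called with { html } as in the usage example.
async function scrapeFromBackend(url, fetchHtml, ogs) {
  const html = await fetchHtml(url); // the request happens outside the scraper
  const { result } = await ogs({ html }); // the scraper only sees the HTML string
  return result;
}
```

Injecting `fetchHtml` keeps the network code in one place, so it can be swapped for whatever HTTP client the backend already uses.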
|
42 | 35 | ## Results JSON |
43 | 36 |
|
44 | | -Check the return for a ```success``` flag. If success is set to true, then the url input was valid. Otherwise it will be set to false. The above example will return something like... |
45 | 37 | ```javascript |
46 | | -{ |
47 | | - ogTitle: 'Open Graph protocol', |
| 38 | +result: { |
| 39 | + ogDescription: 'html description example', |
| 40 | + ogTitle: 'foobar', |
48 | 41 | ogType: 'website', |
49 | | - ogUrl: 'http://ogp.me/', |
50 | | - ogDescription: 'The Open Graph protocol enables any web page to become a rich object in a social graph.', |
51 | | - ogImage: { |
52 | | - url: 'http://ogp.me/logo.png', |
53 | | - width: '300', |
54 | | - height: '300', |
55 | | - type: 'image/png' |
56 | | - }, |
57 | | - requestUrl: 'http://ogp.me/', |
| 42 | + ogImage: [ { url: 'https://www.foo.com/bar.jpg', type: 'jpg' } ], |
| 43 | + favicon: 'https://bar.com/foo.png', |
| 44 | + charset: 'utf-8', |
58 | 45 | success: true |
59 | 46 | } |
60 | 47 | ``` |
61 | 48 |
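A consumer of this result will usually check the `success` flag and then read a few fields. The sketch below shows one way to do that; `pickPreview` is a hypothetical helper, not part of the library, and it assumes only the field names shown in the example result. It accepts `ogImage` either as an array or as a single object, since both shapes appear in this README's history.

```javascript
// Hypothetical helper, not part of open-graph-scraper-lite.
// `data` is assumed to have the { result: {...} } shape shown above.
function pickPreview(data) {
  const { result } = data;
  // Treat a missing result or a false success flag as "nothing to show".
  if (!result || result.success !== true) return null;
  // ogImage may be an array of images or a single image object.
  const firstImage = Array.isArray(result.ogImage) ? result.ogImage[0] : result.ogImage;
  return {
    title: result.ogTitle,
    description: result.ogDescription,
    image: firstImage ? firstImage.url : undefined,
  };
}
```

Returning `null` on failure keeps the calling code to a single check before rendering a link preview.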
|
62 | 49 | ## Options |
| 50 | + |
63 | 51 | | Name | Info | Default Value | Required | |
64 | 52 | |----------------------|----------------------------------------------------------------------------|---------------|----------| |
65 | | -| url | URL of the site. | | x | |
66 | | -| timeout | Timeout of the request | 2000 ms | | |
67 | | -| html | You can pass in an HTML string to run ogs on it. (use without options.url) | | | |
68 | | -| blacklist | Pass in an array of sites you don't want ogs to run on. | [] | | |
| 53 | +| html | The HTML string to run ogs on. | | x |
69 | 54 | | onlyGetOpenGraphInfo | Only fetch open graph info and don't fall back on anything else. | false | | |
70 | | -| ogImageFallback | Fetch other images if no open graph ones are found. | true | | |
71 | 55 | | customMetaTags | Here you can define custom meta tags you want to scrape. | [] | | |
72 | | -| allMedia | By default, OGS will only send back the first image/video it finds | false | | |
73 | | -| retry | Number of times ogs will retry the request. | 2 | | |
74 | | -| headers | An object containing request headers. Useful for setting the user-agent | {} | | |
75 | | -| peekSize | Sets the peekSize for the request | 1024 | | |
76 | | -| urlValidatorSettings | Sets the options used by validator.js for testing the URL | [Here](https://github.com/jshemas/openGraphScraper/blob/master/lib/openGraphScraper.js#L21-L36) | | |
77 | 56 |
|
78 | | -Note: `open-graph-scraper-lite` uses [ky](https://github.com/sindresorhus/ky) for requests and most of [ky's options](https://github.com/sindresorhus/ky#api) should work as `open-graph-scraper-lite` options. |
| 57 | +## Custom Meta Tag Example |
79 | 58 |
|
80 | | -Custom Meta Tag Example: |
81 | 59 | ```javascript |
82 | | -const ogs = require('open-graph-scraper-lite'); |
| 60 | +const ogs = require('open-graph-scraper-lite');
83 | 61 | const options = { |
84 | | - url: 'https://github.com/jshemas/openGraphScraper', |
| 62 | + html: `<html><head> |
| 63 | + <link rel="icon" type="image/png" href="https://bar.com/foo.png" /> |
| 64 | + <meta charset="utf-8" /> |
| 65 | + <meta property="og:description" name="og:description" content="html description example" /> |
| 66 | + <meta property="og:image" name="og:image" content="https://www.foo.com/bar.jpg" /> |
| 67 | + <meta property="og:title" name="og:title" content="foobar" /> |
| 68 | + <meta property="og:type" name="og:type" content="website" /> |
| 69 | + <meta name="hostname" content="github.com"> |
| 70 | + </head></html>`, |
85 | 71 | customMetaTags: [{ |
86 | | - multiple: false, // is there more then one of these tags on a page (normally this is false) |
| 72 | + multiple: false, // is there more than one of these tags on a page (normally this is false) |
87 | 73 | property: 'hostname', // meta tag name/property attribute |
88 | 74 | fieldName: 'hostnameMetaTag', // name of the result variable |
89 | 75 | }], |
90 | 76 | }; |
91 | 77 | ogs(options) |
92 | 78 | .then((data) => { |
93 | | - const { error, result, response } = data; |
94 | | - console.log('hostnameMetaTag:', result.hostnameMetaTag); // hostnameMetaTag: github.com |
| 79 | + const { result } = data; |
| 80 | + console.log('hostnameMetaTag:', result.customMetaTags.hostnameMetaTag); // hostnameMetaTag: github.com |
95 | 81 | }) |
96 | 82 | ``` |
97 | | - |
98 | | -## Tests |
99 | | - |
100 | | -Then you can run the tests by running... |
101 | | -```bash |
102 | | -npm run test |
103 | | -``` |