Accepting URL Input for API

Userlevel 1

On the developer demo, one can input a URL (for example, a media article), and the demo scrapes the relevant text from that URL and returns the analysis. From what I see in the API docs, only the actual text should be submitted. Am I missing something, or do I need to scrape the text from the URL myself?

When I send a URL to the API, I get empty results in the response. In the developer demo, the same URL works and is scraped successfully.


Best answer by bmunz 24 May 2021, 19:31

3 replies

Userlevel 1

So it seems that the relevant text in media articles is all in `<p>` tags, so it shouldn’t be too difficult to scrape myself.
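A minimal sketch of that idea, using only the Python standard library. The names `ParagraphExtractor`, `extract_paragraph_text`, and `fetch_article_text` are made up for illustration, and it assumes the article body really does live in plain `<p>` tags in the served HTML (pages that render their text with JavaScript would need something heavier):

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._in_p = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # HTML allows unclosed <p>; a new <p> ends the previous one.
            self.flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush()

    def handle_data(self, data):
        # Text inside nested inline tags (<a>, <b>, ...) is kept as well.
        if self._in_p:
            self._chunks.append(data)

    def flush(self):
        # Collapse runs of whitespace and drop empty paragraphs.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []
        self._in_p = False


def extract_paragraph_text(html):
    """Return the text of all <p> elements, separated by blank lines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # pick up a trailing <p> that was never closed
    return "\n\n".join(parser.paragraphs)


def fetch_article_text(url, timeout=10):
    """Download a page and return its <p> text (assumes server-rendered HTML)."""
    # Some sites reject requests without a User-Agent; this value is arbitrary.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_paragraph_text(html)
```

The string returned by `fetch_article_text(url)` is what would then be submitted to the API as the text, in place of the URL itself.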

Userlevel 4

Hi @BigDataDave ,

While the NL API does not scrape web pages out of the box, there are many open-source options that do a great job of scraping. In the demo that you mentioned, we use Unfluff, but there are other good ones depending on your programming language, such as autoscraper (Python), Goutte (PHP), and x-ray (JS).

Userlevel 1

Ah, thanks! Sounds much better than using Selenium lol. I will check those out.