Why do so many of my imports fail?

You may wonder why so many of the URLs you try to import fail.

There are several possible reasons:

  1. The URLs are behind a login wall, and you are not currently logged in to that service.
  2. The service is temporarily blocking your requests because you are fetching a lot of pages from it at once (e.g. lots of Google results).
  3. The page is no longer reachable.
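
If you want to check a failing URL yourself, the HTTP status code the server returns is usually a good hint as to which of the three reasons applies. The sketch below is only an illustration: the function name `guess_failure_reason` is made up, and the importer's own error handling may look quite different.

```python
from http import HTTPStatus


def guess_failure_reason(status_code: int) -> str:
    """Map an HTTP status code to the most likely reason from the list above.

    Hypothetical helper for illustration; not part of the importer.
    """
    if status_code in (HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN):
        # 401 / 403: the page wants a login (a 403 can also be a block).
        return "login wall"
    if status_code == HTTPStatus.TOO_MANY_REQUESTS:
        # 429: the service is rate-limiting you for now.
        return "temporarily blocked"
    if status_code in (HTTPStatus.NOT_FOUND, HTTPStatus.GONE):
        # 404 / 410: the page no longer exists at this URL.
        return "page not reachable"
    return "unknown"
```

Not every service follows these conventions (some answer a blocked request with a normal page showing a CAPTCHA), so treat the result as a guess.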

How to (partially) solve this: 

Retrying the download helps most with reason 2, because those blocks are only temporary:

  1. Go to "advanced settings", which you find next to the "start import" button.
  2. Check "include previously failed urls".
  3. Re-run the import.
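
To see why a retry helps with temporary blocks, here is a rough sketch of a retry pass that waits a little longer after each failed attempt. It is not the importer's actual code: `retry_failed`, the `fetch` callback, and the delay values are all assumptions made for the example.

```python
import time
from typing import Callable, Iterable, List


def retry_failed(urls: Iterable[str],
                 fetch: Callable[[str], bool],
                 attempts: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Retry each URL up to `attempts` times; return the URLs that still fail.

    `fetch` is a hypothetical callback that returns True on success.
    """
    still_failing = []
    for url in urls:
        for attempt in range(attempts):
            if fetch(url):
                break
            if attempt < attempts - 1:
                # Wait 1x, 2x, 4x ... the base delay before the next try.
                sleep(base_delay * 2 ** attempt)
        else:
            # The loop never hit `break`, so every attempt failed.
            still_failing.append(url)
    return still_failing
```

URLs that fail for reason 1 or 3 will keep failing no matter how often they are retried, which is why the fix above is only a partial one.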

1 reply

Another common reason for failure is JS-only pages, which contain little or no static content in the HTML itself. The importer can't run JS; it simply fetches the HTML page that a given URL points to and processes that. If the importer encounters a page without any content, it marks the import as failed rather than creating an unsearchable page.

The annoying case is pages that contain only a little content: they pass the content check, but you end up with not-very-useful pages in your index (as in, they won't appear in many search results).
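
To make this concrete, here is a minimal sketch of what such a content check could look like. It is only an assumption about the general idea, not the importer's real code: `static_word_count`, `passes_content_check`, and the `min_words` threshold are names and values invented for the example. It counts the words in the raw HTML while ignoring anything inside script and style tags.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text that sits outside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def static_word_count(html: str) -> int:
    """Count the words available without running any JavaScript."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())


def passes_content_check(html: str, min_words: int = 1) -> bool:
    """Hypothetical check: fail pages with fewer than `min_words` words."""
    return static_word_count(html) >= min_words


js_only = '<html><body><div id="root"></div><script>render()</script></body></html>'
thin = ('<html><head><title>App</title></head>'
        '<body><p>Loading...</p><script>render()</script></body></html>')

print(passes_content_check(js_only))             # no static text at all
print(passes_content_check(thin))                # two words are enough to pass
print(passes_content_check(thin, min_words=50))  # a stricter threshold rejects it
```

The `thin` page shows the annoying case: it passes a check that only looks for non-empty content, even though its two words are not worth indexing. A higher threshold would catch it, at the risk of rejecting short pages that are genuinely useful.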