Solved

Import found only 1500 of my approx. 6000 history entries (on Chrome)

Of those, approx. 400 failed. But that does not seem to be the reason why I can hardly find any content.

Any other mechanism I could try to get the content of the visited webpages into the knowledge base?


Yeah, okay. The failing imports seem to consist mainly of Google searches. Ideally, all the words on the results pages of my Google searches would be indexed, but if that fails regularly, I could live with it (though I do not really like it).

Another large group of failing entries are university library searches, but this seems to be normal, as I cannot reproduce the results of those queries from my history either. Almost all of them come from the Frankfurt University library.

But still, Memex does not even try to import a large number of the web pages I visited. I guess many of them must be Wikipedia pages, as searching for some test words from their content failed to find quite a few pages that are in my history.

Of course, if I visit the mentioned Wikipedia page (or another website) now, I can find words from its content with the Memex search. But for that to be useful, I would have to browse to all of the missing web pages to get their content indexed.

I already considered how to get that done automatically, but of course, a complete import of the history would be best.

The import process deduplicates a lot of URLs that have query strings (URLs with a '?' in them).

This applies especially to Google searches. Also, Google Maps URLs are not imported by default. Keep in mind that every zoom in Maps produces another URL,
so it may well be that a lot of your URLs come from that.
What happens if you remove Google Maps from the blacklist and then let it recalculate?

Also, when indexing Google pages, it can happen that Google detects too many requests and then blocks yours, or that the page is an image search.
What kind of library searches?
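
To illustrate how much query-string deduplication can shrink a history, here is a minimal sketch. It is not Memex's actual import code: the rule of dropping everything after the '?' and the helper names `strip_query` and `dedupe_by_path` are assumptions made for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def dedupe_by_path(urls):
    """Keep the first URL seen for each query-less form, in original order."""
    seen = set()
    kept = []
    for url in urls:
        key = strip_query(url)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept


# Hypothetical history: two searches, two map zoom levels, one article.
history = [
    "https://www.google.com/search?q=felix+adler",
    "https://www.google.com/search?q=memex+import",
    "https://www.google.com/maps?ll=50.11,8.68&z=12",
    "https://www.google.com/maps?ll=50.11,8.68&z=13",
    "https://en.wikipedia.org/wiki/Felix_Adler_(professor)",
]
kept = dedupe_by_path(history)
print(len(history), "->", len(kept))  # 5 -> 3
```

Under this assumed rule, all searches collapse into one entry and all map views into another, so a history that is heavy on searches and maps can shrink a lot before anything is downloaded.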

The 'Felix Adler' issue is weird and concerning.
Are you talking about this one?
https://en.wikipedia.org/wiki/Felix_Adler_(professor)
I tried it with my installation and it worked for me.
Are you sure it's not among the 26 not-yet-downloaded ones?

I use Windows 10 (64-bit) and Chrome, Version 65.0.3325.162 (Official Build, 64-bit).

For example, I searched for "adler", knowing that I had visited the Wikipedia page about "Felix Adler, Professor" in January (which a history search confirms). Nevertheless, the page is not found.

It seems that the extension is not capable of indexing Google search pages, so most failing pages are Google pages, as well as library searches.

I already triggered a recount by adding and removing an item in the blacklist. The issue persists: "Pages already saved - 1604. Not yet downloaded - 26". The History Statistics extension says approx. 6,000 pages were visited.
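
One possible source of the gap between these numbers is that a statistics tool may count visits while an importer counts distinct URLs. The sketch below shows how the two figures can be compared. It assumes Chrome's history database is a SQLite file with a `urls` table that has a `visit_count` column; the function name `history_counts` and the demo rows are made up for illustration.

```python
import sqlite3


def history_counts(conn: sqlite3.Connection):
    """Return (distinct_urls, total_visits) from a Chrome-style `urls` table."""
    row = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(visit_count), 0) FROM urls"
    ).fetchone()
    return row[0], row[1]


# Demo on an in-memory table that mimics only the two columns used above.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE urls (url TEXT, visit_count INTEGER)")
demo.executemany(
    "INSERT INTO urls VALUES (?, ?)",
    [
        ("https://en.wikipedia.org/wiki/Felix_Adler_(professor)", 3),
        ("https://www.google.com/search?q=adler", 1),
    ],
)
print(history_counts(demo))  # (2, 4)
```

To try this on real data, close Chrome, copy the `History` file from the profile folder, and pass `sqlite3.connect(path_to_copy)` instead of the demo connection. The location of the profile folder differs between systems.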

What's the value in indexing Google search pages? Indexing a search result seems wasteful and redundant.

Ugh, that is annoying. Sorry for the inconvenience.

Can I ask you a couple of questions to zoom in on that bug?

- Which operating system and which browser (incl. its version) are you using?
- What are the terms you are searching for and not finding, and which pages would you expect to find?
- Which URLs fail for you, and what error messages do you get? (You can easily highlight and copy-paste the table in the details section of the downloads.)

Quick fix for the wrong counter:

1) Try to add a new item to the blacklist (and remove it again).
2) Go back to the import page, it should have triggered a recount. 
3) I'd appreciate feedback on what happened then, as it will help to further zoom in on the bug.

Cheers and have a good day!
Oli