Best Reply

Hey Mikula, 

I think there is nothing to worry about here, as we only store text data. 
That means the importer fetches this URL, strips everything out of the HTML except the text, and stores that text. 
If the URL is just a zip file, it likely produced no data at all. 

Hope that helps.
Oli
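
For anyone curious what "strip everything except the text" can look like, here is a minimal sketch in Python. It is not the importer's actual code; the class name, the function name and the list of skipped tags are assumptions made for illustration.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes and ignores the contents of non-text elements."""

    # Assumed list of elements whose contents are never shown as page text.
    # Note: <title> is kept here; whether the real importer keeps it is unknown.
    SKIPPED_TAGS = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_visible_text(html: str) -> str:
    """Return the text content of an HTML string, one chunk per line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Tags, attributes, scripts and styles are dropped, so only the words on the page end up in the stored result.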

Not sure I understand. Do you want to have the metadata of those 3? 

no, i meant that those 3 types of metadata should be collected from the pages :)

the ones that are in the <head>

Ah now I see. 

What purpose did you have in mind for storing them? How would that improve searchability? 
Which query do you have in mind that you can't run right now?

One example is the schema author. The CMS writes it into the metadata so the author name is tagged properly, as shown below. Maybe I could then search for the author by tag?

tag:author(peter hamilton)

itemprop="author">Peter F. Hamilton
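
To make the request concrete, here is a rough sketch in Python of how `itemprop` values such as the author could be collected from a page. It is only an illustration: the importer does not do this today, the names `ItempropCollector` and `collect_itemprops` are made up, and the `<span>` in the usage example is an assumed wrapper since the snippet above does not show the tag.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collects values of selected itemprop attributes.

    Simplification: text is captured up to the first closing tag,
    so nested markup inside the tagged element is not handled.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = {}        # itemprop name -> list of values
        self._current = None   # itemprop whose text is being captured
        self._buffer = []

    def _store(self, prop, value):
        value = " ".join(value.split())  # normalise whitespace
        if value:
            self.found.setdefault(prop, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop not in self.wanted:
            return
        if attrs.get("content") is not None:
            # e.g. <meta itemprop="author" content="..."> in the <head>
            self._store(prop, attrs["content"])
        elif tag != "meta":
            # e.g. <span itemprop="author">...</span> in the body
            self._current = prop
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._current is not None:
            self._store(self._current, "".join(self._buffer))
            self._current = None


def collect_itemprops(html: str, wanted) -> dict:
    parser = ItempropCollector(wanted)
    parser.feed(html)
    parser.close()
    return parser.found


# Usage with an assumed <span> wrapper around the author name:
authors = collect_itemprops(
    '<span itemprop="author">Peter F. Hamilton</span>', ["author"]
)
```

A result like this could then be stored next to the page text and matched by a tag-style query, if such a feature were added.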

But does it download all those files first before discarding them?

On some pages that link many large files, like OP's software repository, this can put undue strain on users' data bandwidth.

No, it doesn't download them.
The files are only linked from the page you mentioned; the importer does not visit those URLs themselves.
We generally skip .zip and .png URLs in the process. 
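
As an illustration of that kind of check, here is a small sketch in Python that decides from the URL alone whether to skip it, without downloading anything. The function name, the example URLs and the exact extension list are assumptions; only .zip and .png are mentioned above.

```python
from posixpath import splitext
from urllib.parse import urlsplit

# Only the two extensions named in this thread; the real list is not known.
SKIPPED_EXTENSIONS = {".zip", ".png"}


def should_skip(url: str) -> bool:
    """Return True if the URL path ends in a skipped file extension."""
    path = urlsplit(url).path      # drops the query string and fragment
    _, extension = splitext(path)  # ".zip" for ".../build-signed.zip"
    return extension.lower() in SKIPPED_EXTENSIONS


# Hypothetical URLs for illustration:
for candidate in (
    "https://example.org/builds/zl1",
    "https://example.org/files/lineage-14.1-20180210-nightly-zl1-signed.zip",
):
    action = "skip" if should_skip(candidate) else "fetch"
    print(action, candidate)
```

Because the decision uses only the URL string, no bandwidth is spent on the linked file.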

This is the pure text content it stored for that page. However, some terms, like "zl1", might have been removed during index cleaning and are therefore not searchable.

"LineageOS Downloads

Devices
Asus
BQ
Fairphone
Google
HTC
Huawei
LeEco
Le 2
s2
Le Max 2
x2
Le Pro 3
zl1
Lenovo
LG
Motorola
Nextbit
Nubia
Nvidia
OnePlus
OPPO
Samsung
Sony
Wileyfox
Xiaomi
YU
ZTE
Zuk
Extras
Builds for zl1
Recent changes • Device info • Installation instructions
Type Version File Size Date
nightly 14.1 lineage-14.1-20180210-nightly-zl1-signed.zip
sha256 472.99 MB 2018-02-10
nightly 14.1 lineage-14.1-20180203-nightly-zl1-signed.zip
sha256 473.09 MB 2018-02-03
nightly 14.1 lineage-14.1-20180127-nightly-zl1-signed.zip
sha256 472.82 MB 2018-01-27
nightly 14.1 lineage-14.1-20180120-nightly-zl1-signed.zip
sha256 472.77 MB 2018-01-20
You can verify a file has not been tampered with by checking its signature. More information on how to do this can be found here.

© 2017-2018 LineageOS. Licensed under Apache 2.0. Source"

Oooor let us skip certain downloads with a "SKIP" button while they happen, as seen in the screenshot.

does it also crawl the meta data?

No, as of now we only store the visible text. 

What metadata would you like to search with?