Cache-Control can set multiple directives (see the example response after the list):
- `public` and `private`
- `max-age` defines the amount of time during which the client can consider the response "fresh"
- `must-revalidate` indicates the HTTP cache should not reuse stale responses when it is disconnected from the origin server. It needs an associated `max-age` directive, after which the browser will revalidate.
- `no-store` disables the cache for the request
- `no-cache` means ‘do not serve a copy from cache until you’ve revalidated it with the server and the server said you can use the cached copy’. `no-cache` will always hit the network, as it has to revalidate with the server before it can release the browser’s cached copy; at minimum it costs an HTTP header round trip.
- `immutable` avoids revalidation
- `stale-while-revalidate` provides a grace period (defined by us, in seconds) during which the browser is permitted to use an out-of-date (stale) asset while checking for a newer version
- `stale-if-error` provides a grace period if the server returns a 5xx error
- I overlooked `s-maxage`, `proxy-revalidate`, and `no-transform` (useless for HTTPS), which are meant for proxies
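For instance, several directives combine in a single response header. A minimal sketch with illustrative values (fresh for ten minutes, a one-minute grace period while revalidating, a day of grace on server errors):

```http
HTTP/1.1 200 OK
Cache-Control: public, max-age=600, stale-while-revalidate=60, stale-if-error=86400
```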
Cache Busting strategies:
- no cache busting (dangerous) - style.css
- query string (does not work with some proxies, e.g. Cloudflare) - style.css?v=1.2.14
- fingerprint - style.ae3f66.css
Fingerprinting is the best strategy and allows the use of the `immutable` directive, as shown below.
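Since the fingerprint is derived from the file's content, any change produces a new URL, so the old one can be cached forever. A sketch of the header a fingerprinted asset like style.ae3f66.css could be served with (values illustrative):

```http
Cache-Control: public, max-age=31536000, immutable
```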
Note there is a new `Clear-Site-Data: "cache"` header in case of need. Browser support is limited.
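Its directives are quoted strings and can be combined; a sketch that wipes more than just the cache:

```http
Clear-Site-Data: "cache", "cookies", "storage"
```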
The post provides examples: Online Banking Page, Live Train Timetable Page, FAQs Page, Static JS (or CSS) App Bundle.
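A plausible sketch of the kinds of policies those pages might use (my own guesses, not necessarily the post's exact values; the static bundle case is shown above):

```http
# Online banking page: sensitive, never store
Cache-Control: no-store

# Live train timetable: short freshness, brief staleness allowed while refreshing
Cache-Control: public, max-age=30, stale-while-revalidate=30

# FAQs page: fresh for a day, fall back to a stale copy on server errors
Cache-Control: public, max-age=86400, stale-if-error=604800
```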
To enforce accessibility.
Also, can poor performance be framed as inaccessible?
These four ways are footguns, but they can be easily spotted in the codebase.
The current code name for this initiative is Bromo.
Some rather good news.
More than 70 PB (70,000 TB) of data, 300 PB raw.
Mostly financed by GAFAM and the tech industry. The same industry that harvests our personal data.
Apple devices saturate the network for the AirDrop technology. The devices hop channels every two seconds.
Octothorpes are hashtags and backlinks that can be used on regular websites, connecting pages across the open internet regardless of where they're hosted.
I lost all the notes, but I rewrote them from this blog post. The author's focus is on English search engines.
The Common Crawl can be used by search engines that do not own an index, or to enrich one. The dominant Google, Bing, and Yandex search engines are also denoted GBY.
General indexing search engines
- Google: the biggest index. Powers other search engines: a former version of Startpage, GMX Search (run by a popular German email provider), Mullvad Leta, SAPO (Portuguese UI), DSearch, 13TABS, Zarebin (Persian), Ecosia, and a host of other engines using Programmable Search Engine’s client-side scripts.
- Bing powers many other engines: Yahoo, DuckDuckGo, AOL, Qwant, Ekoru, Privado, Findx, Disconnect Search, Lilo, ...
- Yandex: a Russian search engine.
- Mojeek: privacy-oriented, with billions of pages.
Smaller indexes or less relevant results
Stract: an OSS project.
Right Dao: very fast with good results. Focuses on large established sites rather than smaller, independent ones.
Alexandria: a non-profit, ad-free engine. Built from the Common Crawl.
Yep also shows results linked by pages containing the query. In other words, not all results contain relevant keywords. This makes it excellent for less precise searches and discovery of “related sites”.
SeSe Engine: a Chinese engine. Good results for such a low-budget project.
greppr: “Search the Internet with no filters, no tracking, no ads.”
Smaller indexes, hit-and-miss
Peekr: a SearXNG metasearch engine that now returns results from its own growing Elasticsearch index. Self-hostable.
Seekport: German UI. Its own index is quite small.
ExactSeek: disproportionately dominated by big sites. Webmaster tools seem to heavily push paid SEO options.
Burf.co: very small index.
ChatNoir: an experimental OSS engine by researchers that uses the Common Crawl index.
Secret Search Engine Labs: avoids spam.
Gabanza: small index from a hosting company.
Jambo: biased towards older content. Not updated since 2006.
search.dxhub.de: an open-source version of Gigablast.
Fynd
Fledgling engines
Yessle
Bloopish
Artado Search: primarily Turkish.
Active Search Results: biased towards commercial sites.
Crawlson: index cap of 10 URLs per domain. Has some downtime.
Anoox: users vote on listings to alter rankings.
Yioop!: a FLOSS search engine with an impressive feature set. Yioop’s results are few and irrelevant due to its small index. It allows submitting sites for crawling. Like Meorca, Yioop has social features such as blogs, wikis, and a chat bot API.
Spyda: a small Go engine made by James Mills.
Slzii.com: a new web portal with a search engine. It has a tiny index dominated by SEO spam.
Weblog DataBase: a metadata search engine for technical blogs. Small index and seemingly poor ranking, but it has different goals from most search engines: it encourages filtering search results iteratively until finding the desired subset of results.
Semi-independent indexes
Brave Search reuses Google and Bing search results. The company has its own history.
Plumb returns nearly no results and falls back to Google.
Qwant: own index but still relies on Bing for most results.
Kagi Search requires an account and limits use without payment. It has its own Teclis index. The company seems to use Brave's commercial API.
PriEco: a metasearch engine. Other sources can be turned off, but its own index is quite tiny.
Non-generalist search
They’re trying to do something different. You aren’t supposed to use these engines the same way you use GBY.
Marginalia Search has its own crawler and is strongly biased towards non-commercial, personal, and/or minimal sites.
Ichido rolled out its own independent index, with a lot of care put into its ranking algorithm. Biased towards the non-commercial web.
Teclis uses its own crawler that measures content blocked by uBlock Origin, and extracts content with the open-source article scrapers Trafilatura and Readability.js.
Clew: a new FOSS engine with a small index. It focuses on independent content and seems to favor quality over quantity.
Lixia Labs Search indexes technical websites and blogs, with a minimal, JavaScript-free front-end.
Site finders
Kozmonavt: 8 million sites. It lacks contact information, a privacy policy, or any other information about the organisation.
search.tl limits searches to a specific TLD. It seems to be connected to Amidalla.
Thunderstone a combined website catalog and search engine that focuses on categorization. "It is very good at finding companies and organizations by purpose, product, subject matter, or location."
sengine.info only shows domains. Made by netEstate GmbH.
Gnomit allows single-keyword queries and returns sites that seem to cover related topics. The results are typically old (from 2009).
Other
High Browse introduces non-SEO-optimized serendipity into search results. One of the author's favorite surf engines.
Keybot crawls the web for multilingual sites. Part of the TTN Translation Network.
Semantic Scholar, by the Allen Institute for AI, focuses on academic PDFs.
Bonzamate focuses on Australian websites.
Searchcode focuses on... code searching.
StarFinder focuses on Open Graph Protocol metadata.
Other languages
Big Indexes
Baidu: Chinese. It's a major engine alongside GBY.
Qihoo: Chinese. How independent is it?
Toutiao: Chinese. The index seems limited outside of its own content distribution.
Sogou: Chinese.
Yisou: Chinese by Yahoo. Defunct.
Naver: Korean.
Daum: Korean.
Seznam: Czech, seems relatively privacy-friendly. It uses IndexNow.
Cốc Cốc: Vietnamese
go.mail.ru: Russian
LetSearch.ru: Russian.
Smaller indexes
ALibw.com: Chinese.
Vuhuv: Turkish.
search.ch: a regional search engine for Switzerland.
fastbot: German.
SOLOFIELD: Japanese
kaz.kz: Kazakh and Russian
Almost qualified
These engines come close enough to passing my inclusion criteria that I felt I had to mention them. They all display original organic results that you can’t find on other engines, and maintain their own indexes. Unfortunately, they don’t quite pass because they don’t crawl the Web; most limit themselves to a specific set of sites.
Wiby.me focuses on smaller independent sites that capture the spirit of the "early" web. It's more focused on discovering new interesting pages. Great for surfing. It is also available via wiby.org.
Mwmbl is an open-source engine whose crawling is community-driven. It crawls only pages from hand-picked sites. It allows users to contribute crawls of webpages in its index backlog via the Mwmbl Donate Firefox extension.
Search My Site indexes user-submitted personal and independent sites. It supports IndieAuth.
Kukei.eu is a curated search engine for web developers. It crawls a hand-picked set of sites.
Unobtanium Search is a fledgling search engine by Slatian. It crawls hand-curated sites: personal, technical, indie wiki, and German hacker community sites.
Infinity Search is young and split between a paid offering built on the main index and Infinity Decentralized, a set of community-hosted crawlers.
Graveyard
Petal Search, Neeva, Gigablast, wbsrch, Gowiki, Meorca, Ninfex, Marlo, Entfer, Siik, Blog Surf, Infotiger
Rationale behind the post
Google, Bing, and Yandex have conflicts of interest. They won't deliver the "best" of the web to users. It's also important to get information diversity, and most search engines' ranking algorithms incorporate a method similar to PageRank, which biases them towards sites with many backlinks (see the sketch below).
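As a reminder (my own sketch, not from the post), the classic PageRank score of a page grows with the scores of the pages linking to it:

$$PR(p) = \frac{1-d}{N} + d \sum_{q \in B(p)} \frac{PR(q)}{|L(q)|}$$

where $B(p)$ is the set of pages linking to $p$, $|L(q)|$ the number of outbound links of $q$, $N$ the total number of pages, and $d$ a damping factor (commonly 0.85). The more backlinks from high-scoring pages, the higher the score, hence the bias.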
The author also describes their methodology.
Findings
- Using one engine for everything ignores the fact that different engines have different strengths
- When talking to search engine founders, I found that the biggest obstacle to growing an index is getting blocked by sites.
- Too many people optimize sites specifically for Google without considering the long-term consequences of their actions. Almost no non-GBY engines on this list are JavaScript-aware.
- When building webpages, authors need to consider the barriers to entry for a new search engine.
- Try a “bad” engine from lower in the list. It might show you utter crap. But every garbage heap has an undiscovered treasure.
From Teclis: Using Trafilatura and Readability.js encourages the use of semantic HTML and Semantic Web standards such as microformats, microdata, and RDFa. It claims to also use some results from Marginalia. The Web interface has been shut down, but its standalone API is still available for Kagi customers.
Reinventing the wheel, but differently every time. The author shares their experience.
Consider standards for it: they are powerful.
Some wheels I see that I think could use some new takes but which I don’t have the time/energy to do myself:
- Web browsers - probably the most significant. The browser market is essentially a monopoly right now, and Firefox is pretty much the only alternative option, somewhat of a monopoly in itself. We need many independent browser projects going on, not just one alternative.
- Higher education - this is probably too big a project for any one person, but I think there’s a lot of ground that needs new work and reevaluating in the world’s current higher education system.
- Task management - there are a lot of task management systems out there, but I think there’s still definitely room for more. I’m personally beginning to settle on a hybrid analog/digital task management system I’m designing myself.
Outside the grasp of social media and the commercial web sits a broad community of people with personal websites and blogs. [...] The community has received many names:
- The Small Web contrasts this community with the “Big Web”, valuing personal ownership over scale.
- The IndieWeb also values personal ownership of websites, providing numerous technical standards and proposals to help facilitate interaction between different people’s blogs.
- Web 1.0 rejects the hype of “Web 2.0” apps, using simple, straightforward technologies to build websites.
- The Blogosphere is an old term that’s been around since 1999, referencing the community of bloggers.
- The Web Revival is the concept shared by many that this community has been growing and making a comeback.
This web relies on hyperlinks.
There is classic web discovery with blogrolls, webrings, and feeds,
and there are search engines, which are wonderful tools to find a specific thing, but they shouldn't be the only discovery tool, because they only show a subset of the available information.
That's why Clew highlights small independent websites "to make discovering what real people think easier". Other search engines are doing this:
- Marginalia
- Unobtanium
- Stract
- Lieu focuses on webrings.
- Mwmbl - curated by the users.
- Search My Site crawls user-submitted sites
- Wiby for websites using older technology, great for use on vintage computers.
- YaCy - a decentralized search engine
- PeARS - A search engine that can be run in the browser, without needing a server.
- Mojeek - an independent search engine
Another idea to bring back a healthier web is to provide blogrolls in the OPML format directly: https://opml.org/blogroll.opml.
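For reference, a minimal blogroll in OPML looks like this (names and URLs are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<opml version="2.0">
  <head>
    <title>My blogroll</title>
  </head>
  <body>
    <!-- One outline element per followed blog -->
    <outline text="Example Blog" type="rss"
             xmlUrl="https://example.com/feed.xml"
             htmlUrl="https://example.com/"/>
  </body>
</opml>
```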
Jamesg.blog created the Artemis Link Graph web extension. It lists the web pages authored by people you follow that link to the page you are viewing.
All of these have one limitation: much of the independent web today is made up of people with similar interests, particularly in technology.
Rather than helping you build a sitewide design, readable.css provides a base default that is both sensible and beautiful.
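A minimal usage sketch, assuming the file is served at /css/readable.css (the path is illustrative):

```html
<!-- Load readable.css first so it acts as the base layer for plain semantic HTML -->
<link rel="stylesheet" href="/css/readable.css">
```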
Scrolls is a weekly newsletter / link roundup / information digest at the intersection of the IndieWeb and the Fediverse, with a splash of Cybersecurity stuff. It is published on the web every Friday, completely free. Check out the latest edition and get scrollin'!
You benchmark your Node/Ruby/Python software on a fancy MacBook M4 and celebrate a 500 ms response time.
I benchmark my Rust software on a $30 potato computer that may as well have 256 MB of RAM and celebrate an 800 ms response time.
But for this tolerance to be valid, a written, clear, and formalized agreement is required (for example, a deed of easement). A simple verbal exchange is not enough.