Remove the AI from Google search results with the udm=14 query parameter.
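A minimal sketch of how such a URL can be built. That udm=14 selects the plain "Web" results tab is widely reported behaviour rather than a documented API, so Google may change it; the helper name `google_web_only_url` is made up for this example.

```python
from urllib.parse import urlencode


def google_web_only_url(query: str) -> str:
    """Build a Google search URL carrying the udm=14 parameter.

    Assumption: udm=14 switches to the plain "Web" results view.
    This is observed behaviour, not an official, stable API.
    """
    params = {"q": query, "udm": "14"}
    return "https://www.google.com/search?" + urlencode(params)


url = google_web_only_url("independent search engines")
print(url)
# https://www.google.com/search?q=independent+search+engines&udm=14
```

Most browsers accept a URL of this shape as a custom search engine, with the query replaced by the browser's placeholder (usually %s).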
Cooking a search engine in one weekend (from an experienced developer). It's basic but does the work for 1000 documents.
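These notes do not record how the linked post implements its engine, so the following is only a generic sketch of the usual starting point for a collection of about a thousand documents: an in-memory inverted index with a simple TF-IDF score. The class `TinyIndex` and the sample documents are invented for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """In-memory inverted index: term -> {doc_id: term count}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id, text):
        self.doc_count += 1
        for term, count in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = count

    def search(self, query):
        """Return doc ids ordered by a basic TF-IDF score, best first."""
        scores = Counter()
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Terms found in fewer documents weigh more.
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        return [doc_id for doc_id, _ in scores.most_common()]


index = TinyIndex()
index.add("a", "small web search engines")
index.add("b", "cooking recipes for the weekend")
index.add("c", "search engines and web crawlers, a search primer")
print(index.search("search engines"))  # ['c', 'a']
```

Document "c" ranks first because it mentions "search" twice. A real engine would add crawling, stemming, and persistence on top of this.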
I lost all my notes, so I rewrote them from this blog post. The author focuses on English-language search engines.
The Common Crawl can be used by search engines that do not own an index, or to enrich an existing one. The dominant Google, Bing and Yandex search engines are abbreviated GBY.
General indexing search engines
- Google: the biggest index. It powers other search engines: a former version of Startpage, GMX Search (run by a popular German email provider), Mullvad Leta, SAPO (Portuguese UI), DSearch, 13TABS, Zarebin (Persian), Ecosia, and a host of other engines using Programmable Search Engine’s client-side scripts.
- Bing: powers many other engines: Yahoo, DuckDuckGo, AOL, Qwant, Ekoru, Privado, Findx, Disconnect search, Lilo, ...
- Yandex: a Russian search engine with
- Mojeek: privacy-oriented, with billions of pages.
Smaller indexes or less relevant results
Stract an OSS project
Right Dao very fast with good results. Focuses on large established sites rather than smaller, independent ones.
Alexandria is a non-profit, ad-free engine. Built from the Common Crawl.
Yep also shows results linked by pages containing the query. In other words, not all results contain relevant keywords. This makes it excellent for less precise searches and discovery of “related sites”
SeSe Engine Chinese engine. Good results for such a low-budget project.
greppr “Search the Internet with no filters, no tracking, no ads.”
Smaller indexes, hit-and-miss
Peekr: a SearXNG metasearch engine that now returns results from its own growing Elasticsearch index. Self-hostable.
Seekport German UI. Its own index is small.
ExactSeek disproportionately dominated by big sites. Webmaster tools seem to heavily push for paid SEO options.
Burf.co very small index.
ChatNoir: an experimental OSS engine by researchers that uses the Common Crawl index.
Secret Search Engine Labs avoids spam.
Gabanza small index from a hosting company.
Jambo bias towards older content. Not updated since 2006.
search.dxhub.de an open source version of Gigablast.
Fynd
Fledgling engines
Yessle
Bloopish
Artado Search Primarily Turkish
Active Search Results biased towards commercial sites
Crawlson index cap of 10 URLs per domain. Has some downtime.
Anoox vote on listings to alter rankings
Yioop! FLOSS search engine with an impressive feature set. Yioop’s results are few and irrelevant due to its small index. It allows submitting sites for crawling. Like Meorca, Yioop has social features such as blogs, wikis, and a chat bot API.
Spyda a small Go engine made by James Mills
Slzii.com a new web portal with a search engine. It has a tiny index dominated by SEO spam.
Weblog DataBase a metadata search engine for technical blogs. Small index and ranking seems poor, but it has different goals from most search engines. It encourages filtering search results iteratively until finding the desired subset of results.
Semi-independent indexes
Brave Search reuses Google and Bing search results. The company has its own history.
Plumb returns almost no results and falls back to Google.
Qwant: own index but still relies on Bing for most results.
Kagi Search requires an account and limits use without payment. It has its own Teclis index. The company seems to use Brave's commercial API.
PriEco a metasearch engine. Other sources can be turned off, but its own index is quite tiny.
Non-generalist search
They’re trying to do something different. You aren’t supposed to use these engines the same way you use GBY.
Marginalia Search has its own crawler and is strongly biased towards non-commercial, personal, and/or minimal sites.
Ichido rolled out its own independent index with a lot of care put into its ranking algorithm. Biased towards the non-commercial web.
Teclis uses its own crawler that measures content blocked by uBlock Origin, and extracts content with the open-source article scrapers Trafilatura and Readability.js.
Clew new FOSS engine with a small index. It focuses on independent content. It seems to have a real focus on quality over quantity.
Lixia Labs Search indexes technical websites and blogs, with a minimal JavaScript-free front-end.
Site finders
Kozmonavt 8 million sites. It lacks contact information, a privacy policy or any other information about the organisation.
search.tl limits searches to a specific TLD. It seems to be connected to Amidalla.
Thunderstone a combined website catalog and search engine that focuses on categorization. "It is very good at finding companies and organizations by purpose, product, subject matter, or location."
sengine.info only shows domains. Made by netEstate GmbH
Gnomit allows single-keyword queries and returns sites that seem to cover related topics. The results are typically old (from 2009).
Other
High Browse introduces non-SEO-optimized serendipity into search results. A favorite surf-engine of the author.
Keybot crawls the web for multilingual sites. Part of the TTN Translation Network.
Semantic Scholar by the Allen Institute for AI, focused on academic PDFs.
Bonzamate focuses on Australian websites.
Searchcode focuses on... code searching.
StarFinder focuses on Open Graph Protocol metadata
Other languages
Big Indexes
Baidu: Chinese. It's a major engine alongside GBY.
Qihoo: Chinese. How independent?
Toutiao: Chinese. The index seems limited outside of its own content distribution.
Sogou: Chinese.
Yisou: Chinese by Yahoo. Defunct.
Naver: Korean.
Daum: Korean.
Seznam: Czech, seems relatively privacy-friendly. It uses IndexNow.
Cốc Cốc: Vietnamese
go.mail.ru: Russian
LetSearch.ru: Russian.
Smaller indexes
ALibw.com: Chinese.
Vuhuv: Turkish.
search.ch: regional search engines for Switzerland.
fastbot: German
SOLOFIELD: Japanese
kaz.kz: Kazakh and Russian
Almost qualified
These engines come close enough to passing my inclusion criteria that I felt I had to mention them. They all display original organic results that you can’t find on other engines, and maintain their own indexes. Unfortunately, they don’t quite pass because they don’t crawl the Web; most limit themselves to a specific set of sites.
Wiby.me focuses on smaller independent sites that capture the spirit of the "early" web. It's more focused on discovering new interesting pages. Great for surfing. It is also available via wiby.org.
Mwmbl is an open-source engine whose crawling is community driven. It crawls only pages from hand-picked sites. It lets users help crawl the webpages in its index backlog via the Mwmbl Donate Firefox extension.
Search My Site indexes user-submitted personal and independent sites. It supports IndieAuth.
Kukei.eu is a curated search engine for web developers. It crawls hand-picked sites.
Unobtanium Search is a fledgling search engine by Slatian. It crawls hand-curated sites: personal, technical, indie wiki, and German hacker community sites.
Infinity Search is young and is split between a paid offering with the main index and Infinity Decentralized, which relies on community-hosted crawlers.
Graveyard
Petal Search, Neeva, Gigablast, wbsrch, Gowiki, Meorca, Ninfex, Marlo, Entfer, Siik, Blog Surf, Infotiger
Rationale behind the post
Google, Bing and Yandex have conflicts of interest, so they won't deliver the "best" of the web to their users. Information diversity also matters: most search engines' ranking algorithms incorporate a method similar to PageRank, which biases them towards sites with many backlinks.
The author also describes their methodology.
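To make the backlink bias concrete, here is a textbook PageRank power iteration on a tiny invented link graph. It is a sketch of the general technique, not the ranking code of any engine listed here; the page names and the damping factor of 0.85 are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank everywhere.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: "big" has three backlinks, "indie" has none.
links = {
    "big": ["blog"],
    "blog": ["big"],
    "fan1": ["big"],
    "fan2": ["big"],
    "indie": ["blog"],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph "big" ends up ranked highest and "indie" shares the lowest score, whatever the quality of their content: only the link structure counts.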
Findings
- Using one engine for everything ignores the fact that different engines have different strengths
- When talking to search engine founders, I found that the biggest obstacle to growing an index is getting blocked by sites.
- Too many people optimize sites specifically for Google without considering the long-term consequences of their actions. Almost no non-GBY engines on this list are JavaScript-aware.
- When building webpages, authors need to consider the barriers to entry for a new search engine.
- Try a “bad” engine from lower in the list. It might show you utter crap. But every garbage heap has an undiscovered treasure.
From Teclis: Using Trafilatura and Readability.js encourages the use of semantic HTML and Semantic Web standards such as microformats, microdata, and RDFa. It claims to also use some results from Marginalia. The Web interface has been shut down, but its standalone API is still available for Kagi customers.
Outside the grasp of social media and the commercial web sits a broad community of people with personal websites and blogs. [...] The community has received many names:
- The Small Web contrasts this community with the “Big Web”, valuing personal ownership over scale.
- The IndieWeb also values personal ownership of websites, providing numerous technical standards and proposals to help facilitate interaction between different people’s blogs.
- Web 1.0 rejects the hype of “Web 2.0” apps, using simple, straightforward technologies to build websites.
- The Blogosphere is an old term that’s been around since 1999, referencing the community of bloggers.
- The Web Revival is the concept shared by many that this community has been growing and making a comeback.
This web relies on hyperlinks.
There is classic web discovery with blogrolls, webrings and feeds, and there are search engines, which are wonderful tools for finding a specific thing. But search engines shouldn't be the only discovery tool, because they only show a subset of the available information.
That's why Clew highlights the small independent websites "to make discovering what real people think easier". Other search engines are doing this:
- Marginalia
- Unobtanium
- Stract
- Lieu - focuses on webrings.
- Mwmbl - curated by the users.
- Search My Site crawls user-submitted sites
- Wiby for websites using older technology, great for use on vintage computers.
- YaCy - a decentralized search engine
- PeARS - A search engine that can be run in the browser, without needing a server.
- Mojeek - an independent search engine
Another idea to bring back a healthier web is to provide blogrolls in the OPML format directly: https://opml.org/blogroll.opml.
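A minimal sketch of reading such a blogroll with Python's standard library. It relies on the OPML 2.0 convention that each subscription is an `outline` element carrying `text` and `xmlUrl` attributes; the sample document and its example URLs are invented and are not the content of the file linked above.

```python
import xml.etree.ElementTree as ET

# Invented sample blogroll; real files follow the same outline structure.
SAMPLE_OPML = """<opml version="2.0">
  <head><title>Example blogroll</title></head>
  <body>
    <outline text="Example Blog" type="rss"
             xmlUrl="https://blog.example/feed.xml"
             htmlUrl="https://blog.example/"/>
    <outline text="Folder">
      <outline text="Another Site" type="rss"
               xmlUrl="https://another.example/rss"/>
    </outline>
  </body>
</opml>"""


def read_blogroll(opml_text):
    """Return (title, feed URL) pairs for every outline that has a feed."""
    root = ET.fromstring(opml_text)
    feeds = []
    # iter() walks nested outlines too, so folders are flattened.
    for outline in root.iter("outline"):
        feed_url = outline.get("xmlUrl")
        if feed_url:
            feeds.append((outline.get("text", ""), feed_url))
    return feeds


blogroll = read_blogroll(SAMPLE_OPML)
for title, feed_url in blogroll:
    print(title, feed_url)
```

Folder outlines without an `xmlUrl` are skipped, so the result is a flat list of feeds that a reader or a crawler could follow.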
Jamesg.blog created the Artemis Link Graph web extension. It lists the web pages authored by people you follow that link to the page you are viewing.
All of these have one limitation: much of the independent web today is made up of people with similar interests, in technology in particular.
A minimalistic UI and a minimal page weight
An HTML and CSS only version of the search engine.
Qwant and Ecosia will start to use the Search Trusted API Access Network (STAAN).
Let's see
Someone shares their lists on GitHub.
Ahmia, Clew, DuckDuckGo, Monocles Search, FrogFind
Decentralized search engine & automatized press reviews
- Explore the press with no middlemen between the newspapers and your web browser.
- Discover millions of results within seconds and explore the last ones in Firefox via this addon.
- Schedule searches, select your press review and export it in a few clicks.
An index of the web!
Curlie strives to be the largest human-edited directory of the Web. It is run by volunteer editors. Join today to add to our collection or create your own!