A look at search engines with their own indexes - Seirdy


I lost all my notes, so I rewrote them from this blog post. The author focuses on English-language search engines.

Common Crawl can be used by a search engine that does not have its own index, or to enrich an existing one. The dominant search engines Google, Bing and Yandex are abbreviated GBY.

General indexing search engines

Google: the biggest index. It powers other search engines: a former version of Startpage; GMX Search (run by a popular German email provider); Mullvad Leta; SAPO (Portuguese UI); DSearch; 13TABS; Zarebin (Persian); Ecosia; and a host of other engines using Programmable Search Engine’s client-side scripts.
Bing: powers many engines: Yahoo, DuckDuckGo, AOL, Qwant, Ekoru, Privado, Findx, Disconnect search, Lilo, ...
Yandex: a Russian search engine.
Mojeek: privacy-oriented, with billions of pages.

Smaller indexes or less relevant results

Stract: an OSS project.
Right Dao: very fast, with good results. Focuses on large established sites rather than smaller, independent ones.
Alexandria is a non-profit, ad-free engine. Built from the Common Crawl.
Yep also shows results linked by pages containing the query. In other words, not all results contain relevant keywords. This makes it excellent for less precise searches and discovery of “related sites”.
SeSe Engine: Chinese engine. Good results for such a low-budget project.
greppr: “Search the Internet with no filters, no tracking, no ads.”

Smaller indexes, hit-and-miss

Peekr: a SearXNG metasearch engine that now returns results from its own growing Elasticsearch index. Self-hostable.
Seekport: German UI, with its own small index.
ExactSeek: disproportionately dominated by big sites. Webmaster tools seem to heavily push for paid SEO options.
Burf.co: very small index.
ChatNoir: an experimental OSS engine by researchers that uses the Common Crawl index.
Secret Search Engine Labs: tries to avoid spam.
Gabanza: small index from a hosting company.
Jambo: biased towards older content. Not updated since 2006.
search.dxhub.de: an open-source version of Gigablast.
Fynd

Fledgling engines

Yessle
Bloopish
Artado Search: primarily Turkish.
Active Search Results: biased towards commercial sites.
Crawlson: index cap of 10 URLs per domain. Has some downtime.
Anoox: vote on listings to alter rankings.
Yioop!: FLOSS search engine with an impressive feature set. Yioop’s results are few and irrelevant due to its small index. It allows submitting sites for crawling. Like Meorca, Yioop has social features such as blogs, wikis, and a chat bot API.
Spyda: a small Go engine made by James Mills.
Slzii.com: a new web portal with a search engine. It has a tiny index dominated by SEO spam.
Weblog DataBase: a metadata search engine for technical blogs. Small index and ranking seems poor, but it has different goals from most search engines. It encourages filtering search results iteratively until finding the desired subset of results.

Semi-independent indexes

Brave Search: reuses Google and Bing search results. The company has its own history.
Plumb: returns almost no results and falls back to Google.
Qwant: own index but still relies on Bing for most results.
Kagi Search: requires an account and limits use without payment. It has its own Teclis index. The company seems to use Brave's commercial API.
PriEco: a metasearch engine. Other sources can be turned off, but its own index is quite tiny.

Non-generalist search

They’re trying to do something different. You aren’t supposed to use these engines the same way you use GBY.

Marginalia Search has its own crawler and is strongly biased towards non-commercial, personal, and/or minimal sites.
Ichido rolled out its own independent index with a lot of care given to its ranking algorithm. Biased towards the non-commercial web.
Teclis uses its own crawler that measures content blocked by uBlock Origin, and extracts content with the open-source article scrapers Trafilatura and Readability.js.
Clew: new FOSS engine with a small index. It focuses on independent content. It seems to have a real focus on quality over quantity.
Lixia Labs Search indexes technical websites and blogs, with a minimal, JavaScript-free front-end.

Site finders

Kozmonavt: 8 million sites. It lacks contact information, a privacy policy or any other information about the organisation.
search.tl: limits searches to a specific TLD. It seems to be connected to Amidalla.
Thunderstone: a combined website catalog and search engine that focuses on categorization. "It is very good at finding companies and organizations by purpose, product, subject matter, or location."
sengine.info: only shows domains. Made by netEstate GmbH.
Gnomit: allows single-keyword queries and returns sites that seem to cover related topics. The results are typically old (from 2009).

Other

High Browse: introduces non-SEO-optimized serendipity into search results. One of the author's favorite surf-engines.
Keybot: crawls the web for multilingual sites. Part of the TTN Translation Network.

Semantic Scholar: by the Allen Institute for AI, focused on academic PDFs.
Bonzamate focuses on Australian websites.
Searchcode focuses on... code searching.
StarFinder focuses on Open Graph Protocol metadata

Other languages

Big Indexes

Baidu: Chinese. It's a major engine alongside GBY.
Qihoo: Chinese. How independent is it?
Toutiao: Chinese. The index seems limited outside of its own content distribution.
Sogou: Chinese.
Yisou: Chinese by Yahoo. Defunct.
Naver: Korean.
Daum: Korean.
Seznam: Czech, seems relatively privacy-friendly. It uses IndexNow.
Cốc Cốc: Vietnamese
go.mail.ru: Russian
LetSearch.ru: Russian.
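
The Seznam entry above mentions IndexNow. As a rough illustration, an IndexNow notification for a single page is an HTTP GET request carrying `url` and `key` query parameters. The sketch below only builds that request URL and sends nothing. The endpoint choice, the page URL, and the key are placeholder assumptions, and the protocol also expects the key to be published as a file on the submitting site, which is not shown here.

```python
from urllib.parse import urlencode


def indexnow_ping_url(endpoint, page_url, key):
    """Build the GET URL that would notify an IndexNow endpoint about one changed page."""
    query = urlencode({"url": page_url, "key": key})
    return f"{endpoint}?{query}"


# Placeholder values for illustration only (assumptions, not taken from the notes).
ping = indexnow_ping_url(
    "https://api.indexnow.org/indexnow",
    "https://example.com/new-post",
    "abc123",
)
print(ping)
```

Building the URL separately from sending it keeps the example free of network side effects; an actual submission would pass the result to an HTTP client.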

Smaller indexes

ALibw.com: Chinese.
Vuhuv: Turkish.
search.ch: regional search engines for Switzerland.
fastbot: German
SOLOFIELD: Japanese
kaz.kz: Kazakh and Russian

Almost qualified

These engines come close enough to passing my inclusion criteria that I felt I had to mention them. They all display original organic results that you can’t find on other engines, and maintain their own indexes. Unfortunately, they don’t quite pass because they don’t crawl the Web; most limit themselves to a specific set of sites.

Wiby.me focuses on smaller independent sites that capture the spirit of the "early" web. It's more focused on discovering new interesting pages. Great for surfing. It is also available via wiby.org.
Mwmbl is an open-source engine whose crawling is community driven. It crawls only pages from hand-picked sites. It allows users to contribute by crawling webpages in its index backlog via the Mwmbl Donate Firefox extension.
Search My Site indexes user-submitted personal and independent sites. It supports IndieAuth.
Kukei.eu is a curated search engine for web developers. It crawls hand-picked sites.
Unobtanium Search is a fledgling search engine by Slatian. It crawls hand-curated sites: personal, technical, indie wiki, and German hacker community sites.

Infinity Search: young, and split between a paid offering with the main index and Infinity Decentralized, which relies on community-hosted crawlers.

Graveyard

Petal Search, Neeva, Gigablast, wbsrch, Gowiki, Meorca, Ninfex, Marlo, Entfer, Siik, Blog Surf, Infotiger

Rationale behind the post

Google, Bing and Yandex have conflicts of interest, so they won't deliver the "best" of the web to users. Information diversity also matters: most search engines' ranking algorithms incorporate a method similar to PageRank, which biases them towards sites with many backlinks.
The author also describes their methodology.
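
To make the backlink bias concrete, here is a minimal sketch of a PageRank-style power iteration over a toy link graph. The graph, page names, damping factor, and iteration count are all assumptions chosen for illustration; this is not any engine's actual ranking code.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict mapping page -> list of outgoing links."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: everything links to "big-site", nothing links to "indie-page".
toy_web = {
    "big-site": ["blog-a"],
    "blog-a": ["big-site"],
    "blog-b": ["big-site"],
    "blog-c": ["big-site"],
    "indie-page": ["big-site"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the heavily linked page ends up with the highest score, while the page nobody links to keeps only the baseline share, however good its content might be.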

Findings

Using one engine for everything ignores the fact that different engines have different strengths.
When talking to search engine founders, I found that the biggest obstacle to growing an index is getting blocked by sites.
Too many people optimize sites specifically for Google without considering the long-term consequences of their actions. Almost no non-GBY engines on this list are JavaScript-aware.
When building webpages, authors need to consider the barriers to entry for a new search engine.
Try a “bad” engine from lower in the list. It might show you utter crap. But every garbage heap has an undiscovered treasure.
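
One way a site can end up blocking new engines is a `robots.txt` that allows only well-known crawlers. The sketch below uses Python's standard `urllib.robotparser` to show how such a policy treats a newcomer. The policy text and the `NewEngineBot` user agent are hypothetical, and `robots.txt` is only one of several possible blocking mechanisms.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: welcome Googlebot, turn every other crawler away.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://example.com/posts/"
print("Googlebot:", can_crawl(ROBOTS_TXT, "Googlebot", page))
print("NewEngineBot:", can_crawl(ROBOTS_TXT, "NewEngineBot", page))
```

A policy that lists disallowed paths under `User-agent: *` and lets every compliant crawler in keeps the barrier to entry low for new indexes.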

From Teclis: Using Trafilatura and Readability.js encourages the use of semantic HTML and Semantic Web standards such as microformats, microdata, and RDFa. It claims to also use some results from Marginalia. The Web interface has been shut down, but its standalone API is still available for Kagi customers.
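
To illustrate why semantic HTML helps content extractors, here is a deliberately naive sketch that keeps only the text inside `<article>` elements, using Python's standard `html.parser`. It is not how Trafilatura or Readability.js work internally; the class name, function name, and sample page are made up for the example.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collects text found inside <article> elements and ignores everything else."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # how many <article> elements are currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_article_text(html):
    """Return the text inside <article> elements, joined with single spaces."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up page: navigation and footer sit outside the <article> element.
SAMPLE = """
<html><body>
<nav>Home | About</nav>
<article><h1>Own indexes</h1><p>Independent engines crawl the web.</p></article>
<footer>Copyright</footer>
</body></html>
"""
print(extract_article_text(SAMPLE))
```

When the main content is marked up with a semantic element, even a simple extractor can separate it from navigation and footers; a page built only from generic `<div>` elements gives it nothing to work with.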

searchEngine

October 23, 2025 at 9:09:07 PM GMT+2

https://seirdy.one/posts/2021/03/10/search-engines-with-own-indexes/
