Search: [DataLove] - Les liens de la mer numérique

Wikidata: the Wikipedia of structured data, favored by LLMs

By mid-2024, Wikidata already exceeded 1.5 billion semantic triples (subject, predicate, object).
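To make the (subject, predicate, object) shape concrete, here is a minimal sketch of an in-memory triple store with wildcard pattern matching, the same basic idea a SPARQL triple pattern expresses. The identifiers Q42 (Douglas Adams), P31 (instance of), Q5 (human), P27 (country of citizenship) and Q145 (United Kingdom) are real Wikidata IDs, but the store and the `match` helper are illustrative assumptions, not Wikidata's API.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Illustrative in-memory store (not Wikidata's API): two statements about Q42.
TRIPLES = [
    Triple("Q42", "P31", "Q5"),    # Douglas Adams -> instance of -> human
    Triple("Q42", "P27", "Q145"),  # Douglas Adams -> country of citizenship -> United Kingdom
]


def match(
    triples: list[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> list[Triple]:
    """Return triples matching the pattern; None acts as a wildcard (like ?var in SPARQL)."""
    return [
        t
        for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Which entities are instances of human (Q5)?"
humans = [t.subject for t in match(TRIPLES, predicate="P31", obj="Q5")]
print(humans)  # ['Q42']
```

On the real service, the equivalent question would be a SPARQL query against the Wikidata Query Service, which indexes the same kind of statements at a much larger scale.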

An item is accepted on Wikidata if it meets at least one of these three criteria:

It has a valid link (sitelink) to a page on a Wikimedia project.
It refers to a clearly identifiable entity, material or conceptual, that can be described using serious, publicly accessible sources.
It fulfills a structural need.

Creating new items is a matter of debate. Completing existing ones, on the other hand, is more accessible.

Wikidata is arguably one of the most underrated projects in the Wikimedia galaxy. Invisible to the general public, it has nonetheless become an essential layer of infrastructure: it structures knowledge, feeds search engines and nourishes language models. For anyone working on the visibility of their content in the age of AI, understanding Wikidata is no longer a scholarly curiosity; it is a fundamental issue.

DataLove · wikidata

June 23, 2026 at 7:32:10 PM GMT+2

·

https://www.geeek.org/wikidata-donnees-structurees-llm/

·

GitHub - dbartolini/data-oriented-design: A curated list of data oriented design resources. · GitHub

collection · DataLove

May 27, 2026 at 10:38:13 PM GMT+2

·

https://github.com/dbartolini/data-oriented-design

·

IP66 — Free IP Geolocation Database

An MMDB-compatible IP Geolocation database with ASN, country, and continent data. Free to use. No license keys required. Updated every day.
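An MMDB file is, conceptually, a binary trie over IP address bits: a lookup returns the record attached to the most specific network containing the address. The sketch below reproduces that longest-prefix-match behavior with Python's standard `ipaddress` module over a plain list, rather than reading the MMDB binary format itself. The networks (documentation ranges) and ASN values (from the 64496-64511 documentation range) are hypothetical placeholders, not IP66 data; for a real `.mmdb` file, a reader library such as `maxminddb` would be used instead.

```python
import ipaddress
from typing import Optional

# Hypothetical records: network -> metadata (documentation prefixes and ASNs, not real data).
RECORDS = {
    "192.0.2.0/24": {"asn": 64496, "country": "FR", "continent": "EU"},
    "192.0.2.128/25": {"asn": 64497, "country": "DE", "continent": "EU"},
    "2001:db8::/32": {"asn": 64498, "country": "JP", "continent": "AS"},
}

# Pre-parse the networks and sort them from most to least specific (longest prefix first).
NETWORKS = sorted(
    ((ipaddress.ip_network(cidr), data) for cidr, data in RECORDS.items()),
    key=lambda item: item[0].prefixlen,
    reverse=True,
)


def lookup(ip: str) -> Optional[dict]:
    """Return the record of the most specific network containing `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    for network, data in NETWORKS:
        # `in` is simply False when IP versions differ, so v4 and v6 can share one list.
        if addr in network:
            return data
    return None


print(lookup("192.0.2.200"))  # matched by the /25, not the broader /24
```

A linear scan is fine for a handful of ranges; the trie layout of MMDB is what keeps lookups fast across millions of networks.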

ip · DataLove

May 2, 2026 at 2:03:03 PM GMT+2

·

https://ip66.dev/

·

What’s changing on data.gov.uk and why – Data in government

Great news for UK public data.

DataLove · news · 2026 · UnitedKingdom · website

March 25, 2026 at 7:36:10 PM GMT+1

·

https://dataingovernment.blog.gov.uk/2026/03/25/whats-changing-on-data-gov-uk-and-why/

·

Root Zone Database

A list of all top-level domains (TLDs).
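IANA also publishes this list as a machine-readable text file (https://data.iana.org/TLD/tlds-alpha-by-domain.txt): a `#` comment header, then one upper-case TLD per line, with internationalized TLDs in their `XN--` punycode form. A minimal sketch, assuming that format, that parses the file and checks whether a hostname ends in a delegated TLD; the `SAMPLE` text is an abridged, hypothetical stand-in for the real file.

```python
def parse_tld_list(text: str) -> set[str]:
    """Parse IANA's tlds-alpha-by-domain.txt format into a set of lower-case TLDs."""
    tlds = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip the version header and blank lines
            continue
        tlds.add(line.lower())
    return tlds


def has_known_tld(hostname: str, tlds: set[str]) -> bool:
    """True if the last label of `hostname` is a listed TLD (a trailing dot is ignored)."""
    last_label = hostname.rstrip(".").rsplit(".", 1)[-1]
    try:
        # IDN TLDs are listed in punycode, e.g. "рф" -> "xn--p1ai".
        ascii_label = last_label.encode("idna").decode("ascii").lower()
    except UnicodeError:
        return False
    return ascii_label in tlds


# Abridged, hypothetical sample in the same shape as the IANA file (not the real contents).
SAMPLE = """# Version 0000000000 (hypothetical header)
COM
FR
ORG
XN--P1AI
"""

tlds = parse_tld_list(SAMPLE)
print(has_known_tld("пример.рф", tlds))  # True
```

The real file changes as TLDs are delegated or retired, so it is worth re-downloading it periodically rather than hard-coding the list.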

web · serviceWeb · DataLove

November 22, 2025 at 11:49:24 PM GMT+1

·

https://www.iana.org/domains/root/db

·

Wikidata, the world's largest database of structured knowledge, offers MCP access | heise online

ai · DataLove · Wikipédia · news

October 3, 2025 at 1:20:29 PM GMT+2

·

https://www.heise.de/news/Wikidata-die-weltgroesste-Datenbank-fuer-strukturiertes-Wissen-bietet-MCP-Zugang-10687901.html

·

Small Data

Organizations don't use that much data.

Of queries that scan at least 1 MB, the median query scans about 100 MB. The 99.9th percentile query scans about 300 GB.

But 99.9% of real-world queries could run on a single large node.

I did the analysis for this post using DuckDB, and it can scan the entire 11 GB Snowflake query sample on my Mac Studio in a few seconds.

When we think about new database architectures, we’re hypnotized by scaling limits. If it can’t handle petabytes, or at least terabytes, it’s not in the conversation. But most applications will never see a terabyte of data, even if they’re successful. We’re using jackhammers to drive finish nails.

As an industry, we’ve become absolutely obsessed with “scale”, seemingly at the expense of all else: simplicity, ease of maintenance, and reducing developer cognitive load.

Years it takes for data to grow 10x at a given annual growth rate:
10% -> ~24y
50% -> ~5.7y
200% -> ~2.1y
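The figures above follow from compound growth: at an annual growth rate g, data grows by a factor (1 + g)^t after t years, so reaching 10x takes t = ln(10) / ln(1 + g) years. A quick check in Python (the function name is just for illustration):

```python
import math


def years_to_multiply(rate: float, factor: float = 10.0) -> float:
    """Years of compound growth at `rate` per year (0.5 = 50%) needed to grow by `factor`."""
    if rate <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(factor) / math.log1p(rate)


for rate in (0.10, 0.50, 2.00):
    print(f"{rate:.0%} -> ~{years_to_multiply(rate):.1f}y")
# 10% -> ~24.2y
# 50% -> ~5.7y
# 200% -> ~2.1y
```

Even at 50% yearly growth, a dataset needs more than five years to grow by an order of magnitude, which is the point of the argument.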

Scaling is also a luxurious issue in many cases: it means the business runs well.

Hardware is getting really, really good

In the last decade:
SSDs got ~5.6x cheaper, hold 30x more data on a single drive, and are 11x faster in sequential reads and 18x faster in random reads.
CPU core counts went up 2.6x, price per core went down at least 5x, and each Turin (AMD EPYC) core is probably also 2x-2.5x faster.

Distributed systems are also increasingly overkill, as hardware progresses faster than most data volumes grow.

DataLove · dev

September 28, 2025 at 9:40:54 PM GMT+2

·

https://topicpartition.io/definitions/small-data

·

How to set up self-hosted address autocompletion?

Address autocompletion? Yes.

With a Docker image of https://photon.komoot.io/: https://github.com/rtuszik/photon-docker
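Photon exposes an HTTP search endpoint returning GeoJSON, which an autocompletion widget can call as the user types. A minimal sketch, assuming a self-hosted instance at `http://localhost:2322` (Photon's usual default port) and the `/api` endpoint with `q`, `limit` and `lang` query parameters; the helper names are hypothetical, and the parameters your Photon version supports should be checked against its README.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PHOTON_URL = "http://localhost:2322/api"  # assumption: local self-hosted Photon instance


def build_search_url(query: str, limit: int = 5, lang: str = "fr") -> str:
    """Build a Photon search URL for the text typed so far."""
    return f"{PHOTON_URL}?{urlencode({'q': query, 'limit': limit, 'lang': lang})}"


def feature_label(feature: dict) -> str:
    """Turn one GeoJSON feature into a human-readable suggestion."""
    props = feature.get("properties", {})
    street = " ".join(p for p in (props.get("housenumber"), props.get("street")) if p)
    parts = [props.get("name"), street, props.get("postcode"), props.get("city")]
    return ", ".join(p for p in parts if p)


def suggest(query: str, limit: int = 5) -> list[str]:
    """Query Photon and return suggestion labels (requires a running server)."""
    with urlopen(build_search_url(query, limit), timeout=5) as resp:
        collection = json.load(resp)
    return [feature_label(f) for f in collection.get("features", [])]
```

In a real front end, requests would usually be debounced (sent only after a short pause in typing) so the server is not queried on every keystroke.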

DataLove · project

September 16, 2025 at 9:03:04 PM GMT+2

·

https://developpeur-freelance.io/auto-completion-adresse/

·

Common Crawl - Open Repository of Web Crawl Data

DataLove · project · web

August 19, 2025 at 10:39:49 PM GMT+2

·

https://commoncrawl.org/

·

Home - Zéro Logement Vacant

fr · DataLove · habitation

June 5, 2025 at 7:29:01 AM GMT+2

·

https://zerologementvacant.beta.gouv.fr/

·

Vulnerability Database

ENISA is mandated to develop and maintain the European vulnerability database.

eu · DataLove · security

May 14, 2025 at 9:45:33 PM GMT+2

·

https://euvd.enisa.europa.eu/

·

Public holidays in France - data.gouv.fr

DataLove

May 12, 2025 at 10:46:06 AM GMT+2

·

https://www.data.gouv.fr/fr/datasets/jours-feries-en-france/

·

Just Throw It Into Postgres - simonsafar.com

Storing the raw data blobs has one advantage: no data is lost, and they can be refined as needed.

IDE references can be thrown into Postgres so they can be retrieved later.

Handling Chinese characters with a JSONB column and a dictionary.

Or, of course, temperature changes.
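The "keep the raw blob, refine later" idea fits in a few lines. To stay self-contained, this sketch uses Python's built-in `sqlite3` as a stand-in for Postgres (in Postgres the `payload` column would typically be `jsonb`, queried with operators such as `->>`); the table name, field names and sensor label are hypothetical. Raw events are stored untouched and a refined view is derived afterwards, so any new refinement can be recomputed from the original data. `ensure_ascii=False` keeps Chinese characters readable in the stored JSON instead of `\uXXXX` escapes.

```python
import json
import sqlite3

# sqlite3 stands in for Postgres here; each raw payload is kept verbatim as JSON text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")


def store_raw(event: dict) -> None:
    """Store an event exactly as received; nothing is dropped or normalized."""
    conn.execute(
        "INSERT INTO raw_events (payload) VALUES (?)",
        (json.dumps(event, ensure_ascii=False),),
    )


def refine_temperatures() -> list[tuple[str, float]]:
    """Derive a refined view later: (sensor, temperature in °C) for events that have one."""
    rows = conn.execute("SELECT payload FROM raw_events ORDER BY id").fetchall()
    refined = []
    for (payload,) in rows:
        event = json.loads(payload)
        if "temp_c" in event:
            refined.append((event.get("sensor", "unknown"), float(event["temp_c"])))
    return refined


store_raw({"sensor": "balcony", "temp_c": 21.5})
store_raw({"word": "你好", "meaning": "hello"})  # a different shape, stored anyway
print(refine_temperatures())  # [('balcony', 21.5)]
```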

PostgreSQL · DataLove

April 17, 2025 at 10:31:47 PM GMT+2

·

https://simonsafar.com/2025/throw_it_into_postgres/

·

"The closer to the train station, the worse the kebab" - A "Study" - James Pae

Let's check this :D
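One way to check such a claim on one's own observations is a rank correlation between distance to the station and review score: if the claim holds, Spearman's coefficient should be positive (farther away, better rated). A minimal, dependency-free sketch, not necessarily the method used in the linked write-up; tied values get average ranks, the function names are hypothetical, and the inputs are whatever one collects.

```python
from statistics import mean


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation (raises ZeroDivisionError if either list is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(distances: list[float], scores: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(distances) != len(scores) or len(distances) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    return pearson(ranks(distances), ranks(scores))
```

A rank correlation is used rather than a plain Pearson coefficient because review scores are ordinal and the relationship need not be linear.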

DataLove

February 25, 2025 at 7:54:59 PM GMT+1

·

https://www.jmspae.se/write-ups/kebabs-train-stations/

·

Open data for territorial analysis - HackMD

This (non-exhaustive) list catalogs the main data sources available online that are useful for territorial diagnosis and analysis work (land-use planning, urban planning, mobility, environment, …).

DataLove · fr

February 22, 2025 at 6:24:59 PM GMT+1

·

https://hackmd.io/@hOaFaD2DS4WcOzNXU6j7vg/SJwpLFT4B

·

Violences Policières

A record of cases of police violence.

website · DataLove

January 31, 2025 at 8:42:39 PM GMT+1

·

https://violencespolicieres.fr/

·

Cooperative database of book boxes (boîtes à livres) in France

DataLove · serviceWeb · bibliographie

January 5, 2025 at 4:02:45 PM GMT+1

·

https://www.boites-a-livres.fr/

·

Alternatives To Typical Technical Illustrations And Data Visualisations — Smashing Magazine

3D flow diagram: for showing relationships and connections.
Card diagram: highlights and selects information or data in relation to its surrounding data and information.
Pyramid graph: great at showing two categories of information and comparing them horizontally; an alternative to typical horizontal or vertical bar graphs.
Sankey flow diagram: shows the progression and journey of information and data, and how they are connected in proportion to their data value.
Stream graph: a great way to show data and how it relates to other data over time.
Tree map: shows data spatially, with the size of each area reflecting how its value relates to the rest of the data.
Waterfall chart: shows how successive values add to or subtract from a running total, laid out vertically across the range of data values.
Doughnut chart: shows each data segment against the others, and its share of the whole.
Lollipop chart: an excellent way to show percentage values that also integrates the label and data value well.
Bubble chart: illustrates data values through bubble size, and sub-classification in relation to the surrounding data.
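Of these, the waterfall chart is the one that needs a little preprocessing: each bar floats between the running total before and after its step. A minimal sketch of that computation, assuming the input is an ordered list of (label, delta) steps starting from zero; the `Bar` and `waterfall` names and the example labels are hypothetical, and drawing is left to any plotting library that supports floating bars (a bottom offset plus a height).

```python
from typing import NamedTuple


class Bar(NamedTuple):
    label: str
    start: float  # running total before this step
    end: float    # running total after this step

    @property
    def bottom(self) -> float:
        """Lower edge of the floating bar."""
        return min(self.start, self.end)

    @property
    def height(self) -> float:
        """Bar length, regardless of whether the step is positive or negative."""
        return abs(self.end - self.start)


def waterfall(steps: list[tuple[str, float]]) -> list[Bar]:
    """Turn ordered (label, delta) steps into floating bars plus a closing total bar."""
    bars = []
    total = 0.0
    for label, delta in steps:
        bars.append(Bar(label, total, total + delta))
        total += delta
    bars.append(Bar("Total", 0.0, total))  # the closing bar is anchored at zero
    return bars


for bar in waterfall([("Revenue", 100), ("Costs", -60), ("Tax", -10)]):
    print(bar.label, bar.bottom, bar.height)
```

Positive and negative steps are usually colored differently, which is easy to derive from `end > start`.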