Turn Data into Knowledge

About Me

Benjamin Devèze
Expert in Artificial Intelligence, Machine Learning, Natural Language Processing, Search Engines
14 years of experience in tech start-ups as first employee, bootstrapping and scaling technology, product and team
Founder @ Knowledia
2018 - 2020
CTO - Head of Data Science & AI @ Clustree (sold to Cornerstone)
2014 - 2018
CTO @ Yakaz (sold)
2006 - 2014
Education in Computer Science & Artificial Intelligence

Technology Overview

API
Knowledge Graph
Natural Language Processing
Data Acquisition & Extraction
Enriched Textual Corpus

Knowledge Graph

What's a knowledge graph?

Knowledge Graph

Nodes are values
Value types: Media, Geo Coordinates, Text, Quantity, Time, URL, External Identifier
Edges are statements linking nodes via a property
Statements are sourced
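
A minimal sketch of the model described above, assuming a simple in-memory representation. The class names, the "ENTITY" kind and the source attributions are illustrative assumptions, not the actual Knowledia schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    kind: str    # e.g. "TEXT", "TIME", "URL", "QUANTITY"; "ENTITY" is an assumed extra kind
    value: str

@dataclass(frozen=True)
class Statement:
    subject: Node   # node the statement is about
    prop: str       # property labelling the edge
    obj: Node       # value the property points to
    source: str     # where the statement comes from

jobs = Node("ENTITY", "Steve Jobs")
apple = Node("ENTITY", "Apple")

# Source names below are illustrative only.
statements = [
    Statement(jobs, "co-founder of", apple, source="Wikidata"),
    Statement(jobs, "date of birth", Node("TIME", "1955-02-24"), source="Wikipedia"),
]

def statements_about(node, stmts):
    """Return every statement whose subject is the given node."""
    return [s for s in stmts if s.subject == node]

for s in statements_about(jobs, statements):
    print(f"{s.subject.value} --{s.prop}--> {s.obj.value} (source: {s.source})")
```

Keeping the source on each statement, rather than on the node, lets two sources disagree about the same property without losing either claim.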

Knowledge Graph

Sources
Wikidata
Wikipedia
DBpedia
Freebase
LinkedIn
OpenStreetMap
GeoNames
World Factbook
IMDb
MusicBrainz
Amazon
Microformats
Web

Knowledge Graph

Computation Graph

Knowledge Graph

Current numbers
62 million entities
7200 properties
2.7 billion statements
only 12h for full recomputation
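
As a back-of-the-envelope reading of these numbers, assuming the 12 hours are wall-clock time and the work is spread evenly (both assumptions):

```python
entities = 62_000_000
statements = 2_700_000_000
recompute_hours = 12

# Average throughput needed to rebuild every statement in 12 hours.
statements_per_second = statements / (recompute_hours * 3600)

# Average number of statements attached to one entity.
statements_per_entity = statements / entities

print(f"{statements_per_second:,.0f} statements per second")   # 62,500
print(f"{statements_per_entity:.1f} statements per entity")    # 43.5
```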

Natural Language Processing

Language Detection
Available
Steve Jobs was the CEO of Apple.
Steve Jobs était le PDG d'Apple.
Sentence Segmentation
Available
Steve Jobs was the CEO of Apple. He was born in San Francisco.
Tokenization
Available
Steve Jobs was the CEO of Apple
Part-of-speech tags and dependencies
Available
Apple (PROPN) is (AUX) selling (VERB) a great phone (NOUN).
Dependency labels: nsubj, aux, dobj
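
To make the first two steps concrete, here is a naive rule-based sketch of sentence segmentation and tokenization. It is an assumption for illustration only; a production system would more likely rely on trained models, and these regular expressions will fail on abbreviations such as "Dr." or "U.S.".

```python
import re

def split_sentences(text):
    # Split on whitespace that follows ., ! or ? and precedes an uppercase letter.
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())

def tokenize(sentence):
    # A token is a word (optionally with inner apostrophes or hyphens)
    # or a single punctuation character.
    return re.findall(r"\w+(?:['-]\w+)*|[^\w\s]", sentence)

text = "Steve Jobs was the CEO of Apple. He was born in San Francisco."
sentences = split_sentences(text)
print(sentences)
# ['Steve Jobs was the CEO of Apple.', 'He was born in San Francisco.']
print(tokenize(sentences[0]))
# ['Steve', 'Jobs', 'was', 'the', 'CEO', 'of', 'Apple', '.']
```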

Natural Language Processing

Reading Time Estimation
Available
Steve Jobs was the CEO of Apple...
5-minute read
Readability Estimation
Available
Steve Jobs was the CEO of Apple.
EASY
Named Entity Recognition
Available
[PERSON Steve Jobs] co-founded [ORGANIZATION Apple] in [DATE 1976]
Named Entity Linking
Available
[Steve Jobs] co-founded [Apple]. He loved to eat [Apples].
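
The reading time estimate at the top of this slide can be approximated with a word count. This is a sketch under an assumed average reading speed of 200 words per minute; the actual estimator and its parameters are not described here.

```python
import math
import re

WORDS_PER_MINUTE = 200  # assumed average reading speed, not a documented value

def reading_time_minutes(text, wpm=WORDS_PER_MINUTE):
    """Estimate reading time in whole minutes, with a minimum of one minute."""
    word_count = len(re.findall(r"\w+", text))
    return max(1, math.ceil(word_count / wpm))

short_text = "Steve Jobs was the CEO of Apple."
long_text = "word " * 950   # stand-in for a 950-word article

print(reading_time_minutes(short_text))  # 1
print(reading_time_minutes(long_text))   # 5
```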

Natural Language Processing

Co-Reference Resolution
Available
Steve Jobs was the CEO of Apple. He was a perfectionist.
Cross-Lingual Semantic Similarity
Available
Martin Fourcade wins 11th world title, equalling all-time record.
VS
Fourcade un palmarès en or.
highly similar
Categorization
Available
Martin Fourcade wins 11th world title, equalling all-time record.
Sports
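
One common way to score cross-lingual similarity is to embed both texts in a shared vector space and compare the vectors with cosine similarity. The sketch below assumes such embeddings already exist; the four-dimensional vectors are made up for illustration and do not come from a real model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for multilingual sentence embeddings.
english_headline = [0.8, 0.1, 0.5, 0.2]
french_headline = [0.7, 0.2, 0.6, 0.1]
unrelated_text = [-0.3, 0.9, -0.1, 0.0]

print(round(cosine_similarity(english_headline, french_headline), 2))  # close to 1
print(round(cosine_similarity(english_headline, unrelated_text), 2))   # negative
```

A threshold on the score (its value would need tuning on real data) could then map to labels such as "highly similar".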

Natural Language Processing

Sentiment Analysis
Available Soon
We lost awfully to England yesterday due to fatigue.
Negative
The team played superbly winning all their games against some very good teams.
Positive
Entity-Based Sentiment Analysis
Available Soon
Satya Nadella is a great CEO unlike Steve Ballmer.
Aspect-Based Sentiment Analysis
Available Soon
The iPhone X delivers a gorgeous 5.8-inch OLED screen but I wish the battery was better.
Screen
Battery

Natural Language Processing

Translation
Available Later
Steve Jobs was the CEO of Apple.
Steve Jobs était le PDG d'Apple.
Information Extraction
Available Later
Microsoft announced their acquisition of LinkedIn.
Question Answering
Available Later
South Korea reported 231 new cases of a coronavirus, taking total infections to 833.
+
How many coronavirus cases were reported in South Korea?
833
Summarization
Available Later
South Korea reported 231 new cases of a coronavirus, taking total infections to 833, health authorities said on Monday, a day after raising its infectious disease alert to the highest level...
South Korea coronavirus cases surge

Data Acquisition & Extraction

Generic Scalable Distributed Crawler
RootsScheduler 1
...
RootsScheduler N
RootsProcessor 1
...
RootsProcessor N
ItemsFetcher 1
...
ItemsFetcher N
ItemsProcessor 1
...
ItemsProcessor N
Crawler Live Dashboard
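
The four stage names above suggest a queue-based pipeline. The sketch below is one possible reading, run sequentially against a fake in-memory web. What each stage does is an assumption (roots taken to be entry pages such as feeds, items to be individual articles), and the URLs are made up.

```python
from collections import deque

# Fake in-memory "web" so the sketch runs offline.
FAKE_WEB = {
    "https://example.com/feed": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": "<h1>Article A</h1>",
    "https://example.com/b": "<h1>Article B</h1>",
}

def roots_scheduler(roots, root_queue):
    # Decide which root pages are due for a visit (here: all of them).
    for url in roots:
        root_queue.append(url)

def roots_processor(root_queue, item_queue, seen):
    # Read each root page and enqueue item URLs not seen before.
    while root_queue:
        root = root_queue.popleft()
        for item_url in FAKE_WEB[root]:
            if item_url not in seen:
                seen.add(item_url)
                item_queue.append(item_url)

def items_fetcher(item_queue, page_queue):
    # Download each item page.
    while item_queue:
        url = item_queue.popleft()
        page_queue.append((url, FAKE_WEB[url]))

def items_processor(page_queue):
    # Turn raw pages into structured documents.
    docs = []
    while page_queue:
        url, html = page_queue.popleft()
        title = html.replace("<h1>", "").replace("</h1>", "")
        docs.append({"url": url, "title": title})
    return docs

root_queue, item_queue, page_queue = deque(), deque(), deque()
roots_scheduler(["https://example.com/feed"], root_queue)
roots_processor(root_queue, item_queue, seen=set())
items_fetcher(item_queue, page_queue)
documents = items_processor(page_queue)
print([d["title"] for d in documents])  # ['Article A', 'Article B']
```

Because the stages only share queues, each one could be run as 1 to N independent workers, which matches the "1 ... N" layout of the diagram.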

Data Acquisition & Extraction

Automatic Content Extraction
Title
The Tokyo Olympics could actually be canceled due to the coronavirus outbreak
Authors
Mike Wehner
Publication Date
February 25th, 2020 at 3:12 PM
Text
The coronavirus outbreak originated in China and has since slowly spread across many corners of the globe. Most confirmed cases of COVID-19 infection are isolated to China, but neighboring countries including Japan have seen their fair share as well...
Media

Enriched Textual Corpus

WE COMBINE TECHNOLOGY MODULES
Knowledge Graph
+
Natural Language Processing
+
Data Acquisition & Extraction
Enriched Textual Corpus
Enriched Document
Searchable
Language
Reading Time
Readability
Authors
Dates
Images
Audio
Videos
People
Companies
Places
Topics
Categories
Sentiment
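
The enriched document fields listed above could be carried by a record like the following. This is a sketch only: the field names and types are assumptions derived from the list, and the example values are illustrative rather than real pipeline output.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EnrichedDocument:
    title: str
    text: str
    language: Optional[str] = None
    reading_time_minutes: Optional[int] = None
    readability: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    dates: List[str] = field(default_factory=list)
    images: List[str] = field(default_factory=list)
    audio: List[str] = field(default_factory=list)
    videos: List[str] = field(default_factory=list)
    people: List[str] = field(default_factory=list)
    companies: List[str] = field(default_factory=list)
    places: List[str] = field(default_factory=list)
    topics: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    sentiment: Optional[str] = None

def mentioning_place(docs, place):
    """Return the documents whose extracted places include the given one."""
    return [d for d in docs if place in d.places]

doc = EnrichedDocument(
    title="The Tokyo Olympics could actually be canceled due to the coronavirus outbreak",
    text="The coronavirus outbreak originated in China...",
    language="en",
    authors=["Mike Wehner"],
    places=["China", "Japan"],   # illustrative enrichment
)

print(len(mentioning_place([doc], "Japan")))   # 1
print(len(mentioning_place([doc], "France")))  # 0
```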

Enriched Textual Corpus

Sources
News / Blogs
Social Media Posts
Transcripts / Books
News History
Current numbers
1 million documents
20,000 sources

API

API Example

Use cases

API ACCESS
FINANCE
MARKETING
API
Knowledge Graph
Natural Language Processing
Data Acquisition & Extraction
Enriched Textual Corpus

News

News

Social Networks Integration

News

Knowledge Graph Integration

News

Rich Experience

News

Hot Topics
Related Topics
Insights

News

Towards Augmented News
Mentioned Entities Relations Graph
Source Political Orientation & Trustworthiness
Deep Fake Detection (text, image, video, audio)
Events Graph
Wide Spectrum of Similar Articles (Leaving the Filter Bubble)
Related Fact Checking Articles
Related Social Media Posts
Articles Recommendation

Finance

Real-time alerting on risks & opportunities in your portfolio, beyond keyword search
1
Samsung Electronics
is part of your portfolio
2
An article about potential threats in tantalum mining is posted
3
Exploiting the knowledge graph we find the following relationship:
4
We push you the article explaining why it is relevant to your investments
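
The four steps above can be sketched as a path search in the knowledge graph: for each portfolio holding, look for a short chain of statements leading to an entity mentioned in the article, and return that chain as the explanation. The graph below uses placeholder names and hypothetical properties; it is not the actual relationship shown on the slide.

```python
from collections import deque

# Hypothetical statements (subject, property, object) with placeholder names.
EDGES = [
    ("Company A", "manufactures", "Product B"),
    ("Product B", "has part", "Component C"),
    ("Component C", "made from", "Material D"),
    ("Company E", "manufactures", "Product F"),
]

def build_adjacency(edges):
    # Index statements in both directions so paths can follow either way.
    adj = {}
    for subj, prop, obj in edges:
        adj.setdefault(subj, []).append((prop, obj))
        adj.setdefault(obj, []).append((f"inverse of {prop}", subj))
    return adj

def find_path(adj, start, goal, max_hops=4):
    """Breadth-first search; returns the shortest chain of statements or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) == max_hops:
            continue
        for prop, nxt in adj.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, prop, nxt)]))
    return None

def alerts(portfolio, article_entities, adj):
    # One alert per (holding, mentioned entity) pair that is connected.
    found = []
    for holding in portfolio:
        for entity in article_entities:
            path = find_path(adj, holding, entity)
            if path is not None:
                found.append((holding, entity, path))
    return found

adj = build_adjacency(EDGES)
for holding, entity, path in alerts(["Company A", "Company E"], ["Material D"], adj):
    chain = " ; ".join(f"{s} {p} {o}" for s, p, o in path)
    print(f"{holding} is exposed to news about {entity}: {chain}")
```

Capping the number of hops keeps explanations short; longer chains tend to connect almost everything and would produce noisy alerts.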

Marketing

Enriched Corpus as a real-time source of business insight
Trends Discovery
Competitive Intelligence
Brand Monitoring
Products / Services / Campaigns Monitoring

Technical Stack

GitHub
Python
Google Cloud Platform
TensorFlow
Elasticsearch
Kibana
Dgraph
Redis
Docker
Kubernetes
Apache Airflow
Apache Spark
TypeScript
Next.js
React