Skip to content

jourlin/Forum2SQL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

- Creation de la table auteurs :
CREATE TABLE authors AS SELECT distinct(author) FROM messages;

- Creation de la table des fils de discussion :
CREATE TABLE discussions AS SELECT DISTINCT debate_id,  substring("x", 1, length("x")-6 ), nread, nrep FROM (select debate_id, translate(substring(url, '[a-zA-Z0-9\-]*-sujet'), '-', ' ') as x from messages_cleaned) as y, (SELECT id, translate(substring(url, '[a-zA-Z0-9\-]*-sujet'), '-', ' ') as z, nread, nrep, url FROM sujets) as a WHERE id=debate_id AND x=z;
ALTER TABLE discussions ADD CONSTRAINT debpk PRIMARY KEY (debate_id);

- Nombre de messages à contenu distinct : 9.894.494
SELECT count(distinct content) FROM messages;

- Nombre d'auteurs : 364.175 
SELECT count(distinct author) FROM messages;

- Extraire les messages contenant "c'est-à-dire" :
SELECT author, date, time, discussion_title, nread, nrep, content FROM (SELECT * FROM messages_cleaned WHERE substring(content, 'c\\\'est-à-dire') IS NOT NULL) as x, discussions as y WHERE x.debate_id=y.debate_id ORDER BY date ASC, time ASC ;

About

Import an internet forum website into an PostgreSQL database

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published