-
Notifications
You must be signed in to change notification settings - Fork 1
Open
Description
I think we can make the tables for paper/citation counts at affiliation-year-field0 as well as the keyword list at the same level in one go:
- Create temporary table with unique
PaperId,Year,AffiliationId - summarise paper outcomes at
AffiliationId-Field0-Yearlevel - summarise paper keywords at
AffiliationId-Field0-Yearlevel
To consider/note
- We will replace the table
affiliation_outcomes, which is only at theAffiliationIdlevel, with the output from step 2 above. I don't know right now where we use this old table as an input, and we should check - There are two ways of assigning papers to "departments" at institutions: from the author's main field, or from the paper's main field. The latter is probably more accurate
- The output from step 3 contains additionally the columns
FieldOfStudyIdand aScore. We can use the next lower integer of the score as a frequency weight to calculate tf-idf (but not sure how to exactly implement frequency weights) - Not sure how to add the count of researcher in this query
Here are the queries, which we can use to replace the query in affiliation_outcomes.py.
-- ## 1. create temp table
CREATE TEMP TABLE paper_affiliation_year AS
SELECT DISTINCT AffiliationId, Year, PaperId
FROM (
SELECT a.AuthorId, a.AffiliationId, a.Year, b.Paperid
FROM AuthorAffiliation a -- ## if an author has 2 main affiliations in the same year, we count their papers at both institutions
INNER JOIN (
SELECT PaperId, AuthorId, Year
FROM PaperAuthorUnique
INNER JOIN (
SELECT PaperId, Year
FROM Papers
) USING(PaperId)
) b
ON a.AuthorId=b.AuthorId AND a.Year=b.Year
-- reduces size of the data set
INNER JOIN (
SELECT PaperId
FROM paper_outcomes
) USING(PaperId)
)
CREATE INDEX ON idx_paper_temp ON paper_affiliation_year (PaperId)
-- ## 2. create table with citation/paper counts
CREATE TABLE affiliation_outcomes AS -- this is already defined, where? can we replace it? where do we use it??
SELECT COUNT(PaperId) AS PaperCount
, SUM(CitationCount_y10) AS CitationCount_y10
, AffiliationId
, Year
, Field0
FROM paper_affiliation_year
INNER JOIN (
SELECT PaperId, CitationCount_y10
FROM paper_outcomes
) USING(PaperId)
INNER JOIN ( -- each field is unique per paper, so it is ok to join only here
SELECT PaperId, Field0
FROM PaperMainFieldsOfStudy
) USING(PaperId)
GROUP BY AffiliationId, Field0, Year
CREATE UNIQUE ON idx_affo_AffilYearField ON affiliation_outcomes (AffiliationId, Year, Field0)
-- ## 3. table with keywords
CREATE TABLE affiliation_fields AS
SELECT SUM(Score) AS Score
, FieldOfStudyId
, AffiliationId
, Field0
, Year
FROM paper_affiliation_year
INNER JOIN (
SELECT PaperId, FieldOfStudyId, Score
FROM PaperFieldsOfStudy
INNER JOIN (
SELECT FieldOfStudyId
FROM FieldsOfStudy
WHERE level < 2 -- choose appropriate level
) USING(FieldOfStudyId)
) USING(PaperId)
GROUP BY AffiliationId, Field0, Year
CREATE UNIQUE ON idx_afff_AffilYearField ON affiliation_outcomes (AffiliationId, Year, Field0)
Reactions are currently unavailable