Open
34 commits
103e27e
Revert "Removed web config files (these should be placed by each user…
Dec 17, 2013
935293a
Spanish inflector to process spanish plurals
Apr 30, 2014
170d36e
Rebuilt the models for the "mavenized" version
Apr 30, 2014
9d96f75
Rebuilt the models for the "mavenized" version
Apr 30, 2014
b28ac4a
Rebuilt the models for the "mavenized" version
May 5, 2014
f93c918
Rebuilt the models for the "mavenized" version
May 5, 2014
52c2079
Rebuilt the models for the "mavenized" version
May 5, 2014
35e9044
Rebuilt the models for the "mavenized" version
May 5, 2014
5648a9a
Rebuilt the models for the "mavenized" version
May 5, 2014
f83663a
Rebuilt the models for the "mavenized" version
May 5, 2014
feba9c5
Jun 30, 2014
5269350
Bug in queryList with only one parameter, and fixed arrayBoundsExcept…
Oct 2, 2014
3465a80
Bug in queryList with only one parameter, and fixed arrayBoundsExcept…
Oct 2, 2014
6193042
Matcher function with unicode support for redirects/disambiguation
Nov 17, 2014
3d234b2
WIkipedia Iterator Example
Nov 17, 2014
fcd3a57
WIkipedia Iterator Example
Nov 17, 2014
afed4f1
Update README.md
Neuw84 Nov 17, 2014
7b92e0f
Update README.md
Neuw84 Nov 17, 2014
605b6a4
Added support for exploring List of Articles by title
Dec 2, 2014
0fa3171
Merge origin/master
Dec 2, 2014
e36468f
Added support for exploring List of Articles by title
Dec 2, 2014
c9b4ab0
Resolves bug on related to Xerces while parsing UTF8 files https://is…
May 6, 2015
c75a01f
Updated Version number to avoid confusion with the original proyect
Jun 3, 2015
d1c9fec
Updated Version number to avoid confusion with the original proyect
Jun 3, 2015
1a2e33f
Updated Version number to avoid confusion with the original proyect
Jun 3, 2015
d67fcfa
Updated Version number to avoid confusion with the original proyect
Jun 3, 2015
0f2a761
Updated Version number to avoid confusion with the original proyect
Jun 3, 2015
90013d5
Updated Version number to avoid confusion with the original proyect
Jun 3, 2015
b7ea0f8
Updated Version number to avoid confusion with the original proyect
Jun 3, 2015
2b95ac7
Updated Version number to avoid confusion with the original proyect
Jun 3, 2015
e3ba966
Updated Version number to avoid confusion with the original proyect
Jun 4, 2015
bd46ed5
Update README.md
Neuw84 Jun 4, 2015
7d6d57c
Fixes in the ImageRetriever
Jul 15, 2015
6cfa1a5
Added the possibility of getting the markup of each article in the Ex…
Jul 15, 2015
8 changes: 7 additions & 1 deletion .gitignore
@@ -1,2 +1,8 @@
/wikipedia-miner-core/target/
/wikipedia-miner-extract/target/
/wikipedia-miner-extract/target/
/wikipedia-miner-web/target/
/wikipedia-miner-web/nb-configuration.xml
/wikipedia-miner-web/src/main/webapp/WEB-INF/web.xml
/target/
/wikipedia-miner-core/nbactions.xml
/wikipedia-miner-core/nb-configuration.xml
51 changes: 50 additions & 1 deletion README.md
@@ -1,4 +1,53 @@
wikipediaminer
==============

An open source toolkit for mining Wikipedia
An open source toolkit for mining Wikipedia, forked from https://github.com/dnmilne/wikipediaminer

Contains some improvements to the web services and many bug fixes to Milne's sources.

Documentation: https://github.com/dnmilne/wikipediaminer/wiki


TODO:
```list
Add support for live snapshots of Wikipedia (DBPedia approach) to stay up to date
Implement other disambiguation approaches like http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6354382
Support binary data in the web services (Thrift, for example) to avoid problems with UTF-8 characters.
```


Add this repository to your pom.xml:

```xml

<repository>
<id>galan-maven-repo</id>
<name>galan-maven-repo-releases</name>
<url>http://galan.ehu.es/artifactory/ext-release-local</url>
</repository>

```
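
In a complete `pom.xml`, a `<repository>` entry sits inside a `<repositories>` element directly under `<project>`. The following is a minimal sketch of that placement; the `com.example` coordinates are placeholders for the consuming project and are not part of this repository.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <!-- Placeholder coordinates for your own project (assumption) -->
    <groupId>com.example</groupId>
    <artifactId>my-app</artifactId>
    <version>1.0.0</version>

    <!-- The repository entry from the README goes inside <repositories> -->
    <repositories>
        <repository>
            <id>galan-maven-repo</id>
            <name>galan-maven-repo-releases</name>
            <url>http://galan.ehu.es/artifactory/ext-release-local</url>
        </repository>
    </repositories>
</project>
```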


Then add the required subproject, for example:
```xml
<dependency>
<groupId>org.wikipedia-miner</groupId>
<artifactId>wikipedia-miner-core</artifactId>
<version>1.2.4</version>
</dependency>
```

7 changes: 7 additions & 0 deletions configs/hub-template.xml
@@ -8,7 +8,14 @@
<!--
A version of Wikipedia that will be made available via the service hub.
You may specify multiple Wikipedias, in which case one should be specified as the default

Example for English Wikipedia:

<wikipedia name="en" description="english-wiki" default="true">
path/to/conf/file
</wikipedia>
-->

<wikipedia name="" description="" default="true">
path/to/conf/file
</wikipedia>
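
The template comment notes that several Wikipedias may be listed, with one marked as the default. A sketch of what a two-language hub configuration might look like follows; the Spanish entry, both file paths, and the use of `default="false"` for the non-default entry are assumptions made for illustration, not values taken from this change.

```xml
<!-- Sketch only: paths are placeholders, and default="false" is an assumed value -->
<wikipedia name="en" description="english-wiki" default="true">
    path/to/en/conf/file
</wikipedia>

<wikipedia name="es" description="spanish-wiki" default="false">
    path/to/es/conf/file
</wikipedia>
```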
20 changes: 20 additions & 0 deletions configs/languages.xml
@@ -55,5 +55,25 @@
<RedirectIdentifier>WEITERLEITUNG</RedirectIdentifier>

</Language>

<Language code="es" name="Spanish" localName="Castellano">

<RootCategory>Artículos</RootCategory>

<DisambiguationCategory>Wikipedia:Desambiguación</DisambiguationCategory>

<DisambiguationTemplate>desambiguación</DisambiguationTemplate>
<DisambiguationTemplate>des</DisambiguationTemplate>
<DisambiguationTemplate>desambiguacion</DisambiguationTemplate>
<DisambiguationTemplate>disambig</DisambiguationTemplate>
<RedirectIdentifier>REDIRECT</RedirectIdentifier>
<RedirectIdentifier>des</RedirectIdentifier>
<RedirectIdentifier>otros usos</RedirectIdentifier>
<RedirectIdentifier>redirige aquí</RedirectIdentifier>
<RedirectIdentifier>ico-des</RedirectIdentifier>
<RedirectIdentifier>REDIRECCIÓN</RedirectIdentifier>
<RedirectIdentifier>REDIRECCION</RedirectIdentifier>
</Language>


</WikipediaLanguageList>
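
The Spanish entry above follows the same shape as the other `<Language>` blocks in this file. As a rough template for adding a further language, the sketch below reuses only the element names visible in this diff; every value is a placeholder to be replaced with the category, template, and redirect names actually used on the target Wikipedia.

```xml
<!-- Hypothetical template: all attribute and element values are placeholders -->
<Language code="xx" name="LanguageName" localName="LocalName">

    <RootCategory>root category title</RootCategory>

    <DisambiguationCategory>disambiguation category title</DisambiguationCategory>

    <!-- Repeat once per template name that marks a disambiguation page -->
    <DisambiguationTemplate>disambiguation template name</DisambiguationTemplate>

    <!-- Repeat once per keyword that marks a redirect -->
    <RedirectIdentifier>REDIRECT</RedirectIdentifier>
</Language>
```

The entry belongs inside `<WikipediaLanguageList>`, alongside the existing languages.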
Binary file not shown.
Binary file not shown.
Binary file modified models/compare/labelDisambig_es_In.model
Binary file not shown.
87 changes: 47 additions & 40 deletions pom.xml
@@ -1,44 +1,51 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.wikipedia-miner</groupId>
<artifactId>wikipedia-miner</artifactId>
<version>1.2.4</version>
<packaging>pom</packaging>
<name>wikipedia-miner</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<pluginRepositories>
<pluginRepository>
<snapshots>
<enabled>false</enabled>
</snapshots>
<id>central</id>
<name>bintray-plugins</name>
<url>http://jcenter.bintray.com</url>
</pluginRepository>
</pluginRepositories>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
</dependencies>
<modules>
<module>wikipedia-miner-core</module>
<module>wikipedia-miner-extract</module>
</modules>

<groupId>org.wikipedia-miner</groupId>
<artifactId>wikipedia-miner</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>pom</packaging>

<name>wikipedia-miner</name>
<url>http://maven.apache.org</url>

<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
</dependencies>
<modules>
<module>wikipedia-miner-core</module>
<module>wikipedia-miner-extract</module>
</modules>

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.0</version>
<configuration>
<source>1.6</source>
<target>1.6</target>
</configuration>
</plugin>
</plugins>
</build>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.0</version>
<configuration>
<source>1.6</source>
<target>1.6</target>
</configuration>
</plugin>

</plugins>
</build>
</project>
178 changes: 94 additions & 84 deletions wikipedia-miner-core/pom.xml
@@ -1,104 +1,114 @@
<?xml version="1.0"?>
<project
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"
xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.wikipedia-miner</groupId>
<artifactId>wikipedia-miner</artifactId>
<version>0.0.1-SNAPSHOT</version>
</parent>
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"
xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.wikipedia-miner</groupId>
<artifactId>wikipedia-miner</artifactId>
<version>1.2.4</version>
</parent>


<artifactId>wikipedia-miner-core</artifactId>
<artifactId>wikipedia-miner-core</artifactId>

<name>wikipedia-miner-core</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<name>wikipedia-miner-core</name>
<version>1.2.4</version>

<dependency>
<groupId>com.sleepycat</groupId>
<artifactId>je</artifactId>
<version>5.0.73</version>
</dependency>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<dependencies>

<dependency>
<groupId>net.sf.trove4j</groupId>
<artifactId>trove4j</artifactId>
<version>3.0.3</version>
</dependency>
<dependency>
<groupId>com.sleepycat</groupId>
<artifactId>je</artifactId>
<version>5.0.73</version>
</dependency>

<dependency>
<groupId>net.sf.trove4j</groupId>
<artifactId>trove4j</artifactId>
<version>3.0.3</version>
</dependency>




<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>

<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>

<dependency>
<groupId>org.apache.opennlp</groupId>
<artifactId>opennlp-tools</artifactId>
<version>1.5.3</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.1</version>
</dependency>

<dependency>
<groupId>org.dmilne</groupId>
<artifactId>weka-wrapper</artifactId>
<version>0.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.opennlp</groupId>
<artifactId>opennlp-tools</artifactId>
<version>1.5.3</version>
</dependency>

<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-math</artifactId>
<version>2.2</version>
</dependency>
<dependency>
<groupId>org.dmilne</groupId>
<artifactId>weka-wrapper</artifactId>
<version>0.0.1</version>
</dependency>

</dependencies>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-math</artifactId>
<version>2.2</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-compress</artifactId>
<version>1.8.1</version>
<type>jar</type>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<archive>
<manifest>
<mainClass>org.wikipedia.miner.util.EnvironmentBuilder</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
</plugin>
<build>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<archive>
<manifest>
<mainClass>org.wikipedia.miner.util.EnvironmentBuilder</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
</plugin>

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.0</version>
<configuration>
<source>1.6</source>
<target>1.6</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.0</version>
<configuration>
<source>1.6</source>
<target>1.6</target>
</configuration>
</plugin>


</plugins>
</build>
</plugins>

</build>
</project>
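
In Maven, a module that declares a `<parent>` inherits the parent's `groupId` and `version` when it omits them, and also inherits anything listed in the parent's `<dependencies>` section. The explicit `<version>1.2.4</version>` and the repeated `junit` dependency in this module are therefore optional. A trimmed sketch of the module POM relying on inheritance is shown below; it is an illustration of that mechanism, not the file as committed, and it omits the module's other dependencies and plugins for brevity.

```xml
<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.wikipedia-miner</groupId>
        <artifactId>wikipedia-miner</artifactId>
        <version>1.2.4</version>
    </parent>

    <!-- groupId and version are inherited from the parent declared above -->
    <artifactId>wikipedia-miner-core</artifactId>
    <name>wikipedia-miner-core</name>

    <!-- junit (test scope) is inherited from the parent's <dependencies> -->
</project>
```

Keeping the version in one place means a future version bump only has to touch the parent POM and the `<parent>` references in each module.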